We're very sorry for the incident that happened from about 2015-10-08 16:00 UTC to 2015-10-09 21:00 UTC.
During this period, our main domain easycron.com was put into "Registrar-Hold" (ClientHold) status because of inaccurate whois data. The hold was set by the domain registrar enom (enom.com).
A "Registrar-Hold" (ClientHold) status means that all resolution of the domain is (and has to be) disabled by the registrar. Our direct domain registrar is namecheap.com, and its upstream provider is enom.com. The status change was made by enom.com, and namecheap was asked to stop the resolution of our domain.
We first observed the abnormality at 2015-10-08 17:00 UTC. From that time, our website and API failed to respond, and our monitoring system sent out alerts, though our cron job execution engine is independent of DNS and kept working fine.
Our first suspicion was that our DNS had a problem. But after a detailed check, we found nothing wrong. So we sent several requests to our hosting provider (OVH) and our registrar (namecheap) to ask what the problem could be.
We also noticed that easycron's whois status on namecheap was "ok" (in fact we were badly misled by this: the status on namecheap's whois query page is cached, not real-time). So we didn't suspect a domain issue.
After several hours of investigation, checking, testing, and chatting with namecheap, one of their staff said he had noticed that our domain was in "clientHold" status. We then checked again with http://www.verisign.com/en_US/whois/index.xhtml and http://www.enom.com/whois/default.aspx and found that our domain really was under "Registrar-Hold" (ClientHold) status.
Once we knew the root of the problem, we submitted a ticket to namecheap, and they said they had no control over the domain status and needed to ask enom to update it. After several rounds of back and forth, we still could not get the domain reactivated.
Seeing how inefficient it was to wait for namecheap to act as an intermediary between us and enom, we contacted email@example.com directly via email. That was at 2015-10-09 17:00 UTC.
firstname.lastname@example.org made more useful progress than namecheap's ticket system. They (email@example.com) replied to our emails within 2 hours and 1 hour. Finally, after we provided all the required statements and proof (an electricity bill) of our data's accuracy and validity in our last email to them, it took another 3 hours for them to approve our new whois data and remove the "Registrar-Hold" status. That means that if we had provided all the required documents in the first email, we could have had our domain back live in 3 hrs. More optimistically, if we had sent an email with the proof to firstname.lastname@example.org directly right after we observed the problem, our domain might have been back in 3 hrs, not 30 hrs.
Lessons learned (we hope they help people get their domains out of "Registrar-Hold" or "ClientHold" quickly):
- 1) Check whois data with the tool on the website of the root registrar of your domain (in our case, enom), not with a tool that caches whois results (in our case, namecheap).
- 2) Contact the root registrar directly (by every possible method).
- 3) In the first email, have everything ready: update the detailed whois data, attach an electricity bill, etc. Provide all the information you think is useful for approving your whois data. This reduces the number of email rounds between you and the registrar, which saves several hours.
- 4) As a daily routine, check every email from the whois email address immediately, respond to it, and treat it seriously.
In our case, we received a whois update request from enom about 15 days earlier, and we updated the data. But we didn't get any email (denial or approval) after that until 2015-10-07 20:00, when they sent the denial email. This email had "ticket" in its title. We don't have a ticket system in our service, so we give ticket emails secondary priority (most of the ticket emails that reach our mailbox are generated by users' ticket systems responding to our email alerts) and handle them only at night. Before we had started handling these emails, the domain issue exploded! 10 hrs after its last email alert, enom put our domain on hold.
- 5) Publish the current status and the progress of the fix on twitter/facebook, etc. This helps people understand what's going on and what to expect. And you'll get support from your users. They're more patient than you think and will make allowances for the situation. After experiencing this, you'll be thankful and keen to provide the best products and services to all of your users.
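As an illustration of lesson 1, here is a minimal Python sketch of checking a domain's status directly at the registry instead of through a cached whois page. It is not the tooling we used during the incident. It assumes a .com domain, for which the registry whois server is whois.verisign-grs.com, and it assumes the response reports statuses on lines starting with "Domain Status:" or "Status:". The function names (`query_whois`, `parse_statuses`, `is_on_hold`) are made up for this example, and treating "serverHold" (the registry-side hold) as a hold too is our own addition.

```python
import socket


def query_whois(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a raw WHOIS query (RFC 3912) and return the response as text.

    The default server is the registry for .com/.net; other TLDs use
    different servers, so this default is an assumption about your domain.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_statuses(whois_text):
    """Collect status codes from 'Domain Status:' / 'Status:' lines."""
    statuses = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("domain status", "status"):
            parts = value.split()
            if parts:
                # Keep only the code; some servers append an explanatory URL.
                statuses.append(parts[0])
    return statuses


def is_on_hold(statuses):
    """True if any status is a hold that stops the domain from resolving."""
    return any(s.lower() in ("clienthold", "serverhold") for s in statuses)


if __name__ == "__main__":
    found = parse_statuses(query_whois("easycron.com"))
    print("statuses:", found)
    if is_on_hold(found):
        print("WARNING: domain is on hold, contact the root registrar now")
```

Run from a scheduled job, a check like this could raise an alert as soon as the registry reports a hold, without relying on a registrar's cached whois page.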
To end this post, we have to say sorry again to all users of EasyCron. We'll try our best to prevent this kind of problem from happening again.