EasyCron Official Blog: 2019

Aug 12, 2019

Slack Notifications for Cron Jobs

Because many developers and cron job users rely heavily on Slack as a communication tool in their daily work, we have added Slack notifications to our service.

Setting up Slack notifications for your cron jobs is easy. Here are the steps:

1. In the cron job creating/editing form, open tab "Notifications" -> "Slack", and choose the notification timings and the notifying sensitivity.
2. Follow the guide at Slack to get a URL (in the format https://hooks.slack.com/services/xxx/yyy/zzz).
3. Paste this URL into the "Slack URL" field. The Slack notification setup is done.
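For readers curious about what happens on the Slack side, here is a minimal sketch of how a message can be posted to a URL of that format. It assumes the URL is a standard Slack incoming webhook that accepts a JSON body with a "text" field. The function names and the message wording are hypothetical and chosen for illustration only; they are not EasyCron's actual implementation.

```python
import json
import urllib.request


def build_slack_payload(job_name: str, status: str, detail: str = "") -> dict:
    """Build the JSON body for a Slack incoming webhook message.

    The message wording is an assumption made for this example.
    """
    text = f"Cron job '{job_name}' finished with status: {status}"
    if detail:
        text += f"\n{detail}"
    return {"text": text}


def send_slack_notification(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example usage (replace the placeholder with your own webhook URL):
# payload = build_slack_payload("nightly-backup", "failed", "HTTP 500 from target URL")
# send_slack_notification("https://hooks.slack.com/services/xxx/yyy/zzz", payload)
```

Keeping the payload construction separate from the network call makes it possible to check the message content without contacting Slack.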

Below is a screenshot of the Slack notification interface:

[Screenshot: Slack Notifications for Cron Job]

Feb 3, 2019

Service malfunction from 2019-02-01 23:05 UTC to 2019-02-02 07:24 UTC

From 2019-02-01 23:05 UTC to 2019-02-02 07:24 UTC, one of our core servers had an error that caused our executor servers to fail to execute cron jobs. The problem was solved at 2019-02-02 07:25 UTC, and the system has been working normally since then.

We're investigating the root cause of the problem, and will enhance the whole system from the bottom up once the thorough investigation is done.

In our preliminary inspection, we found that the failure was related to a shortage of partition space caused by poor disk partitioning on a fairly old CentOS installation. While Redis was running BGSAVE to write to the partition, there was not enough space on it, so Redis kept retrying BGSAVE (as it is triggered by AOF file size). Eventually both the partition space and RAM were exhausted, and the server could only partly function during the failure period.
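A failure of this kind can often be caught early by checking free space on the partition before a large write. The following is a minimal sketch of such a check, not a description of our monitoring. The function name, the directory argument, and the safety factor of 2 are assumptions made for illustration.

```python
import shutil


def has_room_for_snapshot(directory: str, expected_bytes: int, safety_factor: float = 2.0) -> bool:
    """Return True if the partition holding `directory` has enough free space.

    `expected_bytes` is the estimated size of the file about to be written.
    `safety_factor` leaves headroom, since a snapshot is typically written to a
    temporary file while the previous one still exists (an assumed margin).
    """
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= expected_bytes * safety_factor


# Example usage with a hypothetical data directory and a 4 GiB estimate:
# if not has_room_for_snapshot("/var/lib/redis", 4 * 1024**3):
#     print("Not enough free space for a snapshot; alert an operator.")
```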

As a quick repair, we moved one of our Redis log servers to a new dedicated system with 4 times the RAM and 10 times the disk space.

Any cron jobs that were missed during the failure period were executed (once each) when the system started working again.
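The "once each" rule can be illustrated with a short sketch: a job that missed several runs during the outage is still selected only one time. This is a simplified illustration under assumed data structures (a dictionary mapping job names to their due times), not the actual catch-up logic of our system.

```python
from datetime import datetime, timezone


def jobs_to_catch_up(schedule: dict, outage_start: datetime, outage_end: datetime) -> list:
    """Return each job name once if at least one of its runs fell in the outage window."""
    missed = []
    for job_name, due_times in schedule.items():
        if any(outage_start <= due <= outage_end for due in due_times):
            missed.append(job_name)  # added once, however many runs were missed
    return missed


# Outage window taken from this post; the jobs below are hypothetical.
OUTAGE_START = datetime(2019, 2, 1, 23, 5, tzinfo=timezone.utc)
OUTAGE_END = datetime(2019, 2, 2, 7, 24, tzinfo=timezone.utc)

example_schedule = {
    "hourly-report": [
        datetime(2019, 2, 2, 0, 0, tzinfo=timezone.utc),
        datetime(2019, 2, 2, 1, 0, tzinfo=timezone.utc),
    ],
    "noon-backup": [datetime(2019, 2, 2, 12, 0, tzinfo=timezone.utc)],
}

catch_up = jobs_to_catch_up(example_schedule, OUTAGE_START, OUTAGE_END)
```

Here "hourly-report" missed two runs but appears once in the result, while "noon-backup" was not affected and is not selected.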

We're really sorry for the service malfunction. We will investigate the failure further and publish more information if necessary.