At approximately 20:00 UTC on 29 October 2017, one of our service providers experienced an outage in a redundant/backup data centre. As a result, from ~20:00 UTC on 29 October to ~17:00 UTC on 30 October, we experienced some service degradation whilst the redundant system recovered.
The impact of this incident could have been much more severe; fortunately, our database cluster and systems reacted as they should have and automatically kept our processes functioning as normal. Despite this, we did witness several periods during which automated campaigns (behavioural and transactional emails) were intermittently delayed.
We have confirmed that the most severe of these periods was around 04:00 UTC on 30 October, when emails were delayed by up to an hour in the worst cases. Throughout the rest of the affected period, delays were typically shorter and occurred in brief bursts.
We are still investigating how we can improve processing performance in scenarios where our cluster is running at reduced capacity. We have already deployed some changes to this end, and they appear to have restored stability even whilst we wait for the full cluster to normalise.
Whilst we reported the processing delays via the Components section of this page, we have received feedback that raising an Incident would have been a better approach. We will adjust our procedures accordingly.
Since 17:00 UTC on 30 October, things appear to have been running smoothly, in line with the changes we made to improve performance.
If you have any questions, please let us know via firstname.lastname@example.org.
Thanks and have a great day.