Yesterday evening, Vero experienced a major outage that prevented newsletters, transactional emails and behavioural emails from being sent on time for approximately two and a half hours.
The affected period ran from approximately 17:05 GMT on 19 January until around 19:15 GMT.
Full speed was restored by 20:15 GMT. Our aim is to provide greater than 99.9% uptime on email deliverability, so we consider this a serious incident. Below, we've given an overview of the issue and laid out how we're going to prevent future incidents of this nature.
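To put the 99.9% target in perspective, the downtime it allows is simple arithmetic: (1 − availability) × length of the period. The sketch below assumes "uptime" means wall-clock availability over a 365-day year or a 30-day month; that interpretation is ours, not a definition from Vero. By this measure, an outage of about 2.5 hours on its own consumes roughly 29% of a full year's budget.

```python
# Minimal sketch: how much downtime an availability target allows.
# The 99.9% figure comes from the post; the period lengths are plain calendar arithmetic.

HOURS_PER_YEAR = 365 * 24          # 8760, ignoring leap years
HOURS_PER_30_DAY_MONTH = 30 * 24   # 720


def downtime_budget_hours(availability: float, period_hours: float) -> float:
    """Return the maximum downtime (in hours) permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    yearly = downtime_budget_hours(0.999, HOURS_PER_YEAR)
    monthly = downtime_budget_hours(0.999, HOURS_PER_30_DAY_MONTH)
    print(f"99.9% allows {yearly:.2f} h/year, {monthly * 60:.1f} min per 30-day month")
```

Running it prints `99.9% allows 8.76 h/year, 43.2 min per 30-day month`.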
At 17:05 GMT our engineering team were alerted to an issue with the platform after error rates on behavioural emails increased significantly. An initial investigation revealed that behavioural campaigns were having trouble loading user properties from our databases for insertion into templates, which prevented emails from being sent.
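The post does not describe how the alert on rising error rates works. One common approach is to track the outcome of the most recent sends in a sliding window and alert when the failure fraction crosses a threshold. The sketch below is hypothetical: the class name, window size, threshold and minimum sample count are illustrative, not Vero's actual alerting configuration.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window error-rate check for email sends.

    Keeps the outcome of the last `window` sends and reports when the share of
    failures exceeds `threshold`, once at least `min_samples` sends are recorded.
    """

    def __init__(self, window: int = 1000, threshold: float = 0.05, min_samples: int = 100) -> None:
        self.outcomes = deque(maxlen=window)  # True = failed send, False = success
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, failed: bool) -> None:
        # Once the deque is full, appending drops the oldest outcome automatically.
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Require enough samples so a handful of early failures doesn't page anyone.
        return len(self.outcomes) >= self.min_samples and self.error_rate() > self.threshold
```

A sliding window reacts to a sudden spike (such as templates failing to load user properties) within a few hundred sends, while the minimum sample count avoids alerting on noise during quiet periods.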
Over the last two weeks we have been migrating our user properties to a new data infrastructure. So far this has been a seamless process, and it should not affect our users, as we are running the old and new datastores in parallel for testing. This upgrade will allow us to continue handling rapid growth throughout 2016. Yesterday evening, however, an unexpected problem arose at scale that prevented the email workers from finding their related user data and, ultimately, prevented API workers from saving new user data.
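The post does not say how the two datastores are wired together during testing. A common pattern for running an old and a new store in parallel is to write to both, serve reads from the old store, and "shadow read" the new store to count mismatches. The sketch below assumes that pattern; `InMemoryStore` and `DualRunUserProperties` are hypothetical stand-ins, not Vero's code. In this version, failures in the store under test are logged and swallowed so they cannot block the primary path.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("migration")


class InMemoryStore:
    """Hypothetical stand-in for a real datastore."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, user_id: str) -> Optional[dict]:
        return self._data.get(user_id)

    def put(self, user_id: str, properties: dict) -> None:
        self._data[user_id] = dict(properties)


class DualRunUserProperties:
    """Write to both stores; read from the legacy store and compare with the new one."""

    def __init__(self, legacy: InMemoryStore, new: InMemoryStore) -> None:
        self.legacy = legacy
        self.new = new
        self.mismatches = 0

    def save(self, user_id: str, properties: dict) -> None:
        self.legacy.put(user_id, properties)  # legacy store remains the source of truth
        try:
            self.new.put(user_id, properties)  # best effort while the new store is tested
        except Exception:
            logger.exception("write to new store failed for %s", user_id)

    def load(self, user_id: str) -> Optional[dict]:
        primary = self.legacy.get(user_id)
        try:
            shadow = self.new.get(user_id)
            if shadow != primary:
                self.mismatches += 1
                logger.warning("shadow read mismatch for %s", user_id)
        except Exception:
            logger.exception("shadow read from new store failed for %s", user_id)
        return primary  # callers never depend on the new store's answer
```

The mismatch counter gives a measure of how closely the new store tracks the old one before any traffic is switched over.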
To fix this issue promptly, we decided to take our email and log workers offline for all campaign types, which affected newsletter, behavioural and transactional emails. This allowed us to deploy a fix to the code and data structure that were causing the issue.
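The post describes affected emails as delayed rather than lost, which suggests that jobs were held while the workers were offline. It does not describe the mechanism, so the sketch below is a hypothetical illustration: a worker with a pause flag whose queued jobs simply wait and are drained in arrival order once it resumes. `PausableWorker` and the job names are invented for the example.

```python
from collections import deque
from typing import Callable, Deque, List


class PausableWorker:
    """Hypothetical email worker that can be taken offline without losing jobs."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send
        self.queue: Deque[str] = deque()
        self.paused = False

    def enqueue(self, job: str) -> None:
        self.queue.append(job)  # jobs are accepted even while paused

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def run_once(self) -> int:
        """Send everything currently queued unless paused; return the number sent."""
        if self.paused:
            return 0
        sent = 0
        while self.queue:
            self.send(self.queue.popleft())  # oldest job first
            sent += 1
        return sent


if __name__ == "__main__":
    delivered: List[str] = []
    worker = PausableWorker(delivered.append)
    worker.pause()                         # e.g. while a fix is deployed
    worker.enqueue("newsletter-1")
    worker.enqueue("welcome-42")
    print(worker.run_once())               # prints 0: jobs wait in the queue
    worker.resume()
    print(worker.run_once(), delivered)    # prints 2 ['newsletter-1', 'welcome-42']
```

An in-memory deque loses its contents if the process restarts, so a production system would keep the backlog in a durable queue instead; the pause-and-drain logic stays the same.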
We have spent the last eight hours monitoring our changes, and everything has been operating smoothly and in real time since 20:15 GMT.
If you would like further details about delayed emails in your account, or anything similar, please get in touch via support@getvero.com.
Thanks for working with us at Vero! We're looking forward to a great 2016 here on the Engineering Team, and we can't wait to deploy these changes live!