Web Application Outage and Email Processing Delays
Incident Report for Vero
Postmortem

Between 18:30 and 19:20 UTC today, Vero's web application was unavailable.

Whilst our API remained fully operational during this period, we also experienced delays processing Newsletters and all automated campaigns.

One of our service providers was carrying out scheduled maintenance on one of our production data stores, which is responsible for managing company authentication, amongst other things. This maintenance followed a pattern we had seen before, and we anticipated no key risks.

After the maintenance was complete, our application could not establish a connection to either our primary or our replica data stores hosted at this provider. Our Operations Team intervened immediately but was not able to find a stable solution until 19:20 UTC.

At that point, Vero's web application began operating as normal and we processed the backlog of campaigns that had built up. Full real-time processing was restored by 19:40 UTC.

As a result of this issue, we will be reviewing our alerting and rollover procedures and considering ways in which we might use alternative service providers to increase redundancy for this particular infrastructure component.

We want to confirm that no data was lost and, ultimately, all emails were processed.

If you have any questions, please let us know via email at support@getvero.com. Many thanks.

Posted Sep 18, 2018 - 15:28 AEST

Resolved
We were able to work through the backlog very quickly and can now confirm that both API and email processing are running in real time. We want to clarify that no API calls were lost during this outage, although processing was delayed.

As mentioned, we'll be following up with a more detailed post-mortem once we are able to gather more information from our service provider. Please don't hesitate to email support@getvero.com or message us via our #verocommunity Slack channel if you have any questions at all about your account.
Posted Sep 18, 2018 - 05:42 AEST
Monitoring
We can confirm that Vero is now operational again.

We are currently working through the API and email processing backlogs. These should be back to real time in around 20 minutes. We can also confirm that no API calls were lost during this time.

This failure was a result of an error in response to routine maintenance on the hardware behind one of our primary data stores by a service provider.

We are confident there will be no lingering issues, and we will provide a post-mortem once we have collected some further information.

Sorry for the inconvenience caused. If you have any questions at all, please email us via support@getvero.com.
Posted Sep 18, 2018 - 05:23 AEST
Update
We are continuing to investigate this issue.
Posted Sep 18, 2018 - 04:50 AEST
Investigating
We are currently experiencing a major outage affecting all available services and are investigating the root cause with urgency. We will post further updates as soon as we have more information. Please email support@getvero.com if you have any questions regarding your account. We apologise for the inconvenience.
Posted Sep 18, 2018 - 04:47 AEST
This incident affected: Vero Cloud: Ingestion API, Vero Cloud: Newsletter processing, Vero Cloud: Segment calculation, Vero Cloud: Reports data availability (Reports data availability (Vero default and Mailgun integrations)), Vero Cloud: Automated email processing (Transactional emails, Behavioral emails), and Vero Cloud: UI (Logs page activity, CSV Imports and Exports).