Slowdowns across newsletter sends
Incident Report for Vero
Postmortem

From 15:00 UTC on 24 March we observed that our send rate across all customers' newsletters was approximately half of its usual level.

Our team responded to this issue immediately by increasing the resources available to our newsletter queues as a temporary solution whilst we investigated the root cause. Unfortunately this did not lead to a full recovery in speed, though it did mean that newsletters continued to send at a reasonable rate.

Over the following five to six hours we ran a number of tests and discovered that the root cause stemmed from two things. The first was increased latency in requests made to one of our external data providers. The second was a series of low-level database locks, used to ensure users are sent email in the correct order and with the correct variation, which were performing poorly.

Between 15:00 UTC and 20:30 UTC in particular you would have experienced the following:

  1. A delay in newsletter speeds, particularly for sends with complex templates or external data requirements.
  2. Delays in webhooks, as we throttled webhook performance to more fairly allocate resources.

By 21:00 UTC all email delivery had returned to normal. We monitored this for a further ten hours to ensure things were operating completely normally before closing this issue.

Email delivery and email speed are the two key technical challenges that matter at Vero when it comes to getting your emails securely and reliably to your customers. Accordingly, we have taken this very seriously. We appreciate your patience.


I'd like to finish with a little more context on this issue. Since November 2015 we have seen extremely consistent send speeds. We made a big upgrade at that time, and it's made Vero more usable for all our customers.

If there is any irony in last night's situation, it is that this weekend was slated for the release of our next core upgrade to email delivery, which should see send speeds increase by up to four times. Again, a huge win for everyone on the Vero platform – particularly in light of the power of our external data and personalization tools.

We are proceeding today with preparations for that launch and, all tests going well, will introduce it as planned over the next 24 to 48 hours. There should be no noticeable difference other than your newsletters sending faster than ever before! We look forward to bringing this to you.

Have a great weekend and a great Easter.

James

--
CTO, Vero
jamesl@getvero.com

Posted Mar 25, 2016 - 18:44 AEDT

Resolved
This has been stable for the last ten hours and we're now going to close this issue.

Postmortem to follow.
Posted Mar 25, 2016 - 18:39 AEDT
Investigating
We have made some progress towards resolving this issue entirely but are still seeing far slower than normal sends for most customers across our network. To clarify: emails are being sent and, for many customers, the impact is minimal.

Some earlier changes have not improved the situation as dramatically as we had anticipated and we are currently working through a number of other scenarios as quickly as possible.

This is of course our highest priority and our team has been investigating over the last several hours. We will provide a full postmortem later today once we are comfortable things are operating as usual.
Posted Mar 25, 2016 - 06:30 AEDT
Monitoring
We're continuing to see slower than usual send rates. The team has increased the number of servers to work through the backlog of emails, and will provide updates as the situation progresses.
Posted Mar 25, 2016 - 04:13 AEDT
Identified
We are seeing some of our email queues perform slower than usual this morning (US).

We're currently investigating the cause and hope to have these back to full speed shortly 🚄. We wanted to keep you in the loop and will post any further updates here.
Posted Mar 25, 2016 - 02:09 AEDT