Vero frontend down
Incident Report for Vero
Postmortem

At approximately 19:30 UTC on 17 March we began receiving alerts that some customers were experiencing issues accessing Vero's web application (app.getvero.com).

Initial investigations did not reveal an underlying cause and we continued to see intermittent failures until approximately 20:15 UTC. By this time, access to Vero's front end was failing for all customers.

By 20:45 UTC we had ascertained that one of the production databases supporting the front end was experiencing locking caused by erroneous queries being run against individual customer records. Once we identified and released this lock, full access was restored.

We are in the process of determining what caused these locks and will put a plan in place to ensure they cannot recur.

During the outage, access to Vero's frontend was completely unavailable. The API and data collection were not affected. The sending of newsletters and triggered emails was slowed, as resources were temporarily redirected to resolve the issue as quickly as possible.

The system is now back online and fully operational. We will also be reviewing our alerting process. We take transparency about infrastructure challenges very seriously and, in our haste to resolve this issue, the on-call engineering team acknowledges that it did not post a status update quickly enough (taking 30 minutes to alert you).

Thank you for your patience. We look forward to continuing to build a better and more robust Vero. If you have any questions, please email us at support@getvero.com.

James

-- CTO, Vero jamesl@getvero.com

Posted Mar 18, 2016 - 11:04 AEDT

Resolved
We're now marking this resolved as we have not seen any issues for the last 90 minutes.

We're writing up a post-mortem as well. Thanks all.
Posted Mar 18, 2016 - 11:00 AEDT
Identified
There are currently issues accessing Vero's application front end.

We can confirm that this is affecting most customers. We've been investigating for the past 15 to 20 minutes and should have things back online shortly.

Thanks for your patience. We will provide more updates as we go along and once we are back to normal.
Posted Mar 18, 2016 - 07:25 AEDT