At approximately 19:30 UTC on 17 March we began receiving alerts that some customers were experiencing issues accessing Vero's web application (app.getvero.com).
Initial investigations did not reveal an underlying cause and we continued to see intermittent failures until approximately 20:15 UTC. By this time, access to Vero's front end was failing for all customers.
By 20:45 UTC we had ascertained that one of the production databases supporting the front end was experiencing locking caused by erroneous queries being run against individual customer records. After we identified and released the lock, full access was restored.
We are in the process of determining what caused these locks and will put a plan in place to ensure they cannot recur.
During the outage, access to Vero's front end was completely unavailable. The API and data collection were not affected. The sending of newsletters and triggered emails was slowed during this period, as resources were temporarily redirected so that we could resolve the issue as quickly as possible.
The system is now back online and fully operational. We will also be reviewing our alerting process. We take transparency about infrastructure challenges very seriously and, in our haste to resolve this issue, the on-call engineering team acknowledges that it did not post a status update quickly enough (taking 30 minutes to alert you).
Thank you for your patience. We look forward to continuing to build a better and more robust Vero. If you have any questions, please email us at support@getvero.com.
James
-- CTO, Vero jamesl@getvero.com