Delays in API processing
Incident Report for Vero

At 01:17 UTC we observed delays across all of our key queues.

We began investigating and found that one of our data caches was not responding normally: its throughput was significantly below normal levels.

We identified the root cause in approximately 30 minutes and decided that we had to switch over to a recovery server. The switchover took around 10 minutes.

Because load was heavy this morning, the backlog of jobs that accumulated during the delay grew quickly. It took around an hour to process the backlog and return to full speed.

All processing has now returned to normal. We are reviewing our diagnostic documentation to determine whether we could have identified the root cause faster. Although the recovery itself was swift, faster diagnosis would shorten the overall time to recovery.

We apologise for the inconvenience. As ever, we are working hard to improve Vero's resilience. If you have any questions or would like to discuss this issue, please email us at – we welcome your feedback.

Thanks for working with us 🙇.

Posted Sep 01, 2017 - 13:59 AEST

Just confirming that the backlog has been cleared and all processing has been operating at normal capacity for more than 90 minutes.
Posted Sep 01, 2017 - 13:58 AEST
Our API processing is nearly real-time again; we are finishing processing the backlog that accumulated earlier.

We will provide details of the root cause shortly. Our priority is processing this backlog.

Posted Sep 01, 2017 - 11:44 AEST
We have identified the cause and made changes to address it.

Unfortunately, we are not yet confident that the system is operating normally, and we are actively working on the issue.
Posted Sep 01, 2017 - 11:16 AEST
We are currently seeing delays in API processing. We are investigating the situation and will provide updates.
Posted Sep 01, 2017 - 10:17 AEST