Earlier today we experienced errors with our API processing.
We apologise for the inconvenience caused. We understand the critical importance of ensuring all API requests are processed without error. As a result, we wanted to share a more detailed post-mortem regarding this incident.
The following key points provide the timeline and details:
• From approximately 13:30 UTC on 1 May, we began to see irregular spikes in the number of errors returned by our tracking API.
• By approximately 15:20 UTC, these errors had escalated to the point that our API was returning 502 and 504 errors 50% of the time.
• Error rates persisted at around 50% between 15:20 and 16:15 UTC.
• During this time, API requests that received an error response were not processed by Vero unless they were explicitly retried. Segment.com and a portion of our API libraries automatically handle retries.
• We have determined that this issue was caused by a configuration error that led select servers in our API cluster to run out of memory.
• We are in the process of fixing this regression and should have the required changes live within the next several days.
• In the interim, we have updated our alerting procedures to ensure this issue does not recur.
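For customers who call the API directly rather than through Segment.com or a library that retries automatically, the retry behaviour mentioned above can be sketched roughly as follows. This is a minimal illustration, not Vero's client code: `send_with_retry`, its parameters, and the `send` callable (which performs one request and returns the HTTP status code) are hypothetical names, and the backoff values are assumptions you should tune for your own traffic.

```python
import random
import time

# The gateway errors seen during this incident. Treating exactly these
# two statuses as retryable is an assumption made for this sketch.
RETRYABLE_STATUSES = {502, 504}


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send: hypothetical callable that performs one API request and
          returns the HTTP status code as an int.
    Returns the last status code received.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 50% jitter
            # so that many clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status
```

Retrying is only safe when repeating the request cannot cause harm, so check that the calls you wrap this way can be sent more than once.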
If you have any questions, please email us at support@getvero.com. As always, thank you for your business.