Note: we are trying a new format for our post-mortems.
*What was the impact* API processing, workflows (including transactional messages) and newsletters were intermittently offline for a brief period (15:00-15:30 UTC) and were then delayed for several hours, between 15:00 and 23:30 UTC, as we caught up on processing.
*What caused the impact* A datastore used by our queueing system ran out of memory.
*Why'd it happen* After QA testing, we deployed a configuration change to our queueing system yesterday to improve the way the processing of API and workflow jobs is balanced between customers.
This change led to unexpected memory growth in our queueing system, causing it to fail. Whilst we were alerted, the memory growth rate was unprecedentedly large.
*What changes have we made*
- Investigated the cause of the memory growth and patched it.
- Adjusted our alerting on memory growth so we are alerted earlier to give us more time to fix this issue, in the unlikely event it occurs again.
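To illustrate the idea of alerting on memory growth rather than only on absolute usage, here is a minimal sketch. It is not our production alerting: the function names, the sample format, and the linear projection are all assumptions made for this example. It estimates how long until a datastore runs out of memory at the current growth rate and alerts when that lead time drops below a chosen minimum.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sample format: (unix_seconds, used_bytes).
Sample = Tuple[float, float]


def seconds_until_full(samples: Sequence[Sample], capacity_bytes: float) -> Optional[float]:
    """Project time until memory is exhausted, using a straight line
    through the first and last sample. Returns None if memory is not growing."""
    if len(samples) < 2:
        return None
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    rate = (m1 - m0) / (t1 - t0)  # bytes per second
    if rate <= 0:
        return None
    return max(capacity_bytes - m1, 0.0) / rate


def should_alert(samples: Sequence[Sample], capacity_bytes: float, min_lead_seconds: float) -> bool:
    """Alert when the projected time to exhaustion is shorter than the
    lead time the on-call team needs to respond (an assumed parameter)."""
    eta = seconds_until_full(samples, capacity_bytes)
    return eta is not None and eta < min_lead_seconds
```

A faster growth rate shortens the projected time to exhaustion, so an alert of this kind fires earlier in exactly the situation where a fixed usage threshold leaves the least time to react.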
*Any other information* This particular part of our infrastructure has been frustratingly brittle over the last year or so, due to inefficient data storage. As a result, at the start of 2025 we elected to migrate to a datastore better suited to these workloads (DynamoDB). That work is nearing completion. It will improve queue processing by at least 10x, giving us much-needed headroom.
The team is continuing to monitor system stability and performance.
Posted Jul 30, 2025 - 18:25 UTC
Monitoring
The team is continuing to monitor system stability and performance.
Posted Jul 30, 2025 - 15:32 UTC
Identified
We are currently experiencing delays with our API and message processing systems.
Our team has implemented a temporary fix and performance is improving, but you may still encounter slower response times or failed requests to Vero's Track API. Message delivery across all channels—including workflows and newsletters—is currently delayed.
We're working to fully resolve these issues and will update you as soon as normal service is restored.
Posted Jul 30, 2025 - 14:57 UTC
This incident affected: Vero 1.0: Ingestion API, Vero 2.0: Imports, Vero 2.0: Newsletter processing and Vero 1.0: Automated email processing (Transactional emails, Workflows, Behavioral emails).