Intermittent Service Degradation Week of 2023/10/16 - 2023/10/20
We believe we have addressed the underlying configuration issues that led to last week's major service degradation.
Over the past few months we've been working on some important optimizations and security patches to several of our key data stores and infrastructure. These were meant to be rolling changes with no production impact, and many of them have been.
Unfortunately, some of the changes we made led to unanticipated issues, causing degraded performance across the majority of Vero's services.
One of the database shards on which we store and read user properties has been falling out of sync with the primary data store. Unfortunately, the services that rely on this particular data source include those that personalize email content and evaluate automated campaigns and workflows. These services will wait for eventual consistency but have a low tolerance for delays in order to deliver the speed our customers expect. This has caused periodic, significant delays.
We've been able to uncover the source of these issues and have made configuration changes that have resolved the synchronization lag. We are monitoring to ensure there are no further issues.
Separately, we recently made some important optimizations to our event storage to improve performance. Despite significant planning, these changes led to unexpected degradation in some of the queries related to automated email delays.
Our platform team has been hard at work optimizing these queries and we've seen major improvements. Processing is back to business as usual at this time. We are monitoring and will continue to make further optimizations until we see the net performance improvements we're targeting.
It has taken us longer than we would like to roll forward and adjust the configuration to return performance to the needed levels. We have been monitoring and have seen all services operating as expected. We will continue monitoring until we are certain we have restored the systems to their full capacity.
We are continuing to monitor the situation and make adjustments to improve performance. We will share a post-mortem when we are satisfied that the situation is resolved.
We apologize for the service degradation experienced intermittently throughout this past week.
The degradation was caused by some large, necessary infrastructure improvements we are making to our system.
The primary changes have now been made and we expect to see improvements to any remaining performance issues over the next several hours.
We will continue to monitor the situation over the weekend and will provide a more detailed post-mortem afterwards.
Thanks for your patience. If you have any questions, please email us at email@example.com.
This incident affected: Vero Cloud: Ingestion API, Vero Cloud: Newsletter processing, Vero Connect: Newsletter processing and Vero Cloud: Automated email processing (Transactional emails, Behavioral emails, Workflows).