Root Cause Analysis (RCA)
Date of Report: June 19, 2025

Summary
On June 16, 2025, Riva Cloud experienced elevated Redis/Valkey errors in the US-EAST-1 (USE1) region. Customers in that region may have observed delays or degraded Insight performance. The issue began at approximately 08:00 EST and was mitigated over the course of the day through internal failover mechanisms and close coordination with our infrastructure partner, AWS. AWS later confirmed that a broader service issue within its ElastiCache Serverless platform was the root cause of the incident.

Incident Timeline

Root Causes
The root cause was a software update deployed by AWS to its ElastiCache Serverless platform in the US-EAST-1 region. The update introduced latency and connection timeout issues that affected both Redis and Valkey services. These issues led to intermittent degradation for Riva tenants using Valkey. AWS mitigated the issue by failing over to Redis.

Actions Taken

Conclusion
Riva Cloud's Insight application was impacted by an external infrastructure issue originating from a software change made by AWS. Our response, combined with AWS's remediation, restored full service. Although the issue was not caused by Riva infrastructure, we take customer impact seriously and are implementing improvements to reduce the time to detection and mitigation in future incidents. We appreciate your patience and continued trust in Riva.
2025-06-16 5:00 PM EST
Status: Resolved
Update: Insight Application Availability – US Data Center
Thank you for your continued patience.
The AWS team identified and resolved the root cause of the caching issues affecting Riva Insight. The issue stemmed from recent changes to AWS's serverless caching logic.
In collaboration with AWS, we’ve transitioned to an alternative caching service and updated the affected US data center. Since this update, no additional login errors or application issues have been reported.
We’ll continue to monitor the situation alongside our AWS partners, but the Insight application is now fully available in the US data center.
If you’re still encountering login issues or errors, please reach out to the Support team.
2025-06-16 4:30 PM EST
Status: Ongoing
Impact: Our team is actively investigating the cause of timeouts affecting users in the US data center.
In collaboration with our AWS partners, we have transitioned to an alternative caching service and updated the affected US data center accordingly. Since implementing this change, no further login issues or errors have been reported.
We will continue to closely monitor the situation with AWS, but at this time, the Insight application is fully available in the US data center.
If you're still experiencing errors or problems logging in, please contact our support team.
2025-06-16 3:10 PM EST
Status: Ongoing
Impact: Our team is actively investigating the cause of timeouts affecting users in the US data center.
We believe we have isolated the issue, which appears to be related to caching services provided by AWS. We are currently working closely with our partners at AWS towards a resolution.
Please note: Riva Sync services are unaffected and continue to operate normally.
Another update will be posted no later than 5:00 PM EST. Thank you for your continued patience.
2025-06-16 1:47 PM EST
Status: Ongoing
Impact: Our team continues to investigate the cause of the timeouts affecting users in the US data center.
Users on the US data center are still being impacted by timeouts. This is causing problems with the load balancer, which is generating the errors users are seeing with Riva Insight.
The team is still investigating the root cause and remediation steps.
Please note: Riva Sync services are unaffected and continue to operate normally.
Another update will be posted no later than 4:00 PM EST. Thank you for your continued patience.
2025-06-16 12:44 PM EST
Status: Ongoing
Impact: Users on the US data center are still being impacted by timeouts. This is causing problems with the load balancer, which is generating the errors users are seeing with Riva Insight. The team is still investigating the root cause and remediation steps.
Another update will be posted at 2:00 PM EST. Thank you for your continued patience.
Status: Ongoing
Start Time: June 16, 2025 - 08:00 EST
The issue impacting Riva Insight access for users in the US data center remains unresolved. Our team is actively investigating and treating this as a critical priority.
Please note: Riva Sync services are unaffected and continue to operate normally.
We will share further updates as soon as more information becomes available.
Thank you for your continued patience.
2025-06-16 9:37 AM EST
Start Time: June 16, 2025 - 08:00 EST
Status: Investigating
Impact: Users hosted in the US data center are currently unable to access or use Riva Insight.
We are actively investigating the issue and will provide an update as soon as more information is available.