Increased data latencies across many networks

Incident Report for Synoptic

Postmortem

On May 9–10, Synoptic experienced elevated realtime data latency caused by an unexpected interaction between database maintenance activities and production data processing systems. During this period, some (not all) realtime observations were delayed by up to approximately 20 minutes.

The issue originated during planned database optimization and migration work that produced higher-than-expected load on portions of the production environment, degrading write performance and downstream data processing workflows. Automated monitoring alarms were triggered, but the response was not prioritized, for a combination of reasons:

  • Ongoing maintenance work was expected to trigger certain alarms
  • Alarm thresholds are set just above normal operating levels, so an alarm firing does not by itself indicate a meaningful impact on the system
  • Alarms were not configured to escalate as alarmed conditions worsened
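
As an illustration of the escalation gap described above, the following is a minimal sketch of tiered alarm classification, in which severity increases as a metric moves further from its normal level and stays elevated for longer. The tier names, ratios, and durations are hypothetical and are not Synoptic's actual configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    ratio: float       # metric value relative to the normal operating level
    min_minutes: int   # how long the condition must persist for this tier


# Hypothetical tiers, ordered from most to least severe.
TIERS = [
    Tier("page", ratio=3.0, min_minutes=5),
    Tier("ticket", ratio=1.5, min_minutes=15),
    Tier("info", ratio=1.1, min_minutes=0),
]


def classify(value: float, baseline: float, minutes_elevated: int) -> Optional[str]:
    """Return the most severe tier whose conditions are met, or None."""
    ratio = value / baseline
    for tier in TIERS:
        if ratio >= tier.ratio and minutes_elevated >= tier.min_minutes:
            return tier.name
    return None
```

With tiers like these, a threshold set just above normal still produces a low-severity signal, while a condition that keeps worsening is reclassified to a tier that demands a response even during planned maintenance.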

Once the issue was identified, Synoptic engineering teams immediately implemented mitigation measures, including halting the affected replication processes, restarting impacted services, and scaling database resources. Realtime processing performance returned to normal following remediation activities.

We are conducting a full post-incident review and are already implementing several improvements, including:

  • Enhanced monitoring and alert escalation procedures
  • Improved visibility into realtime system latency and customer-impacting conditions
  • Stronger operational review processes for infrastructure changes with potential production impact
  • Additional safeguards and monitoring around database replication and migration activities
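
One way to make customer-impacting latency directly visible is to measure the delay between when an observation was taken and when it was received, then track how many observations exceed a limit. The sketch below assumes each observation carries both timestamps; the function name, the input shape, and the 5-minute default are illustrative assumptions, not a description of Synoptic's systems.

```python
from datetime import datetime, timedelta, timezone


def latency_summary(observations, threshold=timedelta(minutes=5)):
    """Summarize latency for (observed_at, received_at) datetime pairs.

    The threshold default is an arbitrary illustrative value.
    """
    latencies = [received - observed for observed, received in observations]
    if not latencies:
        return {"count": 0, "max": None, "delayed_fraction": 0.0}
    delayed = sum(1 for latency in latencies if latency > threshold)
    return {
        "count": len(latencies),
        "max": max(latencies),
        "delayed_fraction": delayed / len(latencies),
    }


# Usage with made-up timestamps: one observation on time, one 20 minutes late.
t0 = datetime(2026, 5, 10, 4, 0, tzinfo=timezone.utc)
sample = [
    (t0, t0 + timedelta(minutes=1)),
    (t0, t0 + timedelta(minutes=20)),
]
summary = latency_summary(sample)
```

A metric such as the delayed fraction reflects what customers experience, so it can be alarmed on independently of database load.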

We apologize for the disruption and appreciate our customers’ patience while the issue was resolved.

Posted May 15, 2026 - 14:11 UTC

Resolved

We are confident in our remediations and plan to share a postmortem this week.
Posted May 10, 2026 - 17:36 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted May 10, 2026 - 06:22 UTC

Identified

Synoptic engineering teams are addressing an issue that has increased the latency of observations flowing through our system. Currently, many datasets are arriving 20 minutes or more later than expected.
Posted May 10, 2026 - 04:04 UTC

This incident affected: Data platform (Data storage and processing).