# Alert Correlation
How multi-location correlation and OTel trace linking reduce noise and speed up root cause analysis.
Synthetic monitors generate a lot of signals. Not every failure is a real outage -- network glitches, regional ISP issues, and transient errors produce false positives. Yorker uses multi-location correlation and consecutive failure thresholds to separate real incidents from noise, and OTel trace linking to get you from alert to root cause in one click.
## The noise problem
A single-location failure usually means nothing. A DNS resolver in Frankfurt hiccups for 200ms. A CDN edge node in Sydney drops a connection. If you alert on every individual failure, you get paged for problems your users never notice.
The question is not "did one check fail?" but "is the service actually down?"
## Multi-location correlation
The `multi_location_failure` condition answers that question. It requires N of M monitoring locations to report failure within a time window before triggering an alert.
For example, if your check runs from 6 locations and you configure `minLocations: 3`, the alert only fires when at least 3 locations fail in the same window. A single location flaking does not page you.
```yaml
alerts:
  - name: Homepage Down
    conditions:
      - type: multi_location_failure
        minLocations: 3
    channels:
      - "@pagerduty-oncall"
```

This eliminates geographic noise. If only Tokyo fails while Ashburn, London, Frankfurt, Singapore, and Sydney are all passing, the problem is regional -- not an outage.
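The N-of-M evaluation can be sketched in a few lines. This is a hypothetical illustration, not Yorker's actual implementation; the function name `should_alert` and the `(location, passed)` tuple shape are assumptions for the example.

```python
def should_alert(results, min_locations):
    """Return True when at least min_locations distinct locations
    reported a failure within the same evaluation window.

    `results` is a list of (location, passed) tuples for one window.
    """
    failed = {loc for loc, passed in results if not passed}
    return len(failed) >= min_locations

# One window of checks from 6 locations: only Tokyo failed.
window = [("ashburn", True), ("london", True), ("frankfurt", True),
          ("singapore", True), ("sydney", True), ("tokyo", False)]
should_alert(window, min_locations=3)  # → False: a single-location flake
```

Counting distinct failing locations (a set, not a tally of failed checks) is what makes one flapping region unable to trip the threshold on its own.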
## Consecutive failure thresholds
The `consecutive_failures` condition handles a different class of noise: transient blips. A single timeout or 503 that resolves on the next check interval is not worth alerting on.
```yaml
alerts:
  - name: API Degraded
    conditions:
      - type: consecutive_failures
        count: 5
    channels:
      - "@ops-slack"
```

This alert only fires after 5 checks in a row fail. A one-off timeout is silently recorded in the check history but does not trigger a notification.
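The threshold logic amounts to checking whether the most recent run of results is an unbroken streak of failures. A minimal sketch, assuming a hypothetical `fires_after` helper and a boolean check history:

```python
def fires_after(history, count):
    """True when the most recent `count` check results are all failures.

    `history` is oldest-to-newest booleans (True = check passed).
    """
    if len(history) < count:
        return False
    return all(not passed for passed in history[-count:])

# A one-off timeout mid-history does not fire; a 5-long failing
# streak at the tail does.
fires_after([True, False, True, True, True], 5)                        # → False
fires_after([True, False, True, False, False, False, False, False], 5)  # → True
```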
## Multi-tier alerting
Combine both conditions to build alert tiers that match your incident response workflow:
```yaml
alerts:
  # Critical: multiple locations confirm the outage
  - name: Service Outage
    conditions:
      - type: multi_location_failure
        minLocations: 3
    channels:
      - "@pagerduty-oncall"

  # Warning: persistent failures from any location
  - name: Service Degraded
    conditions:
      - type: consecutive_failures
        count: 5
    channels:
      - "@ops-slack"

  # Info: SSL certificate expiring soon
  - name: SSL Expiry Warning
    conditions:
      - type: ssl_expiry
        daysBeforeExpiry: 14
    channels:
      - "@on-call-email"
```

Critical alerts go to PagerDuty because multiple locations confirm the service is down. Warning alerts go to Slack because the issue is persistent but might be localized. Info alerts go to email for non-urgent action items.
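The tiering logic reduces to a severity-ordered evaluation: the first matching condition wins. A hypothetical sketch (the `classify` function and its argument names are illustrative, not part of Yorker's API):

```python
def classify(window_failures, consecutive_failures,
             min_locations=3, count=5):
    """Pick the highest-severity tier that matches, mirroring the
    config above: a multi-location outage outranks persistent
    degradation from a single location.
    """
    if window_failures >= min_locations:
        return "Service Outage"     # routed to @pagerduty-oncall
    if consecutive_failures >= count:
        return "Service Degraded"   # routed to @ops-slack
    return None                     # no notification

classify(window_failures=4, consecutive_failures=2)  # → "Service Outage"
classify(window_failures=1, consecutive_failures=6)  # → "Service Degraded"
```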
## OTel trace linking
When a check fails, the trace ID from that execution links directly to the distributed trace in your observability backend. The flow looks like this:
1. The runner executes the check and injects a `traceparent` header.
2. Your backend processes the request and records the trace.
3. The check fails (assertion failure, timeout, 5xx response).
4. Yorker creates an alert with the trace ID attached.
5. You click the trace link in the alert notification.
6. Your observability backend shows the full distributed trace: the synthetic request, your API handler, the database query that timed out, the error.
This collapses the "what broke?" investigation from minutes of log searching to a single click. The synthetic check and the backend error are part of the same trace.
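The `traceparent` header follows the W3C Trace Context format: `version-trace_id-parent_id-flags`. A minimal sketch of generating one, as a runner might before sending the synthetic request (the `make_traceparent` helper is illustrative, not Yorker's API):

```python
import secrets

def make_traceparent():
    """Build a W3C Trace Context `traceparent` header value:
    version-trace_id-parent_id-flags, e.g.
    00-<32 hex chars>-<16 hex chars>-01 (01 = sampled).
    """
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes  -> 16 hex chars
    return f"00-{trace_id}-{parent_id}-01", trace_id

header, trace_id = make_traceparent()
# The runner sends `traceparent: <header>` with the synthetic request;
# storing `trace_id` alongside the alert is what makes the one-click
# deep link into the observability backend possible.
```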
The alerts dashboard shows all active, acknowledged, and recovered alerts across your monitors.
## Alert lifecycle
Alerts follow a state machine:
| State | Meaning |
|---|---|
| ACTIVE | The alert condition is met. Notifications have been sent. |
| ACKNOWLEDGED | A team member has acknowledged the alert. No repeat notifications. |
| RESOLVED | A team member manually resolved the alert. |
| RECOVERED | The check started passing again. The alert auto-resolves. |
When a check that triggered an ACTIVE alert starts passing again, the alert transitions to RECOVERED and a recovery notification is sent to the same channels. This closes the loop without manual intervention.
Acknowledged alerts suppress repeat notifications but remain visible in the dashboard until the underlying issue is resolved or the check recovers.
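The lifecycle above can be sketched as a small transition table. This is a hypothetical model of the documented states, not Yorker's internal code; events that do not apply leave the state unchanged:

```python
# (current state, event) -> next state, per the table above.
TRANSITIONS = {
    ("ACTIVE", "acknowledge"): "ACKNOWLEDGED",
    ("ACTIVE", "resolve"): "RESOLVED",
    ("ACTIVE", "check_passes"): "RECOVERED",
    ("ACKNOWLEDGED", "resolve"): "RESOLVED",
    ("ACKNOWLEDGED", "check_passes"): "RECOVERED",
}

def transition(state, event):
    """Return the next alert state, or the current state unchanged if
    the event does not apply (e.g. acknowledging a resolved alert).
    """
    return TRANSITIONS.get((state, event), state)

transition("ACTIVE", "acknowledge")        # → "ACKNOWLEDGED"
transition("ACKNOWLEDGED", "check_passes") # → "RECOVERED"
```

Note that both ACTIVE and ACKNOWLEDGED can reach RECOVERED: acknowledging suppresses repeat notifications, but the auto-resolve path still closes the loop when the check passes again.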