# Set Up Alerts

How to configure alert rules — consecutive failures, multi-location correlation, SSL alerts, and SLO burn-rate alerts.
Alerts notify you when monitors detect problems. Each alert rule combines one or more conditions (what triggers the alert) with one or more channels (where the notification goes).
## Define alert channels
To send notifications, first define your channels in the `alertChannels` block at the top of `yorker.config.yaml`. Each channel has a name (the key) and a type-specific configuration.
```yaml
alertChannels:
  ops-slack:
    type: slack
    webhookUrl: "{{secrets.SLACK_WEBHOOK_URL}}"
  on-call-email:
    type: email
    addresses:
      - [email protected]
      - [email protected]
  pagerduty:
    type: webhook
    url: "{{secrets.PAGERDUTY_WEBHOOK_URL}}"
    method: POST
    headers:
      Authorization: "Token token={{secrets.PD_TOKEN}}"
```

### Channel types
| Type | Required fields | Description |
|---|---|---|
| `slack` | `webhookUrl` | Posts to a Slack incoming webhook. |
| `email` | `addresses` (array, at least one) | Sends email to the listed addresses. |
| `webhook` | `url` | Sends an HTTP request. `method` defaults to `POST`. Optional `headers` for auth. |
## Reference channels in alerts
To attach a channel to an alert, reference it with the `@channel-name` syntax:
```yaml
monitors:
  - name: API Health
    type: http
    url: https://api.example.com/health
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 3
        channels:
          - "@ops-slack"
          - "@on-call-email"
```

## Alert conditions
Each alert must have at least one condition. Multiple conditions on the same alert are combined with AND logic — all conditions must be met for the alert to trigger.
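For example, a rule that fires only when both conditions hold: three consecutive failures that are also seen from at least two locations. This is an illustrative fragment using the channel defined earlier on this page:

```yaml
alerts:
  - conditions:                    # AND: every condition must be met
      - type: consecutive_failures
        count: 3
      - type: multi_location_failure
        minLocations: 2
    channels:
      - "@ops-slack"
```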
### consecutive_failures
Triggers after a monitor fails a specified number of times in a row.
```yaml
- type: consecutive_failures
  count: 3          # default: 2, min: 1
```

### response_time_threshold
Triggers when response time exceeds a threshold.
```yaml
- type: response_time_threshold
  maxMs: 5000       # milliseconds
```

### multi_location_failure
Triggers when a monitor fails from multiple locations within a time window. This reduces false positives from localized network issues.
```yaml
- type: multi_location_failure
  minLocations: 2       # default: 2, min: 2
  windowSeconds: 300    # default: 300 (5 minutes)
```

### ssl_expiry
Triggers when an SSL certificate is approaching expiration.
```yaml
- type: ssl_expiry
  daysBeforeExpiry: 14    # default: 14, min: 1
  severity: warning       # optional: critical | warning | info
```

### ssl_certificate_changed
Triggers when the leaf certificate's fingerprint changes between runs — useful for catching unexpected cert rotations and possible man-in-the-middle conditions.
```yaml
- type: ssl_certificate_changed
  severity: critical
```

### ssl_self_signed
Triggers when the endpoint presents a self-signed (or otherwise untrusted-root) certificate.
```yaml
- type: ssl_self_signed
  severity: critical
```

### ssl_protocol_deprecated
Triggers when the TLS handshake negotiates a protocol older than `minProtocol`.
```yaml
- type: ssl_protocol_deprecated
  minProtocol: TLSv1.2    # default: TLSv1.2 (allowed: TLSv1.2, TLSv1.3)
  severity: warning
```

### burn_rate
Triggers when an SLO's error budget is burning faster than a threshold across a short window AND a long window (the Google SRE multi-window burn-rate alerting pattern). Requires an existing SLO — reference it by ID.
```yaml
- type: burn_rate
  sloId: slo_abc123
  burnRateThreshold: 14.4   # burn-rate multiple (e.g. 14.4 = budget exhausted in ~2 days at a 30d SLO)
  longWindowMinutes: 60     # minimum 60
  shortWindowMinutes: 5     # minimum 5, MUST be less than longWindowMinutes
```

Burn-rate alerts are automatically wired up when you set `burnRateAlerts: true` on an SLO (the default). Use a manual `burn_rate` condition only if you need custom threshold/window combinations beyond the built-in ones. See Define SLOs for the simpler path.
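As a sanity check on the `14.4` figure: at a constant burn rate, the time to exhaust an error budget is simply the SLO window divided by the burn rate. A quick sketch in plain Python (not a Yorker API):

```python
def days_to_exhaustion(slo_window_days: float, burn_rate: float) -> float:
    """At a constant burn rate, an error budget spanning `slo_window_days`
    is fully consumed after window / burn_rate days."""
    return slo_window_days / burn_rate

# 14.4 is the classic "page someone now" threshold for a 30-day SLO:
print(days_to_exhaustion(30, 14.4))   # ≈ 2.08 days
print(days_to_exhaustion(30, 1.0))    # burn rate 1 = budget lasts exactly the window
```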
### baseline_anomaly
Triggers when a performance metric drifts away from its learned baseline for several consecutive runs. Baselines are stored per (check, location, hour-of-day, day-of-week) bucket so a monitor that's slower on Monday mornings doesn't trip the alert every Monday.
```yaml
- type: baseline_anomaly
  metric: response_time   # required
  sigmaThreshold: 3       # default: 3 (min: 2, max: 10)
  consecutiveCount: 3     # default: 3 (min: 2, max: 20, integer)
  direction: above        # default: above (allowed: above | below | both)
  severity: warning       # default: warning
```

**Supported metrics.** HTTP: `response_time`, `dns_lookup`, `tls_handshake`, `ttfb`, `content_transfer`. Browser: `lcp`, `fcp`, `cls`.
**How the chain works.** On each result ingestion the engine reads the last N runs for this check+location, regardless of status. The alert fires only if all N are successful AND each deviates by more than `sigmaThreshold`·σ from its own time-bucketed baseline in the configured direction. Any non-success run inside the window breaks the chain, so this alert stays scoped to drift-style regressions rather than outages. Failures are not skipped over to reach earlier successes: the window simply slides forward until it again contains N successes.
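The evaluation rule above can be sketched as a small predicate. This is illustrative Python, not Yorker's implementation, and the `Run` field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Run:
    success: bool
    value: float          # observed metric, e.g. response_time in ms
    baseline_mean: float  # time-bucketed baseline for this run's bucket
    baseline_std: float

def chain_fires(last_n: list[Run], sigma: float = 3.0,
                direction: str = "above") -> bool:
    """Fire only if every one of the last N runs succeeded AND each
    deviates by more than sigma * std from its own baseline."""
    if not last_n or not all(r.success for r in last_n):
        return False  # any failure inside the window breaks the chain
    for r in last_n:
        dev = r.value - r.baseline_mean
        threshold = sigma * r.baseline_std
        if direction == "above" and dev <= threshold:
            return False
        if direction == "below" and dev >= -threshold:
            return False
        if direction == "both" and abs(dev) <= threshold:
            return False
    return True
```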
**Pick a reasonable threshold.** 3σ / 3 consecutive is a conservative starting point: under the normal assumption (and assuming run-to-run independence), the per-run false-positive rate at 3σ is ≈1-in-740 for one-sided checks (`direction: above` or `below`, the default) and ≈1-in-370 for two-sided (`direction: both`). Across 3 consecutive runs that compounds to ≈1-in-400-million one-sided or ≈1-in-50-million two-sided. In practice runs sharing a time bucket carry correlated noise (network conditions, regional perturbations), so treat the compounded figure as a theoretical ceiling. Tightening to 4σ / 5 consecutive buys near-zero false positives; loosening to 2σ / 2 consecutive is effectively a point-anomaly detector.
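You can reproduce those figures from the standard normal tail probability, using only Python's `math` module:

```python
import math

def one_sided_p(sigma: float) -> float:
    """P(Z > sigma) for a standard normal variable."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p_one = one_sided_p(3)    # ≈ 0.00135, i.e. ~1 in 740
p_two = 2 * p_one         # ≈ 0.0027,  i.e. ~1 in 370
p_chain = p_one ** 3      # ≈ 2.5e-9,  the "1-in-400-million" ceiling
print(round(1 / p_one), round(1 / p_two))  # 741 370
```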
**Direction.** `above` catches slowdowns (the common case for response-time metrics). `below` catches suspiciously fast responses, which often indicate the runner short-circuiting past the real work (stale cache hits, 304 storms, redirect chains being skipped). `both` is useful for CLS-style vitals where either side is a UX regression.
## Severity
All SSL-related conditions (including `ssl_expiry`), `mcp_schema_drift`, and `baseline_anomaly` accept an optional `severity` field with value `critical`, `warning`, or `info`. Severity is stored on the resulting alert instance and surfaces in the alerts dashboard: use it to distinguish "nice to know" rotations from genuine outages. `mcp_schema_drift` and `baseline_anomaly` default to `warning` (set by the shared schema); SSL conditions have no schema default and fall back to `critical` via the evaluator.
## Cascading alerts
Alerts follow the same cascade as other monitor settings: defaults -> group -> monitor. Define alerts at any level:
```yaml
defaults:
  alerts:
    - conditions:
        - type: consecutive_failures
          count: 2
      channels:
        - "@ops-slack"

groups:
  - name: Critical APIs
    alerts:
      - conditions:
          - type: consecutive_failures
            count: 1
        channels:
          - "@ops-slack"
          - "@pagerduty"

monitors:
  - name: Payments API
    type: http
    url: https://api.example.com/payments
```

When a monitor defines its own alerts, those replace the inherited alerts entirely. To clear inherited alerts, set `alerts: []` on the monitor.
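For example, to exempt a single monitor from the `defaults`-level alert rules without defining replacements (the monitor shown is illustrative):

```yaml
monitors:
  - name: Status Page
    type: http
    url: https://status.example.com
    alerts: []        # clears inherited alerts; no rules apply to this monitor
```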
## Multi-tier alerting
To escalate alerts based on severity, define multiple alert rules with different conditions and channels:
```yaml
monitors:
  - name: Checkout Flow
    type: browser
    script: ./monitors/checkout.ts
    alerts:
      # Tier 1: Slack for initial failures
      - name: checkout-warning
        conditions:
          - type: consecutive_failures
            count: 2
        channels:
          - "@ops-slack"
      # Tier 2: PagerDuty for persistent multi-location failures
      - name: checkout-critical
        conditions:
          - type: consecutive_failures
            count: 5
          - type: multi_location_failure
            minLocations: 3
        channels:
          - "@pagerduty"
          - "@on-call-email"
      # SSL expiry: early warning
      - name: checkout-ssl
        conditions:
          - type: ssl_expiry
            daysBeforeExpiry: 30
            severity: warning
        channels:
          - "@ops-slack"
      # SSL rotation detection
      - name: checkout-ssl-rotation
        conditions:
          - type: ssl_certificate_changed
            severity: info
        channels:
          - "@ops-slack"
```

## OTel trace linking
When an alert fires, Yorker includes the OpenTelemetry trace ID in the notification payload. If your application propagates the W3C `traceparent` header, you can jump directly from an alert to the distributed trace in your observability backend (e.g., HyperDX, Jaeger, Grafana Tempo) to identify the root cause.
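If you consume alert webhooks yourself, the trace ID can be recovered from a W3C `traceparent` value, whose format is `<version>-<trace-id>-<parent-id>-<flags>`. A minimal sketch in Python (the deep-link URL pattern at the end is hypothetical):

```python
def trace_id_from_traceparent(header: str) -> str:
    """Extract the trace ID from a W3C traceparent header,
    e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"""
    version, trace_id, parent_id, flags = header.split("-")
    return trace_id

# Example value from the W3C Trace Context specification:
tp = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(trace_id_from_traceparent(tp))  # 4bf92f3577b34da6a3ce929d0e0e4736
# A backend deep link might then look like (URL shape is an assumption):
print(f"https://jaeger.example.com/trace/{trace_id_from_traceparent(tp)}")
```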
## Web UI
To create alerts through the dashboard:

1. Navigate to a monitor's detail page.
2. Click **Add Alert Rule**.
3. Select one or more conditions and configure thresholds.
4. Choose notification channels (create them in **Settings > Notification Channels** if needed).
5. Click **Save**.
Alert rules created in the Web UI and the CLI are the same underlying resource. The CLI's `yorker deploy` command detects and diffs against rules created through the UI, and aborts on drift unless you pass `--force` or `--accept-remote`.
You can also view all alerts across monitors from the Alerts page in the dashboard.
## CLI alert management
In addition to defining alerts in `yorker.config.yaml`, you can manage alert instances directly from the command line.
### List active alerts
```shell
yorker alerts list
```

Include resolved and recovered alerts with `--all`, or filter by monitor:
```shell
yorker alerts list --monitor "Homepage" --all
```

### Acknowledge and resolve
```shell
yorker alerts ack ainst_abc123
yorker alerts resolve ainst_abc123
```

### View alert history
```shell
yorker alerts history --since 7d
```

### Create alert rules imperatively
```shell
yorker alerts rules create \
  --monitor "Homepage" \
  --condition "consecutive_failures >= 3" \
  --channel nch_abc123 \
  --name "homepage-down"
```

Baseline-deviation rules use `baseline_anomaly:<metric>` (defaults to 3σ, 3 consecutive, above) or the explicit `baseline_anomaly:<metric>@<sigma>σ:<consecutive>[:above|below|both]` form:
```shell
yorker alerts rules create \
  --monitor "Checkout API" \
  --condition "baseline_anomaly:response_time" \
  --channel nch_abc123 \
  --severity warning

yorker alerts rules create \
  --monitor "Marketing site" \
  --condition "baseline_anomaly:lcp@4σ:5:above" \
  --channel nch_pagerduty \
  --severity critical
```

See the CLI reference for the full list of alert commands and condition formats.
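To make the condition grammar concrete, here is a sketch of how the `baseline_anomaly` form decomposes, with the defaults documented above. This is illustrative Python, not Yorker's actual parser:

```python
import re

# Grammar: baseline_anomaly:<metric>[@<sigma>σ:<consecutive>[:above|below|both]]
PATTERN = re.compile(
    r"^baseline_anomaly:(?P<metric>\w+)"
    r"(?:@(?P<sigma>[\d.]+)σ:(?P<count>\d+)(?::(?P<direction>above|below|both))?)?$"
)

def parse(condition: str) -> dict:
    m = PATTERN.match(condition)
    if not m:
        raise ValueError(f"not a baseline_anomaly condition: {condition!r}")
    return {
        "metric": m["metric"],
        "sigma": float(m["sigma"] or 3),      # documented default: 3σ
        "consecutive": int(m["count"] or 3),  # documented default: 3 consecutive
        "direction": m["direction"] or "above",
    }

print(parse("baseline_anomaly:response_time"))
print(parse("baseline_anomaly:lcp@4σ:5:above"))
```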