Concepts

Architecture

How Yorker's three-tier architecture works — Control Plane, Orchestrator, and Runners.

Yorker uses a three-tier architecture that separates check management from check execution. This design provides per-execution isolation for browser checks, runners in 14 global regions close to the infrastructure being monitored, and a single control plane that coordinates alerting, SLOs, and insights across every monitor. Different check types and location types take different telemetry paths to your OTel backend; the Telemetry flow section below lays out exactly which data travels which route.

Three-tier model

Control Plane

The control plane is a Next.js application deployed on Vercel. It handles:

  • Check definitions -- creating, updating, and deleting monitors via the Web UI, CLI, or API.
  • User accounts and teams -- authentication (Clerk), team membership, API key management.
  • Results storage -- check metadata (pass/fail, response times, Web Vitals) stored in Postgres via Neon.
  • Artifact storage -- screenshots and debug artifacts stored in Cloudflare R2 (S3-compatible).
  • API -- RESTful endpoints for all operations, consumed by the dashboard, CLI, and runners.

The control plane never executes checks itself. It defines what to monitor, stores check results in Postgres, evaluates alert rules and SLO burn, runs anomaly detection, and generates monitor and team insights. It also enqueues OTel events in an outbox that the orchestrator drains for asynchronous delivery; see Telemetry flow for the full breakdown of which emissions come from the runner versus the control plane.

Orchestrator

The orchestrator is an always-on service running on Fly.io. It:

  1. Polls Postgres on a schedule to find checks that are due for execution.
  2. Dispatches Fly Machines in the correct region for each check.
  3. Manages the machine pool lifecycle -- creating, reusing, and destroying machines.

The orchestrator is the bridge between "this check should run every 5 minutes from London" and "spin up a machine in lhr right now."
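The first step of that loop can be sketched as a pure function: given monitor rows with an interval and a last-run timestamp, return the ones that are due now. This is an illustrative sketch; the row shape, field names, and the omitted Fly Machines dispatch call are assumptions, not Yorker's actual schema.

```typescript
// Hypothetical shape of a scheduled check row (illustrative, not Yorker's schema).
interface ScheduledCheck {
  id: string;
  region: string;          // Fly region code, e.g. "lhr"
  intervalSeconds: number; // "run every 5 minutes" => 300
  lastRunAt: number;       // epoch ms of the last execution; 0 if never run
}

// Step 1 of the orchestrator loop: find checks whose next run time has passed.
function checksDue(checks: ScheduledCheck[], nowMs: number): ScheduledCheck[] {
  return checks.filter(
    (c) => nowMs - c.lastRunAt >= c.intervalSeconds * 1000,
  );
}

// Steps 2 and 3 would then dispatch a Fly Machine per due check in its region
// and update lastRunAt; the dispatch API call itself is omitted here.
```

Keeping the due-check scan pure makes the polling step easy to test independently of Postgres and the Fly API.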

Runners

Runners execute checks. They run on Fly.io across 14 global regions, close to the infrastructure being monitored. Every runner, regardless of mode, does the same three things for each check:

  1. Executes the check (HTTP request, Playwright browser session, or MCP tool exchange).
  2. Uploads screenshots directly to Cloudflare R2 (browser checks only).
  3. Submits the full check result — timing breakdown, assertions, Web Vitals, certificates, network data — to the Yorker control plane via POST /api/runner/results for storage and evaluation.

Some runners additionally emit OTLP metrics and traces directly to your collector, but not all of them do; which ones do depends on check type and deployment mode. That split is explained in Telemetry flow below.
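As a rough sketch, step 3 above amounts to assembling one JSON document per run and POSTing it to the control plane. Only the endpoint path POST /api/runner/results comes from this page; every field name below is an illustrative assumption about the wire format.

```typescript
// Illustrative result payload; field names are assumptions, not Yorker's wire format.
interface CheckResultPayload {
  checkId: string;
  status: "pass" | "fail";
  timing: Record<string, number>;      // timing breakdown (dns, tls, ttfb, ...)
  assertions: { name: string; passed: boolean }[];
  webVitals?: Record<string, number>;  // browser checks only
  certificate?: { expiresAt: string }; // TLS certificate metadata
}

// Submit a finished run to the control plane (endpoint path from the docs above).
async function submitResult(baseUrl: string, payload: CheckResultPayload): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/runner/results`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  });
  return res.ok;
}
```

This submission happens for every run regardless of OTLP configuration, since it is what feeds alerting, SLOs, and the dashboard.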

Three execution tiers

Different check types have different isolation and resource requirements. Yorker uses three execution tiers:

Tier 1 -- Ephemeral Heavy (Browser checks)

Each browser check runs in its own Fly Machine with Playwright and Chromium (~1GB image). The machine is created for the check and destroyed after it completes.

This gives you:

  • Full isolation -- no shared browser state between checks or tenants.
  • Clean environment -- no cookies, cache, or extensions carrying over.
  • Predictable performance -- no resource contention from other checks.

Tier 2 -- Ephemeral Light (Coming soon)

Multi-step API tests will run in ephemeral lightweight containers -- isolated like browser checks but without the Chromium overhead.

Tier 3 -- Per-Tenant Persistent (HTTP and MCP checks)

HTTP and MCP checks run in lightweight Node.js containers (~200MB image). Each customer gets one persistent container per region. The container stays alive and executes checks as they come due.

This gives you:

  • Low latency -- no cold-start overhead for each check.
  • Efficiency -- HTTP and MCP checks are fast and lightweight, so sharing a container is safe.
  • Tenant isolation -- each customer has their own container, so one customer's checks cannot affect another's.

Telemetry flow

Yorker's emission model has two constants and one split:

  • Constant 1: every check run submits its result (status, timing, assertions, Web Vitals, certificates, screenshots, console logs, etc.) to the Yorker control plane via POST /api/runner/results. This is an internal submission protocol, not OTLP. It is how alerting, SLOs, insights, the dashboard, and the CLI all work. It happens whether or not you have configured an OTLP endpoint.
  • Constant 2: once you have configured a team OTLP endpoint under Settings > Telemetry (OTLP), the control plane enqueues OTLP log and span events in an outbox whenever it has something worth telling your collector about — a completed or failed check, an alert state change, an SLO burn, a certificate rotation, a new insight, a deployment marker, a maintenance-window edit. The orchestrator (a separate always-on Fly service) drains that outbox every ~10 s, applies SSRF guards, and POSTs the OTLP payload to your collector. You can see every event type in apps/web/src/lib/otel-events.ts and the shipper in apps/orchestrator/src/outbox-drain.ts. If no team OTLP endpoint is configured, the control plane skips the enqueue entirely — the result still lands in Postgres and still drives alerting and the dashboard, but no OTel events are produced.
  • The split: browser checks also emit OTLP metrics and traces directly from the runner to your collector (when a team OTLP endpoint is configured; the orchestrator threads it into each browser execution payload). HTTP and MCP checks running on Yorker-hosted locations do not emit OTLP from the runner — the OTel signal for those runs is the synthetics.check.completed / synthetics.check.failed log event the control plane enqueues on your behalf (it carries the same dimensions: response time, status, assertions, timing breakdown). Private-location runners can opt in to runner-direct OTLP for HTTP and MCP checks by setting OTLP_ENDPOINT/OTLP_API_KEY as environment variables on the runner container at deploy time.
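Since every emission is OTLP HTTP JSON, the check.completed log event can be pictured as a standard OTLP logs envelope. The envelope shape below follows the OTLP JSON encoding; the scope name and attribute keys are illustrative assumptions, while the event names synthetics.check.completed and synthetics.check.failed come from this page.

```typescript
// Build an OTLP/HTTP JSON logs payload for a finished check run.
// Envelope structure follows the OTLP spec; attribute keys are illustrative.
function buildCheckCompletedLog(checkId: string, responseTimeMs: number, passed: boolean) {
  return {
    resourceLogs: [{
      resource: {
        attributes: [{ key: "service.name", value: { stringValue: "yorker" } }],
      },
      scopeLogs: [{
        scope: { name: "yorker.synthetics" }, // scope name is an assumption
        logRecords: [{
          timeUnixNano: String(Date.now() * 1_000_000),
          severityText: passed ? "INFO" : "ERROR",
          body: { stringValue: passed ? "synthetics.check.completed" : "synthetics.check.failed" },
          attributes: [
            { key: "check.id", value: { stringValue: checkId } },
            { key: "check.response_time_ms", value: { doubleValue: responseTimeMs } },
          ],
        }],
      }],
    }],
  };
}
```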

The table below is the short version. All "outbox → collector" rows assume you have configured a team OTLP endpoint — without one, the outbox columns are skipped entirely.

Check type / location | Per-check metrics + traces | check.completed log event | Alerts / SLO / insight / cert / deployment events
Browser, any location | Runner → collector (direct) | Control plane → outbox → collector | Control plane → outbox → collector
HTTP / MCP, Yorker-hosted | (not emitted from runner today) | Control plane → outbox → collector | Control plane → outbox → collector
HTTP / MCP, private location | Runner → collector (direct), only if you set OTLP_ENDPOINT on the agent container at startup | Control plane → outbox → collector | Control plane → outbox → collector

A few things to note:

  • Every path targets the same otlpEndpoint. Whether a signal is runner-direct or outbox-delivered, it lands in the same collector you configured under Settings > Telemetry (OTLP).
  • Every emission is OTLP HTTP JSON. There is no proprietary ingestion format to learn. Runner-direct emission skips entirely if otlpEndpoint is unset (best-effort). The outbox path is retried with exponential backoff by the orchestrator.
  • A runner crash skips both emission paths for the affected run — the control plane only learns about the attempt if and when the runner submits a result. An orphaned attempt shows up as a gap in your result history, not an OTLP event.
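The exponential backoff mentioned above can be sketched in a few lines; the base delay and cap are illustrative constants, not the orchestrator's actual tuning.

```typescript
// Delay before the next outbox delivery retry (illustrative constants).
function backoffMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```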

What this means for compliance and data flow

  • Check results transit the Yorker control plane. They land in Yorker's Postgres and power the dashboard, alerting, SLOs, and insights. What the control plane sees per run: timing breakdown, assertion pass/fail, HTTP status codes, truncated response bodies (for assertion re-evaluation and display), TLS certificate metadata, and — for browser checks — Web Vitals, network request metadata, console logs, screenshot references (the actual image bytes live in R2, uploaded directly by the runner), and step results. Retention depends on plan tier.
  • For browser checks, per-check metrics and traces also reach your collector without traversing Yorker. If you need low-latency, runner-direct OTel for browsers on hosted locations today, you already have it.
  • For HTTP and MCP checks on Yorker-hosted locations, any OTel signals reaching your collector flow through the Yorker control plane (and the orchestrator outbox), once you have configured an OTLP endpoint. This is a real architectural trade-off: it is how Yorker can enrich the check.completed event with things the runner doesn't know — SLO state, anomaly scores, alert context.
  • If you need full runner-direct OTLP for HTTP and MCP checks, run a private location and set OTLP_ENDPOINT/OTLP_API_KEY on the runner container at startup. The agent reads those env vars once and emits OTLP for every check it runs.

Private locations

Private locations let you run Yorker's runner inside your own network. The runner calls your internal services from inside your VPC (so internal hosts stay private), uploads screenshots directly to R2 (or a local fallback if R2 isn't configured), and POSTs results to the Yorker control plane over outbound HTTPS. That last path is mandatory — alerting, SLOs, and the dashboard all depend on it.

You can opt your private runners into runner-direct OTLP for HTTP and MCP checks by setting OTLP_ENDPOINT and OTLP_API_KEY as environment variables on the runner container when you start it. Do that with an internal collector and no per-check telemetry ever leaves your network — only the result submission to the control plane egresses. Browser checks on private locations also emit runner-direct OTLP the same way hosted browser checks do. Derived events (alerts, SLO, cert, insight, check.completed) continue to flow via the control plane outbox → orchestrator → your collector.
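A minimal sketch of that opt-in, assuming the agent reads the two env vars once at startup: an unset OTLP_ENDPOINT means runner-direct emission is skipped entirely. The function name, return shape, and bearer-token header scheme are assumptions; only the variable names OTLP_ENDPOINT and OTLP_API_KEY come from this page.

```typescript
interface OtlpConfig {
  endpoint: string;
  headers: Record<string, string>;
}

// Read the opt-in env vars once at startup; null means "skip runner-direct OTLP".
function otlpConfigFromEnv(env: Record<string, string | undefined>): OtlpConfig | null {
  const endpoint = env.OTLP_ENDPOINT;
  if (!endpoint) return null; // no endpoint configured: emission is skipped (best-effort)
  const headers: Record<string, string> = { "content-type": "application/json" };
  if (env.OTLP_API_KEY) {
    headers["authorization"] = `Bearer ${env.OTLP_API_KEY}`; // header scheme is an assumption
  }
  return { endpoint, headers };
}
```

Pointing the endpoint at an internal collector keeps per-check telemetry inside your network, matching the data-flow guarantee described above.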

Screenshot pipeline

For browser checks, screenshots flow through a dedicated pipeline:

  1. Capture -- Playwright captures screenshots during script execution (every step, on failure only, or disabled).
  2. Upload -- The runner uploads screenshots to Cloudflare R2 with a path scoped to the team, check, and run.
  3. Serve -- The control plane API serves screenshots with team ownership validation. Only members of the team that owns the check can access its screenshots.

Screenshots are stored as R2 artifacts with retention based on your plan tier.
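A hypothetical sketch of the scoping described in steps 2 and 3: one object key per screenshot, namespaced by team, check, and run, so the serving API can enforce team ownership with a prefix check. The exact key layout is an assumption, not Yorker's actual scheme.

```typescript
// Hypothetical R2 object key, scoped team -> check -> run (layout is assumed).
function screenshotKey(teamId: string, checkId: string, runId: string, step: number): string {
  return `${teamId}/${checkId}/${runId}/step-${step}.png`;
}

// Ownership validation the serving API could perform:
// the key must fall under the requesting team's prefix.
function teamOwnsKey(teamId: string, key: string): boolean {
  return key.startsWith(`${teamId}/`);
}
```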

Data tiering

Check result data is split into tiers based on how often it is accessed and how long it needs to be retained:

Tier | What it stores | Retention | Storage
Tier A (checkResults) | Core metrics: pass/fail, response time, status code, Web Vitals | Full plan retention | Postgres
Tier B (checkResultDetails) | Debug data: network request waterfalls, console logs, DOM snapshots | Shorter retention | Postgres (JSONB)
R2 artifacts | Screenshots, full network headers | Based on plan tier | Cloudflare R2

Tier A data is always stored and drives the dashboard, alerting, and SLO calculations. Tier B data is for debugging failed checks and is retained for a shorter window to manage storage costs.
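The split can be pictured as slicing one full run result into a small hot row (Tier A) and a bulky debug blob (Tier B). Field names below mirror the examples in the table and are illustrative, not the actual schema.

```typescript
// Illustrative full result; field names mirror the tier table, not Yorker's schema.
interface FullResult {
  passed: boolean;
  responseTimeMs: number;
  statusCode: number;
  webVitals?: Record<string, number>;
  networkWaterfall?: unknown[];
  consoleLogs?: string[];
  domSnapshot?: string;
}

// Tier A: small hot row kept for the full plan retention window.
// Tier B: bulky debug JSONB kept for a shorter window to manage storage cost.
function splitTiers(r: FullResult) {
  const { networkWaterfall, consoleLogs, domSnapshot, ...tierA } = r;
  return { tierA, tierB: { networkWaterfall, consoleLogs, domSnapshot } };
}
```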