How Waldo Works · Flow Diagrams · Whiteboard Companion

Waldo System Flow Atlas

Every load-bearing flow in the Waldo agent harness and interface layer, drawn end-to-end — from a wearable's overnight samples to a message that feels like "already on it." Built for walkthroughs, deep dives and whiteboarding.

Companion to WALDO_HARNESS_SYSTEM_DESIGN.html (the Lead Engineer Map). The canonical source of truth is WALDO_ARCHITECTURE_OVERVIEW.md + the ADR set — every figure cites its grounding. The few decisions still in flight are marked in flight; everything else is the locked V1 design.

00
Atlas index

Four passes over the same machine, each one level deeper: the product view, the harness internals, the body-and-memory substrate, and the interface & trust flows.

Fig	Flow	Grounded in
01	A day with Waldo — the proactive rhythm	ADR-0017 · Brief/Patrol/Dreaming cadence
02	Master request lifecycle — the whole machine, one pass	Architecture overview §6
03	Initiation & KAIROS routing — why it costs ~nothing to be always-on	ADR-0017 · cost model v2
04	The bounded ReAct loop	Harness plan · ADR-0008/0033
05	Run journal — crash-resume state machine	ADR-0054
06	Delivery consistency — the transactional outbox	ADR-0054 · ADR-0009
07	Failure ladder · circuit breaker · spend degrade	ADR-0051 · harness plan
08	Safety pipeline — the 9-event hook gauntlet	ADR-0032/0024/0033
09	Multi-source reconciliation → CRS	ADR-0011 (+ amendment)
10	Memory lifecycle — write · read · dream	ADR-0005/0006/0024/0031/0037
11	Telegram, rich two-way — one brain on every surface	Architecture overview Flow E · ADR-0055
12	The Handoff — Explore · Plan · Act with a human tap	ADR-0018 · ADR-0027 (token mechanic)
13	Trigger concurrency inside the DO	Harness plan (two-level guard) · ADR-0014
14	Timezone & travel — circadian intelligence on the move	Backend plan (tz detection)
15	GDPR deletion — erasure across every store	ADR-0055

A
The product view

FIG 01A day with Waldoproduct rhythm

The agent works while the user doesn't: it gets smarter at 2am, pre-computes the morning at 6:30, delivers the Brief in under 3 seconds at wake, and patrols quietly all day. Chat can interrupt anywhere — same brain, same memory.

flowchart LR
  N2[2:00 · Dreaming Mode
consolidate memory
pre-compute tomorrow] --> S1[6:30 · pre-Brief sweep
today.md refreshed]
  S1 --> B1[7:00 · Morning Brief
from checkpoint · under 3s
in-app · never a push]
  B1 --> P[all day · Patrol every 15 min
pure compute · KAIROS skips ~80%]
  P --> M[12:00 · midday Brief]
  M --> F[afternoon · stress sustained 10 min
Fetch candidate · shadow in V1]
  F --> E[18:00 · evening Close]
  E --> Q[night · sleep window
alarms mostly skip]
  Q --> N2
  C([chat / Telegram · anytime]) -.->|user_message| P
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
  class N2 ev; class S1,P,Q neu; class B1,M,E act; class F,C core;

FIG 02Master request lifecyclecanonical

Surface to delivery, crossing the two trust zones. The hard invariant: only derived insight crosses from Supabase into the agent — raw HRV/HR/sleep never leave the data zone. Every side-effect exits through the journaled outbox, so a crash can never half-deliver.

flowchart TB
  W[Wearable] --> A[App / Telegram]
  A -->|encrypted sync · JWT| EF[Edge Functions
sync + build-intelligence
deterministic · zero LLM]
  EF --> DB[(Postgres + RLS
raw health stays here)]
  AL([DO alarm · webhook · chat]) --> K{KAIROS
tick-and-decide}
  K -->|not worth it| Z([hibernate · ~$0])
  K -->|act| LOOP[runAgentLoop
bounded ReAct ≤3]
  DB -->|derived insight only
per-user RLS JWT| LOOP
  LOOP <-->|provider-shaped| L[[LLM · AI Gateway]]
  LOOP <-->|Zod · ACL · taint| T[Tool plane]
  LOOP --> G[Quality gates]
  G --> OB[(run journal + outbox)]
  OB -->|idempotent flush| CH[Channels
in-app · APNs / FCM · Telegram]
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class W,A,Z,CH neu; class EF,DB info; class K,LOOP,T,OB core; class G risk; class L ev;

FIG 03Initiation & KAIROS routing decisioncanonical

Everything starts inside the per-user DO — alarms, webhooks, chat. Patrol is ~80% pure compute; the model only fires when a branch resolves to a real trigger, and even then the pre-filter can answer with a free template. This is the cost-control heart, drawn as the decision it actually is.

flowchart TB
  AL([DO alarm · every 15 min waking]) --> RC[recompute CRS + stress sniff
zero LLM]
  UM([user message · channel webhook]) --> LOOP
  RC --> D1{wake window or
pre-Brief sweep?}
  D1 -->|yes| BR[brief · variant]
  D1 -->|no| D2{stress ≥0.60 sustained 10 min
cooldown ok · budget ok}
  D2 -->|yes| FE[Fetch candidate
shadow in V1 · gated + logged]
  D2 -->|no| D3{2am · 3+ unconsolidated
or 48h since last run?}
  D3 -->|yes| DM[dreaming_mode]
  D3 -->|no| Z([hibernate · ~$0])
  BR --> PRE{pre-filter
Form>60 and calm?}
  PRE -->|yes| TPL[template with real data
no LLM · $0]
  PRE -->|no| LOOP[runAgentLoop]
  FE --> LOOP
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
  class AL,UM,Z,TPL neu; class RC,LOOP core; class D1,D2,D3,PRE risk; class DM ev; class BR,FE act;

IN FLIGHT · The Fetch goes live (push) only after its measured false-positive rate clears a written bar on the validation cohort — until then it runs the full pipeline in shadow.

B
The agent harness, under the hood

FIG 04The bounded ReAct loopcanonical

Trust is reset on every wake — an injection in an earlier session can't carry permissions forward. The loop is bounded three ways: max 3 tool iterations, a diminishing-returns guard, and a circuit breaker on provider failures.

sequenceDiagram
  autonumber
  participant DO as DO loop
  participant LLM as Model
  participant T as Tool plane
  participant CH as Channel
  DO->>DO: session trust reset · emergency scan · pre-filter
  DO->>DO: recall(ctx) · build REASONS prompt (frozen snapshot)
  loop max 3 iterations
    DO->>LLM: prompt + tool schemas
    LLM->>T: tool_use (Zod parse · trigger ACL · autonomy + taint gates)
    T-->>LLM: sanitised result, capped ~500 tok
    Note over DO: guards — 3-iteration cap · diminishing returns · circuit breaker
  end
  LLM-->>DO: candidate message
  DO->>DO: quality gates (medical scrub · hallucination · canary)
  DO->>CH: deliver via outbox (push budget · idempotency key)
  DO->>DO: write episode · trace · memory intents

FIG 05Run journal — crash-resume (ADR-0054)canonical

Cloudflare may evict a DO mid-loop at any time. Every state transition is one committed DO SQLite transaction, so the next wake calls tick(runId) and resumes exactly where the run stopped — no silent dark, no double-send.

stateDiagram-v2
  [*] --> PENDING : startRun(trigger · variant · nonce)
  PENDING --> CONTEXT_BUILT : recall + prompt committed
  CONTEXT_BUILT --> LLM_CALLED : model response committed
  LLM_CALLED --> TOOLS_DONE : tool batch committed
  TOOLS_DONE --> GATED : safety gates passed
  GATED --> DELIVERED : outbox flushed (idempotent)
  DELIVERED --> DONE : episode + trace written
  LLM_CALLED --> FAILED : retries exhausted → fallback ladder
  FAILED --> [*]
  DONE --> [*]
  note right of GATED
    journal + outbox persist together.
    DO evicted anywhere? alarm refires,
    tick(runId) resumes from the last
    committed state - outbox dedupes.
  end note

FIG 06Delivery consistency — the transactional outboxcanonical

The write fan-out is where a crash would leave stores inconsistent. The outbox makes each side-effect independently retryable; the idempotency key makes every retry safe.

flowchart TB
  C[candidate msg + state delta] --> J[(run journal
DO SQLite · single txn)]
  J --> OB[(outbox rows
each with idempotency_key)]
  OB --> P1[push · APNs / FCM / Telegram]
  OB --> P2[memory intents -> Scribe inbox]
  OB --> P3[trace -> agent_logs]
  OB --> P4[scores / feedback -> Supabase]
  P1 -->|ack| M[mark sent]
  P1 -.->|crash before ack| RT[retry next tick
idempotency_key dedups]
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  class C,J,OB core; class P1 neu; class P2 mem; class P3 ev; class P4 info; class M neu; class RT risk;

LOCKED · The key is content-derived — hash(user · trigger · variant · run_nonce) (ADR-0054). A retry that crosses a time boundary keeps its key; two distinct same-hour runs never collide. Rate-limiting lives separately, in the DO push budget (ADR-0009).

FIG 07Failure ladder · circuit breaker · spend degradecanonical

The agent never "fails" — it degrades, level by level, and the user never sees an error. A spend-cap hit degrades to the cheap model, never to a template, and is never logged as an attack — normal heavy use is not abuse (ADR-0051).

flowchart TB
  L1[L1 · primary model · full context] -->|ok| OUT([deliver])
  L1 -->|fail| L2[L2 · primary model · reduced context]
  L2 -->|fail| L3[L3 · template with real data · no LLM]
  L3 -->|fail| L4[L4 · silent · log · retry next alarm]
  CB{circuit breaker
3 consecutive fails → OPEN
2 successes → CLOSED} -->|open| L3
  GW{AI Gateway down?} -->|yes| L3
  SC{spend cap hit?} -->|yes| GE[degrade to cheap model
not template · never flagged as abuse]
  GE --> OUT
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  class L1,L2,GE core; class L3,L4,OUT neu; class CB,GW,SC risk;

IN FLIGHT · The cap threshold is a pricing decision (a power user can cost more than a Pro tier earns) — the mechanism above is locked; the number is the open founder call (ADR-0051).

FIG 08Safety pipeline — the 9-event hook gauntletcanonical

Nine deterministic hook events wrap every invocation (ADR-0032) — this flow shows the gauntlet a message runs. Emergency detection fires before the model and bypasses the whole loop. A canary leak terminates the session. All of it is code, none of it is prompt instruction.

flowchart TB
  IN[input / trigger] --> STR[session trust reset
fresh ACL · fresh canaries]
  STR --> ED{emergency patterns?}
  ED -->|yes| SAFE[safe response + hotlines
no LLM · terminate · log]
  ED -->|no| PF{pre-filter calm?}
  PF -->|yes| TPL[template]
  PF -->|no| H1[pre-LLM hooks
fence memory · inject canaries · budget check]
  H1 --> LLM[[LLM · pre/post-tool hooks
ACL · autonomy · taint · sanitise]]
  LLM --> QG{quality gates
medical scrub · hallucination · canary}
  QG -->|canary leak| KILL[terminate session · log incident]
  QG -->|ok| PB{push budget ok?}
  PB -->|yes| SEND([deliver])
  PB -->|no| HOLD[hold · in-app only]
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class IN,TPL,SEND,HOLD neu; class STR,ED,PF,QG,PB,SAFE,KILL risk; class H1 core; class LLM ev;

C
The body & memory substrate

FIG 09Multi-source reconciliation → CRSbody substrate

Per-metric confidence is authoritative for conflicts (device priority is only the tiebreaker), and baselines are kept per source so switching devices never silently re-baselines the user. The formula is locked, citable science — pure math, zero LLM.

flowchart TB
  AW[Apple Watch · 1.0] --> R
  OU[Oura · 0.85] --> R
  WH[WHOOP · 0.8] --> R
  SA[Samsung proxy · 0.6] --> R
  R{per-metric confidence rank
priority list = tiebreaker only} --> HD[(health_daily
primary_source + contributing_sources)]
  R --> BG[per-source baselines
no silent re-baseline on switch]
  HD --> BI[build-intelligence EF · zero LLM]
  BG --> BI
  BI --> FORM["Form = Sleep ×0.50 + HRV ×0.35
+ Circadian ×0.075 + Motion ×0.075
SAFTE-FAST grounded · 856-day validated"]
  FORM --> CRS[(crs_scores
zone + pillar drag)]
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
  class AW,OU,WH,SA,HD,CRS info; class BI,FORM core; class R risk; class BG act;

IN FLIGHT · Device capability is honest by design: WHOOP/Oura expose overnight data only — intraday stress detection needs Apple Watch / Health Connect. Onboarding states this per device.

FIG 10Memory lifecycle — write · read · dreammemory substrate

Nothing writes the brain directly: writes are sanitised at write time and staged in the Scribe inbox; reads union committed halls with the sanitised pending entries, so same-day learning is visible before the 2am merge — no amnesia, no poisoning window.

flowchart TB
  TC[tool / agent observation] --> SAN[sanitise at write
5 checks · taint label · per-hall ACL]
  SAN --> INB[(memory_inbox · staged)]
  Q[recall ctx · every invocation] --> U{union read:
committed halls ∪ sanitised pending}
  INB -.->|same-day visible
trust = provisional| U
  U --> RRF[BM25 + recency · RRF k=60
confidence + salience boosts]
  RRF --> BLK[fenced recall block
marked NOT instructions]
  INB --> MRG[2am Scribe merge -> halls
pattern_id dedupe / supersede
bi-temporal · never DELETE]
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class TC,Q,BLK core; class SAN risk; class INB,U,RRF mem; class MRG ev;

D
The interface layer & trust flows

FIG 11Telegram, rich two-way — one brain on every surfaceinterface layer

Telegram is not a second agent — it's a channel adapter into the same per-user DO. Unknown senders are silently dropped; linking uses a one-time deep-link token; and a Telegram conversation lands in the same thread the app shows, with the same memory and approvals.

flowchart TB
  TU[Telegram user] -->|message · voice · button tap| WH[webhooks EF router]
  WH --> AL{telegram_user_id
on allowlist?}
  AL -->|no| DROP([silent drop · always 200])
  AL -->|link flow| OTT[one-time deep-link token
10 min · single use] --> LINK[account linked]
  AL -->|yes| NRM[normalize event
text · voice -> Whisper transcript · callback]
  NRM --> DOX[per-user DO
same brain · same memory]
  DOX --> THR[(canonical thread store
shared with in-app Chat)]
  DOX --> RSP[reply · channel-shaped
concise prose + inline buttons]
  RSP --> OBX[(outbox · idempotent flush)]
  OBX --> TU
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class TU,DROP neu; class WH,OTT,LINK info; class DOX,OBX core; class AL risk; class NRM,THR,RSP ifc;

BY DESIGN · Health-derived content on Telegram is consent-gated, never carries raw biometric values, and its cloud-chat residual is documented in the deletion runbook (ADR-0055).

FIG 12The Handoff — Explore · Plan · Act, with a human tapinterface layer

Waldo's most powerful loop is also its most gated. Explore is read-only; Plan produces proposal cards; Act fires only after a signed human tap. The approval token is HMAC-signed and expires in 5 minutes — a prompt injection cannot fake the tap.

flowchart TB
  ST([Handoff starts]) --> EX[EXPLORE
read-only tools · build context]
  EX --> PL[PLAN
propose_action -> proposal cards]
  PL --> CARD[user sees card
Do it · Modify · Not now]
  CARD -->|tap approve| TOK{HMAC approval token
5-min expiry · server-validated}
  CARD -->|2h no response| EXP([auto-decline · expire])
  TOK -->|valid| ACT[ACT
execute_action · the only gated door]
  TOK -->|invalid / expired| EXP
  ACT --> PE[Patrol entry + undo affordance]
  AUT{autonomy level} -.->|L1 observe only| EX
  AUT -.->|L2 propose · default| PL
  AUT -.->|L3 auto + undo
except connector_write| ACT
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
  classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class ST,EXP neu; class EX,PL core; class CARD,PE ifc; class TOK,AUT risk; class ACT act;

FIG 13Trigger concurrency inside the DOcanonical

The DO is single-writer by construction. Overlapping triggers queue through the two-level guard (pendingInterrupt); proactive content arriving during a live chat merges into the open thread instead of pushing — Waldo never talks over itself.

flowchart TB
  A1[Patrol alarm] --> GATE
  A2[user message] --> GATE
  GATE{DO input gate
single-writer}
  GATE -->|run active| Q[pendingInterrupt queue
processed after current run]
  GATE -->|idle| RUN[start run]
  Q --> RUN
  RUN --> PC{proactive content while
a chat is live?}
  PC -->|yes| SUPP[merge into the open thread
notification dot · no push]
  PC -->|no| SEND([deliver normally])
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  class A2,SEND neu; class A1,RUN core; class GATE,PC risk; class Q,SUPP ifc;

FIG 14Timezone & travel — circadian intelligence on the movecanonical

A circadian product has to survive a red-eye. Timezone changes are detected on the next health sync and the DO reschedules every per-user alarm; scheduling is IANA-zone based so DST never silently shifts the wake alarm.

flowchart TB
  FLY[travel LAX -> LHR] --> SY[next HealthKit sync
X-Device-Timezone header]
  SY --> CH{tz changed?}
  CH -->|no| OK([no-op])
  CH -->|yes| UP[update users.timezone
IANA zone · DST-safe]
  UP --> RS[reschedule signal -> DO]
  RS --> AL[recompute wake · Patrol · 2am alarms
in the new local time]
  AL --> JL[recovery-day skill primes
for 24-48h jet-lag window]
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
  class FLY,OK neu; class SY,UP info; class RS,AL core; class CH risk; class JL act;

FIG 15GDPR deletion — erasure across every storecanonical · ADR-0055

An Article-9 product owes a complete answer to "delete me." The runbook enumerates every store that holds personal or derived-health data — including observability — with the Telegram cloud-chat residual handled honestly: disable, notify, and instruct.

flowchart TB
  DEL[deletion request
in-app · Apple webhook] --> ORCH[deletion orchestrator
30-day ceiling]
  ORCH --> S1[Supabase · FK cascade delete]
  ORCH --> S2[DO SQLite · wake + purge tables]
  ORCH --> S3[R2 · workspace + archives by prefix]
  ORCH --> S4[AI Gateway logs
payload logging off on health routes]
  ORCH --> S5[observability · hashed IDs
Langfuse / PostHog / Sentry]
  ORCH --> S6[Telegram · disable + notice
+ user-side delete instructions]
  classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
  classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
  classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
  classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
  classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
  classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
  class DEL neu; class ORCH core; class S1 info; class S2 core; class S3 mem; class S4,S5 ev; class S6 risk;

IN FLIGHT · Per-store latency commitments and the consent/DPIA wording are being finalized (ADR-0055, proposed) — the store enumeration itself is locked.

00Atlas index

AThe product view

BThe agent harness, under the hood

CThe body & memory substrate

DThe interface layer & trust flows

00
Atlas index

A
The product view

B
The agent harness, under the hood

C
The body & memory substrate

D
The interface layer & trust flows