How Waldo Works · Flow Diagrams · Whiteboard Companion
Every load-bearing flow in the Waldo agent harness and interface layer, drawn end-to-end — from a wearable's overnight samples to a message that feels like "already on it." Built for walkthroughs, deep dives and whiteboarding.
Four passes over the same machine, each one level deeper: the product view, the harness internals, the body-and-memory substrate, and the interface & trust flows.
| Fig | Flow | Grounded in |
|---|---|---|
| 01 | A day with Waldo — the proactive rhythm | ADR-0017 · Brief/Patrol/Dreaming cadence |
| 02 | Master request lifecycle — the whole machine, one pass | Architecture overview §6 |
| 03 | Initiation & KAIROS routing — why it costs ~nothing to be always-on | ADR-0017 · cost model v2 |
| 04 | The bounded ReAct loop | Harness plan · ADR-0008/0033 |
| 05 | Run journal — crash-resume state machine | ADR-0054 |
| 06 | Delivery consistency — the transactional outbox | ADR-0054 · ADR-0009 |
| 07 | Failure ladder · circuit breaker · spend degrade | ADR-0051 · harness plan |
| 08 | Safety pipeline — the 9-event hook gauntlet | ADR-0032/0024/0033 |
| 09 | Multi-source reconciliation → CRS | ADR-0011 (+ amendment) |
| 10 | Memory lifecycle — write · read · dream | ADR-0005/0006/0024/0031/0037 |
| 11 | Telegram, rich two-way — one brain on every surface | Architecture overview Flow E · ADR-0055 |
| 12 | The Handoff — Explore · Plan · Act with a human tap | ADR-0018 · ADR-0027 (token mechanic) |
| 13 | Trigger concurrency inside the DO | Harness plan (two-level guard) · ADR-0014 |
| 14 | Timezone & travel — circadian intelligence on the move | Backend plan (tz detection) |
| 15 | GDPR deletion — erasure across every store | ADR-0055 |
The agent works while the user doesn't: it gets smarter at 2am, pre-computes the morning at 6:30, delivers the Brief in under 3 seconds at wake, and patrols quietly all day. Chat can interrupt anywhere — same brain, same memory.
flowchart LR
N2[2:00 · Dreaming Mode
consolidate memory
pre-compute tomorrow] --> S1[6:30 · pre-Brief sweep
today.md refreshed]
S1 --> B1[7:00 · Morning Brief
from checkpoint · under 3s
in-app · never a push]
B1 --> P[all day · Patrol every 15 min
pure compute · KAIROS skips ~80%]
P --> M[12:00 · midday Brief]
M --> F[afternoon · stress sustained 10 min
Fetch candidate · shadow in V1]
F --> E[18:00 · evening Close]
E --> Q[night · sleep window
alarms mostly skip]
Q --> N2
C([chat / Telegram · anytime]) -.->|user_message| P
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
class N2 ev; class S1,P,Q neu; class B1,M,E act; class F,C core;
Surface to delivery, crossing the two trust zones. The hard invariant: only derived insight crosses from Supabase into the agent — raw HRV/HR/sleep never leave the data zone. Every side-effect exits through the journaled outbox, so a crash can never half-deliver.
flowchart TB
W[Wearable] --> A[App / Telegram]
A -->|encrypted sync · JWT| EF[Edge Functions
sync + build-intelligence
deterministic · zero LLM]
EF --> DB[(Postgres + RLS
raw health stays here)]
AL([DO alarm · webhook · chat]) --> K{KAIROS
tick-and-decide}
K -->|not worth it| Z([hibernate · ~$0])
K -->|act| LOOP[runAgentLoop
bounded ReAct ≤3]
DB -->|derived insight only
per-user RLS JWT| LOOP
LOOP <-->|provider-shaped| L[[LLM · AI Gateway]]
LOOP <-->|Zod · ACL · taint| T[Tool plane]
LOOP --> G[Quality gates]
G --> OB[(run journal + outbox)]
OB -->|idempotent flush| CH[Channels
in-app · APNs / FCM · Telegram]
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class W,A,Z,CH neu; class EF,DB info; class K,LOOP,T,OB core; class G risk; class L ev;
Everything starts inside the per-user DO — alarms, webhooks, chat. Patrol is ~80% pure compute; the model only fires when a branch resolves to a real trigger, and even then the pre-filter can answer with a free template. This is the cost-control heart, drawn as the decision it actually is.
flowchart TB
AL([DO alarm · every 15 min waking]) --> RC[recompute CRS + stress sniff
zero LLM]
UM([user message · channel webhook]) --> LOOP
RC --> D1{wake window or
pre-Brief sweep?}
D1 -->|yes| BR[brief · variant]
D1 -->|no| D2{stress ≥0.60 sustained 10 min
cooldown ok · budget ok}
D2 -->|yes| FE[Fetch candidate
shadow in V1 · gated + logged]
D2 -->|no| D3{2am · 3+ unconsolidated
or 48h since last run?}
D3 -->|yes| DM[dreaming_mode]
D3 -->|no| Z([hibernate · ~$0])
BR --> PRE{pre-filter
Form>60 and calm?}
PRE -->|yes| TPL[template with real data
no LLM · $0]
PRE -->|no| LOOP[runAgentLoop]
FE --> LOOP
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
class AL,UM,Z,TPL neu; class RC,LOOP core; class D1,D2,D3,PRE risk; class DM ev; class BR,FE act;
IN FLIGHT · The Fetch goes live (push) only after its measured false-positive rate clears a written bar on the validation cohort — until then it runs the full pipeline in shadow.
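The routing in this figure is a pure decision over numbers the Patrol tick has already computed. A minimal sketch of that decision follows; `PatrolSnapshot`, `Route` and `route` are hypothetical names, not harness identifiers, and the reading of the 2am branch (2am AND either condition) is an assumption. The thresholds are the ones printed in the diagram.

```typescript
// Hypothetical snapshot of what one Patrol tick has computed with zero LLM calls.
interface PatrolSnapshot {
  inWakeWindowOrSweep: boolean; // wake window or pre-Brief sweep
  stress: number;               // stress score, 0..1
  stressSustainedMin: number;   // minutes the stress score has held
  cooldownOk: boolean;
  budgetOk: boolean;
  localHour: number;            // 0..23 in the user's zone
  unconsolidated: number;       // staged memory entries awaiting merge
  hoursSinceDreaming: number;
  form: number;                 // Form score, 0..100
  calm: boolean;
}

type Route = "template" | "brief_llm" | "fetch_shadow" | "dreaming_mode" | "hibernate";

function route(s: PatrolSnapshot): Route {
  if (s.inWakeWindowOrSweep) {
    // Pre-filter: a good, calm day is answered by a free template.
    return s.form > 60 && s.calm ? "template" : "brief_llm";
  }
  if (s.stress >= 0.6 && s.stressSustainedMin >= 10 && s.cooldownOk && s.budgetOk) {
    return "fetch_shadow"; // shadow in V1: full pipeline, no push
  }
  // Assumed reading: it must be 2am, and either condition is enough.
  if (s.localHour === 2 && (s.unconsolidated >= 3 || s.hoursSinceDreaming >= 48)) {
    return "dreaming_mode";
  }
  return "hibernate"; // the common case, no model call
}
```

Only `brief_llm` and `fetch_shadow` would reach the model in this sketch; every other branch ends without an LLM call, which is the cost argument the paragraph above makes.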
Trust is reset on every wake — an injection in an earlier session can't carry permissions forward. The loop is bounded three ways: max 3 tool iterations, a diminishing-returns guard, and a circuit breaker on provider failures.
sequenceDiagram
autonumber
participant DO as DO loop
participant LLM as Model
participant T as Tool plane
participant CH as Channel
DO->>DO: session trust reset · emergency scan · pre-filter
DO->>DO: recall(ctx) · build REASONS prompt (frozen snapshot)
loop max 3 iterations
DO->>LLM: prompt + tool schemas
LLM->>T: tool_use (Zod parse · trigger ACL · autonomy + taint gates)
T-->>LLM: sanitised result, capped ~500 tok
Note over DO: guards — 3-iteration cap · diminishing returns · circuit breaker
end
LLM-->>DO: candidate message
DO->>DO: quality gates (medical scrub · hallucination · canary)
DO->>CH: deliver via outbox (push budget · idempotency key)
DO->>DO: write episode · trace · memory intents
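The bounds on the loop can be sketched as below. This is an illustration under assumptions, not the harness code: `runBoundedLoop`, `ModelTurn` and `LoopDeps` are hypothetical, the diminishing-returns guard is modelled as "the model repeated an identical tool call", and the ~500-token cap is approximated by a character cap. The circuit breaker is left out here.

```typescript
type ModelTurn =
  | { kind: "final"; text: string }
  | { kind: "tool_use"; name: string; args: string };

interface LoopDeps {
  callModel: (transcript: string[]) => ModelTurn;
  runTool: (name: string, args: string) => string;
}

const MAX_ITERATIONS = 3;     // from the diagram
const RESULT_CHAR_CAP = 2000; // assumed stand-in for the ~500-token cap

type StopReason = "final" | "diminishing_returns" | "iteration_cap";

function runBoundedLoop(
  prompt: string,
  deps: LoopDeps,
): { text: string | null; toolCalls: number; stop: StopReason } {
  const transcript = [prompt];
  const seen = new Set<string>();
  let toolCalls = 0;
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const turn = deps.callModel(transcript);
    if (turn.kind === "final") return { text: turn.text, toolCalls, stop: "final" };
    const signature = `${turn.name}:${turn.args}`;
    // Assumed guard: an identical repeat call cannot add information.
    if (seen.has(signature)) return { text: null, toolCalls, stop: "diminishing_returns" };
    seen.add(signature);
    const result = deps.runTool(turn.name, turn.args).slice(0, RESULT_CHAR_CAP);
    toolCalls++;
    transcript.push(`tool:${turn.name} -> ${result}`);
  }
  // Cap reached without a final answer; the caller decides what to do next.
  return { text: null, toolCalls, stop: "iteration_cap" };
}
```

Returning a stop reason rather than throwing keeps the outcome a plain value, so a caller could record it and choose a fallback without exception handling.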
Cloudflare may evict a DO mid-loop at any time. Every state transition is one committed DO SQLite transaction, so the next wake calls tick(runId) and resumes from the last committed state: no run goes silently dark, and no message is sent twice.
stateDiagram-v2
[*] --> PENDING : startRun(trigger · variant · nonce)
PENDING --> CONTEXT_BUILT : recall + prompt committed
CONTEXT_BUILT --> LLM_CALLED : model response committed
LLM_CALLED --> TOOLS_DONE : tool batch committed
TOOLS_DONE --> GATED : safety gates passed
GATED --> DELIVERED : outbox flushed (idempotent)
DELIVERED --> DONE : episode + trace written
LLM_CALLED --> FAILED : retries exhausted → fallback ladder
FAILED --> [*]
DONE --> [*]
note right of GATED
journal + outbox persist together.
DO evicted anywhere? alarm refires,
tick(runId) resumes from the last
committed state - outbox dedupes.
end note
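The state machine above can be sketched as a transition table plus a `tick` that only ever advances from the last committed state. The in-memory `Map` is a stand-in for DO SQLite, and `RunJournal`, `Step` and the shape of `tick` are assumptions made for illustration; only the state names and the `tick(runId)` idea come from the diagram.

```typescript
type RunState =
  | "PENDING" | "CONTEXT_BUILT" | "LLM_CALLED" | "TOOLS_DONE"
  | "GATED" | "DELIVERED" | "DONE" | "FAILED";

// Allowed transitions, copied from the diagram.
const NEXT: Record<RunState, RunState[]> = {
  PENDING: ["CONTEXT_BUILT"],
  CONTEXT_BUILT: ["LLM_CALLED"],
  LLM_CALLED: ["TOOLS_DONE", "FAILED"],
  TOOLS_DONE: ["GATED"],
  GATED: ["DELIVERED"],
  DELIVERED: ["DONE"],
  DONE: [],
  FAILED: [],
};

class RunJournal {
  private rows = new Map<string, RunState>(); // stand-in for a DO SQLite table

  startRun(runId: string): void {
    this.rows.set(runId, "PENDING");
  }

  state(runId: string): RunState {
    const s = this.rows.get(runId);
    if (s === undefined) throw new Error(`unknown run ${runId}`);
    return s;
  }

  // One commit per transition; illegal jumps are rejected.
  commit(runId: string, to: RunState): void {
    const from = this.state(runId);
    if (!NEXT[from].includes(to)) throw new Error(`illegal ${from} -> ${to}`);
    this.rows.set(runId, to);
  }
}

// A step does the work for the current state and returns the state to commit.
type Step = () => RunState;

function tick(journal: RunJournal, runId: string, steps: Partial<Record<RunState, Step>>): RunState {
  let current = journal.state(runId);
  while (NEXT[current].length > 0) {
    const step = steps[current];
    if (!step) break;
    const next = step(); // if this throws (eviction), nothing is committed
    journal.commit(runId, next);
    current = next;
  }
  return current;
}
```

In this sketch a step that was interrupted is run again on the next `tick`, which is why the side-effects it triggers need the idempotent outbox described in the next figure.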
The write fan-out is where a crash would leave stores inconsistent. The outbox makes each side-effect independently retryable; the idempotency key makes every retry safe.
flowchart TB
C[candidate msg + state delta] --> J[(run journal
DO SQLite · single txn)]
J --> OB[(outbox rows
each with idempotency_key)]
OB --> P1[push · APNs / FCM / Telegram]
OB --> P2[memory intents -> Scribe inbox]
OB --> P3[trace -> agent_logs]
OB --> P4[scores / feedback -> Supabase]
P1 -->|ack| M[mark sent]
P1 -.->|crash before ack| RT[retry next tick
idempotency_key dedups]
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
class C,J,OB core; class P1 neu; class P2 mem; class P3 ev; class P4 info; class M neu; class RT risk;
LOCKED · The key is content-derived — hash(user · trigger · variant · run_nonce) (ADR-0054). A retry that crosses a time boundary keeps its key; two distinct same-hour runs never collide. Rate-limiting lives separately, in the DO push budget (ADR-0009).
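A minimal sketch of a content-derived key and a receiver that dedupes on it. The four hashed fields are the ones named above; the choice of SHA-256, the length-prefixed encoding, and the `FakeChannel` class are assumptions for illustration only.

```typescript
import { createHash } from "node:crypto";

interface RunIdentity {
  user: string;
  trigger: string;
  variant: string;
  runNonce: string;
}

function idempotencyKey(r: RunIdentity): string {
  // Length-prefix each field so ("ab","c") and ("a","bc") cannot collide.
  const encoded = [r.user, r.trigger, r.variant, r.runNonce]
    .map((part) => `${part.length}:${part}`)
    .join("|");
  // No timestamp goes in, so a retry after a time boundary keeps its key.
  return createHash("sha256").update(encoded).digest("hex");
}

// Hypothetical receiver that treats a repeated key as a no-op.
class FakeChannel {
  private seen = new Set<string>();
  readonly delivered: string[] = [];

  send(key: string, body: string): void {
    if (this.seen.has(key)) return; // duplicate retry, already delivered
    this.seen.add(key);
    this.delivered.push(body);
  }
}

const run = { user: "u1", trigger: "brief", variant: "morning", runNonce: "n-001" };
const channel = new FakeChannel();
channel.send(idempotencyKey(run), "Good morning");
channel.send(idempotencyKey(run), "Good morning"); // retry after a crash before ack
```

After both calls `channel.delivered` holds a single entry. A second run with a different `runNonce` would hash to a different key and be delivered normally.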
The agent never "fails" — it degrades, level by level, and the user never sees an error. A spend-cap hit degrades to the cheap model, never to a template, and is never logged as an attack — normal heavy use is not abuse (ADR-0051).
flowchart TB
L1[L1 · primary model · full context] -->|ok| OUT([deliver])
L1 -->|fail| L2[L2 · primary model · reduced context]
L2 -->|fail| L3[L3 · template with real data · no LLM]
L3 -->|fail| L4[L4 · silent · log · retry next alarm]
CB{circuit breaker
3 consecutive fails → OPEN
2 successes → CLOSED} -->|open| L3
GW{AI Gateway down?} -->|yes| L3
SC{spend cap hit?} -->|yes| GE[degrade to cheap model
not template · never flagged as abuse]
GE --> OUT
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
class L1,L2,GE core; class L3,L4,OUT neu; class CB,GW,SC risk;
IN FLIGHT · The cap threshold is a pricing decision (a power user can cost more than a Pro tier earns) — the mechanism above is locked; the number is the open founder call (ADR-0051).
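The ladder and the breaker thresholds can be sketched together. Everything named here (`CircuitBreaker`, `Ladder`, `deliverWithLadder`) is hypothetical; the sketch assumes the two closing successes must be consecutive and that each failed model call counts toward the breaker. It leaves out the spend-cap branch and does not model how probe calls are admitted while the breaker is OPEN, since the source does not specify that.

```typescript
class CircuitBreaker {
  private fails = 0;
  private successes = 0;
  private open = false;

  isOpen(): boolean {
    return this.open;
  }

  recordFailure(): void {
    this.successes = 0;
    this.fails++;
    if (this.fails >= 3) this.open = true; // 3 consecutive fails -> OPEN
  }

  recordSuccess(): void {
    this.fails = 0;
    if (!this.open) return;
    this.successes++;
    if (this.successes >= 2) { // 2 successes -> CLOSED
      this.open = false;
      this.successes = 0;
    }
  }
}

type Attempt = () => string; // throws on failure

interface Ladder {
  l1: Attempt; // primary model, full context
  l2: Attempt; // primary model, reduced context
  l3: Attempt; // template with real data, no LLM
}

type Level = "L1" | "L2" | "L3" | "L4";

function deliverWithLadder(
  ladder: Ladder,
  breaker: CircuitBreaker,
  gatewayDown: boolean,
): { level: Level; text: string | null } {
  if (!breaker.isOpen() && !gatewayDown) {
    for (const [level, attempt] of [["L1", ladder.l1], ["L2", ladder.l2]] as const) {
      try {
        const text = attempt();
        breaker.recordSuccess();
        return { level, text };
      } catch {
        breaker.recordFailure();
      }
    }
  }
  try {
    return { level: "L3", text: ladder.l3() };
  } catch {
    return { level: "L4", text: null }; // stay silent, log, retry next alarm
  }
}
```

Note that no branch returns an error to the caller: the worst outcome is a `null` text at L4, which matches the claim that the user never sees a failure.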
Nine deterministic hook events wrap every invocation (ADR-0032) — this flow shows the gauntlet a message runs. Emergency detection fires before the model and bypasses the whole loop. A canary leak terminates the session. All of it is code, none of it is prompt instruction.
flowchart TB
IN[input / trigger] --> STR[session trust reset
fresh ACL · fresh canaries]
STR --> ED{emergency patterns?}
ED -->|yes| SAFE[safe response + hotlines
no LLM · terminate · log]
ED -->|no| PF{pre-filter calm?}
PF -->|yes| TPL[template]
PF -->|no| H1[pre-LLM hooks
fence memory · inject canaries · budget check]
H1 --> LLM[[LLM · pre/post-tool hooks
ACL · autonomy · taint · sanitise]]
LLM --> QG{quality gates
medical scrub · hallucination · canary}
QG -->|canary leak| KILL[terminate session · log incident]
QG -->|ok| PB{push budget ok?}
PB -->|yes| SEND([deliver])
PB -->|no| HOLD[hold · in-app only]
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class IN,TPL,SEND,HOLD neu; class STR,ED,PF,QG,PB,SAFE,KILL risk; class H1 core; class LLM ev;
Per-metric confidence is authoritative for conflicts (device priority is only the tiebreaker), and baselines are kept per source so switching devices never silently re-baselines the user. The formula is locked, citable science — pure math, zero LLM.
flowchart TB
AW[Apple Watch · 1.0] --> R
OU[Oura · 0.85] --> R
WH[WHOOP · 0.8] --> R
SA[Samsung proxy · 0.6] --> R
R{per-metric confidence rank
priority list = tiebreaker only} --> HD[(health_daily
primary_source + contributing_sources)]
R --> BG[per-source baselines
no silent re-baseline on switch]
HD --> BI[build-intelligence EF · zero LLM]
BG --> BI
BI --> FORM["Form = Sleep ×0.50 + HRV ×0.35
+ Circadian ×0.075 + Motion ×0.075
SAFTE-FAST grounded · 856-day validated"]
FORM --> CRS[(crs_scores
zone + pillar drag)]
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
class AW,OU,WH,SA,HD,CRS info; class BI,FORM core; class R risk; class BG act;
IN FLIGHT · Device capability is honest by design: WHOOP/Oura expose overnight data only — intraday stress detection needs Apple Watch / Health Connect. Onboarding states this per device.
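The two rules in this figure, "confidence decides, priority only breaks ties" and the weighted Form sum, fit in a few lines. The weights are the ones in the diagram and sum to 1.0. The 0..100 pillar scale, the `Reading` shape, the source identifiers and the order of `PRIORITY` are assumptions for illustration.

```typescript
// Weights from the diagram: 0.50 + 0.35 + 0.075 + 0.075 = 1.0
const WEIGHTS = { sleep: 0.5, hrv: 0.35, circadian: 0.075, motion: 0.075 } as const;

type Pillars = Record<keyof typeof WEIGHTS, number>; // each pillar assumed 0..100

function form(p: Pillars): number {
  return (
    p.sleep * WEIGHTS.sleep +
    p.hrv * WEIGHTS.hrv +
    p.circadian * WEIGHTS.circadian +
    p.motion * WEIGHTS.motion
  );
}

interface Reading {
  source: string;
  confidence: number; // per-metric confidence, 0..1
  value: number;
}

// Hypothetical tiebreak order; consulted only when confidences are equal.
const PRIORITY = ["apple_watch", "oura", "whoop", "samsung"];

function pickPrimary(readings: Reading[]): Reading {
  const sorted = [...readings].sort(
    (a, b) =>
      b.confidence - a.confidence ||
      PRIORITY.indexOf(a.source) - PRIORITY.indexOf(b.source),
  );
  const best = sorted[0];
  if (!best) throw new Error("no readings for this metric");
  return best;
}
```

With pillars of 80, 60, 70 and 90 the terms are 40 + 21 + 5.25 + 6.75, so Form comes out at about 73. In `pickPrimary`, a lower-priority device with higher confidence for a given metric still wins, which is the point of making priority a tiebreaker only.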
Nothing writes the brain directly: writes are sanitised at write time and staged in the Scribe inbox; reads union committed halls with the sanitised pending entries, so same-day learning is visible before the 2am merge — no amnesia, no poisoning window.
flowchart TB
TC[tool / agent observation] --> SAN[sanitise at write
5 checks · taint label · per-hall ACL]
SAN --> INB[(memory_inbox · staged)]
Q[recall ctx · every invocation] --> U{union read:
committed halls ∪ sanitised pending}
INB -.->|same-day visible
trust = provisional| U
U --> RRF[BM25 + recency · RRF k=60
confidence + salience boosts]
RRF --> BLK[fenced recall block
marked NOT instructions]
INB --> MRG[2am Scribe merge -> halls
pattern_id dedupe / supersede
bi-temporal · never DELETE]
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class TC,Q,BLK core; class SAN risk; class INB,U,RRF mem; class MRG ev;
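Reciprocal rank fusion with k=60 scores each entry as the sum of 1/(k + rank) over the rankings it appears in, with ranks starting at 1. A minimal sketch of the union read followed by fusion; the `MemoryEntry` shape and the function names are hypothetical, and the confidence and salience boosts are omitted.

```typescript
interface MemoryEntry {
  id: string;
  trust: "committed" | "provisional";
}

// Union read: committed halls plus sanitised pending entries, committed wins on id clash.
function unionRead(committed: string[], pending: string[]): MemoryEntry[] {
  const out = new Map<string, MemoryEntry>();
  for (const id of pending) out.set(id, { id, trust: "provisional" });
  for (const id of committed) out.set(id, { id, trust: "committed" });
  return [...out.values()];
}

// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
function rrfScores(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

function fuse(rankings: string[][], k = 60): string[] {
  return [...rrfScores(rankings, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const byBm25 = ["a", "b", "c"];
const byRecency = ["c", "a", "d"];
const fused = fuse([byBm25, byRecency]); // ["a", "c", "b", "d"]
```

Entry `a` ranks first and second (1/61 + 1/62), `c` ranks third and first (1/63 + 1/61), so `a` edges ahead; `b` and `d` appear in one list each and trail. Fusing on ranks rather than raw scores avoids having to normalise BM25 against a recency measure.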
Telegram is not a second agent — it's a channel adapter into the same per-user DO. Unknown senders are silently dropped; linking uses a one-time deep-link token; and a Telegram conversation lands in the same thread the app shows, with the same memory and approvals.
flowchart TB
TU[Telegram user] -->|message · voice · button tap| WH[webhooks EF router]
WH --> AL{telegram_user_id
on allowlist?}
AL -->|no| DROP([silent drop · always 200])
AL -->|link flow| OTT[one-time deep-link token
10 min · single use] --> LINK[account linked]
AL -->|yes| NRM[normalize event
text · voice -> Whisper transcript · callback]
NRM --> DOX[per-user DO
same brain · same memory]
DOX --> THR[(canonical thread store
shared with in-app Chat)]
DOX --> RSP[reply · channel-shaped
concise prose + inline buttons]
RSP --> OBX[(outbox · idempotent flush)]
OBX --> TU
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class TU,DROP neu; class WH,OTT,LINK info; class DOX,OBX core; class AL risk; class NRM,THR,RSP ifc;
BY DESIGN · Health-derived content on Telegram is consent-gated, never carries raw biometric values, and its cloud-chat residual is documented in the deletion runbook (ADR-0055).
Waldo's most powerful loop is also its most gated. Explore is read-only; Plan produces proposal cards; Act fires only after a signed human tap. The approval token is HMAC-signed and expires in 5 minutes — a prompt injection cannot fake the tap.
flowchart TB
ST([Handoff starts]) --> EX[EXPLORE
read-only tools · build context]
EX --> PL[PLAN
propose_action -> proposal cards]
PL --> CARD[user sees card
Do it · Modify · Not now]
CARD -->|tap approve| TOK{HMAC approval token
5-min expiry · server-validated}
CARD -->|2h no response| EXP([auto-decline · expire])
TOK -->|valid| ACT[ACT
execute_action · the only gated door]
TOK -->|invalid / expired| EXP
ACT --> PE[Patrol entry + undo affordance]
AUT{autonomy level} -.->|L1 observe only| EX
AUT -.->|L2 propose · default| PL
AUT -.->|L3 auto + undo
except connector_write| ACT
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class ST,EXP neu; class EX,PL core; class CARD,PE ifc; class TOK,AUT risk; class ACT act;
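A sketch of why the tap cannot be faked from inside a prompt: the token is only valid if it carries an HMAC computed with a server-side secret the model never sees. The token layout, field names and helper functions below are assumptions for illustration (the source states only "HMAC-signed" and "5-minute expiry"), and the sketch assumes ids contain no `.` character.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_MS = 5 * 60 * 1000; // 5-minute expiry, from the diagram

function sign(secret: string, payload: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Assumed layout: proposalId.userId.expiresAtMs.hexMac
function issueToken(secret: string, proposalId: string, userId: string, nowMs: number): string {
  const payload = `${proposalId}.${userId}.${nowMs + TTL_MS}`;
  return `${payload}.${sign(secret, payload)}`;
}

function verifyToken(
  secret: string,
  token: string,
  proposalId: string,
  userId: string,
  nowMs: number,
): boolean {
  const cut = token.lastIndexOf(".");
  if (cut < 0) return false;
  const payload = token.slice(0, cut);
  const given = Buffer.from(token.slice(cut + 1), "hex");
  const expected = Buffer.from(sign(secret, payload), "hex");
  // Constant-time comparison; lengths must match before calling timingSafeEqual.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  const [tokenProposal, tokenUser, expiresAt] = payload.split(".");
  return tokenProposal === proposalId && tokenUser === userId && Number(expiresAt) > nowMs;
}
```

Binding the proposal id and user id into the signed payload means a valid token for one card cannot be replayed against another card or another user in this sketch.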
The DO is single-writer by construction. Overlapping triggers queue through the two-level guard (pendingInterrupt); proactive content arriving during a live chat merges into the open thread instead of pushing — Waldo never talks over itself.
flowchart TB
A1[Patrol alarm] --> GATE
A2[user message] --> GATE
GATE{DO input gate
single-writer}
GATE -->|run active| Q[pendingInterrupt queue
processed after current run]
GATE -->|idle| RUN[start run]
Q --> RUN
RUN --> PC{proactive content while
a chat is live?}
PC -->|yes| SUPP[merge into the open thread
notification dot · no push]
PC -->|no| SEND([deliver normally])
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef ifc fill:#1e1430,stroke:#c084fc,color:#f3ece0;
class A2,SEND neu; class A1,RUN core; class GATE,PC risk; class Q,SUPP ifc;
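The gate can be sketched as a single-writer runner with a `pendingInterrupt` queue. The class and function names are hypothetical apart from `pendingInterrupt`, and the sketch is synchronous for brevity: in a real DO the overlap would arise while a run is awaiting I/O, which is simulated here by submitting a second trigger from inside the first run.

```typescript
type Trigger = { kind: "patrol" | "user_message"; id: string };
type RunFn = (t: Trigger) => void;

class InputGate {
  private active = false;
  private pendingInterrupt: Trigger[] = [];
  private readonly run: RunFn;

  constructor(run: RunFn) {
    this.run = run;
  }

  submit(t: Trigger): "ran" | "queued" {
    if (this.active) {
      // A run is in flight: queue instead of starting a second writer.
      this.pendingInterrupt.push(t);
      return "queued";
    }
    this.active = true;
    try {
      let next: Trigger | undefined = t;
      while (next !== undefined) {
        this.run(next);
        next = this.pendingInterrupt.shift(); // drain after the current run
      }
    } finally {
      this.active = false;
    }
    return "ran";
  }
}

// Proactive content never pushes over a live chat; it joins the open thread.
function deliveryMode(proactive: boolean, chatLive: boolean): "merge_into_thread" | "deliver" {
  return proactive && chatLive ? "merge_into_thread" : "deliver";
}
```

Because the queue is drained inside the same `submit` call, the queued trigger starts only after the active run has finished, so two runs never interleave.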
A circadian product has to survive a red-eye. Timezone changes are detected on the next health sync and the DO reschedules every per-user alarm; scheduling is IANA-zone based so DST never silently shifts the wake alarm.
flowchart TB
FLY[travel LAX -> LHR] --> SY[next HealthKit sync
X-Device-Timezone header]
SY --> CH{tz changed?}
CH -->|no| OK([no-op])
CH -->|yes| UP[update users.timezone
IANA zone · DST-safe]
UP --> RS[reschedule signal -> DO]
RS --> AL[recompute wake · Patrol · 2am alarms
in the new local time]
AL --> JL[recovery-day skill primes
for 24-48h jet-lag window]
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
classDef act fill:#10241c,stroke:#34d399,color:#f3ece0;
class FLY,OK neu; class SY,UP info; class RS,AL core; class CH risk; class JL act;
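Scheduling by IANA zone means storing "07:00 in America/Los_Angeles" and resolving it to a UTC instant each time, so the offset in force on that day is always used. A sketch using only the standard `Intl.DateTimeFormat` API; `offsetMinutes` and `nextLocalTime` are hypothetical helpers, and local times that fall inside a DST gap are not handled specially.

```typescript
// Offset of `zone` from UTC, in minutes, at the instant `at`.
function offsetMinutes(zone: string, at: Date): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: zone, hourCycle: "h23",
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(at);
  const get = (type: string): number => Number(parts.find((p) => p.type === type)?.value);
  const wallAsUtc = Date.UTC(get("year"), get("month") - 1, get("day"), get("hour"), get("minute"), get("second"));
  return Math.round((wallAsUtc - at.getTime()) / 60000);
}

// Next UTC instant strictly after `after` at which the wall clock in `zone` reads hour:minute.
function nextLocalTime(zone: string, hour: number, minute: number, after: Date): Date {
  const offsetNow = offsetMinutes(zone, after);
  // Shift so that getUTC* fields read as the zone's wall-clock fields.
  const local = new Date(after.getTime() + offsetNow * 60000);
  for (let dayShift = 0; dayShift <= 2; dayShift++) {
    const wallAsUtc = Date.UTC(
      local.getUTCFullYear(), local.getUTCMonth(), local.getUTCDate() + dayShift, hour, minute,
    );
    // Guess with the current offset, then correct with the offset in force at the guess.
    const guess = new Date(wallAsUtc - offsetNow * 60000);
    const candidate = new Date(wallAsUtc - offsetMinutes(zone, guess) * 60000);
    if (candidate.getTime() > after.getTime()) return candidate;
  }
  throw new Error("no candidate found");
}

// After a LAX -> LHR flight, the same 07:00 wake alarm resolves to a different instant.
const landed = new Date("2024-01-15T20:00:00Z");
const wakeInLA = nextLocalTime("America/Los_Angeles", 7, 0, landed);
const wakeInLondon = nextLocalTime("Europe/London", 7, 0, landed);
```

Storing the zone name rather than a fixed offset is what keeps the alarm at 07:00 local across a DST change: the offset is looked up for the target day instead of being baked in.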
A product that processes GDPR Article 9 (special-category health) data owes a complete answer to "delete me." The runbook enumerates every store that holds personal or derived-health data, observability included, and handles the Telegram cloud-chat residual honestly: disable, notify, and instruct.
flowchart TB
DEL[deletion request
in-app · Apple webhook] --> ORCH[deletion orchestrator
30-day ceiling]
ORCH --> S1[Supabase · FK cascade delete]
ORCH --> S2[DO SQLite · wake + purge tables]
ORCH --> S3[R2 · workspace + archives by prefix]
ORCH --> S4[AI Gateway logs
payload logging off on health routes]
ORCH --> S5[observability · hashed IDs
Langfuse / PostHog / Sentry]
ORCH --> S6[Telegram · disable + notice
+ user-side delete instructions]
classDef neu fill:#1b1813,stroke:#322c23,color:#c9bfae;
classDef core fill:#2a1d10,stroke:#f97316,color:#f3ece0;
classDef info fill:#10202e,stroke:#60a5fa,color:#f3ece0;
classDef mem fill:#10241c,stroke:#34d399,color:#f3ece0;
classDef ev fill:#1e1430,stroke:#c084fc,color:#f3ece0;
classDef risk fill:#2a1414,stroke:#f87171,color:#f3ece0;
class DEL neu; class ORCH core; class S1 info; class S2 core; class S3 mem; class S4,S5 ev; class S6 risk;
IN FLIGHT · Per-store latency commitments and the consent/DPIA wording are being finalized (ADR-0055, proposed) — the store enumeration itself is locked.