me · ochk · io
§ recovery · backup + disaster-recovery runbook

every store OC keeps · how it's backed up · how it's reconstructible.

Two layers of resilience: (1) Upstash Redis snapshots for fast restore; (2) reconstruction from public state (Nostr + OTS + federation ledger) when snapshots aren't enough. Most of what OC stores is metadata that's already replicated to public substrates by design — backups are belt-and-suspenders.

what gets backed up

Each store, where it lives, snapshot cadence, RPO target, and the public substrate that reconstructs it without a snapshot.

  • envelope ledger (events-store)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · Nostr-published billable-event records (kind-30078) + OTS calendar attestations · every envelope is content-addressed and OTS-anchored within 15 min

  • rebind envelopes (custody-state transitions)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · Nostr-published rebind envelopes + OTS proofs · each transition is independently verifiable

  • integrator projects + price configs
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · the integrator's published config at <domain>/.well-known/oc-config.json (canonical source · OC mirrors)

  • attest-tier records (sat-bond attestations)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · the user's signed bond attestation digest + OTS proof (anchored at upgrade time)

  • treasury ledger (commercial sat flow)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · federation gateway settlement log + Lightning gateway invoice records · OC is its own customer of the federation infra

  • session jti revocation list
    where · Supabase · oc-www
    cadence · managed by Supabase · daily snapshot retained 30d · rpo < 5 min
    reconstructible from · nothing (NOT reconstructible) · loss means existing sessions remain valid until natural expiry · revocations issued during the lost window must be re-applied · mitigated by short JWT lifetimes (≤ 24h)

  • account rows (identity_id, display_name, npub)
    where · Supabase · oc-www
    cadence · managed by Supabase · daily snapshot retained 30d · rpo < 5 min
    reconstructible from · the master identity itself is the user's BIP-322 address or did:email DID, both stable · loss of the row only loses display_name / npub, which the user can re-supply on next sign-in
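The per-store reconstruction paths above lean on envelopes being content-addressed. A minimal sketch of what that buys, assuming canonical-JSON serialization hashed with SHA-256 (the exact scheme and the `Envelope` shape are assumptions, not OC's documented format):

```typescript
// sketch only: content addressing makes re-ingestion idempotent — the same
// envelope always maps to the same key, whoever serialized it.
import { createHash } from "node:crypto";

interface Envelope {
  kind: number;      // e.g. 30078 for billable-event records
  createdAt: number; // unix seconds
  payload: string;
}

// sorting the key list makes serialization independent of insertion order,
// so two copies of the same envelope always hash to the same address
function contentAddress(env: Envelope): string {
  const canonical = JSON.stringify(env, Object.keys(env).sort());
  return createHash("sha256").update(canonical).digest("hex");
}
```

Because the address is derived from the content, replaying an envelope that is already in the store writes nothing new, which is what makes the gap-replay steps below safe to re-run.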

what nobody backs up · because nobody holds it

The most important consequence of the no-custody charter · the things that would require a serious backup posture (private keys, PII) aren't in scope because OC never collects them.

  • user private keys

    OC does not custody. Federation guardians hold key shares; the user holds the graduated BIP-322 key. OC has nothing to back up.

  • mnemonics / recovery phrases

    OC does not custody. Self-custody users hold their own; federation users delegate to the threshold quorum.

  • PII (real names, addresses, payment cards)

    OC does not collect. me.ochk.io is privacy-default; integrators define their own KYC if they need it (OC neither performs nor brokers).

recovery procedure · runbook

Step-by-step when an Upstash region fails or a snapshot has to be restored. Operative principle · writes are paused (503 maintenance) at the gateway until step 6.

  1. restore Upstash Redis snapshot

    point oc-me-web at the most-recent snapshot. Upstash supports point-in-time restores within the retention window.

  2. replay Nostr feed for the gap

    fetch every kind-30078 envelope between snapshot timestamp and now from the relay set. Each envelope is content-addressed; idempotent re-ingestion produces no duplicates.
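The idempotence claim in step 2 can be sketched as follows; the in-memory map stands in for Redis, and `id` stands in for the content address:

```typescript
// sketch: idempotent replay of the gap window between snapshot time and now.
interface Envelope {
  id: string;        // content address — stable across re-ingestion
  createdAt: number; // unix seconds
}

// returns how many envelopes were actually new; re-running it is a no-op
function replayGap(
  store: Map<string, Envelope>,
  feed: Envelope[],
  snapshotTs: number,
): number {
  let ingested = 0;
  for (const env of feed) {
    if (env.createdAt < snapshotTs) continue; // already inside the snapshot
    if (store.has(env.id)) continue;          // duplicate: content address hit
    store.set(env.id, env);
    ingested++;
  }
  return ingested;
}
```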

  3. verify OTS anchoring

    walk OTS proofs for envelopes in the gap window. Any envelope without an anchored proof is queued for /api/cron/anchor on the next run.
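The queueing logic in step 3 reduces to a filter over the gap window. A sketch, where the `anchored` flag is an assumed stand-in for a real OTS calendar verification round-trip:

```typescript
// sketch: collect envelope ids in the gap window whose OTS proof has not
// reached an anchored state — these get queued for the next
// /api/cron/anchor run rather than blocking the restore.
interface OtsRecord {
  envelopeId: string;
  createdAt: number; // unix seconds
  anchored: boolean; // assumed flag; real proofs need calendar verification
}

function anchorQueue(
  records: OtsRecord[],
  gapStart: number,
  gapEnd: number,
): string[] {
  return records
    .filter(r => r.createdAt >= gapStart && r.createdAt <= gapEnd && !r.anchored)
    .map(r => r.envelopeId);
}
```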

  4. reconcile against federation ledger

    fetch the federation gateway settlement log for the gap. Cross-check that every envelope OC re-ingested has a matching federation transaction (or vice versa). Diffs land on /audit as divergences for manual review.
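Step 4's cross-check is a two-way set difference. A sketch, assuming envelope ids and federation transaction ids can be matched 1:1 (an assumption; the real join key isn't specified here):

```typescript
// sketch: two-way reconciliation between re-ingested envelopes and the
// federation settlement log. anything in either column alone becomes a
// divergence for manual review on /audit.
function reconcile(
  envelopeIds: Set<string>,
  settlementIds: Set<string>,
): { missingInFederation: string[]; missingInOc: string[] } {
  return {
    // OC has it, federation does not
    missingInFederation: [...envelopeIds].filter(id => !settlementIds.has(id)),
    // federation has it, OC does not
    missingInOc: [...settlementIds].filter(id => !envelopeIds.has(id)),
  };
}
```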

  5. spot-check per-user balances

    pick 50 random users, recompute balance from envelope replay, compare against /api/me/balance output. Mismatches block the restore from being declared successful.
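Step 5's recomputation can be sketched as a fold over the replayed envelopes. The `deltaSats` field and the sampled-user shape are assumptions for illustration:

```typescript
// sketch: recompute balances from envelope replay and compare against the
// figures /api/me/balance reported. any mismatch blocks the restore.
interface BalanceEnvelope {
  userId: string;
  deltaSats: number; // assumed signed credit/debit per envelope
}

function recomputeBalance(envelopes: BalanceEnvelope[], userId: string): number {
  return envelopes
    .filter(e => e.userId === userId)
    .reduce((sum, e) => sum + e.deltaSats, 0);
}

// returns the sampled users whose recomputed balance disagrees with the API
function spotCheck(
  envelopes: BalanceEnvelope[],
  reported: Map<string, number>,
  sample: string[],
): string[] {
  return sample.filter(u => recomputeBalance(envelopes, u) !== (reported.get(u) ?? 0));
}
```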

  6. unfreeze writes

    flip /api/integrator/event back from "503 maintenance" to live. Webhooks resume firing; the retry queue drains.
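The freeze/unfreeze toggle from the runbook's operative principle, as a minimal gate in front of the event endpoint; the module-level flag stands in for whatever shared config or Redis key actually backs it:

```typescript
// sketch: the write-freeze gate. while the restore runs (steps 1–5),
// every write gets a 503 and integrator webhooks retry against it;
// unfreeze() is step 6.
let maintenance = true;

function handleEvent(): { status: number; body: string } {
  if (maintenance) {
    return { status: 503, body: "maintenance: restore in progress" };
  }
  return { status: 202, body: "accepted" };
}

function unfreeze(): void {
  maintenance = false; // webhooks resume firing; the retry queue drains
}
```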

drill cadence

An untested runbook is a runbook that won't work when you need it. The commitment:

  • monthly · partial restore against a staging Redis instance · validates the runbook stays in sync with current schema
  • quarterly · full end-to-end drill on staging · verifies a fresh-from-empty Redis can be reconstructed by replaying Nostr + OTS + federation ledger alone, no production snapshot
  • annually · publish a transparency report summarizing drill results · what passed, what didn't, what the runbook had to be updated to handle

rpo / rto targets

  • rpo · recovery point objective · ≤ 15 min
    worst-case data loss in a recovery scenario · 15 min matches the OTS anchor cron interval, so anything written within the window is still reconstructible from Nostr + OTS even if the snapshot is slightly older
  • rto · recovery time objective · ≤ 60 min
    target end-to-end runbook execution time · steps 1–5, from snapshot restore through balance spot-check, before unfreezing writes

related

  • /security · what OC actually stores · the input side of the recovery question
  • /audit · public divergence tracker · catches recovery drift in real time
  • /status · liveness probes · current health snapshot
  • /economics · sat flow transparency · the ledger that survives any single OC store failure