me · ochk · io
§ recovery · backup + disaster-recovery runbook

every store OC keeps · how it's backed up · how it's reconstructible.

Two layers of resilience: (1) Upstash Redis snapshots for fast restore; (2) reconstruction from public state (Nostr + OTS + federation ledger) when snapshots aren't enough. Most of what OC stores is metadata that's already replicated to public substrates by design — backups are belt-and-suspenders.

what gets backed up

Each store, where it lives, snapshot cadence, RPO target, and the public substrate that reconstructs it without a snapshot.

  • envelope ledger (events-store)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · Nostr-published billable-event records (kind-30078) + OTS calendar attestations · every envelope is content-addressed and OTS-anchored within 15 min

  • rebind envelopes (custody-state transitions)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · Nostr-published rebind envelopes + OTS proofs · each transition is independently verifiable

  • integrator projects + price configs
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · the integrator's published config at <domain>/.well-known/oc-config.json (canonical source · OC mirrors)

  • attest-tier records (sat-bond attestations)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · the user's signed bond attestation digest + OTS proof (anchored at upgrade time)

  • treasury ledger (commercial sat flow)
    where · Upstash Redis · oc-me-web
    cadence · continuous replication + daily snapshot · rpo < 15 min
    reconstructible from · federation gateway settlement log + Lightning gateway invoice records · OC is its own customer of the federation infra

  • session jti revocation list
    where · Supabase · oc-www
    cadence · managed by Supabase · daily snapshot retained 30d · rpo < 5 min
    reconstructible from · nothing (NOT reconstructible) · loss means existing sessions remain valid until natural expiry · revocations issued during the lost window must be re-applied · mitigated by short JWT lifetimes (≤ 24h)

  • account rows (identity_id, display_name, npub)
    where · Supabase · oc-www
    cadence · managed by Supabase · daily snapshot retained 30d · rpo < 5 min
    reconstructible from · the master identity itself is the user's BIP-322 address or did:email DID, both stable · loss of the row only loses display_name / npub, which the user can re-supply on next sign-in
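The per-store reconstruction paths above lean on envelopes being content-addressed. A minimal sketch of what that buys, assuming canonical-JSON serialization hashed with SHA-256 (the exact scheme and the `Envelope` shape are assumptions, not OC's documented format):

```typescript
// sketch only: content addressing makes re-ingestion idempotent — the same
// envelope always maps to the same key, whoever serialized it.
import { createHash } from "node:crypto";

interface Envelope {
  kind: number;      // e.g. 30078 for billable-event records
  createdAt: number; // unix seconds
  payload: string;
}

// sorting the key list makes serialization independent of insertion order,
// so two copies of the same envelope always hash to the same address
function contentAddress(env: Envelope): string {
  const canonical = JSON.stringify(env, Object.keys(env).sort());
  return createHash("sha256").update(canonical).digest("hex");
}
```

Because the address is derived from the content, replaying an envelope that is already in the store writes nothing new, which is what makes the gap-replay steps below safe to re-run.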

what nobody backs up · because nobody holds it

The most important consequence of the no-custody charter · the things that would require a serious backup posture (private keys, PII) aren't in scope because OC never collects them.

  • user private keys

    OC does not custody. Federation guardians hold key shares; the user holds the graduated BIP-322 key. OC has nothing to back up.

  • mnemonics / recovery phrases

    OC does not custody. Self-custody users hold their own; federation users delegate to the threshold quorum.

  • PII (real names, addresses, payment cards)

    OC does not collect. me.ochk.io is privacy-default; integrators define their own KYC if they need it (OC neither performs nor brokers).

recovery procedure · runbook

Step-by-step when an Upstash region fails or a snapshot has to be restored. Operative principle · writes are paused (503 maintenance) at the gateway until step 6.

  1. restore Upstash Redis snapshot

    point oc-me-web at the most-recent snapshot. Upstash supports point-in-time restores within the retention window.

  2. replay Nostr feed for the gap

    fetch every kind-30078 envelope between snapshot timestamp and now from the relay set. Each envelope is content-addressed; idempotent re-ingestion produces no duplicates.
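The idempotence claim in step 2 can be sketched as follows; the in-memory map stands in for Redis, and `id` stands in for the content address:

```typescript
// sketch: idempotent replay of the gap window between snapshot time and now.
interface Envelope {
  id: string;        // content address — stable across re-ingestion
  createdAt: number; // unix seconds
}

// returns how many envelopes were actually new; re-running it is a no-op
function replayGap(
  store: Map<string, Envelope>,
  feed: Envelope[],
  snapshotTs: number,
): number {
  let ingested = 0;
  for (const env of feed) {
    if (env.createdAt < snapshotTs) continue; // already inside the snapshot
    if (store.has(env.id)) continue;          // duplicate: content address hit
    store.set(env.id, env);
    ingested++;
  }
  return ingested;
}
```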

  3. verify OTS anchoring

    walk OTS proofs for envelopes in the gap window. Any envelope without an anchored proof is queued for /api/cron/anchor on the next run.
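The queueing logic in step 3 reduces to a filter over the gap window. A sketch, where the `anchored` flag is an assumed stand-in for a real OTS calendar verification round-trip:

```typescript
// sketch: collect envelope ids in the gap window whose OTS proof has not
// reached an anchored state — these get queued for the next
// /api/cron/anchor run rather than blocking the restore.
interface OtsRecord {
  envelopeId: string;
  createdAt: number; // unix seconds
  anchored: boolean; // assumed flag; real proofs need calendar verification
}

function anchorQueue(
  records: OtsRecord[],
  gapStart: number,
  gapEnd: number,
): string[] {
  return records
    .filter(r => r.createdAt >= gapStart && r.createdAt <= gapEnd && !r.anchored)
    .map(r => r.envelopeId);
}
```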

  4. reconcile against federation ledger

    fetch the federation gateway settlement log for the gap. Cross-check that every envelope OC re-ingested has a matching federation transaction (or vice versa). Diffs land on /audit as divergences for manual review.
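Step 4's cross-check is a two-way set difference. A sketch, assuming envelope ids and federation transaction ids can be matched 1:1 (an assumption; the real join key isn't specified here):

```typescript
// sketch: two-way reconciliation between re-ingested envelopes and the
// federation settlement log. anything in either column alone becomes a
// divergence for manual review on /audit.
function reconcile(
  envelopeIds: Set<string>,
  settlementIds: Set<string>,
): { missingInFederation: string[]; missingInOc: string[] } {
  return {
    // OC has it, federation does not
    missingInFederation: [...envelopeIds].filter(id => !settlementIds.has(id)),
    // federation has it, OC does not
    missingInOc: [...settlementIds].filter(id => !envelopeIds.has(id)),
  };
}
```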

  5. spot-check per-user balances

    pick 50 random users, recompute balance from envelope replay, compare against /api/me/balance output. Mismatches block the restore from being declared successful.
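Step 5's recomputation can be sketched as a fold over the replayed envelopes. The `deltaSats` field and the sampled-user shape are assumptions for illustration:

```typescript
// sketch: recompute balances from envelope replay and compare against the
// figures /api/me/balance reported. any mismatch blocks the restore.
interface BalanceEnvelope {
  userId: string;
  deltaSats: number; // assumed signed credit/debit per envelope
}

function recomputeBalance(envelopes: BalanceEnvelope[], userId: string): number {
  return envelopes
    .filter(e => e.userId === userId)
    .reduce((sum, e) => sum + e.deltaSats, 0);
}

// returns the sampled users whose recomputed balance disagrees with the API
function spotCheck(
  envelopes: BalanceEnvelope[],
  reported: Map<string, number>,
  sample: string[],
): string[] {
  return sample.filter(u => recomputeBalance(envelopes, u) !== (reported.get(u) ?? 0));
}
```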

  6. unfreeze writes

    flip /api/integrator/event back from "503 maintenance" to live. Webhooks resume firing; the retry queue drains.
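The freeze/unfreeze toggle from the runbook's operative principle, as a minimal gate in front of the event endpoint; the module-level flag stands in for whatever shared config or Redis key actually backs it:

```typescript
// sketch: the write-freeze gate. while the restore runs (steps 1–5),
// every write gets a 503 and integrator webhooks retry against it;
// unfreeze() is step 6.
let maintenance = true;

function handleEvent(): { status: number; body: string } {
  if (maintenance) {
    return { status: 503, body: "maintenance: restore in progress" };
  }
  return { status: 202, body: "accepted" };
}

function unfreeze(): void {
  maintenance = false; // webhooks resume firing; the retry queue drains
}
```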

drill cadence

An untested runbook is a runbook that won't work when you need it. The commitment:

  • monthly · partial restore against a staging Redis instance · validates the runbook stays in sync with current schema
  • quarterly · full end-to-end drill on staging · verifies a fresh-from-empty Redis can be reconstructed by replaying Nostr + OTS + federation ledger alone, no production snapshot
  • annually · publish a transparency report summarizing drill results · what passed, what didn't, what the runbook had to be updated to handle

rpo / rto targets

  • rpo · recovery point objective · ≤ 15 min
    worst-case data loss in a recovery scenario · 15 min matches the OTS anchor cron interval, so anything written within the window is still reconstructible from Nostr + OTS even if the snapshot is slightly older
  • rto · recovery time objective · ≤ 60 min
    target end-to-end runbook execution time · steps 1–5, from snapshot restore through balance spot-check, before unfreezing writes

related

  • /security · what OC actually stores · the input side of the recovery question
  • /audit · public divergence tracker · catches recovery drift in real time
  • /status · liveness probes · current health snapshot
  • /economics · sat flow transparency · the ledger that survives any single OC store failure