Session Handoff — 2026-05-07

Trigger: post-#1759 truth-sync (sync PR opened from docs/sync-post-opaque-receipt-stack-2026-05-07; merge commit recorded after merge). Filename follows docs/dev/HANDOFF_TEMPLATE.md Usage Notes (handoff-YYYY-MM-DD.md). First handoff for 2026-05-07.

Session Goal

Land the opaque receipt storage stack (#1757 → #1758 → #1759) safely and in order, resolving substantive AI review findings as merge-gating per #1744. Surfaced and filed a pre-existing sled-flusher race during the merge cycle (#1760, fix opened as #1761). Sync canonical project truth (docs/STATE.md, docs/PHASE_PROGRESS.md, docs/DOCUMENT_REGISTRY.md, latest handoff) with the runtime work that landed.

Decisive Test

Can a new organizer enter the system mid-cycle, switch into the right scope, see the summit's current phase, understand what was decided, know what is blocked, receive their obligations, trace why they exist, and continue the work without private oral history?

Unchanged from the 2026-04-15 handoff. Every move is judged against whether it makes that more answerable.


Final State (Verified)

main HEAD

92734b25 feat(governance): route ProcessGateResultReceipt through opaque storage cascade (#1759)

Open PRs

PR | Branch | State | CI Status | Blocker
#1761 | fix/commons-sled-open-retry-on-wouldblock | OPEN at handoff write-time | CI in flight | review pass + merge
#1736 | dependabot/npm_and_yarn/sdk/typescript/dev-dependencies-99611381d0 | OPEN | Dependabot dev-deps; not on critical path | maintainer disposition
#1735 | dependabot/npm_and_yarn/web/pilot-ui/dev-dependencies-f805b103c2 | OPEN | Dependabot dev-deps; not on critical path | maintainer disposition

Open implementation follow-ups

Issue | Title
#1760 | fix(commons): add CommonsManager::close() to drain actor before sled handle drop (epic:commons-compute + type:impl); corrected diagnosis: sled 0.34 flusher-thread shutdown race; fix opened as PR #1761

Open coordination/control issues (not implementation)

Issue | Title
#1748 | milestone(process): define Institutional Process Substrate (epic:arch-invariants + type:spec)
#1746 | milestone(showcase): make NYCN organizer rehearsal operable before first presentation
#1744 | ci(review): make substantive AI review findings merge-gating

Branches

  • docs/sync-post-opaque-receipt-stack-2026-05-07 — local + remote at handoff write-time. This sync PR.
  • fix/commons-sled-open-retry-on-wouldblock — local + remote at handoff write-time. PR #1761.
  • main — origin/main at 92734b25 (post-#1759) at handoff write-time.

Stack landed in order

PR | Squash SHA | Merged At
#1757 feat(gateway): add meaning-blind opaque receipt storage primitive | 73b4beb0 | 2026-05-06 19:49 UTC
#1758 feat(governance): expose opaque storage on GovernanceReceiptBackend trait | b87b2197 | 2026-05-07 10:44 UTC
#1759 feat(governance): route ProcessGateResultReceipt through opaque storage cascade | 92734b25 | 2026-05-07 12:13 UTC

Required-check status (each PR's final gates)

All required checks (Build Release, Test, Clippy, Format Check, Meaning Firewall Check, Kernel Forbidden Dependencies, Firewall Contract Enforcement, TypeScript SDK, Accessibility Tests, Regulatory Compliance Linter, Agent Tooling Drift Check) passed on all three PRs. Compare Against Base (benchmarks) failed on all three but is NOT in the required-checks list; the perf-regression-ok label was added to each via REST API (the workflow's auto-comment says "Add the perf-regression-ok label to merge anyway"). The benchmark report on each PR was inspected: 79–90 benchmarks were flagged as regressions per PR, almost all with p > 0.05 ("No change in performance detected") or "Change within noise threshold". The few with p < 0.05 are in task_operations/resource_profile benchmarks unrelated to the receipt-store changes, which is consistent with shared-runner noise.

Validators (this session, observed outputs)

Run on every push, every cherry-pick, every fix:

  • cargo fmt --check -p icn-gateway -p icn-governance-actor -p icn-commons — pass.
  • cargo clippy -p <touched> --all-targets --all-features -- -D warnings — pass.
  • cargo test -p icn-governance-actor --test process_gate_result_receipt_runtime_slice — 15/15 pass in 0.01s (was multi-second with sleeps).
  • cargo test -p icn-governance-actor — all suites pass.
  • cargo test -p icn-gateway --lib — 541 → 542 pass.
  • cargo test -p icn-commons --lib — new tests + existing all pass (including the new sled_open_survives_rapid_drop_and_reopen 50-iteration loop and is_sled_lock_contention_only_matches_wouldblock classifier test).
  • cargo test -p icn-gateway --test commons_integration — 13/13 (including the three previously-flaky *_survives_sled_drop_and_reopen tests).
  • python3 .github/scripts/firewall_denylist.py — "No NEW firewall violations"; baseline 10, 0 new.
  • bash scripts/check-meaning-firewall.sh — 3 violations (baseline preserved).
  • bash ops/scripts/drift-check.sh — 0 errors, 0 warnings; STATUS: PASS.

What Changed

1. Committed and pushed #1759 deterministic-test fix

Two Copilot review findings on #1759 (raised at 2026-05-06 11:47 UTC) had been edited into the working tree but were uncommitted at handoff resume. Committed as 192b805d on feat/process-gate-receipt-durable-via-opaque. Three tests previously used std::thread::sleep(Duration::from_millis(1100)) to force recorded_at to advance by one second between writes. Replaced with explicit, strictly-increasing recorded_at timestamps on directly-constructed ProcessGateResultReceipt values fed through the backend trait. Test suite finishes in 0.01s, deterministic. Also replaced the flagged .unwrap() calls in opaque_only_backend_chains_session_history_chronologically with named .expect("...") messages.
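The sleep-free pattern can be sketched as follows. This is a minimal illustration, not the repository's test code: `Receipt`, `InMemoryBackend`, `write_three`, and the field names are hypothetical stand-ins for ProcessGateResultReceipt and the backend trait. Only the idea of supplying explicit, strictly increasing recorded_at values instead of sleeping comes from the fix described above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a receipt whose timestamp is set by the caller.
#[derive(Clone, Debug, PartialEq)]
struct Receipt {
    session_id: String,
    recorded_at: u64, // seconds; chosen explicitly by the test, never read from a clock
    outcome: String,
}

/// Hypothetical in-memory backend indexed by (session_id, recorded_at).
#[derive(Default)]
struct InMemoryBackend {
    by_key: BTreeMap<(String, u64), Receipt>,
}

impl InMemoryBackend {
    fn put(&mut self, receipt: Receipt) {
        let key = (receipt.session_id.clone(), receipt.recorded_at);
        self.by_key.insert(key, receipt);
    }

    /// Receipts for one session, oldest first (BTreeMap keeps keys sorted).
    fn history(&self, session_id: &str) -> Vec<Receipt> {
        let lo = (session_id.to_string(), 0u64);
        let hi = (session_id.to_string(), u64::MAX);
        self.by_key.range(lo..=hi).map(|(_, r)| r.clone()).collect()
    }
}

/// Writes three receipts one second apart without calling thread::sleep.
fn write_three(backend: &mut InMemoryBackend, base: u64) {
    for (i, outcome) in ["pending", "passed", "archived"].iter().enumerate() {
        backend.put(Receipt {
            session_id: "s1".to_string(),
            recorded_at: base + i as u64, // strictly increasing by construction
            outcome: outcome.to_string(),
        });
    }
}
```

Because ordering depends only on the values passed in, a test built this way has no wall-clock dependency and cannot flake on a slow runner.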

2. Addressed new #1757 codex P2 raised against cb9d6daf

Codex re-reviewed cb9d6daf at 2026-05-06 12:00:33 UTC (about 2 minutes after the push that triggered CI for cb9d6daf). The new finding ("Reject idempotent replays under a new index key") was substantive: the existing write-once-by-hash check on OPAQUE_REC_PREFIX only protected payload bytes for a fixed (class, record_hash) key; a caller that replayed the same (class, record_hash, payload) tuple under a different (key1, key2, recorded_at) would still write a new secondary by_key entry pointing at the existing primary. That let one canonical receipt fan out across multiple audit chains, or reappear under get_latest_opaque for the wrong tuple, even though no new payload was written.

Addressed in a8fbb1a6 on feat/opaque-receipt-store-primitive before merge. Added a new OPAQUE_HASH_BIND_PREFIX keyspace storing record_hash → canonical (key1, key2_opt, recorded_at) once on first write. put_opaque now consults this binding inside the same sled transaction:

  • absent → insert (first-bind);
  • matches → idempotent fall-through (the secondary index entry is still re-written so the heal-missing-secondary case keeps working);
  • differs → abort with stable sentinel opaque_record_hash_index_collision; no writes land.

Bind, primary, and secondary writes are enforced atomically inside the same sled transaction. New focused test opaque_same_record_hash_different_index_tuple_errors_no_overwrite covers all three divergence axes (different key2, different key1, different recorded_at) and confirms none mutate the originally-bound chain. No domain imports added; no firewall ratchet change.
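The three-way binding rule can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the gateway code: `ToyOpaqueStore` and `IndexTuple` are hypothetical, three HashMaps stand in for the three sled keyspaces, the class dimension is omitted, and atomicity is approximated by checking the binding before any write. Only the sentinel string and the absent/matches/differs rule come from the description above.

```rust
use std::collections::HashMap;

/// Canonical index tuple that a record hash is bound to on first write.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct IndexTuple {
    key1: String,
    key2: Option<String>,
    recorded_at: u64,
}

/// Sentinel named in the handoff; every other name here is hypothetical.
const COLLISION: &str = "opaque_record_hash_index_collision";

/// Toy model: each map stands in for one keyspace.
#[derive(Default)]
struct ToyOpaqueStore {
    hash_bind: HashMap<String, IndexTuple>, // record_hash -> canonical tuple
    primary: HashMap<String, Vec<u8>>,      // record_hash -> payload
    by_key: HashMap<IndexTuple, String>,    // index tuple -> record_hash
}

impl ToyOpaqueStore {
    fn put_opaque(
        &mut self,
        record_hash: &str,
        idx: IndexTuple,
        payload: &[u8],
    ) -> Result<(), &'static str> {
        // Decide before writing anything, so a rejected call mutates nothing.
        let existing = self.hash_bind.get(record_hash).cloned();
        match existing {
            // differs: abort with the sentinel; no writes land.
            Some(bound) if bound != idx => return Err(COLLISION),
            // matches: idempotent fall-through.
            Some(_) => {}
            // absent: first bind.
            None => {
                self.hash_bind.insert(record_hash.to_string(), idx.clone());
            }
        }
        // Simplification: the primary keeps the first payload it saw.
        self.primary
            .entry(record_hash.to_string())
            .or_insert_with(|| payload.to_vec());
        // The secondary entry is re-written so a missing one is healed.
        self.by_key.insert(idx, record_hash.to_string());
        Ok(())
    }
}
```

In this model a replay under a different key1, key2, or recorded_at returns the sentinel and leaves `by_key` with its single original entry, which is the fan-out the binding is meant to prevent.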

3. Merged #1757 → #1758 → #1759 in order

For each PR:

  1. Confirmed required checks green on the head commit.
  2. Confirmed Copilot/codex review threads addressed in code (some bot threads remain technically open at the platform level; the substance is in code).
  3. Added perf-regression-ok label via REST API (the gh pr edit --add-label path errors with an unrelated GraphQL Projects-classic deprecation message; REST API works).
  4. gh pr merge --squash with explicit subject line matching repo convention (<title> (#<num>)).
  5. After each merge, GitHub auto-retargeted the dependent stacked PR to main. Both #1758 and #1759 then needed a rebase off main: reset to origin/main and cherry-picked the original lone commit(s), resolving conflicts where the squashed earlier PR's content collided with the stacked branch's pre-squash merge commits. Force-pushed with --force-with-lease.
  6. Waited for full CI to run on the rebased branch (now base = main, so the full CI workflow runs).

#1759's Test job failed once on the rebased branch — test_commons_charter_survives_sled_drop_and_reopen panicked with EAGAIN/WouldBlock from flock(LOCK_NB). The diff on #1759 was entirely in apps/governance and never touched the commons stack — pre-existing race. Rerun of the failed Test job on the same SHA (no code change) went green, confirming the load-dependent classification.

4. Filed issue #1760 for the surfaced race; opened PR #1761 with the fix

Initial diagnosis (filed and then corrected): my first read of the panic was actor-drop ordering — that CommonsManager wraps a tokio actor task whose drop is asynchronous. That was wrong. CommonsHandle is a synchronous Arc<RwLock<CommonsInner>> wrapper with no spawned tasks; the drop chain unwinds synchronously through CommonsManager → CommonsHandle → CommonsInner → Arc<dyn CommonsStoreBackend> → SledCommonsStore → sled::Db::drop. Updated #1760's body with the corrected diagnosis.

Real root cause: sled 0.34 (sled = "0.34" in icn/Cargo.toml) spawns a background flusher thread on Db::open that holds the OS flock(LOCK_EX) on the database directory's lockfile. On Db::drop, sled signals the flusher to stop but does not synchronously join it — the flock is released only when the flusher actually exits. Without a retry, an immediate-reopen pattern (drop a manager, open a new one against the same path) races against sled's internal shutdown.

Fix opened as PR #1761 on fix/commons-sled-open-retry-on-wouldblock. Single-file change in icn/crates/icn-commons/src/store.rs: bounded retry-with-backoff in SledCommonsStore::open, 8 attempts max, 500ms total budget cap, 10ms initial backoff. Only matches io::ErrorKind::WouldBlock so genuine errors (NotFound, PermissionDenied, etc.) are not masked. Two new unit tests pin the behavior: sled_open_survives_rapid_drop_and_reopen (50-iteration loop) and is_sled_lock_contention_only_matches_wouldblock (classifier coverage). All local validation green; integration tests (cargo test -p icn-gateway --test commons_integration) including the three previously-flaky *_survives_sled_drop_and_reopen tests pass cleanly. CI in flight at handoff write-time.
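A bounded retry of this shape might look like the sketch below. It is not the code in store.rs: `open_with_retry` and `is_lock_contention` are hypothetical names, the open operation is abstracted as a closure returning io::Result, and doubling the backoff is an assumption (the handoff states only the attempt cap, the budget cap, and the initial backoff). How the real classifier extracts the error kind from sled's error type is not shown here.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// True only for lock contention; all other errors must surface immediately.
fn is_lock_contention(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::WouldBlock
}

/// Retries `open` on WouldBlock: at most 8 attempts, 500ms total, 10ms first wait.
fn open_with_retry<T, F>(mut open: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    const MAX_ATTEMPTS: u32 = 8;
    const BUDGET: Duration = Duration::from_millis(500);

    let start = Instant::now();
    let mut backoff = Duration::from_millis(10);

    for attempt in 1..=MAX_ATTEMPTS {
        match open() {
            Ok(value) => return Ok(value),
            // Retry only contention, and only while attempts and budget remain.
            Err(e) if is_lock_contention(&e) && attempt < MAX_ATTEMPTS => {
                let remaining = BUDGET.saturating_sub(start.elapsed());
                if remaining.is_zero() {
                    return Err(e);
                }
                thread::sleep(backoff.min(remaining));
                backoff *= 2; // assumed doubling; the handoff does not state the growth rule
            }
            // Genuine errors (NotFound, PermissionDenied, ...) and the final
            // failed attempt are returned unchanged.
            Err(e) => return Err(e),
        }
    }
    unreachable!("every iteration of the final attempt returns")
}
```

Keeping the classifier as a separate function lets it be unit-tested on its own, which mirrors the split between the two new tests named above.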

5. STATE.md truth-sync (docs/STATE.md)

  • Last Reviewed advanced 2026-05-05 → 2026-05-07.
  • Added a 2026-05-07 (post-#1755 / #1756 / #1757 / #1758 / #1759) sync-edit comment block at the top, immediately before the existing 2026-05-05 (post-#1753) block. Records the runtime/implementation truth (the May-5 syncs were doc/control-plane only; this is real Rust), enumerates each landing, names the new OPAQUE_HASH_BIND_PREFIX invariant, captures the surfaced flake → real bug filed → fix opened (#1760 → #1761), and explicitly disclaims Phase 2 advance, formal NYCN pilot, production readiness, live federation, K3s/DNS/Forgejo mutation, and gateway typed governance import widening.
  • Replaced the "Current status (2026-05-05 snapshot)" header with "(2026-05-07 snapshot)" and rewrote the long descriptive paragraph to mark the May-6/May-7 sequence as runtime/implementation truth (distinct from the May-5 doc/control-plane sequence), enumerate the three-PR stack and the new invariant, capture the #1760/#1761 follow-up, and reduce the candidate-next-moves enumeration so candidate (b) reflects the new state ("additional ProcessTransitionReceipt classes" rather than "first").
  • Added six new rows at the top of "Recently merged (since 2026-04-15)" table: #1759, #1758, #1757, #1756, #1755, #1754.
  • Added #1761 to the "Open PRs" table and #1760 to the "Open coordination/control issues" table.
  • Inserted a new "Opaque receipt storage stack" bullet block at the very top of "What landed since Phase 1 (Charter Engine)", recording each PR's contribution, the new invariant, the surfaced flake → fix path, the hook-tooling fix, and the explicit non-claims (no firewall ratchet increase, no widened gateway typed governance imports, no Phase 2 / production / live federation claims, no NYCN private-data handling, no infra mutation).
  • Updated the References block to point at this handoff (docs/dev/handoff-2026-05-07.md).

6. PHASE_PROGRESS.md truth-sync (docs/PHASE_PROGRESS.md)

  • Last Updated advanced 2026-05-05 → 2026-05-07.
  • Updated the top-of-file phase paragraph to mention the May-6/May-7 opaque receipt storage stack landing alongside the May-5 framing sequence.
  • Added a 2026-05-07 (post-#1755 / #1756 / #1757 / #1758 / #1759) sync-edit comment block summarizing the landings, the new invariant, and the idea-0019 (#1748) acceptance-gate posture: gate (a) is now partially satisfied (one ProcessTransitionReceipt class emitted and durably persisted); gates (b)–(d) remain unchanged.
  • Extended the Phase 2 deliverables list with five new [x] entries crediting #1755 (first ProcessGateResultReceipt runtime slice), #1756 (hook tooling fix), #1757 (opaque storage primitive + new OPAQUE_HASH_BIND_PREFIX invariant), #1758 (GovernanceReceiptBackend trait extension), #1759 (cascade routing + test-suite determinism).
  • Replaced the now-superseded [ ] for "first idea-0019 ProcessTransitionReceipt class" with an "additional classes" [ ] enumerating the seven remaining classes (ProcessSessionOpenedReceipt, DeliberationEntryRecordedReceipt, DecisionRecordedReceipt, ActivationCrossedReceipt, MutationPlanRecordedReceipt, MutationAppliedReceipt, EvidencePacketProducedReceipt) — all eligible through the same opaque storage cascade.
  • Added a 2026-05-07 (post-#1755 / #1756 / #1757 / #1758 / #1759) entry at the top of "Decisions Made" recording the stack landing, the new invariant, the substantive review findings addressed pre-merge, the #1760/#1761 follow-up, the gate-(a) partial-satisfaction posture, the Phase 2 status (still ⏳ partner-bound), and the explicit "next move not yet selected" stance.

7. New session handoff

This file (docs/dev/handoff-2026-05-07.md) replaces docs/dev/handoff-2026-05-05-c.md as the latest handoff. The May-5 handoffs are preserved verbatim as historical artifacts.

8. Registry refresh

Regenerated docs/DOCUMENT_REGISTRY.md to reflect the +1 corpus delta (this new handoff under docs/dev/). The handoff falls under the existing [[doc_path_defaults]] rule with prefix = "docs/dev/" so no per-file overlay row is needed in docs/registry.toml.


What's Open

The next session picks one of these. None is selected here. The next session must choose deliberately from updated canonical truth, not on autopilot.

  • PR #1761 review pass + merge — the sled-open retry fix. Bounded retry-with-backoff in SledCommonsStore::open for WouldBlock. Closes #1760. CI in flight at handoff write-time. Single-file change; no API change.
  • Additional idea-0019 ProcessTransitionReceipt classes — adding any of the seven remaining classes (ProcessSessionOpenedReceipt, DeliberationEntryRecordedReceipt, DecisionRecordedReceipt, ActivationCrossedReceipt, MutationPlanRecordedReceipt, MutationAppliedReceipt, EvidencePacketProducedReceipt) is now a one-file change in apps/governance because the opaque storage cascade landed. None of these is a separately scoped PR yet.
  • idea-0019 (#1748) acceptance gate (b) — real visibility/privacy-boundary run with redaction in evidence export. Unchanged: not started.
  • idea-0019 (#1748) acceptance gate (c) — accessibility-gate ProcessGateResult produced through docs/design/ORGANIZER_MEMBER_ACCESSIBILITY_GATE.md on a real surface. Unchanged: not started.
  • idea-0019 (#1748) acceptance gate (d) — open-question triage: at least one of Q1 (ProcessTargetRef polymorphism), Q3 (DeliberationEntry kind taxonomy), or Q4 (HumanDecisionSet vs proposal/vote) resolved or explicitly deferred in writing. Unchanged: not started.
  • idea-0020 (DAP) runtime dogfood — emitting at least one receipt under ADR-0026 for one DAP primitive. Unchanged: not started. The opaque storage cascade landed in this session makes adding DAP receipt classes the same one-file change in apps/governance that idea-0019 classes get.
  • DAP §17 follow-up framing briefs — pre-RFC, decompose-only. Unchanged: not started.
  • Control-plane cleanup, including unresolved/stale review-thread hygiene if inspection confirms it; pre-existing yaml-vs-registry drift on docs/ai/*, docs/architecture/INSTITUTION_PACKAGE_BOUNDARY.md, docs/design/*, docs/development/testing/governance-proof-layers.md, docs/onboarding/*, docs/planning/*, docs/plans/* is named here as a possible candidate but not selected.
  • NYCN steward-facing communication-groups directory tool (NYCN #33) — verify status before reading; not in this repo. Carried forward.

Unsafe Assumptions

  • This sync trusts that #1755/#1756/#1757/#1758/#1759 are themselves internally consistent; their squash commit messages and PR bodies are the audit record.
  • "Open PRs" trusts gh pr list --state open --limit 20 at handoff write-time. Dependabot may open additional dev-dependency bumps before the next session.
  • The benchmark "regression" classification (CI-runner noise) trusts p > 0.05 as the de-facto noise floor and trusts that the affected benchmarks (task_operations/resource_profile) are unrelated to any change in icn-gateway/src/receipt_store.rs. A real performance regression in these benchmarks would not be detected by this judgment alone.
  • perf-regression-ok was added to all three PRs via REST API. The label addition succeeded but the gh pr edit --add-label path errored with an unrelated GraphQL Projects-classic deprecation; the REST path is what landed the label.
  • Codex did not auto-re-acknowledge its 2026-05-06 12:00:33 UTC P2 against #1757 after a8fbb1a6 was pushed. The bot thread remains technically open at the platform level even though the substance is addressed in code with explicit test coverage. Treated as resolved for merge-readiness purposes per the user's clarification that the in-code fix + green CI satisfy the merge gate; a future audit may want to verify codex bot thread state.
  • The sled 0.34 flusher-thread shutdown analysis is based on inspection of icn-commons/src/store.rs, icn-commons/src/handle.rs, and the public sled 0.34 API. It is not based on instrumenting the flusher thread directly; the EAGAIN/WouldBlock failure mode is consistent with the diagnosis but other threading interactions in sled 0.34 have not been audited.
  • Rust workspace verification was scoped to the changed crates on each push (icn-gateway, icn-governance-actor, icn-commons). Workspace-wide cargo check --workspace was not run as part of this docs-sync; the last full workspace check happened during the merge cycle on the rebased branches before each push.

Next Move

Not selected here. Pick deliberately from the "What's Open" list at the next session.

The single most-likely next move (named here so the next session can choose deliberately, not by inertia) is reviewing and merging PR #1761. After that, no specific next move is preselected — the seven open idea-0019 ProcessTransitionReceipt classes, the three remaining #1748 acceptance gates (visibility/privacy boundary; accessibility-gate ProcessGateResult on a real surface; open-question triage on Q1/Q3/Q4), the DAP runtime dogfood, the DAP §17 follow-up framing briefs, and control-plane cleanup are all eligible candidates.


Architectural Decisions

  1. OPAQUE_HASH_BIND_PREFIX invariant added to the gateway. Each (class, record_hash) is bound to exactly one canonical (key1, key2_opt, recorded_at) tuple at first write; divergent re-binds abort with stable sentinel opaque_record_hash_index_collision. Bind, primary, and secondary writes are atomic inside the same sled transaction. This closes a secondary-index fan-out hole that the original write-once-by-hash check on OPAQUE_REC_PREFIX (cb9d6daf) did not catch. Codified in icn/crates/icn-gateway/src/receipt_store.rs doc comments (a8fbb1a6 → squashed into #1757 → main as 73b4beb0).
  2. Apps add new receipt classes opaquely without expanding gateway typed governance imports. The (class, key1, key2_opt, recorded_at, record_hash, payload) tuple is the single contract between apps and the gateway for receipt persistence. The opaque cascade pattern (#1758 trait extension + #1759 default routing) means a new ProcessTransitionReceipt class becomes a one-file change in apps/governance — no gateway touch, no firewall ratchet expansion. Codified in OPAQUE_REC_PREFIX doc comment and put_opaque doc comment.
  3. No promoting standalone receipt classes to canonical without runtime emission. The slice-local class candidates (ConflictDisclosureAcceptedReceipt, MinorityReportRecordedReceipt, FacilitatorSummaryRecordedReceipt) referenced in #1753 and #1751 dogfood/framing artifacts remain non-canonical until a runtime dogfood actually emits them. ProcessGateResultReceipt IS now canonical because #1755 emits it and #1759 makes that emission durable. Pattern explicitly preserved.
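Decision 2 can be pictured with a small sketch of the opaque contract. Everything here is hypothetical except the six field names of the tuple: `OpaqueBackend`, `ToOpaque`, `ExampleReceipt`, `VecBackend`, and `persist` are illustrative and are not the GovernanceReceiptBackend API. The payload encoding is a placeholder, and std's DefaultHasher stands in for whatever content hash the real code uses (it is not cryptographic).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The six-field tuple the handoff names as the app/gateway contract.
#[derive(Clone, Debug, PartialEq)]
struct OpaqueRecord {
    class: String,
    key1: String,
    key2: Option<String>,
    recorded_at: u64,
    record_hash: String,
    payload: Vec<u8>,
}

/// Hypothetical storage-side trait: it sees only opaque fields, no domain types.
trait OpaqueBackend {
    fn put_opaque(&mut self, rec: OpaqueRecord) -> Result<(), String>;
}

/// Hypothetical app-side trait: each receipt class knows how to flatten itself.
trait ToOpaque {
    fn to_opaque(&self) -> OpaqueRecord;
}

/// Hypothetical receipt class living entirely on the app side.
struct ExampleReceipt {
    session_id: String,
    decision: String,
    recorded_at: u64,
}

impl ToOpaque for ExampleReceipt {
    fn to_opaque(&self) -> OpaqueRecord {
        // Placeholder encoding; a real class would use its own serialization.
        let payload = format!("{}|{}", self.session_id, self.decision).into_bytes();
        let mut hasher = DefaultHasher::new(); // stand-in, not a cryptographic hash
        payload.hash(&mut hasher);
        OpaqueRecord {
            class: "example_receipt".to_string(),
            key1: self.session_id.clone(),
            key2: None,
            recorded_at: self.recorded_at,
            record_hash: format!("{:016x}", hasher.finish()),
            payload,
        }
    }
}

/// Generic routing: adding a class means adding one ToOpaque impl, nothing here.
fn persist<B: OpaqueBackend, R: ToOpaque>(backend: &mut B, receipt: &R) -> Result<(), String> {
    backend.put_opaque(receipt.to_opaque())
}

/// Trivial backend used only to exercise the sketch.
#[derive(Default)]
struct VecBackend {
    records: Vec<OpaqueRecord>,
}

impl OpaqueBackend for VecBackend {
    fn put_opaque(&mut self, rec: OpaqueRecord) -> Result<(), String> {
        self.records.push(rec);
        Ok(())
    }
}
```

The point of the shape is that `OpaqueBackend` and `persist` never name a receipt type, so a new class touches only app-side code.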

No other binding architectural decisions were made or ratified this session.


Verification Commands

# 1. Confirm main HEAD is post-#1759.
git fetch origin main
git log --oneline -1 origin/main
# expect: 92734b25 feat(governance): route ProcessGateResultReceipt through opaque storage cascade (#1759)

# 2. Confirm Rust compile-clean against the merged stack.
cd icn
cargo check -p icn-gateway -p icn-governance-actor -p icn-commons
cargo test -p icn-governance-actor --test process_gate_result_receipt_runtime_slice
cargo test -p icn-gateway --lib
cargo test -p icn-gateway --test commons_integration

# 3. Confirm firewall ratchet preserved.
cd ..
python3 .github/scripts/firewall_denylist.py
bash scripts/check-meaning-firewall.sh

# 4. Confirm doc-control plane clean.
python3 docs/scripts/doc_control_check.py --repo . --registry docs/registry.toml --strict
bash ops/scripts/drift-check.sh

# 5. Confirm the open follow-up PRs (review state, CI status).
gh pr view 1761 --json state,mergeable,statusCheckRollup,reviewDecision
gh pr view 1762 --json state,mergeable,statusCheckRollup,reviewDecision

# 6. Confirm the open issue list matches this handoff.
gh issue list --state open --limit 20

If any of (1)–(4) diverges from the expected state, audit the divergence before resuming work. If (5) shows #1761 or #1762 in an unexpected state, re-read the PR's review threads before acting.


Truth-Plane Notes

  • Declared project truth (loaded from): docs/STATE.md post-#1753 sync edit at session start; docs/PHASE_PROGRESS.md post-#1753 sync edit at session start. Both advanced this session to post-#1759 sync edits as part of this handoff's accompanying docs PR. The Phase 2 framing (still ⏳ partner-bound) is unchanged across this advance — only deliverable enumeration and idea-0019 (#1748) acceptance-gate posture changed.
  • Implementation truth (what was verified from code): the opaque receipt storage stack landed and was inspected end-to-end (OPAQUE_REC_PREFIX, OPAQUE_BY_KEY_PREFIX, OPAQUE_HASH_BIND_PREFIX keyspaces; the put_opaque sled transaction; the GovernanceReceiptBackend trait extension; the put_process_gate_result cascade). The sled-flusher race in commons_integration.rs was diagnosed by reading icn-commons/src/handle.rs, icn-commons/src/store.rs, and sled-0.34.7/src/config.rs — confirmed CommonsHandle is synchronous Arc<RwLock<CommonsInner>> with no spawned task, and confirmed sled 0.34.7's Config::try_lock wraps flock failures with ErrorKind::Other.
  • Execution truth (branch/PR/CI state confirmed): all three stack PRs verified merged via gh pr view (states + merge SHAs); local working tree clean before each branch switch; CI gate status confirmed via gh pr checks per PR.
  • Narrative truth (any strategy doc claims that may be stale): the May-5 sync edits' references to "candidate (b)" (idea-0019 runtime dogfood toward receipt-backed promotion as a candidate next move) were superseded by this session — that candidate is now partially landed via #1755 + #1759. The candidate enumeration in this PR's STATE.md sync edit reflects that. Strategy-level claims about Phase 2 partner-bound status are unchanged.
  • Known conflicts between layers: none discovered this session beyond the codex bot review thread on #1757 remaining technically open even after the in-code fix a8fbb1a6 was pushed and merged. The substance is addressed in the merged code with explicit test coverage; the bot thread state is a platform-level artifact, not an implementation truth divergence.