Multi-Agent VCS Replication — Architecture Spec (2026-05-03)
Companion research: `multi-agent-vcs-replication-research-2026.md`. This spec implements Path 1 from that research.
Premise and goal
Multiple AI coding agents (Claude Code instances, MENS workers) and humans, on one machine and across the Populi mesh, edit the same codebase concurrently. Today they isolate via per-task git worktrees and serialize back through PRs.
Goal: non-conflicting edits auto-converge across the entire fleet — local agents, local humans, mesh peers — with no manual merge step. Conflicts surface as first-class navigable artifacts, not as <<<<<<< markers in working trees.
Non-goals:
- Replacing git interop. External git remotes (GitHub, GitLab) remain reachable via `vox-git`. Vox is the inner substrate; the git wire protocol is preserved at the repo boundary.
- Real-time keystroke-level co-editing. We replicate agent-commit-granularity ops, not keystrokes. Use Yjs/CRDT editing in the IDE if that’s wanted; it’s out of scope here.
- Cross-organization federation. The mesh layer assumes a Populi-trust boundary (vox-secrets-issued identities, JWE-encrypted envelopes); cross-org sync is a future concern.
Decisions baked into this spec
- jj-lib stays the storage layer. `jj_backend.rs` is extended, not replaced. No pivot to Pijul. See research §Recommendation.
- Worktree-per-agent stays for now. The migration to jj-workspace-per-agent is deferred to Phase 5+ (out of scope for this spec). Phases 1–4 work on top of the existing worktree pattern.
- Op-log fragments, not snapshots, are the unit of exchange. Smaller payloads, native to jj’s model, preserves causality.
- Populi is the transport. The existing `a2a/dispatch/mesh.rs` envelope format is extended with an `OpFragment` variant. We do not introduce a parallel transport stack.
- Iroh is evaluated in Phase 3, not adopted up front. Populi suffices for v1; Iroh is the upgrade path if Populi’s HTTP relay becomes a bottleneck or if NAT-traversal limits emerge. We keep the option open by keeping the gossip protocol transport-agnostic.
- Patch-theory commutativity rules (from Pijul) inform the auto-merge classifier without taking a project dependency.
Architecture overview
```
┌──────────────────────────────────────────────────────────────────────┐
│ Local user box                                                       │
│                                                                      │
│  [Claude Tab A]  [Claude Tab B]  [MENS worker]  [Human IDE]          │
│        │               │               │             │               │
│        ▼               ▼               ▼             ▼               │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │ vox-orchestrator                                            │     │
│  │                                                             │     │
│  │   ┌──────────────────┐    ┌────────────────────────────┐    │     │
│  │   │ jj_backend.rs    │◄──►│ ConvergenceEngine (NEW)    │    │     │
│  │   │ (storage)        │    │ (auto-merge classifier)    │    │     │
│  │   └──────────────────┘    └─────────────┬──────────────┘    │     │
│  │                                         │                   │     │
│  │   ┌──────────────────┐    ┌─────────────▼──────────────┐    │     │
│  │   │ conflict_manager │◄───┤ MergePolicy (NEW)          │    │     │
│  │   │ (existing)       │    │ - patch commutativity      │    │     │
│  │   └──────────────────┘    │ - semantic guards          │    │     │
│  │                           │ - socrates arbitration     │    │     │
│  │   ┌──────────────────┐    └────────────────────────────┘    │     │
│  │   │ a2a/dispatch     │◄────► OpFragment envelope (NEW)      │     │
│  │   │ (mesh transport) │                                      │     │
│  │   └────────┬─────────┘                                      │     │
│  └────────────┼────────────────────────────────────────────────┘     │
│               │                                                      │
└───────────────┼──────────────────────────────────────────────────────┘
                │  Populi mesh (HTTP relay; QUIC/Iroh in Phase 3+)
                ▼
  Other peers (humans + agents on other boxes)
```

New primitives
| Name | Purpose | Lives in |
|---|---|---|
| `AgentChange` | jj change ID owned by exactly one agent at any time. Replaces “this agent’s branch” as the unit of work. | `vox-orchestrator/src/jj_backend.rs` (new submodule `agent_change.rs`) |
| `OpFragment` | A single jj operation packaged for replay on a peer: parent op IDs, the operation payload, the agent’s signature, the convergence set membership. | `vox-orchestrator/src/jj_backend.rs::op_fragment` (new) |
| `ConvergenceSet` | A logical “branch”: a set of agents that intend their work to converge. Replaces today’s manual branching. Each user has at minimum a `local` set. | `vox-orchestrator/src/convergence/` (new module) |
| `MergePolicy` | Pure decision function: given two `OpFragment`s with overlapping file ranges, classify as auto-mergeable, escalate-to-conflict, or block-on-policy. | `vox-orchestrator/src/convergence/policy.rs` (new) |
| `ConvergenceEngine` | The runtime that ingests local commits and remote `OpFragment`s, applies `MergePolicy`, and either auto-converges or routes to `conflict_manager`. | `vox-orchestrator/src/convergence/engine.rs` (new) |
Components extended (not new)
- `jj_backend.rs`: gains `OpFragment` serialization/deserialization and op-log replay-from-fragment.
- `a2a/envelope.rs`: add an `OpFragmentEnvelope` variant to the existing envelope enum.
- `a2a/dispatch/mesh.rs`: add a `gossip_op_fragment` topic; reuses existing JWE encryption and idempotency keys.
- `mcp_tools/vcs_tools/`: `change_create` returns an `AgentChange` instead of a raw branch name; `conflicts_list` reports the new convergence-set-aware conflict shape.
- `vox-socrates-policy`: gains a new arbitration rule. When two agents propose ops that the `MergePolicy` classifies as semantically ambiguous (e.g., both rename the same symbol to different names), Socrates can arbitrate via hallucination-score weighting before falling back to human conflict resolution.
Data model
Section titled “Data model”AgentChange
```rust
// vox:skip
pub struct AgentChange {
    pub change_id: ChangeId,               // jj change ID
    pub agent_id: AgentId,                 // owner; exclusive: only this agent appends
    pub convergence_set: ConvergenceSetId,
    pub parent_op_id: OpId,                // op that created this change
    pub created_at: Timestamp,
}
```

Invariant: at any moment, an `AgentChange` has exactly one writer. Cross-agent handoff requires an explicit `change_handoff` op.
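The single-writer invariant can be illustrated with a small ownership table. This is only a sketch under stated assumptions: `OwnershipTable`, `OwnershipError`, and the integer ID aliases are hypothetical names invented for the example, not types from `jj_backend.rs`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the spec's ID types.
type ChangeId = u64;
type AgentId = u32;

#[derive(Debug, PartialEq)]
pub enum OwnershipError {
    UnknownChange,
    NotOwner { owner: AgentId },
}

/// Tracks the single writer of each change (illustrative only).
#[derive(Default)]
pub struct OwnershipTable {
    owners: HashMap<ChangeId, AgentId>,
}

impl OwnershipTable {
    /// Registers a new change with its creating agent as the sole writer.
    pub fn create(&mut self, change: ChangeId, agent: AgentId) {
        self.owners.insert(change, agent);
    }

    /// Succeeds only if `agent` is the current owner of `change`.
    pub fn check_append(&self, change: ChangeId, agent: AgentId) -> Result<(), OwnershipError> {
        match self.owners.get(&change) {
            None => Err(OwnershipError::UnknownChange),
            Some(&owner) if owner == agent => Ok(()),
            Some(&owner) => Err(OwnershipError::NotOwner { owner }),
        }
    }

    /// Models an explicit handoff op: only the current owner may transfer.
    pub fn handoff(&mut self, change: ChangeId, from: AgentId, to: AgentId) -> Result<(), OwnershipError> {
        self.check_append(change, from)?;
        self.owners.insert(change, to);
        Ok(())
    }
}
```

After a successful `handoff`, the previous owner's appends are rejected, so there is never a moment with two writers.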
OpFragment
```rust
// vox:skip
pub struct OpFragment {
    pub op_id: OpId,                       // content hash of (parents, payload, agent_id)
    pub parent_op_ids: Vec<OpId>,          // for causal ordering across the mesh
    pub agent_id: AgentId,                 // who produced this op
    pub convergence_set: ConvergenceSetId,
    pub payload: OpPayload,                // jj-lib operation: snapshot, edit, abandon, ...
    pub signature: Signature,              // vox-secrets-issued; binds op_id to agent_id
    pub produced_at: Timestamp,
}

pub enum OpPayload {
    Snapshot { tree_id: TreeId, commit_id: CommitId, ... },
    Edit { change_id: ChangeId, ... },
    Abandon { change_id: ChangeId, ... },
    Squash { source: ChangeId, dest: ChangeId, ... },
    Handoff { change_id: ChangeId, from: AgentId, to: AgentId },
    // ...mirrors jj-lib's operation kinds
}
```

`op_id` is a content hash → identical ops dedupe naturally. `parent_op_ids` is a vector (not just one) so we preserve causal DAG semantics for ops produced concurrently across peers.
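To make the dedupe-by-content-hash idea concrete, here is a minimal sketch. It assumes a 64-bit `OpId` and uses the standard library's `DefaultHasher`, which is not collision-resistant; a real implementation would presumably use a cryptographic digest. Sorting the parents before hashing is also an assumption of this example, not something the spec states.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical simplified types for illustration.
type OpId = u64;
type AgentId = u32;

/// Derives an op ID from (parents, payload, agent_id).
/// DefaultHasher is NOT collision-resistant; it only illustrates the shape.
pub fn op_id(parents: &[OpId], payload: &[u8], agent: AgentId) -> OpId {
    // Assumption: sort parents so the ID does not depend on listing order.
    let mut sorted = parents.to_vec();
    sorted.sort_unstable();
    let mut h = DefaultHasher::new();
    sorted.hash(&mut h);
    payload.hash(&mut h);
    agent.hash(&mut h);
    h.finish()
}

/// Remembers which op IDs have been seen.
#[derive(Default)]
pub struct SeenOps(HashSet<OpId>);

impl SeenOps {
    /// Returns true the first time an ID is seen, false for duplicates.
    pub fn insert(&mut self, id: OpId) -> bool {
        self.0.insert(id)
    }
}
```

Because the ID is a pure function of the op's content, two peers that receive the same fragment by different routes compute the same ID and the second copy is dropped.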
ConvergenceSet
```rust
// vox:skip
pub struct ConvergenceSet {
    pub id: ConvergenceSetId,               // e.g., "local", "feature/auth-rewrite", "mesh:populi-org"
    pub members: Vec<AgentId>,              // explicit; not implicit from peer connectivity
    pub merge_policy: MergePolicyId,        // which policy applies inside this set
    pub upstream: Option<ConvergenceSetId>, // optional parent for hierarchical convergence
}
```

A user’s default set is `local` (all their agents on their machine). Joining a mesh-shared set is an explicit action. This is the new “branching” model: sets, not refs.
Wire protocol
Section titled “Wire protocol”Gossip, not pull
Each peer streams `OpFragment`s on its outbound channel as soon as they’re produced and signed. Peers receiving fragments:
- Verify the signature against vox-secrets-issued agent identities.
- Check causal parents: if any `parent_op_id` is unknown, queue the fragment and request the missing ancestors.
- Deduplicate by `op_id`.
- Hand to `ConvergenceEngine` for replay + merge classification.
This is gossip-style eventual consistency, transport-agnostic. The Populi mesh provides ordered-per-peer delivery and JWE encryption; the protocol does not require it.
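The receive path above (queue on unknown parents, dedupe, release in causal order) can be sketched as follows. `Inbox`, `Fragment`, and `missing_ancestors` are hypothetical names for this example, signature verification is assumed to have happened before `receive` is called, and op IDs are plain integers for brevity.

```rust
use std::collections::{HashMap, HashSet};

type OpId = u64; // hypothetical stand-in for the spec's OpId

#[derive(Clone, Debug)]
pub struct Fragment {
    pub op_id: OpId,
    pub parents: Vec<OpId>,
}

/// Accepts fragments in any order and releases them parents-first.
#[derive(Default)]
pub struct Inbox {
    applied: HashSet<OpId>,
    pending: HashMap<OpId, Fragment>,
}

impl Inbox {
    /// Returns the op IDs that became deliverable, parents before children.
    pub fn receive(&mut self, frag: Fragment) -> Vec<OpId> {
        if self.applied.contains(&frag.op_id) || self.pending.contains_key(&frag.op_id) {
            return Vec::new(); // duplicate: already applied or already queued
        }
        self.pending.insert(frag.op_id, frag);
        let mut delivered = Vec::new();
        loop {
            // Pick the smallest pending ID whose parents are all applied.
            let ready = self
                .pending
                .values()
                .filter(|f| f.parents.iter().all(|p| self.applied.contains(p)))
                .map(|f| f.op_id)
                .min();
            match ready {
                Some(id) => {
                    self.pending.remove(&id);
                    self.applied.insert(id);
                    delivered.push(id);
                }
                None => break,
            }
        }
        delivered
    }

    /// Parents referenced by queued fragments that this peer has never seen;
    /// these are what a backfill request would ask for.
    pub fn missing_ancestors(&self) -> Vec<OpId> {
        let mut missing: Vec<OpId> = self
            .pending
            .values()
            .flat_map(|f| f.parents.iter().copied())
            .filter(|p| !self.applied.contains(p) && !self.pending.contains_key(p))
            .collect();
        missing.sort_unstable();
        missing.dedup();
        missing
    }
}
```

Since delivery depends only on which parents have been applied, the sketch gives the same result whether or not the transport preserves ordering.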
Envelope shape (additive change to A2A)
```rust
// vox:skip
// In crates/vox-orchestrator/src/a2a/envelope.rs:
pub enum A2AMessage {
    // ... existing variants ...
    OpFragment(OpFragmentEnvelope),
    OpFragmentRequest(OpFragmentRequest), // for backfill
    ConvergenceSetAnnouncement(ConvergenceSetAnnouncement),
}
```

This reuses the existing JWE encryption (`a2a/jwe.rs`), idempotency keys, and durability store (`populi-mesh-a2a-durability-spec-2026.md`; superseded, but the VoxDb backing still applies).
Backfill
A peer joining a convergence set mid-stream requests an op-log range starting from the most recent op it knows. Backfill is bounded: peers retain the last N ops in fast storage; older history is reconstructed from jj’s normal op-store and replayed on demand.
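A bounded fast store with a fall-through to full replay might look like the sketch below. It simplifies the op log to a linear sequence, which the real DAG is not. `RecentOps` and `Backfill` are hypothetical names, and the capacity stands in for the spec's unspecified N.

```rust
use std::collections::VecDeque;

type OpId = u64; // hypothetical stand-in

#[derive(Debug, PartialEq)]
pub enum Backfill {
    /// The requested range is still in the fast store.
    FromFastStore(Vec<OpId>),
    /// The requester is too far behind; replay from the full op-store.
    NeedsOpStoreReplay,
}

/// Keeps only the most recent `capacity` ops (illustrative, linear history).
pub struct RecentOps {
    capacity: usize,
    ops: VecDeque<OpId>,
}

impl RecentOps {
    pub fn new(capacity: usize) -> Self {
        // Clamp to at least 1 so the eviction loop below always terminates.
        let capacity = capacity.max(1);
        RecentOps { capacity, ops: VecDeque::with_capacity(capacity) }
    }

    /// Appends an op, evicting the oldest entries once the store is full.
    pub fn push(&mut self, op: OpId) {
        while self.ops.len() >= self.capacity {
            self.ops.pop_front();
        }
        self.ops.push_back(op);
    }

    /// Returns every retained op produced after `last_known`.
    pub fn since(&self, last_known: OpId) -> Backfill {
        match self.ops.iter().position(|&o| o == last_known) {
            Some(i) => Backfill::FromFastStore(self.ops.iter().skip(i + 1).copied().collect()),
            None => Backfill::NeedsOpStoreReplay,
        }
    }
}
```

A peer that is only slightly behind is served from memory, while one whose last known op has been evicted triggers the slower reconstruction path.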
Merge classification (the auto-merge brain)
When two `OpFragment`s touch the same commit or file range, `MergePolicy` returns one of:
1. Auto-merge: patches commute (non-overlapping line ranges, distinct symbols, additive changes). Apply both; no human involved. Borrows from Pijul’s patch theory: independent patches commute.
2. Surface as conflict: patches overlap and the bytes don’t match. Materialize via `jj_backend.rs::ContentMerge::n_way` and route to the existing `conflict_manager`. Conflicts become first-class artifacts, not transient diffs in a working tree.
3. Escalate to Socrates arbitration: patches overlap but are semantically related (e.g., both rename the same symbol). `vox-orchestrator-types::socrates_policy` scores each side’s hallucination risk + author trust and may auto-pick a winner; otherwise falls through to (2).
4. Policy block: the change violates a project rule (e.g., “agents can’t edit `vox-secrets/src/spec.rs` without human review”). Hold the op; surface to a human.
The classifier is informed by:
- Tree-sitter range overlap analysis for code files. Two edits to the same function body but at non-overlapping byte ranges → check token-level overlap before declaring conflict.
- Pijul-style patch commutativity at the byte level for non-code files.
- Semantic-Aware Replicated Data Type rules (per ICSE 2025) for class/function-level operations: rename + rename of the same target = conflict; rename + add-call-site = auto-merge.
The classifier is pure (no I/O), making it cheap to test and audit.
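A byte-range-only classifier is small enough to sketch in full. The version below covers just two of the outcomes (auto-merge and conflict) and ignores Socrates arbitration and policy blocks. `Edit`, `Verdict`, and `classify` are hypothetical names for illustration, not the `MergePolicy` API.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
pub enum Verdict {
    AutoMerge,
    Conflict,
}

/// One edited byte range in one file (half-open: start..end).
pub struct Edit<'a> {
    pub path: &'a str,
    pub range: Range<usize>,
}

/// Two half-open ranges overlap when each starts before the other ends.
/// Note: zero-length ranges (pure insertions) never overlap under this test;
/// a real classifier would need a rule for two insertions at the same offset.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

/// Pure function: no I/O, and the result depends only on the two edit lists.
pub fn classify(a: &[Edit], b: &[Edit]) -> Verdict {
    for ea in a {
        for eb in b {
            if ea.path == eb.path && overlaps(&ea.range, &eb.range) {
                return Verdict::Conflict;
            }
        }
    }
    Verdict::AutoMerge
}
```

Edits to different files, or to adjacent but non-overlapping ranges in the same file, classify as `AutoMerge`; any shared byte yields `Conflict`. Being conservative here means false conflicts are possible, but false auto-merges at the byte level are not.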
Phased rollout
Section titled “Phased rollout”Phase 1 — Local multi-agent (4–6 weeks)
Scope: Two-plus Claude tabs / agents on one machine, one repo, ops gossiped between them via a local-only `ConvergenceSet`. No mesh, no remote.
Deliverables:
- `AgentChange`, `OpFragment`, `ConvergenceSet`, `MergePolicy`, `ConvergenceEngine` types.
- `jj_backend.rs` extension: `op_fragment::serialize`/`replay`.
- Local `ConvergenceEngine` running inside `vox-orchestrator`, ingesting jj op-log writes and replaying ops from sibling agents.
- `MergePolicy::v1`: byte-range overlap classifier; tree-sitter integration deferred.
- `mcp_tools/vcs_tools/change_create` returns `AgentChange`; existing callers migrate.
- Golden tests: 5-agent fixture, each adds non-overlapping functions to one file, all converge automatically; one fixture forces conflict and verifies it materializes.
- Telemetry: `vox.convergence.*` span attributes (auto-merge / escalate / conflict counts).
Success criterion: With 5 Claude tabs editing one repo, ≥80% of edits auto-converge. The remainder (at most 20%) surfaces as named conflicts in `conflicts_list`.
Phase 2 — Conflict UX (3–4 weeks)
Scope: Make the conflicts that do surface navigable. Without this, Phase 1’s wins are invisible because users still drown in the conflicts that escape.
Deliverables:
- New `vox vcs conflicts` CLI surface listing convergence-set conflicts, grouped by file and origin agent.
- `vox vcs conflicts resolve <id>`: materializes the n-way merge, opens editor with markers, on save replays the resolution as a new op (preserving op-log lineage).
- MCP tool surface: `conflicts_describe` (LLM-friendly conflict explanation: “agent A renamed `foo` to `bar`; agent B added a call site to `foo`”).
- Dashboard view (extends `dashboard-migration-research-2026.md`) showing live convergence status across local agents.
Success criterion: Time-to-resolve a conflict drops by ≥50% vs. the git status quo (measured against a fixed conflict-corpus of recorded prior PR review threads).
Phase 3 — Mesh gossip (4–6 weeks)
Scope: Extend the local protocol across the Populi mesh. Two users, single shared `ConvergenceSet`, op-fragments gossiped over `a2a/dispatch/mesh.rs`.
Deliverables:
- `OpFragmentEnvelope` variant in `a2a/envelope.rs`.
- Gossip topic + backfill protocol in `a2a/dispatch/mesh.rs`.
- `ConvergenceSetAnnouncement` for set discovery.
- Secrets-issued agent identities for op signing (extend `crates/vox-secrets/`).
- Iroh evaluation: build a `Transport` trait so Populi or Iroh can be plugged in. Stay on Populi for v1; recommendation in a follow-up findings doc.
Success criterion: Two-user, two-agents-each (4 total) demo: all converge in real time across the mesh; no manual merge for non-overlapping work.
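One possible shape for the `Transport` trait named in the deliverables is sketched below. The method names, the synchronous signatures, and the `Loopback` test double are all assumptions made for this example. A production trait would most likely be async and carry typed envelopes instead of raw bytes.

```rust
use std::collections::VecDeque;
use std::convert::Infallible;

/// Hypothetical transport abstraction; method names are illustrative only.
pub trait Transport {
    type Error: std::fmt::Debug;

    /// Sends one encoded fragment to the named peer.
    fn send(&mut self, peer: &str, bytes: Vec<u8>) -> Result<(), Self::Error>;

    /// Returns the next queued (peer, bytes) pair, if any, without blocking.
    fn try_recv(&mut self) -> Result<Option<(String, Vec<u8>)>, Self::Error>;
}

/// In-memory test double: everything sent is queued for local receipt,
/// tagged with the peer it was addressed to.
#[derive(Default)]
pub struct Loopback {
    queue: VecDeque<(String, Vec<u8>)>,
}

impl Transport for Loopback {
    type Error = Infallible;

    fn send(&mut self, peer: &str, bytes: Vec<u8>) -> Result<(), Self::Error> {
        self.queue.push_back((peer.to_string(), bytes));
        Ok(())
    }

    fn try_recv(&mut self) -> Result<Option<(String, Vec<u8>)>, Self::Error> {
        Ok(self.queue.pop_front())
    }
}

/// Gossip code written against the trait never names Populi or Iroh.
pub fn broadcast<T: Transport>(t: &mut T, peers: &[&str], fragment: &[u8]) -> Result<usize, T::Error> {
    for peer in peers {
        t.send(peer, fragment.to_vec())?;
    }
    Ok(peers.len())
}
```

Under this shape, a Populi-backed and an Iroh-backed implementation would each provide `send` and `try_recv`, and `broadcast` would stay unchanged.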
Phase 4 — Policy / safety (3–4 weeks)
Scope: Socrates arbitration, project-rule policy blocks, audit trail.
Deliverables:
- Socrates rule: hallucination-score-weighted arbitration for semantically ambiguous merges.
- Project policy file (`Vox.toml [convergence.policy]`): file-glob rules, agent allowlists per path.
- Op-log signing audit: `vox vcs audit` lists all auto-merges and arbitrations, with signer identity.
- Rollback: `vox vcs op undo <op_id>` reverses a specific op across the convergence set, gossiped as a new “undo” op.
Success criterion: Audit trail covers 100% of auto-merged ops, attributable to signing agent. Policy blocks fire on the test fixture (forbidden-path edit by an agent without permission).
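The forbidden-path fixture can be approximated with a per-path allowlist check. The sketch below uses plain path prefixes, not the file globs the deliverable calls for. `PathRule`, `PolicyDecision`, `check`, and the agent identifier strings are hypothetical and say nothing about the eventual `Vox.toml` schema.

```rust
/// One rule: paths under `prefix` may only be edited by `allowed_agents`.
/// Assumption: prefix matching stands in for real glob matching.
pub struct PathRule<'a> {
    pub prefix: &'a str,
    pub allowed_agents: &'a [&'a str],
}

#[derive(Debug, PartialEq)]
pub enum PolicyDecision {
    Allow,
    Block,
}

/// Blocks the edit if any matching rule does not list the agent.
pub fn check(rules: &[PathRule], agent: &str, path: &str) -> PolicyDecision {
    for rule in rules {
        let listed = rule.allowed_agents.iter().any(|a| *a == agent);
        if path.starts_with(rule.prefix) && !listed {
            return PolicyDecision::Block;
        }
    }
    PolicyDecision::Allow
}
```

A blocked op would then be held and surfaced to a human, as the policy-block outcome of the classifier describes.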
Phase 5 (out of scope, named for completeness)
- jj-workspace-per-agent (retire `.claude/worktrees/` as the isolation primitive).
- Iroh as Populi-replacement transport, if measured contention warrants.
- Cross-organization federation (multi-Populi-trust convergence sets).
- Live keystroke-level CRDT layer underneath op-log for IDE-shared editing.
These are explicit follow-ups. They’re not part of this spec because each requires its own design pass.
Risk register
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| jj-lib 0.27 op-log API isn’t stable enough to depend on | medium | high | Pin version; abstract behind `jj_backend.rs`; track jj upstream changelog as a dependency |
| `MergePolicy` mis-classifies a semantic conflict as auto-mergeable, corrupting code | medium | very high | Phase 1 ships byte-range-only (conservative); semantic rules gated behind explicit Socrates flag; always-on op-log lets `jj op undo` reverse any bad merge |
| Op-fragment volume saturates the Populi mesh | low (Phase 1–2), medium (Phase 3+) | medium | Local-first; Phase 3 measures actual bandwidth before committing transport choice |
| Multi-agent contention on the same `AgentChange` | low | medium | Single-writer invariant on `AgentChange`; cross-agent handoff is an explicit op |
| Worktree pattern fights the new model | medium | medium | Phase 1–4 work on top of worktrees; Phase 5+ migrates to jj-workspace-per-agent as a separate spec |
| Pijul-style commutativity rules are wrong for code | medium | high | Conservative defaults; opt-in for aggressive auto-merge; ICSE 2025 rules cited as upper-bound aspiration, not v1 baseline |
Open questions
- Convergence-set membership UX. How does a user discover and join a mesh-shared set? Punted to Phase 3 design; sketch only here.
- Op-log retention. How far back do we keep fast-replay storage? jj’s defaults vs. our needs — measure in Phase 1.
- Interaction with git remotes. When does an auto-merged op become a git commit pushable to GitHub? Likely a periodic “publish” boundary controlled by the convergence-set policy. Detailed design deferred.
Cross-references
- Research foundation: `multi-agent-vcs-replication-research-2026.md`.
- Mesh: `populi-mesh-north-star-2026.md`, `populi-mesh-improvement-backlog-2026.md`, `populi-mesh-config-baseline-spec-2026.md`.
- Orchestrator context: `nextgen-orchestrator-research-2026.md`.
- Security / signing: `cryptography-ssot-2026.md`, `crates/vox-secrets/` for agent identity.
- Code surfaces: `crates/vox-orchestrator/src/jj_backend.rs`, `crates/vox-orchestrator/src/a2a/`, `crates/vox-orchestrator/src/mcp_tools/vcs_tools/`, `crates/vox-git/`, `crates/vox-orchestrator-types/src/socrates_policy/`.
- Implementation plan: `multi-agent-vcs-replication-impl-plan-phase1-2026.md` (Phase 1 step-by-step). Phases 2–4 will be drafted as separate plans when each is queued.