Unified orchestration — SSOT

This document captures compatibility rules and opt-in migration toggles while MCP, CLI, and DeI share one orchestrator contract (vox-orchestrator).

Repo-backed vox-mcp and vox-orchestrator-d open the primary VoxDb via connect_workspace_journey_optional (default .vox/store.db). Env: VOX_WORKSPACE_JOURNEY_STORE, VOX_WORKSPACE_JOURNEY_FALLBACK_CANONICAL (env SSOT). Daemon diagnostics: JSON-RPC method orch.workspace_journey (bind repository_id vs discovered repo).

Bridge / routing policy: Vox-first codegen remains the default MCP path (vox_generate_code, local inference server for vox generate); non-Vox edits stay bounded behind explicit tools and repository policy — see completion policy SSOT.

Journey envelope (v1): contracts/orchestration/journey-envelope.v1.schema.json is the machine SSOT for per-request metadata (journey_id, session_id, thread_id, trace/correlation ids, repository_id, origin_surface). MCP vox_chat_message embeds this shape in structured transcript payloads; CLI and daemon surfaces wire fields incrementally.

Canonical MENS dev journey (Codex): Tables developer_journey_definitions / developer_journey_steps (baseline fragment developer_journeys) seed canonical_journey.v1.greenfield_vox_mens_devloop. MCP vox_journey_canonical_steps returns ordered step_json rows when VoxDb is attached. Human-readable limitation ids for journey maturity live in contracts/journeys/limitations.v1.yaml.

DeI planning on the daemon: JSON-line DeI methods ai.plan.new, ai.plan.replan, ai.plan.status, and ai.plan.execute are handled on the vox-orchestrator-d stdio surface (orch_daemon::dei_dispatch). Persistent plan rows require the same Codex VoxDb handle the orchestrator was built with.

| Concern | Embedded MCP (vox-mcp) | vox-orchestrator-d (daemon) | VoxDb / Turso |
| --- | --- | --- | --- |
| Session chat transcript (RAM) | Orchestrator ContextStore in-process | Same process model per ADR 022 until RPC parity | |
| Structured chat turns | chat_append_workspace_message + journey envelope v1 | Future orch.* parity for remote clients | conversation_messages, conversations |
| Legacy chat_transcripts rows | MCP chat path (dual-write) | Not primary writer today | chat_transcripts |
| Workspace journey attach / diagnostics | connect_workspace_journey_optional, MCP tooling | JSON-RPC orch.workspace_journey | journey + repo bind rows |
| Routing decisions (routing_decisions) | MCP chat / codegen tools; orchestrator AiTaskProcessor when DB attached | Same table when daemon shares DB | local-first SQLite |
| Unified routing experiment flag | VOX_UNIFIED_ROUTING (telemetry reason shape in vox-actor-runtime::routing_telemetry) | | |

When agents detect ambiguity, they invoke the vox_doubt_task MCP tool. This transitions the task to TaskStatus::Doubted and emits a TaskDoubted event; the resolution agent inside vox-orchestrator (see crates/vox-orchestrator/src/orchestrator/agent/doubt.rs) takes over to resolve the doubt with the user and submits an audit report that hooks into the gamification system (vox-gamify). For structural details, see the canonical HITL & Doubt reference.

  • Repo reconstruction campaigns: JSON Schema contracts/orchestration/repo-reconstruction.schema.json; benchmark tiers and KPI guidance in repo reconstruction benchmark ladder. Remote task envelopes may include optional exec_lease_id and campaign_id for mesh correlation (see ADR 017).
  • Types (vox_orchestrator::contract): TaskCapabilityHints, SessionContractEnvelope, OrchestrationMigrationFlags (orchestration_v2_enabled, legacy_orchestration_fallback), and the MCP ↔ DeI plan tool alignment constants (MCP_PLAN_TOOL_NAMES, DEI_PLAN_METHODS_NEW_REPLAN_STATUS).
  • Runtime config: vox_orchestrator::OrchestratorConfig — process-wide limits, Socrates gates, scaling knobs, and nested orchestration_migration (OrchestrationMigrationFlags). Loaded from Vox.toml [orchestrator] and VOX_ORCHESTRATOR_* env overrides via OrchestratorConfig::merge_env_overrides in crates/vox-orchestrator/src/config/.

Agent queue capabilities (TaskCapabilityHints)

On Orchestrator::spawn_agent, each new AgentQueue gets capabilities from merge_agent_capabilities (crates/vox-orchestrator/src/capability_probe.rs):

  1. Start from default_agent_capabilities in config / TOML.
  2. Overlay host probe via probe_host_capabilities: cpu_cores (from available_parallelism), arch (std::env::consts::ARCH), hostname (HOSTNAME / COMPUTERNAME, or sysinfo when built with system-metrics).
  3. Labels: config labels preserved first; probe-supplied labels appended without duplicates.
  4. GPU / NPU flags: operator config wins if already true; otherwise probe may set gpu_cuda when VOX_MESH_ADVERTISE_GPU=1|true (legacy workstation advertisement), or gpu_vulkan / gpu_webgpu / npu from the matching VOX_MESH_ADVERTISE_* vars (not driver probes). Optional VOX_MESH_DEVICE_CLASS fills device_class. See mobile / edge AI SSOT.
  5. min_vram_mb / min_cpu_cores: filled from probe only when unset in config.
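
The merge precedence in the steps above can be sketched as follows. This is a minimal illustration, not the code in capability_probe.rs: the `Caps` struct, `merge_caps`, and `advertise_gpu` are hypothetical names, and only one GPU flag is modelled.

```rust
/// Hypothetical, simplified stand-in for `TaskCapabilityHints`.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Caps {
    pub labels: Vec<String>,
    pub gpu_cuda: bool,
    pub min_vram_mb: Option<u64>,
    pub min_cpu_cores: Option<u32>,
}

/// Merge operator config with a host probe using the documented precedence.
pub fn merge_caps(config: &Caps, probe: &Caps) -> Caps {
    let mut out = config.clone();
    // Labels: config labels stay first; probe labels are appended without duplicates.
    for label in &probe.labels {
        if !out.labels.contains(label) {
            out.labels.push(label.clone());
        }
    }
    // GPU flag: an operator `true` wins; otherwise the probe may set it.
    out.gpu_cuda = config.gpu_cuda || probe.gpu_cuda;
    // Minimums: filled from the probe only when unset in config.
    out.min_vram_mb = config.min_vram_mb.or(probe.min_vram_mb);
    out.min_cpu_cores = config.min_cpu_cores.or(probe.min_cpu_cores);
    out
}

/// Probe-side advertisement flag: on only for the values `1` or `true`.
pub fn advertise_gpu(value: Option<&str>) -> bool {
    matches!(value, Some("1") | Some("true"))
}
```

In this sketch a config value of `min_vram_mb = 8192` survives a probe that reports 4096, while an unset `min_cpu_cores` takes the probed value.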

Routing reads capability_requirements on tasks and applies GPU / VRAM / min_cpu_cores / prefer_gpu_compute soft penalties in crates/vox-orchestrator/src/services/routing.rs (mens / Mens-style training hints).

When MCP polls GET /v1/populi/nodes, each row becomes a RemotePopuliRoutingHint: if last_seen_unix_ms is older than orchestrator stale_threshold_ms at poll time, heartbeat_stale is set and experimental Populi routing signals skip that node (maintenance / quarantine were already excluded).
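
A minimal sketch of the staleness check described above. `NodeHint`, `mark_stale`, and `routable` are hypothetical names, and treating "older than" as a strict comparison is an assumption; the real RemotePopuliRoutingHint carries more fields.

```rust
/// Hypothetical subset of a `RemotePopuliRoutingHint` row.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeHint {
    pub node_id: String,
    pub last_seen_unix_ms: u64,
    pub heartbeat_stale: bool,
}

/// Build a hint, flagging it stale when the heartbeat age exceeds the threshold.
pub fn mark_stale(
    node_id: &str,
    last_seen_unix_ms: u64,
    now_unix_ms: u64,
    stale_threshold_ms: u64,
) -> NodeHint {
    // saturating_sub avoids underflow if last_seen is ahead of the local clock.
    let age_ms = now_unix_ms.saturating_sub(last_seen_unix_ms);
    NodeHint {
        node_id: node_id.to_string(),
        last_seen_unix_ms,
        // Assumption: "older than" means strictly greater than the threshold.
        heartbeat_stale: age_ms > stale_threshold_ms,
    }
}

/// Nodes that routing signals may still consider.
pub fn routable(hints: &[NodeHint]) -> Vec<&NodeHint> {
    hints.iter().filter(|h| !h.heartbeat_stale).collect()
}
```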

Optional VOX_ORCHESTRATOR_MESH_EXEC_LEASE_RECONCILE: same poll tick may call GET /v1/populi/exec/leases and compare each holder_node_id to the fresh node list (tracing target vox.mcp.populi_reconcile; Codex event mesh_exec_lease_reconcile when VOX_MESH_CODEX_TELEMETRY). Opt-in VOX_ORCHESTRATOR_MESH_EXEC_LEASE_AUTO_REVOKE performs POST /v1/populi/admin/exec-lease/revoke on mismatches (mesh/admin token; aggressive — see env SSOT).
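
The reconcile comparison can be illustrated with a small sketch. The `Lease` struct and `orphaned_leases` function are hypothetical; the sketch only shows the holder-versus-node-list comparison and performs no HTTP calls or revocation.

```rust
use std::collections::HashSet;

/// Hypothetical subset of an exec lease row.
pub struct Lease {
    pub lease_id: String,
    pub holder_node_id: String,
}

/// Return the ids of leases whose holder is absent from the fresh node list.
pub fn orphaned_leases<'a>(leases: &'a [Lease], live_node_ids: &[&str]) -> Vec<&'a str> {
    let live: HashSet<&str> = live_node_ids.iter().copied().collect();
    leases
        .iter()
        .filter(|lease| !live.contains(lease.holder_node_id.as_str()))
        .map(|lease| lease.lease_id.as_str())
        .collect()
}
```

With auto-revoke left off, a result like this would only be logged; with it on, each returned id is a revoke candidate, which is why the option is described as aggressive.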

See also mens SSOT for VOX_MESH_* and local registry.

Mesh distribution vs single-process embedding

  • Embedding: Each vox-mcp (or vox dei CLI) process constructs an in-memory Orchestrator. That is “single-process gravity” for RAM-local queues and locks.
  • Distribution: With VOX_MESH_ENABLED, durable coordination (locks, oplog mirror, A2A inboxes, heartbeats) is backed by Turso so another MCP or laptop can participate in the same logical mesh. Two nodes = two orchestrator instances sharing one cross-node SSOT via the DB and HTTP A2A relay — not one magic cluster master in RAM.
  • Bootstrap SSOT: build_repo_scoped_orchestrator and build_repo_scoped_orchestrator_for_repository are the shared factory for MCP, CLI, and other embedders so repository id, affinity groups, and memory shard paths stay aligned.

For table-level detail and conflict rules, see Mens coordination.

The orchestrator intentionally uses more than one delivery plane; these are not interchangeable transports with hidden semantics.

| Canonical plane | Current wire token(s) | Guarantees | Use for |
| --- | --- | --- | --- |
| local_ephemeral | MCP route=local | in-process only, best-effort per-receiver FIFO, restart-volatile | low-latency same-node agent coordination |
| local_durable | MCP route=db | durable row storage, explicit durable ack/poll semantics | cross-process local inboxes and persistence-friendly retries |
| remote_mesh | MCP route=mesh, Populi HTTP A2A | HTTP relay with bearer/JWT auth, explicit inbox lease + ack, client-supplied idempotency | cross-node messaging and remote task envelopes |
| broadcast | local bus broadcast, bulletin/event fanout | receiver-local ordering only, no shared durable semantics | fanout notifications |
| stream | DeI JSON lines, vox-orchestrator-d orch.* JSON lines/TCP, MCP WS gateway, SSE, OpenClaw WS | ordered per connection/byte stream, reconnect semantics vary by transport | incremental output and live updates |

Machine-readable source of truth for these names lives in contracts/communication/protocol-catalog.yaml. MCP A2A responses surface the canonical plane names in addition to legacy wire tokens so callers can migrate without breaking compatibility.
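
As an illustration of how a caller might translate the legacy MCP route tokens into canonical plane names, here is a minimal sketch. The `Plane` enum and its methods are hypothetical and cover only the three route= tokens; the authoritative mapping is the protocol catalog.

```rust
/// Hypothetical enum covering the planes that have an MCP `route=` token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plane {
    LocalEphemeral,
    LocalDurable,
    RemoteMesh,
}

impl Plane {
    /// Map a legacy MCP `route=` value to its canonical plane.
    pub fn from_mcp_route(route: &str) -> Option<Plane> {
        match route {
            "local" => Some(Plane::LocalEphemeral),
            "db" => Some(Plane::LocalDurable),
            "mesh" => Some(Plane::RemoteMesh),
            _ => None,
        }
    }

    /// Canonical plane name as listed in the table.
    pub fn canonical_name(self) -> &'static str {
        match self {
            Plane::LocalEphemeral => "local_ephemeral",
            Plane::LocalDurable => "local_durable",
            Plane::RemoteMesh => "remote_mesh",
        }
    }
}
```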

Boolean fields use Rust bool parsing (true / false only). Invalid values log a warning and leave the current setting unchanged.
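
The rule above follows from how `str::parse::<bool>()` behaves: it accepts exactly `true` and `false`, so values such as `1` or `TRUE` are rejected. A minimal sketch, assuming a hypothetical helper named `apply_bool_override` (the real logic lives in OrchestratorConfig::merge_env_overrides and logs through its own tracing setup):

```rust
/// Apply an optional raw env value to a boolean setting.
/// Invalid input prints a warning and keeps the current value.
pub fn apply_bool_override(current: bool, raw: Option<&str>) -> bool {
    match raw {
        // Variable not set: nothing to override.
        None => current,
        Some(text) => match text.parse::<bool>() {
            Ok(value) => value,
            Err(_) => {
                // Stand-in for the warning log described in the text.
                eprintln!("warning: ignoring invalid boolean override {text:?}");
                current
            }
        },
    }
}
```

For example, `VOX_ORCHESTRATOR_SCALING_ENABLED=1` would leave `scaling_enabled` at whatever the TOML or default value was.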

| Variable | Maps to |
| --- | --- |
| VOX_ORCHESTRATOR_ENABLED | enabled |
| VOX_ORCHESTRATOR_MAX_AGENTS | max_agents |
| VOX_ORCHESTRATOR_LOCK_TIMEOUT_MS | lock_timeout_ms |
| VOX_ORCHESTRATOR_TOESTUB_GATE | toestub_gate |
| VOX_ORCHESTRATOR_MAX_DEBUG_ITERATIONS | max_debug_iterations |
| VOX_ORCHESTRATOR_SOCRATES_GATE_SHADOW | socrates_gate_shadow |
| VOX_ORCHESTRATOR_SOCRATES_GATE_ENFORCE | socrates_gate_enforce |
| VOX_ORCHESTRATOR_SOCRATES_REPUTATION_ROUTING | socrates_reputation_routing |
| VOX_ORCHESTRATOR_SOCRATES_REPUTATION_WEIGHT | socrates_reputation_weight |
| VOX_ORCHESTRATOR_TRUST_GATE_RELAX_ENABLED | trust_gate_relax_enabled: when true and Codex agent_reliability for the agent is ≥ trust_gate_relax_min_reliability, Socrates enforce, completion grounding enforce, and strict scope may skip completion requeue / enqueue denial (see PolicyTrustRelax). |
| VOX_ORCHESTRATOR_TRUST_GATE_RELAX_MIN_RELIABILITY | trust_gate_relax_min_reliability: minimum reliability (default 0.85, aligned with trust auto-approve floor). |
| VOX_ORCHESTRATOR_ATTENTION_ENABLED / VOX_ORCHESTRATOR_ATTENTION_BUDGET_MS / VOX_ORCHESTRATOR_ATTENTION_ALERT_THRESHOLD / VOX_ORCHESTRATOR_ATTENTION_INTERRUPT_COST_MS / VOX_ORCHESTRATOR_ATTENTION_TRUST_ROUTING_WEIGHT | Pilot attention budget + dynamic interruption gating (see information-theoretic-questioning.md, env-vars.md). Vox.toml also supports [orchestrator].interruption_calibration for per-channel gain offsets and backlog/trust calibration. |
| VOX_ORCHESTRATOR_LOG_LEVEL | log_level (raw string) |
| VOX_ORCHESTRATOR_FALLBACK_SINGLE | fallback_to_single_agent |
| VOX_ORCHESTRATOR_MIN_AGENTS | min_agents |
| VOX_ORCHESTRATOR_SCALING_THRESHOLD | scaling_threshold |
| VOX_ORCHESTRATOR_IDLE_RETIREMENT_MS | idle_retirement_ms |
| VOX_ORCHESTRATOR_SCALING_ENABLED | scaling_enabled |
| VOX_ORCHESTRATOR_COST_PREFERENCE | cost_preference (performance or economy) |
| VOX_ORCHESTRATOR_SCALING_LOOKBACK | scaling_lookback_ticks |
| VOX_ORCHESTRATOR_RESOURCE_WEIGHT | resource_weight |
| VOX_ORCHESTRATOR_RESOURCE_CPU_MULT | resource_cpu_multiplier |
| VOX_ORCHESTRATOR_RESOURCE_MEM_MULT | resource_mem_multiplier |
| VOX_ORCHESTRATOR_RESOURCE_EXPONENT | resource_exponent |
| VOX_ORCHESTRATOR_SCALING_PROFILE | scaling_profile (conservative, balanced, or aggressive) |
| VOX_ORCHESTRATOR_MAX_SPAWN_PER_TICK | max_spawn_per_tick |
| VOX_ORCHESTRATOR_SCALING_COOLDOWN_MS | scaling_cooldown_ms |
| VOX_ORCHESTRATOR_URGENT_REBALANCE_THRESHOLD | urgent_rebalance_threshold |
| VOX_ORCHESTRATOR_MIGRATION_V2_ENABLED | orchestration_migration.orchestration_v2_enabled |
| VOX_ORCHESTRATOR_MIGRATION_LEGACY_FALLBACK | orchestration_migration.legacy_orchestration_fallback |
| VOX_ORCHESTRATOR_MESH_CONTROL_URL | populi_control_url: HTTP base for GET /v1/populi/nodes (read-only); MCP vox_orchestrator_status includes mesh_snapshot JSON when set. Uses VOX_MESH_TOKEN on the client when present. Does not change task routing. |
| VOX_ORCHESTRATOR_MESH_REMOTE_EXECUTE_EXPERIMENTAL | populi_remote_execute_experimental (TOML alias: mesh_remote_execute_experimental): enables staged rollout for remote task-envelope dispatch over populi A2A relay (with local fallback). |
| VOX_ORCHESTRATOR_MESH_REMOTE_LEASE_GATING_ENABLED | populi_remote_lease_gating_enabled (TOML: mesh_remote_lease_gating_enabled): when true with matching roles, relay is awaited before local enqueue; success puts the task in remote-hold (single owner, no local dequeue). Relay failure deterministically falls back to local queue only (no fire-and-forget duplicate relay). |
| VOX_ORCHESTRATOR_MESH_REMOTE_LEASE_GATED_ROLES | populi_remote_lease_gated_roles: comma-separated planner, builder, verifier, reproducer, researcher (case-insensitive). Empty list means no task matches gating. |
| VOX_ORCHESTRATOR_MESH_REMOTE_RESULT_POLL_INTERVAL_SECS | populi_remote_result_poll_interval_secs (TOML alias: mesh_remote_result_poll_interval_secs): remote_task_result inbox poll interval in seconds; 0 disables. Implemented in vox_orchestrator::a2a::spawn_populi_remote_result_poller (MCP and other embedders pass a join slot). |
| VOX_ORCHESTRATOR_MESH_REMOTE_WORKER_POLL_INTERVAL_SECS | populi_remote_worker_poll_interval_secs (TOML alias: mesh_remote_worker_poll_interval_secs): remote_task_envelope worker poll interval in seconds; 0 disables remote worker consumption while keeping result polling optional. Implemented in vox_orchestrator::a2a::spawn_populi_remote_worker_poller. |
| VOX_ORCHESTRATOR_MESH_REMOTE_RESULT_MAX_MESSAGES_PER_POLL | populi_remote_result_max_messages_per_poll: per-page size when draining the parent mesh inbox for remote_task_result rows (minimum 1; default 64). The poller walks cursor pages (before_message_id, newest-first) up to a fixed cap so deep inboxes do not hide older results behind unrelated A2A mail. |
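
The gated-roles rule (comma-separated, case-insensitive, empty list matches nothing) can be sketched as below. The `Role` enum, `parse_gated_roles`, and `is_gated` are hypothetical names; trimming whitespace and silently skipping unknown tokens are assumptions of this sketch, not documented behaviour.

```rust
/// Hypothetical enum for the five role names listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Planner,
    Builder,
    Verifier,
    Reproducer,
    Researcher,
}

/// Parse a comma-separated, case-insensitive role list.
/// Assumption: surrounding whitespace is trimmed and unknown tokens are skipped.
pub fn parse_gated_roles(raw: &str) -> Vec<Role> {
    raw.split(',')
        .filter_map(|part| match part.trim().to_ascii_lowercase().as_str() {
            "planner" => Some(Role::Planner),
            "builder" => Some(Role::Builder),
            "verifier" => Some(Role::Verifier),
            "reproducer" => Some(Role::Reproducer),
            "researcher" => Some(Role::Researcher),
            _ => None,
        })
        .collect()
}

/// A task is lease-gated only when gating is enabled and its role is listed.
pub fn is_gated(gating_enabled: bool, gated_roles: &[Role], role: Role) -> bool {
    gating_enabled && gated_roles.contains(&role)
}
```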

Populi client helpers now expose typed HTTP status errors (PopuliRegistryError::HttpStatus) and non-claimer inbox cursor paging (before_message_id, plus A2AInboxPager), so orchestrator fallback logic can branch on status codes (403/404/409) without brittle string matching.

Implemented: task_id, agent_id, and placement_reason are emitted as structured fields on the terminal tracing::info! event in task_dispatch/submit/task_submit.rs for every task routing decision. lease_id is carried in PopuliRemoteDelegate on the task record and forwarded in the A2A envelope when a lease is active.

| Field / concept | Purpose |
| --- | --- |
| task_id | Correlate orchestrator task lifecycle across logs and traces. |
| lease_id | Carried in PopuliRemoteDelegate.exec_lease_id on lease-held tasks; propagated in RemoteTaskEnvelope and ExecLeaseGrantResponse. |
| placement_reason | Machine-readable code for the selected execution surface (local vs lease-remote); emitted on every routing decision. |
| populi_node_id / claimer_node_id | Mesh identity for inbox claims and execution attribution where applicable. |

Stable placement_reason codes:

  • local_queue_default
  • populi_remote_lease_hold
  • local_queue_fallback_after_remote_relay_error
  • local_queue_fallback_insufficient_vram — no registered node meets the task’s min_vram_mb requirement

Rollout and kill switches: Populi remote execution rollout checklist. Work-type boundaries: placement policy matrix.

Canonical descriptions for VOX_BENCHMARK_TELEMETRY / VOX_SYNTAX_K_TELEMETRY (and related Codex row shapes) live in env-vars.md. Trust boundaries for optional telemetry: telemetry-trust-ssot.

| Variable | Purpose |
| --- | --- |
| VOX_BENCHMARK_TELEMETRY | When 1 / true, CLI benchmark entry points append benchmark_event rows via VoxDb::record_benchmark_event. |
| VOX_SYNTAX_K_TELEMETRY | When 1 / true, syntax-K benchmark classes append syntax_k_event rows via VoxDb::record_syntax_k_event (session syntaxk:<repository_id>). If unset, falls back to VOX_BENCHMARK_TELEMETRY. |
| VOX_WORKFLOW_JOURNAL_CODEX_OFF | When 1 / true, skip Codex append for interpreted workflow journal rows. By default, when DB config resolves after vox workflow run / vox mens workflow run (workflow-runtime), Vox appends versioned workflow journal rows via VoxDb::record_workflow_journal_entry (session workflow:<repository_id>, metric workflow_journal_entry). Rows can include lifecycle events, retry events (ActivityAttemptRecovered, ActivityAttemptFailed, ActivityRetryScheduled), replay events, and per-step payloads (for example MeshActivity / MeshActivitySkipped) keyed by durable run_id + activity_id semantics described in durable execution. |
| VOX_MESH_MAX_STALE_MS | Client-side filter for mens node lists in MCP snapshots (see mens SSOT). |
| VOX_MESH_CODEX_TELEMETRY | When 1 / true, append populi_control_event rows via VoxDb::record_populi_control_event (session mens:<repository_id>): after vox run local registry publish when the CLI was built with populi (includes vox-populi), after vox-mcp startup publish when mens is enabled, and after MCP vox_orchestrator_status mens HTTP snapshot when Codex is connected. Implementation: vox_db::populi_registry_telemetry. Never stores VOX_MESH_TOKEN. |
| VOX_MCP_LLM_COST_EVENTS | Optional override for MCP LLM CostIncurred bus events vs Codex-only accounting; see Model Routing. |
| VOX_REPOSITORY_ROOT | Optional directory for repository_id discovery in benchmark telemetry (and other CLI paths that adopt the same pattern); align with MCP’s discovered repo root when subprocess CWD differs. |
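
The fallback from VOX_SYNTAX_K_TELEMETRY to VOX_BENCHMARK_TELEMETRY can be sketched as follows. `syntax_k_telemetry_enabled` and `is_on` are hypothetical names; the lookup is passed in as a closure so the sketch can be exercised without touching the process environment.

```rust
/// Telemetry flags are on only for the values `1` or `true`.
fn is_on(value: &str) -> bool {
    matches!(value, "1" | "true")
}

/// Resolve the syntax-K telemetry flag, falling back to the benchmark flag
/// only when the syntax-K variable is unset.
pub fn syntax_k_telemetry_enabled(lookup: impl Fn(&str) -> Option<String>) -> bool {
    match lookup("VOX_SYNTAX_K_TELEMETRY") {
        // An explicit value decides on its own, even when it turns telemetry off.
        Some(value) => is_on(&value),
        None => lookup("VOX_BENCHMARK_TELEMETRY").map_or(false, |value| is_on(&value)),
    }
}
```

Against the real environment the closure could be `|key| std::env::var(key).ok()`.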

TOML: under [orchestrator], set orchestration_migration = { orchestration_v2_enabled = true, … } (field names match OrchestrationMigrationFlags in crates/vox-orchestrator/src/contract.rs). When v2 is enabled, the success JSON from MCP vox_submit_task may include orchestration_contract: "v2" as a client hint.

Optional [mens] in Vox.toml merges mens scope/URL/labels for CLI and MCP (see mens SSOT); env wins per field when set.

Effective Socrates thresholds still merge from vox-socrates-policy with optional overrides in OrchestratorConfig::socrates_policy — no literal drift outside the policy crate + merge logic.

Deprecation / compatibility matrix (current)

| Surface | Rule |
| --- | --- |
| MCP tool names | Add aliases before removing names; vox_plan, vox_replan, vox_plan_status stay stable. |
| DeI RPC ids | ai.plan.* method strings unchanged (vox_cli::dei_daemon::method). |
| Orchestrator daemon RPC ids | orch.* method strings are versioned in vox_protocol::orch_daemon_method; contract schema contracts/orchestration/orch-daemon-rpc-methods.schema.json. |
| File sessions + Codex | Both remain valid; MCP SessionManager uses with_db when Codex is attached. |
| vox db | Remains implementation SSOT; vox scientia is a documented facade only. |