Model Orchestration SSOT — Audit & Convergence Plan (2026-04-20)
Scope. MENS (local), Populi (GPU mesh), OpenRouter, direct-provider cloud backends (Anthropic, Google, Groq, DeepSeek, Cerebras, Mistral, SambaNova, HuggingFace), plus the vox-secrets secret plane that feeds all of them. This document lists what exists today, where it drifts, and exactly what to change, file-by-file.
How to read this. Every “FIX” item below is a mechanical operation keyed to a file path (and line numbers where stable). Each item can be handed to an agent with no further context.
Part 1 — Executive summary
What is good today.
- `vox-orchestrator::models::ModelRegistry` is the one selector used across the workspace (`crates/vox-orchestrator/src/models/registry.rs:14`). All task-to-model decisions flow through `best_for()` / `best_for_task()`.
- `vox-secrets` is a credible secret plane with a documented resolver chain, `doctor`, `parity`, and `secret-env-guard` (`crates/vox-secrets/src/resolver.rs:1`, `crates/vox-cli/src/commands/diagnostics/doctor/checks_standard/secrets.rs`).
- A live catalog refresh against `https://openrouter.ai/api/v1/models` already exists with a min-interval and jitter guard (`crates/vox-orchestrator/src/catalog.rs:200`; `crates/vox-orchestrator/src/models/registry.rs:49`).
- Telemetry lands in a typed `llm_interactions` table (v59) with a validation contract including `context_utilization_pct` and `cache_read_tokens`.
- Observed pricing is stabilized in `model_pricing_catalog` (v59), enabling empirical cost overrides for routing decisions.
- Mesh-node identity uses Ed25519 with challenge/response (`crates/vox-identity/src/identity.rs:20`) and mesh bearer auth uses constant-time compare (`crates/vox-populi/src/transport/auth.rs:5`).
- `.voxignore` is the declared SSOT for AI context exclusion (`AGENTS.md:37`).
What is broken or drifting.
- Model-selection logic is split across 5 crates with two different `ModelTier` enums (`crates/vox-orchestrator/src/models/spec.rs:14`, `crates/vox-orchestrator/src/models/routing_table.rs:6`) and two different `ChatRouteBackend`-like enums (`vox-orchestrator` vs `vox-actor-runtime/src/model_resolution.rs:22`).
- Strength tags are free-form strings materialized from three independent heuristics (`spec.rs:230`, `catalog.rs:107`, `routing_table.rs:30`). No enum, no parity check.
- 10 model IDs are hardcoded as defaults in `spec.rs:273-482` with their own cost/context data, duplicating whatever OpenRouter returns live. Drift is silent.
- No model scoreboard. `eval_runs` and `llm_feedback` exist but are never aggregated per `(model_id, task_category)` and never fed back to `best_for()`.
- No distributed trace ID. `journey_id`, `session_id`, `run_id` are local to each subsystem; there is no OpenTelemetry-style GenAI span with `gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.
- Automatic model discovery is one shot per process start. No scheduled nightly refresh; no Populi mesh catalog aggregation; no HF Hub auto-registration into the routing registry.
- Direct env reads for secret-ish values leak outside vox-secrets. Confirmed violation in `crates/vox-schola/src/curator.rs` (`OPENAI_API_KEY`) and suspected drift for `TOGETHER_FINETUNE_MODEL` (`crates/vox-cli/src/commands/ai/train.rs`), `GEMINI_DIRECT_MODEL` / `OPENROUTER_GEMINI_MODEL` (`crates/vox-config/src/routing_policy.rs`).
- No cross-node secret sync. `A2ADeliverRequest.jwe_payload` is plumbed but never populated (`crates/vox-populi/src/transport/mod.rs:76`). `vox-crypto` has ChaCha20-Poly1305 and Ed25519 but no X25519 KEM for wrapping secrets to another node.
- No device-pairing flow. A user with 3 mesh nodes must install `OPENROUTER_API_KEY` three times by hand.
- Retired-surface drift. `vox_dei::model_route` is still used as the `tracing` target in `crates/vox-actor-runtime/src/model_resolution.rs:183-246` (harmless in theory, but violates the retired-symbol policy in `AGENTS.md:140`).
What this document proposes.
- Elevate `crates/vox-orchestrator/src/models/` to the single-source-of-truth crate for everything routing-related. Move, delete, or alias anything that currently duplicates it.
- Declare `contracts/orchestration/model-routing.v1.yaml` as the machine-readable SSOT for task → strength mapping, tier definitions, scoring weights, and fallback chains. Generate Rust enums from it.
- Declare `contracts/orchestration/model-telemetry.v1.yaml` aligned with OpenTelemetry GenAI semconv v1.37 (`gen_ai.*`). Every LLM call on every provider emits a span with the same attribute names.
- Build a `ModelScoreboard` table keyed by `(model_id, task_category, strength_tag)` populated from `eval_runs` + `llm_feedback`. Make `best_for()` read it.
- Add `vox secrets sync` with X25519-sealed-box pairing so secrets installed on one mesh node propagate to a user’s other nodes without re-entry.
- Add `vox mens models discover` and `vox populi models inventory` scheduled jobs so MENS checkpoints and mesh-node capabilities register into the routing catalog automatically.
Part 2 — Proposed SSOT layout
This is what “converged” looks like. Every bullet below is also a “FIX” in Part 3.
2.1 File authority map (post-convergence)
| Concern | SSOT file | Consumers read via |
|---|---|---|
| Model spec, capabilities, pricing | `contracts/orchestration/model-catalog.v1.json` (generated nightly from live OpenRouter + HF Hub + Ollama + Populi mesh) | `vox_orchestrator::models::Registry::load()` |
| Task-category → strength mapping, preferred tier, context floor | `contracts/orchestration/model-routing.v1.yaml` | codegen → `crates/vox-orchestrator/src/models/routing_table.rs` (generated) |
| Scoring weights (efficiency/precision/latency/availability/balance/mobile) | `contracts/orchestration/model-routing.v1.yaml` `[scoring]` | `crates/vox-orchestrator/src/models/scoring.rs` |
| Provider enum, secret-id mapping | `contracts/orchestration/providers.v1.yaml` | codegen → `crates/vox-orchestrator/src/models/spec.rs::ProviderType`, `crates/vox-orchestrator/src/models/key_guard.rs` |
| Telemetry event attributes (GenAI) | `contracts/orchestration/model-telemetry.v1.yaml` (mirrors OTel GenAI semconv v1.37) | `crates/vox-actor-runtime/src/routing_telemetry.rs`, `crates/vox-db/src/research_metrics_contract.rs` |
| Secrets & env var names | `crates/vox-secrets/src/spec/**` (unchanged authority) | `vox_secrets::resolve_secret(...)` |
| Env-variable allowlist (non-secret tuning) | `crates/vox-secrets/src/lib.rs::OPERATOR_TUNING_ENVS` (extend) | `secret-env-guard` |
| `.voxignore` derived ignore files | `.voxignore` (unchanged) | `vox ci sync-ignore-files` |
2.2 The single ModelCatalogEntry schema (proposal)
```yaml
# contracts/orchestration/model-catalog.v1.json: one entry
model_id: "anthropic/claude-sonnet-4.6"
family: "anthropic"
revision: "4.6"
provider_route:
  primary: "OpenRouter"          # one of providers.v1.yaml enum
  fallback: ["Anthropic"]
context_length_tokens: 200000
input_modalities: ["text", "image"]
output_modalities: ["text"]
pricing:
  input_per_1k: 3.00
  output_per_1k: 15.00
  cache_read_per_1k: 0.30        # if provider reports it
  unit: "USD"
supports:
  tools: true
  json_mode: true
  streaming: true
  reasoning: false
strengths: ["codegen", "review", "debugging", "security"]  # from enum in model-routing.v1.yaml
tier: "Pro"
availability:
  openrouter_uptime_30d: 0.993   # from OpenRouter endpoints API
  measured_p50_ms: 1820          # from our own eval_runs
  measured_p99_ms: 7700
scoreboard:
  codegen_success_rate_30d: 0.86
  review_success_rate_30d: 0.91
  last_scored_at: "2026-04-19T04:00:00Z"
discovered_from: "openrouter-v1-catalog@2026-04-20T00:00:00Z"
```

The only hand-maintained file after convergence is `contracts/orchestration/model-routing.v1.yaml` (strength enum, task→strength table, scoring weights, tier definitions, hard overrides). Everything else regenerates.
2.3 User-visible single login → mesh-wide secrets (proposal)
```text
┌──────────────┐      ┌─────────────────────┐       ┌────────────────┐
│ Node A (desk)│      │ SecretsSync gossip  │       │ Node B (laptop)│
│ identity (Ed │─────>│ over mesh           │<──────│ identity (Ed)  │
│ 25519 pair)  │      │ - pairing → trust   │       │                │
└──────┬───────┘      │ - X25519 KEM wrap   │       └───────┬────────┘
       │              │   of secret value   │               │
       │              │ - Ed25519 sig on    │               │
       │              │   wrapped envelope  │               │
       │              └─────────────────────┘               │
       v                                                    v
vox-secrets local vault                          vox-secrets local vault
(ChaCha20-Poly1305 KDF from user-pairing         (same)
 passphrase or OS keyring)
```

- `vox secrets pair` on Node A prints a one-time QR / 5-word pairing code.
- `vox secrets pair --accept <code>` on Node B performs X25519 ECDH, attests via Ed25519, enrolls into `TrustedNodeRegistry` (`crates/vox-identity/src/storage.rs`).
- `vox secrets sync` pushes every `shareable=true` secret (in `SecretSpec`) to every trusted peer, wrapped with the peer’s X25519 public key, signed with the sender’s Ed25519 private key, delivered over Populi’s A2A channel.
- No secret value ever leaves the user’s mesh.
- Operators opt a secret out by setting `shareable: false` in the spec (applies by default to registry/local-only things like `VOX_IDENTITY_KEY_PATH`).
Part 3 — Backlog (~70 numbered improvements)
Every item starts with FIX-NN. When executing, treat title, problem, operation, and success criteria as a self-contained ticket.
A. Single source of truth — data model & codegen
FIX-01. [FIXED] Define `contracts/orchestration/model-routing.v1.yaml` as the routing SSOT.
- Problem. Routing table, scoring weights, and tier enum live as hand-edited Rust in `crates/vox-orchestrator/src/models/routing_table.rs:30-122`, `.../scoring.rs:6-31`, `.../spec.rs:14-24`.
- Operation. Create `contracts/orchestration/model-routing.v1.yaml` with top-level keys `schema_version`, `tiers[]`, `strengths[]`, `task_categories[]`, `scoring.weights`, `scoring.latency_bands`, `premium_alias{}`, `economy_cost_ceiling_usd_per_1k`.
- Success. File exists and JSON-Schema-validates against a new `contracts/orchestration/model-routing.v1.schema.json`. CI check added in `crates/vox-cli/src/commands/ci/run_body_helpers/` under a new `routing-ssot-validate` guard.
FIX-02. [FIXED] Codegen `ModelTier`, `StrengthTag`, `TaskCategory` from the YAML.
- Problem. Two `ModelTier` enums exist (`spec.rs:14`, `routing_table.rs:6`). Strength tags are free-form strings with no enum. `TaskCategory` is defined in `crates/vox-orchestrator/src/types/tasks.rs` independently of the routing table.
- Operation. Introduce `crates/vox-orchestrator/build.rs` that reads `contracts/orchestration/model-routing.v1.yaml` and emits `src/models/generated.rs` containing enums. Delete `routing_table.rs::ModelTier` (FIX-02a) and replace `spec.rs::ModelTier` imports with the generated one.
- Success. `cargo build -p vox-orchestrator` regenerates on YAML change. `rg "enum ModelTier"` returns one hit.
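
The enum-emitting step of such a build script could look roughly like the sketch below. It is std-only and assumes the variant names have already been parsed out of the YAML (parsing is omitted); `render_enum` and the derive list are hypothetical, not existing code in the repository.

```rust
/// Renders a Rust enum definition from a list of variant names.
/// Hypothetical helper: a real build script would first parse
/// `model-routing.v1.yaml` and pass the names in.
pub fn render_enum(name: &str, variants: &[&str]) -> String {
    let mut out = String::new();
    out.push_str("// @generated from model-routing.v1.yaml; do not edit.\n");
    out.push_str("#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]\n");
    out.push_str(&format!("pub enum {name} {{\n"));
    for v in variants {
        // One variant per line, indented like hand-written code.
        out.push_str(&format!("    {v},\n"));
    }
    out.push_str("}\n");
    out
}
```

In a build script the returned string would be written to the generated file, and printing `cargo:rerun-if-changed=<path to the YAML>` asks Cargo to rerun the script when the YAML changes.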
FIX-03. [FIXED] Replace the `infer_strengths()` triple-path with a single table.
- Problem. Three independent heuristics: parameter-graph (`catalog.rs:118-142`), provider family (`catalog.rs:143-148` and `spec.rs:230-255`), name heuristic (`catalog.rs:151-188`).
- Operation. In the YAML add a `strength_inference` section with ordered rules (parameter_graph / provider_family / name_regex / default). Generate `infer_strengths(entry) -> Vec<StrengthTag>` from it. Delete the duplicate in `spec.rs:230-255`.
- Success. `rg 'fn infer_strengths|fn provider_family_strengths'` shows exactly one definition (in generated code).
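
As an illustration of "ordered rules, first match wins", here is a minimal std-only sketch. The `Rule` type is hypothetical; `NameContains` stands in for the `name_regex` rule kind (no regex crate is used), the `parameter_graph` rule is left out, and strengths are plain strings rather than the generated `StrengthTag`.

```rust
/// One inference rule; rules are evaluated in order and the first match wins.
/// `NameContains` is a std-only stand-in for the `name_regex` rule kind.
pub enum Rule {
    ProviderFamily { family: &'static str, strengths: &'static [&'static str] },
    NameContains { needle: &'static str, strengths: &'static [&'static str] },
    Default { strengths: &'static [&'static str] },
}

/// Returns the strengths of the first rule that matches `model_id`.
/// Assumes IDs of the form "family/name", as in "anthropic/claude-sonnet-4.6".
pub fn infer_strengths(model_id: &str, rules: &[Rule]) -> Vec<&'static str> {
    let family = model_id.split('/').next().unwrap_or("");
    for rule in rules {
        let hit = match rule {
            Rule::ProviderFamily { family: f, strengths } if *f == family => Some(*strengths),
            Rule::NameContains { needle, strengths } if model_id.contains(*needle) => {
                Some(*strengths)
            }
            Rule::Default { strengths } => Some(*strengths),
            _ => None,
        };
        if let Some(s) = hit {
            return s.to_vec();
        }
    }
    // No rule matched and no Default rule was supplied.
    Vec::new()
}
```

Keeping the rules as data means the precedence between heuristics is visible in one list instead of being implied by call order across files.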
FIX-04. [FIXED] Declare `contracts/orchestration/providers.v1.yaml` and regenerate `ProviderType` + `key_guard`.
- Problem. `ProviderType` enum is hardcoded (`spec.rs:80-106`). `provider_secret_is_available()` is hand-mapped (`key_guard.rs:8-27`).
- Operation. New YAML: for each provider `{name, base_url, secret_id, supports_openai_compat, default_route_kind, fallback_kind}`. Codegen both.
- Success. Adding a new provider is a YAML edit only.
FIX-05. [FIXED] Declare `contracts/orchestration/model-catalog.v1.json` as the runtime catalog.
- Problem. 10 models are baked into `spec.rs:273-482` as fallback. Live OpenRouter data is merged at runtime but never persisted; restart loses it. Two sources of truth coexist silently.
- Operation. Move the 10 bootstrap entries into `contracts/orchestration/model-catalog.bootstrap.v1.json`. At runtime, `Registry::load()` reads bootstrap, then merges from `~/.vox/cache/model-catalog.v1.json` (persisted by the discovery job, FIX-30). Delete the literal `ModelSpec::new(...)` calls at `spec.rs:301, 318, 335, 353, 370, 387, 405, 428, 447`.
- Success. `rg 'ModelSpec::new\(' crates/vox-orchestrator` returns zero hits; bootstrap lives in JSON; cache auto-refreshes.
FIX-06. [FIXED] Delete the duplicate `ChatRouteBackend` in `vox-actor-runtime`.
- Problem. `crates/vox-actor-runtime/src/model_resolution.rs:22-32` redefines `ChatRouteBackend`; `vox-orchestrator/src/models/spec.rs::ProviderType` is the canonical one. Intentional decoupling exists to avoid a cycle but produces drift.
- Operation. Extract `ProviderType`, `ChatRouteBackend`, `ChatProviderRouteKind` into a new tiny leaf crate `crates/vox-orchestrator-types/` (generated from `providers.v1.yaml`). Both `vox-orchestrator` and `vox-actor-runtime` depend on it; cycle broken.
- Success. `rg 'enum ChatRouteBackend|pub enum ProviderType' crates/` returns exactly one hit each.
FIX-07. [FIXED] Kill `vox_dei::model_route` tracing targets.
- Problem. Retired crate name still appears as `tracing` span target at `crates/vox-actor-runtime/src/model_resolution.rs:183,203,219,232,246`. Violates `AGENTS.md:140` and confuses log aggregation.
- Operation. Replace `target: "vox_dei::model_route"` with `target: "vox_orchestrator::model_route"`. Add a lint in `vox ci run` guards to fail if `vox_dei::` appears anywhere outside comments or tombstone archive.
- Success. `rg '"vox_dei::'` returns zero code hits.
FIX-08. [FIXED] Resolve the `ModelSpec` vs. `ModelRegistryEntry` vs. `ModelCatalogEntry` name collision.
- Problem. Three structs (`spec.rs::ModelSpec`, `vox-actor-runtime/src/llm/types.rs::ModelRegistryEntry`, proposed `ModelCatalogEntry`) will exist simultaneously during migration.
- Operation. Keep `ModelCatalogEntry` as the wire/file type, have `ModelSpec` implement `From<&ModelCatalogEntry>`, then remove `ModelRegistryEntry` by inlining its two useful fields into `ModelSpec`.
- Success. Two structs remain (`ModelCatalogEntry` for serde, `ModelSpec` for in-memory).
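
The two-struct end state in FIX-08 might be sketched as follows. Field names on `ModelCatalogEntry` follow the catalog entry in section 2.2, reduced to an illustrative subset; the `ModelSpec` fields and the blended `cost_per_1k` average are assumptions of this sketch, and serde derives are omitted to keep it std-only.

```rust
/// Wire/file type (serde derives omitted to stay std-only).
#[derive(Debug, Clone)]
pub struct ModelCatalogEntry {
    pub model_id: String,
    pub context_length_tokens: u32,
    pub input_per_1k: f64,
    pub output_per_1k: f64,
    pub strengths: Vec<String>,
}

/// In-memory type used by the selector. Treating `cost_per_1k` as the
/// mean of input and output price is an assumption made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelSpec {
    pub id: String,
    pub context_length_tokens: u32,
    pub cost_per_1k: f64,
    pub strengths: Vec<String>,
}

impl From<&ModelCatalogEntry> for ModelSpec {
    fn from(e: &ModelCatalogEntry) -> Self {
        ModelSpec {
            id: e.model_id.clone(),
            context_length_tokens: e.context_length_tokens,
            cost_per_1k: (e.input_per_1k + e.output_per_1k) / 2.0,
            strengths: e.strengths.clone(),
        }
    }
}
```

With the conversion living in one `From` impl, the file schema can grow fields without touching selector code until a field is actually needed in memory.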
B. Intelligent selection — scoring, feedback, and self-tuning
FIX-09. [FIXED] Add `ModelScoreboard` table and `record_llm_outcome()` helper.
- Problem. We store per-call latency and tokens, but never roll up per `(model_id, task_category, strength_tag)`. `best_for()` selects purely on strength + cost (`crates/vox-orchestrator/src/models/registry.rs:240-276`).
- Operation. New SQL migration under `crates/vox-db/src/schema/domains/scientia.rs` adding `model_scoreboard` with columns `(model_id, task_category, strength_tag, window_days, n_calls, success_rate, p50_latency_ms, p99_latency_ms, cost_per_success_usd, quality_score, updated_at)`. Helper `vox_db::record_llm_outcome(ModelOutcome)` writes to both `llm_interactions` and an aggregation buffer. Nightly job (FIX-31) recomputes windows using true P50/P99 window functions.
- Success. `SELECT * FROM model_scoreboard` returns rows; `cargo test -p vox-db model_scoreboard` green. P50/P99 verified with window functions.
FIX-10. [FIXED] Make `best_for()` read the scoreboard when available.
- Problem. Selection is cost-first, not evidence-first (`registry.rs:265-267`).
- Operation. Inject `Option<&ModelScoreboard>` into `best_for()`. When present and `n_calls >= 30`, multiply the candidate cost by `(1 - quality_score)` before the cost sort. When absent or warming up, fall back to current behavior. Add `--force-cost` and `--force-model` CLI flags.
- Success. Unit test shows a historically-bad-at-codegen cheap model loses to a proven model; `vox config routing explain --task codegen` prints the ranking.
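
The cost adjustment in FIX-10 is simple arithmetic, sketched below under stated assumptions: `Candidate` is a hypothetical stand-in for whatever `best_for()` sorts, and the scoreboard lookup is reduced to an optional `(n_calls, quality_score)` pair.

```rust
#[derive(Debug, Clone)]
pub struct Candidate {
    pub model_id: &'static str,
    pub cost_per_1k: f64,
    /// (n_calls, quality_score in 0.0..=1.0) from the scoreboard, if any.
    pub score: Option<(u32, f64)>,
}

/// Minimum sample count before scoreboard evidence is trusted.
pub const MIN_CALLS: u32 = 30;

/// Cost used for sorting: raw cost scaled by (1 - quality_score) once the
/// scoreboard has at least MIN_CALLS samples, raw cost otherwise.
pub fn effective_cost(c: &Candidate) -> f64 {
    match c.score {
        Some((n, q)) if n >= MIN_CALLS => c.cost_per_1k * (1.0 - q),
        _ => c.cost_per_1k,
    }
}

/// Returns candidates sorted by ascending effective cost (stable sort).
pub fn rank(mut cands: Vec<Candidate>) -> Vec<Candidate> {
    cands.sort_by(|a, b| effective_cost(a).total_cmp(&effective_cost(b)));
    cands
}
```

One property worth noting: a `quality_score` of exactly 1.0 drives the effective cost to zero, so several perfect-scoring models would tie regardless of price; whether that needs a floor is a design decision this sketch does not make.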
FIX-11. [FIXED] Reconcile `llm_interactions` and `llm_feedback` joins.
- Problem. Aggregation can inflate `n_calls` if one interaction has multiple feedbacks.
- Operation. Use a subquery/CTE for `llm_feedback` to ensure 1:1 join in the rollup query. On `insert_gamify_ai_feedback()`, also update an `llm_outcome_hints` table with `(interaction_id, signed_score)` where thumbs_up=+1 / thumbs_down=-1. The nightly aggregator joins this into `model_scoreboard.quality_score`.
- Success. `n_calls` in scoreboard matches `COUNT(DISTINCT interaction_id)`. A thumbs-down lowers that model’s score visible via `vox model scoreboard show`.
FIX-12. [FIXED] Wire `RiskDecision::Abstain` back into routing.
- Problem. `SocratesSurfaceTelemetry.risk_decision` records abstain events (`crates/vox-db/src/socrates_telemetry.rs:142`) but never feeds re-selection.
- Operation. On `Abstain` with `confidence_estimate < 0.5`, orchestrator marks the `(model_id, task_category)` in a short-lived in-memory penalty map (10-minute TTL). `best_for()` skips penalized entries unless they are the only option. Penalty map is persisted as `model_penalty_events` for audit.
- Success. Forcing abstain in tests causes the next invocation to pick a different model.
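
A minimal sketch of the penalty map described in FIX-12, assuming string keys and an injected `now` so the TTL can be tested deterministically. `PenaltyMap` and its methods are hypothetical names; persistence to `model_penalty_events` is not shown.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-memory penalty map keyed by (model_id, task_category).
pub struct PenaltyMap {
    ttl: Duration,
    until: HashMap<(String, String), Instant>,
}

impl PenaltyMap {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, until: HashMap::new() }
    }

    /// Records an abstain; only abstains with confidence below 0.5 are penalized.
    pub fn on_abstain(&mut self, model: &str, category: &str, confidence: f64, now: Instant) {
        if confidence < 0.5 {
            self.until
                .insert((model.to_string(), category.to_string()), now + self.ttl);
        }
    }

    /// True while the penalty for this pair has not yet expired.
    pub fn is_penalized(&self, model: &str, category: &str, now: Instant) -> bool {
        self.until
            .get(&(model.to_string(), category.to_string()))
            .map_or(false, |expires| now < *expires)
    }

    /// Picks the first non-penalized candidate, falling back to the first
    /// candidate when every option is penalized.
    pub fn pick<'a>(&self, cands: &[&'a str], category: &str, now: Instant) -> Option<&'a str> {
        cands
            .iter()
            .copied()
            .find(|m| !self.is_penalized(m, category, now))
            .or_else(|| cands.first().copied())
    }
}
```

Expired entries are simply ignored on lookup here; a long-running process would probably also want to prune them periodically.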
FIX-13. [DONE] Emit `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons` on every LLM call.
- Problem. No OpenTelemetry GenAI semconv emitted. `llm_interactions.token_count` was one integer, conflating input and output.
- Operation. In `crates/vox-actor-runtime/src/llm/types.rs::ModelMetric::from_response`, populate the six required GenAI span attributes per OTel GenAI v1.37. Extended `llm_interactions` to accept detailed metrics. Split `token_count` into `input_tokens` and `output_tokens` columns.
- Success. `llm_interactions` (v59) rows contain `gen_ai`-aligned attributes for every call.
FIX-14. [DONE] Add a trace ID that follows user-request → orchestrator → provider.
- Problem. `journey_id`, `session_id`, `run_id` are subsystem-local; no causal chain.
- Operation. Generate a single `trace_id` (UUIDv7) at the top of `vox_orchestrator::handle_task()`. Propagate via `AgentTask.trace_id`; include in every telemetry row; set outbound HTTP `traceparent` header (OTel W3C) in `crates/vox-actor-runtime/src/http` for OpenRouter / Anthropic / Google calls.
- Success. `SELECT trace_id, count(*) FROM research_metrics WHERE trace_id IS NOT NULL GROUP BY 1` shows every user turn as one trace.
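
For the outbound header, the W3C Trace Context `traceparent` value has the shape `version-traceid-parentid-flags`, all lowercase hex. The sketch below formats one from a 128-bit trace ID (the same width as a UUIDv7) and a 64-bit span ID. Generating the UUIDv7 itself is not shown, and the function name is hypothetical.

```rust
/// Formats a W3C Trace Context `traceparent` header value:
/// version "00", 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags.
/// Returns None for all-zero IDs, which the specification treats as invalid.
pub fn traceparent(trace_id: u128, span_id: u64, sampled: bool) -> Option<String> {
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    // Bit 0 of trace-flags is the "sampled" flag.
    let flags: u8 = if sampled { 0x01 } else { 0x00 };
    Some(format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}"))
}
```

The 128 bits of a UUID map directly onto the trace-id field, so the same value can be stored as `trace_id` in telemetry rows and sent to providers without a second identifier.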
FIX-75. [DONE] Implement observed-cost feedback loop via `model_pricing_catalog`.
- Problem. Model pricing in `ModelSpec` was static and hardcoded, leading to stale budget estimates and inaccurate routing.
- Operation. Created `model_pricing_catalog` (v59) to track ground-truth costs derived from provider-reported usage. Implemented a nightly rollup that calculates `observed_blended_per_1k` from telemetry. Injected these observed prices into the `ModelRegistry` at runtime when sample confidence is high (>= 100 samples) or medium (>= 20 samples).
- Success. `vox model pricing show` displays real-time price corrections, and `ModelRegistry` automatically adjusts routing for providers that change their pricing dynamically.
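
The confidence thresholds above can be written down as a small decision function. This is a sketch only: the type and function names are hypothetical, and it assumes that below the medium threshold the static `ModelSpec` price is kept.

```rust
pub const HIGH_SAMPLES: u32 = 100;
pub const MEDIUM_SAMPLES: u32 = 20;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    High,
    Medium,
    Low,
}

/// Maps a telemetry sample count to a confidence band.
pub fn confidence(samples: u32) -> Confidence {
    if samples >= HIGH_SAMPLES {
        Confidence::High
    } else if samples >= MEDIUM_SAMPLES {
        Confidence::Medium
    } else {
        Confidence::Low
    }
}

/// Price used for routing: the observed blended price when confidence is
/// High or Medium, the static price otherwise (assumed fallback).
pub fn routing_price(static_per_1k: f64, observed: Option<(f64, u32)>) -> f64 {
    match observed {
        Some((price, n)) if confidence(n) != Confidence::Low => price,
        _ => static_per_1k,
    }
}
```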
FIX-15. [FIXED] Track context-window utilization per call.
- Problem. We know `ModelSpec.context_length_tokens` and per-call tokens but never store `utilization = (input+output)/context`.
- Operation. Add `context_utilization_pct` column to `llm_interactions`; compute in `ModelMetric::from_response`. When utilization > 0.8 for a `(model_id, task_category)` three times in a window, escalate selection to the next-larger context tier (planning hint into `best_for_task()`).
- Success. Scoreboard reports utilization; escalation occurs in tests.
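
The utilization formula and the "three times in a window" trigger can be sketched as two small functions. Names are hypothetical, and utilization is kept as a fraction (0.8 means 80%) to match the threshold in the text, even though the column name says `pct`.

```rust
/// utilization = (input + output) / context, as a fraction (0.85 = 85%).
/// Returns 0.0 when the context length is unknown (zero).
pub fn utilization(input_tokens: u32, output_tokens: u32, context_tokens: u32) -> f64 {
    if context_tokens == 0 {
        return 0.0;
    }
    (input_tokens as f64 + output_tokens as f64) / context_tokens as f64
}

/// True when at least `min_hits` calls in the window exceeded `threshold`.
pub fn should_escalate(window: &[f64], threshold: f64, min_hits: usize) -> bool {
    window.iter().filter(|u| **u > threshold).count() >= min_hits
}
```

For example, 150,000 input plus 20,000 output tokens against a 200,000-token context is 0.85, which counts as one hit against a 0.8 threshold.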
FIX-50. [FIXED] Track task complexity clamping.
- Operation. In `AgentTask::complexity()`, clamp `estimated_complexity` to 1-10.
- Success. Complexity is always in bounds for routing table lookup.
FIX-51. [FIXED] Support model preference hint.
- Operation. Add `model_preference: Option<String>` to `AgentTask`.
- Success. Users can suggest specific families or aliases.
FIX-52. [FIXED] Implement research intent routing.
- Operation. If `research_hints` are present, force `TaskCategory::Research`.
- Success. Search-heavy tasks pick reasoning-capable models.
FIX-53. [FIXED] Implement tool-hint floor.
- Operation. If `tool_hints.len() >= 2`, force `complexity >= 7`.
- Success. Multi-tool tasks skip lightweight models.
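
Taken together, FIX-50 through FIX-53 amount to a few normalization rules on the task before routing. A minimal sketch follows; the field types, the two-variant `TaskCategory`, and the `effective_category` method name are assumptions, not the actual definitions in `crates/vox-orchestrator`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskCategory {
    Codegen,
    Research,
}

#[derive(Debug, Clone)]
pub struct AgentTask {
    pub category: TaskCategory,
    pub estimated_complexity: i32,
    pub research_hints: Vec<String>,
    pub tool_hints: Vec<String>,
    pub model_preference: Option<String>,
}

impl AgentTask {
    /// Complexity clamped to 1..=10, with a floor of 7 for multi-tool tasks.
    pub fn complexity(&self) -> u8 {
        let base = self.estimated_complexity.clamp(1, 10) as u8;
        if self.tool_hints.len() >= 2 {
            base.max(7)
        } else {
            base
        }
    }

    /// Research hints override the declared category.
    pub fn effective_category(&self) -> TaskCategory {
        if self.research_hints.is_empty() {
            self.category
        } else {
            TaskCategory::Research
        }
    }
}
```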
FIX-16. [DONE] Track retry / fallback chains.
- Problem. Only the final result lands in `llm_interactions`; retries are invisible.
- Operation. New table `llm_attempt` with `(trace_id, attempt_number, model_id, provider, outcome, latency_ms, error_class)`; `llm_interactions` retains one row per final outcome. `vox-actor-runtime` writes `llm_attempt` rows during its fallback loop (`crates/vox-actor-runtime/src/model_resolution.rs:162`).
- Success. A forced OpenRouter 5xx produces an `llm_attempt` row with attempt_number=1 (failed) and a row in `llm_interactions` referencing the successful retry.
FIX-17. [DONE] Track OpenRouter cache-hit savings.
- Problem. OpenRouter returns `cache_tokens` in pricing (`crates/vox-orchestrator/src/catalog.rs:60`) but we never persist hits per call.
- Operation. Parse `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens` from OpenRouter and Anthropic responses; store in `llm_interactions.cache_read_tokens`. Add `cache_savings_usd` to scoreboard computation.
- Success. `vox model scoreboard show anthropic/claude-sonnet-4.6 --with-cache` prints non-zero savings when prompt-caching is active.
FIX-18. [DONE] Add budget-pre-check to `best_for()`.
- Problem. `scoring.rs:196-198` scores down rate-limited models but does not gate by explicit budget.
- Operation. Add `AgentTask.budget: Option<Budget{ max_cost_usd, max_latency_ms }>`. In `best_for()`, after sorting, drop candidates whose `expected_cost > budget.max_cost_usd`. `expected_cost = spec.cost_per_1k * estimated_token_count(task)`. Reuse `estimated_token_count` helper (already exists near `scoring.rs`).
- Success. `vox chat --max-usd 0.10` never picks claude-mythos-preview.
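
A sketch of the budget gate follows. It assumes the token estimate is a raw token count, so the per-1k price is divided by 1000; if the existing `estimated_token_count` helper already returns thousands of tokens, that division would not apply. `Spec`, `Budget` (reduced to the cost field), and `within_budget` are hypothetical stand-ins.

```rust
pub struct Budget {
    pub max_cost_usd: f64,
}

pub struct Spec {
    pub model_id: &'static str,
    pub cost_per_1k: f64,
}

/// Expected cost in USD. Assumes `estimated_tokens` is a raw token count,
/// hence the division by 1000 against a per-1k price.
pub fn expected_cost(spec: &Spec, estimated_tokens: u64) -> f64 {
    spec.cost_per_1k * estimated_tokens as f64 / 1000.0
}

/// Keeps the already-sorted candidates that fit the budget, preserving order.
/// With no budget, every candidate is kept.
pub fn within_budget<'a>(
    sorted: &'a [Spec],
    budget: Option<&Budget>,
    estimated_tokens: u64,
) -> Vec<&'a Spec> {
    sorted
        .iter()
        .filter(|s| budget.map_or(true, |b| expected_cost(s, estimated_tokens) <= b.max_cost_usd))
        .collect()
}
```

Filtering after the sort keeps the ranking logic untouched; the budget only removes options, it never reorders them.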
FIX-19. [DONE] Normalize strength tags to the enum at ingestion.
- Problem. `catalog.rs::infer_strengths()` returns `Vec<String>`; consumers match on exact string.
- Operation. Return `Vec<StrengthTag>` (generated enum from FIX-02). Any unknown inference result maps to `StrengthTag::Unknown`, logged once per unique string via `tracing::warn!`.
- Success. `rg '"codegen"|"logic"|"review"' crates/vox-orchestrator/src | wc -l` drops by ~80% (becomes `StrengthTag::Codegen` etc.).
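
The "log once per unique string" behaviour can be sketched with a small normalizer that remembers which unknown inputs it has already seen. The enum here is a hand-written three-variant stand-in for the generated `StrengthTag`, and instead of calling `tracing::warn!` the function returns a flag marking where the warning would fire.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum StrengthTag {
    Codegen,
    Logic,
    Review,
    Unknown,
}

/// Normalizes raw strings to `StrengthTag`, remembering unknown inputs so
/// each distinct one is reported only once.
#[derive(Default)]
pub struct TagNormalizer {
    seen_unknown: HashSet<String>,
}

impl TagNormalizer {
    /// Returns the tag, plus `true` when this unknown string is new
    /// (the point where a warning would be logged).
    pub fn normalize(&mut self, raw: &str) -> (StrengthTag, bool) {
        let tag = match raw.trim().to_ascii_lowercase().as_str() {
            "codegen" => StrengthTag::Codegen,
            "logic" => StrengthTag::Logic,
            "review" => StrengthTag::Review,
            _ => StrengthTag::Unknown,
        };
        let first_time =
            tag == StrengthTag::Unknown && self.seen_unknown.insert(raw.to_string());
        (tag, first_time)
    }
}
```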
FIX-20. [DONE] Publish `vox model explain` CLI.
- Problem. No way for a user to see why a given model was picked.
- Operation. `vox model explain "<task description>" [--category codegen]` prints: (a) matched strength, (b) tier, (c) ranked candidate list with per-criterion scores, (d) final pick, (e) trace_id of the most recent real call for the same category. Lives at `crates/vox-cli/src/commands/model/explain.rs`.
- Success. Command exists; regression test in `crates/vox-cli/tests/` asserts the top candidate for `codegen` at `complexity=9` matches the premium alias.
C. Automatic model discovery
FIX-21. [DONE] Introduce the `ModelCatalog` trait as the discovery plugin surface.
- Problem. `ModelCatalog` exists as a trait today but only `OpenRouterCatalog` implements it; no enumeration, no plugin registry.
- Operation. In `crates/vox-orchestrator/src/catalog.rs`, keep `trait ModelCatalog { async fn refresh(&self) -> Result<Vec<ModelCatalogEntry>>; fn name(&self) -> &'static str; }`. Add `CatalogRegistry { sources: Vec<Box<dyn ModelCatalog>> }`. Register sources in one place: `CatalogRegistry::default_sources()`.
- Success. Adding a new source (e.g., `GroqCatalog`) is a single `register(Box::new(GroqCatalog::new()))` call.
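
The registry shape can be illustrated with a synchronous simplification. In this sketch `refresh` is a plain `fn` rather than the `async fn` quoted above, entries are reduced to model-ID strings, errors are plain strings, and `StaticCatalog` plus `refresh_all` are hypothetical names used only for the example.

```rust
/// Synchronous simplification of the discovery trait; entries are
/// reduced to model IDs and errors to strings for this sketch.
pub trait ModelCatalog {
    fn refresh(&self) -> Result<Vec<String>, String>;
    fn name(&self) -> &'static str;
}

#[derive(Default)]
pub struct CatalogRegistry {
    sources: Vec<Box<dyn ModelCatalog>>,
}

impl CatalogRegistry {
    pub fn register(&mut self, source: Box<dyn ModelCatalog>) {
        self.sources.push(source);
    }

    /// Refreshes every source, returning (source name, result) pairs so
    /// one failing source does not hide the others.
    pub fn refresh_all(&self) -> Vec<(&'static str, Result<Vec<String>, String>)> {
        self.sources.iter().map(|s| (s.name(), s.refresh())).collect()
    }
}

/// Hypothetical fixed-list source, used here only for illustration.
pub struct StaticCatalog(pub &'static str, pub Vec<String>);

impl ModelCatalog for StaticCatalog {
    fn refresh(&self) -> Result<Vec<String>, String> {
        Ok(self.1.clone())
    }
    fn name(&self) -> &'static str {
        self.0
    }
}
```

Returning per-source results rather than one merged `Result` lets a caller persist what succeeded and report what failed, which matters once several providers are polled in the same run.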
FIX-22. [DONE] Add `OllamaCatalog`.
- Problem. Local Ollama models are resolved ad-hoc via `VoxPopuliModel` secret (`model_resolution.rs:136`). No catalog entry exists.
- Operation. New `crates/vox-orchestrator/src/catalog/ollama.rs` that calls `GET {OLLAMA_URL}/api/tags`, parses `models[]`, maps each to a `ModelCatalogEntry` with `provider_route.primary = Ollama`, `strengths = ["generalist"]`, `cost_per_1k = 0.0`, and `context_length_tokens` parsed from `/api/show`.
- Success. After `ollama pull llama3.2`, `vox model list --source Ollama` shows it.
FIX-23. [DONE] Add `HuggingFaceCatalog`.
- Problem. `fetch_hf_hub_text_generation_models()` (`crates/vox-actor-runtime/src/inference_env.rs`) fetches models for display but never writes them to the registry.
- Operation. New `crates/vox-orchestrator/src/catalog/hf_hub.rs`. Pages through `/api/models?filter=text-generation&sort=downloads&direction=-1&limit=200`. Marks entries `provider_route.primary = HuggingFaceRouter`. Stores a fingerprint to avoid full re-ingest.
- Success. `vox model list --source HuggingFaceRouter | head -n 20` shows the top 20 most-popular HF text-gen models; dedup works across refreshes.
FIX-24. [DONE] Add `PopuliMeshCatalog`.
- Problem. Each mesh node advertises capability hints (`VOX_MESH_ADVERTISE_GPU`, etc., via `PopuliEnv`) but there is no aggregate view of “what models can my mesh serve right now.”
- Operation. New endpoint `GET /v1/populi/models` on the Populi control plane returns the union of each peer’s `~/.vox/cache/mens/local-registry.json`. `PopuliMeshCatalog` calls it. Each entry gets `provider_route.primary = PopuliMesh`, `node_id`, `labels`.
- Success. After a second node joins the mesh and MENS finishes training on node A, `vox model list --source PopuliMesh` on node B shows the new checkpoint.
FIX-25. [DONE] Add `MensCatalog` for local MENS checkpoints.
- Problem. `mens/runs/<run_id>/training_manifest.json` exists but is never ingested into the routing registry (`crates/vox-populi/`).
- Operation. New `crates/vox-orchestrator/src/catalog/mens.rs` (or integrated in `catalog.rs`) that globs `mens/runs/*/training_manifest.json`, parses each, emits `ModelCatalogEntry`.
- Success. A newly-trained MENS checkpoint appears in `vox model list --source Mens` after running `vox model discover`.
FIX-26. [DONE] Add `AnthropicDirectCatalog` and `GoogleDirectCatalog`.
- Problem. Direct provider calls use hardcoded model IDs (`spec.rs:447` for `claude-mythos-preview-20260407`); no auto-refresh of Anthropic’s model list.
- Operation. Implement `AnthropicDirectCatalog` hitting `https://api.anthropic.com/v1/models` (uses `AnthropicApiKey`). Implement `GoogleDirectCatalog` hitting `https://generativelanguage.googleapis.com/v1beta/models` (uses `GeminiApiKey`). Both emit `ModelCatalogEntry` with pricing from their known tables (kept in `contracts/orchestration/provider-pricing-overlay.v1.yaml` because Anthropic & Google don’t publish per-model pricing via their models API).
- Success. New Anthropic model launched by vendor shows up after nightly refresh without code change to `spec.rs`.
FIX-27. [DONE] Persist discovery results to disk.
- Problem. Catalog refresh lives only in memory; restart = re-fetch = rate limit risk.
- Operation. `CatalogRegistry::refresh()` writes every source’s output to `~/.vox/cache/model-catalog.v1.json` (atomic rename). `Registry::load()` reads it at startup. TTL embedded per-source.
- Success. Second run within TTL emits zero network traffic for discovery.
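
The "atomic rename" step in FIX-27 is the usual write-to-temp-then-rename pattern, sketched here with std only. The function name and the `.tmp` sibling naming are assumptions; the rename is only atomic when the temp file and destination are on the same filesystem.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `dest` via a sibling temp file plus rename, so a
/// reader never observes a half-written catalog file.
pub fn write_atomic(dest: &Path, contents: &[u8]) -> io::Result<()> {
    // Sibling path keeps the temp file on the same filesystem as `dest`.
    let tmp = dest.with_extension("tmp");
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // flush to disk before the rename publishes the file
    }
    fs::rename(&tmp, dest)
}
```

If two processes might refresh concurrently, a unique temp name per writer (for example including the process ID) would avoid them clobbering each other's temp file; this sketch does not handle that case.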
FIX-28. Throttle and jitter discovery.
- Problem. The current jitter is OpenRouter-only (`registry.rs:41-47`).
- Operation. Move `min_refresh_interval_secs` and `jitter_ms` into `contracts/orchestration/model-routing.v1.yaml::[discovery]` per-source. Enforce in `CatalogRegistry`.
- Success. Env-based overrides still work; YAML is the default.
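
A per-source throttle with jitter can be expressed as a pure function of the policy and the time since the last refresh. The struct and function names below are hypothetical; the jitter fraction is passed in by the caller (for example from a random number generator) so the logic stays deterministic and testable.

```rust
use std::time::Duration;

/// Per-source discovery policy, mirroring the proposed `[discovery]` keys
/// `min_refresh_interval_secs` and `jitter_ms`.
pub struct DiscoveryPolicy {
    pub min_refresh_interval: Duration,
    pub max_jitter: Duration,
}

/// Minimum elapsed time since the last refresh before the next one is
/// allowed. `jitter_fraction` is clamped to 0.0..=1.0 and scales the
/// maximum jitter.
pub fn next_allowed_after(policy: &DiscoveryPolicy, jitter_fraction: f64) -> Duration {
    let frac = jitter_fraction.clamp(0.0, 1.0);
    policy.min_refresh_interval + policy.max_jitter.mul_f64(frac)
}

/// True when enough time has passed for this source to refresh again.
pub fn may_refresh(policy: &DiscoveryPolicy, since_last: Duration, jitter_fraction: f64) -> bool {
    since_last >= next_allowed_after(policy, jitter_fraction)
}
```

Adding jitter on top of the minimum interval spreads refreshes from many nodes over time instead of having them all hit a provider at the same instant.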
FIX-29. [DONE] Enforce a catalog freshness SLO.
- Problem. No alert if the catalog goes stale (provider outage, auth expired).
- Operation. `vox model discover --dry-run --check-freshness` returns non-zero if the time since any source’s `last_refresh` exceeds `max_age` from the YAML. Wire into `vox doctor` via `Model Scoreboard` freshness check.
- Success. Deliberately expiring the cache causes `vox doctor` to report `WARN: Model Scoreboard stale`.
FIX-30. [DONE] Add a `vox model discover` CLI front-end.
- Problem. Discovery is implicit; users cannot force-refresh. The scoreboard rollup also needs pricing joins to compute `cost_per_success_usd`.
- Operation. Implement `vox model discover` and `vox model rollup`. Add `cost_usd` to `llm_interactions` for historical accuracy.
- Success. `vox model discover --source OpenRouter --force` prints counts; `vox model rollup` calculates accurate costs.
FIX-31. [FIXED] Schedule a nightly discover + scoreboard roll-up.
- Problem. No cron, no persistent scheduled task surface.
- Operation. Use the existing `scheduled-tasks` MCP / skill surface referenced in this repo’s ops tooling to create:
  - `vox-model-discover-nightly`: runs `vox run scripts/orchestrator/model_discover.vox` daily at 03:00 local.
  - `vox-scoreboard-rollup-nightly`: runs `vox run scripts/orchestrator/scoreboard_rollup.vox` daily at 03:15.
- Operation cont. Both scripts are `.vox` files (per `AGENTS.md` VoxScript-First policy). Write them to `scripts/orchestrator/`.
- Success. `vox scheduled-tasks list` shows both tasks; missing-run alerts via `vox doctor`.
FIX-32. [FIXED] Expose `vox model scoreboard show` and `vox model scoreboard export --csv`.
- Problem. Scoreboard invisible to users.
- Operation. New CLI under `crates/vox-cli/src/commands/model/scoreboard.rs`.
- Success. CSV round-trip parses; dashboard can consume.
D. Telemetry hardening
FIX-33. Introduce `contracts/orchestration/model-telemetry.v1.yaml`.
- Problem. Event names and field sets are declared ad-hoc across `crates/vox-db/src/*_telemetry.rs`.
- Operation. Enumerate every event (`vox.model.request`, `vox.model.response`, `vox.model.error`, `vox.model.attempt`, `vox.model.discover`, `vox.model.score_update`) with attributes mapping to OTel GenAI semconv names. Generate validators in `research_metrics_contract.rs`.
- Success. `vox ci telemetry-validate` passes; unknown events rejected.
FIX-34. Add session-prefix enforcement.
- Problem. Prefixes `bench:`, `mcp:`, `workflow:` are by convention only.
- Operation. In `validate_research_metric_row()` (`crates/vox-db/src/research_metrics_contract.rs`), require `session_id` to start with one of the registered prefixes from the YAML. Fail-closed in strict profile.
- Success. Tests with wrong prefix reject at insert.
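
The fail-closed prefix check might look like the sketch below. The prefix list is hardcoded from the three named in the text (the proposal is to load it from the YAML), the function name is hypothetical, and rejecting an empty remainder such as `"mcp:"` is an extra assumption of this sketch.

```rust
/// Prefixes named in the text; the real list would come from the YAML.
pub const SESSION_PREFIXES: &[&str] = &["bench:", "mcp:", "workflow:"];

/// Fail-closed check: accepts only IDs that start with a registered prefix
/// and have a non-empty remainder after it.
pub fn validate_session_id(session_id: &str) -> Result<(), String> {
    let ok = SESSION_PREFIXES.iter().any(|p| {
        session_id
            .strip_prefix(*p)
            .map_or(false, |rest| !rest.is_empty())
    });
    if ok {
        Ok(())
    } else {
        Err(format!("session_id {session_id:?} lacks a registered prefix"))
    }
}
```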
FIX-35. Replace `target: "vox_dei::*"` tracing targets, repo-wide (pairs with FIX-07).
- Problem. Legacy span names make dashboards diverge from module names.
- Operation. Run `rg -l 'target: "vox_dei'`, then rewrite each occurrence to `target: "vox_orchestrator::<file_stem>"`.
- Success. `rg 'target: "vox_dei'` returns zero.
FIX-36. Delete `detect_constructs()` in `vox-eval`.
- Problem. Deprecated since 0.4.0 (`crates/vox-eval/src/lib.rs:194`).
- Operation. Remove function and its callers (use `ast_eval()`); bump minor version; update `CHANGELOG.md`.
- Success. `rg 'detect_constructs'` returns zero.
FIX-37. Make the Socrates double-write transactional.
- Problem. `record_socrates_surface_event()` and `record_socrates_eval_summary()` are separate writes; partial-failure loses rollup (`crates/vox-db/src/socrates_telemetry.rs:142`).
- Operation. Wrap both in a single libsql/turso transaction; return `Result<()>` that fails if either side errors; add unit test with a mock-failing connection.
- Success. Simulated failure leaves zero partial rows.
FIX-38. Emit telemetry via OTel OTLP when `VoxTelemetryUploadUrl` is set.
- Problem. Telemetry sinks only to local `research_metrics`; remote upload exists (`docs/src/adr/023-optional-telemetry-remote-upload.md`) but isn’t OTel-shaped.
- Operation. Add `vox-actor-runtime/src/telemetry/otlp.rs` exporter that mirrors each `gen_ai.*` span to OTLP HTTP when the upload URL is configured. Respect `VoxTelemetryUploadToken` (vox-secrets).
- Success. `vox telemetry test` delivers a span to a local Jaeger/OTel-collector.
FIX-39. Document and enforce the `trace_id` contract.
- Problem. `trace_id` added in FIX-14 has no written contract.
- Operation. Extend `docs/src/reference/telemetry-metric-contract.md` with a “Trace ID” section: UUIDv7 required, propagated via `traceparent` outbound, stored as `trace_id` on every event row.
- Success. `vox ci telemetry-validate` rejects rows lacking `trace_id` for events from the `vox.model.*` family.
FIX-40. Add `vox.model.attempt` event emission in the retry loop.
- Problem. `llm_attempt` rows (FIX-16) need a matching telemetry event for live dashboards.
- Operation. Every attempt fires `vox.model.attempt` with `gen_ai.request.model`, `attempt_number`, `outcome`, `error_class`.
- Success. Dashboards can compute per-provider failure rates without joining SQL.
E. Secrets & decentralized secret distribution
FIX-41. Fix OPENAI_API_KEY violation in vox-schola.
- Problem. `crates/vox-schola/src/curator.rs` reads `OPENAI_API_KEY` directly via `std::env::var`. Violates `AGENTS.md:58`.
- Operation. Replace with `vox_secrets::resolve_secret(SecretId::OpenaiApiKey)?`. Delete the `env::var` line. Add a unit test verifying the call fails open in `Profile::Dev` when the secret is missing.
- Success. `vox ci secret-env-guard` passes.
FIX-42. [DONE] Migrate TOGETHER_FINETUNE_MODEL to vox-secrets or config.
- Problem. `crates/vox-cli/src/commands/ai/train.rs` reads it directly.
- Operation. Decide: if it is a secret, add `SecretId::TogetherFinetuneModel` to `crates/vox-secrets/src/spec/registry/llm.rs` and migrate. If it is a non-secret model name, add it to `OPERATOR_TUNING_ENVS` in `crates/vox-secrets/src/lib.rs` (line ~59).
- Success. `secret-env-guard` passes.
FIX-43. [DONE] Migrate GEMINI_DIRECT_MODEL and OPENROUTER_GEMINI_MODEL to the config domain.
- Problem. `crates/vox-config/src/routing_policy.rs` reads them directly for routing choice.
- Operation. Add both to `OPERATOR_TUNING_ENVS` (they are routing preferences, not secrets). Documented in the new `docs/src/reference/routing-env.md`.
- Success. `secret-env-guard` passes; the routing table generator reads the names from one place.
FIX-44. [DONE] Add POPULI_URL to vox-secrets spec (as non-secret config) or rename.
- Problem. `crates/vox-config/src/inference.rs:68-72` reads `POPULI_URL` and falls back to `OLLAMA_URL`. Neither is in vox-secrets. The name is confusing: this is the local Ollama base URL used by Populi, not an auth key.
- Operation. Add to `OPERATOR_TUNING_ENVS`. Add a deprecation alias: prefer `VOX_POPULI_LOCAL_OLLAMA_URL` going forward; keep `POPULI_URL` and `OLLAMA_URL` as deprecated aliases with a doctor warning.
- Success. Doctor prints the canonical name; older names still work.
FIX-45. [DONE] Add shareable: bool to SecretSpec and default per-secret.
- Problem. Foundation for FIX-46–FIX-50. Today the spec has no “share across my own mesh” flag.
- Operation. Extend `crates/vox-secrets/src/spec/mod.rs::SecretSpec` with `shareable: bool` and `sensitivity: Sensitivity { UserMeshOnly, UserMeshAndExternalVault }`. Default `true` for LLM API keys (`OpenRouterApiKey`, etc.), default `false` for `VoxIdentityKeyPath`, `VoxMeshJwtHmacSecret`, `VoxIdentityMasterPwd`.
- Success. `cargo test -p vox-secrets shareable_defaults` green.
FIX-46. [DONE] Implement X25519 sealed-box in vox-crypto.
- Problem. We have ChaCha20-Poly1305 (symmetric) and Ed25519 (signing) but no X25519 KEM (asymmetric encryption). Required for wrapping a secret for a specific peer without a pre-shared key.
- Operation. Add the `x25519-dalek` dependency (pure Rust, already in the Rust crypto ecosystem allowlist; no cmake/nasm). Expose in `crates/vox-crypto/src/facades.rs`: `fn seal(recipient_pub: &X25519PublicKey, plaintext: &[u8]) -> SealedBox` and `fn unseal(recipient_priv: &X25519PrivateKey, sealed: &SealedBox) -> Result<Vec<u8>>` using libsodium-style `crypto_box_seal` semantics (ephemeral sender key + ChaCha20-Poly1305).
- Success. Unit test: Alice seals → Bob unseals, round trip in 1ms.
FIX-47. Add X25519 keypair to NodeIdentity.
- Problem. `NodeIdentity` has Ed25519 only (`crates/vox-identity/src/identity.rs:20-31`).
- Operation. Add `x25519_private_key` and `x25519_public_key` (X25519 is a key-agreement primitive, so the private half is not a signing key); store alongside the Ed25519 keypair. Derive deterministically from the same seed via HKDF-BLAKE3(ed25519_seed, “vox-x25519-v1”).
- Success. Node advertises `x25519_pub` in its capability record; `vox populi nodes` shows it.
FIX-48. Implement vox secrets pair device-pairing flow.
- Problem. No user journey for “install my key once, use on any node.”
- Operation. On Node A: `vox secrets pair` generates a 128-bit nonce, prints a 5-word mnemonic and a QR encoding `{node_a_x25519_pub, nonce, expires_unix_ms}`. On Node B: `vox secrets pair --accept <mnemonic|QR>` performs X25519 ECDH with A’s public key, constructs `PairingRequest { node_b_x25519_pub, signed=Ed25519(nonce) }`, and sends it to A via the Populi control plane (FIX-49). A verifies the Ed25519 signature, prompts the user to approve `"pair with <nickname> (x25519_pub fingerprint)"`, and writes both peers into each side’s `TrustedNodeRegistry` (`crates/vox-identity/src/storage.rs`).
- Success. End-to-end test: two in-process mesh nodes complete pairing in <2s; replay of the same mnemonic fails.
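The replay and expiry requirements in the pairing flow can be sketched as a single-use nonce table on Node A. This is an assumption-laden illustration: `PendingOffers`, `PairError`, `issue`, and `redeem` are hypothetical names, and the ECDH step, the Ed25519 verification, and the user approval prompt are all omitted.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum PairError {
    /// The nonce was never issued, or it was already redeemed once.
    UnknownOrReplayedNonce,
    /// The offer's `expires_unix_ms` has passed.
    Expired,
}

/// Node A's outstanding pairing offers: nonce -> expires_unix_ms.
#[derive(Default)]
pub struct PendingOffers {
    expires_by_nonce: HashMap<u128, u64>,
}

impl PendingOffers {
    /// Called when `vox secrets pair` prints a new mnemonic/QR.
    pub fn issue(&mut self, nonce: u128, expires_unix_ms: u64) {
        self.expires_by_nonce.insert(nonce, expires_unix_ms);
    }

    /// Consumes the nonce on first use, so a replayed mnemonic/QR is rejected.
    pub fn redeem(&mut self, nonce: u128, now_unix_ms: u64) -> Result<(), PairError> {
        let expires = self
            .expires_by_nonce
            .remove(&nonce)
            .ok_or(PairError::UnknownOrReplayedNonce)?;
        if now_unix_ms > expires {
            return Err(PairError::Expired);
        }
        Ok(())
    }
}
```

Removing the nonce before checking expiry means an expired offer cannot be retried either, which seems the safer default for a pairing secret.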
FIX-49. Implement SecretsSync gossip.
- Problem. Secrets are per-node today.
- Operation. New crate `crates/vox-secrets-sync/` (or a submodule of `vox-secrets`):
  - On a local secrets write (`set`, `import-env`), iterate `TrustedNodeRegistry`. For each peer:
    - Read the current value for each `SecretSpec { shareable: true }`.
    - Seal via `vox_crypto::seal(peer.x25519_pub, value)` (FIX-46).
    - Wrap in envelope `SecretsSyncEnvelope { sender_node_id, secret_id, sealed_ciphertext, version_counter, signed_at, signature: Ed25519 }`.
    - Deliver over `A2ADeliverRequest.jwe_payload` (`crates/vox-populi/src/transport/mod.rs:76`).
  - On receive: verify the signature, verify the peer is trusted, unseal, write to local vox-secrets with source = `SyncedFrom(peer_node_id)`, and bump the local version counter.
  - Last-writer-wins on conflicts, tagged by `signed_at`.
- Success. Setting `OPENROUTER_API_KEY` on Node A via `vox secrets set` causes Node B to have it within ~500ms; `vox secrets status` on B shows source `SyncedFrom(A)`.
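The last-writer-wins rule on the receive path can be sketched as follows. The `Envelope` struct mirrors the field names listed above but leaves out the signature; `SyncedSecrets`, `apply`, and `source_of` are hypothetical. Breaking ties on `sender_node_id` is an added assumption (the plan only says conflicts are tagged by `signed_at`), and signature verification, the trust check, and unsealing are not shown.

```rust
use std::collections::HashMap;

/// Simplified envelope; the Ed25519 signature field is omitted in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub sender_node_id: String,
    pub secret_id: String,
    pub sealed_ciphertext: Vec<u8>,
    pub version_counter: u64,
    pub signed_at: u64, // unix milliseconds
}

/// Latest accepted envelope per secret id on the receiving node.
#[derive(Default)]
pub struct SyncedSecrets {
    latest: HashMap<String, Envelope>,
}

impl SyncedSecrets {
    /// Applies `incoming` if it is newer than what is stored; returns true on change.
    pub fn apply(&mut self, incoming: Envelope) -> bool {
        let newer = match self.latest.get(&incoming.secret_id) {
            None => true,
            // Assumption: equal timestamps are resolved by sender id so all peers converge.
            Some(cur) => {
                (incoming.signed_at, &incoming.sender_node_id)
                    > (cur.signed_at, &cur.sender_node_id)
            }
        };
        if newer {
            self.latest.insert(incoming.secret_id.clone(), incoming);
        }
        newer
    }

    /// Which peer the stored value came from (the `SyncedFrom(..)` source).
    pub fn source_of(&self, secret_id: &str) -> Option<&str> {
        self.latest.get(secret_id).map(|e| e.sender_node_id.as_str())
    }
}
```

Because ordering relies on sender-side `signed_at`, clock skew between nodes can let an older write win; the `version_counter` field could serve as an additional guard if that turns out to matter.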
FIX-50. Add vox secrets sync --now and --dry-run.
- Problem. Need manual control for initial bootstrap and audit.
- Operation. `vox secrets sync --now` pushes current local state to all trusted peers. `--dry-run` lists what would be pushed without sending.
- Success. Users can force sync after connectivity drops.
FIX-51. Add vox secrets rotate <secret> (per-peer re-encryption).
- Problem. Key rotation today means editing on every node.
- Operation. Rotation bumps the local version and triggers a `SecretsSync` push with the new value; peers replace on receipt. The old value is archived locally for 24h for rollback.
- Success. Rotation audit trail in `secrets_audit_log`.
FIX-52. Populate A2ADeliverRequest.jwe_payload end-to-end.
- Problem. The field exists but is always empty (`crates/vox-populi/src/transport/mod.rs:76`).
- Operation. `SecretsSync` is the first consumer. Document that the field is free-form encrypted bytes and agnostic to the cipher: vox-secrets uses `seal()` output, other callers may use OpenPGP or JWE. Gate in the handler to reject oversized payloads (> 64 KiB).
- Success. Integration test: `jwe_payload` is non-empty on a sync delivery; the handler unseals successfully.
FIX-53. Add structured log redaction middleware.
- Problem. No central redactor; a developer adding `debug!("{:?}", secret)` could leak.
- Operation. `crates/vox-actor-runtime/src/telemetry/redact.rs`: a `tracing` layer that scans each event’s fields for known patterns (regex: `sk-[A-Za-z0-9_-]{20,}`, `xoxp-…`, `AIza[0-9A-Za-z\\-_]{35}`, etc.) and replaces matches with `<REDACTED:<kind>>`. The `sk-` character class includes `-` so that hyphenated keys such as `sk-live-…` match. Install in the default subscriber.
- Success. `tracing::debug!("{}", "sk-live-abcdefghijklmnopqrstuvwx")` emits `<REDACTED:openai>`.
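A dependency-free sketch of the matching step for one pattern is shown below. It hand-rolls the equivalent of `sk-[A-Za-z0-9_-]{20,}`; the hyphen in the class is an assumption made here so that the `sk-live-…` success example actually matches. A real layer would more likely use the `regex` crate and hook into `tracing`, neither of which is shown, and `redact_openai_keys` is a hypothetical name.

```rust
/// Bytes allowed in the token body: the class `[A-Za-z0-9_-]`.
fn is_token_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'-'
}

/// Replaces every `sk-` token with at least 20 body characters by `<REDACTED:openai>`.
pub fn redact_openai_keys(input: &str) -> String {
    const PREFIX: &[u8] = b"sk-";
    const MIN_BODY: usize = 20;
    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut copied_to = 0; // start of the tail that has not been copied yet
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(PREFIX) {
            let body_start = i + PREFIX.len();
            let body_len = bytes[body_start..]
                .iter()
                .take_while(|b| is_token_byte(**b))
                .count();
            if body_len >= MIN_BODY {
                // `i` and the token end sit next to ASCII bytes, so both are
                // valid UTF-8 boundaries for slicing.
                out.push_str(&input[copied_to..i]);
                out.push_str("<REDACTED:openai>");
                i = body_start + body_len;
                copied_to = i;
                continue;
            }
        }
        i += 1;
    }
    out.push_str(&input[copied_to..]);
    out
}
```

Like the regex it imitates, this has no word-boundary check, so a long `sk-` run inside a larger identifier is redacted too; that errs on the side of over-redaction.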
FIX-54. Format-string leak audit.
- Problem. `crates/vox-container/src/docker.rs` and `.../podman.rs` use `format!("{key}={val}")` for `--build-arg`; values could contain secrets (identified as moderate-risk).
- Operation. Rewrite the flag construction to use `--build-arg <key>` with the value placed via a tempfile (or `--secret`). Prohibit passing any `SecretSpec`-managed value as a build arg; reject in the container builder with a clear error.
- Success. Attempting to pass a vox-secrets secret as a docker build arg fails fast with remediation.
FIX-55. vox secrets import-env UX pass.
- Problem. Import is fire-and-forget; users don’t know what happened.
- Operation. Add a summary table: `(secret_id, source_env_name, action: imported|skipped|exists)`. `--interactive` prompts before each import. After a successful import, offer `vox secrets sync --now`.
- Success. The first-time-user journey ends in a working multi-node setup in <2 minutes.
FIX-56. Break-glass: vox secrets unpair <node_id>.
- Problem. No revocation path.
- Operation. Removes the peer from `TrustedNodeRegistry`, bumps a revocation counter, and broadcasts a signed `UNPAIR` message so other peers also drop it. Future sync deliveries from the removed peer are rejected.
- Success. A revoked node stops receiving sync updates within one round trip.
F. Vox.toml / operator surfaces / dead code
FIX-57. Remove duplicate ModelTier definition.
- Problem. (Part of FIX-02 but listed separately for the PR).
- Operation. Delete `crates/vox-orchestrator/src/models/routing_table.rs:6-17`; reroute imports.
- Success. `cargo build` green; `rg 'enum ModelTier'` returns one match.
FIX-58. Collapse premium_alias layering.
- Problem. Two layers (`spec.rs:206-223` hardcoded, `registry.rs:162-166` TOML override). Both survive convergence but the layering is implicit.
- Operation. Document the layering explicitly in `contracts/orchestration/model-routing.v1.yaml::[premium_alias]`. The TOML override under `~/.vox/models.toml` is the operator escape hatch, YAML is the repo default, and the built-in layer is retired.
- Success. `vox model explain --show-layers` prints exactly which layer won.
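The precedence described above (operator TOML over repo YAML, with the built-in layer retired) can be sketched as a lookup that also reports the winning layer. The maps stand in for already-parsed `[premium_alias]` tables; `AliasLayer` and `resolve_premium_alias` are hypothetical names, and the model ids in any usage are placeholders.

```rust
use std::collections::HashMap;

/// Which layer supplied the alias, for `--show-layers`-style output.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AliasLayer {
    /// `~/.vox/models.toml`, the operator escape hatch.
    OperatorToml,
    /// The repo-default YAML contract.
    RepoYaml,
}

/// Resolves the premium alias for `task`; the operator override wins if present.
pub fn resolve_premium_alias<'a>(
    task: &str,
    operator_toml: &'a HashMap<String, String>,
    repo_yaml: &'a HashMap<String, String>,
) -> Option<(&'a str, AliasLayer)> {
    if let Some(model) = operator_toml.get(task) {
        return Some((model.as_str(), AliasLayer::OperatorToml));
    }
    repo_yaml
        .get(task)
        .map(|model| (model.as_str(), AliasLayer::RepoYaml))
}
```

Returning the layer alongside the model id keeps the "which layer won" answer a by-product of resolution rather than a second code path that could drift.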
FIX-59. Deduplicate Gemini model IDs.
- Problem. `spec.rs:336` uses `google/gemini-2.0-flash-lite` (free tier); `crates/vox-config/src/routing_policy.rs:127` defaults to `gemini-2.5-flash`; `spec.rs:388` uses `google/gemini-2.5-pro-preview`; there is no preview-suffix convention.
- Operation. Centralize in `contracts/orchestration/model-catalog.bootstrap.v1.json`; use the live catalog as the source of truth post-bootstrap. Remove hardcoded IDs from `routing_policy.rs` (read from the registry via `Registry::best_of_family("google")`).
- Success. A text grep for each Gemini ID returns a single hit.
FIX-60. Remove research_eval_runs.tier_distribution_json if unused.
- Problem. The column is populated but has no consumer (`crates/vox-db/src/schema/domains/scientia.rs`).
- Operation. Add a dashboard consumer (scoreboard.rollup tier distribution by provider) OR drop the column in a migration.
- Success. Either it is consumed or it is gone.
FIX-61. Replace gpt-4o-mini and gpt-4o bootstrap fallbacks.
- Problem. Hardcoded legacy fallbacks (`crates/vox-config/src/bootstrap_inference.rs:6-20`) may be deprecated by OpenAI in 2026, which would cause silent selection of unavailable models.
- Operation. Replace with `openrouter/auto` for the generic fallback; replace `RESEARCH_FLASH_FALLBACK` with `google/gemini-2.5-flash` resolved via the registry (FIX-59 machinery); replace `REVIEW_PREMIUM_FALLBACK` with the premium_alias for `review`.
- Success. `rg 'gpt-4o-mini|gpt-4o' crates/vox-config` returns zero code hits.
FIX-62. Turn models.toml into an override-only surface.
- Problem. `registry.rs:139-154` lazily creates `~/.vox/models.toml` with all defaults; users then hand-edit, so the file and the live catalog diverge silently.
- Operation. Change semantics: `models.toml` contains ONLY `[premium_alias]` and `[overrides.<model_id>]` sections; no `[[models]]` arrays. Migration: on first boot after this change, rewrite the existing `models.toml`, stripping the `[[models]]` table and leaving a comment pointer.
- Success. A pristine install has a minimal `models.toml`; overrides survive.
FIX-63. Retired naming — archive legacy orchestrator DEI identifiers in logs / docs / symbols.
- Problem. Mixed references; developers expect the retired orchestrator codename to exist somewhere.
- Operation. Grep for `vox_dei`, `dei_`, and hyphenated legacy spellings across `crates/**/src/**.rs` and either rename or add `#[allow(dead_code)] mod dei_shim` with a doc comment pointing to the new home. (One known non-code hit is `docs/src/reference/`; leave archival mentions in `docs/src/archive`.)
- Success. Only the archived directory and the retirement table in `AGENTS.md` mention the legacy orchestrator codename.
FIX-64. Gate MENS training scripts behind vox ci secret-env-guard.
- Problem. Training scripts under `scripts/mens/` may read envs directly.
- Operation. Add a `.vox` script linter that fails on `env.get(...)` for anything whose name matches a `SecretSpec` regex.
- Success. PR CI fails if a `.vox` script reads a managed secret directly.
FIX-65. Track cost-per-success on MENS vs. remote.
- Problem. Operators cannot answer “was MENS cheaper than OpenRouter this week?”
- Operation. `vox model scoreboard show --group-by provider` reports `cost_per_success_usd` per provider. Add a weekly rollup report to `vox doctor --report weekly`.
- Success. Report present and human-readable.
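One plausible reading of `cost_per_success_usd` is total spend for a provider (failed attempts included) divided by its number of successful calls. The sketch below uses that definition as an assumption; the plan does not define the metric, and `CallRecord` and the provider labels are hypothetical.

```rust
use std::collections::BTreeMap;

/// One completed call as it might be read back from telemetry (hypothetical shape).
pub struct CallRecord<'a> {
    pub provider: &'a str,
    pub cost_usd: f64,
    pub success: bool,
}

/// Cost per successful call, per provider; `None` when a provider had no successes.
pub fn cost_per_success_usd(calls: &[CallRecord<'_>]) -> BTreeMap<String, Option<f64>> {
    // provider -> (total cost, number of successes)
    let mut totals: BTreeMap<String, (f64, u64)> = BTreeMap::new();
    for call in calls {
        let entry = totals.entry(call.provider.to_string()).or_insert((0.0, 0));
        entry.0 += call.cost_usd; // failed attempts still cost money
        if call.success {
            entry.1 += 1;
        }
    }
    totals
        .into_iter()
        .map(|(provider, (cost, successes))| {
            let per_success = if successes > 0 {
                Some(cost / successes as f64)
            } else {
                None
            };
            (provider, per_success)
        })
        .collect()
}
```

Counting failed spend in the numerator is what lets the report answer "was MENS cheaper than OpenRouter this week?" fairly, since a cheap provider that fails often is not actually cheap.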
FIX-66. Fix the ProviderType::Custom(String) lint hole.
- Problem. The `Custom(String)` variant (`spec.rs:80-106`) short-circuits strength inference and scoreboard keys because the string varies per install.
- Operation. Canonicalize via `host_of(base_url)` when storing in telemetry & scoreboard; keep the raw string for display only.
- Success. Two installs with the same custom provider share scoreboard rows.
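A minimal sketch of what `host_of(base_url)` might do is shown below, using only the standard library. The exact canonical form is not specified in this plan, so the choices here (drop scheme, userinfo, port, path and query; lowercase the host; unwrap bracketed IPv6) are assumptions, and a production version would more likely lean on a URL-parsing crate.

```rust
/// Extracts a lowercase host from a base URL, or None if no host is present.
pub fn host_of(base_url: &str) -> Option<String> {
    // Drop the scheme if there is one.
    let rest = base_url.split_once("://").map_or(base_url, |(_, r)| r);
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop `user:password@` if present.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = if let Some(stripped) = host_port.strip_prefix('[') {
        // Bracketed IPv6 literal such as `[::1]:11434`.
        stripped.split_once(']')?.0
    } else {
        // Drop `:port` if present.
        host_port.split(':').next()?
    };
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}
```

With this shape, `https://LLM.Example.com:8443/v1` and `http://llm.example.com/v1/chat` both key to the same scoreboard row, which is the sharing behaviour the success criterion describes.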
FIX-67. Validate Vox.toml against a schema.
- Problem. `Vox.toml` is parsed ad hoc per section; there is no JSON Schema.
- Operation. Add `contracts/Vox.v1.schema.json`; validate via a new `vox ci vox-toml-validate` guard.
- Success. Unknown keys in `Vox.toml` fail CI.
FIX-68. Document every operator knob in docs/src/reference/routing-env.md.
- Problem. `VOX_AUTO_MODEL_STRATEGY`, `VOX_AUTO_ROUTING_PRIORITY`, `VOX_GEMINI_ROUTE_POLICY`, `VOX_OPENROUTER_CATALOG_MIN_REFRESH_INTERVAL_SECS`, and `VOX_OPENROUTER_CATALOG_REFRESH_JITTER_MS` have no single docs home.
- Operation. Write the reference doc; link it from `AGENTS.md` “Related Operational Surfaces” and from `docs/src/reference/cli.md`.
- Success. Every env in `OPERATOR_TUNING_ENVS` appears in the doc.
FIX-69. [DONE] vox doctor reports catalog source health.
- Problem. Users don’t know which sources failed to refresh.
- Operation. Add a `CatalogSourceHealth` check that prints `OpenRouter: 312 models (fresh 02h ago) | Ollama: 8 models (fresh 05m ago) | PopuliMesh: 3 models (fresh 00m ago) | HFHub: STALE (38h ago)`.
- Success. Stale sources are visually obvious in the doctor output.
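The one-line health summary above can be produced by a small formatter like the sketch below. `SourceStatus` and `render_catalog_health` are hypothetical names, and the staleness threshold is passed in as a parameter because the plan does not say when a source counts as stale; the usage below assumes 24 hours.

```rust
/// Snapshot of one catalog source (hypothetical shape).
pub struct SourceStatus<'a> {
    pub name: &'a str,
    pub model_count: usize,
    /// Seconds since the last successful refresh.
    pub age_secs: u64,
}

/// Two-digit age in hours once an hour has passed, otherwise in minutes.
fn format_age(age_secs: u64) -> String {
    if age_secs >= 3600 {
        format!("{:02}h", age_secs / 3600)
    } else {
        format!("{:02}m", age_secs / 60)
    }
}

/// Renders `Name: N models (fresh XX ago)` or `Name: STALE (XX ago)`, joined by ` | `.
pub fn render_catalog_health(sources: &[SourceStatus<'_>], stale_after_secs: u64) -> String {
    sources
        .iter()
        .map(|s| {
            let age = format_age(s.age_secs);
            if s.age_secs > stale_after_secs {
                format!("{}: STALE ({} ago)", s.name, age)
            } else {
                format!("{}: {} models (fresh {} ago)", s.name, s.model_count, age)
            }
        })
        .collect::<Vec<_>>()
        .join(" | ")
}
```

Feeding it the four sources from the example with a 24-hour threshold reproduces the sample line shown in the operation.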
FIX-70. [FIXED] Update docs/src/architecture/research-index.md.
- Problem. New research/plan docs must be linked.
- Operation. Add an entry for this document (`model-orchestration-ssot-audit-2026.md`).
- Success. `vox ci research-index-check` green.
Part 4 — Industry alignment notes
4.1 OpenTelemetry GenAI semconv v1.37
The OpenTelemetry project standardizes `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`, and related attributes across LLM providers. Adopting this surface makes every Vox model call legible to any OTel collector (Datadog, Jaeger, Honeycomb, Grafana Tempo). See FIX-13, FIX-33, FIX-38. (OpenTelemetry GenAI spec).
4.2 Model-router landscape
RouteLLM (LMSYS) is open-source routing logic that pairs a classifier with a two-model setup (cheap + strong) and reports up to ~85% cost reduction at ~95% benchmark retention. LiteLLM is a unified proxy for 100+ providers with fallbacks, budgets, and an admin UI. Martian classifies prompts with a small local model to pick the optimal destination. OpenRouter itself is a gateway plus catalog. The Vox orchestrator can borrow RouteLLM’s “trained classifier” pattern (future work; FIX-10 lays the data foundation) while keeping our own gateway instead of LiteLLM, because we need secret-plane integration and mesh-node routing that third-party gateways do not provide. (LLM router comparison 2026).
4.3 OpenRouter models API
`GET /api/v1/models` returns `data[].pricing.{prompt,completion,request,image,reasoning}`, `data[].context_length`, `data[].architecture.{input_modalities,output_modalities}`, and `data[].supported_parameters[]`. `GET /api/v1/models/{author}/{slug}/endpoints` returns per-endpoint uptime and rate limits. These are exactly the inputs Vox needs for `ModelCatalogEntry`. (OpenRouter models endpoint).
4.4 Ollama API
`GET /api/tags` returns local models; `POST /api/show` returns parameter count, context length, quantization, and template. This covers everything `OllamaCatalog` (FIX-22) needs. (Ollama list models).
4.5 Cryptography for mesh-secret sync
age is a file-encryption format and tool built around X25519 recipients and ChaCha20-Poly1305, and rage is its pure-Rust implementation; these are the same primitives we already allow (`AGENTS.md:76`). We will not depend on age directly but will use the same pattern with `x25519-dalek` and our own ChaCha20-Poly1305 bindings. This keeps the bans in `AGENTS.md` intact (no ring, no AEGIS, no cmake/nasm). See FIX-46. (rage documentation).
Part 5 — Staged rollout (high-level)
Stage 1: SSOT scaffolding (FIX-01..08, FIX-33, FIX-70). Land the YAML contracts + codegen, collapse the enum duplicates, retire `vox_dei::*` targets. No runtime behavior change.
Stage 2: Telemetry v1 (FIX-13..17, FIX-34..40). OTel GenAI attributes on every call; trace IDs; retry/attempt table; cache savings.
Stage 3: Discovery v1 (FIX-21..32). Plugin trait, per-source catalogs, disk cache, nightly schedule. Users gain `vox model discover` and `vox model list --source <name>`.
Stage 4: Scoreboard + self-tuning (FIX-09..12, FIX-18..20). Write the scoreboard table, roll up, feed into `best_for()`, expose `vox model explain` / `vox model scoreboard show`. This is where “the code learns which models to use.”
Stage 5: Secrets v2 (FIX-41..56). Secret-env-guard fixes, X25519 primitives, device pairing, SecretsSync gossip. This is where the single-login-across-mesh user journey completes.
Stage 6: Cleanup & docs (FIX-57..69). Bootstrap-model rename, operator-knob documentation, dead-code removal, doctor polish.
Part 6 — Success metrics
- Zero hardcoded model IDs outside `contracts/orchestration/model-catalog.bootstrap.v1.json` (grep-able in CI).
- `rg 'enum (ModelTier|ProviderType|StrengthTag|ChatRouteBackend)'` returns one match each.
- Every LLM call emits `gen_ai.request.model` and `trace_id`.
- `model_scoreboard` has ≥1 row per `(model_id, task_category)` pair seen in the last 30 days.
- A user installs `OPENROUTER_API_KEY` once, runs `vox secrets pair` to add a second node, and that node completes `vox chat` without re-entering the key.
- `vox ci secret-env-guard` returns zero violations.
- `vox model discover --all --force` succeeds and writes a cache with ≥5 sources.
- `vox ci ssot-audit` passes, confirming parity between scoreboard telemetry and routing decisions.
- Telemetry-driven cost accounting and verified fail-over routing loops are enabled by default for the next development sprint (Production Transition Complete).
Sources
- OpenRouter: List all models and their properties
- OpenRouter: Models overview
- OpenRouter: List endpoints for a model
- OpenTelemetry: GenAI semantic conventions
- OpenTelemetry: GenAI metrics
- OpenTelemetry: GenAI client spans
- Ollama: List models API
- Ollama: Model management API
- LMSYS: RouteLLM
- LLM router comparison 2026
- LiteLLM alternatives 2026
- age/rage encryption: X25519 recipients
- Vox `AGENTS.md` (this repo)
- Vox Secrets SSOT (this repo)