Skip to content

Model Catalog SSOT — Architecture & Implementation Plan 2026

Model Catalog SSOT — Audit & Implementation Plan (2026)

Section titled “Model Catalog SSOT — Audit & Implementation Plan (2026)”

Scope: crates/vox-orchestrator/src/{catalog.rs,models/} · contracts/orchestration/model-catalog.bootstrap.v1.json · contracts/orchestration/model-routing.v1.yaml


Part 1 — Confirmed Split-Brain Bugs (True Positives)

Section titled “Part 1 — Confirmed Split-Brain Bugs (True Positives)”

Every item below is proven with exact file + line references.


Bug 1 — Refresh-Interval Key Name Mismatch (Silent No-Op Throttle)

Section titled “Bug 1 — Refresh-Interval Key Name Mismatch (Silent No-Op Throttle)”
FileLine
Readsmodels/registry.rs279
Writesmodels/registry.rs386

The throttle check reads "openrouter_catalog_refresh" but writes "catalog_refresh". The check never finds its own timestamp. Every process startup triggers a full OpenRouter + LiteLLM network fetch regardless of the configured VoxOpenRouterCatalogMinRefreshIntervalSecs. Wasted bandwidth, slower startup, rate-limit risk.

Fix: Unify to a single constant MODEL_CATALOG_LAST_REFRESH_KEY = "model_catalog_last_refresh".


Bug 2 — register() Silently Overwrites Telemetry-Calibrated Pricing

Section titled “Bug 2 — register() Silently Overwrites Telemetry-Calibrated Pricing”
FileLines
models/registry.rs574–576 (unconditional insert)
models/registry.rs443–446 (call site during refresh)

inject_pricing_catalog() correctly guards PricingSource::Telemetry (lines 116–117), but the background refresh loop calls the unguarded register() for every model afterwards. A model whose cost was calibrated from real observed spend (e.g. $0.0045/1k) is overwritten by the stale OpenRouter catalog price on the next startup.

Fix: register() must merge-but-preserve pricing fields when existing.pricing_source == Telemetry.


Bug 3 — AnthropicDirect Registers Expensive Models as is_free: true

Section titled “Bug 3 — AnthropicDirect Registers Expensive Models as is_free: true”
FileLines
catalog.rs602–623
let (c_in, c_out) = (0.0_f64, 0.0_f64);
let pricing_unknown = c_in == 0.0 && c_out == 0.0;
// ...
is_free: pricing_unknown, // Claude Opus → is_free = true

When LiteLLM is unreachable, newly discovered Anthropic models enter as is_free: true. The Economy preference picker ranks them highest (best quality score, best latency score, zero apparent cost). A $75/M token model gets invoked. BudgetManager cannot catch it because cost_per_1k = 0.0.

Fix: Never set is_free: true for Anthropic models. Mark as PricingSource::Unknown. The routing gate (Wave 1) blocks Unknown from autonomous dispatch.


Bug 4 — Bootstrap JSON Has "max_context": 0 for Every Single Model

Section titled “Bug 4 — Bootstrap JSON Has "max_context": 0 for Every Single Model”
FileLines
contracts/orchestration/model-catalog.bootstrap.v1.json19, 39, 59, 79, 99, 119, 139, 159, 179, 199

Every capabilities block in the bootstrap JSON has "max_context": 0 despite max_tokens being correctly set (e.g. 128000). Any routing path that reads capabilities.max_context for long-context filtering receives a broken signal on cold-start.

Fix: Populate max_context to match max_tokens in the bootstrap file, or add normalization in ModelConfig::default() that sets cap.max_context = spec.max_tokens when zero.


Bug 5 — premium_alias Defined in Two Places With No Sync

Section titled “Bug 5 — premium_alias Defined in Two Places With No Sync”
Location
models/spec.rs:249–265Rust: built_in_premium_alias()
contracts/orchestration/model-routing.v1.yaml:84–92YAML: premium_alias: block

Both define the exact same model-ID-to-task mappings. The YAML is never loaded at runtime — ModelRegistry::new() always uses the Rust defaults. Editing the YAML has zero effect on routing.

Fix: Delete built_in_premium_alias(). Parse model-routing.v1.yaml at startup via vox_config::ModelRoutingConfig. Use the YAML as the only source. Add a vox ci model-routing-check guard.


Bug 6 — Scoring Constants Duplicated From YAML (One Copy Is Dead)

Section titled “Bug 6 — Scoring Constants Duplicated From YAML (One Copy Is Dead)”
Constantscoring.rsmodel-routing.v1.yamlLoaded at runtime?
LATENCY_EXCELLENT_MS = 500.0line 25line 81❌ Rust const only
LATENCY_POOR_MS = 8_000.0line 27line 82❌ Rust const only
exploration.budget_usd_per_day: 50.0not presentline 21❌ YAML only, never read
safety.max_cost_usd_per_request: 5.0not presentline 25❌ YAML only, never read

Operators editing the YAML believe they are changing scoring behavior. They are not.

Fix: Expose these through vox_config::ModelRoutingConfig loaded from the YAML. Replace the Rust const values with reads from this struct.


Bug 7 — quality_weights in YAML Is Completely Ignored

Section titled “Bug 7 — quality_weights in YAML Is Completely Ignored”
FileLines
contracts/orchestration/model-routing.v1.yaml8–13
models/scoring.rs249–254
quality_weights:
socrates_factuality: 0.25
contradiction_inverse: 0.15
success_rate: 0.25
p50_latency_inverse: 0.15
cost_inverse: 0.2

auto_score_model() uses AutoRoutingPriority::from_env() (VOX_ROUTE_* env vars). The YAML quality_weights block is dead config — declared but never consumed. This is the highest-priority maintainability issue because it creates a false belief that the YAML controls quality ranking.

Fix: Either consume quality_weights in the scoring path, or remove it from the contract and explicitly document that scoring.weights is the authoritative block.


Bug 8 — HuggingFaceCatalog Returns Three Hardcoded Models, One Deprecated

Section titled “Bug 8 — HuggingFaceCatalog Returns Three Hardcoded Models, One Deprecated”
FileLines
catalog.rs399–403
let known_models = vec![
"Qwen/Qwen2.5-72B-Instruct",
"meta-llama/Llama-3.1-70B-Instruct",
"mistralai/Mixtral-8x7B-Instruct-v0.1", // deprecated late 2024
];

All three are registered with is_free: true and $0.0 cost regardless of the user’s HF account tier. The HF Inference Providers API (/api/models?inference=warm) provides dynamic discovery and is not used.

Fix: Replace the static list with a call to HF’s warm-inference endpoint. Pass results through ModelAdmissionFilter (Wave 2).


Bug 9 — DeepSeek Off-Peak Bonus Misses OpenRouter-Routed DeepSeek Models

Section titled “Bug 9 — DeepSeek Off-Peak Bonus Misses OpenRouter-Routed DeepSeek Models”
FileLines
models/scoring.rs289–299
if matches!(m.provider_type, ProviderType::DeepSeek) && is_deepseek_off_peak()

The bonus only fires for ProviderType::DeepSeek. When DeepSeek R1 is accessed via OpenRouter (the common path — provider_type = OpenRouter), the off-peak bonus is never applied. R1 scores identically at 3am and 3pm, defeating the discount window entirely.

Fix: Match on m.id.to_ascii_lowercase().contains("deepseek") instead of provider_type, or introduce a time_of_day_discount: f32 field that the LiteLLM oracle populates.


Bug 10 — Exploration Budget Contract Is Never Enforced

Section titled “Bug 10 — Exploration Budget Contract Is Never Enforced”
FileLine
contracts/orchestration/model-routing.v1.yaml21
budget_usd_per_day: 50.0

No Rust code enforces a daily USD cap on exploration calls. If Thompson bandit scoring is miscalibrated or a novel model is mislabeled as excellent, the system can spend unboundedly on exploration.

Fix: Add exploration_usd_today: AtomicU64 to BudgetManager. Gate record_novel_routing_explore() against the ceiling from ModelRoutingConfig.


Part 2 — Eliminated False Positives (from Prior Plan)

Section titled “Part 2 — Eliminated False Positives (from Prior Plan)”
Prior ClaimVerdictReason
”Six catalog implementations is fragmented”False positiveEach provider has different auth/discovery APIs. Multiplicity is intentional. The bugs are within them, not in their count.
”Anthropic tier-by-name is a hack”False positiveTier display is harmless. Bug 3 (pricing=0.0 → is_free=true) is the real danger.
”Bootstrap JSON format should be YAML”False positiveFormat doesn’t matter. Content (max_context=0, stale prices) is the real issue.
”Pricing Oracle Risk is catastrophic”OverstatedLiteLLM failure degrades to Bootstrap pricing, not zero. The specific path to $0 cost is only via Bug 3 (AnthropicDirect).

OpenRouter is the discovery authority. model-routing.v1.yaml is the behavioral authority. VoxDb telemetry is the cost authority. These three must be wired together — the current system treats them as independent.

DISCOVERY (Who exists?)
OpenRouter /api/v1/models ──►
AnthropicDirect /v1/models ──► ModelAdmissionFilter ──► ModelRegistry
HuggingFace /api/models ────► ├─ capability inference
MensCatalog (local) ─────────► ├─ strength inference (from YAML rules)
└─ initial tier + PricingSource assignment
PRICING (What do they cost?)
LiteLLM oracle ──► apply_litellm_pricing() [Oracle]
VoxDb telemetry ──► inject_pricing_catalog() [Telemetry — highest]
Bootstrap seed ──► ModelConfig::default() [Fallback — lowest]
PricingSource::Unknown ──► BLOCKED from autonomous routing
SCORING (How good are they?)
auto_score_model() reads from:
• model-routing.v1.yaml (latency_bands, weights, exploration budget)
• ModelSpec.capabilities (live from discovery)
• ModelScore from VoxDb scoreboard (success_rate, p50_latency_ms)
• BudgetManager (exploration quota, doom-loop gate)
ROUTING (Who gets the task?)
ModelRegistry.best_for_task()
Filter: PricingConfidence gate (Unknown → blocked)
Filter: ExplorationBudget gate (daily cap enforced)
Filter: CircuitBreaker / penalty_map
Rank: auto_score_model() + Thompson bandit arm stats
premium_alias → read from model-routing.v1.yaml ONLY

New models released on OpenRouter are automatically discovered, classified, scored, and routed without manual intervention:

1. OpenRouter refresh (hourly) returns N new model IDs
2. ModelAdmissionFilter runs each new model through:
a. infer_strengths() — from YAML strength_inference rules
b. infer_tier() — context window size + provider family
c. LiteLLM match — pricing lookup (Bootstrap→LiteLLM if found)
d. Set PricingSource::Unknown if no LiteLLM match
3. New model enters registry with PricingSource set
4. If PricingSource == Unknown:
- Allowed in manual/interactive sessions
- BLOCKED in autonomous task dispatch
- Written to VoxDb model_admission_queue
5. On first N=3 successful calls:
- Actual cost recorded via BudgetManager.record_cost()
- PricingSource promoted to Telemetry
- Model fully admitted to autonomous routing
6. Thompson bandit tracks success_rate, quality_score, p50_latency
7. auto_score_model() ranks model against peers continuously
8. ModelRegistry.best_for_task() routes to it when it wins

Wave 0 — Critical Bug Fixes (Regressions Today, No Architecture Change)

Section titled “Wave 0 — Critical Bug Fixes (Regressions Today, No Architecture Change)”
TaskFileBug Fixed
W0-1: Unify refresh key name constantmodels/registry.rs:279,386Bug 1
W0-2: Guard register() against Telemetry overwritemodels/registry.rs:574Bug 2
W0-3: Normalize max_context=0 in bootstrapmodels/spec.rs:294Bug 4
W0-4: Remove is_free: pricing_unknown from AnthropicDirectcatalog.rs:623Bug 3
W0-5: Fix DeepSeek off-peak provider-type checkmodels/scoring.rs:289Bug 9

Wave 1 — SSOT Wiring (Behavioral Authority from YAML)

Section titled “Wave 1 — SSOT Wiring (Behavioral Authority from YAML)”
TaskFiles
W1-1: Add PricingSource::Unknown variantmodels/spec.rs
W1-2: Add Pricing Confidence Gate to best_for_internal()models/registry.rs
W1-3: Expose model-routing.v1.yaml at runtime via vox_config::ModelRoutingConfigcrates/vox-config/
W1-4: Delete built_in_premium_alias(), read from YAMLmodels/spec.rs, registry.rs
W1-5: Replace const scoring values with ModelRoutingConfig readsmodels/scoring.rs
W1-6: Resolve quality_weights / scoring.weights ambiguitymodel-routing.v1.yaml, scoring.rs
W1-7: Enforce exploration daily budget in BudgetManagerbudget/mod.rs
TaskFiles
W2-1: ModelAdmissionFilter structNew: models/admission.rs
W2-2: Wire admission to maybe_refresh_catalogs()models/registry.rs
W2-3: Persist PendingPricing models to VoxDb model_admission_queuebudget/persistence.rs
W2-4: vox model catalog status CLI commandcrates/vox-cli/src/commands/model.rs
W2-5: Replace HuggingFaceCatalog static list with HF warm APIcatalog.rs
Task
W3-1: vox ci model-routing-check guardVerify no hardcoded aliases, constants within 1% of YAML, no max_context=0 in bootstrap
W3-2: Bootstrap JSON as @generated artifactRegenerate from YAML + admission filter output; block manual edits in CI
W3-3: Doc-sync PricingSource priority ladderCode comment must exactly mirror enforced priority order in register() and best_for_internal()

BugFixed byPriority
Refresh key mismatchW0-1P0
register() overwrites TelemetryW0-2P0
AnthropicDirect is_free lieW0-4 + W1-1 + W1-2P0
max_context: 0 in bootstrapW0-3P1
DeepSeek off-peak provider checkW0-5P1
Exploration budget unenforcedW1-7P1
premium_alias duplicationW1-4P1
Scoring constants not from YAMLW1-5P2
quality_weights dead configW1-6P2
HF hardcoded model listW2-5P2
Automatic new model adoptionW2-1 → W2-3P2