Vox Speech CI Gates Proposal 2026
The current CI posture validates schemas and some HIR fixtures, but it does not run a default numeric ASR gate unless `VOX_SPEECH_CANARY_KPI` is supplied. This proposal separates required gates from advisory gates so expensive model or hardware work does not block every contributor while still preventing silent regression.
Required Gates
| Gate | Command | Why required |
|---|---|---|
| Speech audit contracts | `cargo test -p vox-integration-tests --test speech_audit_contract_test` | Ensures matrix, committed KPI, docs, and index entries exist. |
| Speech manifest triples | `cargo test -p vox-integration-tests --test speech_benchmark_manifest_test` | Ensures corpus rows have audio/transcript/expected metadata. |
| Speech schema parity | Existing `speech_schema_parity_test` | Prevents schema drift across contracts and MENS exports. |
| HIR fixture validation | Existing `speech_fixture_validate_test` | Prevents expected Vox outputs from becoming uncompilable. |
| Committed canary shape | `VOX_SPEECH_CANARY_KPI=<absolute path to contracts/speech-to-code/canary.kpi.json> cargo test -p vox-integration-tests --test speech_canary_test` | Makes canary validation non-optional at least for the baseline snapshot. |
| Forced OOM regression | `cargo test -p vox-plugin-oratio single_window_branch_does_not_force_simulated_oom` | Prevents the single-window Whisper path from regressing to a simulated inference failure. |
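
The committed canary shape gate depends on `VOX_SPEECH_CANARY_KPI` holding an absolute path. A minimal sketch of how a CI driver might assemble that invocation is shown here; `canary_kpi_path` and `canary_gate_command` are hypothetical helper names, not existing `vox-cli` functions, and the sketch assumes the caller passes an absolute repository root.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Location of the committed KPI snapshot, relative to the repository root.
const CANARY_KPI_REL: &str = "contracts/speech-to-code/canary.kpi.json";

/// Hypothetical helper: resolve the snapshot path against `repo_root`.
/// The result is absolute only if `repo_root` is absolute.
fn canary_kpi_path(repo_root: &Path) -> PathBuf {
    repo_root.join(CANARY_KPI_REL)
}

/// Hypothetical helper: build (but do not spawn) the canary gate command
/// from the Required Gates table.
fn canary_gate_command(repo_root: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["test", "-p", "vox-integration-tests", "--test", "speech_canary_test"])
        .env("VOX_SPEECH_CANARY_KPI", canary_kpi_path(repo_root))
        .current_dir(repo_root);
    cmd
}
```

Calling `.status()` on the returned `Command` would run the gate. Building the command separately from spawning it keeps the environment wiring testable without invoking Cargo.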
Advisory Gates
| Gate | Trigger | Skip reason |
|---|---|---|
| Runtime Candle ASR canary | Nightly or release branch with model cache | `offline` |
| CUDA decode parity | Runner has `nvcc` and CUDA runtime | `no-cuda` |
| Sherpa parity | `stt-sherpa` feature and model dir available | `feature-off` |
| Mobile speech parity | Device/emulator runner available | `not-available` |
| Streaming WS partial/final | `serve` feature lane | `transport_error` |
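
One way to keep these skip reasons machine-readable is a closed enum whose string forms match the table exactly. This is a sketch under that assumption: the `SkipReason` type and its method names are illustrative and do not come from the Vox codebase.

```rust
/// Skip reasons as listed in the advisory gates table (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SkipReason {
    Offline,
    NoCuda,
    FeatureOff,
    NotAvailable,
    TransportError,
}

impl SkipReason {
    /// Every variant, used for parsing and exhaustive checks.
    const ALL: [SkipReason; 5] = [
        SkipReason::Offline,
        SkipReason::NoCuda,
        SkipReason::FeatureOff,
        SkipReason::NotAvailable,
        SkipReason::TransportError,
    ];

    /// The exact string written into a scorecard or log line.
    fn as_str(self) -> &'static str {
        match self {
            SkipReason::Offline => "offline",
            SkipReason::NoCuda => "no-cuda",
            SkipReason::FeatureOff => "feature-off",
            SkipReason::NotAvailable => "not-available",
            SkipReason::TransportError => "transport_error",
        }
    }

    /// Parse a reason string; unknown strings are rejected rather than guessed.
    fn parse(s: &str) -> Option<SkipReason> {
        Self::ALL.iter().copied().find(|r| r.as_str() == s)
    }
}
```

Rejecting unknown strings in `parse` means a misspelled or ad hoc reason surfaces as an error instead of silently counting as a valid skip.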
Current Runtime-Suite Result
The `.vox/audit/2026-05-11-oratio-full-runtime/` scorecard is CI-ready as an artifact, but it is not a passing ASR gate:
- CPU Candle audio evaluation executes end-to-end after the forced-OOM fix. The current spoken 16 kHz corpus measures `WER=0.1071`, `CER=0.0976`, both under the current canary thresholds.
- CUDA is not a `no-cuda` skip on the Windows host; `nvcc` exists, but CUDA plugin linking fails on unresolved Candle `moe_gemm_*` symbols.
- iOS and Android cells are runtime-unavailable on this host because `xcrun` is unavailable and Android SDK `adb` reports no attached/emulated devices.
- The advertised Oratio streaming WebSocket URL has no matching `vox-oratio` WS server route; only HTTP `POST /transcribe` exists.
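
For readers unfamiliar with the two metrics, WER and CER are conventionally the Levenshtein edit distance between reference and hypothesis, divided by the reference length, computed over words and characters respectively. The sketch below implements that textbook definition; it is not the Vox scorer, which may apply text normalization (casing, punctuation) that is not modeled here.

```rust
/// Levenshtein distance between two token sequences, using a two-row table.
fn edit_distance<T: PartialEq>(reference: &[T], hypothesis: &[T]) -> usize {
    let mut prev: Vec<usize> = (0..=hypothesis.len()).collect();
    for (i, r) in reference.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, h) in hypothesis.iter().enumerate() {
            let substitution = prev[j] + usize::from(r != h);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[hypothesis.len()]
}

/// Edit distance divided by reference length; `None` for an empty reference.
fn error_rate<T: PartialEq>(reference: &[T], hypothesis: &[T]) -> Option<f64> {
    if reference.is_empty() {
        return None;
    }
    Some(edit_distance(reference, hypothesis) as f64 / reference.len() as f64)
}

/// Word error rate over whitespace-separated tokens.
fn wer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    error_rate(&r, &h)
}

/// Character error rate over Unicode scalar values.
fn cer(reference: &str, hypothesis: &str) -> Option<f64> {
    let r: Vec<char> = reference.chars().collect();
    let h: Vec<char> = hypothesis.chars().collect();
    error_rate(&r, &h)
}
```

Because the denominator is the reference length, either rate can exceed 1.0 when the hypothesis contains many insertions, so a threshold check should not assume a value in the range 0 to 1.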
CLI Work Items
- Add an `oratio` feature lane to `FEATURE_SETS` in `crates/vox-cli/src/commands/ci/constants.rs` and unignore `feature_sets_include_populi_oratio_lane` in the matrix tests.
- Add `vox ci speech-canary` as a wrapper that:
  - loads `contracts/speech-to-code/audit-matrix.v1.yaml`,
  - runs the smallest MUST CLI eval cell when model assets are available,
  - writes `.vox/audit/<run_id>/scorecard.json`,
  - writes a KPI snapshot,
  - runs `speech_canary_test` against that snapshot.
- Extend `crates/vox-cli/src/commands/ci/run_body_helpers/cuda.rs` from compile-only coverage to include a tiny runtime decode parity cell when the corpus and model cache exist.
- Extend `vox ci speech-runtime-suite` beyond its current CPU Candle + classification implementation as browser, mobile, CUDA, Sherpa, and streaming harnesses become executable.
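
The proposed `vox ci speech-canary` wrapper could be modeled as an ordered plan before any I/O happens. The sketch below is illustrative only: `Step`, `Plan`, `scorecard_path`, and `plan_speech_canary` are hypothetical names, and the behavior when model assets are missing (recording an `offline` skip instead of running) is an assumption, since the work item does not specify it.

```rust
use std::path::PathBuf;

/// Steps of the proposed wrapper, in the order listed in the work item.
#[derive(Debug, PartialEq)]
enum Step {
    LoadMatrix(PathBuf),
    RunSmallestMustEvalCell,
    WriteScorecard(PathBuf),
    WriteKpiSnapshot,
    RunCanaryTest,
}

/// Either a full run or an explicit, reportable skip (assumed behavior).
#[derive(Debug, PartialEq)]
enum Plan {
    Run(Vec<Step>),
    Skip { reason: &'static str },
}

/// `.vox/audit/<run_id>/scorecard.json`, relative to the repository root.
fn scorecard_path(run_id: &str) -> PathBuf {
    PathBuf::from(".vox").join("audit").join(run_id).join("scorecard.json")
}

/// Build the plan without touching the filesystem or any model.
fn plan_speech_canary(run_id: &str, model_assets_available: bool) -> Plan {
    if !model_assets_available {
        // Assumption: a missing model cache is reported, never downloaded.
        return Plan::Skip { reason: "offline" };
    }
    Plan::Run(vec![
        Step::LoadMatrix(PathBuf::from("contracts/speech-to-code/audit-matrix.v1.yaml")),
        Step::RunSmallestMustEvalCell,
        Step::WriteScorecard(scorecard_path(run_id)),
        Step::WriteKpiSnapshot,
        Step::RunCanaryTest,
    ])
}
```

Separating planning from execution would let the step order be unit-tested on contributors' machines that have no model cache.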
Policy
Required gates must not download models. Advisory gates may use cached models or explicitly skip with a machine-readable reason. A skipped advisory cell is acceptable; an unreported skip is a CI bug.
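
The last rule can be checked mechanically. As a sketch, assuming each cell result carries an optional reason string (the `CellResult` and `Outcome` types here are hypothetical, not the scorecard schema), a CI step could list every skip that lacks a reason and fail if the list is non-empty:

```rust
/// Result of one gate cell (illustrative shape, not the real scorecard schema).
#[derive(Debug)]
enum Outcome {
    Passed,
    Failed,
    Skipped { reason: Option<String> },
}

#[derive(Debug)]
struct CellResult {
    name: String,
    outcome: Outcome,
}

/// Names of cells that were skipped without a non-empty reason.
fn unreported_skips(results: &[CellResult]) -> Vec<&str> {
    results
        .iter()
        .filter(|r| match &r.outcome {
            // A missing or blank reason both count as unreported.
            Outcome::Skipped { reason } => {
                reason.as_deref().map_or(true, |s| s.trim().is_empty())
            }
            Outcome::Passed | Outcome::Failed => false,
        })
        .map(|r| r.name.as_str())
        .collect()
}
```

A blank reason is treated the same as a missing one so that an empty string cannot satisfy the policy by accident.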