Vox Speech Surface Inventory 2026
This inventory anchors the ASR-primary broad-wave speech-to-code audit. A surface is included when it captures speech, claims voice support, forwards audio or transcripts into Oratio, or is documented as part of the speech path.
Summary
| Surface | Capture | Transport | Backend | Current status | Coverage |
|---|---|---|---|---|---|
| Editor file commands | Existing audio file | MCP | Oratio plugin / vox-oratio | Implemented | No extension E2E |
| Editor webview mic | Browser getUserMedia WAV | VS Code postMessage then MCP | Oratio plugin / vox-oratio | Implemented | No automated mic E2E |
| Vox app web speech | Web Speech API or prompt fallback | In-process JS runtime | Browser STT | Implemented but not Oratio | Stubbed E2E only |
| Vox app Android | Native mic | Capacitor bridge | Sherpa ONNX plugin | Present | No accuracy parity test |
| Vox app iOS | Native mic | Capacitor bridge | Apple Speech | Present | No accuracy parity test |
| Dashboard Loquela | None | Text chat only | None | Gap | No speech test |
| CLI eval | Dataset file paths | Local CLI | Oratio | Implemented | WER/CER unit math only |
| CLI record-transcribe | cpal mic | Local CLI | Oratio | Feature-gated | No automated mic test |
| MCP tools | Path or prompt | MCP | Oratio + codegen | Implemented | Schema/HIR tests only |
| Streaming WS | s16le 16 kHz chunks | WebSocket | Oratio server | Feature-gated | No runtime stream test |
| HTTP audio ingress | Documented /api/audio/* | HTTP | Missing | Orphaned contract | No crate present |
Surface Details
Editor file commands
- Entry points: `vox.oratio.transcribeFile`, `vox.oratio.speechToCodeFile` in `apps/editor/vox-vscode/src/speech/registerOratioSpeechCommands.ts`.
- Capture format: `.wav`, `.mp3`, `.flac`, `.ogg`, `.m4a`, `.webm` selected from disk.
- Transport: workspace-relative path to `VoxMcpClient.oratioTranscribe` or `VoxMcpClient.speechToCode`.
- Target tools: `vox_oratio_transcribe`, `vox_speech_to_code`.
- Verified gap: no automated extension test proves the path from file selection through MCP to a transcript or code.
Editor webview mic
- Entry points: `vox.oratio.voiceCaptureTranscribe`, `vox.oratio.voiceCaptureSpeechToCode`.
- Capture format: `navigator.mediaDevices.getUserMedia({ audio: true })`, `ScriptProcessorNode`, mono 16-bit PCM WAV.
- Important finding: the WAV header uses `audioCtx.sampleRate`, commonly 48 kHz, not a forced 16 kHz stream. Correctness therefore depends on Oratio decoding and resampling the file.
- Transport: base64 WAV via `postMessage`, written under `.vox/tmp/vscode_voice_*.wav`, then sent to MCP.
- Verified gap: no synthetic audio injection harness or mic permission test exists.
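
To illustrate the sample-rate finding, the following sketch encodes mono 16-bit PCM WAV and reads the rate back from the header. It is not the extension's code: `encodeWavMono16`, `readWavSampleRate`, and `sineTone` are hypothetical names, and 48 kHz stands in for whatever `audioCtx.sampleRate` reports. A generated tone like this is also one possible input for a synthetic audio injection harness.

```typescript
// Hypothetical helper: encode mono Float32 samples as a 16-bit PCM WAV file.
function encodeWavMono16(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true); // sample rate as captured
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp to [-1, 1] and scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  });
  return new Uint8Array(buffer);
}

// Reads the little-endian sample rate stored at byte offset 24 of the header.
function readWavSampleRate(wav: Uint8Array): number {
  return new DataView(wav.buffer, wav.byteOffset, wav.byteLength).getUint32(24, true);
}

// Hypothetical synthetic input: a sine tone at half amplitude.
function sineTone(frequencyHz: number, durationSec: number, sampleRate: number): Float32Array {
  const out = new Float32Array(Math.round(durationSec * sampleRate));
  for (let i = 0; i < out.length; i++) {
    out[i] = 0.5 * Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
  }
  return out;
}

const wav = encodeWavMono16(sineTone(440, 0.1, 48000), 48000);
console.log(readWavSampleRate(wav)); // 48000: the header carries the capture rate
```

A consumer that assumes 16 kHz without reading the header would play this file back at one third of its real speed, which is why the decode and resample step in Oratio carries the correctness burden.
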
Vox-language app speech
- Entry point: `Speech.transcribe_microphone()` in `apps/vox-mental-tracker/src/main.vox`, lowered through TypeScript runtime shims in `apps/vox-mental-tracker/src/runtime.ts`.
- Web capture: `SpeechRecognition` / `webkitSpeechRecognition`.
- Fallback: `window.prompt`, which must be classified as `ASR=N/A` rather than an ASR success.
- Transport: none. The browser path does not call Oratio or the orchestrator.
- Coverage: `apps/vox-mental-tracker/tests/e2e/voice_flow.spec.ts` uses `__VOX_TEST_TRANSCRIPT__`, so it verifies UI plumbing, not ASR.
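
The capture and fallback rules above could be expressed as a small classifier, sketched here under assumptions. `SpeechGlobals`, `detectCapturePath`, and `asrLabelFor` are hypothetical names that do not come from `runtime.ts`, and the `ASR=Browser STT` label is illustrative; only `ASR=N/A` for the prompt fallback is taken from the inventory.

```typescript
// Hypothetical view of the globals a runtime shim might probe.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

type CapturePath = "web-speech-api" | "prompt-fallback";

// Prefer the standard constructor, then the WebKit-prefixed one,
// and fall back to a typed prompt when neither is available.
function detectCapturePath(globals: SpeechGlobals): CapturePath {
  const ctor = globals.SpeechRecognition ?? globals.webkitSpeechRecognition;
  return typeof ctor === "function" ? "web-speech-api" : "prompt-fallback";
}

// A typed prompt involves no speech recognition, so it must never be
// counted as an ASR success in the audit.
function asrLabelFor(path: CapturePath): string {
  return path === "web-speech-api" ? "ASR=Browser STT" : "ASR=N/A";
}

console.log(asrLabelFor(detectCapturePath({}))); // ASR=N/A
```

Passing the globals in as a parameter, rather than reading `window` directly, keeps the sketch runnable outside a browser and makes both branches easy to exercise in a test.
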
Mobile app speech
- Android entry point: `apps/vox-mental-tracker/plugins/vox-sherpa-transcribe/android/src/main/java/com/vox/plugins/voxsherpatranscribe/VoxSherpaTranscribePlugin.kt`.
- iOS entry point: `apps/vox-mental-tracker/plugins/vox-sherpa-transcribe/ios/AppleSpeechBackend.swift`.
- Backends: Sherpa ONNX on Android and Apple Speech on iOS.
- Verified gap: no shared corpus parity test compares mobile transcripts against Candle Whisper or Oratio fixtures.
Dashboard Loquela
- Entry point: `crates/vox-dashboard/src/components/shell/SpeakPanel.tsx`.
- Current behavior: text area plus send button. The subtitle says `VOICE INTERFACE`, but there is no microphone button, `getUserMedia`, `MediaRecorder`, Oratio tool, or speech-to-code call.
- Transport: `useVoxChat` sends text through `vox_chat_message`.
- Audit classification: product gap, not an ASR cell.
CLI and MCP
- CLI eval: `crates/vox-ml-cli/src/commands/oratio_cmd.rs` exposes WER/CER evaluation and persistence paths.
- CLI mic: `crates/vox-ml-cli/src/commands/oratio_mic.rs` is feature-gated by `oratio-mic`.
- MCP Oratio: `crates/vox-orchestrator-mcp/src/oratio_tools.rs`.
- MCP speech-to-code: `crates/vox-orchestrator-mcp/src/speech_pipeline_tools.rs`.
- Coverage: schema parity, HIR fixture validation, and an optional KPI canary exist, but no always-on runtime model canary exists.
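
For readers unfamiliar with the WER/CER math mentioned under CLI eval, both metrics are conventionally an edit distance divided by the reference length. The sketch below shows that standard definition only. Whether `oratio_cmd.rs` uses the same normalization is an assumption, and `tokenize`, `editDistance`, `wer`, and `cer` are hypothetical names.

```typescript
// Illustrative normalization: lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((token) => token.length > 0);
}

// Levenshtein distance over arbitrary token sequences, using two rows.
function editDistance<T>(ref: readonly T[], hyp: readonly T[]): number {
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// Error rate = (substitutions + deletions + insertions) / reference length.
function errorRate<T>(ref: readonly T[], hyp: readonly T[]): number {
  if (ref.length === 0) {
    throw new Error("reference must not be empty");
  }
  return editDistance(ref, hyp) / ref.length;
}

// Word error rate over whitespace tokens.
function wer(reference: string, hypothesis: string): number {
  return errorRate(tokenize(reference), tokenize(hypothesis));
}

// Character error rate over Unicode code points.
function cer(reference: string, hypothesis: string): number {
  return errorRate(Array.from(reference), Array.from(hypothesis));
}

console.log(wer("open the file", "open a file")); // one substitution in three words
```

Because insertions count as errors, both rates can exceed 1.0 when the hypothesis is much longer than the reference.
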
Streaming and HTTP
- Streaming server: `crates/vox-oratio/src/serve.rs`, feature `serve`, expects s16le mono 16 kHz chunks.
- HTTP audio: `examples/oratio/codexAudioTranscribe.ts` and `contracts/codex-api.openapi.yaml` describe `/api/audio/*`, but no `crates/vox-audio-ingress` exists in this checkout.
- Env drift: `contracts/config/env-vars.v1.yaml` still assigns `VOX_ORATIO_STREAM_MAX_BUFFER_MS` and `VOX_ORATIO_WORKSPACE` to `vox-audio-ingress`.
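
The s16le mono 16 kHz input that the streaming server expects can be illustrated by the sketch below, which converts float samples to signed 16-bit little-endian bytes and splits them into fixed-duration chunks. The inventory does not describe the WebSocket message framing in `serve.rs`, so only the payload encoding is shown. `floatToS16le`, `chunkPcm`, and the 100 ms chunk size are hypothetical.

```typescript
// Convert float samples in [-1, 1] to signed 16-bit little-endian bytes.
function floatToS16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  });
  return out;
}

// Split mono s16le bytes into chunks of chunkMs milliseconds each.
// The final chunk may be shorter than the others.
function chunkPcm(pcm: Uint8Array, chunkMs: number, sampleRate = 16000): Uint8Array[] {
  const bytesPerChunk = Math.floor((sampleRate * chunkMs) / 1000) * 2;
  if (bytesPerChunk <= 0) {
    throw new Error("chunkMs is too small for this sample rate");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += bytesPerChunk) {
    chunks.push(pcm.subarray(offset, offset + bytesPerChunk));
  }
  return chunks;
}

// One second of silence at 16 kHz, split into 100 ms chunks.
const chunks = chunkPcm(floatToS16le(new Float32Array(16000)), 100);
console.log(chunks.length); // 10
```

Audio captured at another rate, such as the 48 kHz common in browsers, would need resampling before this step. That resampling is deliberately left out of the sketch.
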
Audit Consequences
- The editor webview is the only mouse-driven desktop mic surface that currently reaches Oratio and `vox_speech_to_code`.
- The dashboard must be scored as a missing speech surface until a mic path exists.
- The Vox-language app path proves ordinary app speech syntax exists, but its web implementation is browser STT rather than Oratio.
- Backend accuracy comparisons must separate Candle Whisper, Sherpa, Apple Speech, Web Speech API, and prompt fallback instead of blending them.