Plugin System Audit (2026-05-08)

Scope: crates/vox-plugin-*/. Excluded: crates/vox-populi/src/mens/tensor/, crates/vox-tensor/, crates/vox-skills/src/registry.rs (other agents active).

Findings

Critical (must fix)

None identified. All code/composite plugins declare abi-version = 11, matching VOX_PLUGIN_ABI_VERSION = 11 in crates/vox-plugin-api/src/lib.rs. The noop-bad-abi fixture intentionally declares abi-version = 1 and is correctly excluded from plugin-abi-parity.

Important (should fix)

1. Duplicate NVML probe logic across vox-populi and vox-plugin-nvml-probe

crates/vox-populi/src/mens/hardware/nvml.rs (80 lines) — in-process NVML probe used when no plugin is loaded.
crates/vox-plugin-nvml-probe/src/probe.rs (174 lines) — out-of-process plugin version of the same logic.

Both depend on nvml_wrapper, implement the same probe_summary_json / device_metrics_json contract, and share near-identical GPU enumeration loops. The in-tree copy was retained for fallback but creates two maintenance surfaces for NVML API changes. Recommendation: extract shared NVML query helpers into a small internal crate (e.g. vox-nvml-queries) and have both consumers depend on it. Effort: ~1 day.

2. SP7 scaffold stubs still returning errors

Five plugins contain methods that unconditionally return "not yet implemented; SP7 scaffold":

Plugin	File	Stub count
`vox-plugin-cloud`	`src/sync.rs`	2 (`upload_artifact`, `download_artifact`)
`vox-plugin-oratio-mic`	`src/mic.rs`	3 (`open_device`, `start_capture`, `stop_capture`)
`vox-plugin-oratio`	`src/audio.rs`	3 (`open_device`, `start_capture`, `stop_capture`)
`vox-plugin-script-execution`	`src/executor.rs`	2 (`execute_script`, `execute_script_with_timeout`)
`vox-plugin-tensor-burn-wgpu`	`src/lib.rs`	(entire init)

These are not bugs per se — the SP7 milestone owns them — but they should be tracked. The oratio and oratio-mic stubs overlap: both implement AudioCapture; one should delegate to the other or they should share the same crate. Recommendation: file SP7 tickets referencing these file:line locations and add a // TODO(SP7): comment to enable rg TODO.*SP7 as a milestone completeness check.

3. vox-plugin-cuda-spike — abandoned spike crate with no consumers

crates/vox-plugin-cuda-spike/ is a gating spike crate (SP3) with no downstream dependencies (grep vox-plugin-cuda-spike crates/*/Cargo.toml returns nothing). It pulls in candle-core with the cuda feature unconditionally, which slows CUDA-feature resolution across the workspace. Recommendation: archive to crates/_spikes/ or delete. Effort: 30 min.

Minor (cleanup opportunities)

4. hakari.toml has no platform entries

crates/workspace-hack/Cargo.toml is generated by cargo hakari but the .config/hakari.toml has all platform triples commented out. This means the workspace-hack only optimises for the host platform, potentially missing deduplication opportunities for CI (Linux/macOS cross-compile targets). Recommendation: uncomment x86_64-unknown-linux-gnu and aarch64-apple-darwin and regenerate. Low risk, run cargo hakari generate && cargo hakari verify. Effort: 15 min.

5. [profile.release] missing codegen-units = 1

The release profile uses lto = "thin" but not codegen-units = 1. Thin-LTO with multiple codegen units can leave inlining on the table compared to codegen-units = 1 + lto = "fat". Given the CLI binary size and the typical single-shot usage model, switching to fat-LTO + codegen-units = 1 on the dedicated vox-cli binary target would reduce binary size by an estimated 5–15% and improve runtime throughput slightly. The workspace-level profile change affects all crates; a per-package override for vox-cli is safer. Effort: 30 min to benchmark.

6. [profile.release] uses strip = "debuginfo" not strip = "symbols"

strip = "debuginfo" retains symbol names (useful for crash reports) but ships larger binaries than strip = "symbols". For production CLI distribution, strip = "symbols" yields the smallest binary. If crash reporting is a priority, keep debuginfo; otherwise switch. Effort: trivial.

7. oratio + oratio-mic symphonia/rubato duplication

Both crates/vox-oratio/Cargo.toml and crates/vox-plugin-oratio/Cargo.toml independently declare optional symphonia and rubato dependencies at the same versions. These are only needed when the audio feature is active. The plugin version should delegate audio decoding to the vox-oratio crate rather than re-declaring the same heavy optional deps. Effort: 1–2 days (requires careful feature-flag plumbing).

Build / Distribution Optimization

Recommendation	Priority	Effort	Notes
Extract shared NVML helpers into `vox-nvml-queries`	High	1 day	Eliminates dual maintenance surface
Archive/delete `vox-plugin-cuda-spike`	High	30 min	Removes unconditional CUDA feature pull
Add SP7 TODO markers to scaffold stubs	Medium	1 hr	Enables `rg TODO.*SP7` milestone scan
Uncomment hakari platform triples + regenerate	Low	15 min	Better workspace-hack deduplication on CI
Benchmark `codegen-units = 1` + fat-LTO for `vox-cli`	Low	30 min	Potential 5–15% binary size improvement
Consolidate `symphonia`/`rubato` into `vox-oratio` only	Low	1–2 days	Reduces dep duplication; risk: feature-flag complexity

AgentSkills Compliance

The new vox ci agentskills-compliance guard (added this sprint) enforces:

Frontmatter block required
name: ^[a-z0-9][a-z0-9-]{0,63}$, matches crate directory short-name
description: present, ≤ 1024 chars

All 10 current skill files pass as of 2026-05-08. Guard runs on every PR via vox ci agentskills-compliance.

Recommendations (prioritized)

Delete vox-plugin-cuda-spike — high impact, trivial effort, immediate win on feature resolution time.
Extract vox-nvml-queries — prevents future divergence as NVML API evolves.
Add // TODO(SP7): markers — makes milestone completeness mechanically checkable.
Regenerate workspace-hack with platform triples — cheapest build-time win.
Benchmark fat-LTO + codegen-units = 1 on vox-cli profile before committing (may increase build time).