The Vox Contribution Loop
Every quality gate in this repository has two jobs: (1) keep the codebase sound, and (2) keep the training corpus clean. This page explains the loop and what it means for your contributions.
The shipped loop (today)
```
① WRITE    .vox files, Rust code, golden examples
   │
② VERIFY   ← where most of your friction happens
   │         vox stub-check    : zero stubs / hollow fns
   │         cargo check/test  : compiler + unit tests green
   │         vox corpus eval   : .vox parse_rate ≥ 99.5%
   │
   │ ↓ fails here → negative example pool
   │
③ INGEST   examples/golden/**/*.vox ─→ vox corpus validate-batch
   │         synthetic.jsonl ─→ synthetic_valid.jsonl ─→ golden_validated.jsonl
   │
④ TRAIN    vox mens train (Candle QLoRA, local GPU or cloud)
   │
⑤ IMPROVE  Better .vox completions via LSP + local Populi serve
   └─→ back to ①
```

For `cargo check`/`cargo test`, prefer `-p <crate>` or nextest; see [AI dev loop overhead](../architecture/ai-dev-loop-overhead-2026.md).

The verify step is the filter. Contributions that pass become training data; contributions with stubs, hollow functions, or parse failures become negative examples, so the model learns to avoid those patterns.
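The routing rule at the verify step can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the repository's implementation: `CheckResults`, `Pool`, and `route` are hypothetical names, and only the 99.5% threshold comes from this page. The real pipeline also has negative sources (MCP validation, replans) that this sketch ignores.

```rust
/// Hypothetical summary of the ② VERIFY results for one contribution.
/// Field names are illustrative; they are not vox-cli's actual output format.
#[derive(Debug, Clone, Copy)]
struct CheckResults {
    stub_errors: usize,    // `Error` findings from `vox stub-check`
    compile_errors: usize, // errors from `cargo check`
    tests_green: bool,     // `cargo test` passed
    parse_rate: f64,       // `.vox` parse rate from `vox corpus eval`, in [0, 1]
}

/// Which training pool a contribution lands in.
#[derive(Debug, PartialEq, Eq)]
enum Pool {
    Positive,
    Negative,
}

/// The ≥ 99.5% parse-rate gate from the diagram.
const MIN_PARSE_RATE: f64 = 0.995;

/// Every gate must pass for the positive pool; any single failure routes
/// the contribution to the negative pool.
fn route(r: &CheckResults) -> Pool {
    let passes = r.stub_errors == 0
        && r.compile_errors == 0
        && r.tests_green
        && r.parse_rate >= MIN_PARSE_RATE;
    if passes {
        Pool::Positive
    } else {
        Pool::Negative
    }
}

fn main() {
    let clean = CheckResults { stub_errors: 0, compile_errors: 0, tests_green: true, parse_rate: 1.0 };
    let hollow = CheckResults { stub_errors: 2, ..clean };
    println!("{:?} {:?}", route(&clean), route(&hollow)); // Positive Negative
}
```

The gates are combined with a plain logical AND on purpose: a contribution is only as trustworthy as its weakest check.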
What makes a contribution training-eligible
To land in the positive training pool, a .vox file or Rust change must pass every check below:

| Check | Command | Threshold |
|---|---|---|
| Zero stubs and hollow fns | `vox stub-check --path <dir>` | No `Error` findings |
| Compiler clean | `cargo check -p <crate>` | Zero errors |
| Tests present and passing | `cargo test -p <crate>` | Green |
| `.vox` parse rate | `vox corpus eval --mode ast` | ≥ 99.5% |
| No CRLF line endings | `vox ci line-endings` | Zero CRLF |
| Docs code blocks valid | `// vox:skip` or `{{#include}}` | No bare snippets |
| `@test` block exists for new `.vox` capability | Written before the implementation | One `@test` per new exported fn |
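To make the parse-rate threshold concrete: with 200 `.vox` files, 199 parsing (99.5%) passes and 198 parsing (99.0%) fails, so at most one file in 200 may fail. The sketch below checks the ratio with integer arithmetic so the boundary case is exact. `meets_parse_gate` is a hypothetical helper, not part of `vox corpus eval`, and treating an empty corpus as passing is an assumption of this sketch.

```rust
/// Hypothetical helper: true when `parsed` out of `total` files meets the ≥ 99.5% gate.
/// Compares parsed/total ≥ 995/1000 by cross-multiplying, avoiding float rounding.
fn meets_parse_gate(parsed: u64, total: u64) -> bool {
    if total == 0 {
        return true; // assumption: an empty corpus has nothing to fail
    }
    parsed * 1000 >= total * 995
}

fn main() {
    // 199 of 200 parse: exactly 99.5%, on the threshold.
    println!("{}", meets_parse_gate(199, 200)); // true
    // 198 of 200 parse: 99.0%, below the threshold.
    println!("{}", meets_parse_gate(198, 200)); // false
}
```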
@test-first for golden examples
The highest-signal workflow for new .vox capabilities follows a Red → Green → Refactor → Ingest loop:
- Red: write an `@test` block that calls the function you intend to add. The function doesn't exist yet, so `vox check` fails. That failure is the spec.
- Green: implement the function until `vox check` and `cargo test` pass.
- Refactor: clean up while keeping the test green.
- Ingest: run `vox corpus eval` and commit. The `@test` block raises `r_test` in the planned GRPO reward signal (see §Planned additions).
Skipping the Red step (writing the implementation before the test) is not an error today, but it produces lower-quality corpus entries: the model learns the output without learning the intent. Agents are expected to follow @test-first for any new exported function added to `examples/golden/`.
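The idea behind the `skeleton/no-test-for-pub-fn` finding reduces to a set difference: exported functions minus the functions some `@test` block calls. The sketch below shows that idea over plain name lists. It is hypothetical: the real check works on parsed `.vox` source, and `untested_exports` is not an existing API.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: returns exported function names that no `@test` block calls,
/// preserving the order of `exported`. In a real check both lists would come from the AST.
fn untested_exports<'a>(exported: &[&'a str], tested_calls: &[&str]) -> Vec<&'a str> {
    let tested: HashSet<&str> = tested_calls.iter().copied().collect();
    exported
        .iter()
        .copied()
        .filter(|name| !tested.contains(*name))
        .collect()
}

fn main() {
    let exported = ["parse_config", "render", "shutdown"];
    let tested = ["parse_config", "render"];
    println!("{:?}", untested_exports(&exported, &tested)); // ["shutdown"]
}
```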
What sends code to the negative pool
The system generates negative training examples from:
- `stub/todo`, `stub/unimplemented`, and `skeleton/hollow-fn` findings
- Missing `@test` for new exported `.vox` functions in `examples/golden/` (flagged by `skeleton/no-test-for-pub-fn`)
- `.vox` parse failures during `vox corpus validate-batch`
- MCP pre-emit validation failures (planned; see the roadmap section)
- Replans triggered by failed victory-condition tiers
This is not punitive — negative examples are essential for DPO training.
But it means AI-generated skeleton code that looks plausible does real harm if
it enters the corpus unchecked. The VictoryClaimDetector specifically watches
for “implementation complete” adjacent to unimplemented!().
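The VictoryClaimDetector's job can be illustrated with a simple line-window scan. This is a hypothetical sketch rather than the detector's actual logic: the `victory_claims` name, the case-insensitive phrase match, and the configurable line window are all assumptions.

```rust
/// Hypothetical sketch: returns 1-based line numbers where the phrase
/// "implementation complete" appears within `window` lines of `unimplemented!()`.
fn victory_claims(source: &str, window: usize) -> Vec<usize> {
    let lines: Vec<&str> = source.lines().collect();

    // 0-based indices of lines that still contain a stub.
    let stub_lines: Vec<usize> = lines
        .iter()
        .enumerate()
        .filter(|(_, l)| l.contains("unimplemented!()"))
        .map(|(i, _)| i)
        .collect();

    lines
        .iter()
        .enumerate()
        .filter(|(_, l)| l.to_lowercase().contains("implementation complete"))
        .filter(|(i, _)| stub_lines.iter().any(|s| s.abs_diff(*i) <= window))
        .map(|(i, _)| i + 1)
        .collect()
}

fn main() {
    let src = "fn render() {\n    // implementation complete\n    unimplemented!()\n}\n";
    println!("{:?}", victory_claims(src, 3)); // [2]
}
```

A wider window catches claims written further from the stub, at the cost of more false positives.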
The golden examples path
The highest-signal contribution you can make to MENS is a well-formed golden example that follows @test-first (see §@test-first for golden examples above):

`examples/golden/<capability>.vox`

Golden files are compiled against the current compiler in CI
(`cargo test -p vox-compiler --test golden_vox_examples`), validated for
corpus quality, and have first-priority ingest into the training mix.
See the examples SSOT for the declared golden roots and the golden examples corpus guide for how to add one correctly.
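The golden-root convention used at ingest (`examples/golden/**/*.vox`) can be expressed as a path predicate. A minimal sketch: `is_golden_example` is a hypothetical helper, paths are assumed relative to the repository root, and the authoritative list of golden roots lives in the examples SSOT, not here.

```rust
use std::path::Path;

/// Hypothetical helper: true for paths matching `examples/golden/**/*.vox`.
/// `Path::starts_with` compares whole components, so `examples/golden_old/` does not match.
fn is_golden_example(path: &Path) -> bool {
    path.starts_with("examples/golden")
        && path.extension().map_or(false, |ext| ext == "vox")
}

fn main() {
    for p in [
        "examples/golden/http_server.vox",
        "examples/golden/net/tcp.vox",
        "examples/other/x.vox",
        "examples/golden/README.md",
    ] {
        println!("{p}: {}", is_golden_example(Path::new(p)));
    }
}
```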
Checking your own contribution’s quality
```bash
# 1. Stub check on your directory
cargo run -p vox-cli --features stub-check -- stub-check crates/your-crate

# 2. Compiler + tests
cargo check -p your-crate
cargo test -p your-crate

# 3. .vox corpus quality (if you touched .vox files)
cargo run -p vox-cli -- corpus eval --mode ast examples/golden/

# 4. Push-ready parity (single command; mirrors the merge-blocking CI subset)
cargo run -p vox-cli -- ci pre-push
# Narrow doc-only iteration: `vox ci pre-push --quick` still runs doc lint + doctest-md + drift-check
# Optional timings artifact: add `--report-json target/local/pre-push-last.json`
```
After merging a snapshot-touching PR
The test suite uses insta for snapshot assertions. CI runs with
`INSTA_UPDATE=unseen` (`.github/workflows/ci.yml`), so
new snapshots are auto-accepted in CI without failing the build, and the resulting
`tests/snapshots/` directories are uploaded as the `insta-snapshots` artifact. Changed
snapshots still fail.
If your PR added or moved snapshots, baselines do not commit themselves. Either:
- Run the suite locally before merging so the new `.snap` files appear in your working tree, then `git add` them alongside your code changes. (Preferred: keeps the PR self-contained.)
- After merge, download the `insta-snapshots` artifact from the merged CI run, drop the new `.snap` files into `crates/<crate>/tests/snapshots/`, commit, and push as a follow-up.
Without one of these two steps, every later contributor sees snapshot tests they didn't touch fail locally with
“snapshot assertion failed”, because the baseline only
ever existed in the CI artifact, not in the repo. Three separate cleanup commits in
2026-05 (`reactive_smoke_test`, `state_machine_integration_test`, `web_ir_lower_emit_test`)
were needed to drain orphans accumulated from this gap; do not let it recur.
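The asymmetry behind these orphans can be captured in a small model: the same missing baseline passes in CI but fails locally. This is a simplified illustration, not insta's code. `Mode`, `SnapshotState`, and `test_passes` are hypothetical names, and the local case assumes insta's default `INSTA_UPDATE=auto`, which outside CI writes a `.snap.new` file and fails the assertion.

```rust
/// Hypothetical model: where one snapshot assertion stands relative to its baseline.
#[derive(Debug, Clone, Copy)]
enum SnapshotState {
    NoBaseline, // no committed `.snap` file yet
    Matches,    // output equals the committed `.snap`
    Differs,    // output differs from the committed `.snap`
}

/// Hypothetical model of the two environments discussed on this page.
#[derive(Debug, Clone, Copy)]
enum Mode {
    CiUnseen,     // CI with INSTA_UPDATE=unseen
    LocalDefault, // local run with insta's default behavior
}

/// Returns whether the snapshot assertion passes in the given environment.
fn test_passes(mode: Mode, state: SnapshotState) -> bool {
    match (mode, state) {
        (_, SnapshotState::Matches) => true,
        // Changed snapshots fail everywhere.
        (_, SnapshotState::Differs) => false,
        // CI accepts the new snapshot, but the `.snap` only lands in the artifact.
        (Mode::CiUnseen, SnapshotState::NoBaseline) => true,
        // Locally, insta writes `.snap.new` and the assertion fails.
        (Mode::LocalDefault, SnapshotState::NoBaseline) => false,
    }
}

fn main() {
    let ci = test_passes(Mode::CiUnseen, SnapshotState::NoBaseline);
    let local = test_passes(Mode::LocalDefault, SnapshotState::NoBaseline);
    println!("CI: {ci}, local: {local}"); // CI: true, local: false
}
```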
The orphan signal is a `tests/snapshots/*.snap.new` file with no matching `.snap` next to
it. To clear it, spot-check the `.snap.new` content (does the recorded output match what your
test should produce?), then accept it via `cargo insta accept` (or rename `.snap.new` →
`.snap` if cargo-insta isn't installed). Run the suite again and repeat: each accepted
baseline can unblock previously skipped tests that produce their own first-run snapshots.
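Finding orphans is a simple name match. The sketch below is a hypothetical helper (`orphan_snapshots` is not part of insta or vox-cli) that takes file names, for example as listed from one `tests/snapshots/` directory, and returns each `.snap.new` without a sibling `.snap`. A `.snap.new` that does have a `.snap` is a changed snapshot awaiting review, not an orphan.

```rust
use std::collections::HashSet;

/// Hypothetical helper: given file names from one `tests/snapshots/` directory,
/// returns every `*.snap.new` that has no matching `*.snap` baseline.
fn orphan_snapshots<'a>(files: &[&'a str]) -> Vec<&'a str> {
    let baselines: HashSet<&str> = files
        .iter()
        .copied()
        .filter(|f| f.ends_with(".snap"))
        .collect();
    files
        .iter()
        .copied()
        .filter(|f| {
            // "x.snap.new" -> "x.snap", then check that no such baseline exists.
            f.strip_suffix(".new")
                .map_or(false, |base| base.ends_with(".snap") && !baselines.contains(base))
        })
        .collect()
}

fn main() {
    let files = ["a__render.snap", "a__render.snap.new", "a__emit.snap.new"];
    println!("{:?}", orphan_snapshots(&files)); // ["a__emit.snap.new"]
}
```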
Planned additions (roadmap — Wave 7–9)
These are not yet shipped. They describe the direction from
`vox_agentic_loop_and_mens_plan.md`.
Scientia auto-ingest (Wave 7): IDE sessions will be observed by
ScientiaObserver. Sessions that produce valid .vox with high
worthiness_score auto-ingest as training rows without manual corpus tooling.
Sessions that trigger multiple replans auto-ingest as negative examples.
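The planned auto-ingest rule can be sketched as a classifier over session summaries. Everything here is hypothetical: Wave 7 is not shipped, the page gives no numeric cutoff for `worthiness_score` (the `0.8` below is invented for illustration), "multiple replans" is read as two or more, and checking replans before worthiness is a design choice of this sketch.

```rust
/// Hypothetical summary of one session observed by ScientiaObserver.
struct Session {
    produced_valid_vox: bool,
    worthiness_score: f64, // assumed normalized to [0, 1]
    replans: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum Ingest {
    Positive,
    Negative,
    Skip,
}

/// Illustrative only; not a documented value.
const WORTHINESS_CUTOFF: f64 = 0.8;

fn classify(s: &Session) -> Ingest {
    if s.replans >= 2 {
        // Multiple replans signal a struggling session: negative example.
        Ingest::Negative
    } else if s.produced_valid_vox && s.worthiness_score >= WORTHINESS_CUTOFF {
        Ingest::Positive
    } else {
        Ingest::Skip
    }
}

fn main() {
    let session = Session { produced_valid_vox: true, worthiness_score: 0.9, replans: 0 };
    println!("{:?}", classify(&session)); // Positive
}
```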
GRPO reward shaping (Wave 9): instead of SFT-only training, the model will be scored on three signals per generated candidate:
- `r_syntax`: parse passes (0/1)
- `r_test`: `@test` block pass rate
- `r_coverage`: AST construct richness

Combined reward: 0.6 × r_syntax + 0.3 × r_test + 0.1 × r_coverage. This makes test
coverage inside .vox files a first-class quality signal.
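The weighting can be checked with a worked example. A minimal sketch of the planned formula, assuming `r_test` and `r_coverage` are already normalized to [0, 1] (the page does not say how coverage is normalized); `combined_reward` is a hypothetical name.

```rust
/// Sketch of the planned Wave 9 combined reward:
/// R = 0.6·r_syntax + 0.3·r_test + 0.1·r_coverage.
/// Assumes `test_pass_rate` and `coverage` are already normalized to [0, 1].
fn combined_reward(parsed: bool, test_pass_rate: f64, coverage: f64) -> f64 {
    debug_assert!((0.0..=1.0).contains(&test_pass_rate));
    debug_assert!((0.0..=1.0).contains(&coverage));
    let r_syntax = if parsed { 1.0 } else { 0.0 };
    0.6 * r_syntax + 0.3 * test_pass_rate + 0.1 * coverage
}

fn main() {
    // Parses, 2 of 4 @test blocks pass, moderate construct coverage:
    // 0.6 + 0.3 × 0.5 + 0.1 × 0.4 = 0.79
    println!("{:.2}", combined_reward(true, 0.5, 0.4)); // 0.79
    // Does not parse, so no tests can run either.
    println!("{:.2}", combined_reward(false, 0.0, 0.0)); // 0.00
}
```

Because `r_syntax` carries 60% of the weight, a candidate that fails to parse can score at most 0.4, so parsing dominates the signal while tests and coverage separate candidates that all parse.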
Related
- TOESTUB contributor guide — fix specific CI failures
- Vox source → MENS pipeline SSOT — authoritative technical crosswalk
- Mens native training SSOT — training pipeline reference
- AI agent panic and shortcut pathology — why shortcuts harm the corpus