SCIENTIA Self-Publication Gap Map (2026)
SCIENTIA Self-Publication Gap Map (2026)
Section titled “SCIENTIA Self-Publication Gap Map (2026)”Companion to: SCIENTIA Self-Publication Finalization Plan (2026). That plan’s Phases 0–10 are marked complete as of 2026-05-09. This document does not re-spec those phases — it inventories what is still missing for a developer on this codebase to go from
git logto a published, citable artifact in a single coherent flow.Per-gap implementation plans: See Phase index for outline plans A–H. Each plan is promoted from outline to detailed TDD-step fidelity when it becomes next-to-execute, matching the Finalization Plan’s §12 rhythm.
0. Orientation: what the Finalization Plan already provides
Section titled “0. Orientation: what the Finalization Plan already provides”To prevent redesign, the following are out of scope for this gap map because they are built and complete per the Finalization Plan:
- Signal ladder T0→T4 with rigor gates (§2 of the plan).
- Claim extraction: VeriScore + atomic + span integrity + MiniCheck +
calibrated ABSTAIN (
crates/vox-claim-extractor/). - Pre-registration as signed code object with deviation detection
(
crates/vox-prereg/). - Nanopublication emission and RO-Crate 1.2 packaging (
crates/vox-nanopub/,crates/vox-ro-crate/). - Atomic-NEI novelty against time-bounded corpus, SPECTER2 retrieval,
ChronoFact, EvidenceConflict family (
crates/vox-inspect-bridge/). - AISI Inspect adapter; symbolic verifiers wired into
confidence_fusion.rs. - 14-day right-of-reply window, retraction nanopubs, living-review version DOIs, COPE retraction workflow.
- Scholarly adapters: arXiv, OSF, Crossref deposit, Zenodo versioning, OpenReview revision, ORCID PKCE OAuth, F1000 publish-then-review.
- AtlasSubmissionGate with negative-result quota; venue catalog whitelist; predatory-journal refusal.
- AI-disclosure block auto-fill; CRediT + COI declarations.
- Format adaptation for short-form (Bluesky, schematic-only figures).
The Provider Atlas longitudinal AI-epidemiology paper is the plan’s terminal artifact — but it is one publication track, not the user-journey substrate.
1. The end-to-end user journey today (with gaps marked)
Section titled “1. The end-to-end user journey today (with gaps marked)”A developer working in this codebase wanting to self-publish takes this path. Steps marked (gap N) are not yet served by a single command or surface.
- Develop normally. Commits land, CI runs benchmarks, telemetry flows, training jobs finish. ✅ (observed by existing infra)
- “Is any of this publishable?” → (gap A) No
vox scientia scoutexists. Discovery scan operates on already-prepared manifests, not on raw repo/CI/telemetry state. - “What candidate did I just produce?” → (gap A continued, gap F)
The
finding-candidate.v1ledger row schema exists; the producers that mine commit graph, benchmark CI history, MENS training runs, and Vox-internal telemetry to emit candidate rows are thin. - Pre-register the hypothesis ✅ (Phase 2 complete).
- Run the experiment. ✅
- “Did the result reproduce?” → (gap B)
artifact_replayabilityis in the worthiness rubric as a[0,1]metric, but appears operator-asserted. No runner re-executes the RO-Crate in a clean sandbox and writes the measured score back. - Promote claims through the signal ladder. ✅
- “Draft the paper.” → (gap C) Phase 7 covers short-form (Bluesky/schematic). No long-form IMRaD scaffolder that fills only the safe slots from RO-Crate + verified claims and leaves provenance-bound TODOs for the human narrative.
- Get dual approval. ✅ (if you have a co-author). (gap D) Solo developer path requires either a recruited human or a designed audited-LLM-critic-as-second-approver flow under AI-author disclosure.
- Decide the venue. ✅ (journal-fit recommender exists). (gap E)
The recommender’s first-class output is the Provider Atlas track. The
AI/SWE micro-publication track —
algorithmic_improvement,reproducibility_infra,telemetry_trust,policy_governance— enumerates candidate classes but does not have a parallel publication-format + venue-routing track to match. - Submit. ✅
- Right-of-reply, peer review, revisions, retraction. ✅
- “Where does the published artifact live for readers?” → (gap G)
Zenodo + arXiv host the deposit;
vox-plugin-publicationsyndicates outward. A Vox-native landing surface (findings.vox-lang.org/<trusty-uri>or analogous in the docs site / dashboard) that is the publication’s canonical home is not yet in evidence. - “What’s in my queue?” → (gap H) No dashboard panel surfacing candidates awaiting prereg, claims awaiting verification, manifests in reply-window, retraction queue.
2. Gap inventory
Section titled “2. Gap inventory”Gap A — Self-observation candidate producers
Section titled “Gap A — Self-observation candidate producers”What. Detectors that turn the user’s own repo/CI/telemetry/training-run
activity into finding-candidate.v1 rows.
Evidence. contracts/scientia/finding-candidate.v1.schema.json defines
the candidate type with classes algorithmic_improvement,
reproducibility_infra, policy_governance, telemetry_trust. The
Finalization Plan’s Phase 6 wires ProviderObservation /
ModelCapabilityEvidence families (about external providers Vox calls)
into the candidate flow. The orthogonal flow — Vox observes itself — has
the schemas but missing producers. No grep hit for a commit-graph miner,
benchmark-CI-delta detector, or MENS-training novelty detector that
populates the ledger.
Why it matters. Without these, the candidate ledger is fed only by the Provider Atlas path. A developer making a real software-engineering or algorithmic contribution in this codebase has no automatic path from their work into a publication candidate.
Concretely missing.
- Commit-graph miner: perf-improving merges (use existing
ExecTimeRecord), novel-symbol detectors over recent diffs, MDL-style compression gain over unfamiliar regions of the AST. - Benchmark-CI history detector: regression and improvement spikes over a preregistered baseline.
- MENS-training-run hook (per the in-flight Mn-T1..T15 plan): emit candidates from new SOTA checkpoints, novel attention patterns, repeatable training regressions.
- Vox-internal telemetry detector for
reproducibility_infraandpolicy_governancecandidates (Socrates surface is collected but not routed to the ledger as candidates about Vox itself).
Dependencies. None upstream; reuses existing vox-research-events
emitter and the candidate schema.
Effort. Medium. One producer per source; each ~400–800 LoC plus detector heuristics; producers compose, so it can ship incrementally.
Gap B — Replay runner that measures artifact_replayability
Section titled “Gap B — Replay runner that measures artifact_replayability”What. A runner that, given a publication_manifest and its RO-Crate,
re-executes the experiment in a sandboxed environment and writes the
measured replayability score back to the worthiness signals.
Evidence. artifact_replayability appears in
crates/vox-publisher/src/publication_worthiness.rs,
worthiness_extraction.rs, and the rubric
(docs/src/reference/scientia-publication-worthiness-rules.md), but the
measurement source is absent. No vox-replay-runner crate; no
replay_artifact API surface; no CI hook that re-executes RO-Crate
deposits.
Why it matters. The Finalization Plan takes credibility-via-symbolic-
verification very seriously (the AlphaEvolve thesis, §3.4). An asserted
replayability number sits in tension with that posture. A measured score —
even a coarse one (“the manifest’s entry_point ran to a green exit code
and produced an output hash matching the deposited one”) — converts a soft
rubric field into a verifier-backed claim.
Dependencies. Needs RO-Crate entry_point convention (Phase 4) — ✅.
Sandbox isolation: reuse existing CI runner, or a thin
vox-sandbox-runner shim.
Effort. Medium-small. Sandbox + replay harness; metric writeback; worthiness rubric update.
Gap C — Long-form manuscript scaffolder
Section titled “Gap C — Long-form manuscript scaffolder”What. A constrained-grammar emitter that produces an IMRaD markdown
skeleton (Intro / Methods / Results / Discussion) from a FindingCandidate
- its verified claims + its RO-Crate, filling only the safe slots and leaving provenance-bound TODOs for the human narrative.
Evidence. Phase 7 covers short-form adaptation via
crates/vox-research-events/src/publication_format.rs
(ShortFormVariant, adapt_claim_to_platform). No long-form analogue;
no grep hit for manuscript, IMRaD, or paper-scaffolding in
non-archived code.
Why it matters. The rubric explicitly forbids auto-generating
novelty/significance assertions, causal mechanism claims, and
measurement-implying figures. Fine. But the blank-page barrier for a
solo developer is real, and the safe slots (methods derived from RO-Crate
declarations; results table lifted from verified claim envelopes; citation
stubs from the SPECTER2-verified prior-art set; AI-disclosure block from
AiDisclosureBlock::build) are all already provenance-bound. Filling them
mechanically does not violate the red lines.
Concretely. A vox scientia manuscript-draft --publication-id <id>
command emitting <id>.imrad.md with:
- Methods section auto-filled from RO-Crate
mainEntity+ preregeval_substrate. - Results table from claim envelopes (one row per verified atomic claim, with Trusty URI link).
- References block from the SPECTER2-retrieved prior-art set used for novelty.
- Acknowledgments, ORCID/ROR, AI-disclosure auto-filled.
- Discussion / Intro / Significance sections present as
<!-- TODO(narrative): -->blocks the human writes by hand.
Dependencies. Reuses RO-Crate (Phase 4), claim extractor (Phase 1),
prereg (Phase 2), vox-constrained-gen.
Effort. Medium. Templating + grammar; the verified-only constraint is the discipline.
Gap D — Solo-author critic-gate path
Section titled “Gap D — Solo-author critic-gate path”What. A designed path for a single developer to clear the dual-distinct- approver gate without requiring a second human, under explicit AI-author disclosure.
Evidence. Dual-approver is hard-required at the digest level in
crates/vox-db/src/store/ops_publication.rs and surfaced in the playbook.
machine-suggestion-block.schema.json carries
machine_suggested/requires_human_review flags. AI-disclosure block
exists (vox-ro-crate/src/ai_disclosure.rs). Phase 9 of the Finalization
Plan onboards an academic co-author — solo is not the primary mode.
Why it matters. A real Vox self-publication user is often solo. The plan’s posture (Galactica risk, GPT-4-grades-GPT-4 critique) requires that any non-human approver be (a) explicitly disclosed, (b) a different substrate than what produced the artifact, (c) bound to the content digest, and (d) auditable.
Concretely. A second-approver role of AuditedLLMCritic that:
- Submits a signed approval bound to
content_sha3_256from a key registered under an ORCID-distinct identity. - Is automatically disclosed in
AiDisclosureBlockas a contributor with the CRediT roleValidationonly (neverInvestigationorWriting — original draft). - Uses a model architecturally different from any used in the artifact pipeline (enforced by recording model fingerprints in both manifest and critic approval).
- Refuses to approve if the artifact’s worthiness signals include any contribution from the same model family — closes the GPT-4-grades-GPT-4 hole.
- The published artifact’s venue catalog must allow LLM-critic
approval (a new venue-catalog flag); IMC/MLSys/TMLR would set this
false, F1000-track and Zenodo-only deposits could set ittrue.
Dependencies. Venue catalog flag; ORCID-distinct critic identity; critic model fingerprinting.
Effort. Small-medium. Mostly policy plumbing plus the critic identity machinery.
Gap E — AI/SWE micro-publication track (non-Atlas)
Section titled “Gap E — AI/SWE micro-publication track (non-Atlas)”What. A publication track for individual algorithmic_improvement /
reproducibility_infra / telemetry_trust / policy_governance findings
that does not route through the quarterly Provider Atlas.
Evidence. FindingCandidateClass enumerates these classes
(crates/vox-research-events/src/schema_types.rs); the candidate ledger
accepts them. venue-catalog.v1.yaml has IMC/MLSys/TMLR plus Distill-style
and Living Reviews entries — but the journal-fit recommender is geared
toward longitudinal measurement papers.
Why it matters. The user’s question is broader than provider
measurement — it includes “I built a thing in this codebase and want to
publish about it.” Examples already in this repo: vox-prereg itself is
publishable (PL/SE methods venue); vox-arch-check is publishable
(software-architecture venue); the SSOT migration policy
(mesh-and-language-distribution-ssot-2026.md §5.5) is publishable
(empirical software engineering). None of these fit the Atlas mold.
Concretely.
- Per-class venue mappings: SWE-PL improvements → ICSE/FSE/OOPSLA/PLDI; reproducibility infra → REP/MSR/ICSE-SEIP; telemetry trust → MLSys/SOSP workshops; policy governance → AIES/FAccT.
- Per-class artifact-format profile: short systems papers ≠ Atlas measurement papers; figure norms differ; reproducibility-package expectations differ.
- Per-class default reply-window length (14 days is right for Atlas; open-review tracks differ).
Dependencies. Gap C makes this much more usable (each class implies a different IMRaD weight). Otherwise none.
Effort. Small (config + venue catalog rows) + medium (per-class artifact-format profiles).
Gap F — vox scientia scout single-command surface
Section titled “Gap F — vox scientia scout single-command surface”What. One command that, run in a Vox workspace, surveys recent
commits / CI history / telemetry / training-run outputs and prints a
ranked candidate list with proposed candidate_class, suggested venue,
and a “promote-to-ledger” affordance.
Evidence. Discovery surface today is
publication-discovery-scan/explain/refresh-evidence — all over
already-prepared manifests. No “scan repo state from scratch” command.
Why it matters. This is the front door for the user journey. Without it, the developer has to know to prepare a manifest before discovery ever runs. With it, the system surfaces opportunity proactively.
Concretely.
- Subcommand under
vox scientia. - Aggregates output from Gap A’s producers.
- Prints a table: candidate-id, class, top signals, suggested venue,
recommended next command (
publication-prepare,prereg-draft, etc.). - Optional
--watchmode for daemon-style monitoring; emits OS notifications when a strong-strength candidate appears.
Dependencies. Gap A is the prerequisite (no producers, nothing to report).
Effort. Small (CLI + table rendering); leverages Gap A.
Gap G — Vox-native publication reading surface
Section titled “Gap G — Vox-native publication reading surface”What. A canonical landing page for each published manifest, hosted on Vox-owned infrastructure, distinct from Zenodo/arXiv deposits.
Evidence. vox-plugin-publication does RSS/Atom + Reddit/YouTube
syndication. docs/src/index.mdx is the docs landing. No /findings/ or
/publications/ route surfaces canonical published artifacts. Phase 4
notes Highwire-style meta tags for Google Scholar pickup but doesn’t
specify the host page.
Why it matters. External deposits (Zenodo, arXiv) own the DOI but not the reading experience. Distill-style web-native artifacts — first-class in the Finalization Plan’s venue catalog — require an HTML host. Living Reviews need a stable canonical URL pointing at “latest version” with the version history table inline.
Concretely.
- A
findings/<trusty-uri>route on the docs SSG or dashboard, rendering the RO-Crate’s text body + claim table + version history + reply thread inline. - Highwire meta tags for Scholar pickup.
- Embedded nanopub viewer for atomic claims.
- Reply window status badge.
Dependencies. RO-Crate (Phase 4) — ✅.
Effort. Medium. SSG route + renderer; relies on existing docs infrastructure.
Gap H — Discovery dashboard panel
Section titled “Gap H — Discovery dashboard panel”What. A dashboard panel surfacing the publication-pipeline queue: candidates by class, claims awaiting verification, manifests in reply-window, retraction queue, cost dashboard rollup.
Evidence. Phase 6 closed the loop end-to-end but is backend. The
dashboard plan (docs/superpowers/plans/ci/2026-05-03-vox-dashboard-claude-design-port.md)
exists but I did not verify a Scientia panel.
Why it matters. Operator visibility into the pipeline is currently CLI JSON. A panel converts the workflow into a glanceable surface and surfaces stalls (e.g., a candidate stuck in evidence-incomplete for 30 days).
Dependencies. Existing tables (publication_manifests,
publication_status_events, external_submission_jobs,
finding_candidates ledger).
Effort. Small-medium. Mostly UI work over existing data.
3. Priority and dependency matrix
Section titled “3. Priority and dependency matrix”| Gap | Severity | Effort | Unlock value | Depends on |
|---|---|---|---|---|
| A — self-observation producers | high | medium | unlocks F, gates E credibility | none |
| B — replay runner | high | medium-small | closes credibility hole in rubric | RO-Crate ✅ |
| C — IMRaD scaffolder | high | medium | removes blank-page barrier | claim extractor ✅, RO-Crate ✅, prereg ✅ |
| D — solo critic-gate | high (for solo users) | small-medium | unlocks solo self-publication | AI-disclosure ✅, venue catalog ✅ |
| E — AI/SWE micro-track | medium | small + medium | broadens publication scope beyond Atlas | C |
F — vox scientia scout | medium | small | front-door UX | A |
| G — reading surface | medium | medium | hosts Distill-style + Living-Review | RO-Crate ✅ |
| H — dashboard panel | low | small-medium | operator UX | none |
Dependency chains
Section titled “Dependency chains”A ──> F (scout needs producers)A ──> E (credibility) (AI/SWE micropapers need own-repo candidates)C ──> E (usability) (per-class scaffolds make track usable)B ──> rubric credibility (measured replayability)D ──> solo workflow (independent of others)G, H ──> independent UX winsThe high-severity cluster is A + B + C + D. A and B are independent and can ship in parallel; C and D are independent and can ship in parallel; F is a thin wrapper over A once A exists; E is a config + per-class extension of C; G and H are UX wins that do not block any other gap.
4. Recommended first slice
Section titled “4. Recommended first slice”If this gap-map gets approved, the highest-leverage minimal first slice is:
Slice 1 — “From commits to a candidate row, with a measured replay.”
Scope:
- Gap A, narrowest cut: one producer for
algorithmic_improvementcandidates from commit-graph + benchmark CI history (reuseExecTimeRecord). - Gap B, narrowest cut: replay runner that re-executes the RO-Crate
mainEntityentry point in a fresh worktree and writes the measured replayability score (binary pass/fail + output-hash match) back to worthiness signals. - Gap F, thin wrapper:
vox scientia scoutlisting candidates from this one producer.
Why this slice: it makes the front of the user journey (steps 1–7 above) work end-to-end for the most common case (a developer made a perf improvement), with measured (not asserted) replayability. Everything downstream of step 7 is already strong.
Slice 2 candidates: Gap C (manuscript scaffolder) or Gap D (solo critic gate) — independent, pick by user demand.
5. Open questions
Section titled “5. Open questions”- Producer pluggability. Should signal producers be a plugin contract (so third parties can add detectors) or an L1 trait with a fixed set?
- Solo critic identity provenance. Is an ORCID-distinct identity
for an LLM critic ethically acceptable in the venues we care about?
(Atlas: probably not. F1000 / Zenodo-deposit: probably yes. Needs
per-venue declaration in
venue-catalog.v1.yaml.) - Replay determinism. What’s the minimal RO-Crate
mainEntitycontract to make replay tractable? GPU-bound experiments need a different cut than CPU-bound. - Reading surface hosting. Docs SSG or dashboard? Or both with the dashboard as edit surface and SSG as canonical reader?
- Negative-result quota for the AI/SWE micro-track. The Atlas has a 3-positive : 1-null quota. Does the SWE micro-track inherit this or set its own?
6. Cross-references
Section titled “6. Cross-references”- SCIENTIA Self-Publication Finalization Plan (2026) — supersedes this for Phases 0–10.
- SCIENTIA SSOT handbook — lifecycle and status vocabulary.
- Publication worthiness rules — rubric and red lines.
- How-to: Publish Scientia findings — operator flow.
- Where Things Live — concept → crate map.
- Mesh & Language Distribution SSOT (2026) — §3.5 Hopper track, §5.5 migration policy, §5.6 routes.