Telemetry and research_metrics contract
Telemetry & research_metrics contract
Section titled “Telemetry & research_metrics contract”Related SSOT
Section titled “Related SSOT”- Telemetry trust boundary and SSOT map
- Telemetry taxonomy and contracts SSOT (roadmap)
- Telemetry retention and sensitivity SSOT (roadmap)
- Telemetry client disclosure SSOT
- Telemetry implementation blueprint 2026 and backlog
- Optional explicit remote upload (local JSON spool, not
research_metrics): ADR 023, Telemetry remote sink specification, CLIvox telemetry
Code enforcement for row validation: validate_research_metric_row (called from append_research_metric). Repository-scoped producers should use TelemetryWriteOptions plus the METRIC_TYPE_* / SESSION_PREFIX_* / SESSION_ID_* constants in vox_db::research_metrics_contract.
Row shape
Section titled “Row shape”Table research_metrics columns: session_id, metric_type, metric_value (nullable REAL), metadata_json.
metric_value: optional scalar. SQLNULLmeans “no scalar” — APIs must not coerce NULL to0.0(aggregations skip nulls; seelist_research_metrics_by_type).metadata_json: structured payload; may include units and names that disambiguate mixed benchmarks.
Validation limits (writes)
Section titled “Validation limits (writes)”| Field | Rule |
|---|---|
session_id | Non-empty; max 512 UTF-8 characters. |
metric_type | Non-empty; max 128 characters; characters must be ASCII alphanumeric or _, ., -, : (colon allows MCP-linked namespaces such as foo:bar). |
metadata_json | Optional; if present, max 256 KiB serialized length. |
Session id namespaces (convention)
Section titled “Session id namespaces (convention)”Producers should prefix session_id so rollups and dashboards can group without colliding:
| Prefix | Example | Typical producer |
|---|---|---|
bench: | bench:<repository_id> | CLI / build timings |
syntaxk: | syntaxk:<repository_id> | Syntax-K eval fixtures |
mcp: | mcp:<repository_id> | MCP Socrates / surface telemetry |
mens: | mens:<repository_id> | Populi control-plane audit (populi_control_event) |
workflow: | workflow:<repository_id> | Interpreted workflow journal (workflow_journal_entry, versioned event payloads from the workflow durability contract) |
route: | route:<repository_id> | Routing policy and capability-gate telemetry (model_route_event) |
Fixed session (no repository in id): hybrid memory fusion uses session socrates:retrieval and metric type memory_hybrid_fusion (see SESSION_ID_MEMORY_HYBRID_FUSION in the Rust module).
Questioning / linked metrics: MCP may use opaque session_key strings for questioning_event and vox_db_research_metric_linked (not forced through TelemetryWriteOptions); those rows still must satisfy validation caps above.
Metric types (non-exhaustive)
Section titled “Metric types (non-exhaustive)”metric_type | Session prefix | Scalar semantics | Notes |
|---|---|---|---|
benchmark_event | bench:<repository_id> | Optional; unit in metadata metric_value_unit | CLI build timings use seconds for wall time. |
syntax_k_event | syntaxk:<repository_id> | Optional ratio / timing | Fixture id in metadata; optional support_metrics (representability / LLM surface / runtime projection summaries per contracts/eval/syntax-k-event.schema.json). |
socrates_surface | mcp:<repository_id> | Hallucination-risk proxy | Prefer metadata for interpretability; eval summaries inject explicit denominators (below). |
agent_exec_time | mcp:<repository_id> | Optional | Tool execution duration/budget events used for execution-time calibration. |
model_route_event | route:<repository_id> | Optional | Runtime/orchestrator route decisions. metadata_json must include trace_id and route_policy_profile. |
model_call_event | route: / mcp: / producer-defined | Optional | Per-call LLM / provider invocation audit row (latency, provider id, tokens when present). |
task.root_summary | producer-defined | Optional | High-level task / conversation rollup for dashboards. |
build.summary | bench: | Optional | Build or CI lane rollup (METRIC_TYPE_BUILD_SUMMARY_EVENT). |
telemetry.error | producer-defined | Optional | Structured error / fault envelope (METRIC_TYPE_ERROR_EVENT); use for HTTP failures, rate limits, and similar fault paths. |
orch.circuit_breaker.trip | route:<repository_id> or orchestrator session key | Optional counter / trip marker | Emitted when the orchestrator doom-loop circuit breaker transitions to Open (METRIC_TYPE_CIRCUIT_BREAKER_TRIP). |
orch.socrates.fusion | orchestrator / MCP session | Optional | Socrates eval fusion rollup (METRIC_TYPE_SOCRATES_FUSION). |
orch.routing.tier | route:<repository_id> | Optional | Tier routing decision telemetry (METRIC_TYPE_MODEL_TIER_ROUTE). |
orch.plan.mode_decision | orchestrator session | Optional | Plan-mode selection (METRIC_TYPE_PLAN_MODE_DECISION). |
orch.hitl.interrupt | orchestrator session | Optional | Human-in-the-loop interrupt signal (METRIC_TYPE_HITL_INTERRUPT). |
orch.risk.score | orchestrator session | Optional | Orchestrator risk scoring (METRIC_TYPE_RISK_SCORE). |
orch.privacy.route_decision | orchestrator session | Optional | Privacy-aware routing outcome (METRIC_TYPE_PRIVACY_ROUTE_DECISION). |
orch.cache.hit_prediction | orchestrator session | Optional | Cache hit / predictor telemetry (METRIC_TYPE_CACHE_HIT_PREDICTION). |
orch.budget.decision | orchestrator session | Optional | Budget gate decisions (METRIC_TYPE_BUDGET_DECISION). |
orch.calibration.run | orchestrator session | Optional | Calibration job lifecycle (METRIC_TYPE_CALIBRATION_RUN). |
orch.calibration.drift_alert | orchestrator session | Optional | Model / signal drift alert (METRIC_TYPE_DRIFT_ALERT). |
orch.calibration.bandit_update | orchestrator session | Optional | Bandit / Thompson-style policy update (METRIC_TYPE_BANDIT_UPDATE). |
orch.subagent.dispatch | orchestrator session | Optional | Sub-agent dispatch (METRIC_TYPE_SUBAGENT_DISPATCH). |
orch.subagent.chain_depth_alert | orchestrator session | Optional | Deep sub-agent chain warning (METRIC_TYPE_CHAIN_DEPTH_ALERT). |
orch.agentos.guardrail_deny | orchestrator session | Optional | AgentOS guardrail denial (METRIC_TYPE_AGENTOS_GUARDRAIL_DENY). |
socrates_surface aggregate metadata (record_socrates_eval_summary)
Section titled “socrates_surface aggregate metadata (record_socrates_eval_summary)”Rollups written to eval_runs include JSON with both raw counts and explicit denominators so downstream tools do not misread rates when some rows lack a scalar or parseable metadata:
rate_denominator: literal"parsed_metadata_rows"— rates (answer_rate,abstain_rate) use this count.abstain_rate_denominator_n/answer_rate_denominator_n: same asparsed_metadata_rows.mean_proxy_denominator_n:rows_with_metric_value— mean hallucination-risk proxy uses only rows wheremetric_valuewas non-NULL.rows_total_n:sample_size— allsocrates_surfacerows scanned.
Quality in eval_runs uses the mean proxy only when rows_with_metric_value > 0; otherwise quality is 0.0 (avoids implying a perfect score with no scalar signal).
benchmark_event metadata (BenchmarkEventMeta)
Section titled “benchmark_event metadata (BenchmarkEventMeta)”name: logical benchmark id (cargo_build_metrics, …).metric_value_unit: whenmetric_valueis set, unit SSOT (seconds,milliseconds,ratio, …).details: free-form JSON (per-crate timings, pass/fail flags).
Build timing producers (current)
Section titled “Build timing producers (current)”vox ci build-timings(shallow lanes) writesbenchmark_eventnameci_build_timingswith:metric_value: total wall time inseconds,metric_value_unit:seconds,details: lane rows (lane,ok,ms) plustotal_ms.
vox ci build-timings --deepwrites structured rows tobuild_run/build_crate_sample/build_warning; on structured-write fallback it writesbenchmark_eventnamecargo_build_metricswithmetric_value_unit = seconds.VOX_BENCHMARK_TELEMETRY=1controlsbenchmark_eventwrites; structuredbuild_*writes follow command persistence settings and VoxDB availability.
For cross-repo querying via MCP, benchmark_event may use name = "cross_repo_query" with metric_value_unit = "milliseconds" and details such as:
query_kindtrace_idcorrelation_idconversation_idworkspace_repository_idtarget_repository_idssource_planequery_backendresult_countskipped_count
Training JSONL (telemetry.jsonl)
Section titled “Training JSONL (telemetry.jsonl)”Envelope per line: { "ts_ms", "event", "payload" }. Payload keys are defined in crates/vox-populi/src/mens/tensor/telemetry_schema.rs (e.g. eta_seconds_remaining, steps_per_sec_ema). The CLI viewer vox mens watch-telemetry must track this schema (guarded by vox ci data-ssot-guards).
Mens training KPI ownership (decision-driving)
Section titled “Mens training KPI ownership (decision-driving)”- Tier 1 (gate-driving):
tokens_per_sec(withtokens_per_sec_is_proxywhen derived),valid_tokens,theoretical_tokens,supervised_ratio_pct.
- Tier 2 (diagnostic):
steps_per_sec_ema,eta_seconds_remaining,- skip counters (
skip_no_supervised_positions,skip_short_seq, …).
Deprecation / compatibility window
Section titled “Deprecation / compatibility window”- Consumers should prefer canonical fields above.
- Legacy aliases are still read with warnings (status / eval-gate paths), then normalized at read time.
steps_per_sec_emaas a throughput surrogate is considered deprecated for gates whentokens_per_secis present.
vox ci data-ssot-guards— asserts watch-telemetry references schema keys andresearch_metricslist API avoidsCOALESCE(metric_value, 0.0).- Web IR structural gate: workflow sets
VOX_WEBIR_VALIDATE=1and runscargo nextest run -p vox-compiler --test web_ir_lower_emit_test --run-ignored ignored-only(see.github/workflows/ci.yml).