Inference Tuning Resolution Research 2026
Inference Tuning Resolution Research 2026
Section titled “Inference Tuning Resolution Research 2026”Objective
Section titled “Objective”Establish a robust, secure, and predictable mechanism for propagating inference parameters (temperature, top_p) from diverse sources (MCP clients, environment variables, and tool defaults) to LLM providers.
The Problem
Section titled “The Problem”Prior to April 2026, inference tuning was fragmented. Some tools hardcoded 0.1 for stability, others used 0.7 for variety, and there was no way for a user to override these values without modifying the source code. This prevented power users from fine-tuning reasoning models or relaxing the temperature for creative tasks.
Resolution Model: The 3-Tier Precedence
Section titled “Resolution Model: The 3-Tier Precedence”We have adopted a strict 3-tier resolution model for all inference calls:
| Priority | Source | Description |
|---|---|---|
| 1 (Highest) | Request Override | Parameters passed directly in the MCP tool call (e.g., temperature field in ChatMessageParams). |
| 2 | Registry Secret | Provider-specific environment variables resolved via vox-secrets (e.g., GEMINI_TUNING_TEMPERATURE). |
| 3 (Baseline) | Tool Default | The “safe” baseline defined by the tool author (e.g., 0.1 for inline_edit). |
Implementation Strategy
Section titled “Implementation Strategy”1. Registry Integration
Section titled “1. Registry Integration”We extended the Secret Registry to include tuning overrides for major providers. This ensures that a single environment variable can shift the baseline for all tools using that provider.
GEMINI_TUNING_*OLLAMA_TUNING_*OPENAI_TUNING_*ANTHROPIC_TUNING_*
2. Signature Standardization
Section titled “2. Signature Standardization”The core mcp_infer_tool_completion was refactored to take base_temperature: f32 and Option<f32> overrides. This forces every call site to consciously provide a functional baseline while allowing the bridge to handle the complex resolution logic.
3. Telemetry and Observability
Section titled “3. Telemetry and Observability”To prevent “invisible tuning” (where a user is unsure which value was actually sent), we instrumented the vox.mcp.llm.tuning channel. This target logs the final resolved parameters for every request.
Findings
Section titled “Findings”- Anthropic Constraints: Anthropic’s API is sensitive to
top_pvalues when they are not explicitly needed. Our implementation usesOption<f32>to ensure we only serialize these fields when they have been explicitly overridden. - Provider Parity: Mapping
OpenRouterandCustomvariants to theOPENAI_TUNING_*secrets provides a consistent experience for cloud proxies.
Future Work
Section titled “Future Work”- Per-Model Tuning: Currently, tuning is provider-wide. Future iterations may allow
GEMINI_1_5_PRO_TUNING_TEMPERATUREfor more granular control. - User Preference Sync: Integration with a central user preference store to persist tuning settings across sessions.