Model Registry & Routing

LiteLLM/Azure model definitions are declared once in a canonical YAML file; the per-tool model lists (Pi, OpenCode) are generated from it. This is the model-side counterpart to the MCP registry.

Use when adding a model, changing reasoning/cost metadata, or understanding how a model reaches Pi/OpenCode.

Registry: ai_models.yaml

Source of truth: home/.chezmoidata/ai_models.yaml. It holds two sections, litellm_models and azure_models, each a list of model dicts with these fields:

| Field | Purpose |
| --- | --- |
| id | Provider-qualified model id (e.g. llm-gateway/claude-opus-4-7) |
| name | Human-readable display label |
| reasoning | Whether the model supports a thinking/reasoning budget |
| thinkingBudgets | Named token budgets (minimal/low/medium/high/xhigh) |
| contextWindow | Max context tokens |
| maxTokens | Max output tokens |
| cost | Per-model input/output/cacheRead/cacheWrite pricing |
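
As a rough illustration of the fields above, the sketch below models one entry as a Python dict and checks its shape. The validator, the choice of which fields are required, and every value in example_model are hypothetical; none of them is taken from ai_models.yaml or from the repo's scripts.

```python
# Hypothetical shape check for one registry entry.
# ASSUMPTION: the required-field set below is illustrative, not the repo's rule.
REQUIRED_FIELDS = ("id", "name", "reasoning", "contextWindow", "maxTokens", "cost")
COST_KEYS = ("input", "output", "cacheRead", "cacheWrite")
BUDGET_NAMES = ("minimal", "low", "medium", "high", "xhigh")


def validate_model(entry: dict) -> list[str]:
    """Return a list of problems found in one model dict (empty means OK)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in entry]
    cost = entry.get("cost") or {}
    problems += [f"missing cost key: {key}" for key in COST_KEYS if key not in cost]
    budgets = entry.get("thinkingBudgets") or {}
    problems += [f"unknown budget name: {name}" for name in budgets if name not in BUDGET_NAMES]
    return problems


# Made-up entry; the id, numbers, and prices are placeholders.
example_model = {
    "id": "llm-gateway/example-model",
    "name": "Example Model",
    "reasoning": True,
    "thinkingBudgets": {"low": 1024, "high": 8192},
    "contextWindow": 200000,
    "maxTokens": 8192,
    "cost": {"input": 1.0, "output": 2.0, "cacheRead": 0.1, "cacheWrite": 1.25},
}

print(validate_model(example_model))
```

A check like this could run before the generators so that a malformed entry fails early instead of producing a broken per-tool config.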

scripts/ai_models.py parses these sections (dependency-free), and scripts/model_display.py builds the shared display-name format (<name> [reasoning-emoji] [(cost)] (LiteLLM)).
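
The display-name format can be sketched as a small function. This is not the code in scripts/model_display.py: the function name, the emoji, and the cost label text are assumptions chosen only to show how the optional parts drop out when they do not apply.

```python
def display_name(name: str, reasoning: bool = False, cost_label: str = "",
                 provider: str = "LiteLLM", reasoning_emoji: str = "🧠") -> str:
    """Build '<name> [reasoning-emoji] [(cost)] (<provider>)'.

    ASSUMPTION: the emoji and the cost label wording are placeholders.
    """
    parts = [name]
    if reasoning:
        parts.append(reasoning_emoji)      # optional: only for reasoning models
    if cost_label:
        parts.append(f"({cost_label})")    # optional: only when a cost label is given
    parts.append(f"({provider})")          # always present
    return " ".join(parts)


print(display_name("Example Model", reasoning=True, cost_label="$1/$2"))
print(display_name("Example Model"))
```

Keeping the format in one shared helper means the Pi and OpenCode generators cannot drift apart in how they label the same model.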

Generators

| Generator | Output |
| --- | --- |
| scripts/generate_pi_models.py | Builds Pi models.json from a shared base plus work-only LiteLLM/Azure providers |
| scripts/merge_opencode_models.py | Merges LiteLLM/Azure models into the OpenCode JSONC config |
| scripts/probe_litellm_prompt_cache.py | Diagnostic: probes prompt-cache signals across LiteLLM models |

These run inside the per-tool merge hooks (run_onchange_after_07-merge-pi-config.sh.tmpl, run_onchange_after_07-merge-opencode-config.sh.tmpl). See Tool configs.
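
The "shared base plus work-only providers" idea can be sketched as a plain dict merge. The top-level "providers" key and the nested layout are hypothetical stand-ins, not Pi's actual models.json schema, and build_models_config is not a function from the generator scripts.

```python
import copy
import json


def build_models_config(base: dict, work_providers: dict, is_work: bool) -> dict:
    """Return the base config, adding work-only providers on the work profile.

    ASSUMPTION: providers live under a top-level "providers" mapping.
    """
    merged = copy.deepcopy(base)           # never mutate the shared base
    if is_work:
        providers = merged.setdefault("providers", {})
        for provider_name, provider in work_providers.items():
            providers[provider_name] = copy.deepcopy(provider)
    return merged


# Placeholder data; ids and provider names are made up.
base = {"providers": {"shared": {"models": [{"id": "shared/example"}]}}}
work = {"litellm": {"models": [{"id": "llm-gateway/example-model"}]}}

print(json.dumps(build_models_config(base, work, is_work=True), indent=2))
```

Deep-copying the base keeps the personal and work outputs independent, so generating one profile cannot leak providers into the other.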

LiteLLM integration (work profile)

Fish exports these values from pass when the entries exist (see home/dot_config/fish/readonly_config.fish.tmpl):

| Variable | Pass path | Notes |
| --- | --- | --- |
| LITELLM_PROXY_KEY | litellm/api/token | API authentication |
| LITELLM_API_BASE | litellm/api/base | Normalized to end in /v1 |

  • OpenCode: the work config (home/dot_config/opencode/readonly_opencode.work.jsonc) defaults to Gemini accessed directly through the Google provider (google/gemini-3.1-pro-preview-customtools); additional LiteLLM aliases remain available for explicit selection.
  • Pi: the work config is rendered by run_onchange_after_07-merge-pi-config.sh.tmpl into ~/.pi/agent/, starting from the shared base and adding work-only LiteLLM/Azure providers.
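
The "/v1" normalization applied to LITELLM_API_BASE is done in the fish config; the Python sketch below only illustrates the rule. The function name and the handling of empty values and trailing slashes are assumptions, and the URL is a placeholder.

```python
def normalize_api_base(base: str) -> str:
    """Return the base URL with exactly one trailing '/v1' and no trailing slash.

    ASSUMPTION: an empty value stays empty so an unset entry is not
    turned into a bare '/v1'.
    """
    trimmed = base.strip().rstrip("/")
    if not trimmed:
        return ""
    if trimmed.endswith("/v1"):
        return trimmed                      # already normalized
    return trimmed + "/v1"


print(normalize_api_base("https://litellm.example.invalid/"))
print(normalize_api_base("https://litellm.example.invalid/v1/"))
```

Normalizing once at export time lets every consumer append endpoint paths without checking whether the stored value already included the version prefix.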

Local inference

The local-inference backend is llama.cpp via ,llama-cpp; see llama.cpp local inference.