LLM Providers Guide

LibreFang ships with a comprehensive model catalog covering 32 providers, 90+ builtin models, and 23 aliases. Every provider uses one of six battle-tested drivers: the native Anthropic driver, the native Gemini driver, the ChatGPT session driver, the GitHub Copilot OAuth driver, the Claude Code subprocess driver, or the universal OpenAI-compatible driver. This guide is the single source of truth for configuring, selecting, and managing LLM providers in LibreFang.


Table of Contents

  1. Quick Setup
  2. Provider Reference
  3. Model Catalog
  4. Model Aliases
  5. Per-Agent Model Override
  6. Model Routing
  7. Cost Tracking
  8. Fallback Providers
  9. API Endpoints
  10. Channel Commands

Quick Setup

The fastest path from zero to running:

# Pick ONE provider — set its env var — done.
export GEMINI_API_KEY="your-key"        # Free tier available
# OR
export GROQ_API_KEY="your-key"          # Free tier available
# OR
export ANTHROPIC_API_KEY="your-key"
# OR
export OPENAI_API_KEY="your-key"

LibreFang auto-detects which providers have API keys configured at boot. Any model whose provider is authenticated becomes immediately available. Local providers (Ollama, vLLM, LM Studio) require no key at all.

For Gemini specifically, either GEMINI_API_KEY or GOOGLE_API_KEY will work.
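
Under the hood, the detection step amounts to checking each provider's env var(s) at startup. A minimal Python sketch of the idea (the mapping and function name are illustrative, not LibreFang internals; env var names are from this guide):

```python
import os

# Subset of the provider -> env var mapping described in this guide.
PROVIDER_ENV_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "openai":    ["OPENAI_API_KEY"],
    "gemini":    ["GEMINI_API_KEY", "GOOGLE_API_KEY"],  # either works
    "groq":      ["GROQ_API_KEY"],
    "ollama":    [],  # local provider: no key required, always available
}

def detect_providers(env=os.environ):
    """Return provider ids that are usable: key set, or no key required."""
    return [
        provider
        for provider, candidates in PROVIDER_ENV_VARS.items()
        if not candidates or any(env.get(var) for var in candidates)
    ]
```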


Provider Reference

1. Anthropic

  Display Name:  Anthropic
  Driver:        Native Anthropic (Messages API)
  Env Var:       ANTHROPIC_API_KEY
  Base URL:      https://api.anthropic.com
  Key Required:  Yes
  Free Tier:     No
  Auth:          x-api-key header
  Models:        3

Available Models:

  • claude-opus-4-20250514 (Frontier)
  • claude-sonnet-4-20250514 (Smart)
  • claude-haiku-4-5-20251001 (Fast)

Setup:

  1. Sign up at console.anthropic.com
  2. Create an API key under Settings > API Keys
  3. export ANTHROPIC_API_KEY="sk-ant-..."

2. OpenAI

  Display Name:  OpenAI
  Driver:        OpenAI-compatible
  Env Var:       OPENAI_API_KEY
  Base URL:      https://api.openai.com/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        6

Available Models:

  • gpt-4.1 (Frontier)
  • gpt-4o (Smart)
  • o3-mini (Smart)
  • gpt-4.1-mini (Balanced)
  • gpt-4o-mini (Fast)
  • gpt-4.1-nano (Fast)

Setup:

  1. Sign up at platform.openai.com
  2. Create an API key under API Keys
  3. export OPENAI_API_KEY="sk-..."

3. Google Gemini

  Display Name:  Google Gemini
  Driver:        Native Gemini (generateContent API)
  Env Var:       GEMINI_API_KEY (or GOOGLE_API_KEY)
  Base URL:      https://generativelanguage.googleapis.com
  Key Required:  Yes
  Free Tier:     Yes (generous free tier)
  Auth:          x-goog-api-key header
  Models:        3

Available Models:

  • gemini-2.5-pro (Frontier)
  • gemini-2.5-flash (Smart)
  • gemini-2.0-flash (Fast)

Setup:

  1. Go to aistudio.google.com
  2. Get an API key (free tier included)
  3. export GEMINI_API_KEY="AIza..." or export GOOGLE_API_KEY="AIza..."

Notes: The Gemini driver is a fully native implementation, not an OpenAI-compatible shim. The model name goes in the URL path, the system prompt is passed via systemInstruction, tools via functionDeclarations, and streaming uses streamGenerateContent?alt=sse.
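
As a sketch of the request shape the native driver sends (the endpoint and field names follow Google's generateContent API; the helper function and the v1beta version path are illustrative assumptions, not LibreFang code):

```python
# Base URL from this guide; the v1beta path segment is an assumption based
# on Google's public API documentation.
BASE = "https://generativelanguage.googleapis.com/v1beta"

def build_gemini_request(model, system_prompt, user_text, stream=False):
    """Build (url, headers, body) for a generateContent call."""
    action = "streamGenerateContent?alt=sse" if stream else "generateContent"
    url = f"{BASE}/models/{model}:{action}"      # model lives in the URL path
    headers = {"x-goog-api-key": "AIza..."}      # not an Authorization: Bearer header
    body = {
        "systemInstruction": {"parts": [{"text": system_prompt}]},
        "contents": [{"role": "user", "parts": [{"text": user_text}]}],
        # tool definitions would go under "tools" -> "functionDeclarations"
    }
    return url, headers, body
```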


4. DeepSeek

  Display Name:  DeepSeek
  Driver:        OpenAI-compatible
  Env Var:       DEEPSEEK_API_KEY
  Base URL:      https://api.deepseek.com/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • deepseek-chat (Smart) -- DeepSeek V3
  • deepseek-reasoner (Smart) -- DeepSeek R1, no tool support

Setup:

  1. Sign up at platform.deepseek.com
  2. Create an API key
  3. export DEEPSEEK_API_KEY="sk-..."

5. Groq

  Display Name:  Groq
  Driver:        OpenAI-compatible
  Env Var:       GROQ_API_KEY
  Base URL:      https://api.groq.com/openai/v1
  Key Required:  Yes
  Free Tier:     Yes (rate-limited)
  Auth:          Authorization: Bearer header
  Models:        4

Available Models:

  • llama-3.3-70b-versatile (Balanced)
  • mixtral-8x7b-32768 (Balanced)
  • llama-3.1-8b-instant (Fast)
  • gemma2-9b-it (Fast)

Setup:

  1. Sign up at console.groq.com
  2. Create an API key
  3. export GROQ_API_KEY="gsk_..."

Notes: Groq runs open-source models on custom LPU hardware. Extremely fast inference. Free tier has rate limits but is very usable.


6. OpenRouter

  Display Name:  OpenRouter
  Driver:        OpenAI-compatible
  Env Var:       OPENROUTER_API_KEY
  Base URL:      https://openrouter.ai/api/v1
  Key Required:  Yes
  Free Tier:     Yes (limited credits for some models)
  Auth:          Authorization: Bearer header
  Models:        10

Available Models:

  • openrouter/google/gemini-2.5-flash (Smart) -- cheap, fast, 1M context (default)
  • openrouter/anthropic/claude-sonnet-4 (Smart) -- strong reasoning + tools
  • openrouter/openai/gpt-4o (Smart) -- GPT-4o via OpenRouter
  • openrouter/deepseek/deepseek-chat (Smart) -- DeepSeek V3
  • openrouter/meta-llama/llama-3.3-70b-instruct (Balanced) -- Llama 3.3 70B
  • openrouter/qwen/qwen-2.5-72b-instruct (Balanced) -- Qwen 2.5 72B
  • openrouter/google/gemini-2.5-pro (Frontier) -- Gemini 2.5 Pro
  • openrouter/mistralai/mistral-large-latest (Smart) -- Mistral Large
  • openrouter/google/gemma-2-9b-it (Fast) -- Gemma 2 9B, free
  • openrouter/deepseek/deepseek-r1 (Frontier) -- DeepSeek R1 reasoning

Setup:

  1. Sign up at openrouter.ai
  2. Create an API key under Keys
  3. export OPENROUTER_API_KEY="sk-or-..."

Notes: OpenRouter is a unified gateway to 200+ models from many providers. Model IDs use the upstream format (e.g. google/gemini-2.5-flash). You can use any model from OpenRouter's catalog by specifying the full model path with the openrouter/ prefix.
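
For example, assuming the [default_model] convention shown for other providers in this guide also applies to OpenRouter, a catalog model can be selected like this:

[default_model]
provider = "openrouter"
model = "openrouter/google/gemini-2.5-flash"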


7. Mistral AI

  Display Name:  Mistral AI
  Driver:        OpenAI-compatible
  Env Var:       MISTRAL_API_KEY
  Base URL:      https://api.mistral.ai/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        3

Available Models:

  • mistral-large-latest (Smart)
  • codestral-latest (Smart)
  • mistral-small-latest (Fast)

Setup:

  1. Sign up at console.mistral.ai
  2. Create an API key
  3. export MISTRAL_API_KEY="..."

8. Together AI

  Display Name:  Together AI
  Driver:        OpenAI-compatible
  Env Var:       TOGETHER_API_KEY
  Base URL:      https://api.together.xyz/v1
  Key Required:  Yes
  Free Tier:     Yes (limited credits on signup)
  Auth:          Authorization: Bearer header
  Models:        3

Available Models:

  • meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo (Frontier)
  • Qwen/Qwen2.5-72B-Instruct-Turbo (Smart)
  • mistralai/Mixtral-8x22B-Instruct-v0.1 (Balanced)

Setup:

  1. Sign up at api.together.ai
  2. Create an API key
  3. export TOGETHER_API_KEY="..."

9. Fireworks AI

  Display Name:  Fireworks AI
  Driver:        OpenAI-compatible
  Env Var:       FIREWORKS_API_KEY
  Base URL:      https://api.fireworks.ai/inference/v1
  Key Required:  Yes
  Free Tier:     Yes (limited credits on signup)
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • accounts/fireworks/models/llama-v3p1-405b-instruct (Frontier)
  • accounts/fireworks/models/mixtral-8x22b-instruct (Balanced)

Setup:

  1. Sign up at fireworks.ai
  2. Create an API key
  3. export FIREWORKS_API_KEY="..."

10. Ollama

  Display Name:  Ollama
  Driver:        OpenAI-compatible
  Env Var:       OLLAMA_API_KEY (not required)
  Base URL:      http://localhost:11434/v1
  Key Required:  No
  Free Tier:     Free (local)
  Auth:          None (local)
  Models:        3 builtin + auto-discovered

Available Models (builtin):

  • llama3.2 (Local)
  • mistral:latest (Local)
  • phi3 (Local)

Setup:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Start the server: ollama serve
  4. No env var needed -- Ollama is always available

Notes: LibreFang auto-discovers models from a running Ollama instance and merges them into the catalog with Local tier and zero cost. Any model you pull becomes usable immediately.
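
The merge behavior can be sketched as follows (illustrative Python, not LibreFang's actual discovery code; the entry fields mirror the /api/models response shown later in this guide):

```python
def merge_local_models(catalog, discovered, provider="ollama"):
    """Merge auto-discovered local model ids into the catalog.

    Every discovered model gets the Local tier and zero cost; builtin
    entries with the same id are left untouched.
    """
    known = {entry["id"] for entry in catalog}
    for model_id in discovered:
        if model_id in known:
            continue  # builtin entry wins
        catalog.append({
            "id": model_id,
            "provider": provider,
            "tier": "Local",
            "input_cost_per_m": 0.0,
            "output_cost_per_m": 0.0,
        })
    return catalog
```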


11. vLLM

  Display Name:  vLLM
  Driver:        OpenAI-compatible
  Env Var:       VLLM_API_KEY (not required)
  Base URL:      http://localhost:8000/v1
  Key Required:  No
  Free Tier:     Free (self-hosted)
  Auth:          None (local)
  Models:        1 builtin + auto-discovered

Available Models (builtin):

  • vllm-local (Local)

Setup:

  1. Install vLLM: pip install vllm
  2. Start the server: python -m vllm.entrypoints.openai.api_server --model <model-name>
  3. No env var needed

12. LM Studio

  Display Name:  LM Studio
  Driver:        OpenAI-compatible
  Env Var:       LMSTUDIO_API_KEY (not required)
  Base URL:      http://localhost:1234/v1
  Key Required:  No
  Free Tier:     Free (local)
  Auth:          None (local)
  Models:        1 builtin + auto-discovered

Available Models (builtin):

  • lmstudio-local (Local)

Setup:

  1. Download LM Studio from lmstudio.ai
  2. Download a model from the built-in model browser
  3. Start the local server from the "Local Server" tab
  4. No env var needed

13. Perplexity AI

  Display Name:  Perplexity AI
  Driver:        OpenAI-compatible
  Env Var:       PERPLEXITY_API_KEY
  Base URL:      https://api.perplexity.ai
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • sonar-pro (Smart) -- online search-augmented
  • sonar (Balanced) -- online search-augmented

Setup:

  1. Sign up at perplexity.ai
  2. Go to API settings and generate a key
  3. export PERPLEXITY_API_KEY="pplx-..."

Notes: Perplexity models have built-in web search. They do not support tool use.


14. Cohere

  Display Name:  Cohere
  Driver:        OpenAI-compatible
  Env Var:       COHERE_API_KEY
  Base URL:      https://api.cohere.com/v2
  Key Required:  Yes
  Free Tier:     Yes (rate-limited trial)
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • command-r-plus (Smart)
  • command-r (Balanced)

Setup:

  1. Sign up at dashboard.cohere.com
  2. Create an API key
  3. export COHERE_API_KEY="..."

15. AI21 Labs

  Display Name:  AI21 Labs
  Driver:        OpenAI-compatible
  Env Var:       AI21_API_KEY
  Base URL:      https://api.ai21.com/studio/v1
  Key Required:  Yes
  Free Tier:     Yes (limited credits)
  Auth:          Authorization: Bearer header
  Models:        1

Available Models:

  • jamba-1.5-large (Smart)

Setup:

  1. Sign up at studio.ai21.com
  2. Create an API key
  3. export AI21_API_KEY="..."

16. Cerebras

  Display Name:  Cerebras
  Driver:        OpenAI-compatible
  Env Var:       CEREBRAS_API_KEY
  Base URL:      https://api.cerebras.ai/v1
  Key Required:  Yes
  Free Tier:     Yes (generous free tier)
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • cerebras/llama3.3-70b (Balanced)
  • cerebras/llama3.1-8b (Fast)

Setup:

  1. Sign up at cloud.cerebras.ai
  2. Create an API key
  3. export CEREBRAS_API_KEY="..."

Notes: Cerebras runs inference on wafer-scale chips. Ultra-fast and ultra-cheap ($0.06/M tokens for both input and output on the 70B model).


17. SambaNova

  Display Name:  SambaNova
  Driver:        OpenAI-compatible
  Env Var:       SAMBANOVA_API_KEY
  Base URL:      https://api.sambanova.ai/v1
  Key Required:  Yes
  Free Tier:     Yes (limited credits)
  Auth:          Authorization: Bearer header
  Models:        1

Available Models:

  • sambanova/llama-3.3-70b (Balanced)

Setup:

  1. Sign up at cloud.sambanova.ai
  2. Create an API key
  3. export SAMBANOVA_API_KEY="..."

18. Hugging Face

  Display Name:  Hugging Face
  Driver:        OpenAI-compatible
  Env Var:       HF_API_KEY
  Base URL:      https://api-inference.huggingface.co/v1
  Key Required:  Yes
  Free Tier:     Yes (rate-limited)
  Auth:          Authorization: Bearer header
  Models:        1

Available Models:

  • hf/meta-llama/Llama-3.3-70B-Instruct (Balanced)

Setup:

  1. Sign up at huggingface.co
  2. Create a token under Settings > Access Tokens
  3. export HF_API_KEY="hf_..."

19. xAI

  Display Name:  xAI
  Driver:        OpenAI-compatible
  Env Var:       XAI_API_KEY
  Base URL:      https://api.x.ai/v1
  Key Required:  Yes
  Free Tier:     Yes (limited free credits)
  Auth:          Authorization: Bearer header
  Models:        2

Available Models:

  • grok-2 (Smart) -- supports vision
  • grok-2-mini (Fast)

Setup:

  1. Sign up at console.x.ai
  2. Create an API key
  3. export XAI_API_KEY="xai-..."

20. Replicate

  Display Name:  Replicate
  Driver:        OpenAI-compatible
  Env Var:       REPLICATE_API_TOKEN
  Base URL:      https://api.replicate.com/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        1

Available Models:

  • replicate/meta-llama-3.3-70b-instruct (Balanced)

Setup:

  1. Sign up at replicate.com
  2. Go to Account > API Tokens
  3. export REPLICATE_API_TOKEN="r8_..."

21. ChatGPT (Session Auth)

  Display Name:  ChatGPT (Session Auth)
  Driver:        ChatGPT session driver
  Env Var:       CHATGPT_SESSION_TOKEN
  Base URL:      https://chatgpt.com/backend-api
  Key Required:  Yes (session token)
  Free Tier:     Uses your ChatGPT subscription
  Auth:          Session-based cookie auth
  Models:        5

Available Models:

  • gpt-5.4-codex (Frontier)
  • gpt-5.3-codex (Frontier)
  • gpt-5.2-codex (Smart)
  • gpt-5.1-codex (Smart)
  • gpt-5.1-codex-mini (Balanced)

Setup:

  1. Run librefang auth chatgpt to authenticate via browser session
  2. The session token is captured automatically
  3. Alternatively: export CHATGPT_SESSION_TOKEN="your-session-token"

Notes: This provider uses ChatGPT's internal backend API with session-based authentication. It leverages your existing ChatGPT subscription (Plus/Team/Enterprise). Models appear at zero API cost since they use your subscription. This is not the official OpenAI API -- it's the ChatGPT web interface backend.


22. GitHub Copilot

  Display Name:  GitHub Copilot
  Driver:        Copilot OAuth driver
  Env Var:       GITHUB_TOKEN (auto via OAuth)
  Base URL:      https://api.githubcopilot.com
  Key Required:  Yes (auto token exchange)
  Free Tier:     Requires GitHub Copilot subscription
  Auth:          OAuth device flow + automatic token exchange
  Models:        2

Available Models:

  • copilot/gpt-4 (Frontier)
  • copilot/gpt-4o (Smart)

Setup:

  1. Ensure you have an active GitHub Copilot subscription
  2. Run librefang auth copilot to authenticate via GitHub device flow
  3. Token exchange is handled automatically -- no manual key management needed

Notes: The Copilot driver implements GitHub's OAuth device flow and automatic token exchange. Your GitHub token is exchanged for a Copilot API token transparently. Models appear at zero API cost since they use your Copilot subscription.


23. Moonshot (Kimi)

  Display Name:  Moonshot (Kimi)
  Driver:        OpenAI-compatible
  Env Var:       MOONSHOT_API_KEY
  Base URL:      https://api.moonshot.ai/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Aliases:       kimi, kimi2
  Models:        5

Available Models:

  • kimi-k2.5 (Frontier) -- latest Kimi model with vision
  • kimi-k2 (Frontier) -- strong reasoning with vision
  • moonshot-v1-128k (Smart) -- 128K context
  • moonshot-v1-32k (Balanced) -- 32K context
  • moonshot-v1-8k (Fast) -- 8K context, cheapest

Setup:

  1. Sign up at platform.moonshot.cn
  2. Create an API key
  3. export MOONSHOT_API_KEY="sk-..."

Example config:

[default_model]
provider = "moonshot"
model = "kimi-k2"

24. Qwen (Alibaba DashScope)

  Display Name:  Qwen (Alibaba)
  Driver:        OpenAI-compatible
  Env Var:       DASHSCOPE_API_KEY
  Base URL:      https://dashscope.aliyuncs.com/compatible-mode/v1
  Key Required:  Yes
  Free Tier:     Yes (limited free credits)
  Auth:          Authorization: Bearer header
  Aliases:       dashscope, model_studio
  Models:        11

Available Models:

  • qwen3-235b-a22b (Frontier) -- Qwen3 MoE 235B
  • qwen-max (Frontier) -- strongest Qwen model
  • qwen-vl-max (Frontier) -- vision flagship
  • qwen-plus (Smart) -- good balance of cost and capability
  • qwen-vl-plus (Smart) -- vision
  • qwen-coder-plus (Smart) -- code-specialized
  • qwen-coder-plus-latest (Smart) -- latest coder snapshot
  • qwen-long (Balanced) -- 1M context window
  • qwen2.5-coder-32b-instruct (Balanced) -- open-source coder
  • qwen3-30b-a3b (Fast) -- small MoE, very cheap
  • qwen-turbo (Fast) -- fastest Qwen model

Setup:

  1. Sign up at dashscope.console.aliyun.com
  2. Create an API key under DashScope console
  3. export DASHSCOPE_API_KEY="sk-..."

Example config:

[default_model]
provider = "qwen"
model = "qwen-plus"

Notes: Qwen uses Alibaba Cloud's DashScope platform. The API is OpenAI-compatible. The qwen-long model supports up to 1M token context window.


25. Zhipu AI (GLM)

  Display Name:  Zhipu AI (GLM)
  Driver:        OpenAI-compatible
  Env Var:       ZHIPU_API_KEY
  Base URL:      https://open.bigmodel.cn/api/paas/v4
  Key Required:  Yes
  Free Tier:     Yes (GLM-4 Flash is free)
  Auth:          Authorization: Bearer header
  Aliases:       glm
  Models:        6

Available Models:

  • glm-5-20250605 (Frontier) -- GLM-5, latest with vision
  • glm-4-plus (Smart) -- strong reasoning, 128K context
  • glm-4.7 (Smart) -- enhanced with vision
  • glm-4v-plus (Smart) -- vision-specialized
  • glm-4-long (Balanced) -- 1M context window
  • glm-4-flash (Fast) -- free tier, 128K context

Setup:

  1. Sign up at open.bigmodel.cn
  2. Create an API key
  3. export ZHIPU_API_KEY="..."

Example config:

[default_model]
provider = "zhipu"
model = "glm-4-plus"

Notes: GLM-4 Flash is completely free with no rate limits, making it excellent for development and testing.


26. MiniMax (International)

  Display Name:  MiniMax (International)
  Driver:        OpenAI-compatible
  Env Var:       MINIMAX_API_KEY
  Base URL:      https://api.minimax.io/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        6

Available Models:

  • MiniMax-M2.5 (Frontier) -- vision, 1M context
  • MiniMax-M2.5-highspeed (Smart) -- fast variant with vision
  • MiniMax-M2.1 (Smart) -- 1M context
  • minimax-text-01 (Smart) -- 1M context
  • abab7-chat (Smart) -- vision, 512K context
  • abab6.5-chat (Balanced) -- 245K context

Setup:

  1. Sign up at platform.minimax.chat
  2. Create an API key
  3. export MINIMAX_API_KEY="..."

Example config:

[default_model]
provider = "minimax"
model = "MiniMax-M2.5"

27. MiniMax (China)

  Display Name:  MiniMax (China)
  Driver:        OpenAI-compatible
  Env Var:       MINIMAX_CN_API_KEY
  Base URL:      https://api.minimaxi.com/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        6

Available Models: Same models as MiniMax International, served from China endpoints.

Setup:

  1. Sign up at platform.minimaxi.com (China)
  2. Create an API key
  3. export MINIMAX_CN_API_KEY="..."

Notes: This is the China-domestic version of MiniMax with servers in mainland China. Use this if you need lower latency from China or compliance with data residency requirements.


28. Baidu Qianfan (ERNIE)

  Display Name:  Baidu Qianfan
  Driver:        OpenAI-compatible
  Env Var:       QIANFAN_API_KEY
  Base URL:      https://qianfan.baidubce.com/v2
  Key Required:  Yes
  Free Tier:     Yes (ERNIE Speed is free)
  Auth:          Authorization: Bearer header
  Aliases:       baidu
  Models:        3

Available Models:

  • ernie-4.5-8k (Smart) -- Baidu's flagship model
  • ernie-4.0-turbo-8k (Balanced) -- fast variant
  • ernie-speed-128k (Fast) -- free, 128K context

Setup:

  1. Sign up at qianfan.cloud.baidu.com
  2. Create an API key
  3. export QIANFAN_API_KEY="..."

Example config:

[default_model]
provider = "qianfan"
model = "ernie-4.5-8k"

Notes: ERNIE Speed 128K is free with no usage limits, making it a great option for development and high-volume tasks.


29. Volcano Engine (Doubao)

  Display Name:  Volcano Engine (Doubao)
  Driver:        OpenAI-compatible
  Env Var:       VOLCENGINE_API_KEY
  Base URL:      https://ark.cn-beijing.volces.com/api/v3
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Aliases:       doubao
  Models:        4

Available Models:

  • doubao-seed-1-6-251015 (Smart) -- Doubao flagship, 256K context
  • doubao-seed-code (Smart) -- code-specialized
  • doubao-seed-2-0-lite (Balanced) -- cost-effective
  • doubao-seed-2-0-mini (Fast) -- cheapest option

Setup:

  1. Sign up at console.volcengine.com
  2. Enable the Ark (model service) product
  3. Create an API key
  4. export VOLCENGINE_API_KEY="..."

Example config:

[default_model]
provider = "volcengine"
model = "doubao-seed-1-6-251015"

Notes: Volcano Engine is ByteDance's cloud platform. Doubao models are ByteDance's in-house LLMs.


30. Venice.ai

  Display Name:  Venice.ai
  Driver:        OpenAI-compatible
  Env Var:       VENICE_API_KEY
  Base URL:      https://api.venice.ai/api/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        3

Available Models:

  • qwen3-235b-a22b-instruct-2507 (Smart) -- Qwen3 235B
  • llama-3.3-70b (Balanced) -- Llama 3.3 70B
  • venice-uncensored (Fast) -- uncensored model

Setup:

  1. Sign up at venice.ai
  2. Create an API key from dashboard
  3. export VENICE_API_KEY="..."

Example config:

[default_model]
provider = "venice"
model = "llama-3.3-70b"

Notes: Venice.ai is an OpenAI-compatible inference platform focused on privacy and uncensored models.


31. Chutes.ai

  Display Name:  Chutes.ai
  Driver:        OpenAI-compatible
  Env Var:       CHUTES_API_KEY
  Base URL:      https://llm.chutes.ai/v1
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Models:        5

Available Models:

  • chutes/deepseek-ai/DeepSeek-V3 (Smart) -- DeepSeek V3
  • chutes/deepseek-ai/DeepSeek-R1 (Smart) -- DeepSeek R1 reasoning
  • chutes/Qwen/Qwen3-235B-A22B (Smart) -- Qwen3 235B
  • chutes/meta-llama/Llama-4-Maverick-17B-128E-Instruct (Balanced) -- Llama 4 Maverick
  • chutes/meta-llama/Llama-3.3-70B-Instruct (Balanced) -- Llama 3.3 70B

Setup:

  1. Sign up at chutes.ai
  2. Create an API key
  3. export CHUTES_API_KEY="..."

Example config:

[default_model]
provider = "chutes"
model = "chutes/deepseek-ai/DeepSeek-V3"

Notes: Chutes.ai provides serverless inference for popular open-source models at competitive pricing.


32. Z.AI

  Display Name:  Z.AI
  Driver:        OpenAI-compatible
  Env Var:       ZHIPU_API_KEY
  Base URL:      https://api.z.ai/api/paas/v4
  Key Required:  Yes
  Free Tier:     No
  Auth:          Authorization: Bearer header
  Aliases:       z.ai
  Models:        Shared with Zhipu catalog

Setup:

  1. Sign up at z.ai
  2. Uses the same API key as Zhipu AI
  3. export ZHIPU_API_KEY="..."

Example config:

[default_model]
provider = "zai"
model = "glm-4-plus"

Notes: Z.AI is an international endpoint for Zhipu AI models. It shares the same API key (ZHIPU_API_KEY) and model catalog as Zhipu.


Model Catalog

The builtin model catalog for providers 1-20, sorted by provider; pricing is per million tokens. Models for providers 21-32 are listed in their respective provider sections.

#  | Model ID | Display Name | Provider | Tier | Context Window | Max Output | Input $/M | Output $/M | Tools | Vision
1  | claude-opus-4-20250514 | Claude Opus 4 | anthropic | Frontier | 200,000 | 32,000 | $15.00 | $75.00 | Yes | Yes
2  | claude-sonnet-4-20250514 | Claude Sonnet 4 | anthropic | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
3  | claude-haiku-4-5-20251001 | Claude Haiku 4.5 | anthropic | Fast | 200,000 | 8,192 | $0.25 | $1.25 | Yes | Yes
4  | gpt-4.1 | GPT-4.1 | openai | Frontier | 1,047,576 | 32,768 | $2.00 | $8.00 | Yes | Yes
5  | gpt-4o | GPT-4o | openai | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
6  | o3-mini | o3-mini | openai | Smart | 200,000 | 100,000 | $1.10 | $4.40 | Yes | No
7  | gpt-4.1-mini | GPT-4.1 Mini | openai | Balanced | 1,047,576 | 32,768 | $0.40 | $1.60 | Yes | Yes
8  | gpt-4o-mini | GPT-4o Mini | openai | Fast | 128,000 | 16,384 | $0.15 | $0.60 | Yes | Yes
9  | gpt-4.1-nano | GPT-4.1 Nano | openai | Fast | 1,047,576 | 32,768 | $0.10 | $0.40 | Yes | No
10 | gemini-2.5-pro | Gemini 2.5 Pro | gemini | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
11 | gemini-2.5-flash | Gemini 2.5 Flash | gemini | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
12 | gemini-2.0-flash | Gemini 2.0 Flash | gemini | Fast | 1,048,576 | 8,192 | $0.10 | $0.40 | Yes | Yes
13 | deepseek-chat | DeepSeek V3 | deepseek | Smart | 64,000 | 8,192 | $0.27 | $1.10 | Yes | No
14 | deepseek-reasoner | DeepSeek R1 | deepseek | Smart | 64,000 | 8,192 | $0.55 | $2.19 | No | No
15 | llama-3.3-70b-versatile | Llama 3.3 70B | groq | Balanced | 128,000 | 32,768 | $0.059 | $0.079 | Yes | No
16 | mixtral-8x7b-32768 | Mixtral 8x7B | groq | Balanced | 32,768 | 4,096 | $0.024 | $0.024 | Yes | No
17 | llama-3.1-8b-instant | Llama 3.1 8B | groq | Fast | 128,000 | 8,192 | $0.05 | $0.08 | Yes | No
18 | gemma2-9b-it | Gemma 2 9B | groq | Fast | 8,192 | 4,096 | $0.02 | $0.02 | No | No
19 | openrouter/google/gemini-2.5-flash | Gemini 2.5 Flash (OpenRouter) | openrouter | Smart | 1,048,576 | 65,536 | $0.15 | $0.60 | Yes | Yes
20 | openrouter/anthropic/claude-sonnet-4 | Claude Sonnet 4 (OpenRouter) | openrouter | Smart | 200,000 | 64,000 | $3.00 | $15.00 | Yes | Yes
21 | openrouter/openai/gpt-4o | GPT-4o (OpenRouter) | openrouter | Smart | 128,000 | 16,384 | $2.50 | $10.00 | Yes | Yes
22 | openrouter/deepseek/deepseek-chat | DeepSeek V3 (OpenRouter) | openrouter | Smart | 128,000 | 32,768 | $0.14 | $0.28 | Yes | No
23 | openrouter/meta-llama/llama-3.3-70b-instruct | Llama 3.3 70B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.39 | $0.39 | Yes | No
24 | openrouter/qwen/qwen-2.5-72b-instruct | Qwen 2.5 72B (OpenRouter) | openrouter | Balanced | 128,000 | 32,768 | $0.36 | $0.36 | Yes | No
25 | openrouter/google/gemini-2.5-pro | Gemini 2.5 Pro (OpenRouter) | openrouter | Frontier | 1,048,576 | 65,536 | $1.25 | $10.00 | Yes | Yes
26 | openrouter/mistralai/mistral-large-latest | Mistral Large (OpenRouter) | openrouter | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
27 | openrouter/google/gemma-2-9b-it | Gemma 2 9B (OpenRouter) | openrouter | Fast | 8,192 | 4,096 | $0.00 | $0.00 | No | No
28 | openrouter/deepseek/deepseek-r1 | DeepSeek R1 (OpenRouter) | openrouter | Frontier | 128,000 | 32,768 | $0.55 | $2.19 | No | No
29 | mistral-large-latest | Mistral Large | mistral | Smart | 128,000 | 8,192 | $2.00 | $6.00 | Yes | No
30 | codestral-latest | Codestral | mistral | Smart | 32,000 | 8,192 | $0.30 | $0.90 | Yes | No
31 | mistral-small-latest | Mistral Small | mistral | Fast | 128,000 | 8,192 | $0.10 | $0.30 | Yes | No
32 | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | Llama 3.1 405B (Together) | together | Frontier | 130,000 | 4,096 | $3.50 | $3.50 | Yes | No
33 | Qwen/Qwen2.5-72B-Instruct-Turbo | Qwen 2.5 72B (Together) | together | Smart | 32,768 | 4,096 | $0.20 | $0.60 | Yes | No
34 | mistralai/Mixtral-8x22B-Instruct-v0.1 | Mixtral 8x22B (Together) | together | Balanced | 65,536 | 4,096 | $0.60 | $0.60 | Yes | No
35 | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B (Fireworks) | fireworks | Frontier | 131,072 | 16,384 | $3.00 | $3.00 | Yes | No
36 | accounts/fireworks/models/mixtral-8x22b-instruct | Mixtral 8x22B (Fireworks) | fireworks | Balanced | 65,536 | 4,096 | $0.90 | $0.90 | Yes | No
37 | llama3.2 | Llama 3.2 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | Yes | No
38 | mistral:latest | Mistral (Ollama) | ollama | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
39 | phi3 | Phi-3 (Ollama) | ollama | Local | 128,000 | 4,096 | $0.00 | $0.00 | No | No
40 | vllm-local | vLLM Local Model | vllm | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
41 | lmstudio-local | LM Studio Local Model | lmstudio | Local | 32,768 | 4,096 | $0.00 | $0.00 | Yes | No
42 | sonar-pro | Sonar Pro | perplexity | Smart | 200,000 | 8,192 | $3.00 | $15.00 | No | No
43 | sonar | Sonar | perplexity | Balanced | 128,000 | 8,192 | $1.00 | $5.00 | No | No
44 | command-r-plus | Command R+ | cohere | Smart | 128,000 | 4,096 | $2.50 | $10.00 | Yes | No
45 | command-r | Command R | cohere | Balanced | 128,000 | 4,096 | $0.15 | $0.60 | Yes | No
46 | jamba-1.5-large | Jamba 1.5 Large | ai21 | Smart | 256,000 | 4,096 | $2.00 | $8.00 | Yes | No
47 | cerebras/llama3.3-70b | Llama 3.3 70B (Cerebras) | cerebras | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
48 | cerebras/llama3.1-8b | Llama 3.1 8B (Cerebras) | cerebras | Fast | 128,000 | 8,192 | $0.01 | $0.01 | Yes | No
49 | sambanova/llama-3.3-70b | Llama 3.3 70B (SambaNova) | sambanova | Balanced | 128,000 | 8,192 | $0.06 | $0.06 | Yes | No
50 | grok-2 | Grok 2 | xai | Smart | 131,072 | 32,768 | $2.00 | $10.00 | Yes | Yes
51 | grok-2-mini | Grok 2 Mini | xai | Fast | 131,072 | 32,768 | $0.30 | $0.50 | Yes | No
52 | hf/meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B (HF) | huggingface | Balanced | 128,000 | 4,096 | $0.30 | $0.30 | No | No
53 | replicate/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | replicate | Balanced | 128,000 | 4,096 | $0.40 | $0.40 | No | No

Model Tiers:

Tier     | Description                     | Typical Use
Frontier | Most capable, highest cost      | Orchestration, architecture, security audits
Smart    | Strong reasoning, moderate cost | Coding, code review, research, analysis
Balanced | Good cost/quality tradeoff      | Planning, writing, DevOps, day-to-day tasks
Fast     | Cheapest cloud inference        | Ops, translation, simple Q&A, health checks
Local    | Self-hosted, zero cost          | Privacy-first, offline, development

Notes:

  • Local providers (Ollama, vLLM, LM Studio) auto-discover models at runtime. Any model you download and serve will be merged into the catalog with Local tier and zero cost.
  • The 53 entries above are the builtin models for providers 1-20. Models for the remaining providers (ChatGPT, GitHub Copilot, Moonshot, Qwen, Zhipu, MiniMax, Qianfan, Volcano Engine, Venice, Chutes, Z.AI) are listed in their provider sections, and catalog totals also include runtime auto-discovered models that vary per installation.

Model Aliases

All 23 aliases resolve to canonical model IDs. Aliases are case-insensitive.

Alias         | Resolves To
sonnet        | claude-sonnet-4-20250514
claude-sonnet | claude-sonnet-4-20250514
haiku         | claude-haiku-4-5-20251001
claude-haiku  | claude-haiku-4-5-20251001
opus          | claude-opus-4-20250514
claude-opus   | claude-opus-4-20250514
gpt4          | gpt-4o
gpt4o         | gpt-4o
gpt4-mini     | gpt-4o-mini
flash         | gemini-2.5-flash
gemini-flash  | gemini-2.5-flash
gemini-pro    | gemini-2.5-pro
deepseek      | deepseek-chat
llama         | llama-3.3-70b-versatile
llama-70b     | llama-3.3-70b-versatile
mixtral       | mixtral-8x7b-32768
mistral       | mistral-large-latest
codestral     | codestral-latest
grok          | grok-2
grok-mini     | grok-2-mini
sonar         | sonar-pro
jamba         | jamba-1.5-large
command-r     | command-r-plus

You can use aliases anywhere a model ID is accepted: in config files, REST API calls, chat commands, and the model routing configuration.
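
Alias resolution is a case-insensitive lookup that falls back to the input unchanged. A minimal sketch (the mapping below is a subset of the table above; the function name is illustrative, not LibreFang's API):

```python
# Subset of the alias table from this guide.
ALIASES = {
    "sonnet": "claude-sonnet-4-20250514",
    "haiku":  "claude-haiku-4-5-20251001",
    "flash":  "gemini-2.5-flash",
    "grok":   "grok-2",
}

def resolve_model(name):
    """Return the canonical model ID for an alias, or the name unchanged."""
    return ALIASES.get(name.lower(), name)
```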


Per-Agent Model Override

Each agent in your config.toml can specify its own model, overriding the global default:

# Global default model
[agents.defaults]
model = "claude-sonnet-4-20250514"

# Per-agent override: use an alias or full model ID
[[agents]]
name = "orchestrator"
model = "opus"                      # alias for claude-opus-4-20250514

[[agents]]
name = "ops"
model = "llama-3.3-70b-versatile"   # cheap Groq model for simple ops

[[agents]]
name = "coder"
model = "gemini-2.5-flash"          # fast + cheap + 1M context

[[agents]]
name = "researcher"
model = "sonar-pro"                 # Perplexity with built-in web search

# You can also pin a model in the agent manifest TOML
[[agents]]
name = "production-bot"
pinned_model = "claude-sonnet-4-20250514"  # never auto-routed

When pinned_model is set on an agent manifest, that agent always uses the specified model regardless of routing configuration. This is used in Stabilisation mode (KernelMode::Stable) where the model is frozen for production reliability.


Model Routing

LibreFang can automatically select the cheapest model capable of handling each query. This is configured per-agent via ModelRoutingConfig.

How It Works

  1. The ModelRouter scores each incoming CompletionRequest based on heuristics
  2. The score maps to a TaskComplexity tier: Simple, Medium, or Complex
  3. Each tier has a pre-configured model

Scoring Heuristics

Signal               | Weight                | Logic
Total message length | 1 point per ~4 chars  | Rough token proxy
Tool availability    | +20 per tool defined  | Tools imply multi-step work
Code markers         | +30 per marker found  | Backticks, fn, def, class, import, function, async, await, struct, impl, return
Conversation depth   | +15 per message > 10  | Deep context = harder reasoning
System prompt length | +1 per 10 chars > 500 | Long system prompts imply complex tasks

Thresholds

Complexity | Score Range        | Default Model
Simple     | score < 100        | claude-haiku-4-5-20251001
Medium     | 100 <= score < 500 | claude-sonnet-4-20250514
Complex    | score >= 500       | claude-sonnet-4-20250514
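
The heuristics and thresholds above can be sketched in a few lines (an illustrative reimplementation, not LibreFang's actual ModelRouter; in particular, counting markers per occurrence rather than per type is an assumption here):

```python
# Code markers from the heuristics table.
CODE_MARKERS = ["`", "fn ", "def ", "class ", "import ", "function ",
                "async ", "await ", "struct ", "impl ", "return "]

def score_request(messages, tools=(), system_prompt=""):
    """Score a completion request using the documented heuristics."""
    text = " ".join(messages)
    score = len(text) // 4                        # ~1 point per 4 chars
    score += 20 * len(tools)                      # tools imply multi-step work
    score += 30 * sum(text.count(m) for m in CODE_MARKERS)
    score += 15 * max(0, len(messages) - 10)      # deep conversations
    score += max(0, len(system_prompt) - 500) // 10
    return score

def complexity(score, simple_threshold=100, complex_threshold=500):
    """Map a score to a TaskComplexity tier."""
    if score < simple_threshold:
        return "Simple"
    return "Medium" if score < complex_threshold else "Complex"
```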

Configuration

# In agent manifest or config.toml
[routing]
simple_model = "claude-haiku-4-5-20251001"
medium_model = "gemini-2.5-flash"
complex_model = "claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

The router also integrates with the model catalog:

  • validate_models() checks that all configured model IDs exist in the catalog
  • resolve_aliases() expands aliases to canonical IDs (e.g., "sonnet" becomes "claude-sonnet-4-20250514")

Cost Tracking

LibreFang tracks the cost of every LLM call and can enforce per-agent spending quotas.

Per-Response Cost Estimation

After each LLM call, cost is calculated as:

cost = (input_tokens / 1,000,000) * input_rate + (output_tokens / 1,000,000) * output_rate

The MeteringEngine first checks the model catalog for exact pricing. If the model is not found, it falls back to a pattern-matching heuristic.
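
As a worked example of the formula, using the catalog rates for claude-sonnet-4-20250514 ($3.00 in / $15.00 out per million tokens):

```python
def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Per-response cost in USD; rates are per million tokens."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# 1,200 input and 340 output tokens on claude-sonnet-4-20250514:
# (1200/1e6)*3.00 + (340/1e6)*15.00 = 0.0036 + 0.0051 = 0.0087 USD
cost = estimate_cost(1200, 340, 3.00, 15.00)
```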

Cost Rates (per million tokens)

Model Pattern                   | Input $/M | Output $/M
*haiku*                         | $0.25     | $1.25
*sonnet*                        | $3.00     | $15.00
*opus*                          | $15.00    | $75.00
gpt-4o-mini                     | $0.15     | $0.60
gpt-4o                          | $2.50     | $10.00
gpt-4.1-nano                    | $0.10     | $0.40
gpt-4.1-mini                    | $0.40     | $1.60
gpt-4.1                         | $2.00     | $8.00
o3-mini                         | $1.10     | $4.40
gemini-2.5-pro                  | $1.25     | $10.00
gemini-2.5-flash                | $0.15     | $0.60
gemini-2.0-flash                | $0.10     | $0.40
deepseek-reasoner / deepseek-r1 | $0.55     | $2.19
*deepseek*                      | $0.27     | $1.10
*cerebras*                      | $0.06     | $0.06
*sambanova*                     | $0.06     | $0.06
*replicate*                     | $0.40     | $0.40
*llama* / *mixtral*             | $0.05     | $0.10
*qwen*                          | $0.20     | $0.60
mistral-large*                  | $2.00     | $6.00
*mistral* (other)               | $0.10     | $0.30
command-r-plus                  | $2.50     | $10.00
command-r                       | $0.15     | $0.60
sonar-pro                       | $3.00     | $15.00
*sonar* (other)                 | $1.00     | $5.00
grok-2-mini / grok-mini         | $0.30     | $0.50
*grok* (other)                  | $2.00     | $10.00
*jamba*                         | $2.00     | $8.00
Default (unknown)               | $1.00     | $3.00

Quota Enforcement

Quotas are checked on every LLM call. If the agent exceeds its hourly limit, the call is rejected with a QuotaExceeded error.

# Per-agent quota in config.toml
[[agents]]
name = "chatbot"
[agents.resources]
max_cost_per_hour_usd = 5.00   # cap at $5/hour

The usage footer (when enabled) appends cost information to each response:

> Cost: $0.0087 | Tokens: 1,200 in / 340 out | Model: claude-sonnet-4-20250514
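
A rolling-window sketch of the quota check (illustrative only; the class and method names are not LibreFang's actual MeteringEngine API):

```python
import time

class QuotaExceeded(Exception):
    """Raised when a call would push an agent past its hourly spend cap."""

class HourlyQuota:
    def __init__(self, max_cost_per_hour_usd):
        self.cap = max_cost_per_hour_usd
        self.spend = []  # (timestamp, cost) pairs within the last hour

    def charge(self, cost, now=None):
        """Record a call's cost, rejecting it if the hourly cap is hit."""
        now = time.time() if now is None else now
        # Drop spend older than one hour from the rolling window.
        self.spend = [(t, c) for t, c in self.spend if now - t < 3600]
        if sum(c for _, c in self.spend) + cost > self.cap:
            raise QuotaExceeded(f"hourly cap ${self.cap:.2f} reached")
        self.spend.append((now, cost))
```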

Fallback Providers

The FallbackDriver wraps multiple LLM drivers in a chain. If the primary driver fails, the next driver in the chain is tried automatically.

Behavior

  • On success: returns immediately
  • On rate limit / overload errors (429, 529): bubbles up for retry logic (does NOT failover, because the primary should be retried after backoff)
  • On all other errors: logs a warning and tries the next driver in the chain
  • If all drivers fail: returns the last error
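
The four rules above can be sketched as follows (illustrative Python, not the actual Rust FallbackDriver; LLMError and the driver callables are hypothetical stand-ins):

```python
RETRYABLE_STATUS = {429, 529}  # rate limit / overload: retry, don't failover

class LLMError(Exception):
    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status

def complete_with_fallback(drivers, request):
    """Try each driver in order, following the fallback rules above."""
    last_error = None
    for driver in drivers:
        try:
            return driver(request)        # success: return immediately
        except LLMError as err:
            if err.status in RETRYABLE_STATUS:
                raise                     # bubble up for backoff + retry
            last_error = err              # otherwise try the next driver
    raise last_error                      # all drivers failed
```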

Configuration

Fallback chains are configured in your agent manifest or config.toml. The FallbackDriver is used automatically when an agent is in Stabilisation mode (KernelMode::Stable) or when multiple providers are configured for reliability.

```toml
# Example: primary Anthropic, fallback to Gemini, then Groq
[[agents]]
name = "production-bot"
model = "claude-sonnet-4-20250514"
fallback_models = ["gemini-2.5-flash", "llama-3.3-70b-versatile"]
```

The fallback driver builds the chain `AnthropicDriver -> GeminiDriver -> OpenAIDriver(Groq)`.


## API Endpoints

### List All Models

```http
GET /api/models
```

Returns the complete model catalog with metadata, pricing, and feature flags.

Response:

```json
[
  {
    "id": "claude-sonnet-4-20250514",
    "display_name": "Claude Sonnet 4",
    "provider": "anthropic",
    "tier": "Smart",
    "context_window": 200000,
    "max_output_tokens": 64000,
    "input_cost_per_m": 3.0,
    "output_cost_per_m": 15.0,
    "supports_tools": true,
    "supports_vision": true,
    "supports_streaming": true,
    "aliases": ["sonnet", "claude-sonnet"]
  }
]
```

### Get Specific Model

```http
GET /api/models/{id}
```

Returns a single model entry. Accepts both canonical IDs and aliases:

```http
GET /api/models/sonnet
GET /api/models/claude-sonnet-4-20250514
```

### List Aliases

```http
GET /api/models/aliases
```

Returns a map of all alias-to-canonical-ID mappings.

Response:

```json
{
  "sonnet": "claude-sonnet-4-20250514",
  "haiku": "claude-haiku-4-5-20251001",
  "flash": "gemini-2.5-flash",
  "grok": "grok-2"
}
```

### List Providers

```http
GET /api/providers
```

Returns all 32 providers with auth status and model counts.

Response:

```json
[
  {
    "id": "anthropic",
    "display_name": "Anthropic",
    "api_key_env": "ANTHROPIC_API_KEY",
    "base_url": "https://api.anthropic.com",
    "key_required": true,
    "auth_status": "Configured",
    "model_count": 3
  },
  {
    "id": "ollama",
    "display_name": "Ollama",
    "api_key_env": "OLLAMA_API_KEY",
    "base_url": "http://localhost:11434/v1",
    "key_required": false,
    "auth_status": "NotRequired",
    "model_count": 5
  }
]
```

Auth status values: `Configured`, `Missing`, `NotRequired`.

### Set Provider API Key

```http
POST /api/providers/{name}/key
Content-Type: application/json

{ "api_key": "sk-..." }
```

Configures an API key for a provider at runtime (stored as a `Zeroizing<String>` and wiped from memory on drop).

### Remove Provider API Key

```http
DELETE /api/providers/{name}/key
```

Removes the configured API key for a provider.

### Test Provider Connection

```http
POST /api/providers/{name}/test
```

Sends a minimal test request to verify that the provider is reachable and the API key is valid.


## Channel Commands

Two chat commands are available in any channel for inspecting models and providers.

### /models

Lists all available models with their tier, provider, and context window. Only models from providers with configured authentication (or that require none) are shown.

```
/models
```

Example output:

```text
Available models (12):

Frontier:
  claude-opus-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-pro (Google Gemini) — 1M ctx

Smart:
  claude-sonnet-4-20250514 (Anthropic) — 200K ctx
  gemini-2.5-flash (Google Gemini) — 1M ctx
  deepseek-chat (DeepSeek) — 64K ctx

Balanced:
  llama-3.3-70b-versatile (Groq) — 128K ctx

Fast:
  claude-haiku-4-5-20251001 (Anthropic) — 200K ctx
  gemini-2.0-flash (Google Gemini) — 1M ctx

Local:
  llama3.2 (Ollama) — 128K ctx
```

### /providers

Lists all 32 providers with their authentication status.

```
/providers
```

Example output:

```text
LLM Providers (32):

  Anthropic          ANTHROPIC_API_KEY       Configured    3 models
  OpenAI             OPENAI_API_KEY          Missing       6 models
  Google Gemini      GEMINI_API_KEY          Configured    3 models
  DeepSeek           DEEPSEEK_API_KEY        Missing       2 models
  Groq               GROQ_API_KEY            Configured    4 models
  Ollama             (no key needed)         Ready         3 models
  vLLM               (no key needed)         Ready         1 model
  LM Studio          (no key needed)         Ready         1 model
  ...
```

## Environment Variables Summary

Quick reference for all provider environment variables:

| Provider | Env Var | Required |
|---|---|---|
| Anthropic | `ANTHROPIC_API_KEY` | Yes |
| OpenAI | `OPENAI_API_KEY` | Yes |
| Google Gemini | `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Yes |
| DeepSeek | `DEEPSEEK_API_KEY` | Yes |
| Groq | `GROQ_API_KEY` | Yes |
| OpenRouter | `OPENROUTER_API_KEY` | Yes |
| Mistral AI | `MISTRAL_API_KEY` | Yes |
| Together AI | `TOGETHER_API_KEY` | Yes |
| Fireworks AI | `FIREWORKS_API_KEY` | Yes |
| Ollama | `OLLAMA_API_KEY` | No |
| vLLM | `VLLM_API_KEY` | No |
| LM Studio | `LMSTUDIO_API_KEY` | No |
| Perplexity AI | `PERPLEXITY_API_KEY` | Yes |
| Cohere | `COHERE_API_KEY` | Yes |
| AI21 Labs | `AI21_API_KEY` | Yes |
| Cerebras | `CEREBRAS_API_KEY` | Yes |
| SambaNova | `SAMBANOVA_API_KEY` | Yes |
| Hugging Face | `HF_API_KEY` | Yes |
| xAI | `XAI_API_KEY` | Yes |
| Replicate | `REPLICATE_API_TOKEN` | Yes |
| ChatGPT | `CHATGPT_SESSION_TOKEN` | Yes (session) |
| GitHub Copilot | `GITHUB_TOKEN` | Yes (OAuth) |
| Moonshot (Kimi) | `MOONSHOT_API_KEY` | Yes |
| Qwen (Alibaba) | `DASHSCOPE_API_KEY` | Yes |
| Zhipu AI (GLM) | `ZHIPU_API_KEY` | Yes |
| MiniMax | `MINIMAX_API_KEY` | Yes |
| MiniMax (China) | `MINIMAX_CN_API_KEY` | Yes |
| Baidu Qianfan | `QIANFAN_API_KEY` | Yes |
| Volcano Engine | `VOLCENGINE_API_KEY` | Yes |
| Venice.ai | `VENICE_API_KEY` | Yes |
| Chutes.ai | `CHUTES_API_KEY` | Yes |
| Z.AI | `ZHIPU_API_KEY` | Yes |

## Security Notes

- All API keys are stored as `Zeroizing<String>`: the key material is automatically overwritten with zeros when the value is dropped.
- Auth detection (`detect_auth()`) only checks `std::env::var()` for presence; it never reads or logs the actual secret value.
- Provider API keys set via the REST API (`POST /api/providers/{name}/key`) follow the same zeroization policy.
- The health endpoint (`/api/health`) never exposes provider auth status or API keys. Detailed info sits behind `/api/health/detail`, which requires authentication.
- All `DriverConfig` and `KernelConfig` structs implement `Debug` with secret redaction: API keys are printed as `"***"` in logs.
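The redaction-plus-wipe pattern can be illustrated with a minimal newtype. This is a sketch of the idea only; LibreFang uses the `zeroize` crate's `Zeroizing<String>`, which additionally prevents the compiler from optimizing the wipe away:

```rust
use std::fmt;

/// Minimal illustration of secret hygiene: Debug always redacts, and
/// Drop overwrites the key bytes before the memory is released.
struct Secret(String);

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***") // never print the actual key material
    }
}

impl Drop for Secret {
    fn drop(&mut self) {
        // Best-effort wipe; a real implementation (e.g. zeroize) must
        // also guarantee the write is not elided by the optimizer.
        unsafe { self.0.as_mut_vec().iter_mut().for_each(|b| *b = 0) };
    }
}

fn main() {
    let key = Secret("sk-ant-very-secret".to_string());
    // Logging via {:?} can never leak the key.
    assert_eq!(format!("{:?}", key), "***");
}
```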