
v1.81.14-stable - Claude Sonnet 4.6, Guardrail Garden & Major Performance Improvements

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.81.14-stable
```

Key Highlights​


This release includes the largest single batch of performance work since v1.74. The most impactful change moves async/sync callback sorting from per-request to registration time (~30% speedup for callback-heavy deployments). On top of that: Pydantic round-trips eliminated from the logging hot path, OpenAI client init params pre-computed once at startup, quadratic deployment scan removed from usage-based routing, and several O(n²) → O(1) fixes in the router's team filter and model list lookups. Combined, these changes deliver significant gains for high-throughput deployments that were hitting CPU ceilings.
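The headline callback change can be illustrated with a minimal sketch (class and method names here are illustrative, not LiteLLM's internal API): instead of inspecting each callback on every request to decide whether it is sync or async, partition the callbacks once at registration time so the hot path only iterates pre-sorted lists.

```python
import asyncio
import inspect


class CallbackRegistry:
    """Partition callbacks into sync/async once, at registration time,
    so the per-request hot path never re-inspects them."""

    def __init__(self):
        self.sync_callbacks = []
        self.async_callbacks = []

    def register(self, cb):
        # One-time inspection; previously this check ran on every request.
        if inspect.iscoroutinefunction(cb):
            self.async_callbacks.append(cb)
        else:
            self.sync_callbacks.append(cb)

    async def dispatch(self, payload):
        # Hot path: no sorting or type checks, just iterate.
        for cb in self.sync_callbacks:
            cb(payload)
        await asyncio.gather(*(cb(payload) for cb in self.async_callbacks))
```

With many registered callbacks, the saved `iscoroutinefunction` checks per request are exactly the kind of repeated work that adds up at high request rates.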


New Providers and Endpoints​

New Providers (1 new provider)​

| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
| IBM watsonx.ai | /rerank | Rerank support for IBM watsonx.ai models |

New LLM API Endpoints (1 new endpoint)​

| Endpoint | Method | Description | Documentation |
|---|---|---|---|
| /v1/evals | POST/GET | OpenAI-compatible Evals API for model evaluation | Docs |

New Models / Updated Models​

New Model Support (13 new models)​

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic | claude-sonnet-4-6 | 200K | $3.00 | $15.00 | Reasoning, computer use, prompt caching, vision, PDF |
| Vertex AI | vertex_ai/claude-opus-4-6@default | 1M | $5.00 | $25.00 | Reasoning, computer use, prompt caching |
| Google Gemini | gemini/gemini-3.1-pro-preview | 1M | $2.00 | $12.00 | Audio, video, images, PDF |
| Google Gemini | gemini/gemini-3.1-pro-preview-customtools | 1M | $2.00 | $12.00 | Custom tools |
| GitHub Copilot | github_copilot/gpt-5.3-codex | 128K | - | - | Responses API, function calling, vision |
| GitHub Copilot | github_copilot/claude-opus-4.6-fast | 128K | - | - | Chat completions, function calling, vision |
| Mistral | mistral/devstral-small-latest | 256K | $0.10 | $0.30 | Function calling, response schema |
| Mistral | mistral/devstral-latest | 256K | $0.40 | $2.00 | Function calling, response schema |
| Mistral | mistral/devstral-medium-latest | 256K | $0.40 | $2.00 | Function calling, response schema |
| OpenRouter | openrouter/minimax/minimax-m2.5 | 196K | $0.30 | $1.10 | Function calling, reasoning, prompt caching |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p7 | - | - | - | Chat completions |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/minimax-m2p1 | - | - | - | Chat completions |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/kimi-k2p5 | - | - | - | Chat completions |

Features​

  • Anthropic

    • Day 0 support for Claude Sonnet 4.6 with reasoning, computer use, and 200K context - PR #21401
    • Add Claude Sonnet 4.6 pricing - PR #21395
    • Add day 0 feature support for Claude Sonnet 4.6 (streaming, function calling, vision) - PR #21448
    • Add reasoning effort and extended thinking support for Sonnet 4.6 - PR #21598
    • Fix empty system messages in translate_system_message - PR #21630
    • Sanitize Anthropic messages for multi-turn compatibility - PR #21464
    • Map websearch tool from /v1/messages to /chat/completions - PR #21465
    • Forward reasoning field as reasoning_content in delta streaming - PR #21468
    • Add server-side compaction translation from OpenAI to Anthropic format - PR #21555
  • AWS Bedrock

    • Native structured outputs API support (outputConfig.textFormat) - PR #21222
    • Support nova/ and nova-2/ spec prefixes for custom imported models - PR #21359
    • Broaden Nova 2 model detection to support all nova-2-* variants - PR #21358
    • Add Accept header for AgentCore MCP server requests - PR #21551
    • Clamp thinking.budget_tokens to minimum 1024 - PR #21306
    • Fix parallel_tool_calls mapping for Bedrock Converse - PR #21659
  • Google Gemini / Vertex AI

    • Day 0 support for gemini-3.1-pro-preview - PR #21568
    • Fix _map_reasoning_effort_to_thinking_level for all Gemini 3 family models - PR #21654
    • Add reasoning support via config for Gemini models - PR #21663
  • Databricks

    • Add Databricks to supported providers for response schema - PR #21368
    • Native Responses API support for Databricks GPT models - PR #21460
  • GitHub Copilot

    • Add github_copilot/gpt-5.3-codex and github_copilot/claude-opus-4.6-fast models - PR #21316
    • Fix unsupported params for ChatGPT Codex - PR #21209
    • Allow GitHub model aliases to reuse upstream model metadata - PR #21497
  • Mistral

    • Add devstral-2512 model aliases (devstral-small-latest, devstral-latest, devstral-medium-latest) - PR #21372
  • xAI

    • Fix usage object in xAI responses - PR #21559
  • Dashscope

    • Remove list-to-str transformation that caused incorrect request formatting - PR #21547
  • hosted_vllm

    • Convert thinking blocks to content blocks for multi-turn conversations - PR #21557
  • AU Anthropic

    • Fix au.anthropic.claude-opus-4-6-v1 model ID - PR #20731
  • General

    • Add routing based on reasoning support — skip deployments that don't support reasoning when thinking params are present - PR #21302
    • Add stop as supported param for OpenAI and Azure - PR #21539
    • Add store and other missing params to OPENAI_CHAT_COMPLETION_PARAMS - PR #21195, PR #21360
    • Preserve provider_specific_fields from proxy responses - PR #21220
    • Add default usage data configuration - PR #21550
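The reasoning-based routing feature above can be sketched as a deployment filter (a minimal illustration under assumed names — `model_info` / `supports_reasoning` stand in for whatever flags the real router consults): when a request carries thinking parameters, only deployments flagged as supporting reasoning remain candidates.

```python
def filter_reasoning_deployments(deployments, request_params):
    """Illustrative sketch of reasoning-aware routing: if the request
    carries thinking params, keep only deployments whose model_info
    flags reasoning support; otherwise pass all deployments through."""
    wants_reasoning = any(
        k in request_params for k in ("thinking", "reasoning_effort")
    )
    if not wants_reasoning:
        return deployments
    return [
        d for d in deployments
        if d.get("model_info", {}).get("supports_reasoning", False)
    ]
```

This keeps requests with `thinking` enabled from landing on a deployment that would reject or silently drop the parameter.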


LLM API Endpoints​

Features​

  • Responses API

    • Return finish_reason='tool_calls' when response contains function_call items - PR #19745
    • Eliminate per-chunk thread spawning in async streaming path for significantly better throughput - PR #21709
  • Evals API

    • Add support for OpenAI Evals API - PR #21375
  • Batch API

    • Add file deletion criteria with batch references - PR #21456
    • Misc bug fixes for managed batches - PR #21157
  • Pass-Through Endpoints

    • Add method-based routing for passthrough endpoints - PR #21543
    • Preserve and forward OAuth Authorization headers through proxy layer - PR #19912
  • Websearch / Tool Calling

    • Add DuckDuckGo as a search tool - PR #21467
    • Fix pre_call_deployment_hook not triggering via proxy router for websearch - PR #21433
  • General

    • Exclude tool params for models without function calling support - PR #21244
    • Add store param to OpenAI chat completion params - PR #21195
    • Add reasoning support via config for per-model reasoning configuration - PR #21663

Bugs​

  • General
    • Fix api_base resolution error for models with multiple potential endpoints - PR #21658
    • Fix session grouping broken for dict rows from query_raw - PR #21435

Management Endpoints / UI​

Features​

  • Access Groups

    • Add Access Group Selector to Create and Edit flow for Keys/Teams - PR #21234
  • Virtual Keys

    • Fix virtual key grace period from env/UI - PR #20321
    • Fix key expiry default duration - PR #21362
    • Key Last Active Tracking — see when a key was last used - PR #21545
    • Fix /v1/models returning wildcard instead of expanded models for BYOK team keys - PR #21408
    • Return failed_tokens in delete_verification_tokens response - PR #21609
  • Models + Endpoints

    • Add Model Settings Modal to Models & Endpoints page - PR #21516
    • Allow store_model_in_db to be set via database (not just config) - PR #21511
    • Fix input_cost_per_token masked/hidden in Model Info UI - PR #21723
    • Resolve credentials for UI-created models, including in batch file uploads - PR #21502
  • Teams

    • Allow team members to view entire team usage - PR #21537
    • Fix service account visibility for team members - PR #21627
    • Organization Info page: show member email, AntD tabs, reusable MemberTable - PR #21745
  • Usage / Spend Logs

    • Allow filtering Usage by User - PR #21351
    • Inject Credential Name as Tag for Usage Page filtering - PR #21715
    • Prefix credential tags and update Tag usage banner - PR #21739
    • Show retry count for requests in Logs view - PR #21704
    • Fix Aggregated Daily Activity Endpoint performance - PR #21613
  • SSO / Auth

    • Fix SSO PKCE support in multi-pod Kubernetes deployments - PR #20314
    • Preserve SSO role regardless of role_mappings config - PR #21503
  • Proxy CLI / Master Key

    • Fix master key rotation Prisma validation errors - PR #21330
    • Handle missing DATABASE_URL in append_query_params - PR #21239
  • Project Management

    • Add Project Management APIs for organizing resources - PR #21078
  • UI Improvements

    • Content Filters: edit/view categories, one-click add, and pagination - PR #21223
    • Playground: test fallbacks with UI - PR #21007
    • Add forward_client_headers_to_llm_api toggle to general settings - PR #21776
    • Fix is_premium() debug log spam on every request - PR #20841

Bugs​

  • Spend Logs: Fix cost calculation - PR #21152
  • Logs: Fix table not updating and pagination issues - PR #21708
  • Fix /get_image ignoring UI_LOGO_PATH when cached_logo.jpg exists - PR #21637
  • Fix duplicate URL in tagsSpendLogsCall query string - PR #20909
  • Preserve key_alias and team_id metadata in /user/daily/activity/aggregated after key deletion or regeneration - PR #20684
  • Uncomment response_model in user_info endpoint - PR #17430
  • Allow internal_user_viewer to access RAG endpoints; restrict ingest to existing vector stores - PR #21508
  • Suppress warning for litellm-dashboard team in agent permission handler - PR #21721

AI Integrations​

Logging​

  • DataDog

    • Add team tag to logs, metrics, and cost management - PR #21449
  • Prometheus

    • Fix double-counting of litellm_proxy_total_requests_metric - PR #21159
    • Guard against None metadata in Prometheus metrics - PR #21489
    • Add ASGI middleware for improved Prometheus metrics collection - PR #20434
  • Langfuse

    • Improve Langfuse test isolation (multiple stability fixes) - PR #21214
  • General

    • Fix cost to 0 for cached responses in logging - PR #21816
    • Improve streaming proxy throughput by fixing middleware and logging bottlenecks - PR #21501
    • Reduce proxy overhead for large base64 payloads - PR #21594
    • Close streaming connections to prevent connection pool exhaustion - PR #21213

Guardrails​

  • Guardrail Garden

    • Launch Guardrail Garden — a marketplace for pre-built guardrails deployable in one click - PR #21732
    • Redesign guardrail creation form with vertical stepper UI - PR #21727
    • Add guardrail jump link in log detail view - PR #21437
    • Guardrail tracing UI: show policy, detection method, and match details - PR #21349
  • Compliance Checker

    • Add compliance checker endpoints + UI panel - PR #21432
    • CSV dataset upload to compliance playground for batch testing - PR #21526
  • Built-in Guardrails

    • Competitor name blocker: blocks by name, handles streaming, supports name variations, and splits pre/post call - PR #21719, PR #21533
    • Topic blocker with both keyword and embedding-based implementations - PR #21713
    • Insults content filter - PR #21729
    • MCP Security guardrail to block unregistered MCP servers - PR #21429
  • Generic Guardrails

    • Add configurable fallback to handle generic guardrail endpoint connection failures - PR #21245
  • Presidio

    • Fix Presidio controls configuration - PR #21798
  • LakeraAI

    • Avoid KeyError on missing LAKERA_API_KEY during initialization - PR #21422
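The built-in competitor name blocker above can be pictured with a small pre-call sketch (illustrative only — class name and matching rules are assumptions, not LiteLLM's implementation): match configured names case-insensitively while tolerating spacing and punctuation variations.

```python
import re


class CompetitorNameBlocker:
    """Illustrative pre-call guardrail: flag text mentioning a configured
    competitor name, tolerating case and spacing/punctuation variations
    (e.g. "Acme Corp", "acme-corp", "AcmeCorp")."""

    def __init__(self, names):
        # Allow optional whitespace/punctuation between name tokens.
        self.patterns = [
            re.compile(
                r"[\s\-_.]*".join(map(re.escape, name.split())),
                re.IGNORECASE,
            )
            for name in names
        ]

    def check(self, text):
        # Returns True when the text should be blocked.
        return any(p.search(text) for p in self.patterns)
```

The streaming-aware variant in the real guardrail additionally has to buffer chunks so a name split across two deltas is still caught.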

Prompt Management​

  • Prompt Management API
    • New API to interact with prompt management integrations without requiring a PR - PR #17800, PR #17946
    • Fix prompt registry configuration issues - PR #21402

Spend Tracking, Budgets and Rate Limiting​

  • Fix Bedrock service_tier cost propagation — costs from service-tier responses now correctly flow through to spend tracking - PR #21172
  • Fix cost for cached responses — cached responses now correctly log $0 cost instead of re-billing - PR #21816
  • Aggregate daily activity endpoint performance — faster queries for /user/daily/activity/aggregated - PR #21613
  • Preserve key_alias and team_id metadata in /user/daily/activity/aggregated after key deletion or regeneration - PR #20684
  • Inject Credential Name as Tag for granular usage page filtering by credential - PR #21715

MCP Gateway​

  • OpenAPI-to-MCP — Convert any OpenAPI spec to an MCP server via API or UI - PR #21575, PR #21662
  • MCP User Permissions — Fine-grained permissions for end users on MCP servers - PR #21462
  • MCP Security Guardrail — Block calls to unregistered MCP servers - PR #21429
  • Fix StreamableHTTPSessionManager — Revert to stateless mode to prevent session state issues - PR #21323
  • Fix Bedrock AgentCore Accept header — Add required Accept header for AgentCore MCP server requests - PR #21551
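The OpenAPI-to-MCP idea maps naturally: each (path, method) operation in an OpenAPI spec becomes one MCP-style tool definition. A minimal sketch of that mapping (illustrative — the real LiteLLM converter also handles auth, full schemas, and request execution):

```python
def openapi_to_mcp_tools(spec):
    """Illustrative sketch: turn each OpenAPI operation into an
    MCP-style tool dict with a name, description, and input schema."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            # Prefer operationId; otherwise derive a name from the route.
            name = op.get("operationId") or (
                f"{method}_{path.strip('/').replace('/', '_')}"
            )
            tools.append({
                "name": name,
                "description": op.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        p["name"]: {"type": p.get("schema", {}).get("type", "string")}
                        for p in op.get("parameters", [])
                    },
                },
            })
    return tools
```

Given a spec with a `GET /users/{id}` operation, this yields a single tool whose input schema exposes the `id` parameter.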

Performance / Loadbalancing / Reliability improvements​

Logging & callback overhead

  • Move async/sync callback separation from per-request to callback registration time — ~30% speedup for callback-heavy deployments - PR #20354
  • Skip Pydantic Usage round-trip in logging payload — reduces serialization overhead per request - PR #21003
  • Skip duplicate get_standard_logging_object_payload calls for non-streaming requests - PR #20440
  • Reuse LiteLLM_Params object across the request lifecycle - PR #20593
  • Optimize add_litellm_data_to_request hot path - PR #20526
  • Optimize model_dump_with_preserved_fields - PR #20882
  • Pre-compute OpenAI client init params at module load instead of per-request - PR #20789
  • Reduce proxy overhead for large base64 payloads - PR #21594
  • Improve streaming proxy throughput by fixing middleware and logging bottlenecks - PR #21501
  • Eliminate per-chunk thread spawning in Responses API async streaming - PR #21709

Cost calculation

  • Optimize completion_cost() with early-exit and caching - PR #20448
  • Cost calculator: reduce repeated lookups and dict copies - PR #20541
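The cost-calculation optimizations follow a common pattern: bail out early when there is nothing to price, and memoize the per-model price lookup. A sketch of that pattern (the price table and function shape here are hypothetical, not LiteLLM's actual cost map or signature):

```python
from functools import lru_cache

# Hypothetical price table: ($ per 1M input tokens, $ per 1M output tokens).
# Real prices live in LiteLLM's model cost map.
PRICES = {"example-model": (3.0, 15.0)}


@lru_cache(maxsize=1024)
def _price_for(model):
    # Cached lookup: repeated requests for the same model skip the
    # dictionary walk (the real fix also avoids repeated dict copies).
    return PRICES.get(model)


def completion_cost(model, prompt_tokens, completion_tokens):
    """Sketch of early-exit + caching in a cost calculator."""
    if prompt_tokens == 0 and completion_tokens == 0:
        return 0.0  # early exit: no tokens, no cost
    price = _price_for(model)
    if price is None:
        return 0.0  # unknown model: nothing to bill
    input_rate, output_rate = price
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000
```

On a proxy computing cost for every response, shaving the lookup out of the common path is a per-request win that compounds with throughput.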

Router & load balancing

  • Remove quadratic deployment scan in usage-based routing v2 - PR #21211
  • Avoid O(n²) membership scans in team deployment filter - PR #21210
  • Avoid O(n) alias scan for non-alias get_model_list lookups - PR #21136
  • Increase default LRU cache size to reduce multi-model cache thrash - PR #21139
  • Cache get_model_access_groups() no-args result on Router - PR #20374
  • Deployment affinity routing callback — route to the same deployment for a session - PR #19143
  • Complexity-based auto routing — new router strategy that routes based on request complexity - PR #21789
  • Session-ID-based routing — use session_id for consistent routing within a session - PR #21763
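The session-affinity idea behind the last two items can be sketched by hashing the session ID to a stable deployment index (a minimal illustration, not LiteLLM's routing strategy implementation):

```python
import hashlib


def pick_deployment(deployments, session_id):
    """Illustrative session-affinity routing: hash session_id to a
    stable index so every request in a session hits the same
    deployment (as long as the deployment list is unchanged)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(deployments)
    return deployments[index]
```

One tradeoff of plain modulo hashing: adding or removing a deployment reshuffles most sessions; consistent hashing limits that churn at the cost of extra bookkeeping.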

Connection management & reliability

  • Fix Redis connection pool reliability — prevent connection exhaustion under load - PR #21717
  • Fix Prisma connection self-heal for auth and runtime reconnection (reverted, will be re-introduced with fixes) - PR #21706
  • Make PodLockManager.release_lock atomic compare-and-delete - PR #21226
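The compare-and-delete fix addresses a classic distributed-lock bug: a pod whose lock has expired must not be able to delete another pod's lock. A sketch of the invariant, using a plain dict as a stand-in for the lock store (in the real fix the compare and delete happen in one atomic server-side step, e.g. a script, so no other pod can interleave between them):

```python
def release_lock(store, key, lock_id):
    """Only the holder whose lock_id matches may delete the key.
    `store` is a dict standing in for the real lock backend; this
    sketch shows the compare-and-delete invariant, not the atomicity
    mechanism itself."""
    if store.get(key) == lock_id:
        del store[key]
        return True  # released our own lock
    return False  # someone else holds it (or it expired and was retaken)
```

Without the compare step, a slow pod releasing a lock it no longer holds would silently free a lock that a second pod had since acquired.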

Database Changes​

Schema Updates​

| Table | Change Type | Description | PR |
|---|---|---|---|
| LiteLLM_DeletedVerificationToken | New Column | Added project_id column | PR #21587 |
| LiteLLM_ProjectTable | New Table | Project management for organizing resources | PR #21078 |
| LiteLLM_VerificationToken | New Column | Added last_active timestamp for key activity tracking | PR #21545 |
| LiteLLM_ManagedVectorStoreTable | Migration | Make vector store migration idempotent | PR #21325 |

Documentation Updates​

  • Add OpenAI Agents SDK with LiteLLM guide - PR #21311
  • Access Groups documentation - PR #21236
  • Anthropic beta headers documentation - PR #21320
  • Latency overhead troubleshooting guide - PR #21600, PR #21603
  • Add rollback safety check guide - PR #21743
  • Incident report: vLLM Embeddings broken by encoding_format parameter - PR #21474
  • Incident report: Claude Code beta headers - PR #21485
  • Mark v1.81.12 as stable - PR #21809

New Contributors​

  • @mjkam made their first contribution in PR #21306
  • @saneroen made their first contribution in PR #21243
  • @vincentkoc made their first contribution in PR #21239
  • @felixti made their first contribution in PR #19745
  • @anttttti made their first contribution in PR #20731
  • @ndgigliotti made their first contribution in PR #21222
  • @iamadamreed made their first contribution in PR #19912
  • @sahukanishka made their first contribution in PR #21220
  • @namabile made their first contribution in PR #21195
  • @stronk7 made their first contribution in PR #21372
  • @ZeroAurora made their first contribution in PR #21547
  • @SolitudePy made their first contribution in PR #21497
  • @SherifWaly made their first contribution in PR #21557
  • @dkindlund made their first contribution in PR #21633
  • @cagojeiger made their first contribution in PR #21664

Full Changelog​

v1.81.12.rc.1...v1.81.14.rc.1