
v1.81.14-stable - Claude Sonnet 4.6, Guardrail Garden & Major Performance Improvements

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.81.14-stable
```

Key Highlights​


This release includes the largest single batch of performance work since v1.74. The most impactful change moves async/sync callback sorting from per-request to registration time (~30% speedup for callback-heavy deployments). On top of that: Pydantic round-trips eliminated from the logging hot path, OpenAI client init params pre-computed once at startup, quadratic deployment scan removed from usage-based routing, and several O(n²) → O(1) fixes in the router's team filter and model list lookups. Combined, these changes deliver significant gains for high-throughput deployments that were hitting CPU ceilings.
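The headline callback change can be illustrated with a minimal sketch (class and method names here are illustrative, not LiteLLM's internal API): instead of inspecting each callback on every request to decide whether it is sync or async, partition the callbacks once at registration time so the hot path only iterates pre-sorted lists.

```python
import asyncio
import inspect


class CallbackRegistry:
    """Partition callbacks into sync/async once, at registration time,
    so the per-request hot path never re-inspects them."""

    def __init__(self):
        self.sync_callbacks = []
        self.async_callbacks = []

    def register(self, cb):
        # One-time inspection; previously this check ran on every request.
        if inspect.iscoroutinefunction(cb):
            self.async_callbacks.append(cb)
        else:
            self.sync_callbacks.append(cb)

    async def dispatch(self, payload):
        # Hot path: no sorting or type checks, just iterate.
        for cb in self.sync_callbacks:
            cb(payload)
        await asyncio.gather(*(cb(payload) for cb in self.async_callbacks))
```

With many registered callbacks, the saved `iscoroutinefunction` checks per request are exactly the kind of repeated work that adds up at high request rates.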


New Providers and Endpoints​

New Providers (1 new provider)​

| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
| IBM watsonx.ai | /rerank | Rerank support for IBM watsonx.ai models |

New LLM API Endpoints (1 new endpoint)​

| Endpoint | Method | Description | Documentation |
|---|---|---|---|
| /v1/evals | POST/GET | OpenAI-compatible Evals API for model evaluation | Docs |

New Models / Updated Models​

New Model Support (13 new models)​

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic | claude-sonnet-4-6 | 200K | $3.00 | $15.00 | Reasoning, computer use, prompt caching, vision, PDF |
| Vertex AI | vertex_ai/claude-opus-4-6@default | 1M | $5.00 | $25.00 | Reasoning, computer use, prompt caching |
| Google Gemini | gemini/gemini-3.1-pro-preview | 1M | $2.00 | $12.00 | Audio, video, images, PDF |
| Google Gemini | gemini/gemini-3.1-pro-preview-customtools | 1M | $2.00 | $12.00 | Custom tools |
| GitHub Copilot | github_copilot/gpt-5.3-codex | 128K | - | - | Responses API, function calling, vision |
| GitHub Copilot | github_copilot/claude-opus-4.6-fast | 128K | - | - | Chat completions, function calling, vision |
| Mistral | mistral/devstral-small-latest | 256K | $0.10 | $0.30 | Function calling, response schema |
| Mistral | mistral/devstral-latest | 256K | $0.40 | $2.00 | Function calling, response schema |
| Mistral | mistral/devstral-medium-latest | 256K | $0.40 | $2.00 | Function calling, response schema |
| OpenRouter | openrouter/minimax/minimax-m2.5 | 196K | $0.30 | $1.10 | Function calling, reasoning, prompt caching |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p7 | - | - | - | Chat completions |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/minimax-m2p1 | - | - | - | Chat completions |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/kimi-k2p5 | - | - | - | Chat completions |

Features​

  • Anthropic

    • Day 0 support for Claude Sonnet 4.6 with reasoning, computer use, and 200K context - PR #21401
    • Add Claude Sonnet 4.6 pricing - PR #21395
    • Add day 0 feature support for Claude Sonnet 4.6 (streaming, function calling, vision) - PR #21448
    • Add reasoning effort and extended thinking support for Sonnet 4.6 - PR #21598
    • Fix empty system messages in translate_system_message - PR #21630
    • Sanitize Anthropic messages for multi-turn compatibility - PR #21464
    • Map websearch tool from /v1/messages to /chat/completions - PR #21465
    • Forward reasoning field as reasoning_content in delta streaming - PR #21468
    • Add server-side compaction translation from OpenAI to Anthropic format - PR #21555
  • AWS Bedrock

    • Native structured outputs API support (outputConfig.textFormat) - PR #21222
    • Support nova/ and nova-2/ spec prefixes for custom imported models - PR #21359
    • Broaden Nova 2 model detection to support all nova-2-* variants - PR #21358
    • Add Accept header for AgentCore MCP server requests - PR #21551
    • Clamp thinking.budget_tokens to minimum 1024 - PR #21306
    • Fix parallel_tool_calls mapping for Bedrock Converse - PR #21659
  • Google Gemini / Vertex AI

    • Day 0 support for gemini-3.1-pro-preview - PR #21568
    • Fix _map_reasoning_effort_to_thinking_level for all Gemini 3 family models - PR #21654
    • Add reasoning support via config for Gemini models - PR #21663
  • Databricks

    • Add Databricks to supported providers for response schema - PR #21368
    • Native Responses API support for Databricks GPT models - PR #21460
  • GitHub Copilot

    • Add github_copilot/gpt-5.3-codex and github_copilot/claude-opus-4.6-fast models - PR #21316
    • Fix unsupported params for ChatGPT Codex - PR #21209
    • Allow GitHub model aliases to reuse upstream model metadata - PR #21497
  • Mistral

    • Add devstral-2512 model aliases (devstral-small-latest, devstral-latest, devstral-medium-latest) - PR #21372
  • xAI

    • Fix usage object in xAI responses - PR #21559
  • Dashscope

    • Remove list-to-str transformation that caused incorrect request formatting - PR #21547
  • hosted_vllm

    • Convert thinking blocks to content blocks for multi-turn conversations - PR #21557
  • AU Anthropic

    • Fix au.anthropic.claude-opus-4-6-v1 model ID - PR #20731
  • General

    • Add routing based on reasoning support — skip deployments that don't support reasoning when thinking params are present - PR #21302
    • Add stop as supported param for OpenAI and Azure - PR #21539
    • Add store and other missing params to OPENAI_CHAT_COMPLETION_PARAMS - PR #21195, PR #21360
    • Preserve provider_specific_fields from proxy responses - PR #21220
    • Add default usage data configuration - PR #21550
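The reasoning-based routing feature above can be sketched as a deployment filter (a minimal illustration under assumed names — `model_info` / `supports_reasoning` stand in for whatever flags the real router consults): when a request carries thinking parameters, only deployments flagged as supporting reasoning remain candidates.

```python
def filter_reasoning_deployments(deployments, request_params):
    """Illustrative sketch of reasoning-aware routing: if the request
    carries thinking params, keep only deployments whose model_info
    flags reasoning support; otherwise pass all deployments through."""
    wants_reasoning = any(
        k in request_params for k in ("thinking", "reasoning_effort")
    )
    if not wants_reasoning:
        return deployments
    return [
        d for d in deployments
        if d.get("model_info", {}).get("supports_reasoning", False)
    ]
```

This keeps requests with `thinking` enabled from landing on a deployment that would reject or silently drop the parameter.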


LLM API Endpoints​

Features​

  • Responses API

    • Return finish_reason='tool_calls' when response contains function_call items - PR #19745
    • Eliminate per-chunk thread spawning in async streaming path for significantly better throughput - PR #21709
  • Evals API

    • Add support for OpenAI Evals API - PR #21375
  • Batch API

    • Add file deletion criteria with batch references - PR #21456
    • Misc bug fixes for managed batches - PR #21157
  • Pass-Through Endpoints

    • Add method-based routing for passthrough endpoints - PR #21543
    • Preserve and forward OAuth Authorization headers through proxy layer - PR #19912
  • Websearch / Tool Calling

    • Add DuckDuckGo as a search tool - PR #21467
    • Fix pre_call_deployment_hook not triggering via proxy router for websearch - PR #21433
  • General

    • Exclude tool params for models without function calling support - PR #21244
    • Add store param to OpenAI chat completion params - PR #21195
    • Add reasoning support via config for per-model reasoning configuration - PR #21663

Bugs​

  • General
    • Fix api_base resolution error for models with multiple potential endpoints - PR #21658
    • Fix session grouping broken for dict rows from query_raw - PR #21435

Management Endpoints / UI​

Features​

  • Access Groups

    • Add Access Group Selector to Create and Edit flow for Keys/Teams - PR #21234
  • Virtual Keys

    • Fix virtual key grace period from env/UI - PR #20321
    • Fix key expiry default duration - PR #21362
    • Key Last Active Tracking — see when a key was last used - PR #21545
    • Fix /v1/models returning wildcard instead of expanded models for BYOK team keys - PR #21408
    • Return failed_tokens in delete_verification_tokens response - PR #21609
  • Models + Endpoints

    • Add Model Settings Modal to Models & Endpoints page - PR #21516
    • Allow store_model_in_db to be set via database (not just config) - PR #21511
    • Fix input_cost_per_token masked/hidden in Model Info UI - PR #21723
    • Resolve credentials for UI-created models, including in batch file uploads - PR #21502
  • Teams

    • Allow team members to view entire team usage - PR #21537
    • Fix service account visibility for team members - PR #21627
    • Organization Info page: show member email, AntD tabs, reusable MemberTable - PR #21745
  • Usage / Spend Logs

    • Allow filtering Usage by User - PR #21351
    • Inject Credential Name as Tag for Usage Page filtering - PR #21715
    • Prefix credential tags and update Tag usage banner - PR #21739
    • Show retry count for requests in Logs view - PR #21704
    • Fix Aggregated Daily Activity Endpoint performance - PR #21613
  • SSO / Auth

    • Fix SSO PKCE support in multi-pod Kubernetes deployments - PR #20314
    • Preserve SSO role regardless of role_mappings config - PR #21503
  • Proxy CLI / Master Key

    • Fix master key rotation Prisma validation errors - PR #21330
    • Handle missing DATABASE_URL in append_query_params - PR #21239
  • Project Management

    • Add Project Management APIs for organizing resources - PR #21078
  • UI Improvements

    • Content Filters: edit/view categories, one-click add, and pagination - PR #21223
    • Playground: test fallbacks with UI - PR #21007
    • Add forward_client_headers_to_llm_api toggle to general settings - PR #21776
    • Fix is_premium() debug log spam on every request - PR #20841

Bugs​

  • Spend Logs: Fix cost calculation - PR #21152
  • Logs: Fix table not updating and pagination issues - PR #21708
  • Fix /get_image ignoring UI_LOGO_PATH when cached_logo.jpg exists - PR #21637
  • Fix duplicate URL in tagsSpendLogsCall query string - PR #20909
  • Preserve key_alias and team_id metadata in /user/daily/activity/aggregated after key deletion or regeneration - PR #20684
  • Uncomment response_model in user_info endpoint - PR #17430
  • Allow internal_user_viewer to access RAG endpoints; restrict ingest to existing vector stores - PR #21508
  • Suppress warning for litellm-dashboard team in agent permission handler - PR #21721

AI Integrations​

Logging​

  • DataDog

    • Add team tag to logs, metrics, and cost management - PR #21449
  • Prometheus

    • Fix double-counting of litellm_proxy_total_requests_metric - PR #21159
    • Guard against None metadata in Prometheus metrics - PR #21489
    • Add ASGI middleware for improved Prometheus metrics collection - PR #20434
  • Langfuse

    • Improve Langfuse test isolation (multiple stability fixes) - PR #21214
  • General

    • Fix cost to 0 for cached responses in logging - PR #21816
    • Improve streaming proxy throughput by fixing middleware and logging bottlenecks - PR #21501
    • Reduce proxy overhead for large base64 payloads - PR #21594
    • Close streaming connections to prevent connection pool exhaustion - PR #21213

Guardrails​

  • Guardrail Garden

    • Launch Guardrail Garden — a marketplace for pre-built guardrails deployable in one click - PR #21732
    • Redesign guardrail creation form with vertical stepper UI - PR #21727
    • Add guardrail jump link in log detail view - PR #21437
    • Guardrail tracing UI: show policy, detection method, and match details - PR #21349
  • Compliance Checker

    • Add compliance checker endpoints + UI panel - PR #21432
    • CSV dataset upload to compliance playground for batch testing - PR #21526
  • Built-in Guardrails

    • Competitor name blocker: blocks by name, handles streaming, supports name variations, and splits pre/post call - PR #21719, PR #21533
    • Topic blocker with both keyword and embedding-based implementations - PR #21713
    • Insults content filter - PR #21729
    • MCP Security guardrail to block unregistered MCP servers - PR #21429
  • Generic Guardrails

    • Add configurable fallback to handle generic guardrail endpoint connection failures - PR #21245
  • Presidio

    • Fix Presidio controls configuration - PR #21798
  • LakeraAI

    • Avoid KeyError on missing LAKERA_API_KEY during initialization - PR #21422
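The built-in competitor name blocker above can be pictured with a small pre-call sketch (illustrative only — class name and matching rules are assumptions, not LiteLLM's implementation): match configured names case-insensitively while tolerating spacing and punctuation variations.

```python
import re


class CompetitorNameBlocker:
    """Illustrative pre-call guardrail: flag text mentioning a configured
    competitor name, tolerating case and spacing/punctuation variations
    (e.g. "Acme Corp", "acme-corp", "AcmeCorp")."""

    def __init__(self, names):
        # Allow optional whitespace/punctuation between name tokens.
        self.patterns = [
            re.compile(
                r"[\s\-_.]*".join(map(re.escape, name.split())),
                re.IGNORECASE,
            )
            for name in names
        ]

    def check(self, text):
        # Returns True when the text should be blocked.
        return any(p.search(text) for p in self.patterns)
```

The streaming-aware variant in the real guardrail additionally has to buffer chunks so a name split across two deltas is still caught.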

Prompt Management​

  • Prompt Management API
    • New API to interact with prompt management integrations without requiring a PR - PR #17800, PR #17946
    • Fix prompt registry configuration issues - PR #21402

Spend Tracking, Budgets and Rate Limiting​

  • Fix Bedrock service_tier cost propagation — costs from service-tier responses now correctly flow through to spend tracking - PR #21172
  • Fix cost for cached responses — cached responses now correctly log $0 cost instead of re-billing - PR #21816
  • Aggregate daily activity endpoint performance — faster queries for /user/daily/activity/aggregated - PR #21613
  • Preserve key_alias and team_id metadata in /user/daily/activity/aggregated after key deletion or regeneration - PR #20684
  • Inject Credential Name as Tag for granular usage page filtering by credential - PR #21715

MCP Gateway​

  • OpenAPI-to-MCP — Convert any OpenAPI spec to an MCP server via API or UI - PR #21575, PR #21662
  • MCP User Permissions — Fine-grained permissions for end users on MCP servers - PR #21462
  • MCP Security Guardrail — Block calls to unregistered MCP servers - PR #21429
  • Fix StreamableHTTPSessionManager — Revert to stateless mode to prevent session state issues - PR #21323
  • Fix Bedrock AgentCore Accept header — Add required Accept header for AgentCore MCP server requests - PR #21551
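The OpenAPI-to-MCP idea maps naturally: each (path, method) operation in an OpenAPI spec becomes one MCP-style tool definition. A minimal sketch of that mapping (illustrative — the real LiteLLM converter also handles auth, full schemas, and request execution):

```python
def openapi_to_mcp_tools(spec):
    """Illustrative sketch: turn each OpenAPI operation into an
    MCP-style tool dict with a name, description, and input schema."""
    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            # Prefer operationId; otherwise derive a name from the route.
            name = op.get("operationId") or (
                f"{method}_{path.strip('/').replace('/', '_')}"
            )
            tools.append({
                "name": name,
                "description": op.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        p["name"]: {"type": p.get("schema", {}).get("type", "string")}
                        for p in op.get("parameters", [])
                    },
                },
            })
    return tools
```

Given a spec with a `GET /users/{id}` operation, this yields a single tool whose input schema exposes the `id` parameter.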

Performance / Loadbalancing / Reliability improvements​

Logging & callback overhead

  • Move async/sync callback separation from per-request to callback registration time — ~30% speedup for callback-heavy deployments - PR #20354
  • Skip Pydantic Usage round-trip in logging payload — reduces serialization overhead per request - PR #21003
  • Skip duplicate get_standard_logging_object_payload calls for non-streaming requests - PR #20440
  • Reuse LiteLLM_Params object across the request lifecycle - PR #20593
  • Optimize add_litellm_data_to_request hot path - PR #20526
  • Optimize model_dump_with_preserved_fields - PR #20882
  • Pre-compute OpenAI client init params at module load instead of per-request - PR #20789
  • Reduce proxy overhead for large base64 payloads - PR #21594
  • Improve streaming proxy throughput by fixing middleware and logging bottlenecks - PR #21501
  • Eliminate per-chunk thread spawning in Responses API async streaming - PR #21709

Cost calculation

  • Optimize completion_cost() with early-exit and caching - PR #20448
  • Cost calculator: reduce repeated lookups and dict copies - PR #20541
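The cost-calculation optimizations follow a common pattern: bail out early when there is nothing to price, and memoize the per-model price lookup. A sketch of that pattern (the price table and function shape here are hypothetical, not LiteLLM's actual cost map or signature):

```python
from functools import lru_cache

# Hypothetical price table: ($ per 1M input tokens, $ per 1M output tokens).
# Real prices live in LiteLLM's model cost map.
PRICES = {"example-model": (3.0, 15.0)}


@lru_cache(maxsize=1024)
def _price_for(model):
    # Cached lookup: repeated requests for the same model skip the
    # dictionary walk (the real fix also avoids repeated dict copies).
    return PRICES.get(model)


def completion_cost(model, prompt_tokens, completion_tokens):
    """Sketch of early-exit + caching in a cost calculator."""
    if prompt_tokens == 0 and completion_tokens == 0:
        return 0.0  # early exit: no tokens, no cost
    price = _price_for(model)
    if price is None:
        return 0.0  # unknown model: nothing to bill
    input_rate, output_rate = price
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000
```

On a proxy computing cost for every response, shaving the lookup out of the common path is a per-request win that compounds with throughput.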

Router & load balancing

  • Remove quadratic deployment scan in usage-based routing v2 - PR #21211
  • Avoid O(n²) membership scans in team deployment filter - PR #21210
  • Avoid O(n) alias scan for non-alias get_model_list lookups - PR #21136
  • Increase default LRU cache size to reduce multi-model cache thrash - PR #21139
  • Cache get_model_access_groups() no-args result on Router - PR #20374
  • Deployment affinity routing callback — route to the same deployment for a session - PR #19143
  • Complexity-based auto routing — new router strategy that routes based on request complexity - PR #21789
  • Session-ID-based routing — use session_id for consistent routing within a session - PR #21763
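The session-affinity idea behind the last two items can be sketched by hashing the session ID to a stable deployment index (a minimal illustration, not LiteLLM's routing strategy implementation):

```python
import hashlib


def pick_deployment(deployments, session_id):
    """Illustrative session-affinity routing: hash session_id to a
    stable index so every request in a session hits the same
    deployment (as long as the deployment list is unchanged)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(deployments)
    return deployments[index]
```

One tradeoff of plain modulo hashing: adding or removing a deployment reshuffles most sessions; consistent hashing limits that churn at the cost of extra bookkeeping.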

Connection management & reliability

  • Fix Redis connection pool reliability — prevent connection exhaustion under load - PR #21717
  • Fix Prisma connection self-heal for auth and runtime reconnection (reverted, will be re-introduced with fixes) - PR #21706
  • Make PodLockManager.release_lock atomic compare-and-delete - PR #21226
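The compare-and-delete fix addresses a classic distributed-lock bug: a pod whose lock has expired must not be able to delete another pod's lock. A sketch of the invariant, using a plain dict as a stand-in for the lock store (in the real fix the compare and delete happen in one atomic server-side step, e.g. a script, so no other pod can interleave between them):

```python
def release_lock(store, key, lock_id):
    """Only the holder whose lock_id matches may delete the key.
    `store` is a dict standing in for the real lock backend; this
    sketch shows the compare-and-delete invariant, not the atomicity
    mechanism itself."""
    if store.get(key) == lock_id:
        del store[key]
        return True  # released our own lock
    return False  # someone else holds it (or it expired and was retaken)
```

Without the compare step, a slow pod releasing a lock it no longer holds would silently free a lock that a second pod had since acquired.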

Database Changes​

Schema Updates​

| Table | Change Type | Description | PR |
|---|---|---|---|
| LiteLLM_DeletedVerificationToken | New Column | Added project_id column | PR #21587 |
| LiteLLM_ProjectTable | New Table | Project management for organizing resources | PR #21078 |
| LiteLLM_VerificationToken | New Column | Added last_active timestamp for key activity tracking | PR #21545 |
| LiteLLM_ManagedVectorStoreTable | Migration | Make vector store migration idempotent | PR #21325 |

Documentation Updates​

  • Add OpenAI Agents SDK with LiteLLM guide - PR #21311
  • Access Groups documentation - PR #21236
  • Anthropic beta headers documentation - PR #21320
  • Latency overhead troubleshooting guide - PR #21600, PR #21603
  • Add rollback safety check guide - PR #21743
  • Incident report: vLLM Embeddings broken by encoding_format parameter - PR #21474
  • Incident report: Claude Code beta headers - PR #21485
  • Mark v1.81.12 as stable - PR #21809

New Contributors​

  • @mjkam made their first contribution in PR #21306
  • @saneroen made their first contribution in PR #21243
  • @vincentkoc made their first contribution in PR #21239
  • @felixti made their first contribution in PR #19745
  • @anttttti made their first contribution in PR #20731
  • @ndgigliotti made their first contribution in PR #21222
  • @iamadamreed made their first contribution in PR #19912
  • @sahukanishka made their first contribution in PR #21220
  • @namabile made their first contribution in PR #21195
  • @stronk7 made their first contribution in PR #21372
  • @ZeroAurora made their first contribution in PR #21547
  • @SolitudePy made their first contribution in PR #21497
  • @SherifWaly made their first contribution in PR #21557
  • @dkindlund made their first contribution in PR #21633
  • @cagojeiger made their first contribution in PR #21664

Full Changelog​

v1.81.12.rc.1...v1.81.14.rc.1