Free
- All 6 optimization strategies
- 200K token savings/day
- Basic dashboard
- No account required

Pro
- All 6 strategies
- LLMLingua-2 compression
- Unlimited token savings
- Full analytics dashboard

Enterprise
- Everything in Pro
- Shared context pools
- SSO + audit logs
- On-premise deployment
Documentation
Trimli AI is a transparent optimization proxy that reduces token consumption across AI coding tools. It intercepts API requests, compresses messages using 6 strategies, and forwards to the upstream provider. Your tools work exactly as before — just faster and cheaper.
Quickstart
1. Install the VS Code extension
Search "Trimli AI" in the VS Code Marketplace, or install from the command line:
code --install-extension trimliai.trimli-vscode
The extension starts a local optimization proxy on http://localhost:8765 and auto-configures supported tools.
2. Sign in (optional)
Open the Command Palette (Cmd+Shift+P) and run "Trimli AI: Sign In". This links your account for dashboard analytics and tier upgrades. The optimizer works without signing in — you just won't see stats here.
3. Use your AI tools as normal
That's it. The proxy optimizes messages transparently. Check the status bar in VS Code for a live token savings counter, or visit this dashboard to see detailed analytics.
How it works
Trimli operates as a reverse proxy between your AI tool and the provider's API. When a tool sends a request:
- The request arrives at localhost:8765
- Trimli detects the API format (OpenAI, Anthropic, or Google)
- Messages are compressed using up to 6 strategies (cheapest first, stops when token budget is met)
- The optimized request is forwarded to the real API
- The response streams back to your tool unchanged
Non-message endpoints (/v1/embeddings, /v1/models, /v1/audio, etc.) pass through without modification.
tool_use blocks (structured JSON) are never modified. Only text content within tool_result blocks is optimized. System messages are never compressed by LLMLingua to preserve critical instructions.
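As an illustration, the scoping rules above can be sketched like this. The block shapes follow the Anthropic message format; optimize_text is a stand-in for the real strategy pipeline, not Trimli's actual code:

```python
def optimize_text(text: str) -> str:
    """Stand-in for the real strategy pipeline: collapse runs of whitespace."""
    return " ".join(text.split())

def optimize_message(message: dict) -> dict:
    """Apply optimization only where the docs allow it.

    - tool_use blocks (structured JSON) are never modified
    - only text content inside tool_result blocks is compressed
    """
    content = message.get("content")
    if isinstance(content, str):
        return {**message, "content": optimize_text(content)}
    if not isinstance(content, list):
        return message  # nothing to optimize
    blocks = []
    for block in content:
        if block.get("type") == "tool_use":
            blocks.append(block)  # structured JSON: left untouched
        elif block.get("type") == "tool_result" and isinstance(block.get("content"), str):
            blocks.append({**block, "content": optimize_text(block["content"])})
        else:
            blocks.append(block)
    return {**message, "content": blocks}
```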
VS Code extension
The VS Code extension is the primary way to use Trimli. It manages the proxy lifecycle, auto-configures tools, and provides a dashboard.
Installation
- Open VS Code
- Go to Extensions (Cmd+Shift+X)
- Search "Trimli AI"
- Click Install
The proxy starts automatically on activation. You'll see a ⚡ icon in the status bar showing cumulative token savings.
Commands
Trimli AI: Show Dashboard — Open the savings dashboard
Trimli AI: Sign In — Link your account via magic link
Trimli AI: Toggle Forward Proxy — Enable env var injection for terminal tools
Trimli AI: Optimize Now — Optimize selected text in the editor
Settings
tokOptimizer.enabled — Enable/disable optimization (default: true)
tokOptimizer.pythonServiceUrl — Custom Python service URL (default: localhost:8766)
tokOptimizer.hostedServiceUrl — Hosted service URL (default: Railway)
tokOptimizer.forwardProxy.enabled — Enable forward proxy mode (default: true)
Claude Code
Auto-configured
Claude Code picks up the ANTHROPIC_BASE_URL environment variable automatically when launched from a VS Code terminal.
Setup
- Make sure the Trimli VS Code extension is installed and running
- Open a terminal inside VS Code (Ctrl+`)
- Run claude as usual
The extension automatically injects ANTHROPIC_BASE_URL=http://localhost:8765 into VS Code terminal sessions. Claude Code reads this and routes all API traffic through the optimizer.
Run curl http://localhost:8765/health from the terminal to confirm the proxy is running.
Manual setup (outside VS Code)
If you run Claude Code outside VS Code (e.g., in iTerm, Terminal.app, or Warp), set the environment variable manually:
```shell
# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export ANTHROPIC_BASE_URL=http://localhost:8765

# Then run Claude Code as normal
claude
```
Continue
Auto-configured
Trimli auto-configures Continue's config.json on activation. You can also set it manually.
Automatic setup
When the Trimli extension activates, it checks ~/.continue/config.json and sets apiBase to the proxy URL if the field is empty or points to localhost. No action needed.
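That check can be sketched roughly as follows — a minimal illustration of the rule described above, not the extension's actual implementation:

```python
import json
from pathlib import Path

PROXY_URL = "http://localhost:8765/v1/"

def autoconfigure_continue(config_path: Path) -> bool:
    """Point apiBase at the proxy when it is empty or already local.

    Returns True if the file was rewritten. Illustrative sketch only.
    """
    config = json.loads(config_path.read_text())
    changed = False
    for model in config.get("models", []):
        api_base = model.get("apiBase", "")
        if not api_base or "localhost" in api_base:
            model["apiBase"] = PROXY_URL
            changed = True
    if changed:
        config_path.write_text(json.dumps(config, indent=2))
    return changed
```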
Manual setup
Edit your Continue config file:
```
// ~/.continue/config.json
{
  "models": [
    {
      "title": "GPT-4o",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "sk-...",
      "apiBase": "http://localhost:8765/v1/"
    }
  ]
}
```
For Anthropic models in Continue:
```json
{
  "title": "Claude Sonnet",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "apiKey": "sk-ant-...",
  "apiBase": "http://localhost:8765"
}
```
Note that unlike the OpenAI config, the Anthropic apiBase omits the /v1/ prefix. The proxy handles both /v1/chat/completions and /chat/completions.
Cline
Auto-configured
Trimli updates Cline's VS Code settings on activation when in OpenAI-compatible mode.
Manual setup
- Open Cline settings in VS Code
- Select OpenAI Compatible as the provider
- Set the API Base URL to http://localhost:8765/v1
- Enter your API key as normal
Cursor
Own API key only
Cursor supports custom base URLs only in "own API key" mode. Built-in models route through Cursor's servers and cannot be intercepted.
Setup
- Open Cursor Settings
- Go to AI > Advanced
- Enable "Use own API key"
- Set OpenAI Base URL to http://localhost:8765
- Enter your OpenAI API key
OpenAI-compatible tools
Any tool that supports a custom OpenAI base URL works with Trimli. Set the base URL to http://localhost:8765 and use your API key as normal.
Environment variable method
```shell
# Works for any tool that reads OPENAI_BASE_URL
export OPENAI_BASE_URL=http://localhost:8765

# For Anthropic-compatible tools
export ANTHROPIC_BASE_URL=http://localhost:8765
```
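As a sketch of why this works: SDK-based tools typically resolve their API base URL from the environment first and fall back to the provider default. The helper below is illustrative, not any specific tool's code:

```python
import os

def resolve_base_url(provider: str) -> str:
    """How a typical tool resolves its base URL: env var first, default second."""
    defaults = {
        "openai": "https://api.openai.com/v1",
        "anthropic": "https://api.anthropic.com",
    }
    env_names = {"openai": "OPENAI_BASE_URL", "anthropic": "ANTHROPIC_BASE_URL"}
    return os.environ.get(env_names[provider], defaults[provider])
```

With OPENAI_BASE_URL set to http://localhost:8765, the tool's traffic flows through the optimizer; with it unset, requests go straight to the provider.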
Forward proxy method
Enable forward proxy mode via Command Palette: "Trimli AI: Toggle Forward Proxy". This injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL into all VS Code terminal sessions automatically.
Supported API endpoints
| Endpoint | Provider | Optimized |
|---|---|---|
| POST /v1/chat/completions | OpenAI | Yes |
| POST /v1/responses | OpenAI | Yes |
| POST /v1/messages | Anthropic | Yes |
| POST /v1/messages/batches | Anthropic | Yes (each request) |
| POST /v1/threads/*/messages | OpenAI Assistants | Yes |
| POST /v1/threads/*/runs | OpenAI Assistants | Yes |
| POST /v1/messages/count_tokens | Anthropic | Pass-through |
| /v1/embeddings, /v1/audio, etc. | Any | Pass-through |
Optimization strategies
Strategies run in cheapest-first order and stop early once the token budget (default 8,000) is met. Lossless strategies always run; lossy strategies only fire when the message exceeds the budget.
| Strategy | Type | Typical savings | Description |
|---|---|---|---|
| whitespace-normalize | Lossless | 3-8% | Collapses excessive spaces, blank lines, and trailing whitespace |
| deduplicate | Lossless | 5-20% | Removes repeated sentences across messages |
| intent-distill | Lossy | 10-30% | Strips filler phrases ("I was wondering if you could please...") |
| reference-substitute | Lossy | 10-25% | Aliases long strings (file paths, URLs) that appear 2+ times |
| history-summarize | Lossy | 30-50% | Compresses old conversation turns into a summary (no LLM call) |
| context-prune | Lossy | 20-40% | Drops low-relevance messages when over budget |
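The cheapest-first loop can be sketched with stand-in strategies. Whitespace splitting approximates token counting and the strategy bodies are simplified stubs named after the table rows, not the real implementations:

```python
def count_tokens(text: str) -> int:
    # crude stand-in for a real tokenizer
    return len(text.split())

def whitespace_normalize(text: str) -> str:
    return " ".join(text.split())

def deduplicate(text: str) -> str:
    seen, kept = set(), []
    for sentence in text.split(". "):
        if sentence not in seen:
            seen.add(sentence)
            kept.append(sentence)
    return ". ".join(kept)

def intent_distill(text: str) -> str:
    # toy filler-stripping rule for illustration
    return text.replace("I was wondering if you could please ", "")

# cheapest first; lossless strategies always run, lossy only over budget
STRATEGIES = [
    (whitespace_normalize, "lossless"),
    (deduplicate, "lossless"),
    (intent_distill, "lossy"),
]

def optimize(text: str, budget: int = 8000) -> str:
    for strategy, kind in STRATEGIES:
        if kind == "lossy" and count_tokens(text) <= budget:
            break  # budget met: stop before any lossy work
        text = strategy(text)
    return text
```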
On Pro and Enterprise tiers, the Python service adds two additional ML-powered strategies:
- LLMLingua-2 — semantic compression using Microsoft's model (preserves meaning while removing redundant tokens)
- Vector relevance scoring — uses sentence-transformers to score message relevance and drop irrelevant content
API reference
Health check
GET http://localhost:8765/health
Response:
```json
{
  "status": "ok",
  "version": "0.1.0",
  "mode": "reverse+forward",
  "stateless": true
}
```
Model list
GET http://localhost:8765/v1/models
Returns a list of supported models (for Cursor compatibility).
Optimized endpoints
All optimized endpoints add these response headers:
X-TokOptimizer-Saved: <number> — tokens saved by optimization
X-Tok-Optimizer-Tier: free|pro|enterprise — current tier
```
# OpenAI chat completions
POST http://localhost:8765/v1/chat/completions
Authorization: Bearer sk-...

# Anthropic messages
POST http://localhost:8765/v1/messages
x-api-key: sk-ant-...
anthropic-version: 2023-06-01

# OpenAI Responses API
POST http://localhost:8765/v1/responses
Authorization: Bearer sk-...
```
FAQ
Does Trimli affect response quality?
No. Our test suite includes 59 accuracy tests that compare optimized vs direct responses across factual, code generation, reasoning, and multi-turn categories. All tests pass with zero quality degradation at ~46% average compression.
What if the proxy is not running?
If you set ANTHROPIC_BASE_URL or OPENAI_BASE_URL and the proxy is down, your tool will fail to connect. Either start VS Code (which starts the proxy) or remove the environment variables.
Can I use it with Azure OpenAI?
Yes. The proxy detects Azure requests by the api-version query parameter. Set your Azure base URL to the proxy and include an x-original-host header with your Azure endpoint hostname so the proxy knows where to forward.
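A sketch of that detection rule, using the parameter and header names above (the routing helper itself is illustrative, not the proxy's actual code):

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

def resolve_upstream(path_and_query: str, headers: dict) -> Optional[str]:
    """Return the upstream host for an Azure request, or None if not Azure.

    Azure OpenAI requests carry an api-version query parameter; the
    x-original-host header tells the proxy where to forward.
    """
    query = parse_qs(urlparse(path_and_query).query)
    if "api-version" not in query:
        return None  # not an Azure-style request
    return headers.get("x-original-host")
```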
Does it work with streaming?
Yes. Streaming responses (SSE) pass through the proxy unchanged. Only the input messages are optimized — the response stream is piped directly from the upstream API.
Is my data sent to Trimli servers?
The optimization runs locally in the VS Code extension. On Pro/Enterprise tiers, messages may be sent to the hosted Python service for ML-powered compression (LLMLingua). No messages are stored or logged. API keys are never seen by Trimli — the Authorization header is forwarded transparently.
What's the difference between Free and Pro?
Free and Pro include the same 6 optimization strategies. Pro removes the 200,000 tokens/day savings cap that applies to Free, and adds LLMLingua ML compression on the hosted service.
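One way the Free-tier cap could be enforced (the 200,000 figure is from this page; the accounting logic itself is an assumption for illustration):

```python
FREE_DAILY_CAP = 200_000  # tokens/day of savings on the Free tier

def record_savings(used_today: int, saved: int, tier: str = "free") -> int:
    """Return the tokens actually credited, clamping Free-tier savings at the cap.

    Illustrative sketch only; not Trimli's actual accounting code.
    """
    if tier != "free":
        return saved  # Pro/Enterprise: unlimited
    remaining = max(0, FREE_DAILY_CAP - used_today)
    return min(saved, remaining)
```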