Upgrade your plan

Unlock LLMLingua compression and unlimited daily savings.

Free
$0
  • All 6 optimization strategies
  • 200K token savings/day
  • Basic dashboard
  • No account required
Enterprise
$30/seat/mo
  • Everything in Pro
  • Shared context pools
  • SSO + audit logs
  • On-premise deployment
Can I cancel anytime?

Yes. Cancel from the Account page. No lock-in, no questions.

Documentation

Trimli AI is a transparent optimization proxy that reduces token consumption across AI coding tools. It intercepts API requests, compresses messages using 6 strategies, and forwards to the upstream provider. Your tools work exactly as before — just faster and cheaper.

Quickstart

1. Install the VS Code extension

Search "Trimli AI" in the VS Code Marketplace, or install from the command line:

code --install-extension trimliai.trimli-vscode

The extension starts a local optimization proxy on http://localhost:8765 and auto-configures supported tools.

2. Sign in (optional)

Open the Command Palette (Cmd+Shift+P) and run "Trimli AI: Sign In". This links your account for dashboard analytics and tier upgrades. The optimizer works without signing in — you just won't see stats here.

3. Use your AI tools as normal

That's it. The proxy optimizes messages transparently. Check the status bar in VS Code for a live token savings counter, or visit this dashboard to see detailed analytics.

No API keys stored. The proxy forwards your Authorization header to the upstream API unchanged. Trimli never sees, stores, or logs your API keys.

How it works

Trimli operates as a reverse proxy between your AI tool and the provider's API. When a tool sends a request:

  1. The request arrives at localhost:8765
  2. Trimli detects the API format (OpenAI, Anthropic, or Google)
  3. Messages are compressed using up to 6 strategies (cheapest first, stops when token budget is met)
  4. The optimized request is forwarded to the real API
  5. The response streams back to your tool unchanged

Non-message endpoints (/v1/embeddings, /v1/models, /v1/audio, etc.) pass through without modification.
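The routing decision above can be pictured roughly as follows. This is an illustrative sketch, not Trimli's actual source; the path lists mirror the endpoint table later on this page, and the Gemini-style path check is an assumption:

```python
# Hypothetical sketch of the proxy's routing decision (not Trimli's real code).
PASS_THROUGH_PREFIXES = ("/v1/embeddings", "/v1/models", "/v1/audio")

def detect_format(path: str, headers: dict) -> str:
    """Guess the upstream API family from the request shape."""
    if path.startswith("/v1/messages") or "x-api-key" in headers:
        return "anthropic"
    if ":generateContent" in path:   # Gemini-style path (assumption)
        return "google"
    return "openai"

def should_optimize(path: str) -> bool:
    """Only message-bearing endpoints are optimized; the rest pass through."""
    if path == "/v1/messages/count_tokens":
        return False
    return not any(path.startswith(p) for p in PASS_THROUGH_PREFIXES)
```

For example, `should_optimize("/v1/embeddings")` is false, so an embeddings request would be forwarded byte-for-byte.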

Important: tool_use blocks (structured JSON) are never modified. Only text content within tool_result blocks is optimized. System messages are never compressed by LLMLingua to preserve critical instructions.
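That preservation rule can be sketched over Anthropic-style content blocks. This is a simplified illustration (not Trimli's implementation), with `compress` standing in for any of the text strategies described later:

```python
# Illustrative sketch: optimize text only, leave tool_use blocks untouched.
def compress(text: str) -> str:
    """Placeholder strategy: collapse whitespace runs."""
    return " ".join(text.split())

def optimize_blocks(blocks: list) -> list:
    out = []
    for block in blocks:
        if block["type"] == "tool_use":
            out.append(block)  # structured JSON: never modified
        elif block["type"] == "tool_result":
            # only text items inside the tool_result are compressed
            inner = [
                {**b, "text": compress(b["text"])} if b.get("type") == "text" else b
                for b in block.get("content", [])
            ]
            out.append({**block, "content": inner})
        elif block["type"] == "text":
            out.append({**block, "text": compress(block["text"])})
        else:
            out.append(block)
    return out
```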

VS Code extension

The VS Code extension is the primary way to use Trimli. It manages the proxy lifecycle, auto-configures tools, and provides a dashboard.

Installation

  1. Open VS Code
  2. Go to Extensions (Cmd+Shift+X)
  3. Search "Trimli AI"
  4. Click Install

The proxy starts automatically on activation. You'll see a ⚡ icon in the status bar showing cumulative token savings.

Commands

Trimli AI: Show Dashboard        — Open the savings dashboard
Trimli AI: Sign In               — Link your account via magic link
Trimli AI: Toggle Forward Proxy  — Enable env var injection for terminal tools
Trimli AI: Optimize Now          — Optimize selected text in the editor

Settings

tokOptimizer.enabled              — Enable/disable optimization (default: true)
tokOptimizer.pythonServiceUrl     — Custom Python service URL (default: localhost:8766)
tokOptimizer.hostedServiceUrl     — Hosted service URL (default: Railway)
tokOptimizer.forwardProxy.enabled — Enable forward proxy mode (default: true)

Claude Code

Auto-configured

Claude Code picks up the ANTHROPIC_BASE_URL environment variable automatically when launched from a VS Code terminal.

Setup

  1. Make sure the Trimli VS Code extension is installed and running
  2. Open a terminal inside VS Code (Ctrl+`)
  3. Run claude as usual

The extension automatically injects ANTHROPIC_BASE_URL=http://localhost:8765 into VS Code terminal sessions. Claude Code reads this and routes all API traffic through the optimizer.

Verify it's working: After your first message, check the VS Code status bar — the ⚡ counter should increase. You can also run curl http://localhost:8765/health from the terminal to confirm the proxy is running.

Manual setup (outside VS Code)

If you run Claude Code outside VS Code (e.g., in iTerm, Terminal.app, or Warp), set the environment variable manually:

# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export ANTHROPIC_BASE_URL=http://localhost:8765

# Then run Claude Code as normal
claude
Note: The proxy must be running (VS Code extension active) for this to work. If the proxy is not running, Claude Code will fail to connect. Remove the env var if you uninstall the extension.

Continue

Auto-configured

Trimli auto-configures Continue's config.json on activation. You can also set it manually.

Automatic setup

When the Trimli extension activates, it checks ~/.continue/config.json and sets apiBase to the proxy URL if the field is empty or points to localhost. No action needed.
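The check can be pictured like this. This is a hypothetical sketch of the behavior described above; the real extension's logic and the exact proxy URL handling may differ:

```python
import json
from pathlib import Path

PROXY_URL = "http://localhost:8765/v1/"

def patch_continue_config(path: Path) -> bool:
    """Set apiBase to the proxy when it is empty or points at localhost.
    Returns True if the file was changed. Illustrative sketch only."""
    config = json.loads(path.read_text())
    changed = False
    for model in config.get("models", []):
        base = model.get("apiBase", "")
        if (not base or "localhost" in base) and base != PROXY_URL:
            model["apiBase"] = PROXY_URL
            changed = True
    if changed:
        path.write_text(json.dumps(config, indent=2))
    return changed
```

A model whose `apiBase` already points at a remote host is left alone, so custom gateways keep working.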

Manual setup

Edit your Continue config file:

// ~/.continue/config.json
{
  "models": [
    {
      "title": "GPT-4o",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "sk-...",
      "apiBase": "http://localhost:8765/v1/"
    }
  ]
}

For Anthropic models in Continue:

{
  "title": "Claude Sonnet",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "apiKey": "sk-ant-...",
  "apiBase": "http://localhost:8765"
}
Path note: Continue may send requests without the /v1/ prefix. The proxy handles both /v1/chat/completions and /chat/completions.
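The normalization the note describes might look roughly like this (a hypothetical sketch, applied only to API routes, not Trimli's actual code):

```python
# Accept both /chat/completions and /v1/chat/completions: add the
# missing /v1 prefix before routing. Illustrative sketch only.
def normalize_path(path: str) -> str:
    return path if path.startswith("/v1/") else "/v1" + path
```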

Cline

Auto-configured

Trimli updates Cline's VS Code settings on activation when in OpenAI-compatible mode.

Manual setup

  1. Open Cline settings in VS Code
  2. Select OpenAI Compatible as the provider
  3. Set the API Base URL to http://localhost:8765/v1
  4. Enter your API key as normal

Cursor

Own API key only

Cursor supports custom base URLs only in "own API key" mode. Built-in models route through Cursor's servers and cannot be intercepted.

Setup

  1. Open Cursor Settings
  2. Go to AI > Advanced
  3. Enable "Use own API key"
  4. Set OpenAI Base URL to http://localhost:8765
  5. Enter your OpenAI API key
Limitation: Only requests made with your own API key are routed through the proxy. Cursor's built-in models (when using Cursor's subscription) route through Cursor's servers directly and cannot be optimized.

OpenAI-compatible tools

Any tool that supports a custom OpenAI base URL works with Trimli. Set the base URL to http://localhost:8765 and use your API key as normal.

Environment variable method

# Works for any tool that reads OPENAI_BASE_URL
export OPENAI_BASE_URL=http://localhost:8765

# For Anthropic-compatible tools
export ANTHROPIC_BASE_URL=http://localhost:8765

Forward proxy method

Enable forward proxy mode via Command Palette: "Trimli AI: Toggle Forward Proxy". This injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL into all VS Code terminal sessions automatically.

Supported API endpoints

Endpoint                          Provider           Optimized
POST /v1/chat/completions         OpenAI             Yes
POST /v1/responses                OpenAI             Yes
POST /v1/messages                 Anthropic          Yes
POST /v1/messages/batches         Anthropic          Yes (each request)
POST /v1/threads/*/messages       OpenAI Assistants  Yes
POST /v1/threads/*/runs           OpenAI Assistants  Yes
POST /v1/messages/count_tokens    Anthropic          Pass-through
/v1/embeddings, /v1/audio, etc.   Any                Pass-through

Optimization strategies

Strategies run in cheapest-first order and stop early once the token budget (default 8,000) is met. Lossless strategies always run; lossy strategies only fire when the message exceeds the budget.

Strategy              Type      Typical savings  Description
whitespace-normalize  Lossless  3-8%             Collapses excessive spaces, blank lines, and trailing whitespace
deduplicate           Lossless  5-20%            Removes repeated sentences across messages
intent-distill        Lossy     10-30%           Strips filler phrases ("I was wondering if you could please...")
reference-substitute  Lossy     10-25%           Aliases long strings (file paths, URLs) that appear 2+ times
history-summarize     Lossy     30-50%           Compresses old conversation turns into a summary (no LLM call)
context-prune         Lossy     20-40%           Drops low-relevance messages when over budget
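The cheapest-first scheduling with an early stop can be sketched as follows. This is illustrative only: the token count is a crude length/4 estimate rather than a real tokenizer, and `intent_distill` here is a one-phrase stand-in for the real lossy strategy:

```python
import re

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic, not a real tokenizer

def whitespace_normalize(text: str) -> str:
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r" \n", "\n", text)       # drop trailing whitespace
    return re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line

def deduplicate(text: str) -> str:
    seen, out = set(), []
    for sentence in text.split(". "):
        if sentence not in seen:
            seen.add(sentence)
            out.append(sentence)
    return ". ".join(out)

def intent_distill(text: str) -> str:
    # stand-in: strip one known filler phrase
    return text.replace("I was wondering if you could please ", "")

STRATEGIES = [
    (whitespace_normalize, True),   # lossless: always runs
    (deduplicate, True),            # lossless: always runs
    (intent_distill, False),        # lossy: only when over budget
]

def optimize(text: str, budget: int = 8000) -> str:
    for strategy, lossless in STRATEGIES:
        if not lossless and estimate_tokens(text) <= budget:
            break  # early stop: budget met, skip remaining lossy strategies
        text = strategy(text)
    return text
```

With the default budget, a short message only passes through the lossless stages; the lossy ones never fire.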

On Pro and Enterprise tiers, the Python service adds two additional ML-powered strategies:

  • LLMLingua-2 — semantic compression using Microsoft's model (preserves meaning while removing redundant tokens)
  • Vector relevance scoring — uses sentence-transformers to score message relevance and drop irrelevant content

API reference

Health check

GET http://localhost:8765/health

Response:
{
  "status": "ok",
  "version": "0.1.0",
  "mode": "reverse+forward",
  "stateless": true
}

Model list

GET http://localhost:8765/v1/models

Returns a list of supported models (for Cursor compatibility).

Optimized endpoints

All optimized endpoints add these response headers:

  • X-TokOptimizer-Saved: <number> — tokens saved by optimization
  • X-Tok-Optimizer-Tier: free|pro|enterprise — current tier
# OpenAI chat completions
POST http://localhost:8765/v1/chat/completions
Authorization: Bearer sk-...

# Anthropic messages
POST http://localhost:8765/v1/messages
x-api-key: sk-ant-...
anthropic-version: 2023-06-01

# OpenAI Responses API
POST http://localhost:8765/v1/responses
Authorization: Bearer sk-...

FAQ

Does Trimli affect response quality?

No. Our test suite includes 59 accuracy tests that compare optimized vs direct responses across factual, code generation, reasoning, and multi-turn categories. All tests pass with zero quality degradation at ~46% average compression.

What if the proxy is not running?

If you set ANTHROPIC_BASE_URL or OPENAI_BASE_URL and the proxy is down, your tool will fail to connect. Either start VS Code (which starts the proxy) or remove the environment variables.

Can I use it with Azure OpenAI?

Yes. The proxy detects Azure requests by the api-version query parameter. Set your Azure base URL to the proxy and include an x-original-host header with your Azure endpoint hostname so the proxy knows where to forward.
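The forwarding decision might look roughly like this. The `api-version` parameter and `x-original-host` header come from the answer above; everything else in this sketch is an assumption, not Trimli's actual code:

```python
from urllib.parse import parse_qs

def upstream_base(query: str, headers: dict) -> str:
    """Choose the upstream host: Azure is detected via the api-version
    query parameter and routed to the x-original-host header value."""
    params = parse_qs(query)
    if "api-version" in params:
        host = headers.get("x-original-host")
        if not host:
            raise ValueError("Azure request missing x-original-host header")
        return f"https://{host}"
    return "https://api.openai.com"  # simplified default for the sketch
```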

Does it work with streaming?

Yes. Streaming responses (SSE) pass through the proxy unchanged. Only the input messages are optimized — the response stream is piped directly from the upstream API.

Is my data sent to Trimli servers?

The optimization runs locally in the VS Code extension. On Pro/Enterprise tiers, messages may be sent to the hosted Python service for ML-powered compression (LLMLingua). No messages are stored or logged. API keys are never seen by Trimli — the Authorization header is forwarded transparently.

What's the difference between Free and Pro?

Free and Pro include the same 6 core optimization strategies. Free is capped at 200,000 tokens of savings per day, while Pro is unlimited. Pro also adds LLMLingua ML compression on the hosted service.