Upgrade your plan

Remove the 200K/day savings cap — all 6 strategies and LLMLingua compression are included on every tier; paid plans unlock unlimited savings.

Free
$0
  • All 6 optimization strategies
  • 200K token savings/day
  • Basic dashboard
  • No account required
Enterprise
$30/seat/mo
  • Everything in Pro
  • Shared context pools
  • SSO + audit logs
  • On-premise deployment
Can I cancel anytime? Yes. Cancel from the Account page — no lock-in, no questions.

Documentation

Trimli AI is a transparent optimization proxy that reduces token consumption across AI coding tools. It intercepts API requests, compresses messages using 6 strategies, and forwards to the upstream provider. Your tools work exactly as before — just faster and cheaper.

Quickstart

1. Install the VS Code extension

Search "Trimli AI" in the VS Code Marketplace, or install from the command line:

code --install-extension trimliai.trimli-vscode

The extension starts a local optimization proxy on http://localhost:8765 and auto-configures supported tools.

2. Sign in (optional)

Open the Command Palette (Cmd+Shift+P) and run "Trimli AI: Sign In". This links your account for dashboard analytics and tier upgrades. The optimizer works without signing in — you just won't see stats here.

3. Use your AI tools as normal

That's it. The proxy optimizes messages transparently. Check the status bar in VS Code for a live token savings counter, or visit this dashboard to see detailed analytics.

No API keys stored. The proxy forwards your Authorization header to the upstream API unchanged. Trimli never sees, stores, or logs your API keys.

How it works

Trimli operates as a reverse proxy between your AI tool and the provider's API. When a tool sends a request:

  1. The request arrives at localhost:8765
  2. Trimli detects the API format (OpenAI, Anthropic, or Google)
  3. Messages are compressed using up to 6 strategies (cheapest first, stops when token budget is met)
  4. The optimized request is forwarded to the real API
  5. The response streams back to your tool unchanged
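
The format detection in step 2 can be sketched roughly like this. This is a toy illustration, not Trimli's actual routing code — the paths are simply the standard endpoint shapes for each provider:

```python
def detect_format(path: str) -> str:
    """Toy sketch of step 2: guess the upstream API family from the request path."""
    if path.startswith("/v1/messages"):
        return "anthropic"
    if path.startswith(("/v1/chat/completions", "/v1/responses", "/v1/threads")):
        return "openai"
    # Gemini-style paths use a :generateContent action suffix
    if ":generateContent" in path or ":streamGenerateContent" in path:
        return "google"
    return "passthrough"
```

A path like /v1/embeddings falls through to "passthrough", matching the non-message endpoint behavior described below.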

Non-message endpoints (/v1/embeddings, /v1/models, /v1/audio, etc.) pass through without modification.

Important: tool_use blocks (structured JSON) are never modified. Only text content within tool_result blocks is optimized. System messages are never compressed by LLMLingua to preserve critical instructions.
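
The tool-block rule above can be sketched as follows. This is illustrative code, not the extension's implementation — the block shapes mirror Anthropic's content-block format, and `compress` stands in for whichever strategy is running:

```python
def optimize_blocks(blocks, compress):
    """Toy sketch: tool_use blocks pass through untouched; only plain-text
    content inside tool_result blocks is handed to the compressor."""
    out = []
    for block in blocks:
        if block.get("type") == "tool_result" and isinstance(block.get("content"), str):
            out.append({**block, "content": compress(block["content"])})
        else:
            out.append(block)  # tool_use and everything else: unchanged
    return out
```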

VS Code extension

The VS Code extension is the primary way to use Trimli. It manages the proxy lifecycle, auto-configures tools, and provides a dashboard.

Installation

  1. Open VS Code
  2. Go to Extensions (Cmd+Shift+X)
  3. Search "Trimli AI"
  4. Click Install

The proxy starts automatically on activation. You'll see a ⚡ icon in the status bar showing cumulative token savings.

Commands

Trimli AI: Show Dashboard        — Open the savings dashboard
Trimli AI: Sign In               — Link your account via magic link
Trimli AI: Toggle Forward Proxy  — Enable env var injection for terminal tools
Trimli AI: Optimize Now          — Optimize selected text in the editor

Settings

tokOptimizer.enabled              — Enable/disable optimization (default: true)
tokOptimizer.pythonServiceUrl     — Custom Python service URL (default: localhost:8766)
tokOptimizer.hostedServiceUrl     — Hosted service URL (default: Railway)
tokOptimizer.forwardProxy.enabled — Enable forward proxy mode (default: true)

Claude Code

Claude Code Auto-configured

Claude Code picks up the ANTHROPIC_BASE_URL environment variable automatically when launched from a VS Code terminal.

Setup

  1. Make sure the Trimli VS Code extension is installed and running
  2. Open a terminal inside VS Code (Ctrl+`)
  3. Run claude as usual

The extension automatically injects ANTHROPIC_BASE_URL=http://localhost:8765 into VS Code terminal sessions. Claude Code reads this and routes all API traffic through the optimizer.

Verify it's working: After your first message, check the VS Code status bar — the ⚡ counter should increase. You can also run curl http://localhost:8765/health from the terminal to confirm the proxy is running.

Manual setup (outside VS Code)

If you run Claude Code outside VS Code (e.g., in iTerm, Terminal.app, or Warp), set the environment variable manually:

# Add to your shell profile (~/.zshrc, ~/.bashrc, etc.)
export ANTHROPIC_BASE_URL=http://localhost:8765

# Then run Claude Code as normal
claude
Note: The proxy must be running (VS Code extension active) for this to work. If the proxy is not running, Claude Code will fail to connect. Remove the env var if you uninstall the extension.

Continue

Continue Auto-configured

Trimli auto-configures Continue's config.json on activation. You can also set it manually.

Automatic setup

When the Trimli extension activates, it checks ~/.continue/config.json and sets apiBase to the proxy URL if the field is empty or points to localhost. No action needed.

Manual setup

Edit your Continue config file:

// ~/.continue/config.json
{
  "models": [
    {
      "title": "GPT-4o",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "sk-...",
      "apiBase": "http://localhost:8765/v1/"
    }
  ]
}

For Anthropic models in Continue:

{
  "title": "Claude Sonnet",
  "provider": "anthropic",
  "model": "claude-sonnet-4-6",
  "apiKey": "sk-ant-...",
  "apiBase": "http://localhost:8765"
}
Path note: Continue may send requests without the /v1/ prefix. The proxy handles both /v1/chat/completions and /chat/completions.
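
One plausible way a proxy handles both forms is to normalize the path before routing — a sketch of assumed behavior, not Trimli's actual code:

```python
def normalize_openai_path(path: str) -> str:
    # Assumed normalization: map prefix-less paths like /chat/completions
    # onto the canonical /v1/... form before matching endpoints.
    return path if path.startswith("/v1/") else "/v1" + path
```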

Cline

Cline Auto-configured

Trimli updates Cline's VS Code settings on activation when in OpenAI-compatible mode.

OpenAI mode

  1. Open Cline settings in VS Code
  2. Select OpenAI Compatible as the provider
  3. Set the Base URL to http://localhost:8765/v1
  4. Enter your OpenAI API key
  5. Use models like gpt-4.1-mini or gpt-4.1
Note: Newer OpenAI models (gpt-5-codex, gpt-5.4) only support the /v1/responses endpoint, not /v1/chat/completions. Cline uses chat completions, so use gpt-4.1-mini or gpt-4.1 instead.

Anthropic mode

  1. Open Cline settings in VS Code
  2. Select Anthropic as the provider
  3. Set the Base URL to http://localhost:8765 (no /v1 suffix)
  4. Enter your Anthropic API key
Base URL rule: OpenAI mode uses localhost:8765/v1 (the tool appends /chat/completions). Anthropic mode uses localhost:8765 (the SDK appends /v1/messages itself). Using the wrong format causes double /v1 path errors.
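
The rule is easiest to see as string composition (illustration only — no requests are made):

```python
# How each mode composes its final request URL.
OPENAI_BASE = "http://localhost:8765/v1"   # tool appends the endpoint path
ANTHROPIC_BASE = "http://localhost:8765"   # SDK appends /v1/messages itself

openai_url = OPENAI_BASE + "/chat/completions"
anthropic_url = ANTHROPIC_BASE + "/v1/messages"
wrong_url = OPENAI_BASE + "/v1/messages"   # double /v1 -> path error
```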

Cursor

Cursor Not supported

Cursor does not reliably support custom base URLs. All API traffic routes through Cursor's servers, bypassing local proxies.

Why Cursor doesn't work: Even in "own API key" mode, Cursor routes requests through its own servers and does not honor custom base URL settings. This is a Cursor limitation — there is no workaround. We're monitoring for changes in future Cursor releases.

OpenAI-compatible tools

Any tool that supports a custom OpenAI base URL works with Trimli. Set the base URL to http://localhost:8765 and use your API key as normal.

Environment variable method

# Works for any tool that reads OPENAI_BASE_URL
export OPENAI_BASE_URL=http://localhost:8765

# For Anthropic-compatible tools
export ANTHROPIC_BASE_URL=http://localhost:8765

Forward proxy method

Enable forward proxy mode via Command Palette: "Trimli AI: Toggle Forward Proxy". This injects OPENAI_BASE_URL and ANTHROPIC_BASE_URL into all VS Code terminal sessions automatically.

Supported API endpoints

Endpoint                          Provider           Optimized
POST /v1/chat/completions         OpenAI             Yes
POST /v1/responses                OpenAI             Yes
POST /v1/messages                 Anthropic          Yes
POST /v1/messages/batches         Anthropic          Yes (each request)
POST /v1/threads/*/messages       OpenAI Assistants  Yes
POST /v1/threads/*/runs           OpenAI Assistants  Yes
POST /v1/messages/count_tokens    Anthropic          Pass-through
/v1/embeddings, /v1/audio, etc.   Any                Pass-through

Optimization strategies

Strategies run in cheapest-first order and stop early once the token budget (default 8,000) is met. Lossless strategies always run; lossy strategies only fire when the message exceeds the budget.

Strategy              Type      Typical savings  Description
whitespace-normalize  Lossless  3-8%             Collapses excessive spaces, blank lines, and trailing whitespace
deduplicate           Lossless  5-20%            Removes repeated sentences across messages
intent-distill        Lossy     10-30%           Strips filler phrases ("I was wondering if you could please...")
reference-substitute  Lossy     10-25%           Aliases long strings (file paths, URLs) that appear 2+ times
history-summarize     Lossy     30-50%           Compresses old conversation turns into a summary (no LLM call)
context-prune         Lossy     20-40%           Drops low-relevance messages when over budget
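
Under the budget rule above, the orchestration loop can be sketched like this. The pipeline shape and the toy whitespace strategy are illustrations under stated assumptions — the real strategy implementations and token counter are not public:

```python
import re

def whitespace_normalize(text: str) -> str:
    # Toy lossless strategy: trailing whitespace, runs of blank lines, runs of spaces.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return re.sub(r"[ \t]{2,}", " ", text)

def run_pipeline(text, strategies, count_tokens, budget=8000):
    """Strategies arrive cheapest-first as (fn, lossless) pairs.
    Lossless ones always run; lossy ones only fire while still over budget."""
    for fn, lossless in strategies:
        if lossless or count_tokens(text) > budget:
            text = fn(text)
    return text
```

Because lossy strategies check the budget before each application, the pipeline stops doing lossy work as soon as the text fits.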

All tiers (Free, Pro, Enterprise) get the same 6 strategies plus ML-powered compression:

  • LLMLingua-2 — semantic compression using Microsoft's model (preserves meaning while removing redundant tokens)
  • Vector relevance scoring — uses sentence-transformers to score message relevance and drop irrelevant content

The only difference between tiers is the daily token savings cap: Free is limited to 200,000 tokens/day, Pro and Enterprise are unlimited.

FAQ

Does Trimli affect response quality?

No. Our test suite includes 59 accuracy tests that compare optimized vs direct responses across factual, code generation, reasoning, and multi-turn categories. All tests pass with zero quality degradation at ~46% average compression.

What if the proxy is not running?

If you set ANTHROPIC_BASE_URL or OPENAI_BASE_URL and the proxy is down, your tool will fail to connect. Either start VS Code (which starts the proxy) or remove the environment variables.

Can I use it with Azure OpenAI?

Yes. The proxy detects Azure requests by the api-version query parameter. Set your Azure base URL to the proxy and include an x-original-host header with your Azure endpoint hostname so the proxy knows where to forward.
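
The stated detection rule — Azure requests carry an api-version query parameter, plain OpenAI requests do not — can be sketched like this (the deployment path below is a hypothetical example):

```python
from urllib.parse import urlparse, parse_qs

def looks_like_azure(url: str) -> bool:
    # Azure OpenAI requests include ?api-version=...; vanilla OpenAI requests don't.
    return "api-version" in parse_qs(urlparse(url).query)
```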

Does it work with streaming?

Yes. Streaming responses (SSE) pass through the proxy unchanged. Only the input messages are optimized — the response stream is piped directly from the upstream API.

Is my data sent to Trimli servers?

The optimization runs locally in the VS Code extension. On Pro/Enterprise tiers, messages may be sent to the hosted Python service for ML-powered compression (LLMLingua). No messages are stored or logged. API keys are never seen by Trimli — the Authorization header is forwarded transparently.

What's the difference between Free and Pro?

Free and Pro get the same optimization quality — all 6 heuristic strategies plus LLMLingua ML compression. The only difference is the daily token savings cap: Free is limited to 200,000 tokens/day, Pro and Enterprise are unlimited.