Closed Beta · Limited Spots

Which feature, customer, agent, or user is
burning your LLM bill?

Toolken is the LLM gateway that tags every token with the dimension you care about (feature, customer, agent, or user), so you finally see where your AI spend goes.

Join the Beta · See how it works

No SDK · 5-minute setup · Free under 1M tokens/month

Three lines. That's the whole integration.

One URL change.
Every provider.

Point your existing OpenAI client at Toolken. No new SDK. No code refactor. Full attribution and budget control from day one.

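import OpenAI from "openai";

// Keep your existing provider key; only the base URL and one header change.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // Route requests through the Toolken gateway instead of the provider directly.
  baseURL: "https://gateway.toolken.ai/v1",
  defaultHeaders: {
    // Your Toolken key, sent with every request.
    "X-Toolken-Key": process.env.TOOLKEN_KEY,
  },
});

const response = await client.chat.completions.create({
  model: "", // set this to the model ID you already use
  messages: [{ role: "user", content: prompt }],
});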

Who it's for

Built for your team

One gateway. Every team finally sees the same numbers.

For SaaS teams

Per-feature & per-customer attribution

Which feature is eating my OpenAI bill? Which customer is unprofitable on AI cost?

✓ Tag any request with X-Toolken-Key + metadata
✓ See margins by feature in real time
✓ See which enterprise customer's 'AI assistant' is bleeding you dry

For agent builders

Per-agent usage breakdown

My agent loops. I have no idea which agent in the swarm is burning tokens or stuck in a retry storm.

✓ Tag each agent run and see which agents consume the most tokens
✓ Slowest p95 and highest cost-per-task at a glance
✓ Built for OpenClaw / Hermes / LangGraph / CrewAI users

For sysadmins running shared keys

Per-user team usage

My 80-person eng team has a shared OpenAI key. I have no idea who's using what or why my bill jumped 40%.

✓ One team key, with a breakdown by individual seat
✓ Set per-user budgets
✓ Catch the engineer running batch evals on the company account

How it works

From zero to attribution in 5 minutes

One URL change. No SDK. Full visibility across every feature, team, and customer.

Point your requests at Toolken

Change your LLM client's base URL to gateway.toolken.ai/v1. Pass X-Toolken-Key and optionally X-Toolken-Feature in headers. Zero other changes required.
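
As a rough sketch of the per-request tagging described in this step, the helper below builds the two headers named here. Only the header names X-Toolken-Key and X-Toolken-Feature come from this page; the function name toolkenHeaders, the example key, and the "summarizer" feature value are illustrative assumptions, not part of Toolken's API.

```typescript
// Hypothetical helper: builds the attribution headers described above.
// Only the header names come from this page; everything else is illustrative.
function toolkenHeaders(toolkenKey: string, feature?: string): Record<string, string> {
  if (toolkenKey.trim() === "") {
    throw new Error("A Toolken key is required");
  }
  const headers: Record<string, string> = { "X-Toolken-Key": toolkenKey };
  // X-Toolken-Feature is optional; omit it rather than sending an empty value.
  if (feature !== undefined && feature.trim() !== "") {
    headers["X-Toolken-Feature"] = feature;
  }
  return headers;
}

// Example: tag a request as belonging to a made-up "summarizer" feature.
const exampleHeaders = toolkenHeaders("tk_example_key", "summarizer");
```

With the official OpenAI Node client, a header set like this can be passed per call through the request options, for example client.chat.completions.create(body, { headers: exampleHeaders }), so different features can share one client.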

We attribute and forward

Toolken reads your metadata headers, records token counts and cost, enforces budget rules, then forwards to the provider. Response arrives untouched.
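
To make the attribution step concrete, here is a minimal sketch of how token counts could be turned into spend per feature. It is not Toolken's implementation: the type names, the function names, and the per-million prices are placeholder assumptions chosen so the arithmetic is easy to follow.

```typescript
// Illustrative only: not Toolken's implementation. Prices are made-up placeholders.
interface UsageRecord {
  feature: string; // value taken from the X-Toolken-Feature header
  promptTokens: number;
  completionTokens: number;
}

interface ModelPrice {
  inputPerMillion: number; // USD per 1M prompt tokens (placeholder)
  outputPerMillion: number; // USD per 1M completion tokens (placeholder)
}

// Cost of one call: tokens scaled to millions, times the per-million price.
function costOf(record: UsageRecord, price: ModelPrice): number {
  return (
    (record.promptTokens / 1_000_000) * price.inputPerMillion +
    (record.completionTokens / 1_000_000) * price.outputPerMillion
  );
}

// Sum the cost of every call under the feature it was tagged with.
function spendByFeature(records: UsageRecord[], price: ModelPrice): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    totals.set(record.feature, (totals.get(record.feature) ?? 0) + costOf(record, price));
  }
  return totals;
}

const placeholderPrice: ModelPrice = { inputPerMillion: 1, outputPerMillion: 2 };
const featureTotals = spendByFeature(
  [
    { feature: "search", promptTokens: 1_000_000, completionTokens: 500_000 },
    { feature: "chat", promptTokens: 2_000_000, completionTokens: 0 },
    { feature: "search", promptTokens: 0, completionTokens: 1_000_000 },
  ],
  placeholderPrice,
);
// With these placeholder prices: search totals 4 USD, chat totals 2 USD.
```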

See the breakdown in real time

Your dashboard shows spend by feature, tenant, model, and time — updated in seconds. Export via CSV or API. Set alerts on any budget threshold.

Simple, transparent pricing

Start free, scale when you need to. No surprise invoices.

Free

$0 forever

For solo builders and early experiments.

Start free

Up to 1M tokens/month
1 project, 2 team keys
7-day log retention
Per-feature attribution

Starter

$15

From side-project to first paying user.

Start 14-day trial

Up to 10M tokens/month
3 projects
30-day log retention
Hard budget stops

Pro

$39

For funded startups, agent shops, and mid-size SaaS.

Start 14-day trial

Up to 50M tokens/month ($0.50 per additional 1M)
Unlimited projects and team keys
90-day log retention
Hard budget stops + Slack/email alerts
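
A quick worked example of the Pro overage arithmetic, using the figures listed above. It assumes the $39 price is per month and that overage is billed pro rata per token; the page does not say whether partial millions are rounded up, so treat this as an estimate.

```typescript
// Worked arithmetic for the Pro plan figures listed above.
// Assumptions: $39 is a monthly price, and overage is billed pro rata
// (the page does not state how partial millions are handled).
const PRO_BASE_USD = 39;
const PRO_INCLUDED_TOKENS = 50_000_000;
const PRO_OVERAGE_USD_PER_MILLION = 0.5;

function proMonthlyCostUsd(tokensUsed: number): number {
  // Only tokens beyond the included 50M incur the overage rate.
  const overageTokens = Math.max(0, tokensUsed - PRO_INCLUDED_TOKENS);
  return PRO_BASE_USD + (overageTokens / 1_000_000) * PRO_OVERAGE_USD_PER_MILLION;
}

const costAt80M = proMonthlyCostUsd(80_000_000); // 39 + 30 * 0.50 = 54
const costAt10M = proMonthlyCostUsd(10_000_000); // under the cap, so 39
```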

Enterprise

Custom

Volume, security, and control for regulated teams.

Contact sales

Unlimited tokens, volume pricing
SSO (SAML, Okta, Google Workspace)
Audit logs, role-based access, SOC 2 Type II report
Dedicated region (US, EU) and 99.95% SLA

On every paid plan, failed and upstream-error calls are never billed.

Closed Beta — Apply Now

Know what every LLM call costs
before your next invoice.

Join the teams already using Toolken to attribute costs, enforce budgets, and ship AI features with financial confidence.

Join the Beta · Talk to us

FAQ

Common questions

Does Toolken add latency to my LLM requests?

Median (p50) added latency is under 1 ms. The gateway runs on the Cloudflare edge, across 320+ POPs.

What happens if Toolken goes down?

Fail-open mode forwards directly to the provider — you only lose the log line, never the request.

Which LLM providers do you support?

OpenAI, Anthropic, Google Gemini, Mistral, and Cohere on the Free plan. All supported providers — including Llama via Groq/Together — are available on Pro and Enterprise.

Do you store my prompt content?

By default we store token counts, model names, cost, and your metadata headers — not the prompt or completion text. Enterprise plans can opt into full request logging with a custom DPA.

How do hard budget stops work?

When a tenant or feature hits its monthly budget ceiling, Toolken returns a 429 with a clear budget-exceeded error before the request reaches the provider. You choose the action per rule: block, alert-only, or reroute to a cheaper model.
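
The per-rule behavior described above can be sketched as a small decision function. This is an illustration under stated assumptions, not Toolken's code: the type names, field names, and the "budget_exceeded" error string are hypothetical, and falling back to a block when no cheaper model is configured is a design choice made for this sketch.

```typescript
// Illustrative sketch of a budget rule check; all names here are hypothetical.
type BudgetAction = "block" | "alert-only" | "reroute";

interface BudgetRule {
  monthlyCeilingUsd: number;
  action: BudgetAction;
  cheaperModel?: string; // only used when action is "reroute"
}

type Decision =
  | { kind: "forward"; alert: boolean }
  | { kind: "reject"; status: 429; error: string }
  | { kind: "reroute"; model: string };

function decide(rule: BudgetRule, spentThisMonthUsd: number): Decision {
  // Under the ceiling: forward to the provider as usual.
  if (spentThisMonthUsd < rule.monthlyCeilingUsd) {
    return { kind: "forward", alert: false };
  }
  switch (rule.action) {
    case "block":
      // Rejected before the request reaches the provider.
      return { kind: "reject", status: 429, error: "budget_exceeded" };
    case "alert-only":
      return { kind: "forward", alert: true };
    case "reroute":
      // Sketch-level choice: without a configured fallback model, block instead.
      if (rule.cheaperModel === undefined) {
        return { kind: "reject", status: 429, error: "budget_exceeded" };
      }
      return { kind: "reroute", model: rule.cheaperModel };
  }
}
```

On the client side, a blocked call would surface as an ordinary HTTP 429, so existing rate-limit handling is a natural place to distinguish a budget stop from provider throttling.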

Can I self-host the gateway?

Yes — on Enterprise plans we provide the Cloudflare Worker source and deployment scripts so you can run it in your own account. Data never leaves your environment.

Do you route between providers automatically?

Not yet — on the roadmap. Today Toolken is a visibility + budget layer; you pick the provider per request, we attribute the cost. Smart routing, semantic cache, and prompt analysis are next.

What's coming next?

Smart routing across providers, semantic cache, and prompt-bloat analysis with dollar-savings projections. Drop your email and we'll ping you when each ships.

Still have questions? Email us — hello@toolken.ai

Stay in the loop

LLM cost intelligence, delivered monthly

Patterns from real teams: what's burning their budgets, which optimizations shipped, and what we're building next.

No spam. Unsubscribe anytime.
