Hello, Toolken — Why We Built an LLM Cost Layer

Every team building on LLMs hits the same wall: the bill arrives and no one can explain it.

You know your OpenAI spend went up 3x last month. But was it the new summarization feature? The chatbot you shipped to enterprise customers? A prompt that silently doubled in size after a refactor? Without attribution, every cost is a mystery.

What we tried first

We added logging. We dumped every request to a table and wrote SQL queries to find the expensive ones. This worked for one week, until the table hit 50 million rows and our queries took four minutes to run.

We added sampling. We tracked 10% of requests and multiplied the totals by ten. This was useless for the long tail: the features that cost $800/month were invisible in a 10% sample.

We added alerts. We wired a Slack webhook to the OpenAI billing API. It fired on the 28th of the month, after the damage was done.

The attribution gap

The real problem is not observability — it is attribution. You need to know, in real time, which feature and which customer are responsible for each token. Not the aggregate. The specific call.

This requires a layer between your application and the LLM provider. One that understands your routing metadata, not just the raw HTTP request.
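To make "the specific call" concrete, here is a minimal Python sketch of per-call attribution, assuming each request is recorded with a feature name, a tenant ID, and its prompt and completion token counts. The `CallRecord` type, its field names, and the per-token prices are hypothetical placeholders, not Toolken's data model or any provider's real rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Placeholder prices in USD per 1,000 tokens (illustrative only, not real provider rates).
PROMPT_USD_PER_1K = 0.01
COMPLETION_USD_PER_1K = 0.03


@dataclass(frozen=True)
class CallRecord:
    """One LLM call, tagged with the (hypothetical) feature and tenant it belongs to."""
    feature: str
    tenant: str
    prompt_tokens: int
    completion_tokens: int


def call_cost(rec: CallRecord) -> float:
    """Cost of a single call under the placeholder prices above."""
    return (rec.prompt_tokens * PROMPT_USD_PER_1K
            + rec.completion_tokens * COMPLETION_USD_PER_1K) / 1000


def cost_by_feature_and_tenant(records: Iterable[CallRecord]) -> Dict[Tuple[str, str], float]:
    """Sum per-call costs into (feature, tenant) buckets."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for rec in records:
        totals[(rec.feature, rec.tenant)] += call_cost(rec)
    return dict(totals)


if __name__ == "__main__":
    calls = [
        CallRecord("summarize", "acme", prompt_tokens=2000, completion_tokens=500),
        CallRecord("summarize", "acme", prompt_tokens=1000, completion_tokens=0),
        CallRecord("chatbot", "globex", prompt_tokens=500, completion_tokens=1000),
    ]
    # Print the most expensive (feature, tenant) buckets first.
    for key, usd in sorted(cost_by_feature_and_tenant(calls).items(), key=lambda kv: -kv[1]):
        print(key, round(usd, 4))
```

Because every record keeps its own feature and tenant, a question like "which customer drove the summarization spike?" becomes a group-by rather than a guess.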

What Toolken does

Toolken is an edge gateway that sits between your code and every LLM provider. You change one base URL. We tag every request using the X-Toolken-Key header you send, and we attribute every token to the feature and tenant context you provide.
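As a rough illustration of the one-URL change, the sketch below builds (but does not send) an OpenAI-style chat request aimed at a gateway instead of the provider. It assumes the gateway accepts the provider's request path and body unchanged. The gateway URL and the `X-Toolken-Feature` / `X-Toolken-Tenant` header names are hypothetical, since this post names only `X-Toolken-Key`.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical gateway base URL: the post does not publish the real endpoint.
GATEWAY_BASE_URL = "https://gateway.toolken.example/v1"


def build_chat_request(
    provider_api_key: str,
    toolken_key: str,
    feature: str,
    tenant: str,
    model: str,
    messages: List[Dict[str, str]],
) -> urllib.request.Request:
    """Build (without sending) an OpenAI-style chat request routed through the gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {provider_api_key}",  # assumed to be forwarded to the provider
        "X-Toolken-Key": toolken_key,  # the header named in the post
        # Hypothetical header names for the feature and tenant context.
        "X-Toolken-Feature": feature,
        "X-Toolken-Tenant": tenant,
    }
    # Only the base URL changes; this assumes the gateway mirrors the provider's path and body.
    return urllib.request.Request(
        f"{GATEWAY_BASE_URL}/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )
```

Sending it would be a single `urllib.request.urlopen(req)` call. Under the stated assumption, only the URL and the extra headers differ from a direct provider call.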

The result: a dashboard where you can answer “why did costs go up?” in under 30 seconds.

We also enforce budgets at the edge — before the request hits the provider — so a runaway prompt loop can’t drain your monthly allocation overnight.
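A minimal sketch of edge budget enforcement, assuming a monthly USD limit per (feature, tenant) pair and a cost estimate computed for each request before it is forwarded. `BudgetGuard`, `try_reserve`, and `handle` are hypothetical names, and returning 429 for an over-budget request is an illustrative choice. This in-memory version only shows the ordering: check first, forward only if the reservation succeeds.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Key = Tuple[str, str]  # (feature, tenant)


@dataclass
class BudgetGuard:
    """Hypothetical in-memory budget check that runs before a request is forwarded."""
    monthly_limit_usd: Dict[Key, float]
    spent_usd: Dict[Key, float] = field(default_factory=dict)

    def try_reserve(self, key: Key, estimated_cost_usd: float) -> bool:
        """Reserve the estimated cost if it fits under the limit; otherwise refuse."""
        limit = self.monthly_limit_usd.get(key)
        spent = self.spent_usd.get(key, 0.0)
        if limit is not None and spent + estimated_cost_usd > limit:
            return False  # rejected at the edge: the provider never sees this request
        self.spent_usd[key] = spent + estimated_cost_usd
        return True


def handle(guard: BudgetGuard, key: Key, estimated_cost_usd: float) -> int:
    """Return an HTTP-style status: 429 if over budget, 200 if the request may be forwarded."""
    if not guard.try_reserve(key, estimated_cost_usd):
        return 429
    # A real gateway would forward the request to the provider here.
    return 200


if __name__ == "__main__":
    guard = BudgetGuard(monthly_limit_usd={("summarize", "acme"): 1.00})
    # Simulate a runaway loop in which each call is estimated at $0.30.
    statuses = [handle(guard, ("summarize", "acme"), 0.30) for _ in range(5)]
    print(statuses)
```

A production gateway would also need shared, atomic counters across edge nodes and a reconciliation step that replaces the estimate with the actual token count once the response arrives. Keys with no configured limit pass through here; that is a choice made for the sketch, not something the post specifies.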

Try it

Toolken is in closed beta. If you’re spending more than $2k/month on LLMs and can’t answer where it goes, join the beta.

No SDK. 5-minute setup. Free under 1M tokens/month.