ClevAgent
2026-03-30 · cost-tracking · llm · production · tutorial

How to Track LLM Token Costs in Production

Your AI agent looks healthy but your API bill is exploding. Here's how to track token costs per work cycle and catch runaway loops before they drain your budget.

You wake up to a $200 API bill. Your agent ran all night. It looked healthy — heartbeat green, no errors, process running. But token usage went from 200/min to 40,000/min because it was stuck re-parsing a malformed response in a loop.

This is the most expensive failure mode in AI agent operations, and traditional monitoring won't catch it.

Why cost tracking matters for AI agents

Traditional services have relatively predictable costs. A web server handles N requests per second, each costing roughly the same in compute.

AI agents are different. A single LLM call can cost anywhere from $0.001 to $2.00 depending on the model, context size, and output length. A logic loop that retries the same failing operation can burn through hundreds of dollars in minutes.
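To make that spread concrete, here is a minimal sketch of per-call cost arithmetic. The model names and the per-million-token prices in `PRICES_PER_MILLION` are illustrative placeholders, not real provider pricing; substitute your provider's current rates.

```python
# Illustrative placeholder prices in USD per 1M tokens (NOT real provider pricing).
PRICES_PER_MILLION = {
    "small-model": {"input": 0.20, "output": 0.80},
    "large-model": {"input": 10.00, "output": 40.00},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its input and output token counts."""
    price = PRICES_PER_MILLION[model]
    return (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000


# A short prompt on the small model versus a large context on the large model.
cheap = estimate_cost_usd("small-model", input_tokens=500, output_tokens=100)
expensive = estimate_cost_usd("large-model", input_tokens=100_000, output_tokens=4_000)
print(f"cheap call: ${cheap:.6f}, expensive call: ${expensive:.2f}")
```

With these placeholder rates the two calls differ by more than three orders of magnitude, which is why a per-request average says little about what an agent is actually spending.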

The key insight: for LLM-backed agents, cost is a health metric, not just a billing metric.

The pattern: cost per heartbeat cycle

Instead of tracking total spend, track cost per work cycle:

while True:
    # Snapshot the running token total before the work starts
    start_tokens = get_token_count()

    result = do_llm_work()

    # The difference is what this cycle consumed
    end_tokens = get_token_count()
    tokens_used = end_tokens - start_tokens
    cost = calculate_cost(tokens_used)

    heartbeat(tokens=tokens_used, cost_usd=cost)
    sleep(interval)

Now you have a time series of cost per cycle. If a normal cycle uses ~200 tokens and one suddenly uses 40,000, you know immediately.
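The loop above leaves its helpers undefined. One way they might look, as a self-contained sketch: `TokenMeter`, `CycleRecorder`, `run_cycle`, and the flat `usd_per_token` rate are all hypothetical names and values chosen for illustration, and the "work" is simulated rather than a real LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class TokenMeter:
    """Running total of tokens consumed by this process (hypothetical helper)."""
    total: int = 0

    def add(self, tokens: int) -> None:
        self.total += tokens

    def get_token_count(self) -> int:
        return self.total


@dataclass
class CycleRecorder:
    """Stand-in for a heartbeat sender: keeps the per-cycle series in memory."""
    cycles: list = field(default_factory=list)

    def heartbeat(self, tokens: int, cost_usd: float) -> None:
        self.cycles.append({"tokens": tokens, "cost_usd": cost_usd})


def run_cycle(meter, recorder, work, usd_per_token: float) -> int:
    """Run one unit of work and report the tokens it consumed."""
    start_tokens = meter.get_token_count()
    work(meter)  # the work function reports its usage via meter.add(...)
    tokens_used = meter.get_token_count() - start_tokens
    recorder.heartbeat(tokens=tokens_used, cost_usd=tokens_used * usd_per_token)
    return tokens_used


meter, recorder = TokenMeter(), CycleRecorder()
# Simulated workloads: two normal cycles, then one stuck in a retry loop.
for tokens in (200, 210, 40_000):
    run_cycle(meter, recorder, lambda m, t=tokens: m.add(t), usd_per_token=1e-6)

print([c["tokens"] for c in recorder.cycles])  # [200, 210, 40000]
```

Measuring the delta of a running total, rather than resetting a counter each cycle, means calls made from helper code deep inside the work function are still attributed to the right cycle.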

What to track

| Metric | Why | Alert threshold |
|--------|-----|-----------------|
| Tokens per cycle | Catch loops | 10x above 24h average |
| Cost per hour | Budget protection | Fixed dollar amount |
| Tool calls per cycle | Catch recursive tool use | 5x above baseline |

Auto-tracking with SDK monkey-patching

If you use the OpenAI or Anthropic SDKs, you can patch the API client to track every call automatically:

# Before your agent code
import clevagent
clevagent.init(api_key="cv_...", agent="my-bot")

# Your existing code, no changes needed
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],
)

# Cost is automatically tracked via SDK monkey-patch

The ClevAgent SDK intercepts the API call, extracts usage.total_tokens from the response, estimates cost based on the model, and includes it in the next heartbeat.
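For readers unfamiliar with the technique, here is a generic sketch of how monkey-patching a client method can capture usage. It is not ClevAgent's actual implementation: `FakeCompletions` is a stand-in for an SDK resource class so the example runs without network access, and `patch_create` and `usage_log` are hypothetical names.

```python
import functools
from types import SimpleNamespace


class FakeCompletions:
    """Stand-in for an SDK resource; a real client would make a network call here."""

    def create(self, model: str, messages: list) -> SimpleNamespace:
        usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
        return SimpleNamespace(model=model, usage=usage)


usage_log = []  # (model, total_tokens) pairs waiting to be attached to a heartbeat


def patch_create(cls) -> None:
    """Replace cls.create with a wrapper that records token usage per call."""
    original = cls.create

    @functools.wraps(original)
    def tracked_create(self, *args, **kwargs):
        response = original(self, *args, **kwargs)
        usage = getattr(response, "usage", None)
        if usage is not None:  # tolerate responses that carry no usage data
            usage_log.append((kwargs.get("model"), usage.total_tokens))
        return response

    cls.create = tracked_create


patch_create(FakeCompletions)

# Calling code is unchanged; the wrapper observes the response on its way back.
response = FakeCompletions().create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],
)
print(usage_log)  # [('gpt-4o-mini', 42)]
```

Patching at the class level means every client instance is covered, including ones created inside third-party libraries. The trade-off is that the wrapper depends on the SDK's internal layout, so it can break when the SDK changes.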

Cost alerting strategies

1. Absolute threshold

Alert if hourly cost exceeds $X. Simple, catches catastrophic loops.
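A minimal sketch of an hourly threshold, assuming each cycle reports a timestamp in seconds and a cost in dollars. `HourlyCostAlert` and the $1.00 limit are hypothetical choices for illustration.

```python
from collections import deque


class HourlyCostAlert:
    """Sliding window over (timestamp, cost) samples, one hour by default."""

    def __init__(self, limit_usd: float, window_seconds: float = 3600.0):
        self.limit_usd = limit_usd
        self.window_seconds = window_seconds
        self.samples = deque()
        self.window_total = 0.0

    def record(self, timestamp: float, cost_usd: float) -> bool:
        """Add one cycle's cost; return True if the window total exceeds the limit."""
        self.samples.append((timestamp, cost_usd))
        self.window_total += cost_usd
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window_seconds:
            _, old_cost = self.samples.popleft()
            self.window_total -= old_cost
        return self.window_total > self.limit_usd


alert = HourlyCostAlert(limit_usd=1.00)
results = [
    alert.record(timestamp=0, cost_usd=0.40),     # $0.40 in window
    alert.record(timestamp=1800, cost_usd=0.40),  # $0.80 in window
    alert.record(timestamp=2400, cost_usd=0.40),  # $1.20 in window: over the limit
    alert.record(timestamp=4000, cost_usd=0.10),  # first sample has expired
]
print(results)  # [False, False, True, False]
```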

2. Relative spike

Alert if current cycle cost is 10x+ above the rolling 24-hour average. Catches loops that start gradually.
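The relative check can be sketched as a rolling baseline. `SpikeDetector` and its `min_samples` warm-up are assumptions for illustration; the 10x factor and 24-hour window come from the text above. The same class works for tokens per cycle or cost per cycle.

```python
from collections import deque


class SpikeDetector:
    """Flags a value that is `factor` times the rolling-window average or more."""

    def __init__(self, factor: float = 10.0, window_seconds: float = 86_400.0,
                 min_samples: int = 5):
        self.factor = factor
        self.window_seconds = window_seconds
        self.min_samples = min_samples  # avoid alerting on a near-empty baseline
        self.samples = deque()
        self.total = 0.0

    def observe(self, timestamp: float, value: float) -> bool:
        # Evict samples older than the window.
        while self.samples and self.samples[0][0] <= timestamp - self.window_seconds:
            _, old = self.samples.popleft()
            self.total -= old

        # Compare against the baseline BEFORE adding the new value,
        # so a spike cannot inflate the average it is judged against.
        is_spike = False
        if len(self.samples) >= self.min_samples:
            baseline = self.total / len(self.samples)
            is_spike = baseline > 0 and value >= self.factor * baseline

        self.samples.append((timestamp, value))
        self.total += value
        return is_spike


detector = SpikeDetector()
normal = [detector.observe(t * 60, v) for t, v in enumerate([200, 190, 210, 205, 195])]
spike = detector.observe(5 * 60, 40_000)
print(normal, spike)  # [False, False, False, False, False] True
```

One design choice to be aware of: this sketch still adds the spiking value to the window afterwards, which raises the baseline for later cycles. Excluding flagged values from the baseline is a reasonable variation.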

3. Budget gate

Hard-stop the agent if daily spend exceeds a configured limit. Last line of defense.
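A hard stop might look like the following sketch. `DailyBudgetGate`, `BudgetExceeded`, and the $5.00 limit are hypothetical; the caller passes the date in explicitly so the reset logic is easy to test.

```python
import datetime as dt


class BudgetExceeded(RuntimeError):
    """Raised when the daily spend limit has been passed."""


class DailyBudgetGate:
    def __init__(self, daily_limit_usd: float):
        self.daily_limit_usd = daily_limit_usd
        self.day = None
        self.spent_usd = 0.0

    def charge(self, cost_usd: float, today: dt.date) -> None:
        """Record spend; raise once the day's total passes the limit."""
        if today != self.day:  # new calendar day: reset the counter
            self.day, self.spent_usd = today, 0.0
        self.spent_usd += cost_usd
        if self.spent_usd > self.daily_limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.daily_limit_usd:.2f} on {today}"
            )


gate = DailyBudgetGate(daily_limit_usd=5.00)
day = dt.date(2026, 3, 30)
stopped_after = None
for cycle, cost in enumerate([1.50, 1.50, 1.50, 1.50, 1.50], start=1):
    try:
        gate.charge(cost, today=day)
    except BudgetExceeded:
        stopped_after = cycle  # a real agent would exit its work loop here
        break
print(stopped_after)  # 4
```

Because the gate checks after recording the charge, spend can overshoot the limit by up to one cycle. If that matters, set the limit slightly below the real budget.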

The real-world numbers

From running three production agents with cost tracking:

  • Normal operation: $0.01-0.05/day per agent (gpt-4o-mini, ~50 tokens/cycle)
  • Loop incident: $50 in 40 minutes (40,000 tokens/min)
  • Detection time with cost tracking: < 60 seconds
  • Detection time without: 6+ hours (discovered via billing alert next morning)

The difference between a $0.50 incident and a $200 incident is whether you detect the cost spike in real time.

Summary

  • Track tokens per work cycle, not just total spend
  • Alert on 10x spikes above baseline
  • Use SDK auto-tracking if available
  • Set a hard daily budget gate as last resort

Cost isn't just a billing concern for AI agents. It's the single best health signal for catching the failure modes that traditional monitoring misses.
