Your AI agent looks healthy but your API bill is exploding. Here's how to track token costs per work cycle and catch runaway loops before they drain your budget.
You wake up to a $200 API bill. Your agent ran all night. It looked healthy — heartbeat green, no errors, process running. But token usage went from 200/min to 40,000/min because it was stuck re-parsing a malformed response in a loop.
This is the most expensive failure mode in AI agent operations, and traditional monitoring won't catch it.
Traditional services have relatively predictable costs. A web server handles N requests per second, each costing roughly the same in compute.
AI agents are different. A single LLM call can cost anywhere from $0.001 to $2.00 depending on the model, context size, and output length. A logic loop that retries the same failing operation can burn through hundreds of dollars in minutes.
The key insight: for LLM-backed agents, cost is a health metric, not just a billing metric.
Instead of tracking total spend, track cost per work cycle:
while True:
start_tokens = get_token_count() result = do_llm_work()
end_tokens = get_token_count()
tokens_used = end_tokens - start_tokens
cost = calculate_cost(tokens_used)
heartbeat(tokens=tokens_used, cost_usd=cost)
sleep(interval)
Now you have a time series of cost-per-cycle. Normal is ~200 tokens. If it jumps to 40,000, you know immediately.
If you use OpenAI or Anthropic SDKs, you can patch the API client to automatically track every call:
Before your agent code
import clevagent
clevagent.init(api_key="cv_...", agent="my-bot")Your existing code — no changes needed
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "..."}],
)
Cost is automatically tracked via SDK monkey-patch
The SDK intercepts the API call, extracts usage.total_tokens from the response, estimates cost based on the model, and includes it in the next heartbeat.
From running three production agents with cost tracking:
The difference between a $0.50 incident and a $200 incident is whether you detect the cost spike in real time.
Cost isn't just a billing concern for AI agents — it's the single best health signal for catching the failure modes that traditional monitoring misses.
3 agents free · No credit card · Setup in 30 seconds