A developer posted on Reddit last month: "My agent got stuck in a loop overnight. Woke up to a $500 OpenAI bill." The comments were full of similar stories — $200 here, $800 there. All from agents that looked fine from the outside.

This is the hidden cost of production AI agents. Your CPU metrics are normal. Your memory is fine. But your agent is calling gpt-4 in an infinite retry loop, and every call costs money.

How agents get stuck in loops

AI agents loop for reasons that traditional monitoring can't catch:

Retry storms: The agent gets a 429 (rate limit) or 500 error from an API. It retries. Gets the same error. Retries again. Each retry costs tokens.

Reasoning loops: The LLM decides it needs more information, calls a tool, gets a result, decides it needs more information, calls the same tool... The logic is "correct" — the agent is reasoning. It just never converges.

Output parsing failures: The agent expects JSON but gets markdown. It asks the LLM to fix the output. The LLM produces the same format. The agent asks again. Loop.

What loop detection looks like

ClevAgent watches three signals simultaneously:

SignalWhat it detectsThreshold |--------|----------------|-----------| Tool call rateSame tool called too frequentlyConfigurable (default: 10/min) Message repetitionSame message appearing in consecutive heartbeats5+ repeats Token spikeCurrent heartbeat uses 3x+ average tokens3x moving average

When any signal fires, ClevAgent:

Sends an alert to your Telegram, Slack, or Discord

Marks the agent as "warning" on the dashboard

Optionally auto-pauses or auto-stops the agent (Starter+ plans)

Setting up cost protection

With the Python SDK

import clevagentclevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="my-expensive-agent",
    on_loop="alert_only",  # default — alerts but doesn't stop
)

For stricter protection:

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="my-expensive-agent",
    on_loop="stop",  # kills the process if loop is detected
)

With the Runner (zero code)

export CLEVAGENT_API_KEY=cv_your_key
clevagent-runner start --watch docker:my-agent

The Runner monitors heartbeat patterns and detects loops even without SDK integration.

Real-world impact

In our own dogfooding (monitoring 2 production agents 24/7):

•5 crash events detected in the first week

•Auto-restart recovered all 5 within 60 seconds

•Zero missed alerts — every crash triggered a Telegram notification

•Total downtime: under 5 minutes across all incidents

The cost of not monitoring? One undetected loop can cost more than a year of ClevAgent's Starter plan ($19/mo).

Start protecting your budget

ClevAgent is free for up to 3 agents. Loop detection is included in all plans.

Start monitoring free →

AI Agent Cost Tracking: Stop Overpaying for Runaway Loops