Add heartbeat monitoring and cost tracking to your FastAPI AI agent. Catch silent failures that health checks miss.
Your FastAPI app has a /health endpoint that returns 200 OK. Your uptime monitor shows 100% green. But the /api/run-agent endpoint — the one that actually calls the LLM and does the work — has been silently failing for 6 hours. The LLM returns malformed JSON, your agent's inner loop retries forever, and FastAPI keeps accepting new requests while every single one times out internally.
The health check never notices because the *server* is healthy. The *agent logic* is not.
This is the most common failure mode for FastAPI apps that wrap AI agents. The HTTP layer and the agent layer fail independently, and traditional monitoring only watches the HTTP layer.
Here's a typical FastAPI app that runs an AI agent:
main.py

```python
from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/api/run-agent")
def run_agent(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.choices[0].message.content
    return {"result": result}
```
This has two independent failure surfaces:

1. The HTTP layer — the server crashes, the port stops responding, /health goes dark.
2. The agent layer — the LLM call hangs, returns malformed output, or the agent loop gets stuck while the server keeps serving requests.

Traditional monitoring covers #1. For #2, you need something that watches the agent work itself.
Try it now — monitor your agent in 2 lines:

```bash
pip install clevagent
```

```python
import clevagent

clevagent.init(api_key="cv_xxx", agent="my-agent")
```
Free for 3 agents. No credit card required. Get your API key →
Install the SDK:
```bash
pip install clevagent
```
Then wire it into your FastAPI lifespan and your agent endpoint:
main.py

```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import OpenAI

import clevagent


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Start heartbeat monitoring on startup
    clevagent.init(
        api_key=os.environ["CLEVAGENT_API_KEY"],
        agent="fastapi-agent",
    )
    yield
    # Cleanup on shutdown (optional — process exit stops heartbeats automatically)


app = FastAPI(lifespan=lifespan)
client = OpenAI()


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/api/run-agent")
def run_agent(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.choices[0].message.content
    # Signal that the agent completed real work
    clevagent.ping()
    return {"result": result}
```
Two additions:
- clevagent.init() in the lifespan — starts a background heartbeat thread. Every 60 seconds, it pings ClevAgent to prove the process is alive.
- clevagent.ping() after each task — signals that the agent actually completed work, not just that the process exists.

The difference matters. A heartbeat tells you the process is running. A ping tells you the agent is *doing its job*. If heartbeats continue but pings stop, the agent is stuck — alive but not working.
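For intuition about what clevagent.init() sets up, a background heartbeat loop can be as small as a daemon thread. This is a hypothetical sketch of the idea, not the SDK's actual internals; `send` stands in for whatever call reports to the backend:

```python
import threading
import time


def start_heartbeat(send, interval=60.0):
    """Spawn a daemon thread that calls `send()` every `interval` seconds.

    `send` is any callable; in practice it would POST to the monitoring
    backend. A daemon thread dies with the process, which is why no
    explicit shutdown hook is needed in the FastAPI lifespan.
    """
    def loop():
        while True:
            send()
            time.sleep(interval)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

Because the thread is marked `daemon=True`, process exit stops heartbeats automatically, which is exactly the failure signal the backend watches for.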
LLM costs are unpredictable. A normal request costs $0.02. A request that triggers a long chain-of-thought or a retry loop costs $5.00. Without per-request cost tracking, you won't know until the invoice arrives.
Pass token counts to clevagent.ping():
```python
@app.post("/api/run-agent")
def run_agent(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.choices[0].message.content

    # Track tokens used in this request
    usage = response.usage
    clevagent.ping(
        tokens=usage.total_tokens,
        cost_usd=(usage.prompt_tokens * 0.0025 + usage.completion_tokens * 0.01) / 1000,
        meta={"model": "gpt-4o", "prompt_length": len(prompt)},
    )
    return {"result": result}
```
Now the ClevAgent dashboard shows cost per request over time. You can set alerts: if a single request exceeds $1.00, or if hourly spend crosses $20, you get notified before the bill spirals.
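The same thresholds can also be enforced locally, before the bill arrives. A minimal sketch assuming the $1.00-per-request and $20.00-per-hour limits from above; `CostGuard` is a hypothetical helper, not part of the ClevAgent SDK:

```python
import time
from collections import deque


class CostGuard:
    """Track per-request LLM cost and flag spend-limit violations."""

    def __init__(self, per_request_limit=1.00, hourly_limit=20.00):
        self.per_request_limit = per_request_limit
        self.hourly_limit = hourly_limit
        self.window = deque()  # (timestamp, cost) pairs from the last hour

    def record(self, cost_usd, now=None):
        """Record one request's cost; return a list of triggered alerts."""
        now = time.time() if now is None else now
        self.window.append((now, cost_usd))
        # Drop entries older than one hour
        while self.window and self.window[0][0] < now - 3600:
            self.window.popleft()
        alerts = []
        if cost_usd > self.per_request_limit:
            alerts.append(f"single request cost ${cost_usd:.2f}")
        hourly = sum(c for _, c in self.window)
        if hourly > self.hourly_limit:
            alerts.append(f"hourly spend ${hourly:.2f}")
        return alerts
```

Calling `guard.record(cost_usd)` right after `clevagent.ping()` would let the endpoint itself refuse or log runaway requests.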
Real agents do more than one LLM call. Here's a complete FastAPI app with a multi-step agent, cost tracking, and proper error handling:
main.py

```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from openai import OpenAI

import clevagent


@asynccontextmanager
async def lifespan(app: FastAPI):
    clevagent.init(
        api_key=os.environ["CLEVAGENT_API_KEY"],
        agent="research-agent",
    )
    yield


app = FastAPI(lifespan=lifespan)
client = OpenAI()


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/api/run-agent")
def run_agent(prompt: str):
    total_tokens = 0
    total_cost = 0.0
    try:
        # Step 1: Plan
        plan_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Create a research plan."},
                {"role": "user", "content": prompt},
            ],
        )
        plan = plan_response.choices[0].message.content
        total_tokens += plan_response.usage.total_tokens

        # Step 2: Execute
        exec_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Execute the research plan."},
                {"role": "user", "content": plan},
            ],
        )
        result = exec_response.choices[0].message.content
        total_tokens += exec_response.usage.total_tokens

        # Step 3: Summarize
        summary_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Summarize the findings."},
                {"role": "user", "content": result},
            ],
        )
        summary = summary_response.choices[0].message.content
        total_tokens += summary_response.usage.total_tokens

        # Calculate total cost (gpt-4o pricing, simplified average)
        total_cost = total_tokens * 0.005 / 1000

        # Report success with full cost data
        clevagent.ping(
            tokens=total_tokens,
            cost_usd=total_cost,
            meta={"steps": 3, "model": "gpt-4o"},
        )
        return {"summary": summary, "tokens_used": total_tokens}
    except Exception as e:
        # Report the failure — ClevAgent tracks error frequency
        clevagent.ping(
            error=str(e),
            meta={"step": "unknown", "prompt_length": len(prompt)},
        )
        raise HTTPException(status_code=500, detail="Agent failed")
```
The key pattern: clevagent.ping() is called on both success *and* failure. On success, it carries token counts and cost. On failure, it carries the error message. ClevAgent uses this to build a per-agent error rate — so you can see "research-agent failed 40% of requests in the last hour" on the dashboard.
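Conceptually, that error rate is just the share of failed pings in a window. A sketch of the aggregation (which ClevAgent performs server-side), using dicts shaped like the ping payloads above:

```python
def error_rate(pings):
    """Fraction of pings that carried an error, e.g. 0.4 for 40%.

    `pings` is a list of dicts like those sent by clevagent.ping():
    success pings carry token/cost fields, failure pings carry "error".
    """
    if not pings:
        return 0.0
    failures = sum(1 for p in pings if "error" in p)
    return failures / len(pings)
```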
Here's the scenario that health checks will never catch: a request hits /api/run-agent, and the LLM call hangs — the API is overloaded, the connection is open, but no response comes.

ClevAgent catches this because clevagent.ping() only fires after a *completed* request. If pings stop but heartbeats continue, the dashboard marks the agent as degraded — alive but not completing work. You get an alert:
```text
⚠️ research-agent: heartbeat OK but no task completions for 5 minutes
Last successful ping: 12:34:02 UTC
Heartbeats: normal (60s interval)
Likely cause: agent logic stalled or upstream API unresponsive
```
This is the signal you can't get from /health returning 200.
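That degraded state reduces to comparing two timestamps. A hypothetical sketch of the classification rule, assuming a 2-minute heartbeat timeout and the 5-minute ping timeout from the alert above:

```python
def agent_status(last_heartbeat, last_ping, now,
                 heartbeat_timeout=120.0, ping_timeout=300.0):
    """Classify an agent from its last heartbeat and last task ping.

    All arguments are Unix timestamps in seconds. The 5-minute ping
    timeout matches the "no task completions for 5 minutes" alert.
    """
    if now - last_heartbeat > heartbeat_timeout:
        return "down"        # process stopped sending heartbeats
    if now - last_ping > ping_timeout:
        return "degraded"    # alive, but no completed work recently
    return "healthy"
```

The "degraded" branch is the one /health can never reach: the process is demonstrably alive, yet no work is finishing.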
If your FastAPI app runs in Docker, you can pair the ClevAgent heartbeat with Docker's health check to get automatic restarts on agent-level failures:
docker-compose.yml
services:
fastapi-agent:
build: .
ports:
- "8000:8000"
environment:
- CLEVAGENT_API_KEY=${CLEVAGENT_API_KEY}
- OPENAI_API_KEY=${OPENAI_API_KEY}
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9191/health')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 15s
The ClevAgent SDK exposes a local health endpoint on port 9191 that reflects agent-level health, not just process-level health. If pings have stopped (agent is stuck), this endpoint returns 503 — and Docker restarts the container.
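The behavior of that local endpoint comes down to one timestamp comparison: return 200 while the last ping is fresh, 503 once it goes stale. A hypothetical sketch of the logic, not the SDK's actual implementation:

```python
import time


class AgentHealth:
    """Return 200 while task pings are fresh, 503 once they stall."""

    def __init__(self, ping_timeout=300.0):
        self.ping_timeout = ping_timeout
        self.last_ping = time.time()

    def ping(self, now=None):
        """Called whenever the agent completes real work."""
        self.last_ping = time.time() if now is None else now

    def status_code(self, now=None):
        """HTTP status a local /health endpoint would serve."""
        now = time.time() if now is None else now
        stalled = now - self.last_ping > self.ping_timeout
        return 503 if stalled else 200
```

Serving this from a tiny HTTP listener is what lets Docker's health check see agent-level failures and trigger a restart.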
Two layers of defense:

- ClevAgent's dashboard alerts you when pings stall, so you see the degradation remotely.
- Docker's health check polls the local 9191 endpoint and restarts the container automatically when it returns 503.
FastAPI makes it easy to build AI agent endpoints. But the HTTP layer and the agent layer are different things. A 200 OK on /health doesn't mean your agent is working — it means your server is running.
Add clevagent.init() to your startup. Add clevagent.ping() to your endpoint. Now you know when the agent is actually doing its job, what it costs, and when it gets stuck.
Free for 3 agents — start monitoring →
3 agents free · No credit card · Setup in 30 seconds