ClevAgent
2026-04-01 · fastapi · monitoring · tutorial · python

How to Add Runtime Monitoring to Your FastAPI AI Agent

Add heartbeat monitoring and cost tracking to your FastAPI AI agent. Catch silent failures that health checks miss.

Your FastAPI app has a /health endpoint that returns 200 OK. Your uptime monitor shows 100% green. But the /api/run-agent endpoint — the one that actually calls the LLM and does the work — has been silently failing for 6 hours. The LLM returns malformed JSON, your agent's inner loop retries forever, and FastAPI keeps accepting new requests while every single one times out internally.

The health check never notices because the *server* is healthy. The *agent logic* is not.

This is the most common failure mode for FastAPI apps that wrap AI agents. The HTTP layer and the agent layer fail independently, and traditional monitoring only watches the HTTP layer.

The problem: two things that fail separately

Here's a typical FastAPI app that runs an AI agent:

main.py

from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/api/run-agent")
def run_agent(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.choices[0].message.content
    return {"result": result}

This has two independent failure surfaces:

  • FastAPI process crashes — Uvicorn dies, port stops responding. Any uptime monitor catches this.
  • Agent logic fails — The LLM call hangs, returns garbage, or enters a retry loop. FastAPI is still alive. Health checks pass. Nobody knows.

Traditional monitoring covers #1. For #2, you need something that watches the agent work itself.


    Try it now — monitor your agent in 2 lines:

    pip install clevagent
    

    import clevagent
    clevagent.init(api_key="cv_xxx", agent="my-agent")
    

    Free for 3 agents. No credit card required. Get your API key →


    Adding ClevAgent to a FastAPI app

    Install the SDK:

    pip install clevagent
    

    Then wire it into your FastAPI lifespan and your agent endpoint:

    main.py

    import os
    from contextlib import asynccontextmanager

    from fastapi import FastAPI
    from openai import OpenAI
    import clevagent

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Start heartbeat monitoring on startup
        clevagent.init(
            api_key=os.environ["CLEVAGENT_API_KEY"],
            agent="fastapi-agent",
        )
        yield
        # Cleanup on shutdown (optional — process exit stops heartbeats automatically)

    app = FastAPI(lifespan=lifespan)
    client = OpenAI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        result = response.choices[0].message.content

        # Signal that the agent completed real work
        clevagent.ping()

        return {"result": result}

    Two additions:

  • clevagent.init() in the lifespan — starts a background heartbeat thread. Every 60 seconds, it pings ClevAgent to prove the process is alive.
  • clevagent.ping() after each task — signals that the agent actually completed work, not just that the process exists.

    The difference matters. A heartbeat tells you the process is running. A ping tells you the agent is *doing its job*. If heartbeats continue but pings stop, the agent is stuck — alive but not working.
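
    As an illustration, the heartbeat-vs-ping distinction boils down to comparing two timestamps against two thresholds. This is a minimal sketch of that state logic — the threshold values and state names here are hypothetical, not ClevAgent's actual dashboard internals:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds for illustration; the real dashboard's values may differ.
HEARTBEAT_TIMEOUT = timedelta(seconds=120)  # two missed 60s heartbeats
PING_TIMEOUT = timedelta(minutes=5)         # no completed work for 5 minutes

def classify_agent(last_heartbeat: datetime, last_ping: datetime, now: datetime) -> str:
    """Combine the two signals into a single agent state."""
    if now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "down"       # the process itself is gone
    if now - last_ping > PING_TIMEOUT:
        return "degraded"   # alive but not completing work
    return "healthy"

now = datetime(2026, 4, 1, 12, 40)
print(classify_agent(now - timedelta(seconds=30), now - timedelta(minutes=1), now))   # healthy
print(classify_agent(now - timedelta(seconds=30), now - timedelta(minutes=10), now))  # degraded
print(classify_agent(now - timedelta(minutes=10), now - timedelta(minutes=12), now))  # down
```

    The middle case is the one health checks can't see: the heartbeat is fresh, so the process is alive, but no work has completed in the window.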

    Tracking costs per request

    LLM costs are unpredictable. A normal request costs $0.02. A request that triggers a long chain-of-thought or a retry loop costs $5.00. Without per-request cost tracking, you won't know until the invoice arrives.

    Pass token counts to clevagent.ping():

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        result = response.choices[0].message.content

        # Track tokens used in this request
        usage = response.usage
        clevagent.ping(
            tokens=usage.total_tokens,
            cost_usd=(usage.prompt_tokens * 0.0025 + usage.completion_tokens * 0.01) / 1000,
            meta={"model": "gpt-4o", "prompt_length": len(prompt)},
        )

        return {"result": result}

    Now the ClevAgent dashboard shows cost per request over time. You can set alerts: if a single request exceeds $1.00, or if hourly spend crosses $20, you get notified before the bill spirals.
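
    The inline per-token arithmetic is easy to get wrong. A small helper keeps the rates in one place — the numbers here are the same ones the snippet above assumes ($2.50 per 1M input tokens, $10.00 per 1M output tokens for gpt-4o); check current pricing before relying on them:

```python
# gpt-4o rates assumed by the snippet above; verify against current pricing.
INPUT_PER_1K = 0.0025   # $2.50 per 1M input tokens
OUTPUT_PER_1K = 0.01    # $10.00 per 1M output tokens

def gpt4o_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single request at the assumed rates."""
    return prompt_tokens * INPUT_PER_1K / 1000 + completion_tokens * OUTPUT_PER_1K / 1000

# A 1,200-token prompt with a 400-token completion:
print(round(gpt4o_cost_usd(1200, 400), 4))  # 0.007
```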

    Complete example: multi-step agent with error handling

    Real agents do more than one LLM call. Here's a complete FastAPI app with a multi-step agent, cost tracking, and proper error handling:

    main.py

    import os
    from contextlib import asynccontextmanager

    from fastapi import FastAPI, HTTPException
    from openai import OpenAI
    import clevagent

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        clevagent.init(
            api_key=os.environ["CLEVAGENT_API_KEY"],
            agent="research-agent",
        )
        yield

    app = FastAPI(lifespan=lifespan)
    client = OpenAI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        total_tokens = 0
        total_cost = 0.0

        try:
            # Step 1: Plan
            plan_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Create a research plan."},
                    {"role": "user", "content": prompt},
                ],
            )
            plan = plan_response.choices[0].message.content
            total_tokens += plan_response.usage.total_tokens

            # Step 2: Execute
            exec_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Execute the research plan."},
                    {"role": "user", "content": plan},
                ],
            )
            result = exec_response.choices[0].message.content
            total_tokens += exec_response.usage.total_tokens

            # Step 3: Summarize
            summary_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Summarize the findings."},
                    {"role": "user", "content": result},
                ],
            )
            summary = summary_response.choices[0].message.content
            total_tokens += summary_response.usage.total_tokens

            # Calculate total cost (gpt-4o pricing)
            total_cost = total_tokens * 0.005 / 1000  # simplified average

            # Report success with full cost data
            clevagent.ping(
                tokens=total_tokens,
                cost_usd=total_cost,
                meta={"steps": 3, "model": "gpt-4o"},
            )

            return {"summary": summary, "tokens_used": total_tokens}

        except Exception as e:
            # Report the failure — ClevAgent tracks error frequency
            clevagent.ping(
                error=str(e),
                meta={"step": "unknown", "prompt_length": len(prompt)},
            )
            raise HTTPException(status_code=500, detail="Agent failed")

    The key pattern: clevagent.ping() is called on both success *and* failure. On success, it carries token counts and cost. On failure, it carries the error message. ClevAgent uses this to build a per-agent error rate — so you can see "research-agent failed 40% of requests in the last hour" on the dashboard.
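
    That per-agent error rate is just the share of error pings in a sliding window. A rough sketch of the computation — illustrative only, since ClevAgent's actual aggregation isn't specified here:

```python
from datetime import datetime, timedelta

def error_rate(pings, now, window=timedelta(hours=1)):
    """pings: list of (timestamp, had_error) pairs, one per clevagent.ping() call."""
    recent = [had_error for ts, had_error in pings if now - ts <= window]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)

# Simulated hour: 60 pings, 2 out of every 5 carried an error
now = datetime(2026, 4, 1, 13, 0)
pings = [(now - timedelta(minutes=i), i % 5 < 2) for i in range(60)]
print(error_rate(pings, now))  # 0.4
```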

    Catching the hung endpoint

    Here's the scenario that health checks will never catch:

  • Your FastAPI app starts normally. Health check returns 200.
  • A request hits /api/run-agent. The LLM call hangs — the API is overloaded, the connection is open but no response comes.
  • FastAPI's thread pool is occupied. New requests queue up.
  • Health check endpoint still returns 200 because it runs on a separate thread.
  • From the outside, the server looks fine. From the inside, every agent request is stuck.

    ClevAgent catches this because clevagent.ping() only fires after a *completed* request. If pings stop but heartbeats continue, the dashboard marks the agent as degraded — alive but not working. You get an alert:

    ⚠️ research-agent: heartbeat OK but no task completions for 5 minutes
       Last successful ping: 12:34:02 UTC
       Heartbeats: normal (60s interval)
       Likely cause: agent logic stalled or upstream API unresponsive
    

    This is the signal you can't get from /health returning 200.
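
    Detection is only half the fix: you can also bound each agent call client-side so a hung upstream raises quickly instead of pinning a worker indefinitely. This is a generic stdlib sketch, not part of ClevAgent; the names are illustrative, and note the stuck worker thread itself can't be killed, the caller just stops waiting for it:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a blocking call, but give up waiting after timeout_s seconds."""
    future = _executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Surface the stall immediately instead of silently hanging;
        # the endpoint can now return a 504 and skip clevagent.ping()
        raise RuntimeError("call timed out")

print(call_with_timeout(lambda: "ok", 1.0))  # ok
```

    Many HTTP clients also accept a request timeout directly (the OpenAI Python SDK, for instance, takes a timeout option on the client), which is the simpler route when available.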

    Auto-restart for Docker deployments

    If your FastAPI app runs in Docker, you can pair the ClevAgent heartbeat with Docker's health check to get automatic restarts on agent-level failures:

    docker-compose.yml

    services:
      fastapi-agent:
        build: .
        ports:
          - "8000:8000"
        environment:
          - CLEVAGENT_API_KEY=${CLEVAGENT_API_KEY}
          - OPENAI_API_KEY=${OPENAI_API_KEY}
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9191/health')"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 15s

    The ClevAgent SDK exposes a local health endpoint on port 9191 that reflects agent-level health, not just process-level health. If pings have stopped (agent is stuck), this endpoint returns 503 — and Docker restarts the container.

    Two layers of defense:

  • Docker health check — restarts the container locally, no network dependency
  • ClevAgent dashboard — alerts you remotely, tracks the incident, shows the pattern over time

    The bottom line

    FastAPI makes it easy to build AI agent endpoints. But the HTTP layer and the agent layer are different things. A 200 OK on /health doesn't mean your agent is working — it means your server is running.

    Add clevagent.init() to your startup. Add clevagent.ping() to your endpoint. Now you know when the agent is actually doing its job, what it costs, and when it gets stuck.


    Free for 3 agents — start monitoring →

    Related reading:

  • How to Monitor AI Agents in Docker Compose
  • How to Monitor AI Agents in Production
  • Get your API key in 30s →

    3 agents free · No credit card · Setup in 30 seconds