ClevAgent
2026-04-01 · fastapi · monitoring · tutorial · python

How to Add Runtime Monitoring to Your FastAPI AI Agent

Add heartbeat monitoring and cost tracking to your FastAPI AI agent. Catch silent failures that health checks miss.

Your FastAPI app has a /health endpoint that returns 200 OK. Your uptime monitor shows 100% green. But the /api/run-agent endpoint — the one that actually calls the LLM and does the work — has been silently failing for 6 hours. The LLM returns malformed JSON, your agent's inner loop retries forever, and FastAPI keeps accepting new requests while every single one times out internally.

The health check never notices because the *server* is healthy. The *agent logic* is not.

This is the most common failure mode for FastAPI apps that wrap AI agents. The HTTP layer and the agent layer fail independently, and traditional monitoring only watches the HTTP layer.

The problem: two things that fail separately

Here's a typical FastAPI app that runs an AI agent:

main.py

from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/api/run-agent")
def run_agent(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    result = response.choices[0].message.content
    return {"result": result}

This has two independent failure surfaces:

  • FastAPI process crashes — Uvicorn dies, port stops responding. Any uptime monitor catches this.
  • Agent logic fails — The LLM call hangs, returns garbage, or enters a retry loop. FastAPI is still alive. Health checks pass. Nobody knows.

Traditional monitoring covers #1. For #2, you need something that watches the agent work itself.


    Try it now — monitor your agent in 2 lines:

    pip install clevagent
    

    import clevagent
    clevagent.init(api_key="cv_xxx", agent="my-agent")
    

    Free for 3 agents. No credit card required. Get your API key →


    Adding ClevAgent to a FastAPI app

    Install the SDK:

    pip install clevagent
    

    Then wire it into your FastAPI lifespan and your agent endpoint:

    main.py

    import os
    from contextlib import asynccontextmanager

    from fastapi import FastAPI
    from openai import OpenAI
    import clevagent

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Start heartbeat monitoring on startup
        clevagent.init(
            api_key=os.environ["CLEVAGENT_API_KEY"],
            agent="fastapi-agent",
        )
        yield
        # Cleanup on shutdown (optional — process exit stops heartbeats automatically)

    app = FastAPI(lifespan=lifespan)
    client = OpenAI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        result = response.choices[0].message.content

        # Signal that the agent completed real work
        clevagent.ping()

        return {"result": result}

    Two additions:

  • clevagent.init() in the lifespan — starts a background heartbeat thread. Every 60 seconds, it pings ClevAgent to prove the process is alive.
  • clevagent.ping() after each task — signals that the agent actually completed work, not just that the process exists.

    The difference matters. A heartbeat tells you the process is running. A ping tells you the agent is *doing its job*. If heartbeats continue but pings stop, the agent is stuck — alive but not working.
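
    As an illustration, the heartbeat-vs-ping distinction boils down to comparing two timestamps against two thresholds. This is a minimal sketch of that state logic — the threshold values and state names here are hypothetical, not ClevAgent's actual dashboard internals:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds for illustration; the real dashboard's values may differ.
HEARTBEAT_TIMEOUT = timedelta(seconds=120)  # two missed 60s heartbeats
PING_TIMEOUT = timedelta(minutes=5)         # no completed work for 5 minutes

def classify_agent(last_heartbeat: datetime, last_ping: datetime, now: datetime) -> str:
    """Combine the two signals into a single agent state."""
    if now - last_heartbeat > HEARTBEAT_TIMEOUT:
        return "down"       # the process itself is gone
    if now - last_ping > PING_TIMEOUT:
        return "degraded"   # alive but not completing work
    return "healthy"

now = datetime(2026, 4, 1, 12, 40)
print(classify_agent(now - timedelta(seconds=30), now - timedelta(minutes=1), now))   # healthy
print(classify_agent(now - timedelta(seconds=30), now - timedelta(minutes=10), now))  # degraded
print(classify_agent(now - timedelta(minutes=10), now - timedelta(minutes=12), now))  # down
```

    The middle case is the one health checks can't see: the heartbeat is fresh, so the process is alive, but no work has completed in the window.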

    Tracking costs per request

    LLM costs are unpredictable. A normal request costs $0.02. A request that triggers a long chain-of-thought or a retry loop costs $5.00. Without per-request cost tracking, you won't know until the invoice arrives.

    Pass token counts to clevagent.ping():

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        result = response.choices[0].message.content

        # Track tokens used in this request
        usage = response.usage
        clevagent.ping(
            tokens=usage.total_tokens,
            cost_usd=(usage.prompt_tokens * 0.0025 + usage.completion_tokens * 0.01) / 1000,
            meta={"model": "gpt-4o", "prompt_length": len(prompt)},
        )

        return {"result": result}

    Now the ClevAgent dashboard shows cost per request over time. You can set alerts: if a single request exceeds $1.00, or if hourly spend crosses $20, you get notified before the bill spirals.
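
    The inline per-token arithmetic is easy to get wrong. A small helper keeps the rates in one place — the numbers here are the same ones the snippet above assumes ($2.50 per 1M input tokens, $10.00 per 1M output tokens for gpt-4o); check current pricing before relying on them:

```python
# gpt-4o rates assumed by the snippet above; verify against current pricing.
INPUT_PER_1K = 0.0025   # $2.50 per 1M input tokens
OUTPUT_PER_1K = 0.01    # $10.00 per 1M output tokens

def gpt4o_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single request at the assumed rates."""
    return prompt_tokens * INPUT_PER_1K / 1000 + completion_tokens * OUTPUT_PER_1K / 1000

# A 1,200-token prompt with a 400-token completion:
print(round(gpt4o_cost_usd(1200, 400), 4))  # 0.007
```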

    Complete example: multi-step agent with error handling

    Real agents do more than one LLM call. Here's a complete FastAPI app with a multi-step agent, cost tracking, and proper error handling:

    main.py

    import os
    from contextlib import asynccontextmanager

    from fastapi import FastAPI, HTTPException
    from openai import OpenAI
    import clevagent

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        clevagent.init(
            api_key=os.environ["CLEVAGENT_API_KEY"],
            agent="research-agent",
        )
        yield

    app = FastAPI(lifespan=lifespan)
    client = OpenAI()

    @app.get("/health")
    def health():
        return {"status": "ok"}

    @app.post("/api/run-agent")
    def run_agent(prompt: str):
        total_tokens = 0
        total_cost = 0.0

        try:
            # Step 1: Plan
            plan_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Create a research plan."},
                    {"role": "user", "content": prompt},
                ],
            )
            plan = plan_response.choices[0].message.content
            total_tokens += plan_response.usage.total_tokens

            # Step 2: Execute
            exec_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Execute the research plan."},
                    {"role": "user", "content": plan},
                ],
            )
            result = exec_response.choices[0].message.content
            total_tokens += exec_response.usage.total_tokens

            # Step 3: Summarize
            summary_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "Summarize the findings."},
                    {"role": "user", "content": result},
                ],
            )
            summary = summary_response.choices[0].message.content
            total_tokens += summary_response.usage.total_tokens

            # Calculate total cost (gpt-4o pricing)
            total_cost = total_tokens * 0.005 / 1000  # simplified average

            # Report success with full cost data
            clevagent.ping(
                tokens=total_tokens,
                cost_usd=total_cost,
                meta={"steps": 3, "model": "gpt-4o"},
            )

            return {"summary": summary, "tokens_used": total_tokens}

        except Exception as e:
            # Report the failure — ClevAgent tracks error frequency
            clevagent.ping(
                error=str(e),
                meta={"step": "unknown", "prompt_length": len(prompt)},
            )
            raise HTTPException(status_code=500, detail="Agent failed")

    The key pattern: clevagent.ping() is called on both success *and* failure. On success, it carries token counts and cost. On failure, it carries the error message. ClevAgent uses this to build a per-agent error rate — so you can see "research-agent failed 40% of requests in the last hour" on the dashboard.
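
    That per-agent error rate is just the share of error pings in a sliding window. A rough sketch of the computation — illustrative only, since ClevAgent's actual aggregation isn't specified here:

```python
from datetime import datetime, timedelta

def error_rate(pings, now, window=timedelta(hours=1)):
    """pings: list of (timestamp, had_error) pairs, one per clevagent.ping() call."""
    recent = [had_error for ts, had_error in pings if now - ts <= window]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)

# Simulated hour: 60 pings, 2 out of every 5 carried an error
now = datetime(2026, 4, 1, 13, 0)
pings = [(now - timedelta(minutes=i), i % 5 < 2) for i in range(60)]
print(error_rate(pings, now))  # 0.4
```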

    Catching the hung endpoint

    Here's the scenario that health checks will never catch:

  • Your FastAPI app starts normally. Health check returns 200.
  • A request hits /api/run-agent. The LLM call hangs — the API is overloaded, the connection is open but no response comes.
  • FastAPI's thread pool is occupied. New requests queue up.
  • Health check endpoint still returns 200 because it runs on a separate thread.
  • From the outside, the server looks fine. From the inside, every agent request is stuck.

    ClevAgent catches this because clevagent.ping() only fires after a *completed* request. If pings stop but heartbeats continue, the dashboard marks the agent as degraded — alive but not working. You get an alert:

    ⚠️ research-agent: heartbeat OK but no task completions for 5 minutes
       Last successful ping: 12:34:02 UTC
       Heartbeats: normal (60s interval)
       Likely cause: agent logic stalled or upstream API unresponsive
    

    This is the signal you can't get from /health returning 200.
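
    Detection is only half the fix: you can also bound each agent call client-side so a hung upstream raises quickly instead of pinning a worker indefinitely. This is a generic stdlib sketch, not part of ClevAgent; the names are illustrative, and note the stuck worker thread itself can't be killed, the caller just stops waiting for it:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a blocking call, but give up waiting after timeout_s seconds."""
    future = _executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Surface the stall immediately instead of silently hanging;
        # the endpoint can now return a 504 and skip clevagent.ping()
        raise RuntimeError("call timed out")

print(call_with_timeout(lambda: "ok", 1.0))  # ok
```

    Many HTTP clients also accept a request timeout directly (the OpenAI Python SDK, for instance, takes a timeout option on the client), which is the simpler route when available.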

    Auto-restart for Docker deployments

    If your FastAPI app runs in Docker, you can pair the ClevAgent heartbeat with Docker's health check to get automatic restarts on agent-level failures:

    docker-compose.yml

    services:
      fastapi-agent:
        build: .
        ports:
          - "8000:8000"
        environment:
          - CLEVAGENT_API_KEY=${CLEVAGENT_API_KEY}
          - OPENAI_API_KEY=${OPENAI_API_KEY}
        restart: unless-stopped
        healthcheck:
          test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9191/health')"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 15s

    The ClevAgent SDK exposes a local health endpoint on port 9191 that reflects agent-level health, not just process-level health. If pings have stopped (agent is stuck), this endpoint returns 503 — and Docker restarts the container.

    Two layers of defense:

  • Docker health check — restarts the container locally, no network dependency
  • ClevAgent dashboard — alerts you remotely, tracks the incident, shows the pattern over time

    The bottom line

    FastAPI makes it easy to build AI agent endpoints. But the HTTP layer and the agent layer are different things. A 200 OK on /health doesn't mean your agent is working — it means your server is running.

    Add clevagent.init() to your startup. Add clevagent.ping() to your endpoint. Now you know when the agent is actually doing its job, what it costs, and when it gets stuck.


    Free for 3 agents — start monitoring →

    Related reading:

  • How to Monitor AI Agents in Docker Compose
  • How to Monitor AI Agents in Production
  • Get your API key in 30s →

    3 agents free · No credit card · Setup in 30 seconds