ClevAgent
← All posts
2026-04-01crewaimonitoringtutorialai-agents

How to Monitor CrewAI Agents in Production

Add heartbeat monitoring, loop detection, and cost tracking to your CrewAI crews. 3 lines of code, zero config.

You built a CrewAI crew. A researcher agent finds data, an analyst agent writes the report, and crew.kickoff() ties them together. It works perfectly on your laptop.

Then you deploy it. It runs on a schedule — every morning at 6 AM. On day three, the researcher agent gets stuck in a retry loop against a rate-limited API. The analyst agent never receives input, so it sits idle. Your cron job exits with code 0 because CrewAI doesn't treat agent-level hangs as fatal errors. Nobody notices until Friday when someone asks why the reports stopped.

This is the core problem with multi-agent frameworks in production: a crew can fail without crashing.

Why CrewAI crews need dedicated monitoring

CrewAI orchestrates multiple agents that call LLMs, use tools, and pass context to each other. Each agent is a potential failure point:

  • Agent hangs: One agent waits indefinitely for an LLM response that never comes. The crew stalls, but the process stays alive.
  • Infinite loops: An agent retries a failed tool call endlessly. Your token meter spins, but no work output appears.
  • Silent quality degradation: The LLM returns garbage, the next agent processes it anyway, and the final output is subtly wrong. No error thrown.
  • Cost spikes: A single crew run normally costs $0.15. One bad run costs $12 because an agent kept rephrasing the same request.
  • Traditional process monitoring (systemd, Docker health checks) only tells you the process is alive. It tells you nothing about whether the *crew* is making progress.


    Try it now — monitor your agent in 2 lines:

    pip install clevagent
    

    import clevagent
    clevagent.init(api_key="cv_xxx", agent="my-agent")
    

    Free for 3 agents. No credit card required. Get your API key →


    Add ClevAgent to your CrewAI crew in 3 lines

    ClevAgent monitors your crew at the agent level — heartbeats, loop detection, and per-run cost tracking. Setup takes about 30 seconds.

    Step 1: Install

    pip install clevagent
    

    Step 2: Initialize before kickoff

    import os
    import clevagent

    clevagent.init( api_key=os.environ["CLEVAGENT_API_KEY"], agent="my-research-crew", )

    That's it. ClevAgent starts sending heartbeats automatically. If your crew hangs or the process dies, you get alerted within 120 seconds.

    Step 3 (optional): Add a step callback for per-agent tracking

    CrewAI supports a step_callback on each agent. Wire it to ClevAgent to get visibility into each agent's work:

    def track_step(step_output):
        clevagent.ping(
            status="step_complete",
            meta={
                "agent": step_output.agent,
                "output_length": len(str(step_output.output)),
            },
        )
    

    Pass this callback when defining your agents:

    researcher = Agent(
        role="Research Analyst",
        goal="Find the latest market data",
        backstory="You are a senior research analyst...",
        llm=llm,
        step_callback=track_step,
    )
    

    Now every agent step shows up on your dashboard with timing and metadata.

    Complete example: 2-agent crew with monitoring

    Here's a full working example — a research crew with two agents, monitored by ClevAgent:

    import os
    from crewai import Agent, Task, Crew, Process
    import clevagent

    Initialize monitoring

    clevagent.init( api_key=os.environ["CLEVAGENT_API_KEY"], agent="daily-research-crew", )

    def track_step(step_output): clevagent.ping( status="step_complete", meta={ "agent": step_output.agent, "output_length": len(str(step_output.output)), }, )

    Define agents

    researcher = Agent( role="Research Analyst", goal="Find the 3 most important tech news stories today", backstory="You are a senior research analyst who reads dozens of sources daily.", verbose=True, step_callback=track_step, )

    writer = Agent( role="Report Writer", goal="Write a concise morning briefing from the research", backstory="You are a technical writer who distills complex topics into clear summaries.", verbose=True, step_callback=track_step, )

    Define tasks

    research_task = Task( description="Search for today's top 3 tech news stories. Include source URLs.", expected_output="A list of 3 news items with title, summary, and source URL.", agent=researcher, )

    writing_task = Task( description="Write a 200-word morning briefing based on the research.", expected_output="A formatted briefing email ready to send.", agent=writer, )

    Assemble and run

    crew = Crew( agents=[researcher, writer], tasks=[research_task, writing_task], process=Process.sequential, verbose=True, )

    result = crew.kickoff()

    Report completion with output metadata

    clevagent.ping( status="crew_complete", meta={ "output_length": len(str(result)), "agents_used": 2, }, )

    print(result)

    The entire monitoring integration is 8 lines — the init(), the track_step callback, and the final ping(). Your existing CrewAI code stays exactly the same.

    What ClevAgent catches

    Once connected, ClevAgent watches for three categories of problems:

    Crew hangs

    If no heartbeat arrives for 120 seconds, ClevAgent sends an alert to Telegram or Slack. This catches the most common CrewAI failure: an agent waiting on an LLM call that never returns. Your cron job sees a running process. ClevAgent sees a silent agent.

    Agent loops

    ClevAgent tracks the frequency and pattern of ping() calls. If an agent sends 50 step completions in 30 seconds with identical metadata, that's a loop. You get a warning before the token bill becomes a problem.

    Token cost spikes

    Every ping() with metadata feeds into per-run cost estimates. ClevAgent compares the current run against your historical average. A run that's 5x the normal cost triggers a warning. You can set a hard budget ceiling per agent in the dashboard — if exceeded, ClevAgent sends an immediate alert.

    Use clevagent.ping() for work-progress tracking

    Beyond failure detection, ping() is useful for tracking that your crew is actually doing its job. After each crew run, send a ping with business-level metadata:

    result = crew.kickoff()

    clevagent.ping( status="crew_complete", meta={ "report_date": today, "stories_found": len(stories), "word_count": len(result.split()), }, )

    On the ClevAgent dashboard, this creates a timeline of crew runs. You can see at a glance:

  • Did today's 6 AM run actually complete?
  • How many stories did it find compared to yesterday?
  • Is the output length consistent, or did something degrade?
  • This is the difference between "the process ran" and "the crew did useful work." Process monitoring gives you the first. Ping metadata gives you the second.

    What you'd see on the dashboard

    The ClevAgent dashboard shows a real-time view of your crew's health. For the example above, you'd see:

  • Agent card: "daily-research-crew" with a green heartbeat indicator, last seen 45 seconds ago
  • Timeline: A row of green dots for each successful crew run, with gaps highlighted in red when a run was missed
  • Step log: Individual entries for each agent step — "Research Analyst: step_complete" and "Report Writer: step_complete" with timestamps
  • Cost chart: A line graph showing token cost per run over the past 7 days, with a spike on Tuesday clearly visible
  • Alert history: "2026-03-28 06:14 — Heartbeat missed for 180s, alert sent to Telegram"
  • Everything is per-agent, so if you're running 3 different crews, each gets its own card and history.

    Getting started

  • Sign up at clevagent.io
  • Copy your API key from the dashboard
  • pip install clevagent
  • Add the 3 lines from Step 2 above
  • Run your crew — monitoring starts immediately
  • No config files, no YAML, no separate infrastructure. The SDK is 40KB and has zero dependencies beyond requests.

    Related reading

  • How to Monitor AI Agents in Production — The complete guide to heartbeat-based monitoring for any AI agent, not just CrewAI.
  • Three AI Agent Failure Modes That Traditional Monitoring Will Never Catch — Deep dive into silent exits, zombie agents, and runaway loops with real production examples.

  • Free for 3 agents — start monitoring →

    Get your API key in 30s →

    3 agents free · No credit card · Setup in 30 seconds