You built a CrewAI crew. A researcher agent finds data, an analyst agent writes the report, and crew.kickoff() ties them together. It works perfectly on your laptop.

Then you deploy it. It runs on a schedule — every morning at 6 AM. On day three, the researcher agent gets stuck in a retry loop against a rate-limited API. The analyst agent never receives input, so it sits idle. Your cron job exits with code 0 because CrewAI doesn't treat agent-level hangs as fatal errors. Nobody notices until Friday when someone asks why the reports stopped.

This is the core problem with multi-agent frameworks in production: a crew can fail without crashing.

Why CrewAI crews need dedicated monitoring

CrewAI orchestrates multiple agents that call LLMs, use tools, and pass context to each other. Each agent is a potential failure point:

• Agent hangs: One agent waits indefinitely for an LLM response that never comes. The crew stalls, but the process stays alive.

• Infinite loops: An agent retries a failed tool call endlessly. Your token meter spins, but no work output appears.

• Silent quality degradation: The LLM returns garbage, the next agent processes it anyway, and the final output is subtly wrong. No error thrown.

• Cost spikes: A single crew run normally costs $0.15. One bad run costs $12 because an agent kept rephrasing the same request.

Traditional process monitoring (systemd, Docker health checks) only tells you the process is alive. It tells you nothing about whether the *crew* is making progress.
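One way to make a stalled run visible to cron is to put a deadline around the run itself. The sketch below uses only the Python standard library and is not part of CrewAI or ClevAgent; run_with_deadline and exit_code_for are hypothetical helper names, and the exit codes are an arbitrary choice for illustration.

```python
import threading


def run_with_deadline(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns (finished, result, error). The worker is abandoned, not killed,
    if it overruns; it ends when the process exits.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except Exception as exc:  # surface crashes to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    finished = not worker.is_alive()
    return finished, outcome.get("result"), outcome.get("error")


def exit_code_for(finished, error):
    """Map the outcome to an exit code that cron can act on."""
    if not finished:
        return 2  # still running past the deadline: treat as a hang
    if error is not None:
        return 1  # the run raised an exception
    return 0
```

In a cron entry point this might be used as run_with_deadline(crew.kickoff, 600) followed by sys.exit(exit_code_for(finished, error)), where the 600-second limit is an assumed value. A deadline only catches total stalls; it says nothing about loops or degraded output.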

Try it now — monitor your agent in 2 lines:

pip install clevagent

import clevagent
clevagent.init(api_key="cv_xxx", agent="my-agent")

Free for 3 agents. No credit card required.

Add ClevAgent to your CrewAI crew in 3 lines

ClevAgent monitors your crew at the agent level — heartbeats, loop detection, and per-run cost tracking. Setup takes about 30 seconds.

Step 1: Install

pip install clevagent

Step 2: Initialize before kickoff

import os
import clevagent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="my-research-crew",
)

That's it. ClevAgent starts sending heartbeats automatically. If your crew hangs or the process dies, you get alerted within 120 seconds.

Step 3 (optional): Add a step callback for per-agent tracking

CrewAI supports a step_callback on each agent. Wire it to ClevAgent to get visibility into each agent's work:

def track_step(step_output):
    clevagent.ping(
        status="step_complete",
        meta={
            "agent": step_output.agent,
            "output_length": len(str(step_output.output)),
        },
    )

Pass this callback when defining your agents:

researcher = Agent(
    role="Research Analyst",
    goal="Find the latest market data",
    backstory="You are a senior research analyst...",
    llm=llm,
    step_callback=track_step,
)

Now every agent step shows up on your dashboard with timing and metadata.
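The object handed to step_callback may differ between CrewAI versions, so attributes such as .agent and .output are not guaranteed to exist. Here is a defensive sketch under that assumption. describe_step and make_step_callback are hypothetical helpers, and the ping argument stands in for clevagent.ping so the example runs without either library installed.

```python
from types import SimpleNamespace


def describe_step(step_output):
    """Build ping metadata without assuming which attributes exist."""
    agent = getattr(step_output, "agent", None)
    # Fall back to the object itself when there is no .output attribute.
    output = getattr(step_output, "output", step_output)
    return {
        "agent": str(agent) if agent is not None else "unknown",
        "output_length": len(str(output)),
    }


def make_step_callback(ping):
    """Return a callback that never lets a monitoring error break the crew."""
    def track_step(step_output):
        try:
            ping(status="step_complete", meta=describe_step(step_output))
        except Exception:
            pass  # a failed ping should not stop the agent's work
    return track_step


# Example with fake step objects and a fake ping that records its arguments.
sent = []
callback = make_step_callback(lambda **kwargs: sent.append(kwargs))
callback(SimpleNamespace(agent="Research Analyst", output="three stories"))
callback("plain string step")
```

With the real SDK, the callback would be created as make_step_callback(clevagent.ping) and passed as step_callback in the same way as track_step.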

Complete example: 2-agent crew with monitoring

Here's a full working example — a research crew with two agents, monitored by ClevAgent:

import os

from crewai import Agent, Task, Crew, Process
import clevagent

# Initialize monitoring
clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="daily-research-crew",
)


def track_step(step_output):
    clevagent.ping(
        status="step_complete",
        meta={
            "agent": step_output.agent,
            "output_length": len(str(step_output.output)),
        },
    )


# Define agents
researcher = Agent(
    role="Research Analyst",
    goal="Find the 3 most important tech news stories today",
    backstory="You are a senior research analyst who reads dozens of sources daily.",
    verbose=True,
    step_callback=track_step,
)
writer = Agent(
    role="Report Writer",
    goal="Write a concise morning briefing from the research",
    backstory="You are a technical writer who distills complex topics into clear summaries.",
    verbose=True,
    step_callback=track_step,
)

# Define tasks
research_task = Task(
    description="Search for today's top 3 tech news stories. Include source URLs.",
    expected_output="A list of 3 news items with title, summary, and source URL.",
    agent=researcher,
)
writing_task = Task(
    description="Write a 200-word morning briefing based on the research.",
    expected_output="A formatted briefing email ready to send.",
    agent=writer,
)

# Assemble and run
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)
result = crew.kickoff()

# Report completion with output metadata
clevagent.ping(
    status="crew_complete",
    meta={
        "output_length": len(str(result)),
        "agents_used": 2,
    },
)

print(result)

The monitoring integration is three pieces: the init() call, the track_step callback, and the final ping(). The only change to your existing CrewAI code is the step_callback argument on each agent.

What ClevAgent catches

Once connected, ClevAgent watches for three categories of problems:

Crew hangs

If no heartbeat arrives for 120 seconds, ClevAgent sends an alert to Telegram or Slack. This catches the most common CrewAI failure: an agent waiting on an LLM call that never returns. Your cron job sees a running process. ClevAgent sees a silent agent.
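The hang check described here comes down to comparing the time of the last heartbeat against a threshold. The sketch below illustrates that idea only; it is not ClevAgent's actual implementation, and HeartbeatWatch is a hypothetical class. Timestamps are passed in explicitly so the logic is easy to test. A real caller would likely use time.monotonic().

```python
class HeartbeatWatch:
    """Tracks the last heartbeat and reports when the agent has gone silent."""

    def __init__(self, started_at, timeout_s=120.0):
        self.timeout_s = timeout_s
        self.last_seen = started_at  # treat start-up as the first heartbeat

    def beat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen = now

    def silent_for(self, now):
        """Seconds elapsed since the last heartbeat."""
        return now - self.last_seen

    def is_stale(self, now):
        """True once the silence exceeds the timeout."""
        return self.silent_for(now) > self.timeout_s


watch = HeartbeatWatch(started_at=0.0)
watch.beat(30.0)
print(watch.is_stale(100.0))  # 70 seconds of silence
print(watch.is_stale(151.0))  # 121 seconds of silence
```

The 120-second default mirrors the threshold quoted above. The check runs outside the monitored process, which is why it still works when the process is alive but stuck.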

Agent loops

ClevAgent tracks the frequency and pattern of ping() calls. If an agent sends 50 step completions in 30 seconds with identical metadata, that's a loop. You get a warning before the token bill becomes a problem.
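A sliding-window counter is one plausible way to implement the rule above (many identical pings in a short time). The sketch below is an assumption about the general technique, not ClevAgent's detection code. LoopDetector is a hypothetical name, and it assumes flat metadata dictionaries with string keys.

```python
from collections import deque


class LoopDetector:
    """Flags a loop when too many identical pings arrive within a time window."""

    def __init__(self, max_repeats=50, window_s=30.0):
        self.max_repeats = max_repeats
        self.window_s = window_s
        self.events = deque()  # (timestamp, fingerprint) pairs, oldest first

    def record(self, now, status, meta):
        """Record one ping; return True if it looks like part of a loop."""
        fingerprint = (status, tuple(sorted(meta.items())))
        self.events.append((now, fingerprint))
        # Drop events that have fallen out of the time window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        repeats = sum(1 for _, fp in self.events if fp == fingerprint)
        return repeats >= self.max_repeats
```

The defaults match the 50-pings-in-30-seconds example in the text. Pings with different metadata are counted separately, so a busy but healthy agent whose output changes from step to step does not trip the detector.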

Token cost spikes

Every ping() with metadata feeds into per-run cost estimates. ClevAgent compares the current run against your historical average. A run that's 5x the normal cost triggers a warning. You can set a hard budget ceiling per agent in the dashboard — if exceeded, ClevAgent sends an immediate alert.
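The two cost rules in this paragraph, a multiple of the historical average and a hard ceiling, can be written down in a few lines. The sketch below only illustrates that arithmetic. check_run_cost and its alert labels are hypothetical, and how ClevAgent derives cost figures from ping metadata is not shown in this article.

```python
from statistics import mean


def check_run_cost(current_cost, past_costs, spike_factor=5.0, budget_ceiling=None):
    """Return a list of alert labels for one run's cost (in dollars)."""
    alerts = []
    # Hard ceiling: fires regardless of history.
    if budget_ceiling is not None and current_cost > budget_ceiling:
        alerts.append("budget_exceeded")
    # Relative spike: compare against the average of earlier runs.
    if past_costs:
        baseline = mean(past_costs)
        if baseline > 0 and current_cost >= spike_factor * baseline:
            alerts.append("cost_spike")
    return alerts


# The $0.15 typical run and the $12 bad run from earlier in the article.
print(check_run_cost(12.0, [0.15, 0.15, 0.15], budget_ceiling=10.0))
```

With a $0.15 baseline the 5x threshold sits at $0.75, so the $12 run trips both rules while an ordinary $0.20 run trips neither.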

Use clevagent.ping() for work-progress tracking

Beyond failure detection, ping() is useful for tracking that your crew is actually doing its job. After each crew run, send a ping with business-level metadata:

result = crew.kickoff()

# `today` and `stories` come from your own pipeline code
clevagent.ping(
    status="crew_complete",
    meta={
        "report_date": today,
        "stories_found": len(stories),
        "word_count": len(str(result).split()),
    },
)
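The snippet above leaves today and stories to the surrounding code. One way to assemble the same metadata in a single place is sketched below; build_run_meta is a hypothetical helper, not part of the ClevAgent SDK. Depending on the CrewAI version, kickoff() may return an object rather than a plain string, so the helper takes the text explicitly (for example str(result)).

```python
from datetime import date


def build_run_meta(result_text, stories, run_date=None):
    """Assemble business-level metadata for the end-of-run ping."""
    run_date = run_date or date.today()
    return {
        "report_date": run_date.isoformat(),  # e.g. "2026-03-28"
        "stories_found": len(stories),
        "word_count": len(result_text.split()),
    }


meta = build_run_meta(
    "Three stories made headlines today",
    stories=["story a", "story b", "story c"],
    run_date=date(2026, 3, 28),
)
print(meta)
```

The resulting dictionary can be passed as the meta argument of the final ping. Keeping the values JSON-friendly (strings and integers) is a cautious default when the accepted metadata types are not documented.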

On the ClevAgent dashboard, this creates a timeline of crew runs. You can see at a glance:

• Did today's 6 AM run actually complete?

• How many stories did it find compared to yesterday?

• Is the output length consistent, or did something degrade?

This is the difference between "the process ran" and "the crew did useful work." Process monitoring gives you the first. Ping metadata gives you the second.
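Two of the timeline questions above, whether a run was missed and whether the output shrank, can also be checked in code once the ping metadata is stored somewhere. The sketch below assumes a simple mapping from run date to word count. Both function names and the 50% drop threshold are hypothetical choices for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def missed_run_dates(completed_dates, start, end):
    """List every day in [start, end] with no completed run."""
    done = set(completed_dates)
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in all_days if day not in done]


def length_dropped(current, history, min_ratio=0.5):
    """True if the current output is under min_ratio of the historical mean."""
    if not history:
        return False  # nothing to compare against yet
    return current < min_ratio * mean(history)


# Hypothetical run history: date -> word count of the briefing.
runs = {
    date(2026, 3, 26): 210,
    date(2026, 3, 27): 198,
    date(2026, 3, 29): 60,
}
gaps = missed_run_dates(runs, date(2026, 3, 26), date(2026, 3, 29))
shrunk = length_dropped(60, [210, 198])
```

In this made-up history, March 28 has no completed run and the March 29 briefing is far shorter than the earlier ones, which are the two signals the timeline view is meant to surface.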

What you'd see on the dashboard

The ClevAgent dashboard shows a real-time view of your crew's health. For the example above, you'd see:

• Agent card: "daily-research-crew" with a green heartbeat indicator, last seen 45 seconds ago

• Timeline: A row of green dots for each successful crew run, with gaps highlighted in red when a run was missed

• Step log: Individual entries for each agent step, such as "Research Analyst: step_complete" and "Report Writer: step_complete", with timestamps

• Cost chart: A line graph showing token cost per run over the past 7 days, with a spike on Tuesday clearly visible

• Alert history: "2026-03-28 06:14 - Heartbeat missed for 180s, alert sent to Telegram"

Everything is per-agent, so if you're running 3 different crews, each gets its own card and history.

Getting started

1. Copy your API key from the dashboard

2. pip install clevagent

3. Add the 3 lines from Step 2 above

4. Run your crew. Monitoring starts immediately

No config files, no YAML, no separate infrastructure. The SDK is 40KB and has zero dependencies beyond requests.
