Add heartbeat monitoring, loop detection, and cost tracking to your CrewAI crews. 3 lines of code, zero config.
You built a CrewAI crew. A researcher agent finds data, an analyst agent writes the report, and crew.kickoff() ties them together. It works perfectly on your laptop.
Then you deploy it. It runs on a schedule — every morning at 6 AM. On day three, the researcher agent gets stuck in a retry loop against a rate-limited API. The analyst agent never receives input, so it sits idle. Your cron job exits with code 0 because CrewAI doesn't treat agent-level hangs as fatal errors. Nobody notices until Friday when someone asks why the reports stopped.
This is the core problem with multi-agent frameworks in production: a crew can fail without crashing.
CrewAI orchestrates multiple agents that call LLMs, use tools, and pass context to each other. Each agent is a potential failure point:
Traditional process monitoring (systemd, Docker health checks) only tells you the process is alive. It tells you nothing about whether the *crew* is making progress.
Try it now — monitor your agent in 2 lines:
pip install clevagent
import clevagent
clevagent.init(api_key="cv_xxx", agent="my-agent")
Free for 3 agents. No credit card required. Get your API key →
ClevAgent monitors your crew at the agent level — heartbeats, loop detection, and per-run cost tracking. Setup takes about 30 seconds.
pip install clevagent
import os
import clevagentclevagent.init(
api_key=os.environ["CLEVAGENT_API_KEY"],
agent="my-research-crew",
)
That's it. ClevAgent starts sending heartbeats automatically. If your crew hangs or the process dies, you get alerted within 120 seconds.
CrewAI supports a step_callback on each agent. Wire it to ClevAgent to get visibility into each agent's work:
def track_step(step_output):
clevagent.ping(
status="step_complete",
meta={
"agent": step_output.agent,
"output_length": len(str(step_output.output)),
},
)
Pass this callback when defining your agents:
researcher = Agent(
role="Research Analyst",
goal="Find the latest market data",
backstory="You are a senior research analyst...",
llm=llm,
step_callback=track_step,
)
Now every agent step shows up on your dashboard with timing and metadata.
Here's a full working example — a research crew with two agents, monitored by ClevAgent:
import os
from crewai import Agent, Task, Crew, Process
import clevagentInitialize monitoring
clevagent.init(
api_key=os.environ["CLEVAGENT_API_KEY"],
agent="daily-research-crew",
)def track_step(step_output):
clevagent.ping(
status="step_complete",
meta={
"agent": step_output.agent,
"output_length": len(str(step_output.output)),
},
)
Define agents
researcher = Agent(
role="Research Analyst",
goal="Find the 3 most important tech news stories today",
backstory="You are a senior research analyst who reads dozens of sources daily.",
verbose=True,
step_callback=track_step,
)writer = Agent(
role="Report Writer",
goal="Write a concise morning briefing from the research",
backstory="You are a technical writer who distills complex topics into clear summaries.",
verbose=True,
step_callback=track_step,
)
Define tasks
research_task = Task(
description="Search for today's top 3 tech news stories. Include source URLs.",
expected_output="A list of 3 news items with title, summary, and source URL.",
agent=researcher,
)writing_task = Task(
description="Write a 200-word morning briefing based on the research.",
expected_output="A formatted briefing email ready to send.",
agent=writer,
)
Assemble and run
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential,
verbose=True,
)result = crew.kickoff()
Report completion with output metadata
clevagent.ping(
status="crew_complete",
meta={
"output_length": len(str(result)),
"agents_used": 2,
},
)print(result)
The entire monitoring integration is 8 lines — the init(), the track_step callback, and the final ping(). Your existing CrewAI code stays exactly the same.
Once connected, ClevAgent watches for three categories of problems:
If no heartbeat arrives for 120 seconds, ClevAgent sends an alert to Telegram or Slack. This catches the most common CrewAI failure: an agent waiting on an LLM call that never returns. Your cron job sees a running process. ClevAgent sees a silent agent.
ClevAgent tracks the frequency and pattern of ping() calls. If an agent sends 50 step completions in 30 seconds with identical metadata, that's a loop. You get a warning before the token bill becomes a problem.
Every ping() with metadata feeds into per-run cost estimates. ClevAgent compares the current run against your historical average. A run that's 5x the normal cost triggers a warning. You can set a hard budget ceiling per agent in the dashboard — if exceeded, ClevAgent sends an immediate alert.
Beyond failure detection, ping() is useful for tracking that your crew is actually doing its job. After each crew run, send a ping with business-level metadata:
result = crew.kickoff()clevagent.ping(
status="crew_complete",
meta={
"report_date": today,
"stories_found": len(stories),
"word_count": len(result.split()),
},
)
On the ClevAgent dashboard, this creates a timeline of crew runs. You can see at a glance:
This is the difference between "the process ran" and "the crew did useful work." Process monitoring gives you the first. Ping metadata gives you the second.
The ClevAgent dashboard shows a real-time view of your crew's health. For the example above, you'd see:
Everything is per-agent, so if you're running 3 different crews, each gets its own card and history.
pip install clevagentNo config files, no YAML, no separate infrastructure. The SDK is 40KB and has zero dependencies beyond requests.
Free for 3 agents — start monitoring →
3 agents free · No credit card · Setup in 30 seconds