Add heartbeat monitoring, loop detection, and cost tracking to your AutoGen multi-agent systems. Works with both AutoGen 0.2 and AgentChat 0.4.
You built a multi-agent AutoGen system. An AssistantAgent drafts the plan, a UserProxyAgent executes the code, and they pass messages back and forth until the task is done. Works fine in your terminal.
Then you productionize it. The job runs overnight, processing a queue of incoming requests. On night two, the assistant agent generates code that fails silently — wrong output format, no exception raised. The executor tries to parse the result, sends confused feedback, and the assistant tries again. And again. Twelve iterations. Forty minutes. $18 in API calls. Your queue processes exactly zero items.
AutoGen's conversation loop is powerful precisely because it persists until an agent says it's done. That same property makes it dangerous in production: the loop never stops unless something terminates it or you're watching.
AutoGen's multi-agent architecture has specific failure modes that generic infrastructure monitoring won't catch:
max_turns is a safety net, not a monitor — it terminates the run but doesn't alert you.
UserProxyAgent executes generated code. If that code hangs (waiting on a socket, stuck in a loop), the conversation stalls indefinitely.
Process-level monitoring (Docker health checks, systemd watchdog) only knows your Python process is running. It has no idea whether the agents are making progress.
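These built-in limits are still worth setting before layering on any external monitoring. A minimal sketch of a hardened UserProxyAgent config in AutoGen 0.2 (the timeout key caps each code execution in seconds; the values here are illustrative, not recommendations):

```python
# Baseline hard stops for an AutoGen 0.2 UserProxyAgent.
# They terminate a runaway run; they do not alert anyone.
safety_limits = {
    "max_consecutive_auto_reply": 10,  # hard cap on back-and-forth turns
    "code_execution_config": {
        "work_dir": "workspace",
        "timeout": 60,  # kill generated code that hangs past 60 seconds
    },
}

# Applied at construction time, e.g.:
# user_proxy = autogen.UserProxyAgent(name="user_proxy", **safety_limits)
```

These caps bound the damage; the monitoring below tells you it happened.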
Monitor your AutoGen agents now:
```shell
pip install clevagent
```

```python
import clevagent

clevagent.init(api_key="cv_xxx", agent="my-autogen-agent")
```
Free tier: 3 agents, no credit card. Get your API key →
ClevAgent monitors AutoGen conversations at the job level — tracking heartbeats, iteration counts, and token costs so you can catch runaway loops and budget spikes before they get out of hand.
```shell
pip install clevagent
```

```python
import os

import autogen
import clevagent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="nightly-analysis-job",
)

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "workspace"},
)

user_proxy.initiate_chat(assistant, message="Analyze Q1 sales data and output summary.json")

clevagent.shutdown()
```
ClevAgent starts sending heartbeats from init(). If the process dies or hangs without calling shutdown(), you get alerted within 120 seconds.
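On the receiving end, staleness detection is one comparison: the gap since the last heartbeat against the alert window. A stdlib sketch of the idea (the 120-second window mirrors the behavior described above; the function is illustrative, not part of ClevAgent's API):

```python
import time

ALERT_WINDOW_SECONDS = 120  # alert if no heartbeat arrives for this long

def is_stale(last_heartbeat: float, now: float = None) -> bool:
    """Return True once the gap since the last heartbeat exceeds the window."""
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > ALERT_WINDOW_SECONDS

# A process that last beat 3 minutes ago is flagged; 30 seconds ago is fine.
```

The same check, run per agent on a schedule, is all a heartbeat monitor fundamentally is.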
AutoGen's ConversableAgent lets you hook into replies via register_reply(). Use it to emit a ping per turn — this is the signal that tells ClevAgent the conversation is actively progressing rather than just alive at the process level:
```python
import os

import autogen
import clevagent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="nightly-analysis-job",
)

turn_count = 0


def track_turn(recipient, messages, sender, config):
    global turn_count
    turn_count += 1
    last_msg = messages[-1] if messages else {}
    clevagent.ping(
        status="turn_complete",
        meta={
            "turn": turn_count,
            "sender": sender.name,
            "msg_length": len(str(last_msg.get("content", ""))),
        },
    )
    return False, None  # continue normal processing


assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]},
)
assistant.register_reply(
    trigger=autogen.ConversableAgent,
    reply_func=track_turn,
    position=0,
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "workspace"},
)

user_proxy.initiate_chat(assistant, message="Analyze Q1 sales data and output summary.json")

clevagent.shutdown()
```
With per-turn pings, ClevAgent can detect if the conversation stalls mid-run — not just if the process dies.
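Per-turn pings enable a second check beyond staleness: if recent turns keep repeating near-identical content, the agents are probably looping rather than progressing. A stdlib sketch of that heuristic (the window and threshold are illustrative values, not ClevAgent's internal algorithm):

```python
from collections import Counter

def looks_like_loop(message_contents, window=6, repeat_threshold=3):
    """Flag a conversation whose recent turns keep repeating the same content."""
    recent = [hash(m) for m in message_contents[-window:]]
    if not recent:
        return False
    # If any single message dominates the recent window, assume a retry loop.
    most_common_count = Counter(recent).most_common(1)[0][1]
    return most_common_count >= repeat_threshold

# Three identical "retry" messages inside the last six turns trips the check.
```

Exact-hash matching is crude (a loop that paraphrases itself slips through), but it catches the common case of an executor re-sending the same error verbatim.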
AutoGen 0.2 tracks token usage internally and attaches a cost summary to the ChatResult returned by initiate_chat(), but it doesn't report costs out of band. The simplest approach is to log costs at the end of each initiate_chat() call using that summary:
```python
import os

import autogen
import clevagent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="nightly-analysis-job",
)

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "workspace"},
)

chat_result = user_proxy.initiate_chat(
    assistant,
    message="Analyze Q1 sales data and output summary.json",
)

# AutoGen 0.2 returns cost info in chat_result.cost
if hasattr(chat_result, "cost") and chat_result.cost:
    usage = chat_result.cost.get("usage_including_cached_inference", {})
    total_tokens = usage.get("total_tokens", 0)
    total_cost = usage.get("total_cost", 0.0)
    clevagent.log_cost(
        tokens=total_tokens,
        cost_usd=total_cost,
        model="gpt-4o",
    )

clevagent.shutdown()
```
ClevAgent will alert you if cost per run exceeds your configured threshold — useful for catching the "20-message death spiral" before it happens three nights in a row.
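If you also want a local guard (say, to pause a queue worker before the next run), a small helper keeps the cost extraction in one place. A sketch assuming the AutoGen 0.2 cost structure used above (missing keys default to zero):

```python
def extract_usage(cost):
    """Pull (total_tokens, total_cost_usd) from an AutoGen 0.2 cost dict."""
    if not cost:
        return 0, 0.0
    usage = cost.get("usage_including_cached_inference", {})
    return usage.get("total_tokens", 0), usage.get("total_cost", 0.0)

example = {"usage_including_cached_inference": {"total_tokens": 5134, "total_cost": 0.41}}
tokens, cost_usd = extract_usage(example)
```

Isolating the extraction also makes the logging path easy to unit-test without running a conversation.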
If you prefer a wrapper approach over manual callbacks, the clevagent.integrations.autogen module provides a drop-in wrapper class:
```python
import os

import autogen
import clevagent
from clevagent.integrations.autogen import MonitoredAssistantAgent

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="nightly-analysis-job",
)

# Drop-in replacement for autogen.AssistantAgent
assistant = MonitoredAssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "workspace"},
)

user_proxy.initiate_chat(assistant, message="Analyze Q1 sales data and output summary.json")

clevagent.shutdown()
```
MonitoredAssistantAgent wraps generate_reply() to emit pings and log token usage automatically. You get per-turn tracking without registering callbacks manually.
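Conceptually, the wrapper is a method decorator: count the call, emit the event, return the reply unchanged. A framework-free sketch of the pattern (an illustration of the technique, not ClevAgent's actual implementation):

```python
def with_turn_tracking(generate_reply, on_turn):
    """Wrap a generate_reply-style callable so every call emits a turn event."""
    turn = {"count": 0}

    def wrapped(*args, **kwargs):
        reply = generate_reply(*args, **kwargs)
        turn["count"] += 1
        on_turn(turn["count"], reply)  # e.g. clevagent.ping(...) in practice
        return reply

    return wrapped

events = []
fake_reply = with_turn_tracking(lambda: "ok", lambda n, r: events.append(n))
fake_reply()
fake_reply()
# events now holds one entry per completed turn
```

Wrapping rather than subclassing keeps the agent's behavior untouched: the reply passes through unmodified, and the tracking side effect lives entirely in the callback.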
Once you have monitoring in place, here's what's worth alerting on for AutoGen jobs:
Heartbeat timeout — Set it to 2x your expected max conversation duration. If a typical run takes 5 minutes, alert at 10 minutes. AutoGen conversations can legitimately take longer than expected, so give enough headroom.
Iteration count — max_consecutive_auto_reply is a hard stop, but you want to know before you hit it. If your normal conversations finish in 4–6 turns and you're seeing 9–10, that's a signal the agents are stuck.
Cost per run — Establish a baseline over a week of normal runs. Alert if any single run exceeds 3x the median. One-off spikes happen; consistent spikes mean something changed.
Silent completion with no output file — This requires application-level logic, but it's worth adding: after shutdown(), check that the expected output artifact exists. A "successful" AutoGen run that produced no output is often worse than one that crashed.
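The cost rule above (alert when a single run exceeds 3x the median of your baseline) is a few lines of stdlib Python. A sketch using the multiplier from the rule:

```python
from statistics import median

def is_cost_spike(run_cost_usd, baseline_costs, multiplier=3.0):
    """True when a run costs more than `multiplier` x the median baseline run."""
    if not baseline_costs:
        return False  # no baseline yet, so nothing to compare against
    return run_cost_usd > multiplier * median(baseline_costs)

# A week of ~$1.20 runs makes an $18 run an obvious spike.
```

Using the median rather than the mean keeps one earlier spike from inflating the baseline and masking the next one.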
Microsoft renamed and restructured AutoGen as AgentChat in the 0.4.x series. The core monitoring pattern is identical — clevagent.init() before the conversation, clevagent.shutdown() after, and clevagent.ping() in message handlers. The specific callback registration API differs:
```python
# AgentChat 0.4 pattern
import asyncio
import os

import clevagent
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

clevagent.init(
    api_key=os.environ["CLEVAGENT_API_KEY"],
    agent="agentchat-job",
)

model_client = OpenAIChatCompletionClient(model="gpt-4o")
agent = AssistantAgent(name="assistant", model_client=model_client)
team = RoundRobinGroupChat([agent], max_turns=10)


async def run():
    result = await team.run(task="Analyze Q1 sales data")
    clevagent.ping(status="complete", meta={"turns": len(result.messages)})
    clevagent.shutdown()


asyncio.run(run())
```
The heartbeat and alerting behavior is identical regardless of which AutoGen version you're on.
The scenario at the top of this post — assistant and executor stuck in a disagreement loop for 40 minutes — is exactly the pattern this monitoring surfaces: heartbeats keep arriving, but the turn count and cost climb well past your baseline.
Without monitoring, you find out at 7 AM when you check the results. With monitoring, you get a Slack message at minute 12 and can kill the job before it burns through $18.
Stop finding out about AutoGen failures at 7 AM. ClevAgent monitors your conversations in real time — heartbeats, loop detection, cost alerts, and daily reports. Free for up to 3 agents.
3 agents free · No credit card · Setup in 30 seconds