2026-04-03 · monitoring · langchain · production · agents

How to Monitor LangChain Agents in Production

Add heartbeat monitoring to LangChain and LangGraph agents. Detect stuck chains, runaway loops, and cost spikes before they burn through your budget.

Your LangChain agent works in development. Chains resolve, tools return, the ReAct loop converges.

Then you deploy it. Day one is fine. Day two, the agent processes 200 requests without a single error.

Day three, you check your OpenAI bill. $340 — on an agent that should cost $15/day. The agent got stuck in a tool-retry loop at 2 AM. The LLM kept calling a search tool that returned empty results, parsing the empty response, deciding it needed to search again, and repeating. No exceptions. No crashes. Every health check returned 200 OK.

Traditional monitoring tools — including LangSmith — would have shown you the traces after the fact. Nobody would have woken you up at 2 AM when it started.

Why LangChain agents need runtime monitoring

LangChain agents fail differently from web services:

  • Stuck chains: An HTTP tool call hangs indefinitely. The chain never completes. The process is alive, the health endpoint responds, but no work is happening.
  • Infinite ReAct loops: The agent keeps calling tools without converging. max_iterations helps, but only if you set it — and only for iteration count, not cost.
  • Silent cost spikes: A loop that makes 50 LLM calls in 30 seconds doesn't spike CPU. It spikes your API bill. By the time you see the invoice, the damage is done.
  • Zombie agents: The callback thread is alive, traces are flowing to LangSmith, but the actual work loop is stuck on a deadlocked resource.

LangSmith and Langfuse are excellent for tracing — understanding what happened after the fact. But they don't answer the real-time question: is this agent alive and making progress right now?
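The "zombie agent" failure mode is easy to reproduce. A minimal sketch (illustrative only, nothing here is ClevAgent code): the worker thread deadlocks on a lock, yet the liveness probe keeps returning 200 because the process itself is fine.

```python
import threading
import time

# Illustrative "zombie agent": the health check passes while the
# actual work loop is stuck on a deadlocked resource.

resource_lock = threading.Lock()
work_done = 0

def work_loop():
    """The agent's real work loop -- blocks forever on a deadlock."""
    global work_done
    resource_lock.acquire()  # first acquire succeeds
    resource_lock.acquire()  # non-reentrant lock: blocks here forever
    work_done += 1           # never reached

def health() -> int:
    """Typical liveness probe: 200 as long as the process is up."""
    return 200

worker = threading.Thread(target=work_loop, daemon=True)
worker.start()
time.sleep(0.2)

print(health())           # 200 -- the probe is happy
print(worker.is_alive())  # True -- the thread exists
print(work_done)          # 0 -- but no work has happened
```

Any monitor that only polls `health()` will never see this; only the absence of progress signals (heartbeats) gives it away.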


    Free for 3 agents. No credit card required. Get your API key →


    Add ClevAgent to your LangChain agent in 3 lines

    Step 1. Install the SDK.

    pip install clevagent
    

    Step 2. Initialize ClevAgent with your API key.

    import clevagent

    clevagent.init(
        api_key="your-api-key",
        agent="langchain-research-agent",
    )

    Step 3. Add the callback handler to your LLM or chain.

    from clevagent.integrations.langchain import ClevAgentCallbackHandler

    handler = ClevAgentCallbackHandler()
    llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

    That's it. Every LLM call now sends a heartbeat with token usage. If the agent stops calling the LLM — because a chain hung, a tool timed out, or the process crashed — ClevAgent detects the silence and alerts you.
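Conceptually, the handler rides LangChain's callback lifecycle: `on_llm_end` fires after every completed LLM call, which is exactly the moment the agent provably made progress. Here is a minimal sketch of that idea — not ClevAgent's actual implementation, and with a plain dict standing in for the `LLMResult` object LangChain would pass, with heartbeats collected in a list instead of POSTed to an API:

```python
import time

# Sketch of a heartbeat callback handler. The real handler sends
# heartbeats over HTTP; here they just accumulate in a list.

class HeartbeatHandler:
    """Mimics the callback hook that fires after every LLM call."""

    def __init__(self):
        self.heartbeats = []

    def on_llm_end(self, response, **kwargs):
        # In real LangChain, `response` is an LLMResult; a dict
        # stands in here so the sketch is self-contained.
        usage = response.get("token_usage", {})
        self.heartbeats.append({
            "ts": time.time(),
            "tokens": usage.get("total_tokens", 0),
        })

handler = HeartbeatHandler()
# Simulate what LangChain would do after a completed LLM call:
handler.on_llm_end({"token_usage": {"total_tokens": 420}})
print(len(handler.heartbeats))          # 1
print(handler.heartbeats[0]["tokens"])  # 420
```

Because the heartbeat is tied to LLM completions rather than process liveness, silence means "no progress", not just "process down".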

    Complete example

    import clevagent
    from clevagent.integrations.langchain import ClevAgentCallbackHandler
    from langchain_openai import ChatOpenAI
    from langchain.agents import AgentExecutor, create_react_agent
    from langchain.tools import Tool

    clevagent.init(api_key="your-api-key", agent="research-agent")
    handler = ClevAgentCallbackHandler()

    llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

    # search_web and calculator are your own tool functions
    tools = [
        Tool(name="search", func=search_web, description="Search the web"),
        Tool(name="calculate", func=calculator, description="Do math"),
    ]

    agent = create_react_agent(llm, tools, prompt)  # prompt: your ReAct prompt template
    executor = AgentExecutor(agent=agent, tools=tools, max_iterations=15)

    # Every LLM call and tool use is now monitored

    result = executor.invoke({"input": "Research the latest AI agent frameworks"})

    LangGraph agents: use the node decorator

    For LangGraph's graph-based agents, ClevAgent provides a @monitored_node decorator that wraps each node with automatic heartbeat monitoring:

    from clevagent.integrations.langgraph import monitored_node
    from langgraph.graph import StateGraph

    @monitored_node("research")
    def research_node(state):
        result = llm.invoke(state["messages"])
        return {"messages": [result]}

    @monitored_node("summarize")
    def summarize_node(state):
        summary = llm.invoke(f"Summarize: {state['messages'][-1].content}")
        return {"messages": [summary]}

    graph = StateGraph(AgentState)
    graph.add_node("research", research_node)
    graph.add_node("summarize", summarize_node)
    graph.add_edge("research", "summarize")

    Each node execution sends a heartbeat. If a node hangs — because an API call never returns or an LLM request times out — ClevAgent detects the gap and alerts you.

    Or use the explicit callback for more control:

    from clevagent.integrations.langgraph import clevagent_node_callback

    def research_node(state):
        result = llm.invoke(state["messages"])
        clevagent_node_callback(
            "research",
            tokens=result.usage_metadata.get("total_tokens", 0),
        )
        return {"messages": [result]}
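If you're curious how a node-monitoring decorator works mechanically, here is a sketch of the general pattern — a hypothetical stand-in, not the actual `@monitored_node` implementation, with heartbeats going into a local list instead of the ClevAgent API:

```python
import functools
import time

# Sketch of the decorator pattern behind node monitoring: wrap the
# node function, run it, then record a heartbeat with timing info.

heartbeats = []

def monitored_node_sketch(name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            start = time.time()
            result = fn(state)  # run the real node
            heartbeats.append({
                "node": name,
                "duration_s": time.time() - start,
            })
            return result
        return wrapper
    return decorator

@monitored_node_sketch("research")
def research_node(state):
    # Stand-in for a real LangGraph node body
    return {"messages": state["messages"] + ["researched"]}

out = research_node({"messages": []})
print(out["messages"])        # ['researched']
print(heartbeats[0]["node"])  # 'research'
```

The wrapper is transparent to LangGraph: the graph sees an ordinary node function, while every invocation leaves a progress record behind.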

    What ClevAgent catches

    Stuck chains and hung tools

    Your agent calls an external API inside a tool. The API hangs. The chain never completes. The process is still alive — systemctl status says "running" — but no heartbeats are arriving.

    ClevAgent detects the silence within your configured threshold (default: 120 seconds) and sends an alert.
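A complementary defense worth pairing with heartbeat alerts: give each tool call a hard deadline so a hung API surfaces as an exception instead of stalling the chain silently. A minimal sketch using the standard library (the function names here are illustrative, not part of any SDK):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ToolTimeout

def call_with_timeout(fn, *args, timeout=30.0, **kwargs):
    """Run a tool function with a hard deadline; raise on hang."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout)
    finally:
        pool.shutdown(wait=False)  # don't block on the hung call

def hanging_search(query):
    time.sleep(2)  # stands in for an external API that never answers
    return "results"

try:
    call_with_timeout(hanging_search, "ai agents", timeout=0.5)
    outcome = "ok"
except ToolTimeout:
    outcome = "tool timed out"

print(outcome)  # tool timed out
```

The timeout turns a silent hang into a loud failure your chain can retry or abort; the heartbeat gap still catches whatever the timeout doesn't.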

    Infinite ReAct loops

    The agent enters a loop: call tool → parse result → decide to call tool again → repeat. max_iterations caps the count, but what about cost? An agent that makes 15 iterations of GPT-4o calls in 30 seconds burns through tokens fast.

    ClevAgent tracks cumulative token usage per heartbeat cycle. If tokens spike 10-100x above your agent's baseline, you get a cost alert — while the loop is still running, not after.
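The baseline comparison itself is simple. A sketch of the kind of check described above, with illustrative numbers (the real thresholds live in your ClevAgent dashboard):

```python
# Flag a heartbeat cycle whose token usage is far above this agent's
# historical baseline. Numbers below are made up for illustration.

def is_cost_spike(tokens_this_cycle, baseline_tokens, factor=10):
    """True when usage exceeds `factor` times the baseline."""
    return tokens_this_cycle >= baseline_tokens * factor

baseline = 1_200  # typical tokens per heartbeat cycle for this agent
print(is_cost_spike(1_500, baseline))   # False -- normal variation
print(is_cost_spike(48_000, baseline))  # True -- looping agent, alert now
```

Because the check runs per heartbeat cycle, a runaway loop trips it within minutes, not at invoice time.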

    Silent exits

    The process gets OOM-killed at 3 AM. No traceback, no error log, no alert. The agent just stops.

    ClevAgent expects a heartbeat every N seconds. When it stops arriving, you get an alert within one missed interval. Optional auto-restart brings the agent back without manual intervention.
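The detection logic on the server side amounts to a gap check. A sketch (the grace factor is an assumption for illustration, not ClevAgent's documented behavior):

```python
import time

# Alert when the gap since the last heartbeat exceeds the expected
# interval plus a small grace margin for network jitter.

def is_missing(last_heartbeat_ts, now, interval_s, grace=1.5):
    """True once more than `grace` intervals pass with no heartbeat."""
    return (now - last_heartbeat_ts) > interval_s * grace

now = time.time()
print(is_missing(now - 10, now, interval_s=60))   # False -- healthy
print(is_missing(now - 200, now, interval_s=60))  # True -- alert fires
```

An OOM-killed process stops sending heartbeats immediately, so the alert arrives one missed interval later regardless of how the process died.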

    Getting started

  • pip install clevagent
  • Get your API key from clevagent.io/signup
  • Add clevagent.init() and the callback handler
  • Deploy. ClevAgent starts monitoring immediately.
  • Configure alerts in the dashboard — Telegram, Slack, Discord, or email.

    Related reading

  • Three AI Agent Failure Modes That Traditional Monitoring Will Never Catch
  • Why Your AI Agent Health Check Is Lying to You
  • How to Track LLM Token Costs in Production

    *ClevAgent monitors LangChain and LangGraph agents with heartbeat detection, cost tracking, and auto-restart. Free for up to 3 agents — start monitoring →*

    Get your API key in 30s →

    3 agents free · No credit card · Setup in 30 seconds