LangGraph ships native human-in-the-loop support: interrupt() pauses the graph, Command(resume=...) restarts it. The idea is that you don’t need a custom state machine or a bespoke approval layer.
I toyed around with the code and here’s what it looks like:
The protocol is as clean as advertised
The entire HITL flow is two HTTP requests.
# Request 1 — fresh invocation
# The graph runs until it hits interrupt(), then halts.
POST /chat
{ "messages": [{"role": "user", "content": "..."}] }
→ { "thread_id": "abc-uuid", "interrupt": { "tool_name": "send_email", "args": {...} } }
# Request 2 — resume
# The graph reloads its checkpoint and continues from where it stopped.
POST /chat
{ "thread_id": "abc-uuid", "resume": "approve" }
→ { "thread_id": "abc-uuid", "message": "Done, email sent to alice@example.com." }
The frontend shape is simple enough to drive with no extra state machine. The interrupt field either exists or it doesn’t. When it does, render an approval card. When the user decides, send the same thread_id back with a resume value.
The server generates the thread_id on the first request and returns it in every response. The client stores it and echoes it back. That’s the entire session contract.
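The whole client contract fits in a couple of functions. A minimal sketch in plain Python, no HTTP, with the field names taken from the responses above (the function names are mine):

```python
# Sketch of the client-side session contract, assuming the response
# shape shown above: thread_id on every response, interrupt when paused.

def next_ui_action(response: dict, session: dict) -> str:
    """Store the server-issued thread_id and decide what to render."""
    # The server owns the thread_id; the client only echoes it back.
    session["thread_id"] = response["thread_id"]
    if response.get("interrupt"):
        return "render_approval_card"   # interrupt present -> ask the user
    return "render_message"             # no interrupt -> normal assistant reply

def resume_payload(session: dict, decision: str) -> dict:
    """Build the second request: same thread_id plus a resume value."""
    return {"thread_id": session["thread_id"], "resume": decision}
```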
Sequential approvals work correctly
When the LLM requests two approval-required tools in a single turn, LangGraph handles them without any extra code.
Prompt: “Send an email to alice@example.com and also to bob@example.com.”
- First interrupt() call → card shows alice’s args → user approves
- On resume, tool_node re-executes from the top, but this time the first interrupt() returns "approve" immediately from the checkpoint
- Second interrupt() call → card shows bob’s args → user decides
LangGraph replays earlier interrupt values from the checkpoint and surfaces a new card for each subsequent tool. Positional matching is automatic.
Each tool call is a separate checkpoint, not a batch. So you can have multiple approval-required tools in the same node, and the user approves them one at a time without any extra state tracking.
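The replay mechanics can be modelled in a few lines. This is a toy simulation of the behaviour described above, not LangGraph’s implementation: resume values are consumed in call order, and the first interrupt without a stored value pauses the node.

```python
# Toy model of checkpoint replay (not LangGraph internals): each resume
# value is bound to an interrupt purely by call position.

class Pause(Exception):
    """Raised when an interrupt has no stored resume value yet."""

def run_node(tool_calls, resume_values):
    """Re-execute the node from the top against stored resume values."""
    decisions, idx = [], 0
    for call in tool_calls:
        if idx < len(resume_values):
            decisions.append((call, resume_values[idx]))  # replayed from checkpoint
            idx += 1
        else:
            raise Pause(call)  # surface a new approval card for this call
    return decisions
```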
What the checkpoint actually does
When interrupt() is called inside a node, LangGraph serialises the full graph execution state to the checkpointer and halts. On resume, it restores that checkpoint and re-executes the node from the top.
The official docs actually say this plainly: “Whenever execution resumes, it starts at the beginning of the node.” It’s not hidden. But the docs only show clean single-tool examples, so the side effect on mixed nodes never comes up.
For a node that does a single approval-required action, this is fine. For a node that calls multiple tools before hitting interrupt(), it isn’t.
tool_node runs all tool calls in a loop:
async def tool_node(state: AgentState) -> AgentState:
    results = []
    for tool_call in state["messages"][-1].tool_calls:
        if tool_call["name"] in APPROVAL_REQUIRED:
            # pauses here on first execution; returns the user's decision on resume
            decision = interrupt({"tool_name": tool_call["name"], "args": tool_call["args"]})  # ...other fields elided
            if decision == "approve":
                result = await run_tool(tool_call)
                results.append(result)
        else:
            result = await run_tool(tool_call)
            results.append(result)
    return {"messages": results}
results is a local variable, not part of AgentState. When interrupt() halts the node, that list is discarded with it.
The double execution bug
Prompt: “Create a support ticket for this issue and email the customer with the ticket number.”
The LLM decides to call create_ticket (auto-execute) and send_email (requires approval) in the same turn. Here’s what actually runs:
create_ticket called: title="Login broken" ← first execution, before interrupt
... graph pauses, user sees approval card for the customer email ...
create_ticket called: title="Login broken" ← second execution, on resume
send_email called: to=customer@example.com subject="Your ticket #..."
Two tickets for the same issue. The user was reviewing the email draft and clicked Approve. The conversation shows one ticket, one email. No error, no duplicate in the message history.
LangGraph builds a single ToolMessage from the final committed run, so the message history has the right shape. The bug is invisible at the application layer. The ticket ran twice before the user saw anything, and the customer now has two open tickets.
This is a direct consequence of re-execution on resume. The node re-runs from the top, and any tool that already fired before the interrupt fires again. It’s not obvious from the docs until you add logging and look at what actually ran.
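The failure is easy to reproduce in a toy model (again a simulation, not LangGraph internals): re-run the node from the top with one stored resume value and watch the auto-execute tool fire twice.

```python
# Toy reproduction of the double-execution bug: the node re-runs from
# the top on resume, so create_ticket fires once per execution.

executed = []          # side effects observed outside the graph

class Pause(Exception):
    pass

def tool_node(tool_calls, resume_values):
    idx = 0
    for name, needs_approval in tool_calls:
        if needs_approval:
            if idx < len(resume_values):
                idx += 1            # replayed interrupt: already approved
            else:
                raise Pause(name)   # halt; local results are discarded
        executed.append(name)       # side effect happens on *every* run

calls = [("create_ticket", False), ("send_email", True)]
try:
    tool_node(calls, [])            # first execution, hits the interrupt
except Pause:
    pass
tool_node(calls, ["approve"])       # resume: node re-runs from the top
```

After the resume, executed holds create_ticket twice and send_email once: two tickets, one email.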
This isn’t an edge case. A LangGraph maintainer opened issue #6208 in September 2025 titled “Do not re-execute a node that interrupted unless all of its interrupts have been resumed.”
Their own comment:
a node with two interrupts will rerun after only one resume. We can’t solve this without knowing how many resumes are pending per task […]
The issue is still open and unresolved. Related open issues confirm the pattern goes further — #6533 covers interrupt resume values misrouted between tools in a ToolNode, and #6626 covers parallel interrupts generating identical IDs that make multi-interrupt resume impossible.
If your agent node mixes auto-execute and approval-required tools, you either need to separate them into distinct nodes, or accumulate results into graph state before the interrupt so re-execution can detect what already ran. The maintainer’s own recommended workaround is the same: chain multiple nodes rather than mixing tool types in one.
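A sketch of the accumulate-into-state option (my own illustration, not the maintainer’s code): completed tool calls are recorded in checkpointed state before any pause, so re-execution skips them.

```python
# Sketch: completed results live in checkpointed state, not a local
# list, so re-execution after a resume can skip what already ran.

class Pause(Exception):
    pass

def tool_node(state, pending_decision, run_tool):
    """One re-executable pass over the node's tool calls.

    state persists across runs (the checkpoint); pending_decision is the
    resume value for the interrupt the node is currently waiting on.
    """
    completed = state.setdefault("completed", {})   # survives the interrupt
    for i, (name, needs_approval) in enumerate(state["tool_calls"]):
        if i in completed:
            continue                      # already committed before a pause
        if needs_approval:
            if pending_decision is None:
                raise Pause(name)         # wait for the user's decision
            decision, pending_decision = pending_decision, None
            if decision != "approve":
                completed[i] = "rejected"
                continue
        completed[i] = run_tool(name)     # committed to state before any later pause
    return completed
```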
My rule of thumb: any tool with side effects or irreversible consequences gets its own node.
Approval binding is positional, not explicit
When the client sends resume: "approve", it sends no tool_call_id. The server applies that decision to whatever interrupt the checkpoint is currently waiting on. The binding is purely positional: the checkpoint knows where it paused, and "approve" lands there.
The tool_call_id in the interrupt payload exists so the frontend can display it. It is never echoed back to the server, and the server never verifies it.
In my previous post I covered the attack surface that opens up when servers reconstruct pending tool calls from client-supplied data.
LangGraph’s checkpointer closes that specific gap; the server holds the pending state, the client can’t supply a fabricated tool call. But the binding is still implicit: if the approval UI displays incorrect args due to a frontend bug, the user approves one thing and the checkpoint resumes another.
For many use cases this is fine. For anything involving irreversible actions or compliance requirements, it’s worth knowing.
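If explicit binding matters, nothing stops you from making the client echo the tool_call_id and verifying it server-side before resuming. A hypothetical guard (not a LangGraph feature; the field names are assumptions):

```python
# Hypothetical server-side guard: refuse to resume unless the client's
# echoed tool_call_id matches the interrupt the checkpoint is waiting on.

def verify_resume(pending_interrupt: dict, request: dict) -> bool:
    """pending_interrupt comes from server-held checkpoint state;
    request is the client's resume payload."""
    return request.get("tool_call_id") == pending_interrupt["tool_call_id"]
```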
What needs addressing before production
Double execution. Any node that mixes auto-execute and approval-required tools needs refactoring. The simplest fix is a dedicated approval_node that only ever contains one tool call, so re-execution on resume can’t double-fire a create_ticket, provision_environment, or anything else with a side effect.
Thread ID ownership. The server should generate and own the thread_id. Letting the client supply an arbitrary ID on the first request would let it replay or hijack another session’s checkpoint. In the PoC, the server generates a UUID when thread_id is null and returns it in every response. In production this would be derived from the authenticated user session, not a random UUID the client echoes.
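One way to derive it (a hypothetical sketch, not the PoC’s code): bind the thread_id to the authenticated user with an HMAC, so an ID lifted from another session fails verification.

```python
import hmac, hashlib

# Hypothetical sketch: bind thread_ids to the authenticated user so an
# ID stolen from another session is rejected. SERVER_KEY and the id
# layout are assumptions, not part of the PoC.

SERVER_KEY = b"rotate-me"   # server-side secret, never sent to the client

def mint_thread_id(user_id: str, conversation_id: str) -> str:
    tag = hmac.new(SERVER_KEY, f"{user_id}:{conversation_id}".encode(),
                   hashlib.sha256).hexdigest()[:16]
    return f"{conversation_id}.{tag}"

def verify_thread_id(user_id: str, thread_id: str) -> bool:
    conversation_id, _, _ = thread_id.rpartition(".")
    return hmac.compare_digest(thread_id, mint_thread_id(user_id, conversation_id))
```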
Shared checkpointer for multi-pod deployments. InMemorySaver is local to the process. Redis or Postgres as the checkpointer means any pod can load the checkpoint for a given thread_id — but only if the client includes thread_id on every request, which the protocol already requires.
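The swap itself is small. A sketch assuming the langgraph-checkpoint-postgres package (builder is the compiled graph’s StateGraph builder; verify the exact API against your installed version):

```python
# Sketch: shared Postgres checkpointer instead of InMemorySaver, so any
# pod can load the checkpoint for a given thread_id.
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@db:5432/agent"  # placeholder connection string

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()                               # create checkpoint tables once
    graph = builder.compile(checkpointer=checkpointer)
```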
The core primitive is solid. interrupt() / Command(resume=...) maps cleanly onto two HTTP requests. The APPROVAL_REQUIRED set keeps the graph topology unaware of approval logic — adding a new tool to the gate is one line. Detecting interrupts via aget_state() after ainvoke() is more reliable than inferring state from the return value.
"Native HITL" means LangGraph handles the state machine. It doesn’t mean the double-execution problem doesn’t exist. The checkpoint model is elegant, but re-execution on resume has consequences that aren’t visible until you add logging.
The full PoC, including backend, frontend, and test plan, is on GitHub.