[Ch 5] Tools Integration, Guardrails & Safety Patterns

Apr 7, 2026 · 8 min read

In Ch 4 we built a working multi-turn agent. But it has a critical gap: the tools have no idea who is calling them. In production, you need tools to know the current user, apply safety filters to inputs and outputs, and pause for human approval before irreversible actions. This chapter implements all three.


The Three Safety Layers

graph TD
    U([User Input]) --> G1["🛡️ Input Guardrail\n(NeMo)"]
    G1 -->|"safe"| A["🧠 Agent Loop"]
    G1 -->|"blocked"| R1([Safe Refusal])
    A --> T["🔧 Tool Execution\n(with AgentContext)"]
    T --> H{{"⚠️ High-stakes\naction?"}}
    H -->|"yes"| I["⏸️ Human-in-the-Loop\ninterrupt()"]
    I -->|"approved"| T2["✅ Execute Action"]
    I -->|"rejected"| R2([Cancel Action])
    H -->|"no"| T2
    T2 --> G2["🛡️ Output Guardrail\n(NeMo)"]
    G2 --> F([Response to User])
    style U fill:#4CAF50,color:#fff,stroke:none
    style F fill:#2196F3,color:#fff,stroke:none
    style G1 fill:#E53935,color:#fff,stroke:none
    style G2 fill:#E53935,color:#fff,stroke:none
    style I fill:#FF9800,color:#fff,stroke:none
    style A fill:#9C27B0,color:#fff,stroke:none
    style T fill:#0288D1,color:#fff,stroke:none
Fig 1: Three-layer safety architecture for production agents

Part 1: AgentContext — Safe User Context Injection

The Problem

You need your tools to know user_id, session_id, and any per-request metadata. The naive approach is to embed them in the system prompt or the user message:

# ❌ Bad: pollutes the conversation history
user_message = f"[USER_ID: {user_id}] {actual_query}"

This is bad for three reasons:

  1. It wastes precious context window tokens on every turn
  2. The LLM may paraphrase or drop the metadata in its reasoning
  3. It leaks internal identifiers into the conversation transcript

The Solution: config["configurable"]

LangGraph passes a RunnableConfig through every node and tool call. Use it to carry context that’s external to the conversation:

# agent/context.py
from pydantic import BaseModel

class AgentContext(BaseModel):
    """Typed context passed through every agent call without touching messages."""
    user_id: str
    session_id: str
    permissions: list[str] = []   # e.g. ["read", "write", "delete"]
    org_id: str = ""
# agent/tools.py — accessing context inside a tool
from langchain_core.runnables import RunnableConfig
from langchain_core.tools import tool
from .context import AgentContext

@tool
def delete_task(task_id: str, config: RunnableConfig) -> str:
    """Permanently delete a task. Requires 'delete' permission."""
    # LangGraph automatically injects `config` — just add it as a parameter
    ctx_data = config.get("configurable", {}).get("agent_context")
    if ctx_data is None:
        return "Error: No agent context provided."

    ctx = AgentContext(**ctx_data) if isinstance(ctx_data, dict) else ctx_data

    # Permission check using context — no need to pass user_id as a tool arg
    if "delete" not in ctx.permissions:
        return f"Error: User {ctx.user_id} does not have delete permission."

    if task_id not in _tasks:
        return f"Error: Task '{task_id}' not found."

    task_title = _tasks.pop(task_id)["title"]
    return f"Deleted task '{task_title}' (ID: {task_id})."
# agent/main.py — how to pass context at call time
from langchain_core.messages import HumanMessage

from .context import AgentContext

ctx = AgentContext(
    user_id="user-42",
    session_id="sess-abc",
    permissions=["read", "write"],   # no "delete" for this user
    org_id="acme-corp",
)

config = {
    "configurable": {
        "thread_id": "user-42-session-1",
        "agent_context": ctx.model_dump(),   # serialize for LangGraph
    }
}

app.invoke({"messages": [HumanMessage(content="Delete task task-001")]}, config=config)
# → Tool returns: "Error: User user-42 does not have delete permission."

The key insight: config["configurable"] is available in every tool function — just add a config: RunnableConfig parameter, and LangGraph injects it automatically. No message pollution, no token waste.


Part 2: NeMo Guardrails

NeMo Guardrails is NVIDIA’s open-source library for adding programmable safety rails to LLM applications. It works as a layer around your agent: inputs go through rails before reaching the LLM, and outputs go through rails before being returned.

Installation

pip install nemoguardrails

Guardrail Configuration

NeMo uses Colang (a domain-specific language) for defining rails. Create a config directory:

guardrails/
├── config.yml
└── main.co         # Colang flow definitions
# guardrails/config.yml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  input:
    flows:
      - check harmful input
  output:
    flows:
      - check harmful output
# guardrails/main.co

# ── Input rail: block harmful requests ───────────────────────────────────────
define flow check harmful input
  $is_harmful = execute check_input_for_harm

  if $is_harmful
    bot refuse harmful input
    stop

define bot refuse harmful input
  "I'm sorry, I can't help with that request."

# ── Output rail: block harmful responses ─────────────────────────────────────
define flow check harmful output
  $is_harmful = execute check_output_for_harm

  if $is_harmful
    bot provide safe response
    stop

define bot provide safe response
  "I'm not able to provide that information."

Integrating Guardrails with Your Agent

The cleanest pattern is to wrap your agent call — not embed NeMo inside the LangGraph graph itself. This keeps the graph simple and makes the rails easy to swap:

# agent/guardrail.py
from nemoguardrails import RailsConfig, LLMRails

def load_rails(config_path: str = "guardrails/") -> LLMRails:
    """Load NeMo guardrails from config directory."""
    config = RailsConfig.from_path(config_path)
    return LLMRails(config)

rails = load_rails()
# agent/main.py — wrapped agent call with guardrails
async def safe_chat(app, thread_id: str, user_input: str) -> str:
    """Run user input through guardrails before and after the agent."""

    # ── 1. Input guardrail ────────────────────────────────────────────────────
    input_check = await rails.generate_async(
        messages=[{"role": "user", "content": user_input}]
    )
    # NeMo returns the guardrail response; if it's a refusal, return immediately
    if _is_refusal(input_check):
        return input_check["content"]

    # ── 2. Run the agent ──────────────────────────────────────────────────────
    config = {"configurable": {"thread_id": thread_id}}
    final_state = app.invoke(
        {"messages": [HumanMessage(content=user_input)]},
        config=config,
    )
    agent_response = final_state["messages"][-1].content

    # ── 3. Output guardrail ───────────────────────────────────────────────────
    output_check = await rails.generate_async(
        messages=[
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": agent_response},
        ]
    )
    if _is_refusal(output_check):
        return "I can't provide that response."

    return agent_response


def _is_refusal(response: dict) -> bool:
    """Detect if NeMo triggered a refusal rail."""
    content = response.get("content", "")
    refusal_phrases = ["I'm sorry, I can't", "I'm not able to"]
    return any(phrase in content for phrase in refusal_phrases)

⚠️ Guardrails add latency. Each rail check is an additional LLM call (~500ms–1s). Profile your p95 latency before adding rails to every endpoint. For low-risk internal tools, input rails alone may be sufficient.


Part 3: Human-in-the-Loop (HITL)

Some actions are irreversible: deleting records, sending emails, making payments. For these, you want the agent to pause and ask the human before proceeding. LangGraph has first-class support for this via interrupt().

How interrupt() Works

sequenceDiagram
    participant U as User
    participant A as Agent Loop
    participant T as Tool Node
    U->>A: "Delete all completed tasks"
    A->>T: tool_call: delete_completed_tasks
    T->>T: interrupt("Are you sure?")
    Note over T: Graph pauses here.<br/>State is saved to checkpointer.
    T-->>U: "⚠️ About to delete 5 tasks. Confirm? (yes/no)"
    U->>A: resume with {"approved": true}
    A->>T: Resume execution
    T->>T: Execute deletion
    T-->>U: "Deleted 5 completed tasks."
Fig 2: HITL flow — agent pauses mid-tool for human approval

Implementation

# agent/tools.py — a tool that interrupts for confirmation
from langgraph.types import interrupt
from pydantic import BaseModel, Field

class DeleteCompletedInput(BaseModel):
    confirm_message: str = Field(
        description="Message to show the user when asking for confirmation"
    )

@tool("delete_completed_tasks", args_schema=DeleteCompletedInput)
def delete_completed_tasks(confirm_message: str) -> str:
    """Delete all tasks marked as 'done'. Requires human confirmation."""
    done_tasks = [t for t in _tasks.values() if t["status"] == "done"]

    if not done_tasks:
        return "No completed tasks to delete."

    # ── INTERRUPT: pause and ask the human ───────────────────────────────────
    approval = interrupt({
        "question": confirm_message
        or f"About to permanently delete {len(done_tasks)} completed task(s). Continue?",
        "tasks_to_delete": [t["title"] for t in done_tasks],
        "action": "delete_completed_tasks",
    })

    # Execution resumes here after the human responds
    if not approval.get("approved", False):
        return "Deletion cancelled by user."

    # Execute the deletion
    for task in done_tasks:
        del _tasks[task["id"]]

    return f"Successfully deleted {len(done_tasks)} completed task(s)."
# agent/main.py — handling the interrupt in your API/CLI layer
from langgraph.types import Command

def chat_with_hitl(app, thread_id: str, user_input: str):
    config = {"configurable": {"thread_id": thread_id}}

    # First invoke — may pause at an interrupt
    for event in app.stream(
        {"messages": [HumanMessage(content=user_input)]},
        config=config,
        stream_mode="values",
    ):
        if "__interrupt__" in event:
            interrupt_data = event["__interrupt__"][0].value
            print(f"\n⚠️  CONFIRMATION REQUIRED")
            print(f"   {interrupt_data['question']}")
            print(f"   Tasks: {', '.join(interrupt_data['tasks_to_delete'])}")

            answer = input("   Approve? (yes/no): ").strip().lower()
            approved = answer in ("yes", "y")

            # Resume the graph with the human's decision
            for resume_event in app.stream(
                Command(resume={"approved": approved}),
                config=config,
                stream_mode="values",
            ):
                last = resume_event.get("messages", [])
                if last:
                    print(f"\nAssistant: {last[-1].content}")
            return

    # No interrupt — normal completion
    state = app.get_state(config)
    if state.values["messages"]:
        print(f"\nAssistant: {state.values['messages'][-1].content}")

What happens under the hood:

  1. interrupt() raises a special exception that LangGraph catches
  2. The graph state (including the pending tool call) is saved to the checkpointer
  3. The graph returns control to your code with an __interrupt__ event
  4. When you call app.stream(Command(resume=...), config=config), LangGraph restores the state and continues from exactly where it paused
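The pause-and-resume shape is easy to internalize with a dependency-free analogy: a Python generator, where yield plays the role of interrupt() and send() plays Command(resume=...). This is only an analogy for the control flow, not how LangGraph implements checkpointing:

```python
def delete_completed(tasks: list[dict]):
    """Generator standing in for a tool that pauses for human approval."""
    done = [t for t in tasks if t["status"] == "done"]
    # yield ≈ interrupt(): execution stops here, payload goes to the human
    approval = yield {"question": f"About to delete {len(done)} task(s). Continue?"}
    # execution resumes here with the value passed to send()
    if not approval.get("approved", False):
        return "Deletion cancelled by user."
    return f"Deleted {len(done)} task(s)."

gen = delete_completed([{"status": "done"}, {"status": "open"}])
payload = next(gen)                 # run until the "interrupt"
print(payload["question"])          # About to delete 1 task(s). Continue?
try:
    gen.send({"approved": True})    # ≈ Command(resume={"approved": True})
except StopIteration as stop:
    print(stop.value)               # Deleted 1 task(s).
```

The key difference in LangGraph is durability: the paused state lives in the checkpointer, so the resume can happen in a different process, hours later.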

Tool Design Checklist

Before shipping any tool to production:

| Check | Why |
| --- | --- |
| ✅ Pydantic args_schema with Field(description=...) | The LLM reads descriptions to know how to call the tool |
| ✅ Returns str error messages (not exceptions) | Prevents tool errors from crashing the agent loop |
| ✅ Validates inputs explicitly | Don't trust the LLM to pass well-formed data |
| ✅ Uses config: RunnableConfig for user context | No user IDs or secrets in the message stream |
| ✅ interrupt() for irreversible actions | Delete, send, pay, deploy — always ask first |
| ✅ Logs tool calls with input/output (see Ch 7) | Essential for debugging production issues |

.env.example

# .env.example
OPENAI_API_KEY=your-api-key-here

Summary

| Pattern | Problem Solved |
| --- | --- |
| AgentContext via config["configurable"] | Pass user context to tools without message pollution |
| NeMo Guardrails (input rail) | Block harmful or off-topic requests before the LLM sees them |
| NeMo Guardrails (output rail) | Prevent the agent from returning harmful content |
| interrupt() + Command(resume=...) | Pause for human approval before irreversible actions |
| Tool error strings (not exceptions) | Tools fail gracefully without crashing the agent loop |

In the next chapter, we add proper memory management — giving the agent persistent long-term knowledge via vector stores and controlling the growth of conversation history.


← Ch 4: Build Your First Agent | Ch 6: Memory Management →