LEARN

How AI Agents Actually Work

Not a chatbot. An agent understands, plans, executes, and delivers. Here is what happens under the hood.

TL;DR

An AI agent is an LLM that can take actions, not just generate text. You give it a task, it builds a plan, calls tools (APIs, databases, services) step by step, and delivers a completed result. The difference between a chatbot and an agent is the difference between a recipe and a chef.

What an AI Agent Is (and What It Is Not)

A chatbot takes your message and returns a text response. It is a conversational interface to a language model. You ask a question, it answers. That is the entire interaction loop.

An AI agent takes your message, interprets it as a task, and works to complete it. It can read your calendar, check a CRM, draft an email, update a spreadsheet, and send a confirmation, all from a single instruction like "Schedule a follow-up with the Johnson account for next Tuesday at 2pm."

The difference is agency: the ability to take actions in external systems, observe results, and decide what to do next. According to Anthropic's research on building effective agents, the most reliable agent architectures are not complex frameworks but simple loops where an LLM repeatedly calls tools and evaluates results until the task is done.


The 4-Phase Execution Model

Every agent task follows a consistent execution loop. Whether the agent is booking a meeting or updating a pipeline, the underlying pattern is the same:

  [User Task]
       |
       v
  Phase 1: UNDERSTANDING
  Parse the request. Identify entities,
  constraints, and required tools.
       |
       v
  Phase 2: PLANNING
  Generate a step-by-step plan.
  Output: JSON array of actions.
       |
       v
  Phase 3: EXECUTING
  Call tools sequentially.
  Each step receives the previous result.
       |
       v
  Phase 4: COMPLETE
  Compile results. Deliver summary.
  Report what was done.

Phase 1: Understanding

The agent parses your natural language input and extracts the structured information it needs: what is the task, who is involved, what are the constraints (time, format, dependencies), and which tools will be required. This is where intent recognition happens. "Move the Chen deal to negotiation and notify sales" becomes two distinct operations: a CRM update and a notification dispatch.

Phase 2: Planning

The agent generates a structured plan, typically as a JSON array of steps. Each step specifies the tool to call, the arguments to pass, and the expected output. The plan is not freeform text. It is a machine-readable execution blueprint that the runtime can validate before any tool is called.

  {
    "plan": [
      {
        "step": 1,
        "tool": "crm.updateDeal",
        "args": { "dealId": "chen-0042", "stage": "negotiation" },
        "expect": "confirmation"
      },
      {
        "step": 2,
        "tool": "notifications.send",
        "args": { "channel": "sales", "message": "Chen deal moved to negotiation" },
        "expect": "delivered"
      }
    ]
  }

Phase 3: Executing

The runtime walks through the plan and calls each tool in sequence. After each call, the result is fed back to the agent. If a step fails, the agent can retry, adjust the plan, or skip to the next step with a note about what went wrong. This observe-act loop is what separates agents from simple API scripts: the agent can adapt mid-execution.

Phase 4: Complete

Once all steps are done, the agent compiles a summary of what it accomplished. This summary typically includes which actions succeeded, which failed, and any data that was created or modified. The user sees a clear accounting of what the agent did, not a wall of raw API responses.

Understanding Planning Executing Complete

How Tool Calling Works

Tool calling is the mechanism that gives agents their agency. Without it, an LLM can only produce text. With it, the LLM can interact with databases, APIs, file systems, and any service that exposes a callable interface.

The process works like this: the agent runtime provides a list of available tools, each defined by a name, description, and input schema. When the agent decides to use a tool, it outputs a structured JSON object (not prose) specifying which tool to call and what arguments to pass. The runtime validates the call, executes it, and returns the result to the agent.

  Agent receives: "Check my calendar for Tuesday"
       |
       v
  Agent outputs tool call:
  { "tool": "calendar.getEvents", "args": { "date": "2026-04-14" } }
       |
       v
  Runtime executes: GET /calendar/events?date=2026-04-14
       |
       v
  Runtime returns result to agent:
  { "events": [{ "time": "10:00", "title": "Team standup" }, ...] }
       |
       v
  Agent processes result and decides next action

The key insight is that tool calls are sequential and observable. Each call produces a result that the agent can inspect before deciding what to do next. This is fundamentally different from a pre-programmed script, which executes a fixed sequence regardless of intermediate results. OpenAI's function calling documentation describes this as "letting the model intelligently choose to output a JSON object containing arguments to call one or many functions," and this pattern has become the standard across all major LLM providers.


Structured Output: Why JSON, Not Prose

Agents produce structured output at every stage. The plan is JSON. Tool calls are JSON. Step results are JSON. The final summary may be natural language, but every intermediate step is machine-readable.

This matters for three reasons:

  • Validation. You can check that the plan is well-formed before executing it. A JSON schema can verify that required fields are present and types are correct.
  • Debugging. When something goes wrong, you can inspect the exact tool call, arguments, and result at each step. There is no ambiguity in a JSON trace.
  • Composability. Structured output from one agent can be passed directly to another agent or system. JSON is the universal interchange format. An agent that outputs prose requires another parsing step before its results can be used programmatically.

Google DeepMind's research on structured generation for LLMs shows that constraining model output to valid JSON schemas reduces error rates and makes agent behavior more predictable, which is why production agent frameworks enforce structured output at the protocol level.


Real-World Applications

The four-phase model applies to any task that involves calling external systems. Here are the patterns that show up most in production:

  • Scheduling. Parse a meeting request, check calendar availability, find a mutual slot, create the event, send invitations. Five tool calls, fully automated.
  • CRM automation. Score an inbound lead based on firmographic data, assign it to the right rep, create a deal in the pipeline, schedule a follow-up task. The agent replaces a manual process that touches 3-4 different screens.
  • Email triage. Read incoming messages, classify by urgency and topic, draft responses for routine requests, flag complex items for human review, apply labels. An agent can process a full inbox in seconds.
  • Workflow orchestration. When a contract is signed, update the CRM, trigger onboarding, create project folders, schedule a kickoff call, and notify the team. One trigger, one agent, five systems updated.
  • Content pipelines. Research a topic, outline an article, draft sections, check facts against sources, format for publishing. Each step uses different tools (search, generation, retrieval, formatting).

The common thread is sequential tool calling with decision-making between steps. The agent does not execute a fixed script. It adapts based on what each tool returns.


See It in Action

The AI Agents demo lets you type any task and watch the agent think through its plan, call tools, and deliver results in real time. You can see each phase of the execution model as it happens.


Sources & Further Reading

  1. Anthropic: Building Effective Agents (2024). Design patterns for reliable agent architectures, augmented LLM loops, and when to use frameworks vs simple tool-calling loops.
  2. OpenAI: Function Calling Guide. The standard tool-calling protocol: how models output structured JSON to invoke functions, and how results are fed back into the conversation.
  3. Google DeepMind: Structured Generation and Constrained Decoding for LLMs (2025). Why constraining LLM output to valid schemas improves reliability in agent pipelines.
  4. Google Cloud: What Are AI Agents. Overview of agent architectures, tool use patterns, and the distinction between reactive and planning-based agents.
  5. Yao et al.: ReAct: Synergizing Reasoning and Acting in Language Models (2022). The foundational paper on interleaving reasoning traces with tool actions, which underpins modern agent execution loops.

More where that came from.

Back to all demos →