<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Context Engineering |</title><link>https://leminhnguyen.github.io/tags/context-engineering/</link><atom:link href="https://leminhnguyen.github.io/tags/context-engineering/index.xml" rel="self" type="application/rss+xml"/><description>Context Engineering</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate><image><url>https://leminhnguyen.github.io/media/icon_hu_702a800cd775dbac.png</url><title>Context Engineering</title><link>https://leminhnguyen.github.io/tags/context-engineering/</link></image><item><title>[Ch 2] AI Agent Components &amp; Context Engineering</title><link>https://leminhnguyen.github.io/blog/blog-tutorial/enterprise-ai-agent/ch2-components-context-engineering/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate><guid>https://leminhnguyen.github.io/blog/blog-tutorial/enterprise-ai-agent/ch2-components-context-engineering/</guid><description>&lt;div style="text-align: justify; font-size: 15px; margin-top: 20px"&gt;
&lt;p&gt;Knowing that an agent has a &amp;ldquo;loop&amp;rdquo; is not enough to build a good one. You need to understand what the system is made of — and exactly what the LLM sees at every step. This chapter covers both.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-four-core-components"&gt;The Four Core Components&lt;/h2&gt;
&lt;p&gt;Every AI agent system, regardless of framework, is built from four components:&lt;/p&gt;
&lt;div class="mermaid"&gt;graph LR
A[🧠 LLM Backbone] --- B[🔧 Tools]
A --- C[🗃️ Memory]
A --- D[🗺️ Planning]
style A fill:#7C3AED,color:#fff,stroke:none
style B fill:#2563EB,color:#fff,stroke:none
style C fill:#059669,color:#fff,stroke:none
style D fill:#D97706,color:#fff,stroke:none
&lt;/div&gt;
&lt;figcaption style="text-align: center; font-size: 0.9em; color: #666; margin-top: -0.5rem; margin-bottom: 2rem;"&gt;Fig 1: The four pillars of an AI agent&lt;/figcaption&gt;
&lt;h3 id="1-llm-backbone"&gt;1. LLM Backbone&lt;/h3&gt;
&lt;p&gt;The language model is the brain. It decides what to do next: whether to call a tool, what arguments to pass, and when to stop. The quality, speed, and cost of your agent depend heavily on which model you choose.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key decisions&lt;/strong&gt; (see the sketch after this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model choice&lt;/strong&gt;: GPT-4o for complex reasoning; GPT-4o-mini for simpler, high-throughput tasks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temperature&lt;/strong&gt;: Keep it low (0–0.3) for tool-calling agents — you want deterministic decisions, not creative variance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context window size&lt;/strong&gt;: Determines how much history and retrieved content the model can &amp;ldquo;see&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
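&lt;p&gt;As a concrete illustration, here is a minimal sketch of these decisions expressed in code, using the &lt;code&gt;openai&lt;/code&gt; Python SDK. The model name and temperature are example values, not a prescription:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# backbone_config.py -- a minimal sketch of the backbone decisions
# (assumes the official `openai` SDK and OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # model choice: the cheaper model for high-throughput tasks
    temperature=0.1,      # low temperature: deterministic tool-calling decisions
    messages=[
        {"role": "system", "content": "You are a document analysis assistant."},
        {"role": "user", "content": "Summarize the contract in three bullets."},
    ],
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;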
&lt;h3 id="2-tools"&gt;2. Tools&lt;/h3&gt;
&lt;p&gt;Tools are functions the LLM can call to interact with the outside world. Common categories:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Information retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web search, vector store lookup, database query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Action execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;File write, API call, form submission&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Computation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code execution, math evaluation, data parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Communication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Send email, post to Slack, create a ticket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A tool is just a function with a clear name, description, and typed parameters — the model reads the description to know when and how to use it.&lt;/p&gt;
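&lt;p&gt;For example, a hypothetical &lt;code&gt;search_documents&lt;/code&gt; tool might look like this. The sketch uses LangChain&amp;rsquo;s &lt;code&gt;@tool&lt;/code&gt; decorator (introduced properly in the next chapter), which builds the schema from the function signature and docstring; any function-calling framework works the same way:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# search_tool.py -- a sketch of a tool definition; the search_documents name
# and its backing knowledge base are illustrative placeholders
from langchain_core.tools import tool

@tool
def search_documents(query: str, max_results: int = 5) -&amp;gt; str:
    """Search the document knowledge base for passages relevant to the query.

    Use this whenever the user asks about document contents."""
    # A real implementation would query a vector store; this stub just echoes.
    return f"Top {max_results} passages for: {query}"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;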
&lt;h3 id="3-memory"&gt;3. Memory&lt;/h3&gt;
&lt;p&gt;Memory is how the agent maintains context across steps and sessions. There are four distinct memory types (covered in depth in Ch 6); the sketch after this list shows where each one lives in a single request:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;In-context&lt;/strong&gt;: The current conversation history in the message list&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Episodic&lt;/strong&gt;: Persisted conversation state (checkpointer / DB)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Semantic&lt;/strong&gt;: Long-term knowledge retrieved via RAG (vector store)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Procedural&lt;/strong&gt;: The system prompt — &amp;ldquo;how to behave&amp;rdquo; instructions baked into every request&lt;/li&gt;
&lt;/ul&gt;
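&lt;p&gt;To make the mapping concrete, here is a rough sketch of where each memory type lands when one request is assembled. The &lt;code&gt;load_history&lt;/code&gt; and &lt;code&gt;retrieve_chunks&lt;/code&gt; helpers are hypothetical stubs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# memory_layout.py -- where each memory type lands in a single LLM request
# (a rough sketch; load_history and retrieve_chunks are hypothetical stubs)
SYSTEM_PROMPT = "You are a document analysis assistant."  # procedural memory

def load_history(thread_id: str) -&amp;gt; list[dict]:
    return []  # episodic: a real agent reads this from a checkpointer / DB

def retrieve_chunks(query: str) -&amp;gt; list[str]:
    return []  # semantic: a real agent queries a vector store here

def build_messages(thread_id: str, user_msg: str) -&amp;gt; list[dict]:
    history = load_history(thread_id)
    context = "\n".join(retrieve_chunks(user_msg))
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history  # in-context: the message list the model actually sees
        + [{"role": "user", "content": f"{context}\n\n{user_msg}".strip()}]
    )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;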
&lt;h3 id="4-planning"&gt;4. Planning&lt;/h3&gt;
&lt;p&gt;Planning is the agent&amp;rsquo;s ability to decompose a complex goal into steps. This ranges from implicit (the LLM decides step-by-step via ReAct) to explicit (a dedicated &amp;ldquo;planner&amp;rdquo; LLM call generates a structured plan that an &amp;ldquo;executor&amp;rdquo; then follows).&lt;/p&gt;
&lt;p&gt;For most enterprise agents, &lt;strong&gt;implicit ReAct planning&lt;/strong&gt; is sufficient — and simpler to debug. Explicit multi-step planners are powerful but add complexity; reach for them only when the task requires long-horizon coordination across many sub-tasks.&lt;/p&gt;
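&lt;p&gt;To show what the explicit variant looks like, here is a minimal plan-then-execute sketch, assuming the &lt;code&gt;openai&lt;/code&gt; SDK. The JSON plan format is an illustrative choice, not a standard:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# planner.py -- a minimal explicit-planning sketch (assumes the `openai` SDK;
# the JSON plan format is an illustrative choice, not a standard)
import json
from openai import OpenAI

client = OpenAI()

def make_plan(goal: str) -&amp;gt; list[str]:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        response_format={"type": "json_object"},  # constrain output to valid JSON
        messages=[
            {"role": "system", "content": (
                "Decompose the user's goal into a short ordered list of steps. "
                'Reply as JSON: {"steps": ["step 1", "step 2"]}'
            )},
            {"role": "user", "content": goal},
        ],
    )
    return json.loads(response.choices[0].message.content)["steps"]

# An executor (e.g. a ReAct agent) would then work through these steps in order.
plan = make_plan("Compare payment terms across our three vendor contracts")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;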
&lt;hr&gt;
&lt;h2 id="prompt-engineering-vs-context-engineering"&gt;Prompt Engineering vs. Context Engineering&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a distinction that will save you weeks of frustration in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt; is about &lt;em&gt;what words you use&lt;/em&gt; — how to phrase instructions to get the model to behave the way you want. It&amp;rsquo;s the craft of writing good system prompts, user messages, and few-shot examples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context Engineering&lt;/strong&gt; is about &lt;em&gt;what the model actually sees&lt;/em&gt; — managing everything that occupies the context window at each step of the agent loop.&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Context Engineering is a superset of Prompt Engineering.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In a single LLM API call, you have to write a good prompt. In an agent system, you have to do that &lt;em&gt;and&lt;/em&gt; manage every byte in the context window across potentially dozens of steps.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="anatomy-of-a-context-window"&gt;Anatomy of a Context Window&lt;/h2&gt;
&lt;p&gt;Every time the LLM is called, it receives a context window. In an agent loop, that window contains multiple layers:&lt;/p&gt;
&lt;div class="mermaid"&gt;block-beta
columns 1
block:context["Context Window (e.g. 128,000 tokens)"]
A["📋 System Prompt\n(instructions, persona, output format)"]
B["🔧 Tool Schemas\n(name, description, parameters for each tool)"]
C["💬 Conversation History\n(all prior messages + tool results)"]
D["📚 Retrieved Documents\n(RAG chunks injected for this turn)"]
E["❓ Current User Message"]
end
style A fill:#7C3AED,color:#fff
style B fill:#2563EB,color:#fff
style C fill:#059669,color:#fff
style D fill:#D97706,color:#fff
style E fill:#DC2626,color:#fff
&lt;/div&gt;
&lt;figcaption style="text-align: center; font-size: 0.9em; color: #666; margin-top: 0.5rem; margin-bottom: 2rem;"&gt;Fig 2: Everything competing for space in the context window&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;The crucial insight:&lt;/strong&gt; tools eat tokens even when they&amp;rsquo;re not used.&lt;/p&gt;
&lt;p&gt;A typical tool schema (name + description + parameter definitions) costs 100–300 tokens. If your agent has 15 tools, that&amp;rsquo;s 1,500–4,500 tokens consumed before any conversation history or retrieved documents. For an agent with a 32K token budget running a long conversation, this becomes a real constraint.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="token-budget-a-practical-example"&gt;Token Budget: A Practical Example&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s measure the real cost of each context component:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# token_budget.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;tiktoken&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# GPT-4o uses the o200k_base encoding&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;ENCODING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tiktoken&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_encoding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;o200k_base&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ENCODING&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# --- Simulate context components ---&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;You are an expert document analyst assistant. You help users
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;understand and extract information from complex technical documents. You are precise,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;thorough, and always cite your sources. When a document is provided, read it carefully
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;before answering. Format your responses in clear markdown.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;TOOL_SCHEMA_EXAMPLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;Tool: search_documents
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;Description: Search through the document knowledge base for relevant passages.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;Parameters:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; - query (string, required): The search query to find relevant document sections
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; - max_results (integer, optional): Maximum number of results to return (default: 5)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; - document_filter (string, optional): Filter results to a specific document by name&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Simulate 10 back-and-forth turns&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;CONVERSATION_HISTORY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;User: Can you find all mentions of the payment terms in the contract?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;Assistant: [calls search_documents(query=&amp;#34;payment terms&amp;#34;)]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;Tool Result: Found 3 sections mentioning payment terms: Section 4.2, 7.1, and Appendix B.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;Assistant: I found payment terms in three locations. Section 4.2 covers...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;[... 9 more turns ...]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# multiply to simulate a longer conversation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;RETRIEVED_DOCS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;[Retrieved chunk 1 - Section 4.2]: Payment shall be made within 30 days of invoice...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;[Retrieved chunk 2 - Section 7.1]: Late payment penalties apply at 1.5% per month...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;[Retrieved chunk 3 - Appendix B]: Extended payment plans available upon request...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="c1"&gt;# 5 retrieved chunks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;CURRENT_QUERY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;What is the penalty for paying 60 days late?&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# --- Count and report ---&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;components&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;System Prompt&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Tool Schema (x10 tools estimate)&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TOOL_SCHEMA_EXAMPLE&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Conversation History (10 turns)&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CONVERSATION_HISTORY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Retrieved Documents (5 chunks)&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RETRIEVED_DOCS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Current User Query&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CURRENT_QUERY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Component&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;40&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Tokens&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;% o&lt;/span&gt;&lt;span class="s1"&gt;f 32K&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;10&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;62&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;components&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;40&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;9.1f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;62&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;TOTAL&amp;#39;&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;40&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;8,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;9.1f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Remaining budget (32K model): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32768&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; tokens&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;Remaining budget (128K model): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; tokens&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Running this produces output like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Component&lt;/span&gt; &lt;span class="n"&gt;Tokens&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;--------------------------------------------------------------&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;System&lt;/span&gt; &lt;span class="n"&gt;Prompt&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Tool&lt;/span&gt; &lt;span class="n"&gt;Schema&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x10&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="n"&gt;estimate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;430&lt;/span&gt; &lt;span class="mf"&gt;4.4&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Conversation&lt;/span&gt; &lt;span class="n"&gt;History&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;710&lt;/span&gt; &lt;span class="mf"&gt;5.2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Retrieved&lt;/span&gt; &lt;span class="n"&gt;Documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mi"&gt;975&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Current&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;--------------------------------------------------------------&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;TOTAL&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;183&lt;/span&gt; &lt;span class="mf"&gt;12.8&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Remaining&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;585&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;Remaining&lt;/span&gt; &lt;span class="n"&gt;budget&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mi"&gt;126&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;889&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Ten turns into a conversation with a moderate number of tools and RAG chunks, you&amp;rsquo;ve used only 13% of a 32K window — but that grows fast. At 30+ turns with larger documents, you&amp;rsquo;ll hit limits and need to trim.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="context-engineering-strategies"&gt;Context Engineering Strategies&lt;/h2&gt;
&lt;p&gt;Once you understand what&amp;rsquo;s in the context window, you can manage it deliberately:&lt;/p&gt;
&lt;h3 id="1-system-prompt--keep-it-focused"&gt;1. System Prompt — Keep It Focused&lt;/h3&gt;
&lt;p&gt;The system prompt runs on every single call. Every unnecessary sentence costs tokens across your entire agent&amp;rsquo;s lifetime. Write precisely:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ❌ Vague, padded&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;BAD_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;You are a very helpful and friendly AI assistant. You are smart and knowledgeable
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;about many things. You always try to be nice and helpful to users. When users ask
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;questions, you do your best to answer them accurately and helpfully...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ✅ Precise, structured&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;GOOD_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;You are a document analysis assistant.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;CAPABILITIES:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- Search documents using search_documents tool
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- Extract and cite specific passages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- Summarize findings in structured markdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;CONSTRAINTS:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- Only answer questions about documents in the knowledge base
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- Always cite the source section when quoting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;- If information is not found, say so explicitly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="2-tool-schemas--prune-unused-tools"&gt;2. Tool Schemas — Prune Unused Tools&lt;/h3&gt;
&lt;p&gt;Don&amp;rsquo;t pass every tool on every call. If your agent has a &amp;ldquo;create_document&amp;rdquo; tool but the user just asked a read-only question, remove it from the current call. In LangGraph, you can swap the tool list dynamically per node.&lt;/p&gt;
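&lt;p&gt;A simple version of this pruning might look like the sketch below; the keyword-based intent check is deliberately naive, purely to show the shape:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# tool_pruning.py -- pass only the tools relevant to the current request
# (a sketch; the keyword-based intent check is deliberately naive)
READ_TOOLS = ["search_documents", "get_document_metadata"]
WRITE_TOOLS = ["create_document", "update_document"]

def select_tools(user_msg: str, all_tools: dict) -&amp;gt; list:
    wanted = list(READ_TOOLS)
    if any(w in user_msg.lower() for w in ("create", "update", "write", "draft")):
        wanted += WRITE_TOOLS  # expose write tools only when the intent needs them
    return [all_tools[name] for name in wanted if name in all_tools]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;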
&lt;h3 id="3-conversation-history--trim-aggressively"&gt;3. Conversation History — Trim Aggressively&lt;/h3&gt;
&lt;p&gt;Keep a sliding window of the last N turns, or summarize old turns into a single &amp;ldquo;context block.&amp;rdquo; We&amp;rsquo;ll implement this in Ch 6.&lt;/p&gt;
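&lt;p&gt;As a preview of Ch 6, the sliding-window variant is only a few lines. This sketch assumes OpenAI-style message dicts and always keeps the system message:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# trim_history.py -- keep the system message plus the last N messages
# (a sketch assuming OpenAI-style message dicts; Ch 6 builds the full version)
def trim_messages(messages: list[dict], keep_last: int = 10) -&amp;gt; list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;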
&lt;h3 id="4-rag-chunks--quality-over-quantity"&gt;4. RAG Chunks — Quality Over Quantity&lt;/h3&gt;
&lt;p&gt;Don&amp;rsquo;t inject 20 retrieved chunks &amp;ldquo;just in case.&amp;rdquo; Use a relevance threshold: only inject chunks above a similarity score of 0.75. Three highly relevant chunks beat ten mediocre ones — and cost far less context.&lt;/p&gt;
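&lt;p&gt;In code, the threshold filter is trivial. This sketch assumes your retriever returns (text, score) pairs with similarity scores in [0, 1]:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# rag_filter.py -- inject only chunks above a similarity threshold
# (a sketch; assumes the retriever yields (text, score) pairs, scores in [0, 1])
def filter_chunks(
    scored_chunks: list[tuple[str, float]],
    threshold: float = 0.75,
    max_chunks: int = 3,
) -&amp;gt; list[str]:
    relevant = [(t, s) for t, s in scored_chunks if s &amp;gt;= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)  # best chunks first
    return [t for t, _ in relevant[:max_chunks]]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;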
&lt;hr&gt;
&lt;h2 id="the-mental-model"&gt;The Mental Model&lt;/h2&gt;
&lt;p&gt;Think of the context window as a &lt;strong&gt;whiteboard&lt;/strong&gt; that the LLM reads before every response. You control everything on that whiteboard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What instructions appear (system prompt)&lt;/li&gt;
&lt;li&gt;What tools are available (tool schemas)&lt;/li&gt;
&lt;li&gt;What happened before (conversation history)&lt;/li&gt;
&lt;li&gt;What facts are relevant (retrieved documents)&lt;/li&gt;
&lt;li&gt;What the user just asked (current message)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Context Engineering is the discipline of making every token on that whiteboard count.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="envexample"&gt;.env.example&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# .env.example&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-api-key-here
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;💡 &lt;strong&gt;Ollama note:&lt;/strong&gt; The token counting above uses &lt;code&gt;tiktoken&lt;/code&gt; (OpenAI&amp;rsquo;s tokenizer). Ollama models use different tokenizers (Llama 2 uses SentencePiece; Llama 3 uses a tiktoken-style BPE). Token counts will vary, but the principles — keep system prompts tight, prune unused tools, trim history — apply universally.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Key Takeaway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Backbone&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The decision-maker; choose model + temperature carefully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Functions the LLM can call; each schema costs tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Four types: in-context, episodic, semantic, procedural&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Planning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implicit (ReAct) vs. explicit; start with implicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Craft of writing good instructions and examples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managing &lt;em&gt;everything&lt;/em&gt; in the context window across all steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Token Budget&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;System prompt + tool schemas + history + RAG + query = limited budget&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In the next chapter, we get hands-on with LangChain and LangGraph — the frameworks that automate the agent loop we&amp;rsquo;ve been building mentally.&lt;/p&gt;
&lt;hr&gt;
&lt;/div&gt;</description></item></channel></rss>