Overview
Context engineering addresses three core challenges:

- Token Limit Management - Prevent exceeding the model’s maximum input tokens
- Relevant Context Retention - Keep recent, important information while discarding noise
- Context Isolation - Separate main agent and sub-agent conversation contexts
Automatic Summarization
Configuration
Defined in `config.yaml` under the `summarization` key:
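A sketch of what this block might look like; the nesting is an assumption based on the keys cited elsewhere on this page (`summarization.trigger.value`, `summarization.keep.value`, `trim_tokens_to_summarize`, `summary_prompt`), so the exact structure may differ:

```yaml
summarization:
  model: null                  # null → lightweight model (see Model Selection)
  trigger:
    type: fraction             # fraction | tokens | messages
    value: 0.8
  keep:
    type: messages             # messages | tokens | fraction
    value: 20
  trim_tokens_to_summarize: 4000
  summary_prompt: null         # null → default prompt
```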
Trigger Types
Summarization activates when any trigger threshold is met:

1. Fraction Trigger
- Calculates current token count via the model’s tokenizer
- Compares against the model’s `max_input_tokens` config
- Triggers at `0.8 * max_input_tokens`
2. Tokens Trigger
- Counts tokens in message history
- Triggers when count exceeds 4000
3. Messages Trigger
- Counts messages in conversation
- Triggers at 50 messages
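The three triggers above amount to a single threshold check; a minimal sketch (illustrative, not the `SummarizationMiddleware` source):

```python
# Sketch of the trigger logic: any one threshold being met starts summarization.
def should_summarize(token_count, message_count, max_input_tokens,
                     trigger_type="fraction", value=0.8):
    if trigger_type == "fraction":
        return token_count >= value * max_input_tokens   # e.g. 0.8 * limit
    if trigger_type == "tokens":
        return token_count >= value                      # e.g. 4000 tokens
    if trigger_type == "messages":
        return message_count >= value                    # e.g. 50 messages
    raise ValueError(f"unknown trigger type: {trigger_type}")

# 0.8 * 8192 ≈ 6554, so 7000 tokens trips the fraction trigger
assert should_summarize(7000, 10, 8192) is True
assert should_summarize(5000, 10, 8192) is False
```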
Keep Policies
After summarization triggers, the keep policy determines how much recent context to preserve.

Messages Keep
- Keeps most recent 20 messages verbatim
- Summarizes everything before that
- Final history: `[SystemMessage(summary), ...last 20 messages]`
Tokens Keep
- Calculates token count backwards from last message
- Keeps messages until total reaches ~3000 tokens
- Summarizes remainder
Fraction Keep
- Calculates `0.3 * model.max_input_tokens`
- Keeps that many tokens from the end of history
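All three keep policies reduce to a backwards walk from the newest message; a toy sketch of the token-budget variant (hypothetical helper with a stand-in token counter):

```python
# Walk backwards from the newest message, keeping messages until the
# budget (~3000 tokens for Tokens Keep) is exhausted; the rest is summarized.
def split_by_token_budget(messages, count_tokens, budget=3000):
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if kept and total + cost > budget:
            break                          # budget spent; older messages get summarized
        kept.append(msg)                   # always keep at least the newest message
        total += cost
    kept.reverse()
    return messages[: len(messages) - len(kept)], kept   # (to_summarize, to_keep)

msgs = ["a" * 40, "b" * 40, "c" * 40]      # pretend each is 10 tokens (chars/4)
to_sum, to_keep = split_by_token_budget(msgs, lambda m: len(m) // 4, budget=20)
assert to_keep == msgs[1:] and to_sum == msgs[:1]
```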
Model Selection
- `null` (default): Uses a lightweight model via `create_chat_model(thinking_enabled=False)`
- Model name string: Uses the specified model from the `config.yaml` models list
Implementation
Summarization is handled by LangChain’s `SummarizationMiddleware`, configured in `backend/src/agents/lead_agent/agent.py:41-80`:
Summarization Process
1. Trigger Check (`before_model`):
- Count current tokens/messages
- Compare against all trigger thresholds
- If any threshold is met, proceed to step 2
2. Message Preparation:
- Split history into `to_summarize` and `to_keep`
- Trim `to_summarize` to `trim_tokens_to_summarize` tokens (default 4000); this prevents overwhelming the summarization model
3. Summary Generation:
- Invoke the model with the `to_summarize` messages
- Default prompt: “Summarize the following conversation concisely”
- Custom prompt via `summary_prompt` config
4. History Reconstruction:
- Create a `SystemMessage` with the summary
- Append `to_keep` messages (recent context)
- Replace the state’s message history
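The four steps can be condensed into a small sketch (hypothetical `summarize_history` helper; the real logic lives in LangChain’s `SummarizationMiddleware`, and the trim-to-`trim_tokens_to_summarize` step is omitted for brevity):

```python
def summarize_history(messages, summarize_fn, keep_last=20):
    """messages: list of (role, text) pairs; summarize_fn folds the old
    messages into one summary string (a stand-in for the LLM call)."""
    if len(messages) <= keep_last:
        return messages                      # nothing old enough to fold up
    # step 2: split into to_summarize / to_keep
    to_summarize, to_keep = messages[:-keep_last], messages[-keep_last:]
    summary = summarize_fn(to_summarize)     # step 3: summary generation
    # step 4: [SystemMessage(summary), ...last `keep_last` messages]
    return [("system", summary)] + to_keep

history = [("user", f"msg {i}") for i in range(30)]
rebuilt = summarize_history(history, lambda ms: f"summary of {len(ms)} messages")
assert rebuilt[0] == ("system", "summary of 10 messages")
assert rebuilt[1:] == history[-20:]
```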
Memory Injection
DeerFlow’s memory system complements summarization by injecting persistent facts into every turn.

Memory Structure
Stored in `backend/.deer-flow/memory.json`:
Injection Process
Location: System prompt template in `backend/src/agents/lead_agent/prompt.py` (memory updates are handled in `backend/src/agents/memory/updater.py`):
Memory Configuration
- `max_injection_tokens: 2000` ensures memory doesn’t dominate the prompt
- Priority: User context > Recent history > Top 15 facts (by confidence)
- If the budget is exceeded, the oldest/lowest-confidence facts are truncated first
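A sketch of this priority-and-budget behavior (hypothetical `build_memory_block` helper; field names follow this page, not the DeerFlow source):

```python
# Fill the injection budget in priority order; once the budget runs out,
# the remaining (lowest-priority) items are dropped.
def build_memory_block(user_context, recent_history, facts,
                       max_injection_tokens=2000,
                       count_tokens=lambda s: len(s) // 4):
    # Priority: user context > recent history > top 15 facts by confidence
    top_facts = sorted(facts, key=lambda f: f["confidence"], reverse=True)[:15]
    lines, budget = [], max_injection_tokens
    for line in [user_context, recent_history] + [f["text"] for f in top_facts]:
        cost = count_tokens(line)
        if cost > budget:
            break                          # lowest-priority items drop out first
        lines.append(line)
        budget -= cost
    return "<memory>\n" + "\n".join(lines) + "\n</memory>"

facts = [{"text": "prefers Python", "confidence": 0.9},
         {"text": "works on DeerFlow", "confidence": 0.6}]
block = build_memory_block("User: Alice", "Recent: asked about triggers",
                           facts, max_injection_tokens=60, count_tokens=len)
assert "prefers Python" in block and "works on DeerFlow" not in block
```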
Memory Update Flow
1. Queue (`MemoryMiddleware.after_agent`)
2. Debounce (`MemoryQueue`):
- Waits `debounce_seconds` (default 30s)
- Batches multiple turns if the conversation continues
- Deduplicates per-thread updates
3. Extract (`MemoryUpdater`):
- Invokes the LLM with conversation history
- Extracts new facts, updates context summaries
- Assigns confidence scores (0-1)
4. Persist (atomic file I/O)
5. Inject (next turn):
- Load memory from `storage_path`
- Format for injection (trim to `max_injection_tokens`)
- Insert into the system prompt inside `<memory>` tags
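The queue/debounce steps can be approximated with a small threading sketch (illustrative `DebouncedQueue`; not the actual `MemoryQueue` implementation):

```python
import threading
import time

class DebouncedQueue:
    """Hold updates for `debounce_seconds` after the last enqueue,
    deduplicating per thread, then flush one batch."""
    def __init__(self, debounce_seconds=30, flush=print):
        self.delay = debounce_seconds
        self.flush = flush
        self.pending = {}
        self.timer = None
        self.lock = threading.Lock()

    def enqueue(self, thread_id, update):
        with self.lock:
            self.pending[thread_id] = update     # dedupe: keep latest per thread
            if self.timer is not None:
                self.timer.cancel()              # conversation continued: restart window
            self.timer = threading.Timer(self.delay, self._drain)
            self.timer.start()

    def _drain(self):
        with self.lock:
            batch, self.pending = self.pending, {}
        if batch:
            self.flush(batch)                    # e.g. hand off to the extractor

out = []
q = DebouncedQueue(debounce_seconds=0.05, flush=out.append)
q.enqueue("t1", "fact A"); q.enqueue("t1", "fact A (revised)")  # same thread → dedupe
time.sleep(0.3)                                  # wait out the debounce window
assert out == [{"t1": "fact A (revised)"}]
```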
Context Isolation for Sub-Agents
Sub-agents run in completely isolated contexts to prevent pollution of the main conversation.

Motivation
Problem: Without isolation, a sub-agent’s exploration pollutes the main context.

Implementation
Task Tool (`backend/src/tools/builtins/task_tool.py:28-78`):
Sub-agent executor (`backend/src/subagents/executor.py:200-250`):
Context Sharing
While conversation context is isolated, file system access is shared:

Shared:
- File system (via `thread_id` from parent)
- Sandbox environment
- Physical directories: `backend/.deer-flow/threads/{parent_thread_id}/`

Isolated:
- Message history
- State (artifacts, todos, viewed_images)
- LLM context window
- Checkpoints (separate thread IDs)
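The shared/isolated split can be illustrated with a toy context object (hypothetical names; not the DeerFlow API):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    thread_id: str        # checkpoint identity (isolated per agent)
    workdir: str          # physical directory (shared with parent)
    messages: list = field(default_factory=list)  # LLM context (isolated)

def spawn_subagent(parent: AgentContext) -> AgentContext:
    """Fresh thread ID and empty message history; working directory shared."""
    return AgentContext(
        thread_id=str(uuid.uuid4()),  # isolated: separate checkpoints
        workdir=parent.workdir,       # shared: same sandbox / file system
        messages=[],                  # isolated: no inherited conversation
    )

main = AgentContext("t-123", "backend/.deer-flow/threads/t-123/")
sub = spawn_subagent(main)
assert sub.workdir == main.workdir and sub.thread_id != main.thread_id
```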
Artifact Transfer
Sub-agent artifacts can be inherited by the main agent:

Token Accounting
Counting Tokens
DeerFlow uses model-specific tokenizers:

Token Budget Breakdown
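A hedged sketch of model-specific counting, using `tiktoken` when available and falling back to a rough chars/4 estimate (the actual tokenizer selection is per-model and may differ):

```python
def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    """Prefer the model's tokenizer; fall back to a chars/4 approximation."""
    try:
        import tiktoken                                   # optional dependency
        return len(tiktoken.encoding_for_model(model).encode(text))
    except Exception:                                     # missing lib or unknown model
        return max(1, len(text) // 4)
```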
Typical token distribution for an 8K context model:
- Set `summarization.trigger.value: 0.6` (60% threshold)
- Set `summarization.keep.value: 0.4` (keep 40% = 3,200 tokens)
- Set `memory.max_injection_tokens: 500` (6% of total)
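The budget above works out as follows (a simple arithmetic check, assuming an ~8,000-token window):

```python
# Verifying the 8K budget numbers: 60% trigger, 40% keep, 500-token memory.
MAX_INPUT = 8000
assert int(0.6 * MAX_INPUT) == 4800         # summarization triggers here
assert int(0.4 * MAX_INPUT) == 3200         # recent context kept verbatim
assert round(500 / MAX_INPUT * 100) == 6    # memory injection ≈ 6% of window
```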
Performance Impact
Summarization:
- Additional LLM call: ~1-2s latency
- Cost: ~$0.01 per summarization (with GPT-4o-mini)
- Frequency: every 50 messages or at 80% of the token limit

Memory injection:
- Negligible latency (cached, loaded from disk)
- Adds ~500 tokens to every request
- Cost: ~$2/M tokens

Sub-agent isolation:
- No additional token cost (separate contexts)
- Storage cost: separate checkpoint per sub-agent thread
- Cleanup: periodic pruning of old sub-agent threads
Best Practices
1. Choose Appropriate Triggers
2. Balance Keep Policy
3. Memory Token Budget
4. Sub-Agent Usage
Use sub-agents when:
- Task requires extensive exploration (e.g., “analyze this codebase”)
- Output is verbose (e.g., “run linter on all files”)
- Want to isolate errors (e.g., “try multiple approaches until one works”)

Avoid sub-agents when:
- Task is simple (e.g., “read this file”)
- Need tight coordination (e.g., “implement feature and write tests together”)
- Context sharing critical (e.g., “continue from where we left off”)
Monitoring & Debugging
Enable Debug Logging
Summarization Events
Memory Events
Sub-Agent Events
See Also
- Middleware Chain - How summarization middleware fits into pipeline
- Thread State - State management and persistence
- Memory System - Memory extraction and injection details