DeerFlow’s memory system provides persistent, long-term memory across conversations, learning about you and adapting over time.

Overview

Most agents forget everything when a conversation ends. DeerFlow remembers:
  • User Context: Your work, preferences, and habits
  • Conversation History: Recent and historical interactions
  • Extracted Facts: Discrete facts with confidence scores
Storage: backend/.deer-flow/memory.json
Memory is stored locally and stays under your control. No data is sent to external services.

Memory Structure

Memory is organized into three main sections:

User Context

Current state and preferences:
{
  "userContext": {
    "workContext": "Software engineer at TechCorp working on microservices",
    "personalContext": "Interested in AI/ML, prefers Python over JavaScript",
    "topOfMind": "Currently building a new API gateway service"
  }
}

History

Temporal context organization:
{
  "history": {
    "recentMonths": "Worked on authentication system (Jan-Feb 2026)",
    "earlierContext": "Led migration to Kubernetes (Q4 2025)",
    "longTermBackground": "Joined TechCorp in 2023, previously at StartupXYZ"
  }
}

Facts

Discrete, scored knowledge:
{
  "facts": [
    {
      "id": "fact-1",
      "content": "Uses VS Code as primary editor",
      "category": "preference",
      "confidence": 0.9,
      "createdAt": "2026-03-01T10:00:00Z",
      "source": "thread-abc123"
    },
    {
      "id": "fact-2",
      "content": "Prefers test-driven development",
      "category": "behavior",
      "confidence": 0.85,
      "createdAt": "2026-03-02T14:30:00Z",
      "source": "thread-xyz789"
    }
  ]
}

How Memory Works

1. Conversation Capture

MemoryMiddleware filters relevant messages:
class MemoryMiddleware:
    async def after_agent(self, state: ThreadState):
        # Keep user inputs and final AI responses
        relevant_messages = [
            msg for msg in state["messages"]
            if isinstance(msg, (HumanMessage, AIMessage))
            and not getattr(msg, "tool_calls", None)  # Exclude tool-calling messages; HumanMessage has no tool_calls attribute
        ]
        
        # Queue for memory update
        await memory_queue.add(state["thread_id"], relevant_messages)

2. Debounced Updates

Memory updates are debounced to batch changes:
class MemoryQueue:
    def __init__(self, debounce_seconds=30):
        self.debounce_seconds = debounce_seconds
        self.queue = {}
    
    async def add(self, thread_id, messages):
        # Add to queue (replacing any pending messages for this thread)
        self.queue[thread_id] = messages
        
        # Wait for the debounce period
        await asyncio.sleep(self.debounce_seconds)
        
        # Process the batched update
        await self.process_update(thread_id)
Default: 30 seconds after the conversation ends
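Note that a debounce should also reset its timer when new messages arrive, so several rapid turns collapse into a single update. A minimal stand-alone sketch of that behavior (the `DebouncedQueue` name and `process` callback are illustrative, not DeerFlow's actual class):

```python
import asyncio

class DebouncedQueue:
    """Sketch of a debounce that resets the timer on each new batch."""

    def __init__(self, process, debounce_seconds=30):
        self.process = process              # async callable(thread_id, messages)
        self.debounce_seconds = debounce_seconds
        self.pending = {}                   # thread_id -> (messages, timer task)

    def add(self, thread_id, messages):
        # Cancel the previous timer so rapid updates are batched into one
        old = self.pending.get(thread_id)
        if old is not None:
            old[1].cancel()
        task = asyncio.ensure_future(self._fire(thread_id))
        self.pending[thread_id] = (messages, task)

    async def _fire(self, thread_id):
        # Only the most recent (uncancelled) timer reaches this point
        await asyncio.sleep(self.debounce_seconds)
        messages, _ = self.pending.pop(thread_id)
        await self.process(thread_id, messages)
```

With this shape, three quick `add` calls for the same thread produce one `process` call containing only the latest message batch.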

3. Fact Extraction

An LLM extracts facts from the conversation:
async def extract_facts(messages: list) -> list[dict]:
    prompt = f"""
    Extract key facts from this conversation.
    
    For each fact, provide:
    - content: The fact itself
    - category: preference|knowledge|context|behavior|goal
    - confidence: 0.0-1.0 (how certain are you?)
    
    Conversation:
    {messages}
    """
    
    response = await llm.ainvoke(prompt)
    return parse_facts(response)
Confidence Scoring:
  • 0.9-1.0: Explicit statements (“I prefer X”)
  • 0.7-0.9: Strong inference (“I always use X”)
  • 0.5-0.7: Weak inference (“I might use X”)
  • Below 0.5: Discarded
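The `parse_facts` helper referenced in the snippet above is not shown; here is one plausible sketch, assuming the LLM is asked to reply with a JSON array of `{content, category, confidence}` objects (the exact response format is an assumption):

```python
import json

VALID_CATEGORIES = {"preference", "knowledge", "context", "behavior", "goal"}

def parse_facts(response_text: str, threshold: float = 0.5) -> list[dict]:
    """Validate LLM-extracted facts and drop low-confidence ones."""
    try:
        raw = json.loads(response_text)
    except json.JSONDecodeError:
        return []  # Malformed output: skip this update rather than crash

    facts = []
    for item in raw:
        if item.get("category") not in VALID_CATEGORIES:
            continue  # Unknown category: ignore the entry
        confidence = float(item.get("confidence", 0.0))
        if confidence < threshold:
            continue  # Below 0.5: discarded per the scoring rules above
        facts.append({
            "content": item["content"],
            "category": item["category"],
            "confidence": confidence,
        })
    return facts
```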

4. Memory Storage

Facts are merged with existing memory:
async def update_memory(new_facts: list):
    # Load existing memory
    memory = load_memory()
    
    # Merge new facts (deduplicate by content similarity)
    memory["facts"] = merge_facts(memory["facts"], new_facts)
    
    # Prune if exceeding max_facts
    if len(memory["facts"]) > config.max_facts:
        memory["facts"] = prune_facts(memory["facts"], config.max_facts)
    
    # Atomic write (temp file + rename)
    save_memory(memory)
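`merge_facts` and the atomic write are referenced above but not shown. A stdlib-only sketch of both is below; the 0.85 similarity cutoff is an illustrative choice, not DeerFlow's actual value:

```python
import json
import os
import tempfile
from difflib import SequenceMatcher

def merge_facts(existing: list[dict], new: list[dict],
                similarity: float = 0.85) -> list[dict]:
    """Dedup by content similarity: a new fact replaces an existing one
    whose content is close enough, otherwise it is appended."""
    merged = list(existing)
    for fact in new:
        for i, old in enumerate(merged):
            ratio = SequenceMatcher(None, old["content"].lower(),
                                    fact["content"].lower()).ratio()
            if ratio >= similarity:
                merged[i] = fact  # Newer extraction wins
                break
        else:
            merged.append(fact)
    return merged

def save_memory(memory: dict, path: str = "memory.json") -> None:
    """Atomic write: dump to a temp file, then rename over the target,
    so a crash mid-write never leaves a half-written memory file."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(memory, f, indent=2)
        os.replace(tmp_path, path)  # Atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```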

5. Context Injection

On the next conversation, memory is injected into the system prompt:
def apply_prompt_template(config: RunnableConfig) -> str:
    memory = load_memory()
    
    # Select top facts by confidence
    top_facts = sorted(
        memory["facts"],
        key=lambda f: f["confidence"],
        reverse=True,
    )[:15]
    
    # Precompute the facts list (backslashes inside f-string expressions
    # are a syntax error before Python 3.12)
    facts_text = "\n".join(
        f"- {fact['content']} (confidence: {fact['confidence']})"
        for fact in top_facts
    )
    
    prompt = f"""
    <memory>
    User Context:
    - {memory["userContext"]["workContext"]}
    - {memory["userContext"]["personalContext"]}
    
    Key Facts:
    {facts_text}
    </memory>
    """
    
    return prompt
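The configuration's `max_injection_tokens` limit implies a budgeting step before the prompt is built. A sketch using a crude chars-per-token heuristic (a real implementation would use the model's tokenizer; the function name is illustrative):

```python
def fit_facts_to_budget(facts: list[dict], max_tokens: int = 2000,
                        chars_per_token: int = 4) -> list[dict]:
    """Keep the highest-confidence facts until the rough token budget
    (max_injection_tokens) is exhausted."""
    budget = max_tokens * chars_per_token  # Approximate character budget
    selected = []
    for fact in sorted(facts, key=lambda f: f["confidence"], reverse=True):
        line = f"- {fact['content']} (confidence: {fact['confidence']})"
        if budget - len(line) < 0:
            break  # Next fact would overflow the budget
        budget -= len(line)
        selected.append(fact)
    return selected
```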

Configuration

Memory is configured in config.yaml:
memory:
  enabled: true                         # Master switch
  storage_path: .deer-flow/memory.json  # Relative to backend/
  debounce_seconds: 30                  # Wait before processing
  model_name: null                      # LLM for extraction (null = default)
  max_facts: 100                        # Maximum facts to store
  fact_confidence_threshold: 0.7        # Minimum confidence to store
  injection_enabled: true               # Inject into prompts
  max_injection_tokens: 2000            # Token limit for injection

Memory Configuration

Detailed configuration options

Fact Categories

Facts are categorized for organization:
Preference

User preferences and likes/dislikes. Examples:
  • “Prefers Python over JavaScript”
  • “Uses VS Code as primary editor”
  • “Likes dark themes”

Knowledge

User’s expertise and knowledge areas. Examples:
  • “Expert in distributed systems”
  • “Familiar with Kubernetes”
  • “Knows React and Next.js”

Context

Current situation and environment. Examples:
  • “Works at TechCorp as senior engineer”
  • “Based in San Francisco”
  • “Team size is 8 engineers”

Behavior

How the user works and makes decisions. Examples:
  • “Prefers test-driven development”
  • “Always writes documentation first”
  • “Uses git rebase instead of merge”

Goal

User’s objectives and aspirations. Examples:
  • “Learning Rust for systems programming”
  • “Building a SaaS product”
  • “Planning to migrate to microservices”
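Combined with the JSON example earlier, a fact's shape can be written down as a typed structure, plus a small helper for grouping by category (illustrative types, not DeerFlow's actual code):

```python
from typing import Literal, TypedDict

FactCategory = Literal["preference", "knowledge", "context", "behavior", "goal"]

class Fact(TypedDict):
    """Shape of one entry in the facts array."""
    id: str
    content: str
    category: FactCategory
    confidence: float
    createdAt: str   # ISO 8601 timestamp
    source: str      # Thread the fact was extracted from

def facts_by_category(facts: list[Fact]) -> dict[str, list[Fact]]:
    """Group facts under their category for display or pruning."""
    grouped: dict[str, list[Fact]] = {}
    for fact in facts:
        grouped.setdefault(fact["category"], []).append(fact)
    return grouped
```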

Memory API

Manage memory via the Gateway API:

Get Memory

curl http://localhost:8001/api/memory

Reload Memory

curl -X POST http://localhost:8001/api/memory/reload

Get Configuration

curl http://localhost:8001/api/memory/config

Memory API Reference

Complete API documentation

Python Client

Access memory programmatically:
from src.client import DeerFlowClient

client = DeerFlowClient()

# Get memory data
memory = client.get_memory()
print(memory["userContext"])
print(memory["facts"])

# Reload from disk
client.reload_memory()

# Get configuration
config = client.get_memory_config()
print(config["max_facts"])

Best Practices

Be explicit

The more explicit you are, the higher the confidence:
  • Good: “I prefer Python because it’s more readable”
  • Bad: “I guess Python is okay”

Correct mistakes

If the agent has wrong information, correct it in conversation:
“Actually, I don’t use VS Code anymore. I switched to Neovim last month.”
The system will update or replace the fact.

Review stored memory

Check the memory data for accuracy:
curl http://localhost:8001/api/memory | jq '.facts'
Manually edit backend/.deer-flow/memory.json if needed.

Tune the threshold

If you get too many low-quality facts, raise the confidence threshold:
memory:
  fact_confidence_threshold: 0.8  # Raise threshold

Memory Privacy

Memory is completely private:
  • Stored locally in backend/.deer-flow/memory.json
  • Never sent to external services
  • Only used for prompt injection in your own agent
  • Fully under your control
To delete memory:
rm backend/.deer-flow/memory.json
The system will create a new, empty memory file.
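One way the system can recreate an empty file is to fall back to the three-section structure described above whenever the path is missing. A sketch (the actual bootstrap logic may differ):

```python
import json
import os

EMPTY_MEMORY = {
    "userContext": {"workContext": "", "personalContext": "", "topOfMind": ""},
    "history": {"recentMonths": "", "earlierContext": "", "longTermBackground": ""},
    "facts": [],
}

def load_memory(path: str = "backend/.deer-flow/memory.json") -> dict:
    """If the file was deleted, start from the empty three-section
    structure instead of failing."""
    if not os.path.exists(path):
        return json.loads(json.dumps(EMPTY_MEMORY))  # Fresh deep copy
    with open(path) as f:
        return json.load(f)
```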

Troubleshooting

Memory not being saved

Check that memory is enabled:
memory:
  enabled: true
Verify the storage path exists:
ls backend/.deer-flow/memory.json

Too many low-quality facts

Raise the confidence threshold:
memory:
  fact_confidence_threshold: 0.8

Memory not appearing in prompts

Check that injection is enabled:
memory:
  injection_enabled: true
Verify facts exist:
curl http://localhost:8001/api/memory | jq '.facts | length'

Next Steps

Memory Configuration

Configure memory system

Memory API

API reference

Context Engineering

How memory injection works

Agent System

Learn about the agent