Open Source · Local-First · MIT Licensed
Store everything.
Retrieve what worked.
More context doesn't mean better results. GPT-4o drops from 99% to 70% accuracy at 32K tokens. Opus loses 1 in 5 facts at 1M. OpenExp flips the approach: write everything to a vector DB, then use Q-learning indexes (exps) to surface only what actually helped in similar situations.
The real problem isn't forgetting
It's retrieving the wrong things. More context = worse accuracy. The data is there.
Context degrades with scale
GPT-4o: 99.3% → 69.7% at 32K tokens. Opus: 78.3% on MRCR v2 at 1M. Real-world degradation starts at 400K. After 600K, retrieval is unreliable. Stuffing more into the prompt makes things worse.
Vector search has no memory
Mem0, Zep, LangMem add storage. But vector similarity alone can't tell you: "last time you recalled this context, did it actually help?" Every memory is equally weighted. There's no learning signal.
No outcome signal
Did that retrieved context lead to a commit? A closed deal? A passed test? Nobody tracks the connection between retrieval and results. So the system never learns which retrieval was good.
A hippocampus for your AI agent
Write everything to the vector DB. Let Q-learning indexes decide what to retrieve.
Two principles
1. Store everything. Every action, result, decision, context. Storage is cheap. Don't filter on write — you don't know what will be useful later.
2. Retrieve what worked. Exps (experiences) are Q-learning indexes on top of the vector DB. They track: "last time this context was retrieved in a similar situation, did the session produce results?" Good context gets higher Q-values. Noise sinks.
Result: 5 precise memories instead of 200 dumped into the prompt. Fewer tokens, better accuracy.
# Without exps — pulls everything, context polluted.
# With exps — top 5 by Q-value:
[q=0.73] Approach X passed all tests last refactor
[q=0.65] Approach Y broke 3 tests — avoid
[q=0.42] Auth uses token refresh, 15min expiry
[q=0.12] Redis caching attempted, rolled back
# Sales: preparing outreach for fintech CTO.
# Q-values show: compliance angle closed 3 deals,
# product-speed pitch converted zero.
How it works
Four hooks fire automatically. PostToolUse writes, SessionStart reads, SessionEnd scores.
Observe
PostToolUse hook captures every file edit, bash command, and tool call as structured JSONL observations. Reads are filtered out.
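For illustration, one such JSONL observation might look like this (the field names are hypothetical, not OpenExp's actual schema):

```json
{"ts": "2025-01-15T10:32:00Z", "session": "abc123", "tool": "Edit", "file": "src/auth.py", "summary": "Added token refresh handling"}
```

Each line is one structured record, so observations can be appended cheaply and ingested in bulk at session end.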
Recall
SessionStart runs hybrid retrieval (5 signals) against Qdrant. Top memories are injected into context, ranked by Q-value from the active exp.
Reward
SessionEnd computes reward from git events (commits, PRs, tests). All memories recalled during the session get Q-value updates: Q += α · reward.
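The update rule is simple enough to sketch in a few lines of Python (the α=0.25 default and the [-0.5, 1.0] clamp range are stated in the FAQ below; the function name and example rewards here are illustrative):

```python
# Sketch of the SessionEnd Q-value update. Function name and the
# example reward breakdown are illustrative, not OpenExp's actual API.
ALPHA = 0.25              # learning rate (α)
Q_MIN, Q_MAX = -0.5, 1.0  # Q-value clamp range

def update_q(q_old: float, reward: float, alpha: float = ALPHA) -> float:
    """Q += α · reward, clamped to the allowed range."""
    return max(Q_MIN, min(Q_MAX, q_old + alpha * reward))

# A memory recalled in a session that ended with a commit (+0.30)
# and an open PR (+0.20), minus the per-session base cost (-0.10):
reward = 0.30 + 0.20 - 0.10      # = 0.40
print(update_q(0.0, reward))     # ≈ 0.1
```

Because every recalled memory gets the same session reward, memories that keep showing up in productive sessions climb steadily, while memories recalled into dead-end sessions drift negative.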
Extract
Opus reads the transcript via claude -p and extracts decisions and insights. Not "edited file.html" but "chose X because Y." Stored as first-class memories.
Learning timeline
Q-values start at 0.0 and diverge based on real outcomes.
Day 1
Observations accumulate. All Q-values at 0.0. SessionEnd ingests into Qdrant, computes first rewards from git events. No visible change yet.
Day 5
Q-values start diverging. Memories from productive sessions sit at q=0.1-0.2. SessionStart injection now returns ranked context instead of random.
Week 3
Clear signal/noise separation. Useful context at q=0.5+, noise at q<0. Decision extraction adds strategic memories. Retrieval quality visibly improves.
Month 2+
Run calibrate_experience_q for manual fine-tuning. Create custom exps for different workflows. Outcome resolvers (CRM CSV → rewards) close the business loop.
How it compares
Other tools add memory. OpenExp adds memory that learns from outcomes.
| Feature | OpenExp | Mem0 | Zep / Graphiti | LangMem |
|---|---|---|---|---|
| Core approach | Q-learning on retrieval outcomes | Managed memory API | Knowledge graph + temporal | LLM-managed memory |
| Learns from outcomes | Q-values from real results | No | No | No |
| Retrieval | BM25 + vector + Q-value (5 signals) | Vector similarity | Graph traversal + vector | Vector similarity |
| Hosting | Local only (Docker + Qdrant) | Managed cloud (easiest setup) | Cloud or self-hosted | Cloud API |
| UI / Dashboard | CLI + MCP tools only | Web dashboard | Web UI | No |
| Claude Code integration | Native hooks, zero-config | Manual MCP setup | Manual MCP setup | Manual MCP setup |
| Domain-specific profiles | Exps (reward weights, pipelines) | No | No | No |
| Privacy | All data stays local | Data on their servers | Depends on deployment | Data on their servers |
Mem0 and Zep are great if you want managed infrastructure. OpenExp is for people who want their memory system to actually learn from outcomes.
Exps: Q-learning indexes for your domain
An exp defines what "productive" means. Different work → different reward signals → different Q-value trajectories.
default
Optimized for coding workflows. Commits and PRs are the primary success signals. Good session: edit files, commit, open PR = +0.42 reward. Empty session: just read files = -0.20.
Pipeline: backlog → in_progress → review → merged → deployed
- commit: +0.30
- pr: +0.20
- deploy: +0.10
- tests: +0.10
- decisions: +0.10
- base (every session): -0.10
sales
Optimized for outreach, follow-ups, and deal progression. Rewards decisions and insights, not raw tool actions. The system learns which email strategies and talking points lead to closed deals.
Pipeline: lead → contacted → qualified → proposal → negotiation → won
- decision: +0.25
- email_sent: +0.15
- follow_up: +0.15
- proposal: +0.20
- deal_won (outcome): +0.80
- base (every session): -0.10
dealflow
Full deal lifecycle from discovery to payment. Tracks proposals, invoices, and payment events. Learns which discovery approaches and pricing strategies convert best.
Pipeline: lead → discovery → nda → proposal → negotiation → invoice → paid
- proposal_sent: +0.20
- invoice_created: +0.15
- payment_received: +0.30
- deal_won (outcome): +0.80
- decisions: +0.15
- base (every session): -0.10
Create Your Own
Define what "productive" means for your specific workflow. An Experience is a YAML file that controls how OpenExp scores sessions and which memories get rewarded.
Three ways to activate:
1. Drop .openexp.yaml in your project root with experience: my-exp
2. Set env var: export OPENEXP_EXPERIENCE=my-exp
3. Auto-detect: OpenExp reads your prompts and picks the right experience
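The built-in profiles above suggest what such a file contains. A custom exp might look roughly like this (the schema shown is hypothetical, not the documented format):

```yaml
# my-exp.yaml (hypothetical schema; field names are illustrative,
# modeled on the built-in profiles: pipeline stages plus reward weights)
name: my-exp
pipeline: [draft, review, published]
rewards:
  article_published: 0.30
  decision: 0.15
  base: -0.10   # small per-session cost so empty sessions score negative
```

The shape is the same as the built-in exps: a pipeline that defines progress stages, and reward weights that define what counts as a productive session.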
5-signal hybrid retrieval
Not just vector similarity. Five weighted signals, configurable per exp.
Vector
Semantic similarity via FastEmbed (BAAI/bge-small, 384d)
BM25
Keyword match for exact term recall
Recency
Exponential decay, 90-day half-life
Importance
Type-weighted: decisions > insights > actions
Q-Value
Learned usefulness from real outcomes
Plus 10% epsilon-greedy exploration — occasionally surfaces low-Q memories to prevent premature convergence.
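As a rough sketch, the blend and exploration step could look like this (the signal weights and memory fields here are illustrative; the real weights are configurable per exp):

```python
import random

# Illustrative signal weights; the real values are configurable per exp.
WEIGHTS = {"vector": 0.35, "bm25": 0.20, "recency": 0.15,
           "importance": 0.10, "q_value": 0.20}
HALF_LIFE_DAYS = 90   # recency half-life from the list above
EPSILON = 0.10        # 10% epsilon-greedy exploration

def recency(age_days: float) -> float:
    """Exponential decay with a 90-day half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(mem: dict) -> float:
    """Weighted sum of the five signals for one memory."""
    signals = {
        "vector": mem["vector_sim"],
        "bm25": mem["bm25"],
        "recency": recency(mem["age_days"]),
        "importance": mem["importance"],
        "q_value": mem["q"],
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())

def retrieve(memories: list[dict], n: int = 5) -> list[dict]:
    ranked = sorted(memories, key=score, reverse=True)
    # Occasionally swap in a low-ranked memory so the system keeps
    # gathering evidence about memories it would otherwise never recall.
    if ranked[n:] and random.random() < EPSILON:
        ranked[n - 1] = random.choice(ranked[n:])
    return ranked[:n]
```

The exploration step is what keeps the Q-learning loop honest: a memory with a low Q-value still gets an occasional chance to prove it is useful in a new situation.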
Local-First. Private by Default.
Your data never leaves your machine. No cloud APIs required for core functionality.
Local Embeddings
FastEmbed runs BAAI/bge-small-en-v1.5 locally. No API key needed. 384-dimension vectors, free forever.
Local Vector DB
Qdrant runs as a Docker container on your machine. All memories, Q-values, and observations stay local.
Optional LLM
Anthropic API key is optional. Without it, memories get default metadata. With it, auto-enrichment adds type, tags, and validity windows.
Quick Start
Three commands. Open Claude Code — it now has memory.
# 1. Clone and set up
$ git clone https://github.com/anthroos/openexp.git
$ cd openexp
$ ./setup.sh
✓ Python venv created
✓ Dependencies installed
✓ Qdrant container started (Docker)
✓ Hooks registered in Claude Code
✓ MCP server configured
# 2. Verify it works
$ openexp stats
Memories: 0 | Q-cache: 0 entries | Qdrant: connected
# 3. Open Claude Code in any project
$ claude
OpenExp is now observing. Work normally — it learns in the background.
# Optional: set an experience for non-coding work
$ echo "experience: sales" > .openexp.yaml
# No API key needed for core functionality.
16 MCP tools
Works in Claude Code, Cursor, or any MCP client. Hooks run the learning loop; tools let you inspect and steer.
Memory
search_memory: Hybrid BM25 + vector + Q-value retrieval
add_memory: Store with auto-enrichment (type, tags, validity)
get_agent_context: Full context for a specific agent/session
reflect: Trigger self-reflection on recent work
Q-Learning
explain_q: Why a memory has its Q-value (full trace)
reward_detail: Reward breakdown for a specific memory
memory_reward_history: Full reward timeline for a memory
calibrate_experience_q: Manually boost/penalize Q-values
reload_q_cache: Reload Q-cache from disk
Experiences
experience_info: Active exp config, weights, pipeline stages
experience_insights: Reward distribution, learning velocity
experience_top_memories: Highest-Q memories for an exp
memory_stats: Collection stats, Q-cache summary
Outcomes
log_prediction: Record a prediction with confidence
log_outcome: Record actual outcome, compute accuracy
resolve_outcomes: CRM stage changes → Q-value rewards
$ openexp search -q "auth flow" -n 5
$ openexp stats
$ openexp ingest --dry-run
$ openexp resolve
$ openexp experience list
$ openexp compact --dry-run
Frequently Asked Questions
How is this different from Mem0?
Different tools for different problems. Mem0 is a managed memory API — polished, cloud-hosted, has a web dashboard, easy setup. If you want to add basic memory to an app without running infrastructure, Mem0 is solid.
OpenExp solves a different problem: it tracks whether retrieved memories actually helped (via Q-learning from real outcomes like commits, PRs, closed deals). It's local-first, requires Docker, has no UI — CLI and MCP tools only. Choose it if you care about retrieval quality improving over time, not just storage.
Do I need an API key?
No. Core functionality works without any API key. Embeddings run locally via FastEmbed. Qdrant runs as a Docker container. The full learning loop — observe, remember, reward, improve — is completely free.
An Anthropic API key is optional — it enables auto-enrichment (type classification, tags, validity windows). Decision extraction uses claude -p (Claude Code pipe mode), which runs on your Max subscription at zero API cost. Everything works great without an API key.
What is a Q-value?
A learned quality score per memory, range [-0.5, 1.0], init 0.0. Update: Q_new = Q_old + α · reward (α=0.25). Reward comes from git events (commits, PRs, tests) weighted by the active exp.
Three layers: action (50%) — did it help get work done; hypothesis (20%) — was the content accurate; fit (30%) — was it relevant to the situation. Negative rewards are discounted on the fit layer (50%) so good-fit memories aren't killed by one bad session.
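A rough sketch of how those layers might combine (the weights come from the text above; the function and its inputs are hypothetical, not OpenExp's actual code):

```python
# Illustrative three-layer reward blend. Layer weights are from the
# FAQ text; the function name and input shape are hypothetical.
LAYER_WEIGHTS = {"action": 0.50, "hypothesis": 0.20, "fit": 0.30}
FIT_NEG_DISCOUNT = 0.5  # negative rewards hit the fit layer at 50%

def blend_reward(layers: dict[str, float]) -> float:
    """Weighted sum of per-layer rewards; negative fit rewards are halved
    so one bad session doesn't kill a memory with a good situational fit."""
    total = 0.0
    for name, r in layers.items():
        if name == "fit" and r < 0:
            r *= FIT_NEG_DISCOUNT
        total += LAYER_WEIGHTS[name] * r
    return total

# A session that helped get work done but was a poor situational fit:
print(blend_reward({"action": 0.4, "hypothesis": 0.2, "fit": -0.3}))  # ≈ 0.195
```

With this shape, the action layer dominates (did work actually happen?), while the fit discount keeps a single mismatched recall from erasing a memory's accumulated credit.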
What are Experiences?
An Experience tells OpenExp what "productive" means for your type of work. The default one rewards coding: commits, PRs, test runs. But if you're doing sales, a "productive session" means emails sent, follow-ups done, deals progressed — not git commits.
Three come built-in (default, sales, dealflow). You can create your own by writing a simple YAML file — define what actions count as success, and OpenExp optimizes for that.
Does it work with Cursor / other AI editors?
OpenExp is built as an MCP server, so any MCP-compatible client can use it. Claude Code has native hook integration (zero config). For Cursor and other editors that support MCP, you add OpenExp as an MCP tool provider.
The core learning loop (Q-value updates, hybrid search, outcome rewards) works regardless of which client you use.
Where this is going
OpenExp is a database and annotation layer today. Tomorrow it's a training pipeline.
Now: Hippocampus
Store everything in a vector DB. Q-learning indexes (exps) score each memory based on real outcomes. Hybrid retrieval surfaces what worked. The system accumulates experience automatically.
Next: Curated training data
Q-values are labels. High-Q memories = high-quality training examples. Low-Q = noise. OpenExp becomes an automatic annotation pipeline for fine-tuning local models — no manual labeling needed.
Future: Personal AI that learns
Fine-tunable local models (on-device, private) trained on Q-curated experience. Not RAG — the model knows, not searches. Online learning: every session updates the model. System 2 → System 1 compression.
Store everything.
Retrieve what worked.
Three commands to install. Hooks fire automatically. Q-values diverge within a week. Open source, local-first, MIT licensed.