Open Source · Local-First · MIT Licensed
Store everything.
Retrieve what worked.
More context doesn't mean better results. GPT-4o drops from 99% to 70% accuracy at 32K tokens. Opus loses 1 in 5 facts at 1M. OpenExp flips the approach: write everything to a vector DB, then use Q-learning indexes (exps) to surface only what actually helped in similar situations.
The real problem isn't forgetting
It's retrieving the wrong things. More context = worse accuracy. The data is there.
Context degrades with scale
GPT-4o: 99.3% → 69.7% at 32K tokens. Opus: 78.3% on MRCR v2 at 1M. Real-world degradation starts at 400K. After 600K, retrieval is unreliable. Stuffing more into the prompt makes things worse.
Vector search has no memory
Mem0, Zep, LangMem add storage. But vector similarity alone can't tell you: "last time you recalled this context, did it actually help?" Every memory is equally weighted. There's no learning signal.
No outcome signal
Did that retrieved context lead to a commit? A closed deal? A passed test? Nobody tracks the connection between retrieval and results. So the system never learns which retrieval was good.
A hippocampus for your AI agent
Write everything to the vector DB. Let Q-learning indexes decide what to retrieve.
Two principles
1. Store everything. Every action, result, decision, context. Storage is cheap. Don't filter on write — you don't know what will be useful later.
2. Retrieve what worked. Exps (experiences) are Q-learning indexes on top of the vector DB. They track: "last time this context was retrieved in a similar situation, did the session produce results?" Good context gets higher Q-values. Noise sinks.
Result: 5 precise memories instead of 200 dumped into the prompt. Fewer tokens, better accuracy.
# Without exps — pulls everything, context polluted.
# With exps — top 5 by Q-value:
[q=0.73] Approach X passed all tests last refactor
[q=0.65] Approach Y broke 3 tests — avoid
[q=0.42] Auth uses token refresh, 15min expiry
[q=0.12] Redis caching attempted, rolled back
# Sales: preparing outreach for fintech CTO.
# Q-values show: compliance angle closed 3 deals,
# product-speed pitch converted zero.
How it works
Four hooks fire automatically. PostToolUse writes, SessionStart reads, SessionEnd scores.
Observe
PostToolUse hook captures every file edit, bash command, and tool call as structured JSONL observations. Reads are filtered out.
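For illustration, one such JSONL observation might look like this (the field names are hypothetical, not OpenExp's actual schema):

```json
{"ts": "2025-01-15T10:32:00Z", "session": "abc123", "tool": "Edit", "file": "src/auth.py", "summary": "Added token refresh handling"}
```

Each line is one structured record, so observations can be appended cheaply and ingested in bulk at session end.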
Recall
SessionStart runs hybrid retrieval (5 signals) against Qdrant. Top memories are injected into context, ranked by Q-value from the active exp.
Reward
SessionEnd computes reward from git events (commits, PRs, tests). All memories recalled during the session get Q-value updates: Q += α · reward.
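The update rule is simple enough to sketch in a few lines of Python (the α=0.25 default and the [-0.5, 1.0] clamp range are stated in the FAQ below; the function name and example rewards here are illustrative):

```python
# Sketch of the SessionEnd Q-value update. Function name and the
# example reward breakdown are illustrative, not OpenExp's actual API.
ALPHA = 0.25              # learning rate (α)
Q_MIN, Q_MAX = -0.5, 1.0  # Q-value clamp range

def update_q(q_old: float, reward: float, alpha: float = ALPHA) -> float:
    """Q += α · reward, clamped to the allowed range."""
    return max(Q_MIN, min(Q_MAX, q_old + alpha * reward))

# A memory recalled in a session that ended with a commit (+0.30)
# and an open PR (+0.20), minus the per-session base cost (-0.10):
reward = 0.30 + 0.20 - 0.10      # = 0.40
print(update_q(0.0, reward))     # ≈ 0.1
```

Because every recalled memory gets the same session reward, memories that keep showing up in productive sessions climb steadily, while memories recalled into dead-end sessions drift negative.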
Extract
Opus reads the transcript via claude -p and extracts decisions and insights. Not "edited file.html" but "chose X because Y." Stored as first-class memories.
Learning timeline
Q-values start at 0.0 and diverge based on real outcomes.
Day 1
Observations accumulate. All Q-values at 0.0. SessionEnd ingests into Qdrant, computes first rewards from git events. No visible change yet.
Day 5
Q-values start diverging. Memories from productive sessions sit at q=0.1-0.2. SessionStart injection now returns ranked context instead of random.
Week 3
Clear signal/noise separation. Useful context at q=0.5+, noise at q<0. Decision extraction adds strategic memories. Retrieval quality visibly improves.
Month 2+
Run calibrate_experience_q for manual fine-tuning. Create custom exps for different workflows. Outcome resolvers (CRM CSV → rewards) close the business loop.
How it compares
Other tools add memory. OpenExp adds memory that learns from outcomes.
| Feature | OpenExp | Mem0 | Zep / Graphiti | LangMem |
|---|---|---|---|---|
| Core approach | Q-learning on retrieval outcomes | Managed memory API | Knowledge graph + temporal | LLM-managed memory |
| Learns from outcomes | Q-values from real results | No | No | No |
| Retrieval | BM25 + vector + Q-value (5 signals) | Vector similarity | Graph traversal + vector | Vector similarity |
| Hosting | Local only (Docker + Qdrant) | Managed cloud (easiest setup) | Cloud or self-hosted | Cloud API |
| UI / Dashboard | CLI + MCP tools only | Web dashboard | Web UI | No |
| Claude Code integration | Native hooks, zero-config | Manual MCP setup | Manual MCP setup | Manual MCP setup |
| Domain-specific profiles | Exps (reward weights, pipelines) | No | No | No |
| Privacy | All data stays local | Data on their servers | Depends on deployment | Data on their servers |
Mem0 and Zep are great if you want managed infrastructure. OpenExp is for people who want their memory system to actually learn from outcomes.
Exps: Q-learning indexes for your domain
An exp defines what "productive" means. Different work → different reward signals → different Q-value trajectories.
default
Optimized for coding workflows. Commits and PRs are the primary success signals. Good session: edit files, commit, open PR = +0.42 reward. Empty session: just read files = -0.20.
Pipeline: backlog → in_progress → review → merged → deployed
- commit: +0.30
- pr: +0.20
- deploy: +0.10
- tests: +0.10
- decisions: +0.10
- base (every session): -0.10
sales
Optimized for outreach, follow-ups, and deal progression. Rewards decisions and insights, not raw tool actions. The system learns which email strategies and talking points lead to closed deals.
Pipeline: lead → contacted → qualified → proposal → negotiation → won
- decision: +0.25
- email_sent: +0.15
- follow_up: +0.15
- proposal: +0.20
- deal_won (outcome): +0.80
- base (every session): -0.10
dealflow
Full deal lifecycle from discovery to payment. Tracks proposals, invoices, and payment events. Learns which discovery approaches and pricing strategies convert best.
Pipeline: lead → discovery → nda → proposal → negotiation → invoice → paid
- proposal_sent: +0.20
- invoice_created: +0.15
- payment_received: +0.30
- deal_won (outcome): +0.80
- decisions: +0.15
- base (every session): -0.10
Create Your Own
Define what "productive" means for your specific workflow. An Experience is a YAML file that controls how OpenExp scores sessions and which memories get rewarded.
Three ways to activate:
1. Drop .openexp.yaml in your project root with experience: my-exp
2. Set env var: export OPENEXP_EXPERIENCE=my-exp
3. Auto-detect: OpenExp reads your prompts and picks the right experience
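The built-in profiles above suggest what such a file contains. A custom exp might look roughly like this (the schema shown is hypothetical, not the documented format):

```yaml
# my-exp.yaml (hypothetical schema; field names are illustrative,
# modeled on the built-in profiles: pipeline stages plus reward weights)
name: my-exp
pipeline: [draft, review, published]
rewards:
  article_published: 0.30
  decision: 0.15
  base: -0.10   # small per-session cost so empty sessions score negative
```

The shape is the same as the built-in exps: a pipeline that defines progress stages, and reward weights that define what counts as a productive session.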
5-signal hybrid retrieval
Not just vector similarity. Five weighted signals, configurable per exp.
Vector
Semantic similarity via FastEmbed (BAAI/bge-small, 384d)
BM25
Keyword match for exact term recall
Recency
Exponential decay, 90-day half-life
Importance
Type-weighted: decisions > insights > actions
Q-Value
Learned usefulness from real outcomes
Plus 10% epsilon-greedy exploration — occasionally surfaces low-Q memories to prevent premature convergence.
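As a rough sketch, the blend and exploration step could look like this (the signal weights and memory fields here are illustrative; the real weights are configurable per exp):

```python
import random

# Illustrative signal weights; the real values are configurable per exp.
WEIGHTS = {"vector": 0.35, "bm25": 0.20, "recency": 0.15,
           "importance": 0.10, "q_value": 0.20}
HALF_LIFE_DAYS = 90   # recency half-life from the list above
EPSILON = 0.10        # 10% epsilon-greedy exploration

def recency(age_days: float) -> float:
    """Exponential decay with a 90-day half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def score(mem: dict) -> float:
    """Weighted sum of the five signals for one memory."""
    signals = {
        "vector": mem["vector_sim"],
        "bm25": mem["bm25"],
        "recency": recency(mem["age_days"]),
        "importance": mem["importance"],
        "q_value": mem["q"],
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())

def retrieve(memories: list[dict], n: int = 5) -> list[dict]:
    ranked = sorted(memories, key=score, reverse=True)
    # Occasionally swap in a low-ranked memory so the system keeps
    # gathering evidence about memories it would otherwise never recall.
    if ranked[n:] and random.random() < EPSILON:
        ranked[n - 1] = random.choice(ranked[n:])
    return ranked[:n]
```

The exploration step is what keeps the Q-learning loop honest: a memory with a low Q-value still gets an occasional chance to prove it is useful in a new situation.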
Local-First. Private by Default.
Your data never leaves your machine. No cloud APIs required for core functionality.
Local Embeddings
FastEmbed runs BAAI/bge-small-en-v1.5 locally. No API key needed. 384-dimension vectors, free forever.
Local Vector DB
Qdrant runs as a Docker container on your machine. All memories, Q-values, and observations stay local.
Optional LLM
Anthropic API key is optional. Without it, memories get default metadata. With it, auto-enrichment adds type, tags, and validity windows.
Quick Start
Three commands. Open Claude Code — it now has memory.
# 1. Clone and set up
$ git clone https://github.com/anthroos/openexp.git
$ cd openexp
$ ./setup.sh
✓ Python venv created
✓ Dependencies installed
✓ Qdrant container started (Docker)
✓ Hooks registered in Claude Code
✓ MCP server configured
# 2. Verify it works
$ openexp stats
Memories: 0 | Q-cache: 0 entries | Qdrant: connected
# 3. Open Claude Code in any project
$ claude
OpenExp is now observing. Work normally — it learns in the background.
# Optional: set an experience for non-coding work
$ echo "experience: sales" > .openexp.yaml
# No API key needed for core functionality.
16 MCP tools
Works in Claude Code, Cursor, or any MCP client. Hooks run the learning loop; tools let you inspect and steer.
Memory
search_memory: Hybrid BM25 + vector + Q-value retrieval
add_memory: Store with auto-enrichment (type, tags, validity)
get_agent_context: Full context for a specific agent/session
reflect: Trigger self-reflection on recent work
Q-Learning
explain_q: Why a memory has its Q-value (full trace)
reward_detail: Reward breakdown for a specific memory
memory_reward_history: Full reward timeline for a memory
calibrate_experience_q: Manually boost/penalize Q-values
reload_q_cache: Reload Q-cache from disk
Experiences
experience_info: Active exp config, weights, pipeline stages
experience_insights: Reward distribution, learning velocity
experience_top_memories: Highest-Q memories for an exp
memory_stats: Collection stats, Q-cache summary
Outcomes
log_prediction: Record a prediction with confidence
log_outcome: Record actual outcome, compute accuracy
resolve_outcomes: CRM stage changes → Q-value rewards
$ openexp search -q "auth flow" -n 5
$ openexp stats
$ openexp ingest --dry-run
$ openexp resolve
$ openexp experience list
$ openexp compact --dry-run
Frequently Asked Questions
How is this different from Mem0?
Different tools for different problems. Mem0 is a managed memory API — polished, cloud-hosted, has a web dashboard, easy setup. If you want to add basic memory to an app without running infrastructure, Mem0 is solid.
OpenExp solves a different problem: it tracks whether retrieved memories actually helped (via Q-learning from real outcomes like commits, PRs, closed deals). It's local-first, requires Docker, has no UI — CLI and MCP tools only. Choose it if you care about retrieval quality improving over time, not just storage.
Do I need an API key?
No. Core functionality works without any API key. Embeddings run locally via FastEmbed. Qdrant runs as a Docker container. The full learning loop — observe, remember, reward, improve — is completely free.
An Anthropic API key is optional — it enables auto-enrichment (type classification, tags, validity windows). Decision extraction uses claude -p (Claude Code pipe mode), which runs on your Max subscription at zero API cost. Everything works great without an API key.
What is a Q-value?
A learned quality score per memory, range [-0.5, 1.0], init 0.0. Update: Q_new = Q_old + α · reward (α=0.25). Reward comes from git events (commits, PRs, tests) weighted by the active exp.
Three layers: action (50%) — did it help get work done; hypothesis (20%) — was the content accurate; fit (30%) — was it relevant to the situation. Negative rewards are discounted on the fit layer (50%) so good-fit memories aren't killed by one bad session.
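A rough sketch of how those layers might combine (the weights come from the text above; the function and its inputs are hypothetical, not OpenExp's actual code):

```python
# Illustrative three-layer reward blend. Layer weights are from the
# FAQ text; the function name and input shape are hypothetical.
LAYER_WEIGHTS = {"action": 0.50, "hypothesis": 0.20, "fit": 0.30}
FIT_NEG_DISCOUNT = 0.5  # negative rewards hit the fit layer at 50%

def blend_reward(layers: dict[str, float]) -> float:
    """Weighted sum of per-layer rewards; negative fit rewards are halved
    so one bad session doesn't kill a memory with a good situational fit."""
    total = 0.0
    for name, r in layers.items():
        if name == "fit" and r < 0:
            r *= FIT_NEG_DISCOUNT
        total += LAYER_WEIGHTS[name] * r
    return total

# A session that helped get work done but was a poor situational fit:
print(blend_reward({"action": 0.4, "hypothesis": 0.2, "fit": -0.3}))  # ≈ 0.195
```

With this shape, the action layer dominates (did work actually happen?), while the fit discount keeps a single mismatched recall from erasing a memory's accumulated credit.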
What are Experiences?
An Experience tells OpenExp what "productive" means for your type of work. The default one rewards coding: commits, PRs, test runs. But if you're doing sales, a "productive session" means emails sent, follow-ups done, deals progressed — not git commits.
Three come built-in (default, sales, dealflow). You can create your own by writing a simple YAML file — define what actions count as success, and OpenExp optimizes for that.
Does it work with Cursor / other AI editors?
OpenExp is built as an MCP server, so any MCP-compatible client can use it. Claude Code has native hook integration (zero config). For Cursor and other editors that support MCP, you add OpenExp as an MCP tool provider.
The core learning loop (Q-value updates, hybrid search, outcome rewards) works regardless of which client you use.
Where this is going
OpenExp is a database and annotation layer today. Tomorrow it's a training pipeline.
Now: Hippocampus
Store everything in a vector DB. Q-learning indexes (exps) score each memory based on real outcomes. Hybrid retrieval surfaces what worked. The system accumulates experience automatically.
Next: Curated training data
Q-values are labels. High-Q memories = high-quality training examples. Low-Q = noise. OpenExp becomes an automatic annotation pipeline for fine-tuning local models — no manual labeling needed.
Future: Personal AI that learns
Fine-tunable local models (on-device, private) trained on Q-curated experience. Not RAG — the model knows, not searches. Online learning: every session updates the model. System 2 → System 1 compression.
Store everything.
Retrieve what worked.
Three commands to install. Hooks fire automatically. Q-values diverge within a week. Open source, local-first, MIT licensed.