Open Source · Local-First · MIT Licensed

Store everything.
Retrieve what worked.

More context doesn't mean better results. GPT-4o drops from 99% to 70% accuracy at 32K tokens. Opus loses 1 in 5 facts at 1M. OpenExp flips the approach: write everything to a vector DB, then use Q-learning indexes (exps) to surface only what actually helped in similar situations.

MIT License Python 3.11+ 258 tests passed

Works with

Claude Code
MCP
Qdrant
FastEmbed
Cursor

The real problem isn't forgetting

It's retrieving the wrong things. More context = worse accuracy. The data is there.

Context degrades with scale

GPT-4o: 99.3% → 69.7% at 32K tokens. Opus: 78.3% on MRCR v2 at 1M. Real-world degradation starts at 400K. After 600K, retrieval is unreliable. Stuffing more into the prompt makes things worse.

Vector search has no memory

Mem0, Zep, LangMem add storage. But vector similarity alone can't tell you: "last time you recalled this context, did it actually help?" Every memory is equally weighted. There's no learning signal.

No outcome signal

Did that retrieved context lead to a commit? A closed deal? A passed test? Nobody tracks the connection between retrieval and results. So the system never learns which retrieval was good.

A hippocampus for your AI agent

Write everything to the vector DB. Let Q-learning indexes decide what to retrieve.

Two principles

1. Store everything. Every action, result, decision, context. Storage is cheap. Don't filter on write — you don't know what will be useful later.

2. Retrieve what worked. Exps (experiences) are Q-learning indexes on top of the vector DB. They track: "last time this context was retrieved in a similar situation, did the session produce results?" Good context gets higher Q-values. Noise sinks.

Result: 5 precise memories instead of 200 dumped into the prompt. Fewer tokens, better accuracy.

# Dev: refactoring auth module. 200+ memories in DB.
# Without exps — pulls everything, context polluted.
# With exps — top 5 by Q-value:

[q=0.73] Approach X passed all tests last refactor
[q=0.65] Approach Y broke 3 tests — avoid
[q=0.42] Auth uses token refresh, 15min expiry
[q=0.12] Redis caching attempted, rolled back

# Sales: preparing outreach for fintech CTO.
# Q-values show: compliance angle closed 3 deals,
# product-speed pitch converted zero.

How it works

Four steps run automatically through Claude Code hooks: PostToolUse writes, SessionStart reads, SessionEnd scores and extracts.

1. Observe

PostToolUse hook captures every file edit, bash command, and tool call as structured JSONL observations. Reads are filtered out.

2. Recall

SessionStart runs hybrid retrieval (5 signals) against Qdrant. Top memories are injected into context, ranked by Q-value from the active exp.

3. Reward

SessionEnd computes reward from git events (commits, PRs, tests). All memories recalled during the session get Q-value updates: Q += α · reward.

4. Extract

Opus reads the transcript via claude -p and extracts decisions and insights. Not "edited file.html" but "chose X because Y." Stored as first-class memories.
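The reward step (3) can be written out in a few lines. This is an illustrative sketch, not OpenExp's actual code: the function names and the example event weights are assumptions; only the Q += α · reward rule and α = 0.25 come from the project's own description.

```python
# Illustrative model of the SessionEnd reward step.
# Function names and weights are hypothetical; the update rule
# Q += alpha * reward (alpha = 0.25) is the one OpenExp describes.

ALPHA = 0.25  # learning rate

def session_reward(git_events: dict, weights: dict) -> float:
    """Sum exp-defined weights over the git events seen this session."""
    return sum(weights.get(event, 0.0) * count
               for event, count in git_events.items())

def update_q_values(recalled: dict, reward: float) -> dict:
    """Apply Q += alpha * reward to every memory recalled this session."""
    return {mem_id: q + ALPHA * reward for mem_id, q in recalled.items()}

# A productive session: one commit, one PR, base penalty applied once.
weights = {"commit": 0.30, "pr": 0.20, "base": -0.10}
reward = session_reward({"commit": 1, "pr": 1, "base": 1}, weights)
new_q = update_q_values({"mem-1": 0.0}, reward)
print(f"reward={reward:.2f} new_q={new_q['mem-1']:.2f}")  # reward=0.40 new_q=0.10
```

Every recalled memory moves by the same amount here; the per-layer weighting described later in the FAQ would refine that.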

Learning timeline

Q-values start at 0.0 and diverge based on real outcomes.

Day 1

Observations accumulate. All Q-values at 0.0. SessionEnd ingests into Qdrant, computes first rewards from git events. No visible change yet.

Day 5

Q-values start diverging. Memories from productive sessions sit at q=0.1-0.2. SessionStart injection now returns ranked context instead of random.

Week 3

Clear signal/noise separation. Useful context at q=0.5+, noise at q<0. Decision extraction adds strategic memories. Retrieval quality visibly improves.

Month 2+

Run calibrate_experience_q for manual fine-tuning. Create custom exps for different workflows. Outcome resolvers (CRM CSV → rewards) close the business loop.

How it compares

Other tools add memory. OpenExp adds memory that learns from outcomes.

Feature | OpenExp | Mem0 | Zep / Graphiti | LangMem
Core approach | Q-learning on retrieval outcomes | Managed memory API | Knowledge graph + temporal | LLM-managed memory
Learns from outcomes | Q-values from real results | No | No | No
Retrieval | BM25 + vector + Q-value (5 signals) | Vector similarity | Graph traversal + vector | Vector similarity
Hosting | Local only (Docker + Qdrant) | Managed cloud (easiest setup) | Cloud or self-hosted | Cloud API
UI / Dashboard | CLI + MCP tools only | Web dashboard | Web UI | No
Claude Code integration | Native hooks, zero-config | Manual MCP setup | Manual MCP setup | Manual MCP setup
Domain-specific profiles | Exps (reward weights, pipelines) | No | No | No
Privacy | All data stays local | Data on their servers | Depends on deployment | Data on their servers

Mem0 and Zep are great if you want managed infrastructure. OpenExp is for people who want their memory system to actually learn from outcomes.

Exps: Q-learning indexes for your domain

An exp defines what "productive" means. Different work → different reward signals → different Q-value trajectories.

default

Optimized for coding workflows. Commits and PRs are the primary success signals. Good session: edit files, commit, open PR = +0.42 reward. Empty session: just read files = -0.20.

Pipeline: backlog → in_progress → review → merged → deployed

  • commit +0.30
  • pr +0.20
  • deploy +0.10
  • tests +0.10
  • decisions +0.10
  • base (every session) -0.10
default.yaml
# .openexp.yaml in your project
experience: default

# Session ends: edit + commit + PR
# reward = +0.30 + 0.20 + 0.02*3 = +0.56
# Recalled memories get Q-value boost
# Next session: these memories rank higher

sales

Optimized for outreach, follow-ups, and deal progression. Rewards decisions and insights, not raw tool actions. The system learns which email strategies and talking points lead to closed deals.

Pipeline: lead → contacted → qualified → proposal → negotiation → won

  • decision +0.25
  • email_sent +0.15
  • follow_up +0.15
  • proposal +0.20
  • deal_won (outcome) +0.80
  • base (every session) -0.10
sales.yaml
# Outcome-based rewards:
# CRM deal moves negotiation -> won
add_memory(
    content="Client prefers Google stack",
    client_id="comp-acme"
)
# ... weeks of work ...
resolve_outcomes()
# -> memories tagged comp-acme get +0.8

dealflow

Full deal lifecycle from discovery to payment. Tracks proposals, invoices, and payment events. Learns which discovery approaches and pricing strategies convert best.

Pipeline: lead → discovery → nda → proposal → negotiation → invoice → paid

  • proposal_sent +0.20
  • invoice_created +0.15
  • payment_received +0.30
  • deal_won (outcome) +0.80
  • decisions +0.15
  • base (every session) -0.10
dealflow.yaml
# Create your own experience:
openexp experience create
# Pick a process type (dev/sales/support),
# customize stages, signal weights,
# memory type filters.
# Or set via env:
export OPENEXP_EXPERIENCE=dealflow

Create Your Own

Define what "productive" means for your specific workflow. An Experience is a YAML file that controls how OpenExp scores sessions and which memories get rewarded.

Three ways to activate:

  • 1. Drop .openexp.yaml in your project root with experience: my-exp
  • 2. Set env var: export OPENEXP_EXPERIENCE=my-exp
  • 3. Auto-detect: OpenExp reads your prompts and picks the right experience
~/.openexp/experiences/support.yaml
# Example: Customer Support experience
name: support
description: Customer support workflows

# What counts as productive?
session_reward_weights:
  ticket_resolved: 0.30
  response_sent: 0.20
  escalation: -0.10
  decision: 0.15
  base: -0.10

# Pipeline stages (for tracking)
pipeline:
  - new
  - investigating
  - responded
  - resolved
  - closed

# Only reward these memory types
reward_memory_types:
  - decision
  - insight
  - action

5-signal hybrid retrieval

Not just vector similarity. Five weighted signals, configurable per exp.

  • Vector (30%): Semantic similarity via FastEmbed (BAAI/bge-small, 384d)
  • BM25 (10%): Keyword match for exact term recall
  • Recency (15%): Exponential decay, 90-day half-life
  • Importance (15%): Type-weighted: decisions > insights > actions
  • Q-Value (30%): Learned usefulness from real outcomes

Plus 10% epsilon-greedy exploration — occasionally surfaces low-Q memories to prevent premature convergence.
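The blend can be sketched roughly as follows. The signal weights, the 90-day half-life, and the 10% exploration rate come from the list above; the function names, the assumption that each signal is normalized to [0, 1], and the exact epsilon-greedy swap are illustrative, not OpenExp's implementation.

```python
# Sketch of 5-signal hybrid scoring with epsilon-greedy exploration.
# Signal names/weights mirror the list above; everything else is assumed.
import math
import random

WEIGHTS = {"vector": 0.30, "bm25": 0.10, "recency": 0.15,
           "importance": 0.15, "q_value": 0.30}
HALF_LIFE_DAYS = 90
EPSILON = 0.10  # exploration rate

def recency_score(age_days: float) -> float:
    """Exponential decay: a 90-day-old memory scores 0.5."""
    return math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)

def hybrid_score(signals: dict) -> float:
    """Weighted sum of the five signals, each assumed in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def rank(memories: list, k: int = 5) -> list:
    """Top-k by hybrid score; with probability epsilon, swap one slot
    for a random low-ranked memory to avoid premature convergence."""
    ordered = sorted(memories, key=lambda m: hybrid_score(m["signals"]),
                     reverse=True)
    top = ordered[:k]
    if len(ordered) > k and random.random() < EPSILON:
        top[-1] = random.choice(ordered[k:])
    return top
```

Because the Q-value signal carries 30% of the score, a memory with strong outcome history can outrank a slightly better semantic match, which is the point.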

Local-First. Private by Default.

Your data never leaves your machine. No cloud APIs required for core functionality.

Local Embeddings

FastEmbed runs BAAI/bge-small-en-v1.5 locally. No API key needed. 384-dimension vectors, free forever.

Local Vector DB

Qdrant runs as a Docker container on your machine. All memories, Q-values, and observations stay local.

Optional LLM

Anthropic API key is optional. Without it, memories get default metadata. With it, auto-enrichment adds type, tags, and validity windows.

Quick Start

Three commands. Open Claude Code — it now has memory.

Prerequisites: Python 3.11+ · Docker (Qdrant runs as a single ~200MB container) · jq · Claude Code
# 1. Clone and install
$ git clone https://github.com/anthroos/openexp.git
$ cd openexp
$ ./setup.sh

Python venv created
Dependencies installed
Qdrant container started (Docker)
Hooks registered in Claude Code
MCP server configured

# 2. Verify it works
$ openexp stats
Memories: 0 | Q-cache: 0 entries | Qdrant: connected

# 3. Open Claude Code in any project
$ claude
OpenExp is now observing. Work normally — it learns in the background.

# Optional: set an experience for non-coding work
$ echo "experience: sales" > .openexp.yaml
# No API key needed for core functionality.

16 MCP tools

Works in Claude Code, Cursor, or any MCP client. Hooks run the learning loop; tools let you inspect and steer.

Memory

  • search_memory
    Hybrid BM25 + vector + Q-value retrieval
  • add_memory
    Store with auto-enrichment (type, tags, validity)
  • get_agent_context
    Full context for a specific agent/session
  • reflect
    Trigger self-reflection on recent work

Q-Learning

  • explain_q
    Why a memory has its Q-value (full trace)
  • reward_detail
    Reward breakdown for a specific memory
  • memory_reward_history
    Full reward timeline for a memory
  • calibrate_experience_q
    Manually boost/penalize Q-values
  • reload_q_cache
    Reload Q-cache from disk

Experiences

  • experience_info
    Active exp config, weights, pipeline stages
  • experience_insights
    Reward distribution, learning velocity
  • experience_top_memories
    Highest-Q memories for an exp
  • memory_stats
    Collection stats, Q-cache summary

Outcomes

  • log_prediction
    Record a prediction with confidence
  • log_outcome
    Record actual outcome, compute accuracy
  • resolve_outcomes
    CRM stage changes → Q-value rewards
# Same operations via CLI
$ openexp search -q "auth flow" -n 5
$ openexp stats
$ openexp ingest --dry-run
$ openexp resolve
$ openexp experience list
$ openexp compact --dry-run
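An outcome resolver along the lines of resolve_outcomes might look like this. It is a hypothetical sketch: the CSV column names, the in-memory store, and the exact reward application are assumptions; the +0.80 deal_won weight and α = 0.25 are taken from the exp configs and Q-update rule described elsewhere on this page.

```python
# Hypothetical outcome resolver: read a CRM export and turn
# "won" deals into Q-value rewards for memories tagged with
# that client. Column names and store shape are assumptions.
import csv

DEAL_WON_REWARD = 0.80  # deal_won (outcome) weight from the sales exp
ALPHA = 0.25            # learning rate from the Q-update rule

def resolve_outcomes(csv_path: str, memories: list) -> list:
    """Apply the deal_won reward to memories sharing a won client's tag."""
    with open(csv_path, newline="") as f:
        won = {row["client_id"] for row in csv.DictReader(f)
               if row["stage"] == "won"}
    for mem in memories:
        if mem.get("client_id") in won:
            mem["q"] += ALPHA * DEAL_WON_REWARD  # +0.2 per resolution
    return memories
```

This is the "close the business loop" step from the timeline: weeks after a memory was stored, a CRM stage change retroactively rewards it.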

Frequently Asked Questions

How is this different from Mem0?


Different tools for different problems. Mem0 is a managed memory API — polished, cloud-hosted, has a web dashboard, easy setup. If you want to add basic memory to an app without running infrastructure, Mem0 is solid.

OpenExp solves a different problem: it tracks whether retrieved memories actually helped (via Q-learning from real outcomes like commits, PRs, closed deals). It's local-first, requires Docker, has no UI — CLI and MCP tools only. Choose it if you care about retrieval quality improving over time, not just storage.

Do I need an API key?


No. Core functionality works without any API key. Embeddings run locally via FastEmbed. Qdrant runs as a Docker container. The full learning loop — observe, remember, reward, improve — is completely free.

An Anthropic API key is optional — it enables auto-enrichment (type classification, tags, validity windows). Decision extraction uses claude -p (Claude Code pipe mode), which runs on your Max subscription at zero API cost. Everything works great without an API key.

What is a Q-value?


A learned quality score per memory, range [-0.5, 1.0], init 0.0. Update: Q_new = Q_old + α · reward (α=0.25). Reward comes from git events (commits, PRs, tests) weighted by the active exp.

Three layers: action (50%) — did it help get work done; hypothesis (20%) — was the content accurate; fit (30%) — was it relevant to the situation. Negative rewards are discounted on the fit layer (50%) so good-fit memories aren't killed by one bad session.
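The three-layer split can be written as a small formula. The 50/20/30 layer weights and the 50% discount on negative fit are taken from the answer above; the function shape itself is an illustrative sketch, not OpenExp's implementation.

```python
# Sketch of the three-layer reward: action (50%), hypothesis (20%),
# fit (30%), with negative fit discounted by 50% so a good-fit memory
# isn't killed by one bad session. Layer inputs assumed in [-1, 1].

LAYERS = {"action": 0.50, "hypothesis": 0.20, "fit": 0.30}
NEG_FIT_DISCOUNT = 0.50

def layered_reward(action: float, hypothesis: float, fit: float) -> float:
    if fit < 0:
        fit *= NEG_FIT_DISCOUNT  # soften negative fit signal
    return (LAYERS["action"] * action
            + LAYERS["hypothesis"] * hypothesis
            + LAYERS["fit"] * fit)
```

Under this reading, a memory that was perfectly relevant but sat in a failed session loses at most 0.15 on the fit layer, while its action layer still reflects whether work got done.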

What are Experiences?


An Experience tells OpenExp what "productive" means for your type of work. The default one rewards coding: commits, PRs, test runs. But if you're doing sales, a "productive session" means emails sent, follow-ups done, deals progressed — not git commits.

Three come built-in (default, sales, dealflow). You can create your own by writing a simple YAML file — define what actions count as success, and OpenExp optimizes for that.

Does it work with Cursor / other AI editors?


OpenExp is built as an MCP server, so any MCP-compatible client can use it. Claude Code has native hook integration (zero config). For Cursor and other editors that support MCP, you add OpenExp as an MCP tool provider.

The core learning loop (Q-value updates, hybrid search, outcome rewards) works regardless of which client you use.

Where this is going

OpenExp is a database and annotation layer today. Tomorrow it's a training pipeline.

Now: Hippocampus

Store everything in a vector DB. Q-learning indexes (exps) score each memory based on real outcomes. Hybrid retrieval surfaces what worked. The system accumulates experience automatically.

Next: Curated training data

Q-values are labels. High-Q memories = high-quality training examples. Low-Q = noise. OpenExp becomes an automatic annotation pipeline for fine-tuning local models — no manual labeling needed.
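As a minimal sketch of that pipeline, assuming a hypothetical record format and a 0.5 threshold (neither is a shipped OpenExp feature):

```python
# Hypothetical export of Q-curated memories as JSONL training data.
# Record fields and the 0.5 threshold are illustrative assumptions.
import json

def export_training_data(memories: list, q_threshold: float = 0.5) -> str:
    """Keep only high-Q memories; emit one JSON object per line."""
    lines = [json.dumps({"text": m["content"], "label": "useful", "q": m["q"]})
             for m in memories if m["q"] >= q_threshold]
    return "\n".join(lines)

sample = [{"content": "Approach X passed all tests", "q": 0.73},
          {"content": "Redis caching attempted, rolled back", "q": 0.12}]
print(export_training_data(sample))  # only the q=0.73 memory survives
```

The curation itself is free: the Q-values were already earned during normal work, so no separate labeling pass is needed.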

Future: Personal AI that learns

Fine-tunable local models (on-device, private) trained on Q-curated experience. Not RAG — the model knows, not searches. Online learning: every session updates the model. System 2 → System 1 compression.

Store everything.
Retrieve what worked.

Three commands to install. Hooks fire automatically. Q-values diverge within a week. Open source, local-first, MIT licensed.