ReasoningBank: Strategy Memory in Vel

ReasoningBank is Vel’s strategic memory layer, designed to help agents improve reasoning over time by recalling how they’ve solved similar problems in the past.
It stores distilled heuristics and anti-patterns, indexed by embeddings, to make retrieval semantic rather than literal.


🧩 Concept Overview

ReasoningBank treats each successful (or failed) task as a learning event.

Each run yields:

  • Strategy text – A short heuristic such as
    “Summarize user intent before planning.”
  • Anti-patterns – Optional “things to avoid,” e.g.
    “Do not replan mid-stream.”
  • Signature – Structured metadata describing the task context, such as
    {"intent": "planning", "domain": "fastapi", "risk": "low"}
  • Outcome – Whether the strategy succeeded.

These items are stored in a small local database (SQLite by default) and indexed by vector embeddings.
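As a sketch, one learning event could be represented like this (the dataclass and field names are illustrative, following the descriptions above — this is not Vel's actual API):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LearningEvent:
    # Illustrative record of one run's distilled lesson (not Vel's actual API)
    strategy_text: str                        # short heuristic
    signature: Dict[str, str]                 # structured task context
    anti_patterns: List[str] = field(default_factory=list)
    success: bool = True                      # outcome of the run

event = LearningEvent(
    strategy_text="Summarize user intent before planning.",
    signature={"intent": "planning", "domain": "fastapi", "risk": "low"},
    anti_patterns=["Do not replan mid-stream."],
)
```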


🔍 Why Embeddings Matter

Embeddings give ReasoningBank a sense of similarity between tasks.
They make recall semantic rather than exact.

Without embeddings

Retrieval would rely on literal key matches: "intent": "planning" would have to appear exactly the same for a strategy to be found.

That means:

  • "build API" and "create endpoint" would not match.
  • "analyze report" and "summarize document" would appear unrelated.

With embeddings

Embeddings represent each text (strategy or signature) as a vector in high-dimensional space.
Similar meanings produce nearby vectors.

This lets ReasoningBank retrieve strategies that feel conceptually related even when the words differ.

Example:
A strategy learned from “Summarize a document” can help “Condense a report” because their embeddings are close.
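The intuition can be illustrated with toy vectors standing in for real embeddings (the numbers below are invented for illustration; a real model produces hundreds of dimensions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings": semantically close tasks get nearby vectors
summarize_document = np.array([0.9, 0.1, 0.0])
condense_report    = np.array([0.8, 0.2, 0.1])
build_api          = np.array([0.0, 0.1, 0.9])

print(cosine(summarize_document, condense_report))  # high: conceptually related
print(cosine(summarize_document, build_api))        # low: unrelated tasks
```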


⚙️ How It Works Internally

1. Insertion (Learning)

When a run completes successfully:

  1. The runtime creates a single embedding that combines the strategy text and the task signature (see the schema note below).
  2. It stores:
    • strategy_text, anti_patterns, signature, confidence
    • the combined embedding vector
  3. The confidence score increases slightly with each success.
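A minimal sketch of the insertion step, assuming the rb_strategies / rb_embeddings schema described later in this document (the learn function and the starting confidence of 0.5 are illustrative choices, not Vel's actual code):

```python
import json
import sqlite3
import time

import numpy as np

def learn(conn, encode, signature, strategy_text, anti_patterns, run_id,
          confidence=0.5):
    """Persist one distilled strategy plus its combined embedding (sketch)."""
    # One combined embedding for signature + strategy text, per the schema note
    vec = np.asarray(encode([json.dumps(signature) + " " + strategy_text])[0],
                     dtype=np.float32)
    now = time.time()
    cur = conn.execute(
        "INSERT INTO rb_strategies (signature_json, strategy_text, anti_patterns, "
        "evidence_refs, confidence, created_at, updated_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (json.dumps(signature), strategy_text, json.dumps(anti_patterns),
         json.dumps([run_id]), confidence, now, now),
    )
    conn.execute(
        "INSERT INTO rb_embeddings (strategy_id, embedding, dim) VALUES (?, ?, ?)",
        (cur.lastrowid, vec.tobytes(), int(vec.shape[0])),
    )
    conn.commit()
    return cur.lastrowid
```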

2. Retrieval (Advice)

When a new task begins:

  1. The runtime encodes the current task signature using the same embedding function.
  2. It computes cosine similarity between the new vector and the stored embeddings.
  3. The top-K most similar strategies are retrieved and injected as “Strategy Advice” at the start of the LLM prompt.

This happens synchronously and usually completes in under 50 ms.
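The retrieval step can be sketched as a brute-force cosine scan, which is plenty fast at the ≤1k-strategy scale discussed later (the function and its shapes are illustrative):

```python
import numpy as np

def top_k(query_vec, stored, k=5):
    """Rank stored (strategy_text, vector) pairs by cosine similarity (sketch)."""
    q = np.asarray(query_vec, dtype=np.float32)
    q = q / (np.linalg.norm(q) + 1e-12)
    scored = []
    for text, vec in stored:
        v = np.asarray(vec, dtype=np.float32)
        v = v / (np.linalg.norm(v) + 1e-12)          # normalize, then dot = cosine
        scored.append((float(q @ v), text))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:k]]
```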

3. Updating (Outcome)

After the run:

  • The system adjusts confidence scores asynchronously.
  • Failed runs lower confidence or record anti-pattern notes.

All of this happens without LLM tool calls — memory is fully runtime-owned.
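The document does not specify the exact update rule, so the following is only one plausible sketch: a small bounded bump on success and a penalty on failure.

```python
def update_confidence(confidence: float, success: bool, step: float = 0.05) -> float:
    # Hypothetical rule: confidence "increases slightly" on success,
    # drops on failure; the step size and clamping are assumptions.
    confidence += step if success else -step
    return min(1.0, max(0.0, confidence))
```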


🧠 The Role of embeddings_fn

Vel doesn’t impose a specific embedding model — you provide one.

It just needs a callable:

embeddings_fn: Callable[[List[str]], np.ndarray]

that takes a list of texts and returns a NumPy array of vectors.

Example 1 — Minimal Hash-Based Embeddings

A deterministic option for local testing with no extra dependencies beyond NumPy. Note that hash-based vectors are not semantic, so treat this as a development placeholder:

def encode(texts):
    import numpy as np, hashlib
    out = []
    for t in texts:
        # shake_256 can emit a digest of any length; sha256 yields only
        # 32 bytes, which would silently cap the vector at 32 dimensions.
        h = hashlib.shake_256(t.encode()).digest(256)
        v = np.frombuffer(h, dtype=np.uint8).astype(np.float32)
        v = (v - v.mean()) / (v.std() + 1e-8)
        out.append(v)
    return np.vstack(out)

Example 2 — SentenceTransformer Embeddings

For semantic understanding:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def encode(texts):
    return np.array(model.encode(texts, normalize_embeddings=True), dtype=np.float32)

📊 Similarity Metric

ReasoningBank uses cosine similarity to rank stored strategies:

$$\text{similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

Top-K items (default k=5) are returned as recommendations. They can optionally be prefixed to your LLM’s system prompt:

Strategy Advice:
1. Clarify the user’s intent before executing.
2. Avoid re-evaluating after completion.
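Assembling that advice block can be as simple as the following (an illustrative helper, not part of Vel's API):

```python
def with_strategy_advice(system_prompt: str, strategies: list) -> str:
    # Prefix the top-K retrieved strategies as a numbered "Strategy Advice" block
    if not strategies:
        return system_prompt
    advice = "\n".join(f"{i}. {s}" for i, s in enumerate(strategies, 1))
    return f"Strategy Advice:\n{advice}\n\n{system_prompt}"
```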

🧱 Database Schema (Actual Implementation)

rb_strategies table:

| Column         | Type    | Description                     |
|----------------|---------|---------------------------------|
| id             | INTEGER | Primary key                     |
| signature_json | TEXT    | Context metadata (JSON)         |
| strategy_text  | TEXT    | The heuristic itself            |
| anti_patterns  | TEXT    | JSON list of “avoid” statements |
| evidence_refs  | TEXT    | JSON list of run IDs            |
| confidence     | REAL    | Strength of belief (0.0–1.0)    |
| created_at     | REAL    | Unix timestamp                  |
| updated_at     | REAL    | Unix timestamp                  |

rb_embeddings table:

| Column      | Type    | Description                                     |
|-------------|---------|-------------------------------------------------|
| strategy_id | INTEGER | Foreign key to rb_strategies(id)                |
| embedding   | BLOB    | Combined embedding (signature + strategy text)  |
| dim         | INTEGER | Embedding dimensionality                        |

Note: Unlike the paper’s two separate embeddings, Vel combines signature and strategy text into one embedding vector for efficiency.

All vectors are stored as serialized NumPy arrays (float32).
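The float32 round trip through a BLOB column works like this (the dtype must match exactly on both sides, and the stored dim lets you sanity-check the vector length):

```python
import numpy as np

vec = np.arange(4, dtype=np.float32)              # stand-in embedding
blob = vec.tobytes()                              # what goes into the BLOB column
restored = np.frombuffer(blob, dtype=np.float32)  # dtype must match the writer's

assert np.array_equal(vec, restored)
```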


🚀 Performance and Scaling

| Operation | Description                                          | Typical latency  |
|-----------|------------------------------------------------------|------------------|
| Retrieval | Embedding + cosine similarity across ≤1k strategies  | 20–50 ms         |
| Update    | Background async write (confidence + anti-patterns)  | <1 ms per record |

For larger-scale deployments, ReasoningBank can easily migrate to:

  • FAISS
  • Qdrant
  • SQLite + vector extension

The interface remains the same.


🧩 When to Use ReasoningBank

| Scenario                                      | Benefit                              |
|-----------------------------------------------|--------------------------------------|
| Repeated reasoning tasks (analysis, planning) | Reuses heuristics for similar tasks  |
| Multi-session agents                          | Learns to “think” better over time   |
| Specialized domains                           | Adapts domain-specific strategies    |
| Rapid prototyping                             | Improves behavior without retraining |

⚠️ Best Practices

  • Limit rb_top_k to 3–5 to keep prompts lean.
  • Manually prune low-confidence strategies periodically (no automatic decay implemented).
  • Never inject strategy advice directly into user-visible text (system-only).
  • Keep your embedding function stable — changing models resets similarity baselines.
  • Provide clear success/failure signals for accurate confidence updates.

🧩 Summary

ReasoningBank provides strategic rather than episodic memory. By embedding past strategies and task signatures, it lets your agent recall what reasoning worked before in conceptually similar contexts.

This gives your agent:

  • Semantic recall without explicit retraining.
  • Runtime-only memory (no extra tool calls).
  • Continual self-improvement through feedback loops.

Think of ReasoningBank as long-term procedural memory for reasoning itself — it doesn’t store facts, it stores how to think.