Your agent processed 10,000 conversations yesterday. And learned nothing.
A lightweight Python SDK that wraps any LLM with persistent memory, autonomous reflection cycles, and an internal critic that evaluates every output against the agent's self-authored rules.
22% → 2% fabrication rate on the same model (Claude Haiku). n=45, single corpus, broader eval in progress.
Generic responses. No context. Starts from zero like every other agent.
Patterns start appearing. First rules proposed. Memory shapes search results.
Agent behaves differently. Self-authored rules active. Dream cycles surfacing insights.
You don’t want to lose it. Accumulated judgment that can’t be recreated from scratch.
Install the SDK. Wrap your agent. It remembers conversations, reflects on patterns, writes its own rules, and evaluates its own output — all persisted across sessions. Your agent on Day 30 is measurably different from your agent on Day 1.
Agent records real interactions into three-tier memory with semantic search.
Dream cycles extract patterns, synthesize journals, and audit existing rules.
Agent writes and applies its own behavioral rules. Critic evaluates every output.
```python
# pip install "wisdom-layer[ollama]"
# Works with local models (Ollama) or cloud providers

from wisdom_layer import WisdomAgent, AgentConfig
from wisdom_layer.llm.ollama import OllamaAdapter
from wisdom_layer.storage.sqlite import SQLiteBackend

llm = OllamaAdapter(model="llama3.2")

agent = WisdomAgent(
    agent_id="support-agent",
    config=AgentConfig(name="Support Agent"),
    llm=llm,
    backend=SQLiteBackend("./agent.db", embed_fn=llm.embed),
)
await agent.initialize()

# Agent remembers this conversation
await agent.memory.capture("conversation", {"user": msg})

# Agent reflects overnight — reconsolidates, audits, journals
report = await agent.dreams.trigger()

# Agent evaluates its own output against learned rules
review = await agent.critic.evaluate(response)
```
Core subsystems, built and tested. When an agent captures experience, reflects on it, writes rules from it, and evaluates against those rules — it develops persistent identity across sessions. No fine-tuning. No retraining. Just architecture.
Built for production: spend ceilings, append-only provenance, and longitudinal health monitoring on every paid tier.
Raw events → consolidated knowledge → reflective journals. Semantic search across all tiers. Automatic salience scoring and decay. Every insight traces back to source.
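To illustrate the decay idea, here is a minimal sketch in plain Python. The exponential formula and the `half_life_days` parameter are assumptions for illustration, not the SDK's actual scoring: the point is that a raw event's salience fades on a fixed half-life, so stale events fall down the search ranking while consolidated knowledge persists.

```python
import math

def decayed_salience(initial: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a memory's salience halves every `half_life_days`."""
    return initial * math.exp(-math.log(2) * age_days / half_life_days)

# A memory captured 30 days ago at salience 0.8 has decayed to 0.4
print(round(decayed_salience(0.8, 30.0), 2))  # → 0.4
```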
Autonomous reflection pipeline: reconsolidate memories, evolve directives, audit coherence, synthesize journals. Schedulable with cron-like intervals or trigger on-demand.
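The phases named above can be sketched as a sequential pipeline that threads accumulated state through each step. Phase names mirror the copy; the phase bodies are placeholder stand-ins, not SDK internals:

```python
from typing import Callable

# Hypothetical phase implementations; real phases would call the LLM.
DREAM_PHASES: list[tuple[str, Callable[[dict], dict]]] = [
    ("reconsolidate", lambda state: {**state, "memories_merged": 12}),
    ("evolve",        lambda state: {**state, "directives_proposed": 2}),
    ("audit",         lambda state: {**state, "coherence_ok": True}),
    ("journal",       lambda state: {**state, "journal_written": True}),
]

def run_dream_cycle(state: dict) -> dict:
    """Run each phase in order, threading the accumulated state through."""
    for name, phase in DREAM_PHASES:
        state = phase(state)
        state.setdefault("completed", []).append(name)
    return state

report = run_dream_cycle({})
print(report["completed"])  # → ['reconsolidate', 'evolve', 'audit', 'journal']
```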
Evaluates agent output against active directives in real time. Catches narrative inflation, confidence miscalibration, and source grounding failures before they reach your users.
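A toy version of the critic's shape, assuming simple keyword checks where the real system presumably uses LLM-based evaluation; the directive names and checks here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    directive: str
    passed: bool

# Illustrative directives; not the SDK's rule format.
DIRECTIVES = {
    "no-absolute-claims": lambda text: "guaranteed" not in text.lower(),
    "cite-sources": lambda text: "according to" in text.lower() or "[source]" in text,
}

def critic_evaluate(text: str) -> list[Finding]:
    """Check a draft response against every active directive."""
    return [Finding(name, check(text)) for name, check in DIRECTIVES.items()]

findings = critic_evaluate("This fix is guaranteed to work.")
print([f.directive for f in findings if not f.passed])
# → ['no-absolute-claims', 'cite-sources']
```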
Agents propose their own behavioral rules from experience. Rules follow a lifecycle: provisional → active → permanent. Human-approved, automatically enforced.
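The provisional → active → permanent lifecycle with a human-approval gate can be modeled as a small state machine. The transition table and the `decayed` terminal state are assumptions inferred from the feature list, not the SDK's actual rules:

```python
ALLOWED = {
    "provisional": {"active", "decayed"},
    "active": {"permanent", "decayed"},
    "permanent": set(),
    "decayed": set(),
}

class Directive:
    def __init__(self, text: str):
        self.text = text
        self.state = "provisional"

    def promote(self, to: str, human_approved: bool = False) -> None:
        """Advance the lifecycle; promotions require explicit human approval."""
        if to not in ALLOWED[self.state]:
            raise ValueError(f"cannot move {self.state} -> {to}")
        if to in {"active", "permanent"} and not human_approved:
            raise PermissionError("promotion requires human approval")
        self.state = to

d = Directive("Always cite the memory a claim came from.")
d.promote("active", human_approved=True)
print(d.state)  # → active
```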
Composite wisdom score (0–1) snapshotted daily. Cognitive-state classifier (healthy / stagnant / drifting / overloaded). 30-day trajectory window on Pro, unlimited on Enterprise. Catches drift before it ships.
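A rough sketch of how a trajectory window might feed the cognitive-state classifier. The thresholds are invented for illustration and the `overloaded` state is omitted; the SDK's actual heuristics are not described here:

```python
def classify(scores: list[float]) -> str:
    """Classify cognitive state from a window of daily wisdom scores (0-1)."""
    if len(scores) < 2:
        return "healthy"
    delta = scores[-1] - scores[0]
    if delta < -0.1:
        return "drifting"   # sustained decline across the window
    if abs(delta) <= 0.02 and max(scores) - min(scores) <= 0.02:
        return "stagnant"   # flat trajectory, no learning signal
    return "healthy"

print(classify([0.62, 0.64, 0.70]))  # → healthy
```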
Every mutation logged: memory captures, directive promotions, dream phases, snapshots. `agent.provenance.trace()` for any entity, `.explain()` (Enterprise) for narrated chains, `.export()` (Enterprise) for compliance archival.
Hard-enforced spend ceilings on three windows: daily, monthly, and per-cycle. Calls fail at the cap, not warnings in a log. Pre-flight cost estimate before any dream cycle so you decide before you spend. Per-call metering, CSV export on Enterprise.
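The fail-at-the-cap behavior can be sketched against a single window. Class and method names here are illustrative, not the SDK's API:

```python
class BudgetExceeded(RuntimeError):
    pass

class BudgetGuard:
    def __init__(self, daily_cap_usd: float):
        self.daily_cap_usd = daily_cap_usd
        self.spent_today = 0.0

    def charge(self, cost_usd: float) -> None:
        """Reject the call at the cap instead of logging a warning."""
        if self.spent_today + cost_usd > self.daily_cap_usd:
            raise BudgetExceeded(
                f"daily cap ${self.daily_cap_usd:.2f} would be exceeded"
            )
        self.spent_today += cost_usd

guard = BudgetGuard(daily_cap_usd=1.00)
guard.charge(0.75)        # within the cap
try:
    guard.charge(0.50)    # would push the total to $1.25: rejected
except BudgetExceeded as e:
    print(e)
```

The same pre-check is the shape of the pre-flight estimate: compute the cycle's projected cost, then charge or refuse before any LLM call is made.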
Drop-in LangGraph nodes (recall, capture, dream, directives). MCP server for Claude Code and Cursor. LangChain BaseStore adapter. See docs →
Browser-based visualization of your agent's cognitive architecture. Health gauges, directive lifecycle, memory search, dream history, and full configuration panel. `pip install "wisdom-layer[dashboard]"`
Anthropic, OpenAI, Gemini, Ollama (local), LiteLLM (100+ providers), and CallableAdapter for custom inference. Model-agnostic by design. Zero vendor lock-in.
Compiled feature enforcement (Cython-built, monkeypatch-resistant). Ed25519-signed license claims (verified locally, no network round-trip). Zero telemetry — your agent counts, traffic patterns, and deployment topology stay on your infrastructure. Read more →
The Wisdom Layer SDK formalizes patterns extracted from work across three domains; it packages what already worked.
7 agents running continuously for 6+ months on a digital-brain architecture inspired by functional neuroscience. Source material for the SDK and the benchmarks published on this page.
A computational pharmacogenomics platform built on the pre-SDK architecture is producing cross-platform-validated biomarker candidates. Active collaboration discussions with academic cancer centers.
Loom-code, an internal AI-assisted coding tool built on the same architecture, runs across 20+ of my own development repositories. Each repo accumulates hundreds of memories that distill into 10–15 targeted directives, measurably reducing error rates in agent-driven code generation.
Patent pending. 1,498 tests passing. v1.0 live on PyPI — model-agnostic, zero infrastructure required (SQLite included). Integrates with LangGraph, MCP, and LangChain.
45 questions. Same model (Claude Haiku). Same prompts. The only variable: the Wisdom Layer architecture.
Single-corpus early eval — broader multi-corpus evaluation in progress.
Pro tier only. Real-time Critic not engaged. Same model both conditions. Full methodology on request.
We’d rather you know exactly where we are. Everything marked “Shipped” is tested, documented, and available now via `pip install wisdom-layer`. Everything else has a timeline.
| Capability | Status |
|---|---|
| Three-tier memory (capture, search, consolidate, decay) | Shipped |
| Dream cycles (reconsolidate, evolve, audit, journal, synthesize) | Shipped |
| Internal critic (evaluate, audit, veto) | Shipped |
| Directive evolution (propose, promote, decay, lifecycle) | Shipped |
| 6 LLM adapters (Anthropic, OpenAI, Gemini, Ollama, LiteLLM, Callable) | Shipped |
| SQLite & Postgres backends | Shipped |
| Dream scheduling (cron-like intervals, pause/resume) | Shipped |
| Full provenance tracking & LLM-narrated explain | Shipped |
| Health analytics (wisdom score, cognitive classifier, trajectory) | Shipped |
| Cost visibility & budget guards (daily/monthly caps) | Shipped |
| Export/import, cross-backend clone, re-embed | Shipped |
| Retry policy, graceful shutdown, dream checkpointing | Shipped |
| Dashboard (health, directives, memory, dreams, config) | Shipped |
| LangGraph nodes (recall, capture, dream, directives) | Shipped |
| MCP server (Claude Code, Cursor, Windsurf) | Shipped |
| LangChain adapter (WisdomStore + legacy BaseMemory) | Shipped |
| Feature flags & tier enforcement | Shipped |
| SyncWisdomAgent (blocking wrapper for scripts/Jupyter) | Shipped |
| PyPI distribution & Cython-compiled internals | Shipped |
| v1.0 GA (public launch) | Shipped |
| Multi-agent coordination | Planned — v1.1+ |
Start free. Upgrade when your agents need full behavioral evolution.
For your team or product. Multi-tenant deployments (one agent per customer) typically move to Enterprise.
Upgrade to Pro
Pricing reflects founding rates and is subject to change. Final terms are confirmed in your service agreement.
Technical deep dives into the architecture and the results.
Same model. Same questions. 97.8% accuracy vs 77.8%. 10× fewer fabrications. The only variable was the Wisdom Layer.
Why agents that process 10,000 conversations a day learn nothing from any of them.
A small model autonomously designed a framework to catch its own confabulation.
Memory scaling works. Retrieval isn’t judgment — and judgment is what breaks in production.
1,498 passing tests. Full integrations. Production dashboard. Install in 30 seconds.
Or reach out directly: jeff@rhatigan.ai
Built by Jeff Rhatigan over 9 months, drawing on a research platform of 7 persistent agents in continuous operation. The SDK is the formalization of what worked. The same architecture now powers a computational pharmacogenomics research platform and the loom-code AI coding utility across 20+ repos. If you’re building agents that need to get better over time, I’d like to hear what you’re working on.