How Gnosis Works
Gnosis is a remote cloud MCP server. Your AI assistant connects to it over HTTPS, stores memories, and retrieves them via semantic search. There is no local install, no Docker, no vector database to manage. Just a URL.
Under the hood, four pieces of engineering make this work well.
Topic-Landscape Architecture
The core problem with AI memory is discovery: how does the LLM know what you've stored without reading everything? Most memory systems solve this with random samples — dump ~800 tokens of arbitrary memories into context and hope something relevant shows up.
Structured Discovery, Not Random Sampling
At session start, `init_core_memories` returns a structured topic landscape — a complete map of your knowledge graph organized into macro-topic clusters with memory counts and active task counts, topic keywords with density, type distributions, active tasks with progress indicators, and your behavioral preferences. Every token carries signal. Nothing is random. Nothing is wasted.
Compare that to systems that spend 800 tokens on a random grab bag of memories that are irrelevant 60% of the time and give the AI no way to search for what it actually needs. Gnosis gives your AI a map and a search engine. Competitors give it a handful of confetti.
How the topic landscape is structured
The landscape is a compressed representation of your entire memory corpus. It includes:
- Macro-topic clusters with memory counts and active task counts — your AI sees `gnosis(1501, 3 tasks)` and knows there's a large body of knowledge with open work items on that topic
- Topic keywords with density counts — a flat list of every topic in your corpus, ranked by frequency, so the AI knows what search terms will find results
- Type distributions — how many facts, decisions, tasks, and preferences exist, giving the AI a sense of corpus shape
- Active tasks with progress indicators — incomplete work surfaces automatically
- Behavioral preferences — communication style, workflow rules, and constraints that shape the AI's behavior from turn one
The format is content-agnostic. It works identically whether your corpus is deeply technical, personal, or a mix. The AI doesn't need to see your memories to know what you've stored — it navigates the landscape and searches when it needs specifics.
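The compact cluster notation described above (`gnosis(1501, 3 tasks)`) can be sketched in a few lines. The payload fields and numbers here are illustrative, not the actual wire format:

```python
# Illustrative sketch of a topic landscape payload. Field names and
# numbers are hypothetical; the real wire format may differ.
landscape = {
    "clusters": [
        # macro-topic, total memories, open tasks
        {"name": "gnosis", "memories": 1501, "active_tasks": 3},
        {"name": "infra", "memories": 212, "active_tasks": 1},
    ],
    "topics": {"mcp": 340, "search": 120, "redis": 45},  # keyword -> density
    "types": {"fact": 1200, "decision": 310, "task": 88, "preference": 25},
    "preferences": ["concise answers", "prefer TypeScript examples"],
}

def render(landscape):
    """Compress the landscape into the one-line-per-cluster form."""
    lines = [
        f"{c['name']}({c['memories']}, {c['active_tasks']} tasks)"
        for c in landscape["clusters"]
    ]
    top = sorted(landscape["topics"].items(), key=lambda kv: -kv[1])
    lines.append("topics: " + ", ".join(f"{k}:{v}" for k, v in top))
    return "\n".join(lines)

print(render(landscape))
```

Even this toy rendering shows the idea: a handful of dense lines stand in for thousands of memories, and every line tells the AI where to search next.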
Preview-then-retrieve: how search stays efficient
When your AI searches, Gnosis returns compressed previews — not full memories. Your AI scans the previews, picks the ones it needs, and retrieves only those in full:
- Breadth first — 32 previews fit in roughly 400 tokens. Traditional RAG returns 10 full chunks at 3,000–5,000 tokens — most of which the AI ignores
- Depth on demand — your AI reads the previews, identifies the 2–3 it actually needs, and retrieves only those. The AI decides what to read, not the system
- Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the token overhead of a separate tool invocation
The key insight is who decides what to read. In traditional RAG, the system guesses which chunks are relevant. In Gnosis, your AI makes that choice — informed by previews that are cheap enough to scan in bulk.
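From the client side, the flow looks roughly like this. `search` and `fetch` are hypothetical stand-ins for the real MCP tools, backed here by a toy in-memory corpus:

```python
# Sketch of preview-then-retrieve from the client's perspective.
# search() and fetch() are hypothetical stand-ins for the real tools.

def search(query, limit=32):
    """Return compressed previews: (memory_id, ~50-char summary) pairs."""
    corpus = {
        "m1": "Redis chosen for session cache; TTL 24h, LRU eviction.",
        "m2": "Postgres holds primary data; Redis is cache-only.",
        "m3": "Deploy runs via GitHub Actions on tag push.",
    }
    return [(mid, text[:50]) for mid, text in corpus.items()][:limit]

def fetch(memory_id):
    """Retrieve one memory in full; called only for previews the AI picked."""
    return {"id": memory_id, "content": "...full memory text..."}

# Breadth first: scan many cheap previews...
previews = search("session caching")
# ...then depth on demand: the AI picks the few it actually needs.
chosen = [mid for mid, summary in previews if "Redis" in summary]
full = [fetch(mid) for mid in chosen]
```

The token economics live in that split: the preview pass is cheap enough to cast a wide net, and the expensive full-text pass touches only what the AI chose.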
Quality By Design
Gnosis has no server-side LLM. It is a GDPR data processor — it stores what it's told to store. So how is it that 99.8% of stored content grades B+ or better?
Protocol-Guided Quality
Every MCP tool description is deeply refined to guide the calling LLM toward writing structured, specific, searchable memories. The descriptions encode creation guidelines, topic conventions, type taxonomy, and quality heuristics directly into the tool schema — the same schema the LLM reads before deciding what to write.
This works even with small 8B-parameter models. The quality isn't enforced by a gate — it's guided by the interface itself. Combined with server-side deduplication that prevents redundant storage, the result is a clean memory corpus without any content filtering.
What makes a high-quality memory
Every memory is guided toward a specific structure:
- Front-loaded summary — the first 50 characters must be an executive summary that stands alone. An AI deciding whether to retrieve a memory reads only this preview — if it's vague, the memory is effectively invisible
- Type discipline — each memory is classified as a fact, preference, decision, path, or task. Each type has structural requirements that make it searchable in predictable ways
- Topic keywords — single lowercase words chosen to match future search queries. If searching "redis" should find a memory about caching, "redis" must be in the topics — even if the memory is primarily about sessions
- Self-contained content — present tense, includes the rationale, names the subject. No memory should require reading another memory to understand
These conventions aren't enforced by a server-side filter. They're embedded in the tool descriptions that every LLM reads before calling `memory_add`. The LLM follows them because the interface makes them the path of least resistance. Audited across 2,095 memories: 99.8% grade B+ or better. Only 0.2% needed improvement.
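A memory shaped by these conventions might look like this. `memory_add` is the tool named in the text, but the field names shown are assumptions for illustration:

```python
# A memory following the conventions above. The memory_add tool name
# comes from the document; the exact field names are illustrative.
memory = {
    "type": "decision",                       # fact | preference | decision | path | task
    "content": (
        "Redis chosen for session caching. "  # first ~50 chars stand alone
        "Evaluated Memcached but Redis persistence and per-key TTL "
        "won out. Applies to the api-gateway service."
    ),
    "topics": ["redis", "caching", "sessions"],  # lowercase, search-matched
}

# The front-loaded summary is all a future search preview will show:
preview = memory["content"][:50]
assert preview.startswith("Redis chosen for session caching.")
```

Note that "redis" appears in the topics even though the memory is really about sessions: the topic list is written for the queries a future search will use, not for taxonomy purity.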
How deduplication works
Two-tier deduplication prevents your memory corpus from filling with redundant entries:
- Hash fast-path — exact text matches are caught instantly by content hashing. Zero overhead, zero false positives
- Semantic similarity — new memories are embedded and compared against existing memories. Above a similarity threshold, the duplicate is rejected and the existing memory is returned so the AI can update it instead
- Update, don't duplicate — when a duplicate is caught, the AI receives the existing memory's ID and content. It can refine or replace the existing memory rather than creating a near-duplicate — the corpus grows in accuracy, not noise
- No false suppression — semantic similarity uses a conservative threshold to avoid blocking genuinely new memories that happen to be on a similar topic. Better to store a near-duplicate than silently drop new knowledge
The result: a corpus that gets more accurate over time as redundant entries are caught and consolidated.
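The two tiers can be sketched as follows, with a toy word-overlap score standing in for real embedding similarity; the 0.9 threshold is illustrative, not the production value:

```python
import hashlib

# Two-tier dedup sketch. Tier 1 hashes exact text; tier 2 stands in for
# embedding similarity with a toy cosine over word sets. The 0.9
# threshold is illustrative, not the production value.

def _hash(text):
    return hashlib.sha256(text.encode()).hexdigest()

def _cosine(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) ** 0.5 * len(wb) ** 0.5)

def add_memory(store, text, threshold=0.9):
    h = _hash(text)
    # Tier 1: exact-match fast path, zero false positives.
    for mid, existing in store.items():
        if _hash(existing) == h:
            return ("duplicate", mid)
    # Tier 2: semantic similarity; return the existing memory's ID
    # so the caller can update it instead of duplicating.
    for mid, existing in store.items():
        if _cosine(text, existing) >= threshold:
            return ("near-duplicate", mid)
    mid = f"m{len(store) + 1}"
    store[mid] = text
    return ("stored", mid)
```

The return value is the important design choice: a rejected duplicate hands back the existing memory's ID, turning a failed insert into an opportunity to refine what's already there.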
Your AI Stays in Control
Most memory services are black boxes. Your data goes in, something happens behind the scenes, and you hope for the best. You can't see what was stored, can't see what was silently dropped, and can't see what a server-side model decided was “important enough” to keep.
AI-Directed Storage
Your AI decides what to remember. It chooses what to store, how to categorize it, and what to search for. Gnosis never touches that editorial judgment — what goes into your memory is between you and your AI.
The tool-use problem. Getting LLMs to reliably use external tools is one of the hardest problems in AI integration. LLMs don't always call tools when they should. They sometimes hallucinate tool capabilities. They lose track of available tools as conversations grow long and context fills up.
Protocol as guidance. Gnosis solves this through deeply refined tool descriptions that align with how LLMs naturally make decisions. Rather than fighting the model's behavior, the protocol makes good memory practice the path of least resistance. The AI stores what matters because the interface makes it easy to store well and hard to store poorly.
Retrieval That Earns Trust
Speed is table stakes. Trust is the real challenge. Semantic search with cross-encoder reranking delivers sub-100ms results — but if those results are wrong, speed makes the problem worse, not better.
The memory poisoning problem. A factually incorrect memory doesn't just give a wrong answer once — it poisons future conversations. The AI treats retrieved memories as ground truth: if a memory says "use library X" and library X was deprecated six months ago, the AI will confidently recommend it, argue for it, and build on it. One bad memory cascades through every session that retrieves it.
Silence over noise. This is why Gnosis returns nothing rather than returning low-confidence matches. Your AI learns that when Gnosis returns a result, it's worth reading — and when it returns nothing, the information genuinely isn't stored, not that the search failed.
Full Transparency
Every operation is visible. Every `memory_add` call appears in your conversation. Every search result comes back where you can see it. If your AI stores something wrong, you see it and correct it on the spot.
Why this matters. Memory services that operate invisibly — silently extracting, filtering, or modifying what gets stored — create a system where nobody can audit what the AI "knows." When the AI makes a mistake because of a bad memory, you can't trace it. When the service silently drops something important, you don't notice until the AI forgets it.
Errors are caught at the source. With Gnosis, you see the memory being created, you see the content, you see the topics assigned. If the AI stores something wrong — and it will, because all LLMs make mistakes — you catch it immediately and correct it. The correction replaces the bad memory. The system gets more accurate over time because mistakes are visible.
Data Sovereignty
Processor, not controller. Under GDPR, Gnosis acts on instructions from you and your AI, never on its own judgment. There is no server-side LLM deciding what's "important enough" to keep. No invisible filtering. No editorial layer between your AI and your memories.
Full authority stays with you. What gets stored, how it's organized, when it's deleted, and where it goes. One-click export gives you your entire memory corpus as JSON. Account deletion is permanent and auditable. Your memories are encrypted with keys derived from your own credentials — we hold the data, but you hold the keys.
Constraints that protect. This is a GDPR architecture decision, not a product limitation. A processor that doesn't inspect content can't be compelled to filter content. A system that can't decrypt your data can't be ordered to disclose it.
Token Efficient
Context windows are expensive. Every token spent on memory infrastructure is a token your AI can't use for reasoning. Gnosis is designed to minimize overhead while maximizing the signal your AI receives.
Compressed Topic Landscape
Instead of dumping random memories into context, Gnosis returns a structured map of your knowledge — topic clusters with counts and type distributions. Your AI searches intelligently from this map instead of scanning everything. Less context, better results.
Refined tool descriptions guide LLMs to write memories that are already well-structured and searchable. This means fewer retrieval round-trips, less redundant storage, and higher hit rates on the first search. The efficiency compounds: better memories in means fewer tokens spent finding them later.
Where the token savings come from
Efficiency isn't one optimization — it's a stack of them, each compounding on the last:
- Topic landscape vs random injection — structured discovery replaces the industry-standard approach of injecting random memory samples. The AI gets a map of everything you know, not a random handful
- Preview-then-retrieve — 32 compressed previews in ~400 tokens vs 3,000–5,000 tokens for 10 traditional RAG chunks. An order of magnitude less context for better results
- Optimized response formats — every response format is tuned for the widest range of models. The challenge isn't just compression — it's finding formats that small models parse as reliably as large ones. No nested JSON, no schema negotiation
- Field-specific updates — when refining an existing memory, only the changed fields are rewritten. A topic adjustment costs a fraction of regenerating the entire memory
- Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the overhead of a separate tool invocation
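The field-specific update saving is easy to see in miniature; the payload shapes below are hypothetical, not Gnosis's wire format:

```python
import json

# Field-specific update vs full rewrite, in miniature.
# Payload shapes are hypothetical.
memory = {
    "id": "m42",
    "content": "Redis chosen for session caching. TTL is 24h per key.",
    "topics": ["redis", "caching"],
    "type": "decision",
}

# Full rewrite: resend every field.
full_update = json.dumps(memory)

# Field-specific update: resend only what changed (one added topic).
patch = json.dumps({"id": "m42", "topics": ["redis", "caching", "sessions"]})

print(len(patch), "vs", len(full_update), "bytes")
```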
In benchmarks against OpenAI Memory, LangMem, MemGPT/Letta, and full-context approaches: 26% higher accuracy, 91% lower p95 latency, and 90% fewer session tokens.
The search pipeline
When your AI searches, a multi-stage pipeline finds the best matches in sub-100ms — less time than one thinking token from your LLM:
- Embedding — the search query is converted to a vector representation, the same mathematical space your memories live in
- Vector similarity — finds the nearest memories by meaning, not keywords. A search for "database performance" finds memories about query optimization even if they never use the word "performance"
- Topic matching — a parallel path that finds memories by their topic tags, catching results that vector search might rank lower
- Reciprocal rank fusion — merges the vector and topic results into a single ranked list, combining the strengths of both approaches
- Cross-encoder reranking — a dedicated model reads each candidate alongside your query and scores relevance directly. More accurate than vector similarity alone, because it sees the full text of both query and memory together
The pipeline is adaptive. Small result sets skip reranking entirely — no point scoring 3 candidates when you can return them all. A confidence floor rejects low-quality matches rather than always returning something. Your AI learns that results from Gnosis are worth reading, and that an empty result means the information genuinely isn't stored.
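The fusion step above follows the standard reciprocal rank fusion formula, where each list contributes 1/(k + rank) per document; k=60 is the common default from the RRF literature, and the production constant may differ:

```python
# Reciprocal rank fusion: merge the vector and topic ranked lists.
# k=60 is the conventional default; the production value may differ.

def rrf(rankings, k=60):
    """Merge ranked lists of IDs; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m7", "m3", "m9"]   # nearest by embedding similarity
topic_hits = ["m3", "m5"]          # matched on topic tags

fused = rrf([vector_hits, topic_hits])
# m3 appears in both lists, so fusion promotes it to the top.
```

This is why the parallel topic path earns its keep: a memory that both paths surface, even at middling ranks, outranks a memory that only one path found.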
Encryption Architecture
Memory content is encrypted at rest using AES-256 with per-user keys derived via HKDF. Keys exist only in memory during active sessions — they are never written to persistent storage. We cannot decrypt your memories. This is an architectural constraint, not a policy promise.
Vector embeddings are stored unencrypted because similarity search requires mathematical operations on raw vectors. Embeddings are lossy, non-reversible projections — useful for matching, but the original text cannot be directly decoded.
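Per-user key derivation via HKDF (RFC 5869) can be sketched with the standard library alone. The salt, info label, and input secret below are placeholders; Gnosis's actual parameters are not public:

```python
import hashlib
import hmac

# HKDF (RFC 5869) sketch: extract, then expand. The salt, info label,
# and input secret are placeholders, not Gnosis's real parameters.

def hkdf_sha256(ikm, salt, info, length=32):
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                             # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def user_key(credential_secret: bytes, user_id: str) -> bytes:
    """Derive a 256-bit AES key bound to one user's identity."""
    return hkdf_sha256(credential_secret, salt=b"demo-salt",
                       info=user_id.encode(), length=32)

k1 = user_key(b"secret-from-oauth", "alice")
k2 = user_key(b"secret-from-oauth", "bob")
assert k1 != k2 and len(k1) == 32   # distinct per-user, AES-256 sized
```

Because derivation is deterministic from the user's credentials, the key can be recreated on demand at session start and discarded afterward, which is what makes "never written to persistent storage" workable.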
What this means for search
Encrypting content at rest has a deliberate consequence for how search works. Traditional approaches are off the table:
- No full-text search — keyword matching (BM25) requires a plaintext index. Encrypted content can't be indexed. There is no searchable plaintext copy of your memories anywhere in the system
- No lexical fallback — most search systems fall back to keyword matching when semantic search misses. Gnosis can't — the entire retrieval path runs on vector similarity and cross-encoder reranking
- Decryption only at delivery — content is decrypted only for the final results your AI actually receives. The search pipeline itself never sees plaintext — it operates on vectors and scores
The tradeoff is explicit: search quality depends entirely on embedding quality and reranker accuracy. In exchange, your memory content is never exposed in a searchable index. This is why Gnosis invests heavily in reranker quality — it's not an optional refinement, it's the only semantic layer between your query and your memories.
Compliance advantages
Encryption at rest provides concrete legal protections beyond the security benefit:
- GDPR Article 25 — privacy by design and by default
- GDPR Article 32 — encryption at rest is explicitly listed as an appropriate technical measure
- GDPR Article 34(3)(a) — encrypted data breaches do not require individual user notification
- US state safe harbors — multiple state breach notification laws exempt encrypted data
The encryption architecture was designed from day one to support SOC 2 and HIPAA certification. The remaining work is auditing and certification, not redesign.
Full details on our Security page, including threat model and what we do and don't protect against.
Cross-Platform by Default
MCP is an open protocol, and Gnosis implements Streamable HTTP transport with OAuth 2.1 auto-discovery. Most clients just need the URL — https://gnosismemory.com — and handle the rest automatically.
Currently verified across 14+ clients: Claude, ChatGPT, Gemini, Cursor, VS Code, Copilot CLI, Cline, Roo Code, OpenCode, Vibe, Goose, grok-cli, and mcp-remote as a universal bridge. Clients that don't support native HTTP can use mcp-remote as a stdio-to-Streamable-HTTP adapter.
Why cross-platform is harder than it sounds
MCP is a standard, but every client implements it differently. Making one server work reliably across 14+ clients means solving compatibility problems that the protocol specification doesn't cover:
- Transport string fragmentation — VS Code expects `http`, Cline expects `streamableHttp`, Roo Code expects `streamable-http`, and others use their own variants. The wrong string causes silent failures with no error message
- Auth flow differences — some clients support OAuth auto-discovery natively, others need manual token configuration, and others use bridge adapters like `mcp-remote` to translate between transport types
- Config format fragmentation — Claude uses `claude_desktop_config.json`, VS Code uses `.mcp.json`, Gemini CLI uses `settings.json` with different field names for the same concepts
- Mobile sync behavior — ChatGPT mobile inherits MCP config from the web interface automatically, Claude mobile syncs from your account settings. Each platform handles this differently
None of this is glamorous engineering. It's testing every client, documenting every config format, and handling every edge case. The result is that your memories follow you across devices and providers because we've already solved the compatibility problems you'd otherwise hit yourself.
How portability works under the hood
Your memory corpus is stored centrally, authenticated by your OAuth identity. The data format is designed for portability:
- Encrypted content is just bytes — the encrypted payload is storage-agnostic. It can be exported, backed up, or migrated to a different backend without re-encryption
- Vectors are standardized floats — the embedding format is the same mathematical representation used across the industry. Not tied to a proprietary index
- Keys stay with you — encryption keys are derived from your identity, not from the storage system. Your data remains yours regardless of where it's hosted
One-click export downloads your entire corpus as JSON. You can inspect it, back it up, or take it somewhere else.
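Because the export is plain JSON, inspecting it takes a few lines. The field names here ("type", "topics") are assumptions about the export's shape, not a documented schema:

```python
import json
from collections import Counter

# Sketch of inspecting a one-click export. Field names are assumed,
# not a documented schema.
export_json = json.dumps([
    {"type": "fact", "topics": ["redis"], "content": "..."},
    {"type": "decision", "topics": ["caching"], "content": "..."},
    {"type": "fact", "topics": ["mcp"], "content": "..."},
])

memories = json.loads(export_json)
by_type = Counter(m["type"] for m in memories)
print(dict(by_type))   # corpus shape at a glance
```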
Roadmap
No promises on dates — just honest priorities.
Launching With
These ship at general availability.
- Self-service account management — browser-based dashboard for export, deletion, and account settings
- One-click data export — download all your memories as JSON
- Self-service account deletion — two-step confirmation, permanent within 30 days
- Memory import — bring existing memories from Claude.ai, ChatGPT, JSON, or plain text
Shared Collections
Right now, memories are strictly per-user. Shared Collections open up team and community use cases with a single primitive.
- Team knowledge bases — private collections shared among team members
- Knowledge packs — public, read-only collections (e.g., framework docs, API references) stored once, linked by many
- Cross-user sharing — publish memories into a shared collection with human-in-the-loop approval
- Storage efficiency — shared collections are stored once, not duplicated per subscriber
All users can subscribe to shared collections. Contributing requires a Pro plan.
Artifacts
Store files, images, and documents alongside your memories. Each artifact gets a searchable summary and topic tags — the full file is retrieved only when your AI needs it. Designed around the same preview-then-retrieve pattern that keeps text search efficient.
Pro Features
Available on Pro plans.
- REST API — direct HTTP access to your memories outside of MCP. Build custom integrations, dashboards, or agentic workflows that read and write memories programmatically
- Agent IDs — separate memory identities for different AI agents under your account. Each agent gets its own memory space while you retain visibility and control
- Skills — reusable prompt templates stored as memories, callable by your AI on demand
- Agent artifacts — structured outputs and configuration files that agents can read and write as part of their workflows
Additional OAuth Providers
Currently Google-only. We plan to add sign-in with GitHub, Microsoft, Apple, and Facebook so you can use whichever identity you already have.
Compliance and Business Tiers
The encryption architecture was designed from day one to support these. The work is auditing and certification, not redesign.
- SOC 2 Type II certification — independent audit of security controls
- HIPAA compliance — Business Associate Agreements for healthcare use cases
- Business-tier SLAs — guaranteed uptime, response times, and support
- Tamper-proof audit trail — cryptographic verification that no one has modified your memories or logs
- Independent penetration testing — third-party security assessment
What We're Not Building
- Server-side LLM — Gnosis is a data processor. Your AI client does the thinking. This is a GDPR architecture decision, not a limitation.
- Memory content auditing — quality comes from protocol design, not surveillance.
- Tracking or analytics on memory content — we can't read your memories, and we intend to keep it that way.