How Gnosis Works
Gnosis is a remote cloud MCP server. Your AI assistant connects to it over HTTPS, stores memories, and retrieves them via semantic search. There is no local install, no Docker, no vector database to manage. Just a URL.
Under the hood, four pieces of engineering make this work well.
Topic-Landscape Architecture
The core problem with AI memory is discovery: how does the LLM know what you've stored without reading everything? Most memory systems solve this with random samples — dump ~800 tokens of arbitrary memories into context and hope something relevant shows up.
Structured Discovery, Not Random Sampling
At session start, `init_core_memories` returns a structured topic landscape — a complete map of your knowledge graph organized into macro-topic clusters with memory counts and active task counts, topic keywords with density, type distributions, active tasks with progress indicators, and your behavioral preferences. Every token carries signal. Nothing is random. Nothing is wasted.
Compare that to systems that spend 800 tokens on a random grab bag of memories that are irrelevant 60% of the time and give the AI no way to search for what it actually needs. Gnosis gives your AI a map and a search engine. Competitors give it a handful of confetti.
How the topic landscape is structured
The landscape is a compressed representation of your entire memory corpus. It includes:
- Macro-topic clusters with memory counts and active task counts — your AI sees `gnosis(1501, 3 tasks)` and knows there's a large body of knowledge with open work items on that topic
- Topic keywords with density counts — a flat list of every topic in your corpus, ranked by frequency, so the AI knows what search terms will find results
- Type distributions — how many facts, decisions, tasks, and preferences exist, giving the AI a sense of corpus shape
- Active tasks with progress indicators — incomplete work surfaces automatically
- Behavioral preferences — communication style, workflow rules, and constraints that shape the AI's behavior from turn one
The format is content-agnostic. It works identically whether your corpus is deeply technical, personal, or a mix. The AI doesn't need to see your memories to know what you've stored — it navigates the landscape and searches when it needs specifics.
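The compact cluster notation described above (`gnosis(1501, 3 tasks)`) can be sketched in a few lines. The payload fields and numbers here are illustrative, not the actual wire format:

```python
# Illustrative sketch of a topic landscape payload. Field names and
# numbers are hypothetical; the real wire format may differ.
landscape = {
    "clusters": [
        # macro-topic, total memories, open tasks
        {"name": "gnosis", "memories": 1501, "active_tasks": 3},
        {"name": "infra", "memories": 212, "active_tasks": 1},
    ],
    "topics": {"mcp": 340, "search": 120, "redis": 45},  # keyword -> density
    "types": {"fact": 1200, "decision": 310, "task": 88, "preference": 25},
    "preferences": ["concise answers", "prefer TypeScript examples"],
}

def render(landscape):
    """Compress the landscape into the one-line-per-cluster form."""
    lines = [
        f"{c['name']}({c['memories']}, {c['active_tasks']} tasks)"
        for c in landscape["clusters"]
    ]
    top = sorted(landscape["topics"].items(), key=lambda kv: -kv[1])
    lines.append("topics: " + ", ".join(f"{k}:{v}" for k, v in top))
    return "\n".join(lines)

print(render(landscape))
```

Even this toy rendering shows the idea: a handful of dense lines stand in for thousands of memories, and every line tells the AI where to search next.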
Preview-then-retrieve: how search stays efficient
When your AI searches, Gnosis returns compressed previews — not full memories. Your AI scans the previews, picks the ones it needs, and retrieves only those in full:
- Breadth first — 32 previews fit in roughly 400 tokens. Traditional RAG returns 10 full chunks at 3,000–5,000 tokens — most of which the AI ignores
- Depth on demand — your AI reads the previews, identifies the 2–3 it actually needs, and retrieves only those. The AI decides what to read, not the system
- Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the token overhead of a separate tool invocation
The key insight is who decides what to read. In traditional RAG, the system guesses which chunks are relevant. In Gnosis, your AI makes that choice — informed by previews that are cheap enough to scan in bulk.
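From the client side, the flow looks roughly like this. `search` and `fetch` are hypothetical stand-ins for the real MCP tools, backed here by a toy in-memory corpus:

```python
# Sketch of preview-then-retrieve from the client's perspective.
# search() and fetch() are hypothetical stand-ins for the real tools.

def search(query, limit=32):
    """Return compressed previews: (memory_id, ~50-char summary) pairs."""
    corpus = {
        "m1": "Redis chosen for session cache; TTL 24h, LRU eviction.",
        "m2": "Postgres holds primary data; Redis is cache-only.",
        "m3": "Deploy runs via GitHub Actions on tag push.",
    }
    return [(mid, text[:50]) for mid, text in corpus.items()][:limit]

def fetch(memory_id):
    """Retrieve one memory in full; called only for previews the AI picked."""
    return {"id": memory_id, "content": "...full memory text..."}

# Breadth first: scan many cheap previews...
previews = search("session caching")
# ...then depth on demand: the AI picks the few it actually needs.
chosen = [mid for mid, summary in previews if "Redis" in summary]
full = [fetch(mid) for mid in chosen]
```

The token economics live in that split: the preview pass is cheap enough to cast a wide net, and the expensive full-text pass touches only what the AI chose.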
Quality By Design
Gnosis has no server-side LLM. It is a GDPR data processor — it stores what it's told to store. So how is it that 99.8% of stored content grades B+ or better?
Protocol-Guided Quality
Every MCP tool description is deeply refined to guide the calling LLM toward writing structured, specific, searchable memories. The descriptions encode creation guidelines, topic conventions, type taxonomy, and quality heuristics directly into the tool schema — the same schema the LLM reads before deciding what to write.
This works even with small 8B-parameter models. The quality isn't enforced by a gate — it's guided by the interface itself. Combined with server-side deduplication that prevents redundant storage, the result is a clean memory corpus without any content filtering.
What makes a high-quality memory
Every memory is guided toward a specific structure:
- Front-loaded summary — the first 50 characters must be an executive summary that stands alone. An AI deciding whether to retrieve a memory reads only this preview — if it's vague, the memory is effectively invisible
- Type discipline — each memory is classified as a fact, preference, decision, path, or task. Each type has structural requirements that make it searchable in predictable ways
- Topic keywords — single lowercase words chosen to match future search queries. If searching "redis" should find a memory about caching, "redis" must be in the topics — even if the memory is primarily about sessions
- Self-contained content — present tense, includes the rationale, names the subject. No memory should require reading another memory to understand
These conventions aren't enforced by a server-side filter. They're embedded in the tool descriptions that every LLM reads before calling `memory_add`. The LLM follows them because the interface makes them the path of least resistance. Audited across 2,095 memories: 99.8% grade B+ or better. Only 0.2% needed improvement.
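A memory shaped by these conventions might look like this. `memory_add` is the tool named in the text, but the field names shown are assumptions for illustration:

```python
# A memory following the conventions above. The memory_add tool name
# comes from the document; the exact field names are illustrative.
memory = {
    "type": "decision",                       # fact | preference | decision | path | task
    "content": (
        "Redis chosen for session caching. "  # first ~50 chars stand alone
        "Evaluated Memcached but Redis persistence and per-key TTL "
        "won out. Applies to the api-gateway service."
    ),
    "topics": ["redis", "caching", "sessions"],  # lowercase, search-matched
}

# The front-loaded summary is all a future search preview will show:
preview = memory["content"][:50]
assert preview.startswith("Redis chosen for session caching.")
```

Note that "redis" appears in the topics even though the memory is really about sessions: the topic list is written for the queries a future search will use, not for taxonomy purity.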
How deduplication works
Two-tier deduplication prevents your memory corpus from filling with redundant entries:
- Hash fast-path — exact text matches are caught instantly by content hashing. Zero overhead, zero false positives
- Semantic similarity — new memories are embedded and compared against existing memories. Above a similarity threshold, the duplicate is rejected and the existing memory is returned so the AI can update it instead
- Update, don't duplicate — when a duplicate is caught, the AI receives the existing memory's ID and content. It can refine or replace the existing memory rather than creating a near-duplicate — the corpus grows in accuracy, not noise
- No false suppression — semantic similarity uses a conservative threshold to avoid blocking genuinely new memories that happen to be on a similar topic. Better to store a near-duplicate than silently drop new knowledge
The result: a corpus that gets more accurate over time as redundant entries are caught and consolidated.
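The two tiers can be sketched as follows, with a toy word-overlap score standing in for real embedding similarity; the 0.9 threshold is illustrative, not the production value:

```python
import hashlib

# Two-tier dedup sketch. Tier 1 hashes exact text; tier 2 stands in for
# embedding similarity with a toy cosine over word sets. The 0.9
# threshold is illustrative, not the production value.

def _hash(text):
    return hashlib.sha256(text.encode()).hexdigest()

def _cosine(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) ** 0.5 * len(wb) ** 0.5)

def add_memory(store, text, threshold=0.9):
    h = _hash(text)
    # Tier 1: exact-match fast path, zero false positives.
    for mid, existing in store.items():
        if _hash(existing) == h:
            return ("duplicate", mid)
    # Tier 2: semantic similarity; return the existing memory's ID
    # so the caller can update it instead of duplicating.
    for mid, existing in store.items():
        if _cosine(text, existing) >= threshold:
            return ("near-duplicate", mid)
    mid = f"m{len(store) + 1}"
    store[mid] = text
    return ("stored", mid)
```

The return value is the important design choice: a rejected duplicate hands back the existing memory's ID, turning a failed insert into an opportunity to refine what's already there.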
Your AI Stays in Control
Most memory services are black boxes. Your data goes in, something happens behind the scenes, and you hope for the best. You can't see what was stored, can't see what was silently dropped, and can't see what a server-side model decided was “important enough” to keep.
AI-Directed Storage
Your AI decides what to remember. It chooses what to store, how to categorize it, and what to search for. Gnosis never touches that editorial judgment — what goes into your memory is between you and your AI.
The tool-use problem. Getting LLMs to reliably use external tools is one of the hardest problems in AI integration. LLMs don't always call tools when they should. They sometimes hallucinate tool capabilities. They lose track of available tools as conversations grow long and context fills up.
Protocol as guidance. Gnosis solves this through deeply refined tool descriptions that align with how LLMs naturally make decisions. Rather than fighting the model's behavior, the protocol makes good memory practice the path of least resistance. The AI stores what matters because the interface makes it easy to store well and hard to store poorly.
Retrieval That Earns Trust
Speed is table stakes. Trust is the real challenge. Semantic search with cross-encoder reranking delivers sub-100ms results — but if those results are wrong, speed makes the problem worse, not better.
The memory poisoning problem. A factually incorrect memory doesn't just give a wrong answer once — it poisons future conversations. The AI treats retrieved memories as ground truth: if a memory says "use library X" and library X was deprecated six months ago, the AI will confidently recommend it, argue for it, and build on it. One bad memory cascades through every session that retrieves it.
Silence over noise. This is why Gnosis returns nothing rather than returning low-confidence matches. Your AI learns that when Gnosis returns a result, it's worth reading — and when it returns nothing, the information genuinely isn't stored, not that the search failed.
Full Transparency
Every operation is visible. Every `memory_add` call appears in your conversation. Every search result comes back where you can see it. If your AI stores something wrong, you see it and correct it on the spot.
Why this matters. Memory services that operate invisibly — silently extracting, filtering, or modifying what gets stored — create a system where nobody can audit what the AI "knows." When the AI makes a mistake because of a bad memory, you can't trace it. When the service silently drops something important, you don't notice until the AI forgets it.
Errors are caught at the source. With Gnosis, you see the memory being created, you see the content, you see the topics assigned. If the AI stores something wrong — and it will, because all LLMs make mistakes — you catch it immediately and correct it. The correction replaces the bad memory. The system gets more accurate over time because mistakes are visible.
Data Sovereignty
Processor, not controller. Under GDPR, Gnosis acts on instructions from you and your AI, never on its own judgment. There is no server-side LLM deciding what's "important enough" to keep. No invisible filtering. No editorial layer between your AI and your memories.
Full authority stays with you. What gets stored, how it's organized, when it's deleted, and where it goes. One-click export gives you your entire memory corpus as JSON. Account deletion is permanent and auditable. Your memories are encrypted with keys derived from your own credentials — we hold the data, but you hold the keys.
Constraints that protect. This is a GDPR architecture decision, not a product limitation. A processor that doesn't inspect content can't be compelled to filter content. A system that can't decrypt your data can't be ordered to disclose it.
Token Efficient
Context windows are expensive. Every token spent on memory infrastructure is a token your AI can't use for reasoning. Gnosis is designed to minimize overhead while maximizing the signal your AI receives.
Compressed Topic Landscape
Instead of dumping random memories into context, Gnosis returns a structured map of your knowledge — topic clusters with counts and type distributions. Your AI searches intelligently from this map instead of scanning everything. Less context, better results.
Refined tool descriptions guide LLMs to write memories that are already well-structured and searchable. This means fewer retrieval round-trips, less redundant storage, and higher hit rates on the first search. The efficiency compounds: better memories in means fewer tokens spent finding them later.
Where the token savings come from
Efficiency isn't one optimization — it's a stack of them, each compounding on the last:
- Topic landscape vs random injection — structured discovery replaces the industry-standard approach of injecting random memory samples. The AI gets a map of everything you know, not a random handful
- Preview-then-retrieve — 32 compressed previews in ~400 tokens vs 3,000–5,000 tokens for 10 traditional RAG chunks. An order of magnitude less context for better results
- Optimized response formats — every response format is tuned for the widest range of models. The challenge isn't just compression — it's finding formats that small models parse as reliably as large ones. No nested JSON, no schema negotiation
- Field-specific updates — when refining an existing memory, only the changed fields are rewritten. A topic adjustment costs a fraction of regenerating the entire memory
- Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the overhead of a separate tool invocation
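The field-specific update saving is easy to see in miniature; the payload shapes below are hypothetical, not Gnosis's wire format:

```python
import json

# Field-specific update vs full rewrite, in miniature.
# Payload shapes are hypothetical.
memory = {
    "id": "m42",
    "content": "Redis chosen for session caching. TTL is 24h per key.",
    "topics": ["redis", "caching"],
    "type": "decision",
}

# Full rewrite: resend every field.
full_update = json.dumps(memory)

# Field-specific update: resend only what changed (one added topic).
patch = json.dumps({"id": "m42", "topics": ["redis", "caching", "sessions"]})

print(len(patch), "vs", len(full_update), "bytes")
```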
In benchmarks against OpenAI Memory, LangMem, MemGPT/Letta, and full-context approaches: 26% higher accuracy, 91% lower p95 latency, and 90% fewer session tokens.
The search pipeline
When your AI searches, a multi-stage pipeline finds the best matches in sub-100ms — less time than one thinking token from your LLM:
- Embedding — the search query is converted to a vector representation, the same mathematical space your memories live in
- Vector similarity — finds the nearest memories by meaning, not keywords. A search for "database performance" finds memories about query optimization even if they never use the word "performance"
- Topic matching — a parallel path that finds memories by their topic tags, catching results that vector search might rank lower
- Reciprocal rank fusion — merges the vector and topic results into a single ranked list, combining the strengths of both approaches
- Cross-encoder reranking — a dedicated model reads each candidate alongside your query and scores relevance directly. More accurate than vector similarity alone, because it sees the full text of both query and memory together
The pipeline is adaptive. Small result sets skip reranking entirely — no point scoring 3 candidates when you can return them all. A confidence floor rejects low-quality matches rather than always returning something. Your AI learns that results from Gnosis are worth reading, and that an empty result means the information genuinely isn't stored.
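The fusion step above follows the standard reciprocal rank fusion formula, where each list contributes 1/(k + rank) per document; k=60 is the common default from the RRF literature, and the production constant may differ:

```python
# Reciprocal rank fusion: merge the vector and topic ranked lists.
# k=60 is the conventional default; the production value may differ.

def rrf(rankings, k=60):
    """Merge ranked lists of IDs; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m7", "m3", "m9"]   # nearest by embedding similarity
topic_hits = ["m3", "m5"]          # matched on topic tags

fused = rrf([vector_hits, topic_hits])
# m3 appears in both lists, so fusion promotes it to the top.
```

This is why the parallel topic path earns its keep: a memory that both paths surface, even at middling ranks, outranks a memory that only one path found.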
Encryption Architecture
Memory content is encrypted at rest using AES-256 with per-user keys derived via HKDF. Keys exist only in memory during active sessions — they are never written to persistent storage. We cannot decrypt your memories. This is an architectural constraint, not a policy promise.
Vector embeddings are stored unencrypted because similarity search requires mathematical operations on raw vectors. Embeddings are lossy, non-reversible projections — useful for matching, but the original text cannot be directly decoded.
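Per-user key derivation via HKDF (RFC 5869) can be sketched with the standard library alone. The salt, info label, and input secret below are placeholders; Gnosis's actual parameters are not public:

```python
import hashlib
import hmac

# HKDF (RFC 5869) sketch: extract, then expand. The salt, info label,
# and input secret are placeholders, not Gnosis's real parameters.

def hkdf_sha256(ikm, salt, info, length=32):
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                             # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def user_key(credential_secret: bytes, user_id: str) -> bytes:
    """Derive a 256-bit AES key bound to one user's identity."""
    return hkdf_sha256(credential_secret, salt=b"demo-salt",
                       info=user_id.encode(), length=32)

k1 = user_key(b"secret-from-oauth", "alice")
k2 = user_key(b"secret-from-oauth", "bob")
assert k1 != k2 and len(k1) == 32   # distinct per-user, AES-256 sized
```

Because derivation is deterministic from the user's credentials, the key can be recreated on demand at session start and discarded afterward, which is what makes "never written to persistent storage" workable.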
What this means for search
Encrypting content at rest has a deliberate consequence for how search works. Traditional approaches are off the table:
- No full-text search — keyword matching (BM25) requires a plaintext index. Encrypted content can't be indexed. There is no searchable plaintext copy of your memories anywhere in the system
- No lexical fallback — most search systems fall back to keyword matching when semantic search misses. Gnosis can't — the entire retrieval path runs on vector similarity and cross-encoder reranking
- Decryption only at delivery — content is decrypted only for the final results your AI actually receives. The search pipeline itself never sees plaintext — it operates on vectors and scores
The tradeoff is explicit: search quality depends entirely on embedding quality and reranker accuracy. In exchange, your memory content is never exposed in a searchable index. This is why Gnosis invests heavily in reranker quality — it's not an optional refinement, it's the only semantic layer between your query and your memories.
Compliance advantages
Encryption at rest provides concrete legal protections beyond the security benefit:
- GDPR Article 25 — privacy by design and by default
- GDPR Article 32 — encryption at rest is explicitly listed as an appropriate technical measure
- GDPR Article 34(3)(a) — encrypted data breaches do not require individual user notification
- US state safe harbors — multiple state breach notification laws exempt encrypted data
The encryption architecture was designed from day one to support SOC 2 and HIPAA certification. The remaining work is auditing and certification, not redesign.
Full details on our Security page, including threat model and what we do and don't protect against.
Cross-Platform by Default
MCP is an open protocol, and Gnosis implements Streamable HTTP transport with OAuth 2.1 auto-discovery. Most clients just need the URL — https://gnosismemory.com — and handle the rest automatically.
Currently verified across 14+ clients: Claude, ChatGPT, Gemini, Cursor, VS Code, Copilot CLI, Cline, Roo Code, OpenCode, Vibe, Goose, grok-cli, and mcp-remote as a universal bridge. Clients that don't support native HTTP can use mcp-remote as a stdio-to-Streamable-HTTP adapter.
Why cross-platform is harder than it sounds
MCP is a standard, but every client implements it differently. Making one server work reliably across 14+ clients means solving compatibility problems that the protocol specification doesn't cover:
- Transport string fragmentation — VS Code expects `http`, Cline expects `streamableHttp`, Roo Code expects `streamable-http`, and others use their own variants. The wrong string causes silent failures with no error message
- Auth flow differences — some clients support OAuth auto-discovery natively, others need manual token configuration, and others use bridge adapters like `mcp-remote` to translate between transport types
- Config format fragmentation — Claude uses `claude_desktop_config.json`, VS Code uses `.mcp.json`, Gemini CLI uses `settings.json` with different field names for the same concepts
- Mobile sync behavior — ChatGPT mobile inherits MCP config from the web interface automatically, Claude mobile syncs from your account settings. Each platform handles this differently
None of this is glamorous engineering. It's testing every client, documenting every config format, and handling every edge case. The result is that your memories follow you across devices and providers because we've already solved the compatibility problems you'd otherwise hit yourself.
How portability works under the hood
Your memory corpus is stored centrally, authenticated by your OAuth identity. The data format is designed for portability:
- Encrypted content is just bytes — the encrypted payload is storage-agnostic. It can be exported, backed up, or migrated to a different backend without re-encryption
- Vectors are standardized floats — the embedding format is the same mathematical representation used across the industry. Not tied to a proprietary index
- Keys stay with you — encryption keys are derived from your identity, not from the storage system. Your data remains yours regardless of where it's hosted
One-click export downloads your entire corpus as JSON. You can inspect it, back it up, or take it somewhere else.
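Because the export is plain JSON, inspecting it takes a few lines. The field names here ("type", "topics") are assumptions about the export's shape, not a documented schema:

```python
import json
from collections import Counter

# Sketch of inspecting a one-click export. Field names are assumed,
# not a documented schema.
export_json = json.dumps([
    {"type": "fact", "topics": ["redis"], "content": "..."},
    {"type": "decision", "topics": ["caching"], "content": "..."},
    {"type": "fact", "topics": ["mcp"], "content": "..."},
])

memories = json.loads(export_json)
by_type = Counter(m["type"] for m in memories)
print(dict(by_type))   # corpus shape at a glance
```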
Roadmap
No promises on dates — just honest priorities.
Launching With
These ship at general availability.
- Self-service account management — browser-based dashboard for export, deletion, and account settings
- One-click data export — download all your memories as JSON
- Self-service account deletion — two-step confirmation, permanent within 30 days
- Memory import — bring existing memories from Claude.ai, ChatGPT, JSON, or plain text
Shared Collections
Right now, memories are strictly per-user. Shared Collections open up team and community use cases with a single primitive.
- Team knowledge bases — private collections shared among team members
- Knowledge packs — public, read-only collections (e.g., framework docs, API references) stored once, linked by many
- Cross-user sharing — publish memories into a shared collection with human-in-the-loop approval
- Storage efficiency — shared collections are stored once, not duplicated per subscriber
All users can subscribe to shared collections. Contributing requires a Pro plan.
Artifacts
Store files, images, and documents alongside your memories. Each artifact gets a searchable summary and topic tags — the full file is retrieved only when your AI needs it. Designed around the same preview-then-retrieve pattern that keeps text search efficient.
Pro Features
Available on Pro plans.
- REST API — direct HTTP access to your memories outside of MCP. Build custom integrations, dashboards, or agentic workflows that read and write memories programmatically
- Agent IDs — separate memory identities for different AI agents under your account. Each agent gets its own memory space while you retain visibility and control
- Skills — reusable prompt templates stored as memories, callable by your AI on demand
- Agent artifacts — structured outputs and configuration files that agents can read and write as part of their workflows
Additional OAuth Providers
Currently Google-only. We plan to add sign-in with GitHub, Microsoft, Apple, and Facebook so you can use whichever identity you already have.
Compliance and Business Tiers
The encryption architecture was designed from day one to support these. The work is auditing and certification, not redesign.
- SOC 2 Type II certification — independent audit of security controls
- HIPAA compliance — Business Associate Agreements for healthcare use cases
- Business-tier SLAs — guaranteed uptime, response times, and support
- Tamper-proof audit trail — cryptographic verification that no one has modified your memories or logs
- Independent penetration testing — third-party security assessment
What We're Not Building
- Server-side LLM — Gnosis is a data processor. Your AI client does the thinking. This is a GDPR architecture decision, not a limitation.
- Memory content auditing — quality comes from protocol design, not surveillance.
- Tracking or analytics on memory content — we can't read your memories, and we intend to keep it that way.