System Architecture

Comprehensive overview of the LangMem system architecture, components, and data flow patterns.

System Overview

High-Level Architecture
```mermaid
graph TB
    subgraph "Client Applications"
        A[n8n Workflows]
        B[Claude Code CLI]
        C[Custom Applications]
    end
    subgraph "API Gateway Layer"
        D[Memory Service API]
        E[Authentication Layer]
        F[Rate Limiting]
    end
    subgraph "Core Processing Layer"
        G[LangMem SDK]
        H[Memory Manager]
        I[Context Assembler]
        J[Hybrid Retrieval Engine]
    end
    subgraph "Model Layer"
        K[Ollama Local LLM]
        L[Embedding Generator]
        M[Entity Extractor]
    end
    subgraph "Storage Layer"
        N[Supabase PostgreSQL]
        O[pgvector Extension]
        P[Neo4j Graph DB]
        Q[Vector Indexes]
    end
    subgraph "Infrastructure"
        R[Docker Network]
        S[Container Orchestration]
        T[Health Monitoring]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    D --> G
    G --> H
    G --> I
    G --> J
    J --> K
    J --> L
    J --> M
    H --> N
    H --> P
    N --> O
    N --> Q
    R --> S
    S --> T
    style G fill:#2563eb,stroke:#1e40af,stroke-width:3px,color:#fff
    style N fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style P fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style K fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```

Data Flow Architecture

Data Ingestion Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j
    Client->>API: POST /v1/ingest
    API->>LM: Process Document
    LM->>LM: Text Chunking
    LM->>OL: Generate Embeddings
    OL-->>LM: Vector Embeddings
    LM->>SB: Store Chunks + Embeddings
    SB-->>LM: Chunk IDs
    LM->>OL: Extract Entities
    OL-->>LM: Entity List
    LM->>N4J: Store Graph Data
    N4J-->>LM: Graph Node IDs
    LM->>N4J: Link to Chunk IDs
    LM-->>API: Ingestion Complete
    API-->>Client: Success Response
```
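
The "Text Chunking" step above can be sketched as a fixed-size splitter with overlap. This is a minimal illustration; the actual chunk size and overlap used by the pipeline are assumptions here:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap means the tail of each chunk reappears at the head of the next, so a fact split across a boundary still lands intact in at least one chunk.
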
Data Retrieval Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j
    Client->>API: POST /v1/context/retrieve
    API->>LM: Query Processing
    LM->>OL: Generate Query Embedding
    OL-->>LM: Query Vector
    LM->>SB: Vector Similarity Search
    SB-->>LM: Relevant Chunks
    LM->>LM: Extract Entities from Chunks
    LM->>N4J: Graph Traversal Query
    N4J-->>LM: Related Entities/Facts
    LM->>LM: Context Assembly
    LM->>LM: Ranking & Filtering
    LM-->>API: Augmented Context
    API-->>Client: Context Response
```
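
The "Ranking & Filtering" step can be illustrated with plain cosine similarity over the returned chunk vectors. This is a sketch only; the production ranker presumably also weighs graph signals:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_chunks(query_vec, chunks, min_score=0.0, top_k=5):
    """chunks: list of (chunk_text, embedding) pairs.
    Returns the top_k chunk texts scoring at least min_score."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored = [s for s in scored if s[0] >= min_score]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

The `min_score` cutoff implements filtering and `top_k` caps the context size handed to assembly.
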

Component Details

🧠 LangMem SDK

Purpose: Core memory orchestration layer

Key Features:

  • Storage-agnostic memory API
  • Active memory tools
  • Background memory management
  • LangGraph integration

Integration: Coordinates between vector and graph storage
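
How the SDK coordinates the two stores can be sketched with in-memory stand-ins. The class and method names here are hypothetical illustrations, not LangMem's actual API:

```python
import uuid

class HybridMemoryStore:
    """Toy coordinator: writes each chunk to a vector store and links
    extracted entities to it in a graph store via the chunk ID."""

    def __init__(self):
        self.vector_store = {}   # chunk_id -> {"text": ..., "embedding": ...}
        self.graph_store = []    # (entity, "MENTIONED_IN", chunk_id) triples

    def ingest(self, text, embedding, entities):
        chunk_id = str(uuid.uuid4())
        self.vector_store[chunk_id] = {"text": text, "embedding": embedding}
        for entity in entities:
            self.graph_store.append((entity, "MENTIONED_IN", chunk_id))
        return chunk_id

    def chunks_mentioning(self, entity):
        """Graph-side lookup resolved back to vector-side chunk text."""
        ids = [cid for e, _, cid in self.graph_store if e == entity]
        return [self.vector_store[cid]["text"] for cid in ids]
```

The shared chunk ID is the glue: the real system plays the same trick with Supabase row IDs stored on Neo4j nodes.
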

🐘 Supabase + pgvector

Purpose: Vector storage and semantic search

Key Features:

  • 1536-dimensional embeddings
  • HNSW indexing for performance
  • Unified data + vector storage
  • SQL query capabilities

Scale: Handles 1.6M+ embeddings efficiently

🔗 Neo4j Graph Database

Purpose: Relationship storage and graph queries

Key Features:

  • Entity relationship modeling
  • Graph traversal capabilities
  • Community detection algorithms
  • Cypher query language

Integration: Links to Supabase via chunk IDs

🦙 Ollama Local LLM

Purpose: Local model inference and embeddings

Key Features:

  • Privacy-first local processing
  • OpenAI-compatible API
  • Multiple model support
  • Efficient quantization

Models: Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3
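
Because Ollama exposes an OpenAI-compatible API, an embedding call is a plain JSON POST. The sketch below only builds the request; the model name is an example and the response shape follows the OpenAI embeddings format:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # OpenAI-compatible endpoint

def build_embedding_request(texts, model="nomic-embed-text"):
    """Construct the HTTP request for a batch embedding call."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a running Ollama instance):
# with urllib.request.urlopen(build_embedding_request(["hello"])) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```
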

Docker Network Architecture

Container Network Topology
```mermaid
graph TB
    subgraph "localai_network (Bridge)"
        subgraph "Memory Stack"
            A[memory_service:8000]
            B[supabase:5432]
            C[neo4j:7687]
            D[ollama:11434]
        end
        subgraph "Existing Services"
            E[n8n:5678]
            F[Other Services]
        end
    end
    subgraph "External Access"
        G[Caddy Proxy]
        H[docs.klas.chat]
    end
    A <--> B
    A <--> C
    A <--> D
    A <--> E
    G --> H
    G --> A
    style A fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style B fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style D fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```
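
The topology above corresponds to a Compose file along these lines. This is a sketch only: the service names, ports, and network name come from the diagram, but image tags, build context, and everything else are assumptions, not the actual stack file:

```yaml
networks:
  localai_network:
    driver: bridge

services:
  memory_service:
    build: .
    ports: ["8000:8000"]
    networks: [localai_network]
    depends_on: [supabase, neo4j, ollama]
  supabase:
    image: supabase/postgres
    networks: [localai_network]
  neo4j:
    image: neo4j:5
    networks: [localai_network]
  ollama:
    image: ollama/ollama
    networks: [localai_network]
```

Keeping the databases off the `ports:` list leaves them reachable only inside the bridge network, with Caddy as the sole external entry point.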

Database Schema Design

📊 Supabase Schema

CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    metadata JSONB,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index (matches the "HNSW indexing" noted above)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

CREATE INDEX ON documents 
USING gin (metadata);

🔗 Neo4j Schema

// Node Types
CREATE (doc:DocumentChunk {
    id: $uuid,
    supabase_id: $supabase_id,
    title: $title,
    created_at: datetime()
})

CREATE (person:Person {
    name: $name,
    type: "person"
})

CREATE (concept:Concept {
    name: $name,
    type: "concept"
})

// Relationships (doc, person, and concept must be bound in the
// same query, e.g. by the CREATE or MATCH clauses above; standalone
// CREATE statements with unbound variables would create blank nodes)
CREATE (doc)-[:MENTIONS]->(person),
       (doc)-[:DISCUSSES]->(concept),
       (person)-[:RELATED_TO]->(concept)

Security Architecture

Security Layers
```mermaid
graph TB
    subgraph "External Layer"
        A[Caddy Proxy]
        B[TLS Termination]
        C[Rate Limiting]
    end
    subgraph "API Layer"
        D[Authentication]
        E[Authorization]
        F[Input Validation]
    end
    subgraph "Application Layer"
        G[MCP Resource Indicators]
        H[API Key Management]
        I[Session Management]
    end
    subgraph "Network Layer"
        J[Docker Network Isolation]
        K[Container Security]
        L[Port Restrictions]
    end
    subgraph "Data Layer"
        M[Database Authentication]
        N[Encryption at Rest]
        O[Backup Security]
    end
    A --> D
    B --> D
    C --> D
    D --> G
    E --> G
    F --> G
    G --> J
    H --> J
    I --> J
    J --> M
    K --> M
    L --> M
    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style M fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```

Performance Considerations

⚡ Vector Search

  • HNSW indexing for sub-second search
  • Dimension optimization (1536)
  • Batch processing for bulk operations
  • Query result caching
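
Query result caching can be as simple as memoising on the query text, as in this illustration; a real cache would also need invalidation when new documents are ingested:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show cache hits in the demo

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple:
    """Stand-in for an embedding + vector-search round trip."""
    CALLS["count"] += 1
    # ... embed `query` and run the pgvector similarity search here ...
    return (f"results for {query!r}",)
```

Repeating a query returns the cached tuple without re-embedding or re-querying the database.
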

🔍 Graph Queries

  • Property and relationship indexing
  • Cypher query optimization
  • Limited traversal depth
  • Result set pagination
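
"Limited traversal depth" keeps graph queries bounded; in Cypher that is a variable-length pattern such as `[:RELATED_TO*1..2]`, and the idea looks like this in plain Python over a toy adjacency-list graph:

```python
from collections import deque

def neighbors_within(graph: dict, start: str, max_depth: int) -> set:
    """Breadth-first traversal that stops expanding after max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth cap: do not expand further from here
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```

Capping the depth bounds the result set even on densely connected entity graphs.
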

🦙 Model Inference

  • Model quantization strategies
  • Embedding batch processing
  • Local GPU acceleration
  • Response caching
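
Embedding batch processing amounts to grouping texts before calling the model, so one request carries many inputs. A sketch, with the batch size as an assumption:

```python
def batched(items: list, batch_size: int):
    """Yield successive fixed-size batches from items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts, embed_fn, batch_size=32):
    """Embed texts in batches; embed_fn takes a list of texts
    and returns a list of vectors, one per text."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

With a batch size of 32, embedding 1,000 chunks costs 32 model calls instead of 1,000.
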

Scalability Patterns

Scaling Strategy
```mermaid
graph LR
    subgraph "Current (Local)"
        A[Single Node] --> B[Docker Compose]
        B --> C[Local Resources]
    end
    subgraph "Stage 1 (Optimized)"
        D[Resource Limits] --> E[Connection Pooling]
        E --> F[Query Optimization]
    end
    subgraph "Stage 2 (Distributed)"
        G[Load Balancer] --> H[Multiple API Instances]
        H --> I[Shared Storage]
    end
    subgraph "Stage 3 (Cloud)"
        J[Managed Services] --> K[Auto Scaling]
        K --> L[Multi-Region]
    end
    C --> D
    F --> G
    I --> J
    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
```

Next Steps

Ready to implement this architecture? Follow our detailed implementation guide.