# System Architecture

Comprehensive overview of the LangMem system architecture, components, and data flow patterns.

## System Overview

### High-Level Architecture
```mermaid
graph TB
    subgraph "Client Applications"
        A[n8n Workflows]
        B[Claude Code CLI]
        C[Custom Applications]
    end
    subgraph "API Gateway Layer"
        D[Memory Service API]
        E[Authentication Layer]
        F[Rate Limiting]
    end
    subgraph "Core Processing Layer"
        G[LangMem SDK]
        H[Memory Manager]
        I[Context Assembler]
        J[Hybrid Retrieval Engine]
    end
    subgraph "Model Layer"
        K[Ollama Local LLM]
        L[Embedding Generator]
        M[Entity Extractor]
    end
    subgraph "Storage Layer"
        N[Supabase PostgreSQL]
        O[pgvector Extension]
        P[Neo4j Graph DB]
        Q[Vector Indexes]
    end
    subgraph "Infrastructure"
        R[Docker Network]
        S[Container Orchestration]
        T[Health Monitoring]
    end

    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    D --> G
    G --> H
    G --> I
    G --> J
    J --> K
    J --> L
    J --> M
    H --> N
    H --> P
    N --> O
    N --> Q
    R --> S
    S --> T

    style G fill:#2563eb,stroke:#1e40af,stroke-width:3px,color:#fff
    style N fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style P fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style K fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```
## Data Flow Architecture

### Data Ingestion Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j

    Client->>API: POST /v1/ingest
    API->>LM: Process Document
    LM->>LM: Text Chunking
    LM->>OL: Generate Embeddings
    OL-->>LM: Vector Embeddings
    LM->>SB: Store Chunks + Embeddings
    SB-->>LM: Chunk IDs
    LM->>OL: Extract Entities
    OL-->>LM: Entity List
    LM->>N4J: Store Graph Data
    N4J-->>LM: Graph Node IDs
    LM->>N4J: Link to Chunk IDs
    LM-->>API: Ingestion Complete
    API-->>Client: Success Response
```
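The ingestion sequence can be sketched in application code. Below is a minimal, illustrative Python pipeline that mirrors the diagram's ordering (chunk, embed, store, extract entities, link). The chunking parameters are assumed defaults, and the Ollama, Supabase, and Neo4j calls are passed in as stub callables rather than real client code:

```python
from typing import Callable

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(text: str,
           embed: Callable[[str], list[float]],
           store_chunk: Callable[[str, list[float]], str],
           extract_entities: Callable[[str], list[str]],
           link_entities: Callable[[str, list[str]], None]) -> list[str]:
    """Mirror the sequence diagram: chunk -> embed -> store -> extract -> link."""
    chunk_ids = []
    for chunk in chunk_text(text):
        vector = embed(chunk)                  # Ollama embedding call
        chunk_id = store_chunk(chunk, vector)  # Supabase insert, returns chunk ID
        entities = extract_entities(chunk)     # Ollama entity extraction
        link_entities(chunk_id, entities)      # Neo4j nodes linked back to the chunk ID
        chunk_ids.append(chunk_id)
    return chunk_ids
```

Note that the Supabase chunk ID is threaded through to the graph step; that linkage is what lets retrieval join graph facts back to their source chunks later.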
### Data Retrieval Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j

    Client->>API: POST /v1/context/retrieve
    API->>LM: Query Processing
    LM->>OL: Generate Query Embedding
    OL-->>LM: Query Vector
    LM->>SB: Vector Similarity Search
    SB-->>LM: Relevant Chunks
    LM->>LM: Extract Entities from Chunks
    LM->>N4J: Graph Traversal Query
    N4J-->>LM: Related Entities/Facts
    LM->>LM: Context Assembly
    LM->>LM: Ranking & Filtering
    LM-->>API: Augmented Context
    API-->>Client: Context Response
```
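The "Ranking & Filtering" step orders candidate chunks by relevance and drops weak matches before assembling the context window. A minimal sketch of score-based ranking, where `top_k` and `min_score` are illustrative values, not tuned defaults, and a production scorer would also weight graph-derived facts:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_and_filter(query_vec: list[float],
                    chunks: list[tuple[str, list[float]]],
                    top_k: int = 5,
                    min_score: float = 0.3) -> list[tuple[float, str]]:
    """Score (text, embedding) pairs, keep the top-k above a threshold."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, text) for score, text in scored[:top_k] if score >= min_score]
```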
## Component Details

### 🧠 LangMem SDK

**Purpose:** Core memory orchestration layer

**Key Features:**

- Storage-agnostic memory API
- Active memory tools
- Background memory management
- LangGraph integration

**Integration:** Coordinates between vector and graph storage

### 🐘 Supabase + pgvector

**Purpose:** Vector storage and semantic search

**Key Features:**

- 1536-dimensional embeddings
- HNSW indexing for performance
- Unified data + vector storage
- SQL query capabilities

**Scale:** Handles 1.6M+ embeddings efficiently

### 🔗 Neo4j Graph Database

**Purpose:** Relationship storage and graph queries

**Key Features:**

- Entity relationship modeling
- Graph traversal capabilities
- Community detection algorithms
- Cypher query language

**Integration:** Links to Supabase via chunk IDs

### 🦙 Ollama Local LLM

**Purpose:** Local model inference and embeddings

**Key Features:**

- Privacy-first local processing
- OpenAI-compatible API
- Multiple model support
- Efficient quantization

**Models:** Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3
## Docker Network Architecture

### Container Network Topology
```mermaid
graph TB
    subgraph "localai_network (Bridge)"
        subgraph "Memory Stack"
            A[memory_service:8000]
            B[supabase:5432]
            C[neo4j:7687]
            D[ollama:11434]
        end
        subgraph "Existing Services"
            E[n8n:5678]
            F[Other Services]
        end
    end
    subgraph "External Access"
        G[Caddy Proxy]
        H[docs.klas.chat]
    end

    A <--> B
    A <--> C
    A <--> D
    A <--> E
    G --> H
    G --> A

    style A fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style B fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style D fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```
## Database Schema Design

### 📊 Supabase Schema
```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    metadata JSONB,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index for cosine-similarity search
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops);

-- GIN index for JSONB metadata filtering
CREATE INDEX ON documents
    USING gin (metadata);
```
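`vector_cosine_ops` orders results by cosine distance (one minus cosine similarity), the metric behind pgvector's `<=>` operator. A pure-Python sketch of the metric itself, for intuition only, not the pgvector implementation:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as used by pgvector's <=> operator: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

A typical query against the schema above would then be `SELECT content FROM documents ORDER BY embedding <=> $1 LIMIT 5;`, which the HNSW index accelerates.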
### 🔗 Neo4j Schema
```cypher
// Node types (created in a single statement so the variables
// stay bound for the relationship clauses below)
CREATE (doc:DocumentChunk {
  id: $uuid,
  supabase_id: $supabase_id,
  title: $title,
  created_at: datetime()
})
CREATE (person:Person {
  name: $person_name,
  type: "person"
})
CREATE (concept:Concept {
  name: $concept_name,
  type: "concept"
})

// Relationships
CREATE (doc)-[:MENTIONS]->(person)
CREATE (doc)-[:DISCUSSES]->(concept)
CREATE (person)-[:RELATED_TO]->(concept)
```
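The linkage pattern above can be pictured as a small in-memory graph. Below is a hedged sketch of the depth-limited traversal the retrieval flow performs; in Neo4j this would be a variable-length Cypher match such as `MATCH (e)-[*1..2]-(n)`, while here it is plain breadth-first search over an adjacency map:

```python
from collections import deque

def traverse(graph: dict[str, list[str]], start: str, max_depth: int = 2) -> set[str]:
    """Breadth-first traversal capped at max_depth hops from the start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

Capping the depth is what keeps graph queries cheap; it is the same idea as the "limited traversal depth" guideline in the performance section below.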
## Security Architecture

### Security Layers
```mermaid
graph TB
    subgraph "External Layer"
        A[Caddy Proxy]
        B[TLS Termination]
        C[Rate Limiting]
    end
    subgraph "API Layer"
        D[Authentication]
        E[Authorization]
        F[Input Validation]
    end
    subgraph "Application Layer"
        G[MCP Resource Indicators]
        H[API Key Management]
        I[Session Management]
    end
    subgraph "Network Layer"
        J[Docker Network Isolation]
        K[Container Security]
        L[Port Restrictions]
    end
    subgraph "Data Layer"
        M[Database Authentication]
        N[Encryption at Rest]
        O[Backup Security]
    end

    A --> D
    B --> D
    C --> D
    D --> G
    E --> G
    F --> G
    G --> J
    H --> J
    I --> J
    J --> M
    K --> M
    L --> M

    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style M fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```
## Performance Considerations

### ⚡ Vector Search

- HNSW indexing for sub-second search
- Dimension optimization (1536)
- Batch processing for bulk operations
- Query result caching

### 🔍 Graph Queries

- Property and relationship indexing
- Cypher query optimization
- Limited traversal depth
- Result set pagination

### 🦙 Model Inference

- Model quantization strategies
- Embedding batch processing
- Local GPU acceleration
- Response caching
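"Embedding batch processing" means sending chunks to the embedding model in fixed-size groups rather than one request per chunk, amortizing per-request overhead. A minimal batching helper; the batch size of 32 is an illustrative default, not a measured optimum:

```python
from typing import Callable

def embed_in_batches(texts: list[str],
                     embed_batch: Callable[[list[str]], list[list[float]]],
                     batch_size: int = 32) -> list[list[float]]:
    """Embed texts in fixed-size batches, preserving input order."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[i:i + batch_size]))
    return vectors
```

Here `embed_batch` stands in for whatever batched embedding call the model backend exposes; the helper only handles the grouping.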
## Scalability Patterns

### Scaling Strategy
```mermaid
graph LR
    subgraph "Current (Local)"
        A[Single Node] --> B[Docker Compose]
        B --> C[Local Resources]
    end
    subgraph "Stage 1 (Optimized)"
        D[Resource Limits] --> E[Connection Pooling]
        E --> F[Query Optimization]
    end
    subgraph "Stage 2 (Distributed)"
        G[Load Balancer] --> H[Multiple API Instances]
        H --> I[Shared Storage]
    end
    subgraph "Stage 3 (Cloud)"
        J[Managed Services] --> K[Auto Scaling]
        K --> L[Multi-Region]
    end

    C --> D
    F --> G
    I --> J

    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
```
## Next Steps

Ready to implement this architecture? Follow our detailed implementation guide.