System Architecture

Comprehensive overview of the LangMem system architecture, components, and data flow patterns.

System Overview

High-Level Architecture
```mermaid
graph TB
    subgraph "Client Applications"
        A[n8n Workflows]
        B[Claude Code CLI]
        C[Custom Applications]
    end
    subgraph "API Gateway Layer"
        D[Memory Service API]
        E[Authentication Layer]
        F[Rate Limiting]
    end
    subgraph "Core Processing Layer"
        G[LangMem SDK]
        H[Memory Manager]
        I[Context Assembler]
        J[Hybrid Retrieval Engine]
    end
    subgraph "Model Layer"
        K[Ollama Local LLM]
        L[Embedding Generator]
        M[Entity Extractor]
    end
    subgraph "Storage Layer"
        N[Supabase PostgreSQL]
        O[pgvector Extension]
        P[Neo4j Graph DB]
        Q[Vector Indexes]
    end
    subgraph "Infrastructure"
        R[Docker Network]
        S[Container Orchestration]
        T[Health Monitoring]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    D --> F
    D --> G
    G --> H
    G --> I
    G --> J
    J --> K
    J --> L
    J --> M
    H --> N
    H --> P
    N --> O
    N --> Q
    R --> S
    S --> T
    style G fill:#2563eb,stroke:#1e40af,stroke-width:3px,color:#fff
    style N fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style P fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style K fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```

Data Flow Architecture

Data Ingestion Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j
    Client->>API: POST /v1/ingest
    API->>LM: Process Document
    LM->>LM: Text Chunking
    LM->>OL: Generate Embeddings
    OL-->>LM: Vector Embeddings
    LM->>SB: Store Chunks + Embeddings
    SB-->>LM: Chunk IDs
    LM->>OL: Extract Entities
    OL-->>LM: Entity List
    LM->>N4J: Store Graph Data
    N4J-->>LM: Graph Node IDs
    LM->>N4J: Link to Chunk IDs
    LM-->>API: Ingestion Complete
    API-->>Client: Success Response
```
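
The "Text Chunking" step above can be sketched as a fixed-size splitter with overlap. This is a minimal illustration; the actual chunk size and overlap used by the pipeline are assumptions here:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap means the tail of each chunk reappears at the head of the next, so a fact split across a boundary still lands intact in at least one chunk.
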
Data Retrieval Flow
```mermaid
sequenceDiagram
    participant Client
    participant API as Memory Service API
    participant LM as LangMem SDK
    participant OL as Ollama
    participant SB as Supabase
    participant N4J as Neo4j
    Client->>API: POST /v1/context/retrieve
    API->>LM: Query Processing
    LM->>OL: Generate Query Embedding
    OL-->>LM: Query Vector
    LM->>SB: Vector Similarity Search
    SB-->>LM: Relevant Chunks
    LM->>LM: Extract Entities from Chunks
    LM->>N4J: Graph Traversal Query
    N4J-->>LM: Related Entities/Facts
    LM->>LM: Context Assembly
    LM->>LM: Ranking & Filtering
    LM-->>API: Augmented Context
    API-->>Client: Context Response
```
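
The "Ranking & Filtering" step can be illustrated with plain cosine similarity over the returned chunk vectors. This is a sketch only; the production ranker presumably also weighs graph signals:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_chunks(query_vec, chunks, min_score=0.0, top_k=5):
    """chunks: list of (chunk_text, embedding) pairs.
    Returns the top_k chunk texts scoring at least min_score."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored = [s for s in scored if s[0] >= min_score]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

The `min_score` cutoff implements filtering and `top_k` caps the context size handed to assembly.
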

Component Details

🧠 LangMem SDK

Purpose: Core memory orchestration layer

Key Features:

  • Storage-agnostic memory API
  • Active memory tools
  • Background memory management
  • LangGraph integration

Integration: Coordinates between vector and graph storage
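
How the SDK coordinates the two stores can be sketched with in-memory stand-ins. The class and method names here are hypothetical illustrations, not LangMem's actual API:

```python
import uuid

class HybridMemoryStore:
    """Toy coordinator: writes each chunk to a vector store and links
    extracted entities to it in a graph store via the chunk ID."""

    def __init__(self):
        self.vector_store = {}   # chunk_id -> {"text": ..., "embedding": ...}
        self.graph_store = []    # (entity, "MENTIONED_IN", chunk_id) triples

    def ingest(self, text, embedding, entities):
        chunk_id = str(uuid.uuid4())
        self.vector_store[chunk_id] = {"text": text, "embedding": embedding}
        for entity in entities:
            self.graph_store.append((entity, "MENTIONED_IN", chunk_id))
        return chunk_id

    def chunks_mentioning(self, entity):
        """Graph-side lookup resolved back to vector-side chunk text."""
        ids = [cid for e, _, cid in self.graph_store if e == entity]
        return [self.vector_store[cid]["text"] for cid in ids]
```

The shared chunk ID is the glue: the real system plays the same trick with Supabase row IDs stored on Neo4j nodes.
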

🐘 Supabase + pgvector

Purpose: Vector storage and semantic search

Key Features:

  • 1536-dimensional embeddings
  • HNSW indexing for performance
  • Unified data + vector storage
  • SQL query capabilities

Scale: Handles 1.6M+ embeddings efficiently

🔗 Neo4j Graph Database

Purpose: Relationship storage and graph queries

Key Features:

  • Entity relationship modeling
  • Graph traversal capabilities
  • Community detection algorithms
  • Cypher query language

Integration: Links to Supabase via chunk IDs

🦙 Ollama Local LLM

Purpose: Local model inference and embeddings

Key Features:

  • Privacy-first local processing
  • OpenAI-compatible API
  • Multiple model support
  • Efficient quantization

Models: Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3
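
Because Ollama exposes an OpenAI-compatible API, an embedding call is a plain JSON POST. The sketch below only builds the request; the model name is an example and the response shape follows the OpenAI embeddings format:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # OpenAI-compatible endpoint

def build_embedding_request(texts, model="nomic-embed-text"):
    """Construct the HTTP request for a batch embedding call."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a running Ollama instance):
# with urllib.request.urlopen(build_embedding_request(["hello"])) as resp:
#     vectors = [d["embedding"] for d in json.load(resp)["data"]]
```
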

Docker Network Architecture

Container Network Topology
```mermaid
graph TB
    subgraph "localai_network (Bridge)"
        subgraph "Memory Stack"
            A[memory_service:8000]
            B[supabase:5432]
            C[neo4j:7687]
            D[ollama:11434]
        end
        subgraph "Existing Services"
            E[n8n:5678]
            F[Other Services]
        end
    end
    subgraph "External Access"
        G[Caddy Proxy]
        H[docs.klas.chat]
    end
    A <--> B
    A <--> C
    A <--> D
    A <--> E
    G --> H
    G --> A
    style A fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style B fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style C fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style D fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```
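
The topology above corresponds to a Compose file along these lines. This is a sketch only: the service names, ports, and network name come from the diagram, but image tags, build context, and everything else are assumptions, not the actual stack file:

```yaml
networks:
  localai_network:
    driver: bridge

services:
  memory_service:
    build: .
    ports: ["8000:8000"]
    networks: [localai_network]
    depends_on: [supabase, neo4j, ollama]
  supabase:
    image: supabase/postgres
    networks: [localai_network]
  neo4j:
    image: neo4j:5
    networks: [localai_network]
  ollama:
    image: ollama/ollama
    networks: [localai_network]
```

Keeping the databases off the `ports:` list leaves them reachable only inside the bridge network, with Caddy as the sole external entry point.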

Database Schema Design

📊 Supabase Schema

CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    metadata JSONB,
    source_url TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- HNSW index (matches the "HNSW indexing" noted above)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

CREATE INDEX ON documents 
USING gin (metadata);

🔗 Neo4j Schema

// Node Types
CREATE (doc:DocumentChunk {
    id: $uuid,
    supabase_id: $supabase_id,
    title: $title,
    created_at: datetime()
})

CREATE (person:Person {
    name: $name,
    type: "person"
})

CREATE (concept:Concept {
    name: $name,
    type: "concept"
})

// Relationships (doc, person, and concept must be bound in the
// same query, e.g. by the CREATE or MATCH clauses above; standalone
// CREATE statements with unbound variables would create blank nodes)
CREATE (doc)-[:MENTIONS]->(person),
       (doc)-[:DISCUSSES]->(concept),
       (person)-[:RELATED_TO]->(concept)

Security Architecture

Security Layers
```mermaid
graph TB
    subgraph "External Layer"
        A[Caddy Proxy]
        B[TLS Termination]
        C[Rate Limiting]
    end
    subgraph "API Layer"
        D[Authentication]
        E[Authorization]
        F[Input Validation]
    end
    subgraph "Application Layer"
        G[MCP Resource Indicators]
        H[API Key Management]
        I[Session Management]
    end
    subgraph "Network Layer"
        J[Docker Network Isolation]
        K[Container Security]
        L[Port Restrictions]
    end
    subgraph "Data Layer"
        M[Database Authentication]
        N[Encryption at Rest]
        O[Backup Security]
    end
    A --> D
    B --> D
    C --> D
    D --> G
    E --> G
    F --> G
    G --> J
    H --> J
    I --> J
    J --> M
    K --> M
    L --> M
    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
    style M fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
```

Performance Considerations

⚡ Vector Search

  • HNSW indexing for sub-second search
  • Dimension optimization (1536)
  • Batch processing for bulk operations
  • Query result caching
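
Query result caching can be as simple as memoising on the query text, as in this illustration; a real cache would also need invalidation when new documents are ingested:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show cache hits in the demo

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple:
    """Stand-in for an embedding + vector-search round trip."""
    CALLS["count"] += 1
    # ... embed `query` and run the pgvector similarity search here ...
    return (f"results for {query!r}",)
```

Repeating a query returns the cached tuple without re-embedding or re-querying the database.
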

🔍 Graph Queries

  • Property and relationship indexing
  • Cypher query optimization
  • Limited traversal depth
  • Result set pagination
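
"Limited traversal depth" keeps graph queries bounded; in Cypher that is a variable-length pattern such as `[:RELATED_TO*1..2]`, and the idea looks like this in plain Python over a toy adjacency-list graph:

```python
from collections import deque

def neighbors_within(graph: dict, start: str, max_depth: int) -> set:
    """Breadth-first traversal that stops expanding after max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth cap: do not expand further from here
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}
```

Capping the depth bounds the result set even on densely connected entity graphs.
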

🦙 Model Inference

  • Model quantization strategies
  • Embedding batch processing
  • Local GPU acceleration
  • Response caching
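
Embedding batch processing amounts to grouping texts before calling the model, so one request carries many inputs. A sketch, with the batch size as an assumption:

```python
def batched(items: list, batch_size: int):
    """Yield successive fixed-size batches from items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts, embed_fn, batch_size=32):
    """Embed texts in batches; embed_fn takes a list of texts
    and returns a list of vectors, one per text."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

With a batch size of 32, embedding 1,000 chunks costs 32 model calls instead of 1,000.
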

Scalability Patterns

Scaling Strategy
```mermaid
graph LR
    subgraph "Current (Local)"
        A[Single Node] --> B[Docker Compose]
        B --> C[Local Resources]
    end
    subgraph "Stage 1 (Optimized)"
        D[Resource Limits] --> E[Connection Pooling]
        E --> F[Query Optimization]
    end
    subgraph "Stage 2 (Distributed)"
        G[Load Balancer] --> H[Multiple API Instances]
        H --> I[Shared Storage]
    end
    subgraph "Stage 3 (Cloud)"
        J[Managed Services] --> K[Auto Scaling]
        K --> L[Multi-Region]
    end
    C --> D
    F --> G
    I --> J
    style A fill:#ef4444,stroke:#dc2626,stroke-width:2px,color:#fff
    style D fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style G fill:#22c55e,stroke:#16a34a,stroke-width:2px,color:#fff
    style J fill:#2563eb,stroke:#1e40af,stroke-width:2px,color:#fff
```

Next Steps

Ready to implement this architecture? Follow our detailed implementation guide.