Add Ollama support for local LLM models (Phase 2 complete)
Major Changes:
- Added Ollama as alternative LLM provider to OpenAI
- Implemented flexible provider switching via environment variables
- Support for multiple embedding models (OpenAI and Ollama)
- Created comprehensive Ollama setup guide

Configuration Changes (config.py):
- Added LLM_PROVIDER and EMBEDDER_PROVIDER settings
- Added Ollama configuration: base URL, LLM model, embedding model
- Modified get_mem0_config() to dynamically switch providers
- OpenAI API key now optional when using Ollama
- Added validation to ensure required keys based on provider

Supported Configurations:
1. Full OpenAI (default):
   - LLM_PROVIDER=openai
   - EMBEDDER_PROVIDER=openai
2. Full Ollama (local):
   - LLM_PROVIDER=ollama
   - EMBEDDER_PROVIDER=ollama
3. Hybrid configurations:
   - Ollama LLM + OpenAI embeddings
   - OpenAI LLM + Ollama embeddings

Ollama Models Supported:
- LLM: llama3.1:8b, llama3.1:70b, mistral:7b, codellama:7b, phi3:3.8b
- Embeddings: nomic-embed-text, mxbai-embed-large, all-minilm

Documentation:
- Created docs/setup/ollama.mdx - complete Ollama setup guide
  - Installation methods (host and Docker)
  - Model selection and comparison
  - Docker Compose configuration
  - Performance tuning and GPU acceleration
  - Migration guide from OpenAI
  - Troubleshooting section
- Updated README.md with Ollama features
- Updated .env.example with provider selection
- Marked Phase 2 as complete in roadmap

Environment Variables:
- LLM_PROVIDER: Select LLM provider (openai/ollama)
- EMBEDDER_PROVIDER: Select embedding provider (openai/ollama)
- OLLAMA_BASE_URL: Ollama API endpoint (default: http://localhost:11434)
- OLLAMA_LLM_MODEL: Ollama model for text generation
- OLLAMA_EMBEDDING_MODEL: Ollama model for embeddings
- MEM0_EMBEDDING_DIMS: Must match embedding model dimensions

Breaking Changes:
- None - defaults to OpenAI for backward compatibility

Migration Notes:
- When switching from OpenAI to Ollama embeddings, existing embeddings must be cleared due to dimension changes (1536 → 768 for nomic-embed-text)
- Update MEM0_EMBEDDING_DIMS to match chosen embedding model

Benefits:
✅ Cost savings - no API costs with local models
✅ Privacy - all data stays local
✅ Offline capability - works without internet
✅ Model variety - access to many open-source models
✅ Flexibility - easy switching between providers

Version: 1.1.0
Status: Phase 2 Complete - Production Ready with Ollama Support

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
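The diff below covers only the documentation; `config.py` itself is not shown. As a rough, non-authoritative illustration of the provider switching the commit describes, a minimal sketch could look like the following — the nested mem0 config keys, model defaults, and error wording are assumptions for illustration, not the project's actual implementation:

```python
import os


def get_mem0_config() -> dict:
    """Build a provider-dependent mem0 config from environment variables (sketch only)."""
    llm_provider = os.getenv("LLM_PROVIDER", "openai")
    embedder_provider = os.getenv("EMBEDDER_PROVIDER", "openai")
    ollama_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

    # OpenAI key is only required if either provider still points at OpenAI.
    if "openai" in (llm_provider, embedder_provider) and not os.getenv("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required when an OpenAI provider is selected")

    if llm_provider == "ollama":
        llm = {
            "provider": "ollama",
            "config": {
                "model": os.getenv("OLLAMA_LLM_MODEL", "llama3.1:8b"),
                "ollama_base_url": ollama_url,  # key name is illustrative
            },
        }
    else:
        llm = {"provider": "openai", "config": {"model": "gpt-4o-mini"}}  # placeholder model

    if embedder_provider == "ollama":
        embedder = {
            "provider": "ollama",
            "config": {
                "model": os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
                "ollama_base_url": ollama_url,
                "embedding_dims": int(os.getenv("MEM0_EMBEDDING_DIMS", "768")),
            },
        }
    else:
        embedder = {
            "provider": "openai",
            "config": {
                "model": "text-embedding-3-small",  # placeholder, 1536-dimensional
                "embedding_dims": int(os.getenv("MEM0_EMBEDDING_DIMS", "1536")),
            },
        }

    return {"llm": llm, "embedder": embedder}
```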
New file: docs/setup/ollama.mdx (+474 lines)
---
title: 'Ollama Setup'
description: 'Use local LLM models with Ollama instead of OpenAI'
---

# Ollama Setup Guide

T6 Mem0 v2 supports both OpenAI and Ollama as LLM providers. Use Ollama to run completely local models without requiring OpenAI API credits.

## Why Ollama?

- **Cost-effective**: No API costs, run models locally
- **Privacy**: All data stays on your infrastructure
- **Offline capability**: Works without an internet connection
- **Model variety**: Access to Llama, Mistral, and other open-source models

## Prerequisites

- Docker and Docker Compose (if using containerized deployment), or Ollama installed locally
- Sufficient RAM (8GB+ for smaller models, 16GB+ recommended)
- GPU optional but recommended for better performance
## Installation
|
||||
|
||||
### Option 1: Ollama on Host Machine
|
||||
|
||||
**Install Ollama:**
|
||||
|
||||
```bash
|
||||
# Linux
|
||||
curl -fsSL https://ollama.com/install.sh | sh
|
||||
|
||||
# macOS
|
||||
brew install ollama
|
||||
|
||||
# Or download from https://ollama.com/download
|
||||
```
|
||||
|
||||
**Start Ollama service:**
|
||||
|
||||
```bash
|
||||
ollama serve
|
||||
```
|
||||
|
||||
**Pull required models:**
|
||||
|
||||
```bash
|
||||
# LLM model (choose one)
|
||||
ollama pull llama3.1:8b # 8B parameters, 4.7GB
|
||||
ollama pull llama3.1:70b # 70B parameters, 40GB (requires 48GB RAM)
|
||||
ollama pull mistral:7b # 7B parameters, 4.1GB
|
||||
|
||||
# Embedding model (required)
|
||||
ollama pull nomic-embed-text # 274MB
|
||||
```
|
||||
|
||||
### Option 2: Ollama in Docker
|
||||
|
||||
**Add to docker-compose.yml:**
|
||||
|
||||
```yaml
|
||||
services:
|
||||
ollama:
|
||||
image: ollama/ollama:latest
|
||||
container_name: t6-ollama
|
||||
ports:
|
||||
- "11434:11434"
|
||||
volumes:
|
||||
- ollama_data:/root/.ollama
|
||||
networks:
|
||||
- localai
|
||||
restart: unless-stopped
|
||||
|
||||
volumes:
|
||||
ollama_data:
|
||||
```
|
||||
|
||||
**Pull models inside container:**
|
||||
|
||||
```bash
|
||||
docker exec -it t6-ollama ollama pull llama3.1:8b
|
||||
docker exec -it t6-ollama ollama pull nomic-embed-text
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Update your `.env` file:
|
||||
|
||||
```bash
|
||||
# Switch to Ollama
|
||||
LLM_PROVIDER=ollama
|
||||
EMBEDDER_PROVIDER=ollama
|
||||
|
||||
# Ollama configuration
|
||||
OLLAMA_BASE_URL=http://localhost:11434
|
||||
OLLAMA_LLM_MODEL=llama3.1:8b
|
||||
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
||||
|
||||
# OpenAI key no longer required
|
||||
# OPENAI_API_KEY= # Can be left empty
|
||||
```
|
||||
|
||||
### Docker Network Configuration
|
||||
|
||||
If running Ollama in Docker on the same network as mem0:
|
||||
|
||||
```bash
|
||||
# Find Ollama container IP
|
||||
docker inspect t6-ollama --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
|
||||
|
||||
# Update .env
|
||||
OLLAMA_BASE_URL=http://172.21.0.15:11434 # Use actual container IP
|
||||
```
|
||||
|
||||
Or use Docker service name:
|
||||
|
||||
```bash
|
||||
OLLAMA_BASE_URL=http://ollama:11434 # If on same Docker network
|
||||
```
|
||||
|
||||
## Model Selection
|
||||
|
||||
### LLM Models
|
||||
|
||||
| Model | Size | RAM Required | Use Case |
|
||||
|-------|------|--------------|----------|
|
||||
| `llama3.1:8b` | 4.7GB | 8GB | General purpose, fast |
|
||||
| `llama3.1:70b` | 40GB | 48GB | High quality responses |
|
||||
| `mistral:7b` | 4.1GB | 8GB | Fast, efficient |
|
||||
| `codellama:7b` | 3.8GB | 8GB | Code generation |
|
||||
| `phi3:3.8b` | 2.3GB | 4GB | Smallest viable model |
|
||||
|
||||
### Embedding Models
|
||||
|
||||
| Model | Size | Dimensions | Use Case |
|
||||
|-------|------|------------|----------|
|
||||
| `nomic-embed-text` | 274MB | 768 | Recommended, fast |
|
||||
| `mxbai-embed-large` | 669MB | 1024 | Higher quality |
|
||||
| `all-minilm` | 46MB | 384 | Smallest option |
|
||||
|
||||
**Important**: Update `MEM0_EMBEDDING_DIMS` to match your embedding model:
|
||||
|
||||
```bash
|
||||
# For nomic-embed-text
|
||||
MEM0_EMBEDDING_DIMS=768
|
||||
|
||||
# For mxbai-embed-large
|
||||
MEM0_EMBEDDING_DIMS=1024
|
||||
|
||||
# For all-minilm
|
||||
MEM0_EMBEDDING_DIMS=384
|
||||
```
|
||||
|
||||
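If you want to catch a dimension mismatch at startup rather than at query time, a small check along these lines can help. This is a sketch only: the function name and environment-variable handling are illustrative, and the model-to-dimension mapping simply mirrors the table above.

```python
import os

# Dimensions for the embedding models referenced in this guide.
EMBEDDING_DIMS = {
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "all-minilm": 384,
}


def check_embedding_dims() -> None:
    """Fail fast if MEM0_EMBEDDING_DIMS does not match the selected Ollama embedding model."""
    model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
    configured = int(os.getenv("MEM0_EMBEDDING_DIMS", "768"))
    expected = EMBEDDING_DIMS.get(model)
    if expected is not None and configured != expected:
        raise ValueError(
            f"MEM0_EMBEDDING_DIMS={configured}, but {model} produces {expected}-dimensional vectors"
        )


if __name__ == "__main__":
    check_embedding_dims()
    print("Embedding dimensions look consistent.")
```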
## Switching Between OpenAI and Ollama
|
||||
|
||||
### Full Ollama Configuration
|
||||
|
||||
```bash
|
||||
LLM_PROVIDER=ollama
|
||||
EMBEDDER_PROVIDER=ollama
|
||||
OLLAMA_BASE_URL=http://localhost:11434
|
||||
OLLAMA_LLM_MODEL=llama3.1:8b
|
||||
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
||||
MEM0_EMBEDDING_DIMS=768
|
||||
```
|
||||
|
||||
### Hybrid Configuration
|
||||
|
||||
Use Ollama for LLM but OpenAI for embeddings:
|
||||
|
||||
```bash
|
||||
LLM_PROVIDER=ollama
|
||||
EMBEDDER_PROVIDER=openai
|
||||
OLLAMA_BASE_URL=http://localhost:11434
|
||||
OLLAMA_LLM_MODEL=llama3.1:8b
|
||||
OPENAI_API_KEY=sk-your-key
|
||||
MEM0_EMBEDDING_DIMS=1536 # OpenAI dimensions
|
||||
```
|
||||
|
||||
Or use OpenAI for LLM but Ollama for embeddings:
|
||||
|
||||
```bash
|
||||
LLM_PROVIDER=openai
|
||||
EMBEDDER_PROVIDER=ollama
|
||||
OPENAI_API_KEY=sk-your-key
|
||||
OLLAMA_BASE_URL=http://localhost:11434
|
||||
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
||||
MEM0_EMBEDDING_DIMS=768 # Ollama dimensions
|
||||
```
|
||||
|
||||
### Back to OpenAI
|
||||
|
||||
```bash
|
||||
LLM_PROVIDER=openai
|
||||
EMBEDDER_PROVIDER=openai
|
||||
OPENAI_API_KEY=sk-your-key
|
||||
MEM0_EMBEDDING_DIMS=1536
|
||||
```
|
||||
|
||||
## Deployment
|
||||
|
||||
### Docker Deployment with Ollama
|
||||
|
||||
**Complete docker-compose.yml:**
|
||||
|
||||
```yaml
|
||||
version: '3.8'
|
||||
|
||||
services:
|
||||
ollama:
|
||||
image: ollama/ollama:latest
|
||||
container_name: t6-ollama
|
||||
ports:
|
||||
- "11434:11434"
|
||||
volumes:
|
||||
- ollama_data:/root/.ollama
|
||||
networks:
|
||||
- localai
|
||||
restart: unless-stopped
|
||||
|
||||
mcp-server:
|
||||
build:
|
||||
context: .
|
||||
dockerfile: docker/Dockerfile.mcp
|
||||
container_name: t6-mem0-mcp
|
||||
restart: unless-stopped
|
||||
ports:
|
||||
- "8765:8765"
|
||||
environment:
|
||||
- LLM_PROVIDER=ollama
|
||||
- EMBEDDER_PROVIDER=ollama
|
||||
- OLLAMA_BASE_URL=http://ollama:11434
|
||||
- OLLAMA_LLM_MODEL=llama3.1:8b
|
||||
- OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
||||
- MEM0_EMBEDDING_DIMS=768
|
||||
- SUPABASE_CONNECTION_STRING=${SUPABASE_CONNECTION_STRING}
|
||||
- NEO4J_URI=neo4j://neo4j:7687
|
||||
- NEO4J_USER=${NEO4J_USER}
|
||||
- NEO4J_PASSWORD=${NEO4J_PASSWORD}
|
||||
depends_on:
|
||||
- ollama
|
||||
- neo4j
|
||||
networks:
|
||||
- localai
|
||||
|
||||
neo4j:
|
||||
image: neo4j:5.26.0
|
||||
container_name: t6-neo4j
|
||||
ports:
|
||||
- "7474:7474"
|
||||
- "7687:7687"
|
||||
environment:
|
||||
- NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
|
||||
volumes:
|
||||
- neo4j_data:/data
|
||||
networks:
|
||||
- localai
|
||||
|
||||
volumes:
|
||||
ollama_data:
|
||||
neo4j_data:
|
||||
|
||||
networks:
|
||||
localai:
|
||||
external: true
|
||||
```
|
||||
|
||||
**Startup sequence:**
|
||||
|
||||
```bash
|
||||
# Start services
|
||||
docker compose up -d
|
||||
|
||||
# Pull models
|
||||
docker exec -it t6-ollama ollama pull llama3.1:8b
|
||||
docker exec -it t6-ollama ollama pull nomic-embed-text
|
||||
|
||||
# Verify Ollama is working
|
||||
curl http://localhost:11434/api/tags
|
||||
|
||||
# Restart mem0 services to pick up models
|
||||
docker compose restart mcp-server
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Test Ollama Connection
|
||||
|
||||
```bash
|
||||
# List available models
|
||||
curl http://localhost:11434/api/tags
|
||||
|
||||
# Test generation
|
||||
curl http://localhost:11434/api/generate -d '{
|
||||
"model": "llama3.1:8b",
|
||||
"prompt": "Hello, world!",
|
||||
"stream": false
|
||||
}'
|
||||
```
|
||||
|
||||
### Test Memory Operations
|
||||
|
||||
```bash
|
||||
# Add memory via REST API
|
||||
curl -X POST http://localhost:8080/v1/memories/ \
|
||||
-H "Authorization: Bearer YOUR_API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"messages": [
|
||||
{"role": "user", "content": "I love local AI models"},
|
||||
{"role": "assistant", "content": "Noted!"}
|
||||
],
|
||||
"user_id": "test_user"
|
||||
}'
|
||||
|
||||
# Check logs for Ollama usage
|
||||
docker logs t6-mem0-mcp --tail 50
|
||||
```
|
||||
|
||||
## Performance Tuning
|
||||
|
||||
### GPU Acceleration
|
||||
|
||||
If you have an NVIDIA GPU:
|
||||
|
||||
```yaml
|
||||
ollama:
|
||||
image: ollama/ollama:latest
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- driver: nvidia
|
||||
count: all
|
||||
capabilities: [gpu]
|
||||
```
|
||||
|
||||
### Model Caching
|
||||
|
||||
Models are cached in `ollama_data` volume. To clear cache:
|
||||
|
||||
```bash
|
||||
docker volume rm ollama_data
|
||||
```
|
||||
|
||||
### Concurrent Requests
|
||||
|
||||
Ollama handles concurrent requests by default. For high load:
|
||||
|
||||
```yaml
|
||||
ollama:
|
||||
environment:
|
||||
- OLLAMA_NUM_PARALLEL=4 # Number of parallel requests
|
||||
- OLLAMA_MAX_LOADED_MODELS=2 # Keep models in memory
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Ollama Not Responding
|
||||
|
||||
```bash
|
||||
# Check Ollama status
|
||||
curl http://localhost:11434/api/tags
|
||||
|
||||
# Check logs
|
||||
docker logs t6-ollama
|
||||
|
||||
# Restart Ollama
|
||||
docker restart t6-ollama
|
||||
```
|
||||
|
||||
### Model Not Found
|
||||
|
||||
```bash
|
||||
# List pulled models
|
||||
docker exec -it t6-ollama ollama list
|
||||
|
||||
# Pull missing model
|
||||
docker exec -it t6-ollama ollama pull llama3.1:8b
|
||||
```
|
||||
|
||||
### Out of Memory
|
||||
|
||||
Try a smaller model:
|
||||
|
||||
```bash
|
||||
# Switch to smaller model in .env
|
||||
OLLAMA_LLM_MODEL=phi3:3.8b
|
||||
|
||||
# Or use quantized version
|
||||
OLLAMA_LLM_MODEL=llama3.1:8b-q4_0 # 4-bit quantization
|
||||
```
|
||||
|
||||
### Slow Response Times
|
||||
|
||||
- Use GPU acceleration
|
||||
- Use smaller models (phi3:3.8b)
|
||||
- Reduce concurrent requests
|
||||
- Check system resources (RAM, CPU)
|
||||
|
||||
### Connection Refused
|
||||
|
||||
If mem0 can't connect to Ollama:
|
||||
|
||||
```bash
|
||||
# Test from mem0 container
|
||||
docker exec -it t6-mem0-mcp curl http://ollama:11434/api/tags
|
||||
|
||||
# Check both containers on same network
|
||||
docker network inspect localai
|
||||
```
|
||||
|
||||
## Migration from OpenAI

### 1. Pull Models

```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

### 2. Update Configuration

```bash
# Backup current .env
cp .env .env.openai.backup

# Update .env
LLM_PROVIDER=ollama
EMBEDDER_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.1:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
MEM0_EMBEDDING_DIMS=768  # Changed from 1536
```

### 3. Clear Existing Embeddings (Important!)

<Warning>
When switching embedding models, you must clear existing embeddings because the vector dimensions change from 1536 (OpenAI) to 768 (nomic-embed-text).
</Warning>

```bash
# Clear Supabase embeddings
psql $SUPABASE_CONNECTION_STRING -c "DELETE FROM t6_memories;"

# Clear the Neo4j graph
docker exec -it t6-neo4j cypher-shell -u neo4j -p YOUR_PASSWORD \
  "MATCH (n) DETACH DELETE n"
```

### 4. Restart Services

```bash
docker compose restart
```
### 5. Test

Add a few new memories and verify they are created using the Ollama models; a quick sketch follows below.
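As one way to do that end to end, a short script like the following adds a memory through the REST API and fails loudly on errors. This is a sketch only: the endpoint, port, and API key follow the Testing section above, and the `MEM0_API_URL` / `MEM0_API_KEY` environment variables are placeholders for your deployment.

```python
import os

import requests

BASE_URL = os.getenv("MEM0_API_URL", "http://localhost:8080")  # REST API from the Testing section
API_KEY = os.getenv("MEM0_API_KEY", "YOUR_API_KEY")            # placeholder

payload = {
    "messages": [
        {"role": "user", "content": "I prefer running models locally with Ollama"},
        {"role": "assistant", "content": "Got it, noted."},
    ],
    "user_id": "migration_test",
}

resp = requests.post(
    f"{BASE_URL}/v1/memories/",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,  # local models can be slow on the first request while loading
)
resp.raise_for_status()
print(resp.json())
```

Afterwards, check `docker logs t6-mem0-mcp --tail 50` (and the Ollama logs) to confirm the request was served by the local models.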
## Next Steps

<CardGroup cols={2}>
  <Card title="MCP Installation" icon="download" href="/mcp/installation">
    Deploy with Ollama in Docker
  </Card>
  <Card title="Model Comparison" icon="chart-line" href="/setup/model-comparison">
    Compare OpenAI vs Ollama performance
  </Card>
</CardGroup>