Major Changes:
- Added Ollama as alternative LLM provider to OpenAI
- Implemented flexible provider switching via environment variables
- Support for multiple embedding models (OpenAI and Ollama)
- Created comprehensive Ollama setup guide

Configuration Changes (config.py):
- Added LLM_PROVIDER and EMBEDDER_PROVIDER settings
- Added Ollama configuration: base URL, LLM model, embedding model
- Modified get_mem0_config() to dynamically switch providers
- OpenAI API key now optional when using Ollama
- Added validation to ensure required keys based on provider

Supported Configurations:
1. Full OpenAI (default):
   - LLM_PROVIDER=openai
   - EMBEDDER_PROVIDER=openai
2. Full Ollama (local):
   - LLM_PROVIDER=ollama
   - EMBEDDER_PROVIDER=ollama
3. Hybrid configurations:
   - Ollama LLM + OpenAI embeddings
   - OpenAI LLM + Ollama embeddings

Ollama Models Supported:
- LLM: llama3.1:8b, llama3.1:70b, mistral:7b, codellama:7b, phi3:3.8b
- Embeddings: nomic-embed-text, mxbai-embed-large, all-minilm

Documentation:
- Created docs/setup/ollama.mdx - Complete Ollama setup guide
  - Installation methods (host and Docker)
  - Model selection and comparison
  - Docker Compose configuration
  - Performance tuning and GPU acceleration
  - Migration guide from OpenAI
  - Troubleshooting section
- Updated README.md with Ollama features
- Updated .env.example with provider selection
- Marked Phase 2 as complete in roadmap

Environment Variables:
- LLM_PROVIDER: Select LLM provider (openai/ollama)
- EMBEDDER_PROVIDER: Select embedding provider (openai/ollama)
- OLLAMA_BASE_URL: Ollama API endpoint (default: http://localhost:11434)
- OLLAMA_LLM_MODEL: Ollama model for text generation
- OLLAMA_EMBEDDING_MODEL: Ollama model for embeddings
- MEM0_EMBEDDING_DIMS: Must match embedding model dimensions

Breaking Changes:
- None - defaults to OpenAI for backward compatibility

Migration Notes:
- When switching from OpenAI to Ollama embeddings, existing embeddings must be cleared due to dimension changes (1536 → 768 for nomic-embed-text)
- Update MEM0_EMBEDDING_DIMS to match chosen embedding model

Benefits:
✅ Cost savings - no API costs with local models
✅ Privacy - all data stays local
✅ Offline capability - works without internet
✅ Model variety - access to many open-source models
✅ Flexibility - easy switching between providers

Version: 1.1.0
Status: Phase 2 Complete - Production Ready with Ollama Support

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
---
title: 'Ollama Setup'
description: 'Use local LLM models with Ollama instead of OpenAI'
---

# Ollama Setup Guide

T6 Mem0 v2 supports both OpenAI and Ollama as LLM providers. Use Ollama to run completely local models without requiring OpenAI API credits.
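
Provider selection is driven by the `LLM_PROVIDER` and `EMBEDDER_PROVIDER` environment variables described below; `config.py` builds the matching mem0 configuration from them, and the OpenAI API key is only required when an OpenAI provider is actually selected. The sketch below illustrates the idea; it is not the exact `get_mem0_config()` from `config.py`, and the mem0 config keys (`ollama_base_url`, `embedding_dims`) are assumptions based on mem0's standard provider config format, so verify them against your installed version.

```python
import os


def get_mem0_config() -> dict:
    """Illustrative sketch of env-driven provider switching (not the exact config.py)."""
    llm_provider = os.getenv("LLM_PROVIDER", "openai")
    embedder_provider = os.getenv("EMBEDDER_PROVIDER", "openai")
    ollama_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")

    # Only require the OpenAI key when at least one provider actually uses OpenAI.
    if "openai" in (llm_provider, embedder_provider) and not os.getenv("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required when an OpenAI provider is selected")

    if llm_provider == "ollama":
        llm = {
            "provider": "ollama",
            "config": {
                "model": os.getenv("OLLAMA_LLM_MODEL", "llama3.1:8b"),
                "ollama_base_url": ollama_url,
            },
        }
    else:
        llm = {"provider": "openai", "config": {}}  # model choice left to existing settings

    if embedder_provider == "ollama":
        embedder = {
            "provider": "ollama",
            "config": {
                "model": os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
                "ollama_base_url": ollama_url,
                "embedding_dims": int(os.getenv("MEM0_EMBEDDING_DIMS", "768")),
            },
        }
    else:
        embedder = {
            "provider": "openai",
            "config": {"embedding_dims": int(os.getenv("MEM0_EMBEDDING_DIMS", "1536"))},
        }

    return {"llm": llm, "embedder": embedder}
```

All four provider combinations described on this page fall out of this structure without code changes.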

## Why Ollama?

- **Cost-effective**: No API costs, run models locally
- **Privacy**: All data stays on your infrastructure
- **Offline capability**: Works without internet connection
- **Model variety**: Access to Llama, Mistral, and other open-source models

## Prerequisites

- Docker and Docker Compose (if using containerized deployment)
- Or Ollama installed locally
- Sufficient RAM (8GB+ for smaller models, 16GB+ recommended)
- GPU optional but recommended for better performance

## Installation

### Option 1: Ollama on Host Machine

**Install Ollama:**

```bash
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS
brew install ollama

# Or download from https://ollama.com/download
```

**Start Ollama service:**

```bash
ollama serve
```

**Pull required models:**

```bash
# LLM model (choose one)
ollama pull llama3.1:8b       # 8B parameters, 4.7GB
ollama pull llama3.1:70b      # 70B parameters, 40GB (requires 48GB RAM)
ollama pull mistral:7b        # 7B parameters, 4.1GB

# Embedding model (required)
ollama pull nomic-embed-text  # 274MB
```

### Option 2: Ollama in Docker

**Add to docker-compose.yml:**

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: t6-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - localai
    restart: unless-stopped

volumes:
  ollama_data:
```

**Pull models inside container:**

```bash
docker exec -it t6-ollama ollama pull llama3.1:8b
docker exec -it t6-ollama ollama pull nomic-embed-text
```

## Configuration

### Environment Variables

Update your `.env` file:

```bash
# Switch to Ollama
LLM_PROVIDER=ollama
EMBEDDER_PROVIDER=ollama

# Ollama configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.1:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# OpenAI key no longer required
# OPENAI_API_KEY=  # Can be left empty
```

### Docker Network Configuration

If running Ollama in Docker on the same network as mem0:

```bash
# Find Ollama container IP
docker inspect t6-ollama --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'

# Update .env
OLLAMA_BASE_URL=http://172.21.0.15:11434  # Use actual container IP
```

Or use the Docker service name, which is more reliable because container IPs can change when containers are recreated:

```bash
OLLAMA_BASE_URL=http://ollama:11434  # If on same Docker network
```

## Model Selection

### LLM Models

| Model | Size | RAM Required | Use Case |
|-------|------|--------------|----------|
| `llama3.1:8b` | 4.7GB | 8GB | General purpose, fast |
| `llama3.1:70b` | 40GB | 48GB | High quality responses |
| `mistral:7b` | 4.1GB | 8GB | Fast, efficient |
| `codellama:7b` | 3.8GB | 8GB | Code generation |
| `phi3:3.8b` | 2.3GB | 4GB | Smallest viable model |

### Embedding Models

| Model | Size | Dimensions | Use Case |
|-------|------|------------|----------|
| `nomic-embed-text` | 274MB | 768 | Recommended, fast |
| `mxbai-embed-large` | 669MB | 1024 | Higher quality |
| `all-minilm` | 46MB | 384 | Smallest option |

**Important**: Update `MEM0_EMBEDDING_DIMS` to match your embedding model:

```bash
# For nomic-embed-text
MEM0_EMBEDDING_DIMS=768

# For mxbai-embed-large
MEM0_EMBEDDING_DIMS=1024

# For all-minilm
MEM0_EMBEDDING_DIMS=384
```
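
If you are not sure how many dimensions a model actually produces, you can ask the running Ollama instance directly and compare the result with `MEM0_EMBEDDING_DIMS`. A minimal check, assuming the Python `requests` package and Ollama's `/api/embeddings` endpoint (present in current Ollama releases):

```python
import os
import requests

base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
expected = int(os.getenv("MEM0_EMBEDDING_DIMS", "768"))

# Ask Ollama to embed a throwaway string and measure the vector length.
resp = requests.post(
    f"{base_url}/api/embeddings",
    json={"model": model, "prompt": "dimension check"},
    timeout=60,
)
resp.raise_for_status()
dims = len(resp.json()["embedding"])

print(f"{model} returns {dims}-dimensional vectors (MEM0_EMBEDDING_DIMS={expected})")
if dims != expected:
    print("Mismatch: update MEM0_EMBEDDING_DIMS and clear existing embeddings.")
```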

## Switching Between OpenAI and Ollama

### Full Ollama Configuration

```bash
LLM_PROVIDER=ollama
EMBEDDER_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.1:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
MEM0_EMBEDDING_DIMS=768
```

### Hybrid Configuration

Use Ollama for LLM but OpenAI for embeddings:

```bash
LLM_PROVIDER=ollama
EMBEDDER_PROVIDER=openai
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.1:8b
OPENAI_API_KEY=sk-your-key
MEM0_EMBEDDING_DIMS=1536  # OpenAI dimensions
```

Or use OpenAI for LLM but Ollama for embeddings:

```bash
LLM_PROVIDER=openai
EMBEDDER_PROVIDER=ollama
OPENAI_API_KEY=sk-your-key
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
MEM0_EMBEDDING_DIMS=768  # Ollama dimensions
```

### Back to OpenAI

```bash
LLM_PROVIDER=openai
EMBEDDER_PROVIDER=openai
OPENAI_API_KEY=sk-your-key
MEM0_EMBEDDING_DIMS=1536
```

## Deployment

### Docker Deployment with Ollama

**Complete docker-compose.yml:**

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: t6-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - localai
    restart: unless-stopped

  mcp-server:
    build:
      context: .
      dockerfile: docker/Dockerfile.mcp
    container_name: t6-mem0-mcp
    restart: unless-stopped
    ports:
      - "8765:8765"
    environment:
      - LLM_PROVIDER=ollama
      - EMBEDDER_PROVIDER=ollama
      - OLLAMA_BASE_URL=http://ollama:11434
      - OLLAMA_LLM_MODEL=llama3.1:8b
      - OLLAMA_EMBEDDING_MODEL=nomic-embed-text
      - MEM0_EMBEDDING_DIMS=768
      - SUPABASE_CONNECTION_STRING=${SUPABASE_CONNECTION_STRING}
      - NEO4J_URI=neo4j://neo4j:7687
      - NEO4J_USER=${NEO4J_USER}
      - NEO4J_PASSWORD=${NEO4J_PASSWORD}
    depends_on:
      - ollama
      - neo4j
    networks:
      - localai

  neo4j:
    image: neo4j:5.26.0
    container_name: t6-neo4j
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
    volumes:
      - neo4j_data:/data
    networks:
      - localai

volumes:
  ollama_data:
  neo4j_data:

networks:
  localai:
    external: true
```

**Startup sequence:**

```bash
# Start services
docker compose up -d

# Pull models
docker exec -it t6-ollama ollama pull llama3.1:8b
docker exec -it t6-ollama ollama pull nomic-embed-text

# Verify Ollama is working
curl http://localhost:11434/api/tags

# Restart mem0 services to pick up models
docker compose restart mcp-server
```
## Testing

### Test Ollama Connection

```bash
# List available models
curl http://localhost:11434/api/tags

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Hello, world!",
  "stream": false
}'
```

### Test Memory Operations

```bash
# Add memory via REST API
curl -X POST http://localhost:8080/v1/memories/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "I love local AI models"},
      {"role": "assistant", "content": "Noted!"}
    ],
    "user_id": "test_user"
  }'

# Check logs for Ollama usage
docker logs t6-mem0-mcp --tail 50
```
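
You can also exercise the Ollama providers from Python through the mem0 library instead of the REST API. This is a sketch, not this project's code: it uses mem0's `Memory.from_config()`, `add()`, and `search()` with a minimal Ollama-only config and mem0's default vector store rather than the Supabase/Neo4j setup above, and the exact config keys (`ollama_base_url`, `embedding_dims`, `embedding_model_dims`) should be verified against your installed mem0 version.

```python
from mem0 import Memory

# Minimal Ollama-only config (sketch); 768 dims match nomic-embed-text.
config = {
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "llama3.1:8b",
            "ollama_base_url": "http://localhost:11434",
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text",
            "ollama_base_url": "http://localhost:11434",
            "embedding_dims": 768,
        },
    },
    "vector_store": {
        # The default vector store must be created with the same dimensionality.
        "config": {"embedding_model_dims": 768},
    },
}

m = Memory.from_config(config)

# Add a memory and search it back; with both providers set to Ollama, no OpenAI call is made.
m.add(
    [
        {"role": "user", "content": "I love local AI models"},
        {"role": "assistant", "content": "Noted!"},
    ],
    user_id="test_user",
)

print(m.search("What does the user think about local models?", user_id="test_user"))
```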

## Performance Tuning

### GPU Acceleration

If you have an NVIDIA GPU:

```yaml
ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```
### Model Caching

Models are cached in the `ollama_data` volume. To clear the cache, stop the stack first (Docker will not remove a volume that is still in use), then remove the volume:

```bash
docker compose down
docker volume rm ollama_data  # Compose may prefix the name with your project; check `docker volume ls`
```

### Concurrent Requests

Ollama handles concurrent requests by default. For high load:

```yaml
ollama:
  environment:
    - OLLAMA_NUM_PARALLEL=4        # Requests served in parallel per loaded model
    - OLLAMA_MAX_LOADED_MODELS=2   # Maximum number of models kept loaded in memory
```

## Troubleshooting

### Ollama Not Responding

```bash
# Check Ollama status
curl http://localhost:11434/api/tags

# Check logs
docker logs t6-ollama

# Restart Ollama
docker restart t6-ollama
```

### Model Not Found

```bash
# List pulled models
docker exec -it t6-ollama ollama list

# Pull missing model
docker exec -it t6-ollama ollama pull llama3.1:8b
```

### Out of Memory

Try a smaller model:

```bash
# Switch to smaller model in .env
OLLAMA_LLM_MODEL=phi3:3.8b

# Or use quantized version
OLLAMA_LLM_MODEL=llama3.1:8b-q4_0  # 4-bit quantization
```

### Slow Response Times

- Use GPU acceleration
- Use smaller models (phi3:3.8b)
- Reduce concurrent requests
- Check system resources (RAM, CPU)

### Connection Refused

If mem0 can't connect to Ollama:

```bash
# Test from mem0 container
docker exec -it t6-mem0-mcp curl http://ollama:11434/api/tags

# Check both containers on same network
docker network inspect localai
```

## Migration from OpenAI

### 1. Pull Models

```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

### 2. Update Configuration

```bash
# Backup current .env
cp .env .env.openai.backup

# Update .env
LLM_PROVIDER=ollama
EMBEDDER_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_LLM_MODEL=llama3.1:8b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
MEM0_EMBEDDING_DIMS=768  # Changed from 1536
```

### 3. Clear Existing Embeddings (Important!)

<Warning>
When switching embedding models, you must clear existing embeddings as dimensions changed from 1536 (OpenAI) to 768 (Ollama).
</Warning>

```bash
# Clear Supabase embeddings
psql $SUPABASE_CONNECTION_STRING -c "DELETE FROM t6_memories;"

# Clear Neo4j graph
docker exec -it t6-neo4j cypher-shell -u neo4j -p YOUR_PASSWORD \
  "MATCH (n) DETACH DELETE n"
```

### 4. Restart Services

```bash
docker compose restart
```

### 5. Test

Add new memories and verify they work with Ollama.
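
As a concrete end-to-end check, the snippet below replays the memory-add request from the Testing section against the REST API (the URL, path, and bearer token are the ones shown there; adjust them to your deployment) and prints the response:

```python
import requests

API_URL = "http://localhost:8080/v1/memories/"  # REST endpoint from the Testing section
API_KEY = "YOUR_API_KEY"                        # same bearer token used there

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json={
        "messages": [
            {"role": "user", "content": "I switched to Ollama for local inference"},
            {"role": "assistant", "content": "Got it!"},
        ],
        "user_id": "migration_test",
    },
    timeout=120,
)
print(resp.status_code, resp.text)

# A 2xx response means the pipeline (Ollama LLM + embeddings) completed end to end.
# Confirm Ollama was used in the logs:  docker logs t6-mem0-mcp --tail 50
```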

## Next Steps

<CardGroup cols={2}>
  <Card title="MCP Installation" icon="download" href="/mcp/installation">
    Deploy with Ollama in Docker
  </Card>
  <Card title="Model Comparison" icon="chart-line" href="/setup/model-comparison">
    Compare OpenAI vs Ollama performance
  </Card>
</CardGroup>