Integrate Supabase VectorDB (#2290)

2025-03-03 23:16:24 +05:30
parent 2556c5fe88
commit 8452dd598f
11 changed files with 542 additions and 4 deletions
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -52,7 +52,7 @@ jobs:
          virtualenvs-in-project: true
      - name: Load cached venv
        id: cached-poetry-dependencies
-        uses: actions/cache@v2
+        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-mem0-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
@@ -83,7 +83,7 @@ jobs:
          virtualenvs-in-project: true
      - name: Load cached venv
        id: cached-poetry-dependencies
-        uses: actions/cache@v2
+        uses: actions/cache@v3
        with:
          path: .venv
          key: venv-embedchain-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}
--- a/2
+++ b/2
@@ -13,7 +13,7 @@ install:
 install_all:
 	poetry install
 	poetry run pip install groq together boto3 litellm ollama chromadb sentence_transformers vertexai \
-	                        google-generativeai elasticsearch opensearch-py
+	                        google-generativeai elasticsearch opensearch-py vecs
 # Format code with ruff
 format:
--- a/docs/components/vectordbs/config.mdx
+++ b/docs/components/vectordbs/config.mdx
@@ -86,6 +86,9 @@ Here's a comprehensive list of all parameters that can be used across different
 | `url` | Full URL for the server |
 | `api_key` | API key for the server |
 | `on_disk` | Enable persistent storage |
 | `connection_string` | PostgreSQL connection string (for Supabase/PGVector) |
 | `index_method` | Vector index method (for Supabase) |
 | `index_measure` | Distance measure for similarity search (for Supabase) |
 </Tab>
 <Tab title="TypeScript">
 | Parameter | Description |
--- a/docs/components/vectordbs/dbs/supabase.mdx
+++ b/docs/components/vectordbs/dbs/supabase.mdx
@@ -0,0 +1,78 @@
 [Supabase](https://supabase.com/) is an open-source Firebase alternative that provides a PostgreSQL database with pgvector extension for vector similarity search. It offers a powerful and scalable solution for storing and querying vector embeddings.
 Create a [Supabase](https://supabase.com/dashboard/projects) account and project, then get your connection string from Project Settings > Database. See the [docs](https://supabase.github.io/vecs/hosting/) for details.
 ### Usage
 ```python
 import os
 from mem0 import Memory
 os.environ["OPENAI_API_KEY"] = "sk-xx"
 config = {
    "vector_store": {
        "provider": "supabase",
        "config": {
            "connection_string": "postgresql://user:password@host:port/database",
            "collection_name": "memories",
            "index_method": "hnsw",  # Optional: defaults to "auto"
            "index_measure": "cosine_distance"  # Optional: defaults to "cosine_distance"
        }
    }
 }
 m = Memory.from_config(config)
 messages = [
    {"role": "user", "content": "I'm planning to watch a movie tonight. Any recommendations?"},
    {"role": "assistant", "content": "How about a thriller movies? They can be quite engaging."},
    {"role": "user", "content": "I'm not a big fan of thriller movies but I love sci-fi movies."},
    {"role": "assistant", "content": "Got it! I'll avoid thriller recommendations and suggest sci-fi movies in the future."}
 ]
 m.add(messages, user_id="alice", metadata={"category": "movies"})
 ```
 ### Config
 Here are the parameters available for configuring Supabase:
 | Parameter | Description | Default Value |
 | --- | --- | --- |
 | `connection_string` | PostgreSQL connection string (required) | None |
 | `collection_name` | Name for the vector collection | `mem0` |
 | `embedding_model_dims` | Dimensions of the embedding model | `1536` |
 | `index_method` | Vector index method to use | `auto` |
 | `index_measure` | Distance measure for similarity search | `cosine_distance` |
 ### Index Methods
 The following index methods are supported:
 - `auto`: Automatically selects the best available index method
 - `hnsw`: Hierarchical Navigable Small World graph index (faster search, more memory usage)
 - `ivfflat`: Inverted File Flat index (good balance of speed and memory)
 ### Distance Measures
 Available distance measures for similarity search:
 - `cosine_distance`: Cosine similarity (recommended for most embedding models)
 - `l2_distance`: Euclidean distance
 - `l1_distance`: Manhattan distance
 - `max_inner_product`: Maximum inner product similarity
 ### Best Practices
 1. **Index Method Selection**:
   - Use `hnsw` for fastest search performance when memory is not a constraint
   - Use `ivfflat` for a good balance of search speed and memory usage
   - Use `auto` if unsure, it will select the best method based on your data
 2. **Distance Measure Selection**:
   - Use `cosine_distance` for most embedding models (OpenAI, Hugging Face, etc.)
   - Use `max_inner_product` if your vectors are normalized
   - Use `l2_distance` or `l1_distance` if working with raw feature vectors
 3. **Connection String**:
   - Always use environment variables for sensitive information in the connection string
   - Format: `postgresql://user:password@host:port/database`
--- a/docs/components/vectordbs/overview.mdx
+++ b/docs/components/vectordbs/overview.mdx
@@ -23,6 +23,7 @@ See the list of supported vector databases below.
  <Card title="Redis" href="/components/vectordbs/dbs/redis"></Card>
  <Card title="Elasticsearch" href="/components/vectordbs/dbs/elasticsearch"></Card>
  <Card title="OpenSearch" href="/components/vectordbs/dbs/opensearch"></Card>
  <Card title="Supabase" href="/components/vectordbs/dbs/supabase"></Card>
 </CardGroup>
 ## Usage
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -128,7 +128,8 @@
                          "components/vectordbs/dbs/azure_ai_search",
                          "components/vectordbs/dbs/redis",
                          "components/vectordbs/dbs/elasticsearch",
-                          "components/vectordbs/dbs/opensearch"
+                          "components/vectordbs/dbs/opensearch",
                          "components/vectordbs/dbs/supabase"
                        ]
                      }
                    ]
--- a/mem0/configs/vector_stores/supabase.py
+++ b/mem0/configs/vector_stores/supabase.py
@@ -0,0 +1,44 @@
 from typing import Any, Dict, Optional
 from enum import Enum
 from pydantic import BaseModel, Field, model_validator
 class IndexMethod(str, Enum):
    AUTO = "auto"
    HNSW = "hnsw"
    IVFFLAT = "ivfflat"
 class IndexMeasure(str, Enum):
    COSINE = "cosine_distance"
    L2 = "l2_distance"
    L1 = "l1_distance"
    MAX_INNER_PRODUCT = "max_inner_product"
 class SupabaseConfig(BaseModel):
    connection_string: str = Field(..., description="PostgreSQL connection string")
    collection_name: str = Field("mem0", description="Name for the vector collection")
    embedding_model_dims: Optional[int] = Field(1536, description="Dimensions of the embedding model")
    index_method: Optional[IndexMethod] = Field(IndexMethod.AUTO, description="Index method to use")
    index_measure: Optional[IndexMeasure] = Field(IndexMeasure.COSINE, description="Distance measure to use")
    @model_validator(mode="before")
    def check_connection_string(cls, values):
        conn_str = values.get("connection_string")
        if not conn_str or not conn_str.startswith("postgresql://"):
            raise ValueError("A valid PostgreSQL connection string must be provided")
        return values
    @model_validator(mode="before")
    @classmethod
    def validate_extra_fields(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        allowed_fields = set(cls.model_fields.keys())
        input_fields = set(values.keys())
        extra_fields = input_fields - allowed_fields
        if extra_fields:
            raise ValueError(
                f"Extra fields not allowed: {', '.join(extra_fields)}. Please input only the following fields: {', '.join(allowed_fields)}"
            )
        return values
--- a/mem0/utils/factory.py
+++ b/mem0/utils/factory.py
@@ -70,6 +70,7 @@ class VectorStoreFactory:
        "redis": "mem0.vector_stores.redis.RedisDB",
        "elasticsearch": "mem0.vector_stores.elasticsearch.ElasticsearchDB",
        "opensearch": "mem0.vector_stores.opensearch.OpenSearchDB",
        "supabase": "mem0.vector_stores.supabase.Supabase",
    }
    @classmethod
--- a/mem0/vector_stores/configs.py
+++ b/mem0/vector_stores/configs.py
@@ -19,6 +19,7 @@ class VectorStoreConfig(BaseModel):
        "redis": "RedisDBConfig",
        "elasticsearch": "ElasticsearchConfig",
        "opensearch": "OpenSearchConfig",
        "supabase": "SupabaseConfig",
    }
    @model_validator(mode="after")
--- a/mem0/vector_stores/supabase.py
+++ b/mem0/vector_stores/supabase.py
@@ -0,0 +1,231 @@
 import logging
 import uuid
 from typing import List, Optional, Dict, Any
 from pydantic import BaseModel
 try:
    import vecs
 except ImportError:
    raise ImportError("The 'vecs' library is required. Please install it using 'pip install vecs'.")
 from mem0.vector_stores.base import VectorStoreBase
 from mem0.configs.vector_stores.supabase import IndexMethod, IndexMeasure
 logger = logging.getLogger(__name__)
 class OutputData(BaseModel):
    id: Optional[str]
    score: Optional[float]
    payload: Optional[dict]
 class Supabase(VectorStoreBase):
    def __init__(
        self,
        connection_string: str,
        collection_name: str,
        embedding_model_dims: int,
        index_method: IndexMethod = IndexMethod.AUTO,
        index_measure: IndexMeasure = IndexMeasure.COSINE,
    ):
        """
        Initialize the Supabase vector store using vecs.
        Args:
            connection_string (str): PostgreSQL connection string
            collection_name (str): Collection name
            embedding_model_dims (int): Dimension of the embedding vector
            index_method (IndexMethod): Index method to use. Defaults to AUTO.
            index_measure (IndexMeasure): Distance measure to use. Defaults to COSINE.
        """
        self.db = vecs.create_client(connection_string)
        self.collection_name = collection_name
        self.embedding_model_dims = embedding_model_dims
        self.index_method = index_method
        self.index_measure = index_measure
        collections = self.list_cols()
        if collection_name not in collections:
            self.create_col(embedding_model_dims)
    def _preprocess_filters(self, filters: Optional[dict] = None) -> Optional[dict]:
        """
        Preprocess filters to be compatible with vecs.
        Args:
            filters (Dict, optional): Filters to preprocess. Multiple filters will be
                combined with AND logic.
        """
        if filters is None:
            return None
        if len(filters) == 1:
            # For single filter, keep the simple format
            key, value = next(iter(filters.items()))
            return {key: {"$eq": value}}
        # For multiple filters, use $and clause
        return {"$and": [{key: {"$eq": value}} for key, value in filters.items()]}
    def create_col(self, embedding_model_dims: Optional[int] = None) -> None:
        """
        Create a new collection with vector support.
        Will also initialize vector search index.
        Args:
            embedding_model_dims (int, optional): Dimension of the embedding vector.
                If not provided, uses the dimension specified in initialization.
        """
        dims = embedding_model_dims or self.embedding_model_dims
        if not dims:
            raise ValueError(
                "embedding_model_dims must be provided either during initialization or when creating collection"
            )
        logger.info(f"Creating new collection: {self.collection_name}")
        try:
            self.collection = self.db.get_or_create_collection(name=self.collection_name, dimension=dims)
            self.collection.create_index(method=self.index_method.value, measure=self.index_measure.value)
            logger.info(f"Successfully created collection {self.collection_name} with dimension {dims}")
        except Exception as e:
            logger.error(f"Failed to create collection: {str(e)}")
            raise
    def insert(
        self, vectors: List[List[float]], payloads: Optional[List[dict]] = None, ids: Optional[List[str]] = None
    ):
        """
        Insert vectors into the collection.
        Args:
            vectors (List[List[float]]): List of vectors to insert
            payloads (List[Dict], optional): List of payloads corresponding to vectors
            ids (List[str], optional): List of IDs corresponding to vectors
        """
        logger.info(f"Inserting {len(vectors)} vectors into collection {self.collection_name}")
        if not ids:
            ids = [str(uuid.uuid4()) for _ in vectors]
        if not payloads:
            payloads = [{} for _ in vectors]
        records = [(id, vector, payload) for id, vector, payload in zip(ids, vectors, payloads)]
        print(records)
        self.collection.upsert(records)
    def search(self, query: List[float], limit: int = 5, filters: Optional[dict] = None) -> List[OutputData]:
        """
        Search for similar vectors.
        Args:
            query (List[float]): Query vector
            limit (int, optional): Number of results to return. Defaults to 5.
            filters (Dict, optional): Filters to apply to the search. Defaults to None.
        Returns:
            List[OutputData]: Search results
        """
        filters = self._preprocess_filters(filters)
        print(filters)
        results = self.collection.query(
            data=query, limit=limit, filters=filters, include_metadata=True, include_value=True
        )
        print(results)
        return [OutputData(id=str(result[0]), score=float(result[1]), payload=result[2]) for result in results]
    def delete(self, vector_id: str):
        """
        Delete a vector by ID.
        Args:
            vector_id (str): ID of the vector to delete
        """
        self.collection.delete([(vector_id,)])
    def update(self, vector_id: str, vector: Optional[List[float]] = None, payload: Optional[dict] = None):
        """
        Update a vector and/or its payload.
        Args:
            vector_id (str): ID of the vector to update
            vector (List[float], optional): Updated vector
            payload (Dict, optional): Updated payload
        """
        if vector is None:
            # If only updating metadata, we need to get the existing vector
            existing = self.get(vector_id)
            if existing and existing.payload:
                vector = existing.payload.get("vector", [])
        if vector:
            self.collection.upsert([(vector_id, vector, payload or {})])
    def get(self, vector_id: str) -> Optional[OutputData]:
        """
        Retrieve a vector by ID.
        Args:
            vector_id (str): ID of the vector to retrieve
        Returns:
            Optional[OutputData]: Retrieved vector data or None if not found
        """
        result = self.collection.fetch([(vector_id,)])
        if not result:
            return []
        record = result[0]
        return OutputData(id=str(record.id), score=None, payload=record.metadata)
    def list_cols(self) -> List[str]:
        """
        List all collections.
        Returns:
            List[str]: List of collection names
        """
        return self.db.list_collections()
    def delete_col(self):
        """Delete the collection."""
        self.db.delete_collection(self.collection_name)
    def col_info(self) -> dict:
        """
        Get information about the collection.
        Returns:
            Dict: Collection information including name and configuration
        """
        info = self.collection.describe()
        return {
            "name": info.name,
            "count": info.vectors,
            "dimension": info.dimension,
            "index": {"method": info.index_method, "metric": info.distance_metric},
        }
    def list(self, filters: Optional[dict] = None, limit: int = 100) -> List[OutputData]:
        """
        List vectors in the collection.
        Args:
            filters (Dict, optional): Filters to apply
            limit (int, optional): Maximum number of results to return. Defaults to 100.
        Returns:
            List[OutputData]: List of vectors
        """
        filters = self._preprocess_filters(filters)
        query = [0] * self.embedding_model_dims
        ids = self.collection.query(
            data=query, limit=limit, filters=filters, include_metadata=True, include_value=False
        )
        ids = [id[0] for id in ids]
        records = self.collection.fetch(ids=ids)
        return [[OutputData(id=str(record[0]), score=None, payload=record[2]) for record in records]]
--- a/tests/vector_stores/test_supabase.py
+++ b/tests/vector_stores/test_supabase.py
@@ -0,0 +1,178 @@
 from unittest.mock import Mock, patch
 import pytest
 from mem0.configs.vector_stores.supabase import IndexMeasure, IndexMethod
 from mem0.vector_stores.supabase import Supabase
@pytest.fixture
 def mock_vecs_client():
    with patch("vecs.create_client") as mock_client:
        yield mock_client
@pytest.fixture
 def mock_collection():
    collection = Mock()
    collection.name = "test_collection"
    collection.vectors = 100
    collection.dimension = 1536
    collection.index_method = "hnsw"
    collection.distance_metric = "cosine_distance"
    collection.describe.return_value = collection
    return collection
@pytest.fixture
 def supabase_instance(mock_vecs_client, mock_collection):
    # Set up the mock client to return our mock collection
    mock_vecs_client.return_value.get_or_create_collection.return_value = mock_collection
    mock_vecs_client.return_value.list_collections.return_value = ["test_collection"]
    instance = Supabase(
        connection_string="postgresql://user:password@localhost:5432/test",
        collection_name="test_collection",
        embedding_model_dims=1536,
        index_method=IndexMethod.HNSW,
        index_measure=IndexMeasure.COSINE,
    )
    # Manually set the collection attribute since we're mocking the initialization
    instance.collection = mock_collection
    return instance
 def test_create_col(supabase_instance, mock_vecs_client, mock_collection):
    supabase_instance.create_col(1536)
    mock_vecs_client.return_value.get_or_create_collection.assert_called_with(
        name="test_collection",
        dimension=1536
    )
    mock_collection.create_index.assert_called_with(
        method="hnsw",
        measure="cosine_distance"
    )
 def test_insert_vectors(supabase_instance, mock_collection):
    vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    payloads = [{"name": "vector1"}, {"name": "vector2"}]
    ids = ["id1", "id2"]
    supabase_instance.insert(vectors=vectors, payloads=payloads, ids=ids)
    expected_records = [
        ("id1", [0.1, 0.2, 0.3], {"name": "vector1"}),
        ("id2", [0.4, 0.5, 0.6], {"name": "vector2"})
    ]
    mock_collection.upsert.assert_called_once_with(expected_records)
 def test_search_vectors(supabase_instance, mock_collection):
    mock_results = [
        ("id1", 0.9, {"name": "vector1"}),
        ("id2", 0.8, {"name": "vector2"})
    ]
    mock_collection.query.return_value = mock_results
    query = [0.1, 0.2, 0.3]
    filters = {"category": "test"}
    results = supabase_instance.search(query=query, limit=2, filters=filters)
    mock_collection.query.assert_called_once_with(
        data=query,
        limit=2,
        filters={"category": {"$eq": "test"}},
        include_metadata=True,
        include_value=True
    )
    assert len(results) == 2
    assert results[0].id == "id1"
    assert results[0].score == 0.9
    assert results[0].payload == {"name": "vector1"}
 def test_delete_vector(supabase_instance, mock_collection):
    vector_id = "id1"
    supabase_instance.delete(vector_id=vector_id)
    mock_collection.delete.assert_called_once_with([("id1",)])
 def test_update_vector(supabase_instance, mock_collection):
    vector_id = "id1"
    new_vector = [0.7, 0.8, 0.9]
    new_payload = {"name": "updated_vector"}
    supabase_instance.update(vector_id=vector_id, vector=new_vector, payload=new_payload)
    mock_collection.upsert.assert_called_once_with([("id1", new_vector, new_payload)])
 def test_get_vector(supabase_instance, mock_collection):
    # Create a Mock object to represent the record
    mock_record = Mock()
    mock_record.id = "id1"
    mock_record.metadata = {"name": "vector1"}
    mock_record.values = [0.1, 0.2, 0.3]
    # Set the fetch return value to a list containing our mock record
    mock_collection.fetch.return_value = [mock_record]
    result = supabase_instance.get(vector_id="id1")
    mock_collection.fetch.assert_called_once_with([("id1",)])
    assert result.id == "id1"
    assert result.payload == {"name": "vector1"}
 def test_list_vectors(supabase_instance, mock_collection):
    mock_query_results = [("id1", 0.9, {}), ("id2", 0.8, {})]
    mock_fetch_results = [
        ("id1", [0.1, 0.2, 0.3], {"name": "vector1"}),
        ("id2", [0.4, 0.5, 0.6], {"name": "vector2"})
    ]
    mock_collection.query.return_value = mock_query_results
    mock_collection.fetch.return_value = mock_fetch_results
    results = supabase_instance.list(limit=2, filters={"category": "test"})
    assert len(results[0]) == 2
    assert results[0][0].id == "id1"
    assert results[0][0].payload == {"name": "vector1"}
    assert results[0][1].id == "id2"
    assert results[0][1].payload == {"name": "vector2"}
 def test_col_info(supabase_instance, mock_collection):
    info = supabase_instance.col_info()
    assert info == {
        "name": "test_collection",
        "count": 100,
        "dimension": 1536,
        "index": {
            "method": "hnsw",
            "metric": "cosine_distance"
        }
    }
 def test_preprocess_filters(supabase_instance):
    # Test single filter
    single_filter = {"category": "test"}
    assert supabase_instance._preprocess_filters(single_filter) == {"category": {"$eq": "test"}}
    # Test multiple filters
    multi_filter = {"category": "test", "type": "document"}
    assert supabase_instance._preprocess_filters(multi_filter) == {
        "$and": [
            {"category": {"$eq": "test"}},
            {"type": {"$eq": "document"}}
        ]
    }
    # Test None filters
    assert supabase_instance._preprocess_filters(None) is None