Rename embedchain to mem0 and open-source code for long-term memory (#1474)

Co-authored-by: Deshraj Yadav <deshrajdry@gmail.com>
Taranjeet Singh
2024-07-12 07:51:33 -07:00
committed by GitHub
parent 83e8c97295
commit f842a92e25
665 changed files with 9427 additions and 6592 deletions

@@ -0,0 +1,35 @@
---
title: ChromaDB
---
<CodeGroup>
```python main.py
from embedchain import App
# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
```
```yaml config1.yaml
vectordb:
provider: chroma
config:
collection_name: 'my-collection'
dir: db
allow_reset: true
```
```yaml config2.yaml
vectordb:
provider: chroma
config:
collection_name: 'my-collection'
host: localhost
port: 5200
allow_reset: true
```
</CodeGroup>
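Once the app is configured, data can be added and queried as usual. A minimal usage sketch, assuming `OPENAI_API_KEY` is set for the default LLM and embedder; the question is a placeholder:
```python
import os

from embedchain import App

# assumed: an OpenAI key for the default LLM and embedder (placeholder value)
os.environ["OPENAI_API_KEY"] = "sk-xxx"

# load the local Chroma configuration shown above
app = App.from_config(config_path="config1.yaml")

# add a data source and run a query against the Chroma-backed index
app.add("https://www.forbes.com/profile/elon-musk")
print(app.query("What is Elon Musk's net worth?"))
```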
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,39 @@
---
title: Elasticsearch
---
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[elasticsearch]'
```
<Note>
You can configure the Elasticsearch connection by providing either `es_url` or `cloud_id`. If you are using the Elasticsearch Service on Elastic Cloud, you can find the `cloud_id` on the [Elastic Cloud dashboard](https://cloud.elastic.co/deployments).
</Note>
You can authorize the connection to Elasticsearch by providing either `basic_auth`, `api_key`, or `bearer_auth`.
<CodeGroup>
```python main.py
from embedchain import App
# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
vectordb:
provider: elasticsearch
config:
collection_name: 'es-index'
cloud_id: 'deployment-name:xxxx'
basic_auth:
- elastic
- <your_password>
verify_certs: false
```
</CodeGroup>
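As mentioned in the note above, the connection can also be configured with `es_url` and an `api_key` instead of `cloud_id` and `basic_auth`. A minimal sketch using a dict-based config; the exact keys and values shown here are assumptions and should be adjusted to your deployment:
```python
from embedchain import App

# hedged sketch: self-hosted Elasticsearch reached by URL and authorized with an API key
config = {
    "vectordb": {
        "provider": "elasticsearch",
        "config": {
            "collection_name": "es-index",
            "es_url": "https://localhost:9200",  # placeholder endpoint
            "api_key": "<your_api_key>",         # placeholder credential (assumed config key)
            "verify_certs": False,
        },
    }
}
app = App.from_config(config=config)
```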
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,100 @@
---
title: LanceDB
---
## Install Embedchain with LanceDB
Install Embedchain, LanceDB and related dependencies using the following command:
```bash
pip install "embedchain[lancedb]"
```
LanceDB is a developer-friendly, open-source database for AI. It covers everything from hyper-scalable vector search and advanced retrieval for RAG to streaming training data and interactive exploration of large-scale AI datasets.
To use LanceDB as the vector database, no key needs to be set for local use.
### With OpenAI
<CodeGroup>
```python main.py
import os
from embedchain import App
# set OPENAI_API_KEY as env variable
os.environ["OPENAI_API_KEY"] = "sk-xxx"
# create Embedchain App and set config
app = App.from_config(config={
"vectordb": {
"provider": "lancedb",
"config": {
"collection_name": "lancedb-index"
}
}
}
)
# add a data source and start querying
app.add("https://www.forbes.com/profile/elon-musk")
# query continuously
while True:
question = input("Enter question: ")
if question in ['q', 'exit', 'quit']:
break
answer = app.query(question)
print(answer)
```
</CodeGroup>
### With Local LLM
<CodeGroup>
```python main.py
from embedchain import Pipeline as App
# config for Embedchain App
config = {
'llm': {
'provider': 'huggingface',
'config': {
'model': 'mistralai/Mistral-7B-v0.1',
'temperature': 0.1,
'max_tokens': 250,
'top_p': 0.1,
'stream': True
}
},
'embedder': {
'provider': 'huggingface',
'config': {
'model': 'sentence-transformers/all-mpnet-base-v2'
}
},
'vectordb': {
'provider': 'lancedb',
'config': {
'collection_name': 'lancedb-index'
}
}
}
app = App.from_config(config=config)
# add a data source and start querying
app.add("https://www.tesla.com/ns_videos/2022-tesla-impact-report.pdf")
# query continuously
while True:
question = input("Enter question: ")
if question in ['q', 'exit', 'quit']:
break
answer = app.query(question)
print(answer)
```
</CodeGroup>
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,36 @@
---
title: OpenSearch
---
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[opensearch]'
```
<CodeGroup>
```python main.py
from embedchain import App
# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
vectordb:
provider: opensearch
config:
collection_name: 'my-app'
opensearch_url: 'https://localhost:9200'
http_auth:
- admin
- admin
vector_dimension: 1536
use_ssl: false
verify_certs: false
```
</CodeGroup>
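The same settings can also be passed as a Python dict, as other examples in this change do with `App.from_config(config=...)`. A minimal sketch mirroring the YAML above; whether `http_auth` accepts a plain `[user, password]` list here is an assumption:
```python
from embedchain import App

# hedged sketch: dict-based equivalent of config.yaml above
config = {
    "vectordb": {
        "provider": "opensearch",
        "config": {
            "collection_name": "my-app",
            "opensearch_url": "https://localhost:9200",
            "http_auth": ["admin", "admin"],  # assumed to mirror the YAML [user, password] pair
            "vector_dimension": 1536,
            "use_ssl": False,
            "verify_certs": False,
        },
    }
}
app = App.from_config(config=config)
```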
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,109 @@
---
title: Pinecone
---
## Overview
Install Pinecone-related dependencies using the following command:
```bash
pip install --upgrade pinecone-client pinecone-text
```
In order to use Pinecone as a vector database, set the environment variable `PINECONE_API_KEY`, which you can find on the [Pinecone dashboard](https://app.pinecone.io/).
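For example, the key can be set directly in code before loading either configuration (a minimal sketch; the value is a placeholder):
```python
import os

# hedged sketch: placeholder API key, replace with your own Pinecone key
os.environ["PINECONE_API_KEY"] = "xxx"
```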
<CodeGroup>
```python main.py
from embedchain import App
# Load pinecone configuration from yaml file
app = App.from_config(config_path="pod_config.yaml")
# Or
app = App.from_config(config_path="serverless_config.yaml")
```
```yaml pod_config.yaml
vectordb:
provider: pinecone
config:
metric: cosine
vector_dimension: 1536
index_name: my-pinecone-index
pod_config:
environment: gcp-starter
metadata_config:
indexed:
- "url"
- "hash"
```
```yaml serverless_config.yaml
vectordb:
provider: pinecone
config:
metric: cosine
vector_dimension: 1536
index_name: my-pinecone-index
serverless_config:
cloud: aws
region: us-west-2
```
</CodeGroup>
<br />
<Note>
You can find more information about Pinecone configuration [here](https://docs.pinecone.io/docs/manage-indexes#create-a-pod-based-index).
You can also optionally provide `index_name` as a config param in the YAML file to specify the index name. If not provided, the index name defaults to `{collection_name}-{vector_dimension}`.
</Note>
## Usage
### Hybrid search
Here is an example of how you can do hybrid search using Pinecone as a vector database through Embedchain.
```python
import os
from embedchain import App
config = {
'app': {
"config": {
"id": "ec-docs-hybrid-search"
}
},
'vectordb': {
'provider': 'pinecone',
'config': {
'metric': 'dotproduct',
'vector_dimension': 1536,
'index_name': 'my-index',
'serverless_config': {
'cloud': 'aws',
'region': 'us-west-2'
},
'hybrid_search': True, # Remember to set this for hybrid search
}
}
}
# Initialize app
app = App.from_config(config=config)
# Add documents
app.add("/path/to/file.pdf", data_type="pdf_file", namespace="my-namespace")
# Query
app.query("<YOUR QUESTION HERE>", namespace="my-namespace")
# Chat
app.chat("<YOUR QUESTION HERE>", namespace="my-namespace")
```
Under the hood, Embedchain fetches the relevant chunks from the documents you added by doing hybrid search on the Pinecone index.
If you have questions about how Pinecone hybrid search works, please refer to their [official documentation here](https://docs.pinecone.io/docs/hybrid-search).
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,23 @@
---
title: Qdrant
---
In order to use Qdrant as a vector database, set the environment variables `QDRANT_URL` and `QDRANT_API_KEY`, which you can find on the [Qdrant Dashboard](https://cloud.qdrant.io/).
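For example, the variables can be set directly in code before creating the app (a minimal sketch; both values are placeholders):
```python
import os

# hedged sketch: placeholder Qdrant credentials, replace with values from your Qdrant Cloud cluster
os.environ["QDRANT_URL"] = "https://xxx.cloud.qdrant.io"
os.environ["QDRANT_API_KEY"] = "xxx"
```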
<CodeGroup>
```python main.py
from embedchain import App
# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
vectordb:
provider: qdrant
config:
collection_name: my_qdrant_index
```
</CodeGroup>
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,24 @@
---
title: Weaviate
---
In order to use Weaviate as a vector database, set the environment variables `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY`, which you can find on the [Weaviate dashboard](https://console.weaviate.cloud/dashboard).
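For example (a minimal sketch; both values are placeholders, assumed to point at a Weaviate Cloud cluster):
```python
import os

# hedged sketch: placeholder Weaviate credentials, replace with values from your cluster
os.environ["WEAVIATE_ENDPOINT"] = "https://xxx.weaviate.network"
os.environ["WEAVIATE_API_KEY"] = "xxx"
```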
<CodeGroup>
```python main.py
from embedchain import App
# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
vectordb:
provider: weaviate
config:
collection_name: my_weaviate_index
```
</CodeGroup>
<Snippet file="missing-vector-db-tip.mdx" />

@@ -0,0 +1,39 @@
---
title: Zilliz
---
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[milvus]'
```
Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`, which you can find on their [cloud platform](https://cloud.zilliz.com/).
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'
# load zilliz configuration from yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
vectordb:
provider: zilliz
config:
collection_name: 'zilliz_app'
uri: https://xxxx.api.gcp-region.zillizcloud.com
token: xxx
vector_dim: 1536
metric_type: L2
```
</CodeGroup>
<Snippet file="missing-vector-db-tip.mdx" />