[Feature] Add support for hybrid search for pinecone vector database (#1259)
35
docs/components/vector-databases/chromadb.mdx
Normal file
@@ -0,0 +1,35 @@
---
title: ChromaDB
---

<CodeGroup>

```python main.py
from embedchain import App

# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
```

```yaml config1.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    dir: db
    allow_reset: true
```

```yaml config2.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    host: localhost
    port: 5200
    allow_reset: true
```

</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
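
The two YAML files above differ only in how Chroma is reached: `config1.yaml` sets `dir`, while `config2.yaml` sets `host` and `port`. As a rough sketch, the same settings can be assembled as a Python dictionary and handed to `App.from_config(config=...)`, the dictionary form used on the Pinecone page. The helper name `chroma_config` is hypothetical and not part of Embedchain, and treating `dir` and `host`/`port` as mutually exclusive is an assumption drawn from the two files.

```python
from typing import Optional


def chroma_config(collection_name: str,
                  directory: Optional[str] = None,
                  host: Optional[str] = None,
                  port: Optional[int] = None,
                  allow_reset: bool = True) -> dict:
    """Build a vectordb config dict mirroring config1.yaml or config2.yaml."""
    use_local = directory is not None
    use_remote = host is not None and port is not None
    # Assumption for this sketch: exactly one of the two modes is chosen.
    if use_local == use_remote:
        raise ValueError("pass either directory, or both host and port")
    settings = {"collection_name": collection_name, "allow_reset": allow_reset}
    if use_local:
        settings["dir"] = directory
    else:
        settings["host"] = host
        settings["port"] = port
    return {"vectordb": {"provider": "chroma", "config": settings}}


local_cfg = chroma_config("my-collection", directory="db")
remote_cfg = chroma_config("my-collection", host="localhost", port=5200)

# With Embedchain installed, either dict could then be passed on:
# from embedchain import App
# app = App.from_config(config=local_cfg)
```

Building the dictionary in code can be handy when the host or directory comes from the environment instead of a checked-in YAML file.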
39
docs/components/vector-databases/elasticsearch.mdx
Normal file
@@ -0,0 +1,39 @@
---
title: Elasticsearch
---

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[elasticsearch]'
```

<Note>
You can configure the Elasticsearch connection by providing either `es_url` or `cloud_id`. If you are using the Elasticsearch Service on Elastic Cloud, you can find the `cloud_id` on the [Elastic Cloud dashboard](https://cloud.elastic.co/deployments).
</Note>

You can authenticate the connection to Elasticsearch by providing one of `basic_auth`, `api_key`, or `bearer_auth`.

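The connection and authentication options described above can be checked before the app is created. The sketch below is a hypothetical helper (`check_es_config` is not part of Embedchain) that inspects the `config` mapping of the `vectordb` section. Requiring exactly one of `es_url`/`cloud_id` and allowing at most one authentication key are assumptions made for this sketch; Embedchain itself may be more lenient.

```python
CONNECTION_KEYS = ("es_url", "cloud_id")
AUTH_KEYS = ("basic_auth", "api_key", "bearer_auth")


def check_es_config(config: dict) -> list:
    """Return a list of problems found in an Elasticsearch `config` mapping."""
    problems = []
    connection = [key for key in CONNECTION_KEYS if config.get(key)]
    auth = [key for key in AUTH_KEYS if config.get(key)]
    # Assumption: the connection needs exactly one of es_url / cloud_id.
    if len(connection) != 1:
        problems.append("set exactly one of: " + ", ".join(CONNECTION_KEYS))
    # Assumption: at most one authentication method should be given.
    if len(auth) > 1:
        problems.append("set at most one of: " + ", ".join(AUTH_KEYS))
    return problems


example = {
    "collection_name": "es-index",
    "cloud_id": "deployment-name:xxxx",
    "basic_auth": ["elastic", "<your_password>"],
    "verify_certs": False,
}
print(check_es_config(example))  # prints []
```
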
<CodeGroup>

```python main.py
from embedchain import App

# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: elasticsearch
  config:
    collection_name: 'es-index'
    cloud_id: 'deployment-name:xxxx'
    basic_auth:
      - elastic
      - <your_password>
    verify_certs: false
```
</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
36
docs/components/vector-databases/opensearch.mdx
Normal file
@@ -0,0 +1,36 @@
---
title: OpenSearch
---

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensearch]'
```

<CodeGroup>

```python main.py
from embedchain import App

# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: opensearch
  config:
    collection_name: 'my-app'
    opensearch_url: 'https://localhost:9200'
    http_auth:
      - admin
      - admin
    vector_dimension: 1536
    use_ssl: false
    verify_certs: false
```

</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
106
docs/components/vector-databases/pinecone.mdx
Normal file
@@ -0,0 +1,106 @@
---
title: Pinecone
---

## Overview

Install the Pinecone-related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[pinecone]'
```

To use Pinecone as a vector database, set the environment variable `PINECONE_API_KEY`, which you can find on the [Pinecone dashboard](https://app.pinecone.io/).

<CodeGroup>

```python main.py
from embedchain import App

# Load pinecone configuration from yaml file
app = App.from_config(config_path="pod_config.yaml")
# Or
app = App.from_config(config_path="serverless_config.yaml")
```

```yaml pod_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    pod_config:
      environment: gcp-starter
      metadata_config:
        indexed:
          - "url"
          - "hash"
```

```yaml serverless_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    serverless_config:
      cloud: aws
      region: us-west-2
```

</CodeGroup>

<br />
<Note>
You can find more information about Pinecone configuration [here](https://docs.pinecone.io/docs/manage-indexes#create-a-pod-based-index).
You can also optionally provide `index_name` as a config param in the YAML file to specify the index name. If not provided, the index name will be `{collection_name}-{vector_dimension}`.
</Note>

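The fallback naming rule in the note can be written out as a small function. This is only an illustration of the documented pattern, not Embedchain's own code; the name `resolve_index_name` is hypothetical, and the collection name is passed in explicitly because its default value is not stated here.

```python
from typing import Optional


def resolve_index_name(collection_name: str,
                       vector_dimension: int,
                       index_name: Optional[str] = None) -> str:
    """Return the explicit index name, or the documented fallback pattern."""
    if index_name:
        return index_name
    return f"{collection_name}-{vector_dimension}"


print(resolve_index_name("my-collection", 1536))                       # my-collection-1536
print(resolve_index_name("my-collection", 1536, "my-pinecone-index"))  # my-pinecone-index
```
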
## Usage

### Hybrid search

Here is an example of how you can do hybrid search using Pinecone as a vector database through Embedchain.

```python
import os

from embedchain import App

config = {
    'app': {
        "config": {
            "id": "ec-docs-hybrid-search"
        }
    },
    'vectordb': {
        'provider': 'pinecone',
        'config': {
            'metric': 'dotproduct',
            'vector_dimension': 1536,
            'index_name': 'my-index',
            'serverless_config': {
                'cloud': 'aws',
                'region': 'us-west-2'
            },
            'hybrid_search': True,  # Remember to set this for hybrid search
        }
    }
}

# Initialize app
app = App.from_config(config=config)

# Add documents
app.add("/path/to/file.pdf", data_type="pdf_file", namespace="my-namespace")

# Query
app.query("<YOUR QUESTION HERE>", namespace="my-namespace")
```

Under the hood, Embedchain fetches the relevant chunks from the documents you added by doing hybrid search on the Pinecone index.
If you have questions about how Pinecone hybrid search works, please refer to their [official documentation here](https://docs.pinecone.io/docs/hybrid-search).

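To give a feel for what hybrid search combines: each query is represented by both a dense embedding vector and a sparse keyword vector, and Pinecone's documentation describes weighting the two with a single `alpha` between 0 and 1 before querying. The sketch below implements that convex weighting in plain Python. It illustrates the idea only; how Embedchain builds and weights the vectors internally is not stated here, and `weight_hybrid_query` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[str, list]  # {"indices": [...], "values": [...]}


def weight_hybrid_query(dense: List[float],
                        sparse: SparseVector,
                        alpha: float) -> Tuple[List[float], SparseVector]:
    """Scale dense values by alpha and sparse values by (1 - alpha).

    alpha=1.0 keeps only the dense (semantic) signal,
    alpha=0.0 keeps only the sparse (keyword) signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


dense_query = [0.5, 0.25, 1.0]
sparse_query = {"indices": [3, 17], "values": [2.0, 4.0]}
dense_w, sparse_w = weight_hybrid_query(dense_query, sparse_query, alpha=0.75)
print(dense_w)             # [0.375, 0.1875, 0.75]
print(sparse_w["values"])  # [0.5, 1.0]
```

Because the weighted dense and sparse parts are scored together through a dot product, this kind of weighting goes hand in hand with the `dotproduct` metric set in the config above.
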
<Snippet file="missing-vector-db-tip.mdx" />
23
docs/components/vector-databases/qdrant.mdx
Normal file
@@ -0,0 +1,23 @@
---
title: Qdrant
---

To use Qdrant as a vector database, set the environment variables `QDRANT_URL` and `QDRANT_API_KEY`, which you can find on the [Qdrant Dashboard](https://cloud.qdrant.io/).

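Since the YAML file carries no credentials, a missing environment variable only surfaces once the app tries to connect. A small pre-flight check can make that failure easier to read. The helper below is a hypothetical sketch using only the standard library; `missing_env_vars` is not an Embedchain function.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


required = ("QDRANT_URL", "QDRANT_API_KEY")
missing = missing_env_vars(required)
if missing:
    print("Set these before creating the app: " + ", ".join(missing))
```
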
<CodeGroup>
```python main.py
from embedchain import App

# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: qdrant
  config:
    collection_name: my_qdrant_index
```
</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
24
docs/components/vector-databases/weaviate.mdx
Normal file
@@ -0,0 +1,24 @@
---
title: Weaviate
---

To use Weaviate as a vector database, set the environment variables `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY`, which you can find on the [Weaviate dashboard](https://console.weaviate.cloud/dashboard).

<CodeGroup>
```python main.py
from embedchain import App

# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: weaviate
  config:
    collection_name: my_weaviate_index
```
</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
39
docs/components/vector-databases/zilliz.mdx
Normal file
@@ -0,0 +1,39 @@
---
title: Zilliz
---

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[milvus]'
```

Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`, which you can find on the Zilliz [cloud platform](https://cloud.zilliz.com/).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'

# load zilliz configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: zilliz
  config:
    collection_name: 'zilliz_app'
    uri: https://xxxx.api.gcp-region.zillizcloud.com
    token: xxx
    vector_dim: 1536
    metric_type: L2
```

</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />