---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>

## OpenAI

To use the OpenAI embedding function, set the `OPENAI_API_KEY` environment variable. You can obtain an API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from openai.yaml file
app = App.from_config(yaml_path="openai.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml openai.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>
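
The example above assigns a placeholder key inside the script. As a minimal sketch, assuming the key has instead been exported in the shell beforehand, a script can check for it up front and fail with a clear message. `require_env` is a hypothetical helper introduced here for illustration; it is not part of Embedchain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or blank.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the app")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before `App.from_config(...)` would surface a missing key immediately, rather than at the first request to the OpenAI API.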

## GPT4All

GPT4All supports generating high-quality embeddings for text documents of arbitrary length, using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from gpt4all.yaml file
app = App.from_config(yaml_path="gpt4all.yaml")
```

```yaml gpt4all.yaml
llm:
  provider: gpt4all
  model: 'orca-mini-3b.ggmlv3.q4_0.bin'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
  config:
    model: 'all-MiniLM-L6-v2'
```

</CodeGroup>

## Hugging Face

Hugging Face supports generating embeddings for text documents of arbitrary length using the Sentence Transformers library. The example below shows how to generate embeddings with Hugging Face:

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from huggingface.yaml file
app = App.from_config(yaml_path="huggingface.yaml")
```

```yaml huggingface.yaml
llm:
  provider: huggingface
  model: 'google/flan-t5-xxl'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>

## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. Pass the `model` name in the config YAML and it works out of the box.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from vertexai.yaml file
app = App.from_config(yaml_path="vertexai.yaml")
```

```yaml vertexai.yaml
llm:
  provider: vertexai
  model: 'chat-bison'
  config:
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>
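
Each configuration on this page declares its embedding model with the same `embedder` shape: a `provider` key plus a `model` entry under `config`. The following is a minimal sketch, using only the Python standard library, that writes that block for a chosen provider. The names `EMBEDDER_MODELS`, `render_embedder_config`, and `write_embedder_config` are hypothetical helpers introduced for illustration and are not part of Embedchain; the provider and model values are copied from the YAML examples on this page.

```python
from pathlib import Path

# Provider -> embedding model, copied from the YAML examples on this page.
EMBEDDER_MODELS = {
    "openai": "text-embedding-ada-002",
    "gpt4all": "all-MiniLM-L6-v2",
    "huggingface": "sentence-transformers/all-mpnet-base-v2",
    "vertexai": "textembedding-gecko",
}


def render_embedder_config(provider: str) -> str:
    """Return the `embedder` YAML block for one of the providers above."""
    if provider not in EMBEDDER_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    model = EMBEDDER_MODELS[provider]
    return (
        "embedder:\n"
        f"  provider: {provider}\n"
        "  config:\n"
        f"    model: '{model}'\n"
    )


def write_embedder_config(provider: str, directory: str = ".") -> Path:
    """Write `<provider>.yaml` into `directory` and return its path."""
    path = Path(directory) / f"{provider}.yaml"
    path.write_text(render_embedder_config(provider), encoding="utf-8")
    return path
```

The returned path, converted with `str(...)`, could then be passed as `yaml_path` to `App.from_config`. This sketch writes only the `embedder` block; any `llm` block would still need to be added separately.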