[Docs]: Clean up docs (#802)
@@ -8,7 +8,7 @@
 [](https://colab.research.google.com/drive/138lMWhENGeEu7Q1-6lNbNTHGLZXBBz_B?usp=sharing)
 [](https://codecov.io/gh/embedchain/embedchain)

-Embedchain is a framework to easily create LLM powered bots over any dataset. If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js)
+Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data. If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js)

 ## Community
@@ -94,7 +94,7 @@ If you utilize this repository, please consider citing it with:
 ```
 @misc{embedchain,
   author = {Taranjeet Singh, Deshraj Yadav},
-  title = {Embedchain: Framework to easily create LLM powered bots over any dataset},
+  title = {Embedchain: Data platform for LLMs - load, index, retrieve, and sync any unstructured data},
   year = {2023},
   publisher = {GitHub},
   journal = {GitHub repository},
@@ -25,7 +25,7 @@ llm:
 vectordb:
   provider: chroma
   config:
-    collection_name: 'full-stack-app'
+    collection_name: 'my-collection-name'
     dir: db
     allow_reset: true
@@ -1,66 +0,0 @@
---
title: '🔍 Query configurations'
---

## AppConfig

| option | description | type | default |
|---|---|---|---|
| log_level | log level | string | WARNING |
| embedding_fn | embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} |
| db | vector database (experimental) | BaseVectorDB | ChromaDB |
| collection_name | initial collection name for the database | string | embedchain_store |
| collect_metrics | collect anonymous telemetry data to improve embedchain | boolean | true |

## AddConfig

|option|description|type|default|
|---|---|---|---|
|chunker|chunker config|ChunkerConfig|Default values for the chunker depend on the `data_type`. Please refer to [ChunkerConfig](#chunker-config)|
|loader|loader config|LoaderConfig|None|

You pass `ChunkerConfig` to `AddConfig` like so:

```python
chunker_config = ChunkerConfig(chunk_size=100)
add_config = AddConfig(chunker=chunker_config)
app.add("lorem ipsum", config=add_config)
```

### ChunkerConfig

|option|description|type|default|
|---|---|---|---|
|chunk_size|Maximum size of chunks to return|int|Default value for each `data_type` is listed below|
|chunk_overlap|Overlap in characters between chunks|int|Default value for each `data_type` is listed below|
|length_function|Function that measures the length of given chunks|typing.Callable|Default value for each `data_type` is listed below|

Default values of chunker config parameters for different `data_type`:

|data_type|chunk_size|chunk_overlap|length_function|
|---|---|---|---|
|docx|1000|0|len|
|text|300|0|len|
|qna_pair|300|0|len|
|web_page|500|0|len|
|pdf_file|1000|0|len|
|youtube_video|2000|0|len|
|docs_site|500|50|len|
|notion|300|0|len|
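The per-`data_type` defaults above can be captured in a small lookup table. A minimal sketch (the dict below is hand-transcribed from the table, not imported from embedchain, and the fallback to the `text` defaults is an illustrative choice, not library behavior):

```python
# Default chunker settings per data_type, transcribed from the table above.
# (chunk_size, chunk_overlap) pairs; length is measured with len() in every case.
CHUNKER_DEFAULTS = {
    "docx": (1000, 0),
    "text": (300, 0),
    "qna_pair": (300, 0),
    "web_page": (500, 0),
    "pdf_file": (1000, 0),
    "youtube_video": (2000, 0),
    "docs_site": (500, 50),
    "notion": (300, 0),
}

def chunker_defaults(data_type: str) -> tuple:
    """Return (chunk_size, chunk_overlap) for a data_type, falling back to text."""
    return CHUNKER_DEFAULTS.get(data_type, CHUNKER_DEFAULTS["text"])

print(chunker_defaults("pdf_file"))  # (1000, 0)
```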

## BaseLlmConfig

|option|description|type|default|
|---|---|---|---|
|number_documents|Absolute number of documents to pull from the database as context.|int|1|
|template|Custom template for the prompt. If history is used with query, $history has to be included as well.|Template|Template("Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. \$context Query: \$query Helpful Answer:")|
|model|Name of the model used.|string|Depends on app type|
|temperature|Controls the randomness of the model's output. Higher values (closer to 1) make output more random, lower values make it more deterministic.|float|0|
|max_tokens|Controls how many tokens are used. Whether it counts prompt and/or response tokens depends on the model.|int|1000|
|top_p|Controls the diversity of word selection. Higher values (closer to 1) make word selection more diverse, lower values make it less diverse.|float|1|
|history|Include conversation history from your client or database.|any (recommendation: list[str])|None|
|stream|Control whether the response is streamed back to the user.|bool|False|
|deployment_name|t.b.a.|str|None|
|system_prompt|System prompt string. Unused if None.|str|None|
|where|Filter for context search.|dict|None|
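The default `template` uses `string.Template` placeholders, so `$context` and `$query` are substituted at query time. A minimal sketch of that substitution (the template string is the default from the table; the context and query values are made up for illustration):

```python
from string import Template

# Default prompt template from the table above.
prompt = Template(
    "Use the following pieces of context to answer the query at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer. $context Query: $query Helpful Answer:"
)

# Hypothetical retrieved context and user query.
filled = prompt.substitute(
    context="Q: Who is Naval Ravikant? A: An Indian-American entrepreneur and investor.",
    query="Can you tell me who Naval Ravikant is?",
)
print(filled)
```

If `history` is used, the template would also need a `$history` placeholder, as the table notes.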

@@ -1,40 +0,0 @@
---
title: '🧪 Testing'
---

## Methods for testing

### Dry Run

Before you consume valuable tokens, you should make sure that the data chunks are created properly, that the embedding works, and that the correct document is retrieved from the database.

- For the `query` or `chat` method, you can add this to your script:

```python
print(naval_chat_bot.query('Can you tell me who Naval Ravikant is?', dry_run=True))

'''
Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Q: Who is Naval Ravikant?
A: Naval Ravikant is an Indian-American entrepreneur and investor.
Query: Can you tell me who Naval Ravikant is?
Helpful Answer:
'''
```

_The embedding is confirmed to work as expected. It returns the right document, even if the question is asked slightly differently. No prompt tokens have been consumed._

The dry run will still consume tokens to embed your query, but that is only **~1/15 of the prompt.**

- For the `add` method, you can add this to your script:

```python
print(naval_chat_bot.add('https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', dry_run=True))

'''
{'chunks': ['THE ALMANACK OF NAVAL RAVIKANT', 'GETTING RICH IS NOT JUST ABOUT LUCK;', 'HAPPINESS IS NOT JUST A TRAIT WE ARE'], 'metadata': [{'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 0, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}], 'count': 7358, 'type': <DataType.PDF_FILE: 'pdf_file'>}

# fewer items shown for readability
'''
```

@@ -1,118 +0,0 @@
---
title: '💾 Vector Database'
---

We support `Chroma`, `Elasticsearch` and `OpenSearch` as vector databases.
`Chroma` is used as the default database.

## Elasticsearch

### Minimal Example

To use `Elasticsearch` as the vector database, you need the `CustomApp` app type.

1. Set the environment variables in a `.env` file.
```
OPENAI_API_KEY=sk-SECRETKEY
ELASTICSEARCH_API_KEY=SECRETKEY==
ELASTICSEARCH_URL=https://secret-domain.europe-west3.gcp.cloud.es.io:443
```
Please note that the key needs certain privileges. For testing, you can toggle off `restrict privileges` under `/app/management/security/api_keys/` in your web interface.
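Once the `.env` file has been loaded into the process environment (for example with `python-dotenv`), the variables can be read with the standard library. A minimal sketch, using made-up placeholder values in place of the real secrets, with a hypothetical `require_env` helper that fails fast on missing configuration:

```python
import os

# Placeholder values standing in for the real secrets from the .env file.
os.environ.setdefault("ELASTICSEARCH_URL", "https://localhost:9200")
os.environ.setdefault("ELASTICSEARCH_API_KEY", "SECRETKEY==")

def require_env(name: str) -> str:
    """Return the named environment variable, raising a clear error if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

es_url = require_env("ELASTICSEARCH_URL")
```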

2. Load the app:
```python
from embedchain import CustomApp
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.elasticsearch import ElasticsearchDB

es_app = CustomApp(
    llm=OpenAILlm(),
    embedder=OpenAIEmbedder(),
    db=ElasticsearchDB(),
)
```

### More custom settings

You can get a URL for Elasticsearch in the cloud, or run it locally.
The following example shows how to configure embedchain to work with a locally running Elasticsearch.

Instead of an API key, it uses HTTP login credentials. The localhost URL can be defined in `.env` or in the config.

```python
import os

from embedchain import CustomApp
from embedchain.config import CustomAppConfig, ElasticsearchDBConfig
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.elasticsearch import ElasticsearchDB

es_config = ElasticsearchDBConfig(
    # elasticsearch url or list of nodes url with different hosts and ports.
    es_url='https://localhost:9200',
    # pass named parameters supported by Python Elasticsearch client
    http_auth=("elastic", "secret"),
    ca_certs="~/binaries/elasticsearch-8.7.0/config/certs/http_ca.crt",  # your cert path
    # verify_certs=False  # alternative, if you aren't using certs
)  # pass named parameters supported by elasticsearch-py

es_app = CustomApp(
    config=CustomAppConfig(log_level="INFO"),
    llm=OpenAILlm(),
    embedder=OpenAIEmbedder(),
    db=ElasticsearchDB(config=es_config),
)
```
3. This should log your connection details to the console.
4. Instead of a single URL, `ElasticsearchDBConfig` accepts `es_url` as a list of node URLs with different hosts and ports.
5. Additionally, you can pass any named parameters supported by the Python Elasticsearch client.

## OpenSearch 🔍

To use OpenSearch as a vector database with a `CustomApp`, follow these simple steps:

1. Set the `OPENAI_API_KEY` environment variable:

```
OPENAI_API_KEY=sk-xxxx
```

2. Define the OpenSearch configuration in your Python code:

```python
from embedchain import CustomApp
from embedchain.config import OpenSearchDBConfig
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.opensearch import OpenSearchDB

opensearch_url = "https://localhost:9200"
http_auth = ("username", "password")

db_config = OpenSearchDBConfig(
    opensearch_url=opensearch_url,
    http_auth=http_auth,
    collection_name="embedchain-app",
    use_ssl=True,
    timeout=30,
)
db = OpenSearchDB(config=db_config)
```

3. Instantiate the app and add data:

```python
app = CustomApp(llm=OpenAILlm(), embedder=OpenAIEmbedder(), db=db)
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
app.add("https://www.forbes.com/profile/elon-musk")
app.add("https://www.britannica.com/biography/Elon-Musk")
```

4. You're all set! Start querying using the following command:

```python
app.query("What is the net worth of Elon Musk?")
```

@@ -4,15 +4,25 @@ title: 🤝 Connect with Us
 We believe in building a vibrant and supportive community around embedchain. There are various channels through which you can connect with us, stay updated, and contribute to the ongoing discussions:

-* Slack: Our Slack workspace provides a platform for more structured discussions and channels dedicated to different topics. Feel free to jump in and start contributing. [Join Slack](https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw).
-* Discord: Join our Discord server to engage in real-time conversations with the community members and the project maintainers. It’s a great place to seek help and discuss anything related to the project. [Join Discord](https://discord.gg/CUU9FPhRNt).
-* Twitter: Follow us on Twitter for the latest news, announcements, and highlights from our community. It’s also a quick way to reach out to us. [Follow @embedchain](https://twitter.com/embedchain).
-* LinkedIn: Connect with us on LinkedIn to stay updated on official announcements, job openings, and professional networking opportunities within our community. [Follow Our Page](https://www.linkedin.com/company/embedchain/).
-* Newsletter: Subscribe to our newsletter for a curated list of project updates, community contributions, and upcoming events. It’s a compact way to stay in the loop with what’s happening in our community. [Subscribe Now](https://embedchain.substack.com/).
+<CardGroup cols={3}>
+  <Card title="Twitter" icon="twitter" href="https://twitter.com/embedchain">
+    Follow us on Twitter
+  </Card>
+  <Card title="Slack" icon="slack" href="https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw" color="#4A154B">
+    Join our Slack community
+  </Card>
+  <Card title="Discord" icon="discord" href="https://discord.gg/6PzXDgEjG5" color="#7289DA">
+    Join our Discord community
+  </Card>
+  <Card title="LinkedIn" icon="linkedin" href="https://www.linkedin.com/company/embedchain/">
+    Connect with us on LinkedIn
+  </Card>
+  <Card title="Schedule a call" icon="calendar" href="https://cal.com/taranjeetio/ec">
+    Schedule a call with the Embedchain founder
+  </Card>
+  <Card title="Newsletter" icon="message" href="https://embedchain.substack.com/">
+    Subscribe to our newsletter
+  </Card>
+</CardGroup>

 We look forward to connecting with you and seeing how we can create amazing things together!
@@ -27,14 +27,14 @@ from embedchain import App
 os.environ['OPENAI_API_KEY'] = 'xxx'

-# load embedding model configuration from openai.yaml file
-app = App.from_config(yaml_path="openai.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")

 app.add("https://en.wikipedia.org/wiki/OpenAI")
 app.query("What is OpenAI?")
 ```

-```yaml openai.yaml
+```yaml config.yaml
 embedder:
   provider: openai
   config:
@@ -52,11 +52,11 @@ GPT4All supports generating high quality embeddings of arbitrary length documents
 ```python main.py
 from embedchain import App

-# load embedding model configuration from gpt4all.yaml file
-app = App.from_config(yaml_path="gpt4all.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml gpt4all.yaml
+```yaml config.yaml
 llm:
   provider: gpt4all
   model: 'orca-mini-3b.ggmlv3.q4_0.bin'
@@ -83,11 +83,11 @@ Hugging Face supports generating embeddings of arbitrary length documents of text
 ```python main.py
 from embedchain import App

-# load embedding model configuration from huggingface.yaml file
-app = App.from_config(yaml_path="huggingface.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml huggingface.yaml
+```yaml config.yaml
 llm:
   provider: huggingface
   model: 'google/flan-t5-xxl'
@@ -114,11 +114,11 @@ Embedchain supports Google's VertexAI embeddings model through a simple interface
 ```python main.py
 from embedchain import App

-# load embedding model configuration from vertexai.yaml file
-app = App.from_config(yaml_path="vertexai.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml vertexai.yaml
+```yaml config.yaml
 llm:
   provider: vertexai
   model: 'chat-bison'
@@ -35,7 +35,7 @@ app.add("https://en.wikipedia.org/wiki/OpenAI")
 app.query("What is OpenAI?")
 ```

-If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/embedchain/yaml/chroma.yaml) file.
+If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.

 <CodeGroup>
@@ -45,11 +45,11 @@ from embedchain import App
 os.environ['OPENAI_API_KEY'] = 'xxx'

-# load llm configuration from openai.yaml file
-app = App.from_config(yaml_path="openai.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml openai.yaml
+```yaml config.yaml
 llm:
   provider: openai
   model: 'gpt-3.5-turbo'
@@ -79,11 +79,11 @@ from embedchain import App
 os.environ["ANTHROPIC_API_KEY"] = "xxx"

-# load llm configuration from anthropic.yaml file
-app = App.from_config(yaml_path="anthropic.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml anthropic.yaml
+```yaml config.yaml
 llm:
   provider: anthropic
   model: 'claude-instant-1'
@@ -96,15 +96,14 @@ llm:
</CodeGroup>

<br />

<Tip>
You may also have to set the `OPENAI_API_KEY` if you use OpenAI's embedding model.
</Tip>

## Cohere

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` as an environment variable; you can find it on their [Account settings page](https://dashboard.cohere.com/api-keys).

Once you have the API key, you are all set to use it with Embedchain.
@@ -117,11 +116,11 @@ from embedchain import App
 os.environ["COHERE_API_KEY"] = "xxx"

-# load llm configuration from cohere.yaml file
-app = App.from_config(yaml_path="cohere.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml cohere.yaml
+```yaml config.yaml
 llm:
   provider: cohere
   model: large
@@ -135,6 +134,12 @@ llm:
 ## GPT4ALL

+Install related dependencies using the following command:
+
+```bash
+pip install --upgrade 'embedchain[opensource]'
+```
+
 GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required. You can use this with Embedchain using the following code:

 <CodeGroup>
@@ -142,11 +147,11 @@ GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
 ```python main.py
 from embedchain import App

-# load llm configuration from gpt4all.yaml file
-app = App.from_config(yaml_path="gpt4all.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml gpt4all.yaml
+```yaml config.yaml
 llm:
   provider: gpt4all
   model: 'orca-mini-3b.ggmlv3.q4_0.bin'
@@ -177,11 +182,11 @@ import os
 from embedchain import App

 os.environ["JINACHAT_API_KEY"] = "xxx"
-# load llm configuration from jina.yaml file
-app = App.from_config(yaml_path="jina.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml jina.yaml
+```yaml config.yaml
 llm:
   provider: jina
   config:
@@ -195,6 +200,13 @@ llm:
 ## Hugging Face

+Install related dependencies using the following command:
+
+```bash
+pip install --upgrade 'embedchain[huggingface_hub]'
+```
+
 First, set `HUGGINGFACE_ACCESS_TOKEN` as an environment variable; you can obtain it from [their platform](https://huggingface.co/settings/tokens).

 Once you have the token, load the app using the config yaml file:
@@ -207,11 +219,11 @@ from embedchain import App
 os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

-# load llm configuration from huggingface.yaml file
-app = App.from_config(yaml_path="huggingface.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml huggingface.yaml
+```yaml config.yaml
 llm:
   provider: huggingface
   model: 'google/flan-t5-xxl'
@@ -237,11 +249,11 @@ from embedchain import App
 os.environ["REPLICATE_API_TOKEN"] = "xxx"

-# load llm configuration from llama2.yaml file
-app = App.from_config(yaml_path="llama2.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml llama2.yaml
+```yaml config.yaml
 llm:
   provider: llama2
   model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
@@ -262,11 +274,11 @@ Setup Google Cloud Platform application credentials by following the instruction
 ```python main.py
 from embedchain import App

-# load llm configuration from vertexai.yaml file
-app = App.from_config(yaml_path="vertexai.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml vertexai.yaml
+```yaml config.yaml
 llm:
   provider: vertexai
   model: 'chat-bison'
@@ -25,10 +25,10 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you
 from embedchain import App

 # load chroma configuration from yaml file
-app = App.from_config(yaml_path="chroma-config-1.yaml")
+app = App.from_config(yaml_path="config1.yaml")
 ```

-```yaml chroma-config-1.yaml
+```yaml config1.yaml
 vectordb:
   provider: chroma
   config:
@@ -37,7 +37,7 @@ vectordb:
   allow_reset: true
 ```

-```yaml chroma-config-2.yaml
+```yaml config2.yaml
 vectordb:
   provider: chroma
   config:
@@ -52,16 +52,22 @@ vectordb:
 ## Elasticsearch

+Install related dependencies using the following command:
+
+```bash
+pip install --upgrade 'embedchain[elasticsearch]'
+```
+
 <CodeGroup>

 ```python main.py
 from embedchain import App

 # load elasticsearch configuration from yaml file
-app = App.from_config(yaml_path="elasticsearch.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml elasticsearch.yaml
+```yaml config.yaml
 vectordb:
   provider: elasticsearch
   config:
@@ -74,16 +80,22 @@ vectordb:
 ## OpenSearch

+Install related dependencies using the following command:
+
+```bash
+pip install --upgrade 'embedchain[opensearch]'
+```
+
 <CodeGroup>

 ```python main.py
 from embedchain import App

 # load opensearch configuration from yaml file
-app = App.from_config(yaml_path="opensearch.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml opensearch.yaml
+```yaml config.yaml
 vectordb:
   provider: opensearch
   config:
@@ -101,16 +113,22 @@ vectordb:
 ## Zilliz

+Install related dependencies using the following command:
+
+```bash
+pip install --upgrade 'embedchain[milvus]'
+```
+
 <CodeGroup>

 ```python main.py
 from embedchain import App

 # load zilliz configuration from yaml file
-app = App.from_config(yaml_path="zilliz.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```

-```yaml zilliz.yaml
+```yaml config.yaml
 vectordb:
   provider: zilliz
   config:
@@ -35,12 +35,9 @@ embedchain is built on the following stack:
 ## Team

-### Author
+### Authors

 - Taranjeet Singh ([@taranjeetio](https://twitter.com/taranjeetio))
-
-### Maintainer
-
 - Deshraj Yadav ([@deshrajdry](https://twitter.com/taranjeetio))

 ### Citation
@@ -49,8 +46,8 @@ If you utilize this repository, please consider citing it with:
 ```
 @misc{embedchain,
-  author = {Taranjeet Singh},
-  title = {Embechain: Framework to easily create LLM powered bots over any dataset},
+  author = {Taranjeet Singh, Deshraj Yadav},
+  title = {Embechain: Data platform for LLMs - Load, index, retrieve and sync any unstructured data},
   year = {2023},
   publisher = {GitHub},
   journal = {GitHub repository},
@@ -1,6 +1,6 @@
 ---
 title: 📚 Introduction
-description: '📝 Embedchain is a framework to easily create LLM powered apps on your data.'
+description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
 ---

 ## 🤔 What is Embedchain?
@@ -27,9 +27,6 @@ naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American e
 naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
 # Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.

-# Ask questions with specific context
-naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", where={'chapter': 'philosophy'})
 ```

 ## 🚀 How it works?
@@ -3,6 +3,8 @@ title: '🚀 Quickstart'
 description: '💡 Start building LLM powered apps under 30 seconds'
 ---

+Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data.
+
 Install embedchain python package:

 ```bash
@@ -20,9 +22,11 @@ app = App()
 </Step>
 <Step title="🗃️ Add data sources">
 ```python
-# Embed online resources
+# Add different data sources
 elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
 elon_bot.add("https://www.forbes.com/profile/elon-musk")
+# You can also add local data sources such as pdf, csv files etc.
+# elon_bot.add("/path/to/file.pdf")
 ```
 </Step>
 <Step title="💬 Query or chat on your data and get answers">
@@ -42,9 +46,11 @@ from embedchain import App
 os.environ["OPENAI_API_KEY"] = "xxx"
 elon_bot = App()

-# Embed online resources
+# Add different data sources
 elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
 elon_bot.add("https://www.forbes.com/profile/elon-musk")
+# You can also add local data sources such as pdf, csv files etc.
+# elon_bot.add("/path/to/file.pdf")

 response = elon_bot.query("What is the net worth of Elon Musk today?")
 print(response)
@@ -28,7 +28,7 @@ export default function Home() {
   Welcome to Embedchain Playground
 </h1>
 <p className="mb-6 text-lg font-normal text-gray-500 lg:text-xl">
-  embedchain is a framework to easily create LLM powered bots over any
-  dataset
+  Embedchain is a Data Platform for LLMs - Load, index, retrieve, and sync any unstructured data
 </p>
 </div>
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "embedchain"
 version = "0.0.70"
-description = "Embedchain is a framework to easily create LLM powered apps over any dataset"
+description = "Data platform for LLMs - Load, index, retrieve and sync any unstructured data"
 authors = ["Taranjeet Singh, Deshraj Yadav"]
 license = "Apache License"
 readme = "README.md"