[Docs]: Clean up docs (#802)

Author: Deshraj Yadav
Date: 2023-10-14 19:14:24 -07:00
Committed by: GitHub
Parent: 4a8c50f886
Commit: 77c90a308e
14 changed files with 120 additions and 304 deletions

View File

@@ -8,7 +8,7 @@
 [![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/138lMWhENGeEu7Q1-6lNbNTHGLZXBBz_B?usp=sharing)
 [![codecov](https://codecov.io/gh/embedchain/embedchain/graph/badge.svg?token=EMRRHZXW1Q)](https://codecov.io/gh/embedchain/embedchain)
-Embedchain is a framework to easily create LLM powered bots over any dataset. If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js)
+Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data. If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js)
 ## Community
@@ -94,7 +94,7 @@ If you utilize this repository, please consider citing it with:
 ```
 @misc{embedchain,
   author = {Taranjeet Singh, Deshraj Yadav},
-  title = {Embedchain: Framework to easily create LLM powered bots over any dataset},
+  title = {Embedchain: Data platform for LLMs - load, index, retrieve, and sync any unstructured data},
   year = {2023},
   publisher = {GitHub},
   journal = {GitHub repository},

View File

@@ -25,7 +25,7 @@ llm:
 vectordb:
   provider: chroma
   config:
-    collection_name: 'full-stack-app'
+    collection_name: 'my-collection-name'
     dir: db
     allow_reset: true

View File

@@ -1,66 +0,0 @@
---
title: '🔍 Query configurations'
---
## AppConfig
| option | description | type | default |
|-----------|-----------------------|---------------------------------|------------------------|
| log_level | log level | string | WARNING |
| embedding_fn | embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} |
| db | vector database (experimental) | BaseVectorDB | ChromaDB |
| collection_name | initial collection name for the database | string | embedchain_store |
| collect_metrics | collect anonymous telemetry data to improve embedchain | boolean | true |
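For reference, here is a minimal sketch of passing these options (assuming `AppConfig` is importable from `embedchain.config`, as in the other examples in these docs):
```python
from embedchain import App
from embedchain.config import AppConfig

# Hypothetical illustration: raise logging verbosity and opt out of telemetry.
config = AppConfig(log_level="DEBUG", collect_metrics=False)
app = App(config=config)
```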
## AddConfig
|option|description|type|default|
|---|---|---|---|
|chunker|chunker config|ChunkerConfig|Default values for chunker depends on the `data_type`. Please refer [ChunkerConfig](#chunker-config)|
|loader|loader config|LoaderConfig|None|
You can pass a `ChunkerConfig` to `AddConfig` like so:
```python
chunker_config = ChunkerConfig(chunk_size=100)
add_config = AddConfig(chunker=chunker_config)
app.add("lorem ipsum", config=add_config)
```
### ChunkerConfig
|option|description|type|default|
|---|---|---|---|
|chunk_size|Maximum size of chunks to return|int|Default value for various `data_type` mentioned below|
|chunk_overlap|Overlap in characters between chunks|int|Default value for various `data_type` mentioned below|
|length_function|Function that measures the length of given chunks|typing.Callable|Default value for various `data_type` mentioned below|
Default values of chunker config parameters for different `data_type`:
|data_type|chunk_size|chunk_overlap|length_function|
|---|---|---|---|
|docx|1000|0|len|
|text|300|0|len|
|qna_pair|300|0|len|
|web_page|500|0|len|
|pdf_file|1000|0|len|
|youtube_video|2000|0|len|
|docs_site|500|50|len|
|notion|300|0|len|
## BaseLlmConfig
|option|description|type|default|
|---|---|---|---|
|number_documents|Absolute number of documents to pull from the database as context.|int|1|
|template|custom template for prompt. If history is used with query, $history has to be included as well.|Template|Template("Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. \$context Query: \$query Helpful Answer:")|
|model|name of the model used.|string|depends on app type|
|temperature|Controls the randomness of the model's output. Higher values (closer to 1) make output more random, lower values make it more deterministic.|float|0|
|max_tokens|Controls how many tokens are used. Exact implementation (whether it counts prompt and/or response) depends on the model.|int|1000|
|top_p|Controls the diversity of words. Higher values (closer to 1) make word selection more diverse, lower values make words less diverse.|float|1|
|history|include conversation history from your client or database.|any (recommendation: list[str])|None|
|stream|control if response is streamed back to the user.|bool|False|
|deployment_name|t.b.a.|str|None|
|system_prompt|System prompt string. Unused if none.|str|None|
|where|filter for context search.|dict|None|
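A minimal sketch of putting these options together (assuming `BaseLlmConfig` is importable from `embedchain.config` and that `query` accepts a `config` argument, as the tables above suggest):
```python
from string import Template

from embedchain.config import BaseLlmConfig

# Hypothetical illustration: pull more context documents and keep output deterministic.
query_config = BaseLlmConfig(
    number_documents=3,
    temperature=0,
    template=Template(
        "Use the following pieces of context to answer the query at the end. "
        "If you don't know the answer, just say that you don't know. "
        "$context Query: $query Helpful Answer:"
    ),
)
answer = app.query("Who is Naval Ravikant?", config=query_config)
```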

View File

@@ -1,40 +0,0 @@
---
title: '🧪 Testing'
---
## Methods for testing
### Dry Run
Before you consume valuable tokens, you should make sure that the data chunks are created properly, that the embeddings work, and that the correct documents are being retrieved from the database.
- For `query` or `chat` method, you can add this to your script:
```python
print(naval_chat_bot.query('Can you tell me who Naval Ravikant is?', dry_run=True))
'''
Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Q: Who is Naval Ravikant?
A: Naval Ravikant is an Indian-American entrepreneur and investor.
Query: Can you tell me who Naval Ravikant is?
Helpful Answer:
'''
```
_The embedding is confirmed to work as expected. It returns the right document, even if the question is phrased slightly differently. No prompt tokens have been consumed._
The dry run will still consume tokens to embed your query, but it is only **~1/15 of the prompt.**
- For `add` method, you can add this to your script:
```python
print(naval_chat_bot.add('https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', dry_run=True))
'''
{'chunks': ['THE ALMANACK OF NAVAL RAVIKANT', 'GETTING RICH IS NOT JUST ABOUT LUCK;', 'HAPPINESS IS NOT JUST A TRAIT WE ARE'], 'metadata': [{'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 0, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}], 'count': 7358, 'type': <DataType.PDF_FILE: 'pdf_file'>}
# fewer items shown for readability
'''
```

View File

@@ -1,118 +0,0 @@
---
title: '💾 Vector Database'
---
We support `Chroma`, `Elasticsearch` and `OpenSearch` as vector databases.
`Chroma` is used as the default database.
## Elasticsearch
### Minimal Example
In order to use `Elasticsearch` as the vector database, you need to use the `CustomApp` app type.
1. Set the environment variables in a `.env` file.
```
OPENAI_API_KEY=sk-SECRETKEY
ELASTICSEARCH_API_KEY=SECRETKEY==
ELASTICSEARCH_URL=https://secret-domain.europe-west3.gcp.cloud.es.io:443
```
Please note that the key needs certain privileges. For testing, you can toggle off `restrict privileges` under `/app/management/security/api_keys/` in your web interface.
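Note that Python does not read `.env` files automatically; here is a minimal sketch of loading them, assuming you use the third-party `python-dotenv` package (not required by embedchain itself):
```python
import os

from dotenv import load_dotenv

# Read OPENAI_API_KEY, ELASTICSEARCH_API_KEY and ELASTICSEARCH_URL from .env
# into the process environment before constructing the app.
load_dotenv()
assert os.environ.get("ELASTICSEARCH_URL"), "ELASTICSEARCH_URL missing from .env"
```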
2. Load the app
```python
from embedchain import CustomApp
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.elasticsearch import ElasticsearchDB

es_app = CustomApp(
    llm=OpenAILlm(),
    embedder=OpenAIEmbedder(),
    db=ElasticsearchDB(),
)
```
### More custom settings
You can get a URL for Elasticsearch in the cloud, or run it locally.
The following example shows how to configure embedchain to work with a locally running Elasticsearch instance. Instead of an API key, it uses HTTP login credentials. The localhost URL can be defined in `.env` or in the config.
```python
import os

from embedchain import CustomApp
from embedchain.config import CustomAppConfig, ElasticsearchDBConfig
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.elasticsearch import ElasticsearchDB

es_config = ElasticsearchDBConfig(
    # Elasticsearch URL, or a list of node URLs with different hosts and ports.
    es_url='https://localhost:9200',
    # Named parameters supported by the Python Elasticsearch client can be passed through.
    http_auth=("elastic", "secret"),
    ca_certs="~/binaries/elasticsearch-8.7.0/config/certs/http_ca.crt",  # your cert path
    # verify_certs=False,  # alternative, if you aren't using certs
)

es_app = CustomApp(
    config=CustomAppConfig(log_level="INFO"),
    llm=OpenAILlm(),
    embedder=OpenAIEmbedder(),
    db=ElasticsearchDB(config=es_config),
)
```
3. This should log your connection details to the console.
4. Instead of a single URL, `ElasticsearchDBConfig` also accepts `es_url` as a list of node URLs with different hosts and ports, as shown in the sketch below.
5. Additionally, you can pass any named parameter supported by the Python Elasticsearch client.
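For instance, a minimal sketch of the multi-node form (the node URLs here are placeholders):
```python
from embedchain.config import ElasticsearchDBConfig

# Hypothetical cluster: es_url takes a list of node URLs instead of a single string.
es_config = ElasticsearchDBConfig(
    es_url=["https://node-1.example.com:9200", "https://node-2.example.com:9200"],
    http_auth=("elastic", "secret"),  # any named parameter of the Python Elasticsearch client
)
```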
## OpenSearch 🔍
To use OpenSearch as a vector database with a CustomApp, follow these simple steps:
1. Set the `OPENAI_API_KEY` environment variable:
```
OPENAI_API_KEY=sk-xxxx
```
2. Define the OpenSearch configuration in your Python code:
```python
from embedchain import CustomApp
from embedchain.config import OpenSearchDBConfig
from embedchain.embedder.openai import OpenAIEmbedder
from embedchain.llm.openai import OpenAILlm
from embedchain.vectordb.opensearch import OpenSearchDB
opensearch_url = "https://localhost:9200"
http_auth = ("username", "password")

db_config = OpenSearchDBConfig(
    opensearch_url=opensearch_url,
    http_auth=http_auth,
    collection_name="embedchain-app",
    use_ssl=True,
    timeout=30,
)
db = OpenSearchDB(config=db_config)
```
3. Instantiate the app and add data:
```python
app = CustomApp(llm=OpenAILlm(), embedder=OpenAIEmbedder(), db=db)
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
app.add("https://www.forbes.com/profile/elon-musk")
app.add("https://www.britannica.com/biography/Elon-Musk")
```
4. You're all set! Start querying using the following command:
```python
app.query("What is the net worth of Elon Musk?")
```

View File

@@ -4,15 +4,25 @@ title: 🤝 Connect with Us
 We believe in building a vibrant and supportive community around embedchain. There are various channels through which you can connect with us, stay updated, and contribute to the ongoing discussions:
-* Slack: Our Slack workspace provides a platform for more structured discussions and channels dedicated to different topics. Feel free to jump in and start contributing. [Join Slack](https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw).
-* Discord: Join our Discord server to engage in real-time conversations with the community members and the project maintainers. It's a great place to seek help and discuss anything related to the project. [Join Discord](https://discord.gg/CUU9FPhRNt).
-* Twitter: Follow us on Twitter for the latest news, announcements, and highlights from our community. It's also a quick way to reach out to us. [Follow @embedchain](https://twitter.com/embedchain).
-* LinkedIn: Connect with us on LinkedIn to stay updated on official announcements, job openings, and professional networking opportunities within our community. [Follow Our Page](https://www.linkedin.com/company/embedchain/).
-* Newsletter: Subscribe to our newsletter for a curated list of project updates, community contributions, and upcoming events. It's a compact way to stay in the loop with what's happening in our community. [Subscribe Now](https://embedchain.substack.com/).
+<CardGroup cols={3}>
+  <Card title="Twitter" icon="twitter" href="https://twitter.com/embedchain">
+    Follow us on Twitter
+  </Card>
+  <Card title="Slack" icon="slack" href="https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw" color="#4A154B">
+    Join our Slack community
+  </Card>
+  <Card title="Discord" icon="discord" href="https://discord.gg/6PzXDgEjG5" color="#7289DA">
+    Join our Discord community
+  </Card>
+  <Card title="LinkedIn" icon="linkedin" href="https://www.linkedin.com/company/embedchain/">
+    Connect with us on LinkedIn
+  </Card>
+  <Card title="Schedule a call" icon="calendar" href="https://cal.com/taranjeetio/ec">
+    Schedule a call with the Embedchain founder
+  </Card>
+  <Card title="Newsletter" icon="message" href="https://embedchain.substack.com/">
+    Subscribe to our newsletter
+  </Card>
+</CardGroup>
 We look forward to connecting with you and seeing how we can create amazing things together!

View File

@@ -27,14 +27,14 @@ from embedchain import App
 os.environ['OPENAI_API_KEY'] = 'xxx'
-# load embedding model configuration from openai.yaml file
-app = App.from_config(yaml_path="openai.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 app.add("https://en.wikipedia.org/wiki/OpenAI")
 app.query("What is OpenAI?")
 ```
-```yaml openai.yaml
+```yaml config.yaml
 embedder:
   provider: openai
   config:
@@ -52,11 +52,11 @@ GPT4All supports generating high quality embeddings of arbitrary length document
 ```python main.py
 from embedchain import App
-# load embedding model configuration from gpt4all.yaml file
-app = App.from_config(yaml_path="gpt4all.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml gpt4all.yaml
+```yaml config.yaml
 llm:
   provider: gpt4all
   model: 'orca-mini-3b.ggmlv3.q4_0.bin'
@@ -83,11 +83,11 @@ Hugging Face supports generating embeddings of arbitrary length documents of tex
 ```python main.py
 from embedchain import App
-# load embedding model configuration from huggingface.yaml file
-app = App.from_config(yaml_path="huggingface.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml huggingface.yaml
+```yaml config.yaml
 llm:
   provider: huggingface
   model: 'google/flan-t5-xxl'
@@ -114,11 +114,11 @@ Embedchain supports Google's VertexAI embeddings model through a simple interfac
 ```python main.py
 from embedchain import App
-# load embedding model configuration from vertexai.yaml file
-app = App.from_config(yaml_path="vertexai.yaml")
+# load embedding model configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml vertexai.yaml
+```yaml config.yaml
 llm:
   provider: vertexai
   model: 'chat-bison'

View File

@@ -35,7 +35,7 @@ app.add("https://en.wikipedia.org/wiki/OpenAI")
 app.query("What is OpenAI?")
 ```
-If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/embedchain/yaml/chroma.yaml) file.
+If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.
 <CodeGroup>
@@ -45,11 +45,11 @@ from embedchain import App
 os.environ['OPENAI_API_KEY'] = 'xxx'
-# load llm configuration from openai.yaml file
-app = App.from_config(yaml_path="openai.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml openai.yaml
+```yaml config.yaml
 llm:
   provider: openai
   model: 'gpt-3.5-turbo'
@@ -79,11 +79,11 @@ from embedchain import App
os.environ["ANTHROPIC_API_KEY"] = "xxx" os.environ["ANTHROPIC_API_KEY"] = "xxx"
# load llm configuration from anthropic.yaml file # load llm configuration from config.yaml file
app = App.from_config(yaml_path="anthropic.yaml") app = App.from_config(yaml_path="config.yaml")
``` ```
```yaml anthropic.yaml ```yaml config.yaml
llm: llm:
provider: anthropic provider: anthropic
model: 'claude-instant-1' model: 'claude-instant-1'
@@ -96,15 +96,14 @@ llm:
 </CodeGroup>
-<br />
-<Tip>
-You may also have to set the `OPENAI_API_KEY` if you use OpenAI's embedding model.
-</Tip>
 ## Cohere
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[cohere]'
+```
 Set the `COHERE_API_KEY` as an environment variable, which you can find on their [Account settings page](https://dashboard.cohere.com/api-keys).
 Once you have the API key, you are all set to use it with Embedchain.
@@ -117,11 +116,11 @@ from embedchain import App
os.environ["COHERE_API_KEY"] = "xxx" os.environ["COHERE_API_KEY"] = "xxx"
# load llm configuration from cohere.yaml file # load llm configuration from config.yaml file
app = App.from_config(yaml_path="cohere.yaml") app = App.from_config(yaml_path="config.yaml")
``` ```
```yaml cohere.yaml ```yaml config.yaml
llm: llm:
provider: cohere provider: cohere
model: large model: large
@@ -135,6 +134,12 @@ llm:
 ## GPT4ALL
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[opensource]'
+```
 GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required. You can use this with Embedchain using the following code:
 <CodeGroup>
@@ -142,11 +147,11 @@ GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or inte
 ```python main.py
 from embedchain import App
-# load llm configuration from gpt4all.yaml file
-app = App.from_config(yaml_path="gpt4all.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml gpt4all.yaml
+```yaml config.yaml
 llm:
   provider: gpt4all
   model: 'orca-mini-3b.ggmlv3.q4_0.bin'
@@ -177,11 +182,11 @@ import os
 from embedchain import App
 os.environ["JINACHAT_API_KEY"] = "xxx"
-# load llm configuration from jina.yaml file
-app = App.from_config(yaml_path="jina.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml jina.yaml
+```yaml config.yaml
 llm:
   provider: jina
   config:
@@ -195,6 +200,13 @@ llm:
 ## Hugging Face
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[huggingface_hub]'
+```
 First, set `HUGGINGFACE_ACCESS_TOKEN` as an environment variable, which you can obtain from [their platform](https://huggingface.co/settings/tokens).
 Once you have the token, load the app using the config yaml file:
@@ -207,11 +219,11 @@ from embedchain import App
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx" os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"
# load llm configuration from huggingface.yaml file # load llm configuration from config.yaml file
app = App.from_config(yaml_path="huggingface.yaml") app = App.from_config(yaml_path="config.yaml")
``` ```
```yaml huggingface.yaml ```yaml config.yaml
llm: llm:
provider: huggingface provider: huggingface
model: 'google/flan-t5-xxl' model: 'google/flan-t5-xxl'
@@ -237,11 +249,11 @@ from embedchain import App
os.environ["REPLICATE_API_TOKEN"] = "xxx" os.environ["REPLICATE_API_TOKEN"] = "xxx"
# load llm configuration from llama2.yaml file # load llm configuration from config.yaml file
app = App.from_config(yaml_path="llama2.yaml") app = App.from_config(yaml_path="config.yaml")
``` ```
```yaml llama2.yaml ```yaml config.yaml
llm: llm:
provider: llama2 provider: llama2
model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5' model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
@@ -262,11 +274,11 @@ Setup Google Cloud Platform application credentials by following the instruction
 ```python main.py
 from embedchain import App
-# load llm configuration from vertexai.yaml file
-app = App.from_config(yaml_path="vertexai.yaml")
+# load llm configuration from config.yaml file
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml vertexai.yaml
+```yaml config.yaml
 llm:
   provider: vertexai
   model: 'chat-bison'

View File

@@ -25,10 +25,10 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you
 from embedchain import App
 # load chroma configuration from yaml file
-app = App.from_config(yaml_path="chroma-config-1.yaml")
+app = App.from_config(yaml_path="config1.yaml")
 ```
-```yaml chroma-config-1.yaml
+```yaml config1.yaml
 vectordb:
   provider: chroma
   config:
@@ -37,7 +37,7 @@ vectordb:
     allow_reset: true
 ```
-```yaml chroma-config-2.yaml
+```yaml config2.yaml
 vectordb:
   provider: chroma
   config:
@@ -52,16 +52,22 @@ vectordb:
 ## Elasticsearch
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[elasticsearch]'
+```
 <CodeGroup>
 ```python main.py
 from embedchain import App
 # load elasticsearch configuration from yaml file
-app = App.from_config(yaml_path="elasticsearch.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml elasticsearch.yaml
+```yaml config.yaml
 vectordb:
   provider: elasticsearch
   config:
@@ -74,16 +80,22 @@ vectordb:
 ## OpenSearch
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[opensearch]'
+```
 <CodeGroup>
 ```python main.py
 from embedchain import App
 # load opensearch configuration from yaml file
-app = App.from_config(yaml_path="opensearch.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml opensearch.yaml
+```yaml config.yaml
 vectordb:
   provider: opensearch
   config:
@@ -101,16 +113,22 @@ vectordb:
 ## Zilliz
+Install related dependencies using the following command:
+```bash
+pip install --upgrade 'embedchain[milvus]'
+```
 <CodeGroup>
 ```python main.py
 from embedchain import App
 # load zilliz configuration from yaml file
-app = App.from_config(yaml_path="zilliz.yaml")
+app = App.from_config(yaml_path="config.yaml")
 ```
-```yaml zilliz.yaml
+```yaml config.yaml
 vectordb:
   provider: zilliz
   config:

View File

@@ -35,12 +35,9 @@ embedchain is built on the following stack:
 ## Team
-### Author
+### Authors
 - Taranjeet Singh ([@taranjeetio](https://twitter.com/taranjeetio))
-### Maintainer
 - Deshraj Yadav ([@deshrajdry](https://twitter.com/deshrajdry))
 ### Citation
@@ -49,8 +46,8 @@ If you utilize this repository, please consider citing it with:
 ```
 @misc{embedchain,
-  author = {Taranjeet Singh},
-  title = {Embedchain: Framework to easily create LLM powered bots over any dataset},
+  author = {Taranjeet Singh, Deshraj Yadav},
+  title = {Embedchain: Data platform for LLMs - Load, index, retrieve and sync any unstructured data},
   year = {2023},
   publisher = {GitHub},
   journal = {GitHub repository},

View File

@@ -1,6 +1,6 @@
 ---
 title: 📚 Introduction
-description: '📝 Embedchain is a framework to easily create LLM powered apps on your data.'
+description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
 ---
## 🤔 What is Embedchain? ## 🤔 What is Embedchain?
@@ -27,9 +27,6 @@ naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American e
 naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
 # Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
-# Ask questions with specific context
-naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", where={'chapter': 'philosophy'})
 ```
## 🚀 How it works? ## 🚀 How it works?

View File

@@ -3,6 +3,8 @@ title: '🚀 Quickstart'
 description: '💡 Start building LLM powered apps under 30 seconds'
 ---
+Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data.
 Install embedchain python package:
 ```bash
@@ -20,9 +22,11 @@ app = App()
 </Step>
 <Step title="🗃️ Add data sources">
 ```python
-# Embed online resources
+# Add different data sources
 elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
 elon_bot.add("https://www.forbes.com/profile/elon-musk")
+# You can also add local data sources such as pdf, csv files etc.
+# elon_bot.add("/path/to/file.pdf")
 ```
 </Step>
 <Step title="💬 Query or chat on your data and get answers">
@@ -42,9 +46,11 @@ from embedchain import App
os.environ["OPENAI_API_KEY"] = "xxx" os.environ["OPENAI_API_KEY"] = "xxx"
elon_bot = App() elon_bot = App()
# Embed online resources # Add different data sources
elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk") elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
elon_bot.add("https://www.forbes.com/profile/elon-musk") elon_bot.add("https://www.forbes.com/profile/elon-musk")
# You can also add local data sources such as pdf, csv files etc.
# elon_bot.add("/path/to/file.pdf")
response = elon_bot.query("What is the net worth of Elon Musk today?") response = elon_bot.query("What is the net worth of Elon Musk today?")
print(response) print(response)

View File

@@ -28,7 +28,7 @@ export default function Home() {
         Welcome to Embedchain Playground
       </h1>
       <p className="mb-6 text-lg font-normal text-gray-500 lg:text-xl">
-        embedchain is a framework to easily create LLM powered bots over any
-        dataset
+        Embedchain is a Data Platform for LLMs - Load, index, retrieve, and sync any unstructured data
       </p>
     </div>

View File

@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "embedchain"
 version = "0.0.70"
-description = "Embedchain is a framework to easily create LLM powered apps over any dataset"
+description = "Data platform for LLMs - Load, index, retrieve and sync any unstructured data"
 authors = ["Taranjeet Singh, Deshraj Yadav"]
 license = "Apache License"
 readme = "README.md"