[docs]: Revamp embedchain docs (#799)

This commit is contained in:
Deshraj Yadav
2023-10-13 15:38:15 -07:00
committed by GitHub
parent a86d7f52e9
commit 4a8c50f886
68 changed files with 1175 additions and 673 deletions

docs/get-started/faq.mdx Normal file

@@ -0,0 +1,68 @@
---
title: ❓ FAQs
description: 'A collection of frequently asked questions'
---
#### How do I use GPT-4 as the LLM?
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
# load llm configuration from gpt4.yaml file
app = App.from_config(yaml_path="gpt4.yaml")
```
```yaml gpt4.yaml
llm:
  provider: openai
  model: 'gpt-4'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```
</CodeGroup>
#### I don't have OpenAI credits. How can I use an open-source model?
<CodeGroup>
```python main.py
import os
from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
# load llm configuration from opensource.yaml file
app = App.from_config(yaml_path="opensource.yaml")
```
```yaml opensource.yaml
llm:
  provider: gpt4all
  model: 'orca-mini-3b.ggmlv3.q4_0.bin'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
embedder:
  provider: gpt4all
  config:
    model: 'all-MiniLM-L6-v2'
```
</CodeGroup>
#### How do I contact support?
If the docs aren't sufficient, please feel free to reach out to us through one of the following channels:
<Snippet file="get-help.mdx" />


@@ -0,0 +1,58 @@
---
title: 📚 Introduction
description: '📝 Embedchain is a framework to easily create LLM-powered apps on your data.'
---
## 🤔 What is Embedchain?
Embedchain abstracts the entire process of loading data, chunking it, creating embeddings, and storing it in a vector database.
You can add data from different data sources using the `.add()` method. Then, simply use the `.query()` method to find answers from the added datasets.
If you want to create a Naval Ravikant bot with a YouTube video, a book in PDF format, two blog posts, and a question and answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you.
```python
from embedchain import App
naval_bot = App()
# Add online data
naval_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_bot.add("https://nav.al/feedback")
naval_bot.add("https://nav.al/agi")
naval_bot.add("The Meanings of Life", 'text', metadata={'chapter': 'philosophy'})
# Add local resources
naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
# Ask questions with specific context
naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", where={'chapter': 'philosophy'})
```
## 🚀 How does it work?
Embedchain handles the following steps for you, so you can easily create LLM-powered apps:
1. Detect the data type and load data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store chunks in a vector database
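The four ingestion steps can be sketched with a toy, dependency-free example. The `chunk`, bag-of-words `embed`, and in-memory `store` below are illustrative stand-ins, not embedchain's internals:

```python
from collections import Counter

# Stand-in for a vector database (step 4)
store = []

def chunk(text, size=40):
    """Step 2: split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text):
    """Step 3: toy 'embedding' as a bag-of-words term-frequency vector."""
    return Counter(chunk_text.lower().split())

def add(text):
    """Steps 1-4 for a plain-text source: chunk, embed, store."""
    for c in chunk(text):
        store.append({"chunk": c, "embedding": embed(c)})

add("Naval Ravikant is an Indian-American entrepreneur and investor.")
print(len(store))  # → 2 chunks stored
```

A real app replaces these stand-ins with a detected data loader, a tokenizer-aware chunker, an embedding model, and an actual vector database.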
When a user asks a query, the following process happens to find the answer:
1. Create an embedding for the query
2. Find similar documents for the query from the vector database
3. Pass the similar documents as context to LLM to get the final answer
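These three answer-finding steps can also be sketched without any dependencies. The bag-of-words `embed`, the `store` list, and the assembled `prompt` below are illustrative stand-ins (no actual LLM is called):

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    """Steps 1-2: embed the query and rank stored chunks by similarity."""
    q = embed(query)
    ranked = sorted(store, key=lambda d: cosine(q, d["embedding"]), reverse=True)
    return [d["chunk"] for d in ranked[:k]]

store = [{"chunk": c, "embedding": embed(c)} for c in [
    "Naval Ravikant is an entrepreneur and investor.",
    "The Almanack of Naval Ravikant is a book.",
    "Elon Musk founded SpaceX.",
]]
question = "Who is Naval Ravikant?"
context = retrieve(question, store)
# Step 3: the retrieved chunks become the context of the LLM prompt
prompt = "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```

Note how the unrelated SpaceX chunk ranks below the two Naval chunks and is excluded from the context.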
The process of loading the dataset and querying involves multiple steps, each with its own nuances:
- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?
Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.


@@ -0,0 +1,52 @@
---
title: '🚀 Quickstart'
description: '💡 Start building LLM-powered apps in under 30 seconds'
---
Install the embedchain Python package:
```bash
pip install embedchain
```
Creating an app involves 3 steps:
<Steps>
<Step title="⚙️ Create an app instance">
```python
from embedchain import App
elon_bot = App()
```
```
</Step>
<Step title="🗃️ Add data sources">
```python
# Embed online resources
elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
elon_bot.add("https://www.forbes.com/profile/elon-musk")
```
</Step>
<Step title="💬 Query or chat on your data and get answers">
```python
elon_bot.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk today is $258.7 billion.
```
</Step>
</Steps>
Putting it together, you can run your first app using the following code. Make sure to set the `OPENAI_API_KEY` 🔑 environment variable in the code.
```python
import os
from embedchain import App
os.environ["OPENAI_API_KEY"] = "xxx"
elon_bot = App()
# Embed online resources
elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
elon_bot.add("https://www.forbes.com/profile/elon-musk")
response = elon_bot.query("What is the net worth of Elon Musk today?")
print(response)
# Answer: The net worth of Elon Musk today is $258.7 billion.
```