docs: setup docs for embedchain (#287)

This commit is contained in:
Deshraj Yadav
2023-07-16 16:33:30 -07:00
committed by GitHub
parent 05a4eef6ae
commit c595003481
21 changed files with 914 additions and 619 deletions


@@ -0,0 +1,138 @@
---
title: '📱 App types'
---
Creating a chatbot involves 3 steps:
- ⚙️ Import the App instance
- 🗃️ Add Dataset
- 💬 Query or Chat on the dataset and get answers (Interface Types)
## App Types
Embedchain provides three types of app.
### App
```python
from embedchain import App
naval_chat_bot = App()
```
- `App` uses OpenAI's models, which are paid. 💸 You will be charged for both embedding model usage and LLM usage.
- `App` uses OpenAI's embedding model to create embeddings for chunks and the ChatGPT API as the LLM to answer a query from the relevant documents. Make sure that you have an OpenAI account and an API key. If you don't have an API key, you can create one by visiting [this link](https://platform.openai.com/account/api-keys).
- Once you have the API key, set it in an environment variable called `OPENAI_API_KEY`:
```python
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
```
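Hard-coding a key in a script makes it easy to leak. As a minimal sketch, assuming the key was exported in your shell beforehand, you can fail early with a clear message when it is missing. `require_openai_key` is a hypothetical helper written for this page, not part of embedchain.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of embedchain.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating an App."
        )
    return key
```

Calling `require_openai_key()` before `App()` turns a missing key into an immediate, readable error.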
### OpenSourceApp
```python
from embedchain import OpenSourceApp
naval_chat_bot = OpenSourceApp()
```
- `OpenSourceApp` uses an open source embedding model and an open source LLM. It uses `all-MiniLM-L6-v2` from the Sentence Transformers library as the embedding model and `gpt4all` as the LLM.
- There is no need to set up any API keys. You just need to install the embedchain package, and these dependencies are installed automatically. 📦
- Once you have imported and instantiated the app, all functionality from here onwards is the same for either type of app. 📚
### PersonApp
```python
from embedchain import PersonApp
naval_chat_bot = PersonApp("name_of_person_or_character")  # e.g. "Yoda"
```
- `PersonApp` uses OpenAI's models, which are paid. 💸 You will be charged for both embedding model usage and LLM usage.
- `PersonApp` uses OpenAI's embedding model to create embeddings for chunks and the ChatGPT API as the LLM to answer a query from the relevant documents. Make sure that you have an OpenAI account and an API key. If you don't have an API key, you can create one by visiting [this link](https://platform.openai.com/account/api-keys).
- Once you have the API key, set it in an environment variable called `OPENAI_API_KEY`:
```python
import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
```
## Add Dataset
- This step assumes that you have already created an app instance using either `App` or `OpenSourceApp`. The examples call this instance `naval_chat_bot`. 🤖
- Use the `.add()` method to add a dataset.
```python
# naval_chat_bot = App() or
# naval_chat_bot = OpenSourceApp()
# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")
# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
```
- If your script or application already uses the name `App` for something else, you can alias the imports:
```python
from embedchain import App as EmbedChainApp
from embedchain import OpenSourceApp as EmbedChainOSApp
from embedchain import PersonApp as EmbedChainPersonApp
# or
from embedchain import App as ECApp
from embedchain import OpenSourceApp as ECOSApp
from embedchain import PersonApp as ECPApp
```
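When there are many online resources, the `.add()` calls shown in this section can be driven from a list of `(data_type, url)` pairs. The sketch below is illustrative: `RecordingApp` is a hypothetical stand-in so the code runs without embedchain, an API key, or network access, and `add_all` is a helper name chosen for this page.

```python
# Hypothetical stand-in so the sketch runs offline.
# With the real library you would use `naval_chat_bot = App()` instead.
class RecordingApp:
    def __init__(self):
        self.added = []

    def add(self, data_type, url):
        # Record the call instead of downloading and embedding the resource.
        self.added.append((data_type, url))


SOURCES = [
    ("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44"),
    ("web_page", "https://nav.al/feedback"),
    ("web_page", "https://nav.al/agi"),
]


def add_all(app, sources):
    """Call app.add(data_type, url) for every pair and return how many were added."""
    for data_type, url in sources:
        app.add(data_type, url)
    return len(sources)


bot = RecordingApp()
print(add_all(bot, SOURCES))  # prints 3
```

Keeping the sources in one list makes it easier to review what has been indexed.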
## Interface Types
### Query Interface
- This interface works like a question answering bot. It takes a question and returns the answer. It does not maintain context from previous chats. ❓
- To use this, call the `.query()` method with your question.
```python
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
### Chat Interface
- This is a chat interface that remembers the previous conversation. Right now it remembers the last 5 conversations by default. 💬
- To use this, call the `.chat()` method with your question.
```python
print(naval_chat_bot.chat("How to be happy in life?"))
# answer: The most important trick to being happy is to realize happiness is a skill you develop and a choice you make. You choose to be happy, and then you work at it. It's just like building muscles or succeeding at your job. It's about recognizing the abundance and gifts around you at all times.
print(naval_chat_bot.chat("who is naval ravikant?"))
# answer: Naval Ravikant is an Indian-American entrepreneur and investor.
print(naval_chat_bot.chat("what did the author say about happiness?"))
# answer: The author, Naval Ravikant, believes that happiness is a choice you make and a skill you develop. He compares the mind to the body, stating that just as the body can be molded and changed, so can the mind. He emphasizes the importance of being present in the moment and not getting caught up in regrets of the past or worries about the future. By being present and grateful for where you are, you can experience true happiness.
```
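To make "remembers the last 5 conversations" concrete, here is a sketch of a bounded history built on `collections.deque`. It only illustrates the idea and is an assumption, not a description of how embedchain stores chat history internally. `BoundedHistory` is a hypothetical class.

```python
from collections import deque


class BoundedHistory:
    """Keep only the most recent `max_turns` question/answer pairs."""

    def __init__(self, max_turns=5):
        self._turns = deque(maxlen=max_turns)

    def record(self, question, answer):
        # When the deque is full, appending drops the oldest turn.
        self._turns.append((question, answer))

    def as_prompt_context(self):
        """Render the remembered turns as plain text."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self._turns)


history = BoundedHistory(max_turns=5)
for i in range(7):
    history.record(f"question {i}", f"answer {i}")

print(len(history._turns))  # prints 5
```

After seven turns, only turns 2 through 6 remain, so the earliest questions no longer influence the context.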
### Stream Response
- You can pass a config to the query method to stream responses the way ChatGPT does. You will need a downstream handler to render the chunks in your desired format. Streaming is supported by both the OpenAI-based `App` and `OpenSourceApp`. 📊
- To use this, instantiate a `QueryConfig` or `ChatConfig` object with `stream=True`, then pass it to the `.query()` or `.chat()` method, respectively. The following example iterates through the chunks and prints them as they arrive.
```python
from embedchain import App
from embedchain.config import QueryConfig
app = App()
query_config = QueryConfig(stream=True)
resp = app.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config)
# Print each chunk as soon as it arrives
for chunk in resp:
    print(chunk, end="", flush=True)
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
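The "downstream handler" mentioned above can be as small as a function that forwards each chunk and also keeps the full text. The sketch below assumes only that the streamed response is an iterable of strings. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for the real response so the example runs offline.

```python
def collect_stream(chunks, on_chunk=None):
    """Consume an iterable of text chunks, forward each one to `on_chunk`,
    and return the full response once the stream is exhausted."""
    parts = []
    for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)


def fake_stream():
    # Stand-in for the iterable returned by app.query(..., query_config)
    yield from ["Naval ", "argues ", "that ", "humans..."]


full = collect_stream(
    fake_stream(),
    on_chunk=lambda chunk: print(chunk, end="", flush=True),
)
print()  # finish the line after the last chunk
```

Returning the joined text lets you log or store the complete answer after it has been rendered incrementally.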


@@ -0,0 +1,114 @@
---
title: '⚙️ Custom configurations'
---
Embedchain works out of the box. For advanced users, it also offers configuration options, all of which are optional and have sane defaults.
## Examples
### Custom embedding function
Here's the readme example with configuration options.
```python
import os
from embedchain import App
from embedchain.config import InitConfig, AddConfig, QueryConfig
from chromadb.utils import embedding_functions
# Example: use your own embedding function
config = InitConfig(ef=embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    organization_id=os.getenv("OPENAI_ORGANIZATION"),
    model_name="text-embedding-ada-002"
))
naval_chat_bot = App(config)
# Example: define your own chunker config for `youtube_video`
youtube_add_config = {
    "chunker": {
        "chunk_size": 1000,
        "chunk_overlap": 100,
        "length_function": len,
    }
}
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44", AddConfig(**youtube_add_config))
add_config = AddConfig()
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf", add_config)
naval_chat_bot.add("web_page", "https://nav.al/feedback", add_config)
naval_chat_bot.add("web_page", "https://nav.al/agi", add_config)
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)
query_config = QueryConfig()  # all options left at their defaults
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))
```
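To see what `chunk_size` and `chunk_overlap` mean in the chunker config, here is a simplified character-based splitter. It is an illustration under the assumption that length is measured in characters (as with `length_function=len`). It is not embedchain's own chunker, which may split text differently. `chunk_text` is a hypothetical function.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split `text` into pieces of at most `chunk_size` characters, where each
    piece shares `chunk_overlap` characters with the previous one.

    Simplified illustration only; not embedchain's implementation.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this piece has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

A larger overlap repeats more text across neighboring chunks, which can preserve context at chunk boundaries at the cost of more embeddings.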
### Custom prompt template
Here's an example of using a custom prompt template with `.query()`:
```python
from embedchain.config import QueryConfig
from embedchain.embedchain import App
from string import Template
import wikipedia
einstein_chat_bot = App()
# Embed Wikipedia page
page = wikipedia.page("Albert Einstein")
einstein_chat_bot.add("text", page.content)
# Example: use your own custom template with `$context` and `$query`
einstein_chat_template = Template("""
You are Albert Einstein, a German-born theoretical physicist,
widely ranked among the greatest and most influential scientists of all time.
Use the following information about Albert Einstein to respond to
the human's query acting as Albert Einstein.
Context: $context
Keep the response brief. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Human: $query
Albert Einstein:""")
query_config = QueryConfig(einstein_chat_template)
queries = [
    "Where did you complete your studies?",
    "Why did you win nobel prize?",
    "Why did you divorce your first wife?",
]
for query in queries:
    response = einstein_chat_bot.query(query, query_config)
    print("Query: ", query)
    print("Response: ", response)
# Output
# Query: Where did you complete your studies?
# Response: I completed my secondary education at the Argovian cantonal school in Aarau, Switzerland.
# Query: Why did you win nobel prize?
# Response: I won the Nobel Prize in Physics in 1921 for my services to Theoretical Physics, particularly for my discovery of the law of the photoelectric effect.
# Query: Why did you divorce your first wife?
# Response: We divorced due to living apart for five years.
```
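The `$context` and `$query` placeholders follow the standard library's `string.Template` syntax. The sketch below shows the substitution on its own, assuming the placeholders are filled with ordinary `Template.substitute` semantics. `has_placeholders` is a hypothetical helper for checking a template before use, and it only detects the plain `$name` form.

```python
from string import Template

template = Template("Context: $context\nHuman: $query\nAnswer:")

# Fill both placeholders; substitute() raises KeyError if one is not supplied.
prompt = template.substitute(
    context="Einstein received the 1921 Nobel Prize in Physics.",
    query="Why did you win the Nobel Prize?",
)
print(prompt)


def has_placeholders(tpl, names=("context", "query")):
    """Return True if every `$name` placeholder appears in the template text."""
    return all(f"${name}" in tpl.template for name in names)


print(has_placeholders(template))  # prints True
```

Checking for both placeholders up front avoids sending a prompt that silently omits the retrieved context or the question.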
## Other methods
### Reset
Resets the database and deletes all embeddings. Irreversible. Requires reinitialization afterwards.
```python
app.reset()
```
### Count
Counts the number of embeddings (chunks) in the database.
```python
print(app.count())
# returns: 481
```


@@ -0,0 +1,84 @@
---
title: '📋 Supported data formats'
---
Embedchain supports the following data formats:
### YouTube video
To add a YouTube video to your app, pass `youtube_video` as the data_type (the first argument to the `.add()` method). For example:
```python
app.add('youtube_video', 'a_valid_youtube_url_here')
```
### PDF file
To add a PDF file, use `pdf_file` as the data_type. For example:
```python
app.add('pdf_file', 'a_valid_url_where_pdf_file_can_be_accessed')
```
Note that password-protected PDFs are not supported.
### Web page
To add a web page, use `web_page` as the data_type. For example:
```python
app.add('web_page', 'a_valid_web_page_url')
```
### Doc file
To add a doc/docx file, use `docx` as the data_type. For example:
```python
app.add('docx', 'a_local_docx_file_path')
```
### Text
To supply your own text, use `text` as the data_type and pass a string. The text is not processed, which makes this option very versatile. For example:
```python
app.add_local('text', 'Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.')
```
Note: This is not used in the examples because in most cases you will supply a whole paragraph or file, which did not fit.
### QnA pair
To supply your own QnA pair, use `qna_pair` as the data_type and pass a tuple. For example:
```python
app.add_local('qna_pair', ("Question", "Answer"))
```
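Since the data_type is always passed explicitly, a script that handles mixed inputs needs some way to choose it. The heuristic below is a sketch under that assumption. `guess_data_type` is a hypothetical helper, not an embedchain function, and it only covers the formats listed on this page.

```python
from urllib.parse import urlparse


def guess_data_type(source):
    """Map a source string to one of the data_type values listed above.

    Hypothetical heuristic for illustration; embedchain expects the caller
    to pass the data_type explicitly.
    """
    parsed = urlparse(source)
    path = parsed.path.lower()
    host = parsed.netloc.lower()
    if path.endswith(".pdf"):
        return "pdf_file"
    if path.endswith((".doc", ".docx")):
        return "docx"
    if host.endswith("youtube.com") or host.endswith("youtu.be"):
        return "youtube_video"
    if parsed.scheme in ("http", "https"):
        return "web_page"
    # Anything else is treated as raw text.
    return "text"


print(guess_data_type("https://nav.al/agi"))  # prints web_page
```

A heuristic like this can misclassify unusual URLs, so treat its result as a default that the caller can override.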
## Reusing a vector database
By default, Embedchain creates a persistent vector DB in the directory **./db**. You can split your application into two Python scripts: one to create a local vector DB and the other to reuse this local persistent vector DB. This is useful when you want to index hundreds of documents and separately implement a chat interface.
Create a local index:
```python
from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
```
You can reuse the local index with the same code, but without adding new documents:
```python
from embedchain import App
naval_chat_bot = App()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
```
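If you prefer a single script over two, one option is to check whether the persistent directory already has content before adding documents. This is a sketch that assumes a non-empty `./db` directory means an index has already been created. `needs_indexing` is a hypothetical helper, and the demonstration uses a temporary directory rather than a real database.

```python
import os
import tempfile


def needs_indexing(db_dir="./db"):
    """Return True when `db_dir` is missing or empty, i.e. nothing is persisted yet."""
    return not os.path.isdir(db_dir) or not os.listdir(db_dir)


# Demonstration with a temporary directory instead of the real ./db
with tempfile.TemporaryDirectory() as tmp:
    print(needs_indexing(tmp))  # prints True (empty directory)
    open(os.path.join(tmp, "index.bin"), "w").close()
    print(needs_indexing(tmp))  # prints False (directory has content)
```

In a real script, the `.add()` calls would run only when `needs_indexing()` returns `True`.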
### More formats (coming soon!)
- If you want to add any other format, please create an [issue](https://github.com/embedchain/embedchain/issues) and we will add it to the list of supported formats.


@@ -0,0 +1,58 @@
---
title: '🔍 Query configurations'
---
## InitConfig
| option | description | type | default |
|-----------|-----------------------|---------------------------------|------------------------|
| log_level | log level | string | WARNING |
| ef | embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} |
| db | vector database (experimental) | BaseVectorDB | ChromaDB |
## AddConfig
|option|description|type|default|
|---|---|---|---|
|chunker|chunker config|ChunkerConfig|Default values for the chunker depend on the `data_type`. Please refer to [ChunkerConfig](#chunkerconfig)|
|loader|loader config|LoaderConfig|None|
### ChunkerConfig
|option|description|type|default|
|---|---|---|---|
|chunk_size|Maximum size of chunks to return|int|Depends on `data_type`; see the defaults table below|
|chunk_overlap|Overlap in characters between chunks|int|Depends on `data_type`; see the defaults table below|
|length_function|Function that measures the length of given chunks|typing.Callable|Depends on `data_type`; see the defaults table below|
Default values of chunker config parameters for different `data_type`:
|data_type|chunk_size|chunk_overlap|length_function|
|---|---|---|---|
|docx|1000|0|len|
|text|300|0|len|
|qna_pair|300|0|len|
|web_page|500|0|len|
|pdf_file|1000|0|len|
|youtube_video|2000|0|len|
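
The defaults in the table above can be expressed as a lookup with per-call overrides. This is a sketch for reasoning about the values. `DEFAULT_CHUNKER` and `chunker_config_for` are hypothetical names, and the numbers are copied from the table rather than read from the library.

```python
# Defaults copied from the table above (not read from embedchain itself).
DEFAULT_CHUNKER = {
    "docx": {"chunk_size": 1000, "chunk_overlap": 0, "length_function": len},
    "text": {"chunk_size": 300, "chunk_overlap": 0, "length_function": len},
    "qna_pair": {"chunk_size": 300, "chunk_overlap": 0, "length_function": len},
    "web_page": {"chunk_size": 500, "chunk_overlap": 0, "length_function": len},
    "pdf_file": {"chunk_size": 1000, "chunk_overlap": 0, "length_function": len},
    "youtube_video": {"chunk_size": 2000, "chunk_overlap": 0, "length_function": len},
}


def chunker_config_for(data_type, **overrides):
    """Return the default chunker settings for `data_type` with overrides applied."""
    config = dict(DEFAULT_CHUNKER[data_type])  # copy so the defaults stay untouched
    config.update(overrides)
    return config


custom = chunker_config_for("youtube_video", chunk_size=1000, chunk_overlap=100)
print(custom["chunk_size"], custom["chunk_overlap"])  # prints 1000 100
```

The resulting dictionary has the same shape as the `"chunker"` entry passed to `AddConfig` in the custom configuration example.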
### LoaderConfig
_coming soon_
## QueryConfig
|option|description|type|default|
|---|---|---|---|
|template|custom template for prompt|Template|Template("Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. \$context Query: \$query Helpful Answer:")|
|history|include conversation history from your client or database|any (recommendation: list[str])|None|
|stream|control if response is streamed back to the user|bool|False|
## ChatConfig
All options for query and...
_coming soon_
History is handled automatically; the `history` config option is not supported.


@@ -0,0 +1,57 @@
---
title: '🎪 Community showcase'
---
The Embedchain community has been super active in creating demos on top of Embedchain. On this page, we showcase all the apps, blogs, videos, and tutorials created by the community. ❤️
## Apps
### Open Source
- [Discord Bot for LLM chat](https://github.com/Reidond/discord_bots_playground/tree/c8b0c36541e4b393782ee506804c4b6962426dd6/python/chat-channel-bot) by Reidond
- [EmbedChain-Streamlit-Docker App](https://github.com/amjadraza/embedchain-streamlit-app) by amjadraza
- [Harry Potter Philosphers Stone Bot](https://github.com/vinayak-kempawad/Harry_Potter_Philosphers_Stone_Bot/) by Vinayak Kempawad, ([linkedin post](https://www.linkedin.com/feed/update/urn:li:activity:7080907532155686912/))
- [LLM bot trained on own messages](https://github.com/Harin329/harinBot) by Hao Wu
### Closed Source
- [Taobot.io](https://taobot.io) - chatbot & knowledgebase hybrid by [cachho](https://github.com/cachho)
## Templates
### Replit
- [Embedchain Chat Bot](https://replit.com/@taranjeet1/Embedchain-Chat-Bot) by taranjeetio
- [Embedchain Memory Chat Bot Template](https://replit.com/@taranjeetio/Embedchain-Memory-Chat-Bot-Template) by taranjeetio
## Posts
### Blogs
- [Customer Service LINE Bot](https://www.evanlin.com/langchain-embedchain/)
### LinkedIn
- [What is embedchain](https://www.linkedin.com/posts/activity-7079393104423698432-wRyi/) by Rithesh Sreenivasan
- [Building a chatbot with EmbedChain](https://www.linkedin.com/posts/activity-7078434598984060928-Zdso/) by Lior Sinclair
- [Making chatbot without vs with embedchain](https://www.linkedin.com/posts/kalyanksnlp_llms-chatbots-langchain-activity-7077453416221863936-7N1L/) by Kalyan KS
### Twitter
- [What is embedchain](https://twitter.com/AlphaSignalAI/status/1672668574450847745) by Lior
- [Building a chatbot with Embedchain](https://twitter.com/Saboo_Shubham_/status/1673537044419686401) by Shubham Saboo
## Videos
- [embedChain Create LLM powered bots over any dataset Python Demo Tesla Neurallink Chatbot Example](https://www.youtube.com/watch?v=bJqAn22a6Gc) by Rithesh Sreenivasan
- [Embedchain - NEW 🔥 Langchain BABY to build LLM Bots](https://www.youtube.com/watch?v=qj_GNQ06I8o) by 1littlecoder
- [EmbedChain -- NEW!: Build LLM-Powered Bots with Any Dataset](https://www.youtube.com/watch?v=XmaBezzGHu4) by DataInsightEdge
- [Chat With Your PDFs in less than 10 lines of code! EMBEDCHAIN tutorial](https://www.youtube.com/watch?v=1ugkcsAcw44) by Phani Reddy
- [How To Create A Custom Knowledge AI Powered Bot | Install + How To Use](https://www.youtube.com/watch?v=VfCrIiAst-c) by The Ai Solopreneur
- [Build Custom Chatbot in 6 min with this Framework [Beginner Friendly]](https://www.youtube.com/watch?v=-8HxOpaFySM) by Maya Akim
- [embedchain-streamlit-app](https://www.youtube.com/watch?v=3-9GVd-3v74) by Amjad Raza
## Mentions
### Github repos
- [awesome-ChatGPT-repositories](https://github.com/taishi-i/awesome-ChatGPT-repositories)

docs/advanced/testing.mdx Normal file

@@ -0,0 +1,25 @@
---
title: '🧪 Testing'
---
Before you consume valuable tokens, make sure that your embedding works and that the query retrieves the correct document from the database.
For this, you can use the `dry_run` method.
Following the example above, add this to your script:
```python
print(naval_chat_bot.dry_run('Can you tell me who Naval Ravikant is?'))
'''
Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
Q: Who is Naval Ravikant?
A: Naval Ravikant is an Indian-American entrepreneur and investor.
Query: Can you tell me who Naval Ravikant is?
Helpful Answer:
'''
```
_The embedding is confirmed to work as expected. It returns the right document, even if the question is asked slightly differently. No prompt tokens have been consumed._
**The dry run will still consume tokens to embed your query, but it is only ~1/15 of the prompt.**
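The manual check above can be turned into a small assertion helper. The sketch assumes only that `dry_run` returns the prompt as a string, as the printed output suggests. `missing_snippets` is a hypothetical helper, and `sample_prompt` stands in for a real dry-run result so the example runs without embedchain.

```python
def missing_snippets(prompt, expected_snippets):
    """Return the expected snippets that do not appear in the dry-run prompt."""
    return [snippet for snippet in expected_snippets if snippet not in prompt]


# Stand-in for the string returned by naval_chat_bot.dry_run(...)
sample_prompt = (
    "Use the following pieces of context to answer the query at the end.\n"
    "Q: Who is Naval Ravikant?\n"
    "A: Naval Ravikant is an Indian-American entrepreneur and investor.\n"
    "Query: Can you tell me who Naval Ravikant is?\n"
    "Helpful Answer:"
)

print(missing_snippets(sample_prompt, ["Indian-American entrepreneur", "Albert Einstein"]))
# prints ['Albert Einstein']
```

An empty list means every expected snippet was retrieved into the prompt.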