feat: add new custom app (#313)

This commit is contained in:
cachho
2023-07-18 21:24:23 +02:00
committed by GitHub
parent 96143ac496
commit adb7206639
24 changed files with 455 additions and 147 deletions

View File

@@ -0,0 +1,25 @@
---
title: 'Adding Data'
---
## Add Dataset
- This step assumes that you have already created an `app` instance using `App`, `OpenSourceApp`, or `CustomApp`. We call our app instance `naval_chat_bot` 🤖
- Now use the `.add()` function to add any dataset.
```python
# naval_chat_bot = App() or
# naval_chat_bot = OpenSourceApp()
# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")
# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
```
The possible formats to add data can be found on the [Supported Data Formats](/advanced/data_types) page.
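
When a bot draws on many sources, the repeated `.add()` calls above can be driven from a list. The sketch below assumes only that the app object exposes `.add(data_type, source)` as shown above. `add_sources` and `RecordingBot` are hypothetical names introduced here, and `RecordingBot` is a stand-in so the example runs without embedchain or an API key.

```python
from typing import Iterable, List, Tuple

Source = Tuple[str, str]  # (data_type, url), e.g. ("web_page", "https://nav.al/agi")


def add_sources(bot, sources: Iterable[Source]) -> int:
    """Call bot.add(data_type, url) for each source and return how many were added."""
    count = 0
    for data_type, url in sources:
        bot.add(data_type, url)
        count += 1
    return count


class RecordingBot:
    """Hypothetical stand-in for an embedchain app; it only records the calls."""

    def __init__(self) -> None:
        self.added: List[Source] = []

    def add(self, data_type: str, url: str) -> None:
        self.added.append((data_type, url))


sources = [
    ("web_page", "https://nav.al/feedback"),
    ("web_page", "https://nav.al/agi"),
]
bot = RecordingBot()
print(add_sources(bot, sources))
```

With a real app instance, `add_sources(naval_chat_bot, sources)` would be the equivalent call, assuming the instance accepts the same two positional arguments.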

View File

@@ -2,12 +2,6 @@
title: '📱 App types'
---
Creating a chatbot involves 3 steps:
- ⚙️ Import the App instance
- 🗃️ Add Dataset
- 💬 Query or Chat on the dataset and get answers (Interface Types)
## App Types
We have three types of App.
@@ -16,13 +10,12 @@ We have three types of App.
```python
from embedchain import App
naval_chat_bot = App()
app = App()
```
- `App` uses OpenAI's model, so these are paid models. 💸 You will be charged for embedding model usage and LLM usage.
- `App` uses OpenAI's embedding model to create embeddings for chunks and the ChatGPT API as the LLM to answer from the relevant docs. Make sure that you have an OpenAI account and an API key. If you don't have an API key, you can create one by visiting [this link](https://platform.openai.com/account/api-keys).
- `App` is opinionated. It uses the best embedding model and LLM on the market.
- Once you have the API key, set it in an environment variable called `OPENAI_API_KEY`
```python
@@ -34,12 +27,31 @@ os.environ["OPENAI_API_KEY"] = "sk-xxxx"
```python
from embedchain import OpenSourceApp
naval_chat_bot = OpenSourceApp()
app = OpenSourceApp()
```
- `OpenSourceApp` uses an open source embedding model and LLM. It uses `all-MiniLM-L6-v2` from the Sentence Transformers library as the embedding model and `gpt4all` as the LLM.
- There is no need to set up any API keys. You just need to install the embedchain package, and these will be installed automatically. 📦
- Once you have imported and instantiated the app, every functionality from here onwards is the same for either type of app. 📚
- `OpenSourceApp` is opinionated. It uses the best open source embedding model and LLM on the market.
### CustomApp
```python
from embedchain import CustomApp
from embedchain.config import CustomAppConfig
from embedchain.models import Providers, EmbeddingFunctions
config = CustomAppConfig(embedding_fn=EmbeddingFunctions.OPENAI, provider=Providers.OPENAI)
app = CustomApp(config)
```
- `CustomApp` is not opinionated.
- Configuration is required. It's for advanced users who want to mix and match different embedding models and LLMs.
- Even so, it still provides abstractions through `Providers`.
- Both paid and free/open source providers are included.
- Once you have imported and instantiated the app, every functionality from here onwards is the same for either type of app. 📚
### PersonApp
@@ -57,25 +69,7 @@ import os
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
```
## Add Dataset
- This step assumes that you have already created an `app` instance by either using `App` or `OpenSourceApp`. We are calling our app instance as `naval_chat_bot` 🤖
- Now use `.add()` function to add any dataset.
```python
# naval_chat_bot = App() or
# naval_chat_bot = OpenSourceApp()
# Embed Online Resources
naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/agi")
# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
```
#### Compatibility with other apps
- If there is any other app instance in your script or app, you can change the import as
@@ -90,49 +84,3 @@ from embedchain import App as ECApp
from embedchain import OpenSourceApp as ECOSApp
from embedchain import PersonApp as ECPApp
```
## Interface Types
### Query Interface
- This interface is like a question answering bot. It takes a question and gets the answer. It does not maintain context about the previous chats.❓
- To use this, call the `.query()` function to get the answer for any query.
```python
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
### Chat Interface
- This is a chat interface that remembers previous conversations. Right now it remembers 5 conversations by default. 💬
- To use this, call the `.chat()` function to get the answer for any query.
```python
print(naval_chat_bot.chat("How to be happy in life?"))
# answer: The most important trick to being happy is to realize happiness is a skill you develop and a choice you make. You choose to be happy, and then you work at it. It's just like building muscles or succeeding at your job. It's about recognizing the abundance and gifts around you at all times.
print(naval_chat_bot.chat("who is naval ravikant?"))
# answer: Naval Ravikant is an Indian-American entrepreneur and investor.
print(naval_chat_bot.chat("what did the author say about happiness?"))
# answer: The author, Naval Ravikant, believes that happiness is a choice you make and a skill you develop. He compares the mind to the body, stating that just as the body can be molded and changed, so can the mind. He emphasizes the importance of being present in the moment and not getting caught up in regrets of the past or worries about the future. By being present and grateful for where you are, you can experience true happiness.
```
### Stream Response
- You can add a config to your query method to stream responses like ChatGPT does. You need a downstream handler to render the chunks in your desired format. Both the OpenAI model and `OpenSourceApp` are supported. 📊
- To use this, instantiate a `QueryConfig` or `ChatConfig` object with `stream=True`. Then pass it to the `.chat()` or `.query()` method. The following example iterates through the chunks and prints them as they appear.
```python
from embedchain import App
from embedchain.config import QueryConfig

app = App()
query_config = QueryConfig(stream=True)
resp = app.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config)
# Print each chunk as soon as it arrives
for chunk in resp:
    print(chunk, end="", flush=True)
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```

View File

@@ -6,7 +6,7 @@ Embedchain is made to work out of the box. However, for advanced users we're als
## Examples
### Custom embedding function
### General
Here's the readme example with configuration options.
@@ -16,13 +16,8 @@ from embedchain import App
from embedchain.config import AppConfig, AddConfig, QueryConfig, ChunkerConfig
from chromadb.utils import embedding_functions
# Example: use your own embedding function
# Warning: We are currently reworking the concept of custom apps, so this might not work.
config = AppConfig(ef=embedding_functions.OpenAIEmbeddingFunction(
api_key=os.getenv("OPENAI_API_KEY"),
organization_id=os.getenv("OPENAI_ORGANIZATION"),
model_name="text-embedding-ada-002"
))
# Example: set the log level for debugging
config = AppConfig(log_level="DEBUG")
naval_chat_bot = App(config)
# Example: define your own chunker config for `youtube_video`
@@ -36,7 +31,7 @@ naval_chat_bot.add("web_page", "https://nav.al/agi", add_config)
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."), add_config)
query_config = QueryConfig() # Currently no options
query_config = QueryConfig()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config))
```
@@ -88,22 +83,3 @@ for query in queries:
# Query: Why did you divorce your first wife?
# Response: We divorced due to living apart for five years.
```
## Other methods
### Reset
Resets the database and deletes all embeddings. Irreversible. Requires reinitialization afterwards.
```python
app.reset()
```
### Count
Counts the number of embeddings (chunks) in the database.
```python
print(app.count())
# returns: 481
```

View File

@@ -0,0 +1,74 @@
---
title: '🤝 Interface types'
---
## Interface Types
The embedchain app exposes the following methods.
### Query Interface
- This interface is like a question answering bot. It takes a question and gets the answer. It does not maintain context about the previous chats.❓
- To use this, call the `.query()` function to get the answer for any query.
```python
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
### Chat Interface
- This is a chat interface that remembers previous conversations. Right now it remembers 5 conversations by default. 💬
- To use this, call the `.chat()` function to get the answer for any query.
```python
print(naval_chat_bot.chat("How to be happy in life?"))
# answer: The most important trick to being happy is to realize happiness is a skill you develop and a choice you make. You choose to be happy, and then you work at it. It's just like building muscles or succeeding at your job. It's about recognizing the abundance and gifts around you at all times.
print(naval_chat_bot.chat("who is naval ravikant?"))
# answer: Naval Ravikant is an Indian-American entrepreneur and investor.
print(naval_chat_bot.chat("what did the author say about happiness?"))
# answer: The author, Naval Ravikant, believes that happiness is a choice you make and a skill you develop. He compares the mind to the body, stating that just as the body can be molded and changed, so can the mind. He emphasizes the importance of being present in the moment and not getting caught up in regrets of the past or worries about the future. By being present and grateful for where you are, you can experience true happiness.
```
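
One way to picture "remembers 5 conversations" is a fixed-size buffer that drops the oldest exchange when a new one arrives. The sketch below illustrates that idea with `collections.deque`. It is not embedchain's implementation, and `ChatHistory` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Tuple


class ChatHistory:
    """Hypothetical bounded memory: keeps only the most recent exchanges."""

    def __init__(self, max_exchanges: int = 5) -> None:
        # A deque with maxlen discards the oldest item once it is full.
        self._exchanges: Deque[Tuple[str, str]] = deque(maxlen=max_exchanges)

    def remember(self, question: str, answer: str) -> None:
        self._exchanges.append((question, answer))

    def as_context(self) -> str:
        """Render the remembered exchanges as text that could precede a new prompt."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self._exchanges)


history = ChatHistory()
for i in range(7):
    history.remember(f"question {i}", f"answer {i}")

# Only the five most recent exchanges (2 through 6) are still present.
print(history.as_context())
```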
### Stream Response
- You can add a config to your query method to stream responses like ChatGPT does. You need a downstream handler to render the chunks in your desired format. Both the OpenAI model and `OpenSourceApp` are supported. 📊
- To use this, instantiate a `QueryConfig` or `ChatConfig` object with `stream=True`. Then pass it to the `.chat()` or `.query()` method. The following example iterates through the chunks and prints them as they appear.
```python
from embedchain import App
from embedchain.config import QueryConfig

app = App()
query_config = QueryConfig(stream=True)
resp = app.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", query_config)
# Print each chunk as soon as it arrives
for chunk in resp:
    print(chunk, end="", flush=True)
# answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
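
A downstream handler can be as small as a function that renders each chunk and keeps the full text for later use. The sketch below is independent of embedchain: `fake_stream` is a hypothetical stand-in for the iterable returned when `stream=True`, and `render_stream` is a name introduced here.

```python
import sys
from typing import Iterable, Iterator, TextIO


def render_stream(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each chunk as it arrives and return the assembled answer."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately instead of buffering
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streamed response."""
    yield from ["Happiness ", "is ", "a ", "skill."]


answer = render_stream(fake_stream())
```

Returning the joined text lets the caller log or store the answer after it has been displayed, since a streamed response can usually be iterated only once.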
### Other Methods
#### Dry Run
Dry run accepts all the options that `query` does, but it doesn't send the prompt to the LLM, which saves money. It's used for [testing](/advanced/testing#dry-run).
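
The idea behind a dry run can be sketched without any LLM: assemble the prompt from the retrieved context and return it instead of sending it. The code below is an illustration only, not embedchain's implementation. `build_prompt`, `answer`, `failing_llm`, and the prompt wording are all hypothetical.

```python
from typing import Callable, List


def build_prompt(question: str, contexts: List[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    joined = "\n".join(contexts)
    return f"Use the following context to answer.\nContext:\n{joined}\nQuestion: {question}"


def answer(question: str, contexts: List[str], llm: Callable[[str], str], dry_run: bool = False) -> str:
    """Return the prompt itself on a dry run; otherwise return the LLM's reply."""
    prompt = build_prompt(question, contexts)
    if dry_run:
        return prompt  # no LLM call, so no tokens are spent
    return llm(prompt)


def failing_llm(prompt: str) -> str:
    """Stand-in LLM that proves the dry run never reaches the model."""
    raise RuntimeError("the LLM must not be called during a dry run")


contexts = ["Naval Ravikant is an Indian-American entrepreneur and investor."]
prompt = answer("Who is Naval Ravikant?", contexts, llm=failing_llm, dry_run=True)
print(prompt)
```

Inspecting the returned prompt shows whether the expected document was retrieved before any paid call is made.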
#### Reset
Resets the database and deletes all embeddings. Irreversible. Requires reinitialization afterwards.
```python
app.reset()
```
#### Count
Counts the number of embeddings (chunks) in the database.
```python
print(app.count())
# returns: 481
```

View File

@@ -4,11 +4,11 @@ title: '🔍 Query configurations'
## AppConfig
| option | description | type | default |
|-----------|-----------------------|---------------------------------|------------------------|
| log_level | log level | string | WARNING |
| ef | embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} |
| db | vector database (experimental) | BaseVectorDB | ChromaDB |
| option | description | type | default |
|-------------|-----------------------|---------------------------------|------------------------|
| log_level | log level | string | WARNING |
| embedding_fn| embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} |
| db | vector database (experimental) | BaseVectorDB | ChromaDB |
## AddConfig

View File

@@ -2,6 +2,10 @@
title: '🧪 Testing'
---
## Methods for testing
### Dry Run
Before you consume valuable tokens, you should make sure that the embedding you have done works and that it's receiving the correct document from the database.
For this you can use the `dry_run` method.

View File

@@ -32,7 +32,7 @@
},
{
"group": "Advanced",
"pages": ["advanced/app_types", "advanced/data_types", "advanced/query_configuration", "advanced/configuration", "advanced/testing", "advanced/showcase"]
"pages": ["advanced/app_types", "advanced/interface_types", "advanced/adding_data","advanced/data_types", "advanced/query_configuration", "advanced/configuration", "advanced/testing", "advanced/showcase"]
},
{
"group": "Contribution Guidelines",

View File

@@ -9,6 +9,12 @@ Install embedchain python package:
pip install embedchain
```
Creating a chatbot involves 3 steps:
- ⚙️ Import the App instance
- 🗃️ Add Dataset
- 💬 Query or Chat on the dataset and get answers (Interface Types)
Run your first bot in Python using the following code. Make sure to set the `OPENAI_API_KEY` 🔑 environment variable in the code.
```python