[Refactor] Converge Pipeline and App classes (#1021)

Author: Deven Patel
Co-authored-by: Deven Patel <deven298@yahoo.com>
Date: 2023-12-29 16:52:41 +05:30 (committed by GitHub)
Commit: a926bcc640 (parent: c0aafd38c9)
91 changed files with 646 additions and 875 deletions
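Every hunk below makes the same change: the docs stop importing `Pipeline` under the alias `App` and import the converged `App` class directly. A common way to converge two class names while keeping old imports working is a deprecated subclass alias; the sketch below is a hypothetical illustration of that pattern in plain Python, not embedchain's actual implementation.

```python
import warnings

class App:
    """Canonical class after the convergence."""
    def __init__(self):
        self.sources = []

    def add(self, source, data_type=None):
        # Record the source; a real App would load, chunk and embed it.
        self.sources.append((source, data_type))

class Pipeline(App):
    """Deprecated name kept so `from embedchain import Pipeline as App`
    keeps working while callers migrate to `App`."""
    def __init__(self, *args, **kwargs):
        warnings.warn("Pipeline is deprecated; use App instead",
                      DeprecationWarning, stacklevel=2)
        super().__init__(*args, **kwargs)
```

With this shape, old code still runs (with a warning) and new code imports `App` directly, matching the doc changes in this commit.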


@@ -5,7 +5,7 @@ title: "🐝 Beehiiv"
To add any Beehiiv data sources to your app, just add the base url as the source and set the data_type to `beehiiv`.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📊 CSV'
To add any csv file, use the data_type as `csv`. `csv` allows remote urls and conventional file paths. Headers are included for each line, so if you have an `age` column, `18` will be added as `age: 18`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv', data_type="csv")
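The header-inclusion behavior described above (an `age` column holding `18` becomes `age: 18`) can be sketched in plain Python. `csv_rows_to_chunks` is an illustrative helper under that assumption, not embedchain's actual CSV loader.

```python
import csv
import io

def csv_rows_to_chunks(csv_text):
    """Turn each CSV row into a line where every value is prefixed
    with its column header, e.g. `age: 18`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [", ".join(f"{header}: {value}" for header, value in row.items())
            for row in reader]
```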


@@ -5,7 +5,7 @@ title: '⚙️ Custom'
When we say "custom", we mean that you can customize the loader and chunker to your needs. This is done by passing a custom loader and chunker to the `add` method.
```python
-from embedchain import Pipeline as App
+from embedchain import App
import your_loader
import your_chunker
@@ -27,7 +27,7 @@ app.add("source", data_type="custom", loader=loader, chunker=chunker)
Example:
```python
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.loaders.github import GithubLoader
app = App()
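A custom loader and chunker only need to satisfy a duck-typed contract when passed to `add`. The classes below are hypothetical shapes to illustrate that idea; embedchain's real base classes and method signatures may differ.

```python
class UppercaseLoader:
    """Toy loader: 'loads' a source by upper-casing it."""
    def load_data(self, source):
        return [{"content": source.upper(), "meta_data": {"url": source}}]

class SentenceChunker:
    """Toy chunker: splits loaded documents on full stops."""
    def create_chunks(self, documents):
        chunks = []
        for doc in documents:
            chunks.extend(s.strip() for s in doc["content"].split(".") if s.strip())
        return chunks
```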


@@ -35,7 +35,7 @@ Default behavior is to create a persistent vector db in the directory **./db**.
Create a local index:
```python
-from embedchain import Pipeline as App
+from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
@@ -45,7 +45,7 @@ naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Alma
You can reuse the local index with the same code, but without adding new documents:
```python
-from embedchain import Pipeline as App
+from embedchain import App
naval_chat_bot = App()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
@@ -56,7 +56,7 @@ print(naval_chat_bot.query("What unique capacity does Naval argue humans possess
You can reset the app by simply calling the `reset` method. This will delete the vector database and all other app-related files.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("https://www.youtube.com/watch?v=3qHkcs3kG44")


@@ -8,7 +8,7 @@ To use an entire directory as data source, just add `data_type` as `directory` a
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_KEY"] = "sk-xxx"
@@ -23,7 +23,7 @@ print(response)
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.loaders.directory_loader import DirectoryLoader
os.environ["OPENAI_API_KEY"] = "sk-xxx"
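Under the hood, a directory data source has to walk the tree and gather loadable files. This is a minimal sketch of that step; the extension filter is an arbitrary assumption, not what embedchain's `DirectoryLoader` actually uses.

```python
import os

def collect_files(root, extensions=(".txt", ".md", ".pdf")):
    """Walk `root` recursively and return matching file paths, sorted."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```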


@@ -12,7 +12,7 @@ To add any Discord channel messages to your app, just add the `channel_id` as th
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
# add your discord "BOT" token
os.environ["DISCORD_TOKEN"] = "xxx"


@@ -5,7 +5,7 @@ title: '📚 Code documentation'
To add any code documentation website as a data source, use the data_type as `docs_site`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("https://docs.embedchain.ai/", data_type="docs_site")


@@ -7,7 +7,7 @@ title: '📄 Docx file'
To add any doc/docx file, use the data_type as `docx`. `docx` allows remote urls and conventional file paths. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('https://example.com/content/intro.docx', data_type="docx")


@@ -24,7 +24,7 @@ To use this you need to save `credentials.json` in the directory from where you
12. Put the `.json` file in your current directory and rename it to `credentials.json`
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -21,7 +21,7 @@ If you would like to add other data structures (e.g. list, dict etc.), convert i
<CodeGroup>
```python python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📝 Mdx file'
To add any `.mdx` file to your app, use the data_type (first argument to `.add()` method) as `mdx`. Note that this only supports mdx files present on the local machine, so this should be a file path. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('path/to/file.mdx', data_type='mdx')


@@ -8,7 +8,7 @@ To load a notion page, use the data_type as `notion`. Since it is hard to automa
The next argument must **end** with the `notion page id`. The id is a 32-character string. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
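The requirement above (the source must end with the 32-character notion page id) can be checked with a small helper. `notion_page_id` is purely illustrative and not part of embedchain's API.

```python
import re

def notion_page_id(source):
    """Return the trailing 32-character hex page id from a notion source
    string; dashes are ignored. Raises ValueError if it is missing."""
    normalized = source.replace("-", "")
    match = re.search(r"[0-9a-fA-F]{32}$", normalized)
    if match is None:
        raise ValueError("source must end with the 32-character notion page id")
    return match.group(0).lower()
```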


@@ -5,7 +5,7 @@ title: 🙌 OpenAPI
To add any OpenAPI spec yaml file (currently the json file will be detected as JSON data type), use the data_type as 'openapi'. 'openapi' allows remote urls and conventional file paths.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📰 PDF file'
To add any pdf file, use the data_type as `pdf_file`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '❓💬 Question and answer pair'
QnA pair is a local data type. To supply your own QnA pair, use the data_type as `qna_pair` and enter a tuple. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '🗺️ Sitemap'
Add all web pages from an xml-sitemap. Filters non-text files. Use the data_type as `sitemap`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
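Parsing a sitemap and filtering out non-text files, as described above, amounts to collecting `<loc>` entries and dropping binary URLs. This sketch uses the standard library; the suffix list is an illustrative assumption, not embedchain's actual filter.

```python
from xml.etree import ElementTree

BINARY_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".zip", ".mp4")

def sitemap_page_urls(sitemap_xml):
    """Return the <loc> URLs in a sitemap, minus obviously non-text files."""
    root = ElementTree.fromstring(sitemap_xml)
    # Namespaced tags look like "{http://...}loc", so match on the suffix.
    urls = [el.text.strip() for el in root.iter()
            if el.tag.endswith("loc") and el.text]
    return [u for u in urls if not u.lower().endswith(BINARY_SUFFIXES)]
```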


@@ -16,7 +16,7 @@ This will automatically retrieve data from the workspace associated with the use
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["SLACK_USER_TOKEN"] = "xoxp-xxx"
app = App()


@@ -5,7 +5,7 @@ title: "📝 Substack"
To add any Substack data sources to your app, just add the main base url as the source and set the data_type to `substack`.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -7,7 +7,7 @@ title: '📝 Text'
Text is a local data type. To supply your own text, use the data_type as `text` and enter a string. The text is not processed, so this can be very versatile. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '🌐 Web page'
To add any web page, use the data_type as `web_page`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -7,7 +7,7 @@ title: '🧾 XML file'
To add any xml file, use the data_type as `xml`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -13,7 +13,7 @@ pip install -U "embedchain[youtube]"
</Note>
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("@channel_name", data_type="youtube_channel")


@@ -5,7 +5,7 @@ title: '📺 Youtube'
To add any youtube video to your app, use the data_type as `youtube_video`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('a_valid_youtube_url_here', data_type='youtube_video')


@@ -25,7 +25,7 @@ Once you have obtained the key, you can use it like this:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -52,7 +52,7 @@ To use Google AI embedding function, you have to set the `GOOGLE_API_KEY` enviro
<CodeGroup>
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["GOOGLE_API_KEY"] = "xxx"
@@ -81,7 +81,7 @@ To use Azure OpenAI embedding model, you have to set some of the azure openai re
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
@@ -119,7 +119,7 @@ GPT4All supports generating high quality embeddings of arbitrary length document
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -148,7 +148,7 @@ Hugging Face supports generating embeddings of arbitrary length documents of tex
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -179,7 +179,7 @@ Embedchain supports Google's VertexAI embeddings model through a simple interfac
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")


@@ -29,7 +29,7 @@ Once you have obtained the key, you can use it like this:
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -44,7 +44,7 @@ If you are looking to configure the different parameters of the LLM, you can do
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -71,7 +71,7 @@ Examples:
<Accordion title="Using Pydantic Models">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -123,7 +123,7 @@ print(result)
<Accordion title="Using OpenAI JSON schema">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -158,7 +158,7 @@ print(result)
<Accordion title="Using actual python functions">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -192,7 +192,7 @@ To use Google AI model, you have to set the `GOOGLE_API_KEY` environment variabl
<CodeGroup>
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["GOOGLE_API_KEY"] = "xxx"
@@ -235,7 +235,7 @@ To use Azure OpenAI model, you have to set some of the azure openai related envi
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
@@ -274,7 +274,7 @@ To use anthropic's model, please set the `ANTHROPIC_API_KEY` which you find on t
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["ANTHROPIC_API_KEY"] = "xxx"
@@ -311,7 +311,7 @@ Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["COHERE_API_KEY"] = "xxx"
@@ -347,7 +347,7 @@ Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["TOGETHER_API_KEY"] = "xxx"
@@ -375,7 +375,7 @@ Setup Ollama using https://github.com/jmorganca/ollama
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -406,7 +406,7 @@ GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or inte
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -438,7 +438,7 @@ Once you have the key, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["JINACHAT_API_KEY"] = "xxx"
# load llm configuration from config.yaml file
@@ -474,7 +474,7 @@ Once you have the token, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"
@@ -504,7 +504,7 @@ Once you have the token, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["REPLICATE_API_TOKEN"] = "xxx"
@@ -531,7 +531,7 @@ Setup Google Cloud Platform application credentials by following the instruction
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")


@@ -22,7 +22,7 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
@@ -67,7 +67,7 @@ You can authorize the connection to Elasticsearch by providing either `basic_aut
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -97,7 +97,7 @@ pip install --upgrade 'embedchain[opensearch]'
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -133,7 +133,7 @@ Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'
@@ -172,7 +172,7 @@ In order to use Pinecone as vector database, set the environment variables `PINE
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load pinecone configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -195,7 +195,7 @@ In order to use Qdrant as a vector database, set the environment variables `QDRA
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -215,7 +215,7 @@ In order to use Weaviate as a vector database, set the environment variables `WE
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")