[Refactor] Converge Pipeline and App classes (#1021)

Author: Deven Patel
Co-authored-by: Deven Patel <deven298@yahoo.com>
Date: 2023-12-29 16:52:41 +05:30 (committed by GitHub)
Commit: a926bcc640 (parent: c0aafd38c9)
91 changed files with 646 additions and 875 deletions
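Every hunk below makes the same change: the docs stop importing `Pipeline` under the alias `App` and import the converged `App` class directly. A common way to converge two class names while keeping old imports working is a deprecated subclass alias; the sketch below is a hypothetical illustration of that pattern in plain Python, not embedchain's actual implementation.

```python
import warnings

class App:
    """Canonical class after the convergence."""
    def __init__(self):
        self.sources = []

    def add(self, source, data_type=None):
        # Record the source; a real App would load, chunk and embed it.
        self.sources.append((source, data_type))

class Pipeline(App):
    """Deprecated name kept so `from embedchain import Pipeline as App`
    keeps working while callers migrate to `App`."""
    def __init__(self, *args, **kwargs):
        warnings.warn("Pipeline is deprecated; use App instead",
                      DeprecationWarning, stacklevel=2)
        super().__init__(*args, **kwargs)
```

With this shape, old code still runs (with a warning) and new code imports `App` directly, matching the doc changes in this commit.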


@@ -5,7 +5,7 @@ title: "🐝 Beehiiv"
To add any Beehiiv data sources to your app, just add the base url as the source and set the data_type to `beehiiv`.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📊 CSV'
To add any csv file, use the data_type as `csv`. `csv` allows remote urls and conventional file paths. Headers are included for each line, so if you have an `age` column, `18` will be added as `age: 18`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv', data_type="csv")
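The header-inclusion behavior described above (an `age` column holding `18` becomes `age: 18`) can be sketched in plain Python. `csv_rows_to_chunks` is an illustrative helper under that assumption, not embedchain's actual CSV loader.

```python
import csv
import io

def csv_rows_to_chunks(csv_text):
    """Turn each CSV row into a line where every value is prefixed
    with its column header, e.g. `age: 18`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [", ".join(f"{header}: {value}" for header, value in row.items())
            for row in reader]
```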


@@ -5,7 +5,7 @@ title: '⚙️ Custom'
When we say "custom", we mean that you can customize the loader and chunker to your needs. This is done by passing a custom loader and chunker to the `add` method.
```python
-from embedchain import Pipeline as App
+from embedchain import App
import your_loader
import your_chunker
@@ -27,7 +27,7 @@ app.add("source", data_type="custom", loader=loader, chunker=chunker)
Example:
```python
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.loaders.github import GithubLoader
app = App()
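A custom loader and chunker only need to satisfy a duck-typed contract when passed to `add`. The classes below are hypothetical shapes to illustrate that idea; embedchain's real base classes and method signatures may differ.

```python
class UppercaseLoader:
    """Toy loader: 'loads' a source by upper-casing it."""
    def load_data(self, source):
        return [{"content": source.upper(), "meta_data": {"url": source}}]

class SentenceChunker:
    """Toy chunker: splits loaded documents on full stops."""
    def create_chunks(self, documents):
        chunks = []
        for doc in documents:
            chunks.extend(s.strip() for s in doc["content"].split(".") if s.strip())
        return chunks
```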


@@ -35,7 +35,7 @@ Default behavior is to create a persistent vector db in the directory **./db**.
Create a local index:
```python
-from embedchain import Pipeline as App
+from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
@@ -45,7 +45,7 @@ naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Alma
You can reuse the local index with the same code, but without adding new documents:
```python
-from embedchain import Pipeline as App
+from embedchain import App
naval_chat_bot = App()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
@@ -56,7 +56,7 @@ print(naval_chat_bot.query("What unique capacity does Naval argue humans possess
You can reset the app by simply calling the `reset` method. This will delete the vector database and all other app-related files.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("https://www.youtube.com/watch?v=3qHkcs3kG44")


@@ -8,7 +8,7 @@ To use an entire directory as data source, just add `data_type` as `directory` a
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_KEY"] = "sk-xxx"
@@ -23,7 +23,7 @@ print(response)
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.loaders.directory_loader import DirectoryLoader
os.environ["OPENAI_API_KEY"] = "sk-xxx"
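Under the hood, a directory data source has to walk the tree and gather loadable files. This is a minimal sketch of that step; the extension filter is an arbitrary assumption, not what embedchain's `DirectoryLoader` actually uses.

```python
import os

def collect_files(root, extensions=(".txt", ".md", ".pdf")):
    """Walk `root` recursively and return matching file paths, sorted."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```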


@@ -12,7 +12,7 @@ To add any Discord channel messages to your app, just add the `channel_id` as th
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
# add your discord "BOT" token
os.environ["DISCORD_TOKEN"] = "xxx"


@@ -5,7 +5,7 @@ title: '📚 Code documentation'
To add any code documentation website as a data source, use the data_type as `docs_site`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("https://docs.embedchain.ai/", data_type="docs_site")


@@ -7,7 +7,7 @@ title: '📄 Docx file'
To add any doc/docx file, use the data_type as `docx`. `docx` allows remote urls and conventional file paths. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('https://example.com/content/intro.docx', data_type="docx")


@@ -24,7 +24,7 @@ To use this you need to save `credentials.json` in the directory from where you
12. Put the `.json` file in your current directory and rename it to `credentials.json`
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -21,7 +21,7 @@ If you would like to add other data structures (e.g. list, dict etc.), convert i
<CodeGroup>
```python python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📝 Mdx file'
To add any `.mdx` file to your app, use the data_type (first argument to `.add()` method) as `mdx`. Note that this only supports mdx files present on the local machine, so this should be a file path. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('path/to/file.mdx', data_type='mdx')


@@ -8,7 +8,7 @@ To load a notion page, use the data_type as `notion`. Since it is hard to automa
The next argument must **end** with the `notion page id`. The id is a 32-character string. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
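The requirement above (the source must end with the 32-character notion page id) can be checked with a small helper. `notion_page_id` is purely illustrative and not part of embedchain's API.

```python
import re

def notion_page_id(source):
    """Return the trailing 32-character hex page id from a notion source
    string; dashes are ignored. Raises ValueError if it is missing."""
    normalized = source.replace("-", "")
    match = re.search(r"[0-9a-fA-F]{32}$", normalized)
    if match is None:
        raise ValueError("source must end with the 32-character notion page id")
    return match.group(0).lower()
```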


@@ -5,7 +5,7 @@ title: 🙌 OpenAPI
To add any OpenAPI spec yaml file (currently the json file will be detected as JSON data type), use the data_type as 'openapi'. 'openapi' allows remote urls and conventional file paths.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '📰 PDF file'
To add any pdf file, use the data_type as `pdf_file`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '❓💬 Question and answer pair'
QnA pair is a local data type. To supply your own QnA pair, use the data_type as `qna_pair` and enter a tuple. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '🗺️ Sitemap'
Add all web pages from an xml-sitemap. Filters non-text files. Use the data_type as `sitemap`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
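Parsing a sitemap and filtering out non-text files, as described above, amounts to collecting `<loc>` entries and dropping binary URLs. This sketch uses the standard library; the suffix list is an illustrative assumption, not embedchain's actual filter.

```python
from xml.etree import ElementTree

BINARY_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".zip", ".mp4")

def sitemap_page_urls(sitemap_xml):
    """Return the <loc> URLs in a sitemap, minus obviously non-text files."""
    root = ElementTree.fromstring(sitemap_xml)
    # Namespaced tags look like "{http://...}loc", so match on the suffix.
    urls = [el.text.strip() for el in root.iter()
            if el.tag.endswith("loc") and el.text]
    return [u for u in urls if not u.lower().endswith(BINARY_SUFFIXES)]
```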


@@ -16,7 +16,7 @@ This will automatically retrieve data from the workspace associated with the use
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["SLACK_USER_TOKEN"] = "xoxp-xxx"
app = App()


@@ -5,7 +5,7 @@ title: "📝 Substack"
To add any Substack data sources to your app, just add the main base url as the source and set the data_type to `substack`.
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -7,7 +7,7 @@ title: '📝 Text'
Text is a local data type. To supply your own text, use the data_type as `text` and enter a string. The text is not processed, so this can be very versatile. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -5,7 +5,7 @@ title: '🌐 Web page'
To add any web page, use the data_type as `web_page`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -7,7 +7,7 @@ title: '🧾 XML file'
To add any xml file, use the data_type as `xml`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()


@@ -13,7 +13,7 @@ pip install -U "embedchain[youtube]"
</Note>
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add("@channel_name", data_type="youtube_channel")


@@ -5,7 +5,7 @@ title: '📺 Youtube'
To add any youtube video to your app, use the data_type as `youtube_video`. Eg:
```python
-from embedchain import Pipeline as App
+from embedchain import App
app = App()
app.add('a_valid_youtube_url_here', data_type='youtube_video')


@@ -25,7 +25,7 @@ Once you have obtained the key, you can use it like this:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -52,7 +52,7 @@ To use Google AI embedding function, you have to set the `GOOGLE_API_KEY` enviro
<CodeGroup>
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["GOOGLE_API_KEY"] = "xxx"
@@ -81,7 +81,7 @@ To use Azure OpenAI embedding model, you have to set some of the azure openai re
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
@@ -119,7 +119,7 @@ GPT4All supports generating high quality embeddings of arbitrary length document
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -148,7 +148,7 @@ Hugging Face supports generating embeddings of arbitrary length documents of tex
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -179,7 +179,7 @@ Embedchain supports Google's VertexAI embeddings model through a simple interfac
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")


@@ -29,7 +29,7 @@ Once you have obtained the key, you can use it like this:
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -44,7 +44,7 @@ If you are looking to configure the different parameters of the LLM, you can do
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['OPENAI_API_KEY'] = 'xxx'
@@ -71,7 +71,7 @@ Examples:
<Accordion title="Using Pydantic Models">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -123,7 +123,7 @@ print(result)
<Accordion title="Using OpenAI JSON schema">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -158,7 +158,7 @@ print(result)
<Accordion title="Using actual python functions">
```python
import os
-from embedchain import Pipeline as App
+from embedchain import App
from embedchain.llm.openai import OpenAILlm
import requests
from pydantic import BaseModel, Field, ValidationError, field_validator
@@ -192,7 +192,7 @@ To use Google AI model, you have to set the `GOOGLE_API_KEY` environment variabl
<CodeGroup>
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["GOOGLE_API_KEY"] = "xxx"
@@ -235,7 +235,7 @@ To use Azure OpenAI model, you have to set some of the azure openai related envi
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
@@ -274,7 +274,7 @@ To use anthropic's model, please set the `ANTHROPIC_API_KEY` which you find on t
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["ANTHROPIC_API_KEY"] = "xxx"
@@ -311,7 +311,7 @@ Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["COHERE_API_KEY"] = "xxx"
@@ -347,7 +347,7 @@ Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["TOGETHER_API_KEY"] = "xxx"
@@ -375,7 +375,7 @@ Setup Ollama using https://github.com/jmorganca/ollama
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -406,7 +406,7 @@ GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or inte
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
@@ -438,7 +438,7 @@ Once you have the key, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["JINACHAT_API_KEY"] = "xxx"
# load llm configuration from config.yaml file
@@ -474,7 +474,7 @@ Once you have the token, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"
@@ -504,7 +504,7 @@ Once you have the token, load the app using the config yaml file:
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ["REPLICATE_API_TOKEN"] = "xxx"
@@ -531,7 +531,7 @@ Setup Google Cloud Platform application credentials by following the instruction
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")


@@ -22,7 +22,7 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
@@ -67,7 +67,7 @@ You can authorize the connection to Elasticsearch by providing either `basic_aut
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -97,7 +97,7 @@ pip install --upgrade 'embedchain[opensearch]'
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -133,7 +133,7 @@ Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`
```python main.py
import os
-from embedchain import Pipeline as App
+from embedchain import App
os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'
@@ -172,7 +172,7 @@ In order to use Pinecone as vector database, set the environment variables `PINE
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load pinecone configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -195,7 +195,7 @@ In order to use Qdrant as a vector database, set the environment variables `QDRA
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
@@ -215,7 +215,7 @@ In order to use Weaviate as a vector database, set the environment variables `WE
<CodeGroup>
```python main.py
-from embedchain import Pipeline as App
+from embedchain import App
# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")