[Docs] Revamp documentation (#1010)

This commit is contained in:
Deshraj Yadav
2023-12-15 05:14:17 +05:30
committed by GitHub
parent b7a44ef472
commit d54cdc5b00
81 changed files with 1223 additions and 378 deletions

View File

@@ -0,0 +1,16 @@
---
title: "🐝 Beehiiv"
---
To add any Beehiiv data source to your app, just add the newsletter's base URL as the source and set the `data_type` to `beehiiv`.
```python
from embedchain import Pipeline as App
app = App()
# source: just add the base url and set the data_type to 'beehiiv'
app.add('https://aibreakfast.beehiiv.com', data_type='beehiiv')
app.query("How much is OpenAI paying developers?")
# Answer: OpenAI is aggressively recruiting Google's top AI researchers with offers ranging between $5 to $10 million annually, primarily in stock options.
```

View File

@@ -0,0 +1,19 @@
---
title: '📊 CSV'
---
To add any CSV file, use the data_type as `csv`. `csv` allows remote URLs and conventional file paths. Headers are included for each line, so if you have an `age` column, the value `18` will be added as `age: 18`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv', data_type="csv")
# Or add using the local file path
# app.add('/path/to/file.csv', data_type="csv")
app.query("Summarize the air travel data")
# Answer: The air travel data shows the number of flights for the months of July in the years 1958, 1959, and 1960. In July 1958, there were 491 flights, in July 1959 there were 548 flights, and in July 1960 there were 622 flights.
```
Note: There is a size limit for CSV files beyond which adding them can throw an error. This limit depends on the LLM you use. Please consider splitting large CSV files into smaller ones.
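For example, here is a minimal sketch that splits a large CSV into smaller files with pandas before adding them; the file name and chunk size are placeholders chosen for illustration, not values required by Embedchain:
```python
import pandas as pd
from embedchain import Pipeline as App
app = App()
# Split the CSV into chunks of 500 rows each (size chosen for illustration)
for i, chunk in enumerate(pd.read_csv("large_file.csv", chunksize=500)):
    part_path = f"large_file_part_{i}.csv"
    chunk.to_csv(part_path, index=False)  # write each chunk to its own smaller CSV
    app.add(part_path, data_type="csv")
```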

View File

@@ -0,0 +1,41 @@
---
title: '⚙️ Custom'
---
When we say "custom", we mean that you can customize the loader and chunker to your needs. This is done by passing a custom loader and chunker to the `add` method.
```python
from embedchain import Pipeline as App
# Placeholders: import your own loader and chunker classes here
from your_module import YourLoader, YourChunker
app = App()
loader = YourLoader()
chunker = YourChunker()
app.add("source", data_type="custom", loader=loader, chunker=chunker)
```
<Note>
The custom loader and chunker must be a class that inherits from the [`BaseLoader`](https://github.com/embedchain/embedchain/blob/main/embedchain/loaders/base_loader.py) and [`BaseChunker`](https://github.com/embedchain/embedchain/blob/main/embedchain/chunkers/base_chunker.py) classes respectively.
</Note>
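As a rough sketch, a custom loader might look like the following; the class name is made up and the exact return format is an assumption, so check the linked `BaseLoader` source for the interface your version expects:
```python
import hashlib
from embedchain.loaders.base_loader import BaseLoader

class MyLoader(BaseLoader):
    def load_data(self, url):
        # Fetch and parse your source here; this sketch just wraps the raw string.
        content = f"Data loaded from {url}"
        return {
            # Assumed shape: a stable doc id plus a list of content/metadata pairs
            "doc_id": hashlib.sha256(url.encode()).hexdigest(),
            "data": [{"content": content, "meta_data": {"url": url}}],
        }
```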
<Note>
If the `data_type` is not a valid data type, the `add` method will fall back to the `custom` data type and expect a custom loader and chunker to be passed by the user.
</Note>
Example:
```python
from embedchain import Pipeline as App
from embedchain.loaders.github import GithubLoader
app = App()
loader = GithubLoader(config={"token": "ghp_xxx"})
app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)
app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs). It allows users to seamlessly load, index, retrieve, and sync unstructured data in order to build dynamic, LLM-powered applications. There is also a JavaScript implementation called embedchain-js available on GitHub.
```

View File

@@ -0,0 +1,64 @@
---
title: 'Data type handling'
---
## Automatic data type detection
The add method automatically tries to detect the data_type, based on your input for the source argument. So `app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ')` is enough to embed a YouTube video.
This detection is implemented for all formats. It is based on factors such as whether it's a URL, a local file, the source data type, etc.
### Debugging automatic detection
Set `log_level: DEBUG` in the config yaml to check whether the data type detection is working correctly. Otherwise, you will not know when, for instance, an invalid file path is silently interpreted as raw text.
### Forcing a data type
To avoid any issues with data type detection, you can **force** a `data_type` by passing it as an argument to the `add` method.
The examples below show you the keyword to force the respective `data_type`.
Forcing a data type is also useful for edge cases, such as interpreting a sitemap as a `web_page` to read its raw text instead of following its links, as shown below.
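For instance, a sitemap URL can be forced to load as a plain web page (the URL below is just a placeholder):
```python
from embedchain import Pipeline as App
app = App()
# Force the sitemap to be treated as a web page instead of following its links
app.add('https://example.com/sitemap.xml', data_type='web_page')
```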
## Remote data types
<Tip>
**Use local files in remote data types**
Some data_types are meant for remote content and only work with URLs.
You can pass local files by formatting the path using the `file:` [URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme), e.g. `file:///info.pdf`.
</Tip>
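For example, a local PDF can be passed to a remote-oriented loader like this (the path is illustrative):
```python
from embedchain import Pipeline as App
app = App()
# Local file addressed via the file: URI scheme
app.add('file:///info.pdf', data_type='pdf_file')
```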
## Reusing a vector database
The default behavior is to create a persistent vector database in the **./db** directory. You can split your application into two Python scripts: one that creates the local vector database and another that reuses it. This is useful when you want to index hundreds of documents and implement a chat interface separately.
Create a local index:
```python
from embedchain import Pipeline as App
naval_chat_bot = App()
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
```
You can reuse the local index with the same code, but without adding new documents:
```python
from embedchain import Pipeline as App
naval_chat_bot = App()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
```
## Resetting an app and vector database
You can reset the app by simply calling the `reset` method. This will delete the vector database and all other app-related files.
```python
from embedchain import Pipeline as App
app = App()
app.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
app.reset()
```

View File

@@ -0,0 +1,28 @@
---
title: "💬 Discord"
---
To add any Discord channel messages to your app, just add the `channel_id` as the source and set the `data_type` to `discord`.
<Note>
This loader requires a Discord bot token with read messages access.
To obtain the token, follow the instructions provided in this tutorial:
<a href="https://www.writebots.com/discord-bot-token/">How to Get a Discord Bot Token?</a>.
</Note>
```python
import os
from embedchain import Pipeline as App
# add your discord "BOT" token
os.environ["DISCORD_TOKEN"] = "xxx"
app = App()
app.add("1177296711023075338", data_type="discord")
response = app.query("What is Joe saying about Elon Musk?")
print(response)
# Answer: Joe is saying "Elon Musk is a genius".
```

View File

@@ -0,0 +1,44 @@
---
title: '🗨️ Discourse'
---
You can now easily load data from your community built with [Discourse](https://discourse.org/).
## Example
1. Set up the Discourse loader with your community URL.
```Python
from embedchain.loaders.discourse import DiscourseLoader
discourse_loader = DiscourseLoader(config={"domain": "https://community.openai.com"})
```
2. Once you have set up the loader, you can create an app and load data using the above Discourse loader:
```Python
import os
from embedchain.pipeline import Pipeline as App
os.environ["OPENAI_API_KEY"] = "sk-xxx"
app = App()
app.add("openai after:2023-10-1", data_type="discourse", loader=dicourse_loader)
question = "Where can I find the OpenAI API status page?"
app.query(question)
# Answer: You can find the OpenAI API status page at https://status.openai.com/.
```
NOTE: The `add` function of the app will accept any executable search query to load data. Refer to the [Discourse API Docs](https://docs.discourse.org/#tag/Search) to learn more about search queries.
3. We automatically create a chunker to chunk your Discourse data. However, if you wish to provide your own chunker class, here is how you can do that:
```Python
from embedchain.chunkers.discourse import DiscourseChunker
from embedchain.config.add_config import ChunkerConfig
discourse_chunker_config = ChunkerConfig(chunk_size=1000, chunk_overlap=0, length_function=len)
discourse_chunker = DiscourseChunker(config=discourse_chunker_config)
app.add("openai", data_type='discourse', loader=dicourse_loader, chunker=discourse_chunker)
```

View File

@@ -0,0 +1,14 @@
---
title: '📚 Code documentation'
---
To add any code documentation website as a loader, use the data_type as `docs_site`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add("https://docs.embedchain.ai/", data_type="docs_site")
app.query("What is Embedchain?")
# Answer: Embedchain is a platform that utilizes various components, including paid/proprietary ones, to provide what is believed to be the best configuration available. It uses LLM (Language Model) providers such as OpenAI, Anthropic, Vertex_AI, GPT4ALL, Azure_OpenAI, LLAMA2, JINA, and COHERE. Embedchain allows users to import and utilize these LLM providers for their applications.
```

View File

@@ -0,0 +1,18 @@
---
title: '📄 Docx file'
---
### Docx file
To add any doc/docx file, use the data_type as `docx`. `docx` allows remote URLs and conventional file paths. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('https://example.com/content/intro.docx', data_type="docx")
# Or add file using the local file path on your system
# app.add('content/intro.docx', data_type="docx")
app.query("Summarize the docx data?")
```

View File

@@ -0,0 +1,50 @@
---
title: 📝 Github
---
1. Set up the GitHub loader by configuring it with your GitHub personal access token (PAT). Check out [this](https://docs.github.com/en/enterprise-server@3.6/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token) link to learn how to create a PAT.
```Python
from embedchain.loaders.github import GithubLoader
loader = GithubLoader(
config={
"token":"ghp_xxxx"
}
)
```
2. Once you have set up the loader, you can create an app and load data using the above GitHub loader:
```Python
import os
from embedchain.pipeline import Pipeline as App
os.environ["OPENAI_API_KEY"] = "sk-xxxx"
app = App()
app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader)
response = app.query("What is Embedchain?")
# Answer: Embedchain is a Data Platform for Large Language Models (LLMs). It allows users to seamlessly load, index, retrieve, and sync unstructured data in order to build dynamic, LLM-powered applications. There is also a JavaScript implementation called embedchain-js available on GitHub.
```
The `add` function of the app will accept any valid GitHub query with qualifiers. It only supports loading GitHub code, repositories, issues, and pull requests.
<Note>
You must provide qualifiers `type:` and `repo:` in the query. The `type:` qualifier can be a combination of `code`, `repo`, `pr`, `issue`. The `repo:` qualifier must be a valid github repository name.
</Note>
<Card title="Valid queries" icon="lightbulb" iconType="duotone" color="#ca8b04">
- `repo:embedchain/embedchain type:repo` - to load the repository
- `repo:embedchain/embedchain type:issue,pr` - to load the issues and pull-requests of the repository
- `repo:embedchain/embedchain type:issue state:closed` - to load the closed issues of the repository
</Card>
3. We automatically create a chunker to chunk your GitHub data. However, if you wish to provide your own chunker class, here is how you can do that:
```Python
from embedchain.chunkers.common_chunker import CommonChunker
from embedchain.config.add_config import ChunkerConfig
github_chunker_config = ChunkerConfig(chunk_size=2000, chunk_overlap=0, length_function=len)
github_chunker = CommonChunker(config=github_chunker_config)
app.add("repo:embedchain/embedchain type:repo", data_type="github", loader=loader, chunker=github_chunker)
```

View File

@@ -0,0 +1,34 @@
---
title: '📬 Gmail'
---
To use GmailLoader, you must install the extra dependencies with `pip install --upgrade "embedchain[gmail]"`.
The `source` must be a valid Gmail search query; refer to https://support.google.com/mail/answer/7190?hl=en to learn how to build one.
To load Gmail messages, you MUST use the data_type as `gmail`. Otherwise the source will be detected as plain `text`.
To use this loader, you need to save `credentials.json` in the directory from which you will run it. Follow these steps to get the credentials:
1. Go to the [Google Cloud Console](https://console.cloud.google.com/apis/credentials).
2. Create a project if you don't have one already.
3. Create an `OAuth Consent Screen` in the project. You may need to select the `external` option.
4. Make sure the consent screen is published.
5. Enable the [Gmail API](https://console.cloud.google.com/apis/api/gmail.googleapis.com)
6. Create credentials from the `Credentials` tab.
7. Select the type `OAuth Client ID`.
8. Choose the application type `Web application`. As a name you can choose `embedchain` or any other name as per your use case.
9. Add an authorized redirect URI for `http://localhost:8080/`.
10. You can leave everything else at the default and finish the creation.
11. When you are done, a modal opens where you can download the details in `json` format.
12. Put the `.json` file in your current directory and rename it to `credentials.json`.
```python
from embedchain import Pipeline as App
app = App()
gmail_filter = "to: me label:inbox"
app.add(gmail_filter, data_type="gmail")
app.query("Summarize my email conversations")
```

View File

@@ -0,0 +1,44 @@
---
title: '📃 JSON'
---
To add any JSON file, use the data_type as `json`. Keys are included with each value, so for example if you have a JSON like `{"age": 18}`, it will be added as `age: 18`.
Here are the supported sources for loading `json`:
```
1. URL - a valid URL to a JSON file that ends with the ".json" extension.
2. Local file - a valid path to a local JSON file that ends with the ".json" extension.
3. String - a valid JSON string (e.g. - app.add('{"foo": "bar"}'))
```
<Tip>
If you would like to add other data structures (e.g. list, dict etc.), convert them to a valid JSON string first using the `json.dumps()` function.
</Tip>
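For example, a Python dict can be converted with `json.dumps()` and then added (the dict below is illustrative):
```python
import json
from embedchain import Pipeline as App
app = App()
data = {"answer": "As of October 2023, Elon Musk's net worth is $255.2 billion."}
app.add(json.dumps(data), data_type="json")  # convert the dict to a JSON string first
```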
## Example
<CodeGroup>
```python python
from embedchain import Pipeline as App
app = App()
# Add json file
app.add("temp.json")
app.query("What is the net worth of Elon Musk as of October 2023?")
# As of October 2023, Elon Musk's net worth is $255.2 billion.
```
```json temp.json
{
"question": "What is your net worth, Elon Musk?",
"answer": "As of October 2023, Elon Musk's net worth is $255.2 billion, making him one of the wealthiest individuals in the world."
}
```
</CodeGroup>

View File

@@ -0,0 +1,14 @@
---
title: '📝 Mdx file'
---
To add any `.mdx` file to your app, use the data_type as `mdx`. Note that this only supports mdx files present on your local machine, so the source should be a file path. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('path/to/file.mdx', data_type='mdx')
app.query("What are the docs about?")
```

View File

@@ -0,0 +1,47 @@
---
title: '🐬 MySQL'
---
1. Set up the MySQL loader by configuring the connection to your MySQL database.
```Python
from embedchain.loaders.mysql import MySQLLoader
config = {
"host": "host",
"port": "port",
"database": "database",
"user": "username",
"password": "password",
}
mysql_loader = MySQLLoader(config=config)
```
For more details on how to set up a valid config, check the MySQL [documentation](https://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html).
2. Once you have set up the loader, you can create an app and load data using the above MySQL loader:
```Python
from embedchain.pipeline import Pipeline as App
app = App()
app.add("SELECT * FROM table_name;", data_type='mysql', loader=mysql_loader)
# Adds `(1, 'What is your net worth, Elon Musk?', "As of October 2023, Elon Musk's net worth is $255.2 billion.")`
question = "What is Elon Musk's net worth?"
response = app.query(question)
# Answer: As of October 2023, Elon Musk's net worth is $255.2 billion.
```
NOTE: The `add` function of the app will accept any executable query to load data. DO NOT pass `CREATE` or `INSERT` queries to the `add` function.
3. We automatically create a chunker to chunk your SQL data. However, if you wish to provide your own chunker class, here is how you can do that:
```Python
from embedchain.chunkers.mysql import MySQLChunker
from embedchain.config.add_config import ChunkerConfig
mysql_chunker_config = ChunkerConfig(chunk_size=1000, chunk_overlap=0, length_function=len)
mysql_chunker = MySQLChunker(config=mysql_chunker_config)
app.add("SELECT * FROM table_name;", data_type='mysql', loader=mysql_loader, chunker=mysql_chunker)
```

View File

@@ -0,0 +1,20 @@
---
title: '📓 Notion'
---
To use the Notion loader, you must install the extra dependencies with `pip install --upgrade "embedchain[community]"`.
To load a Notion page, use the data_type as `notion`. Since it is hard to detect automatically, it is advised to specify the `data_type` when adding a Notion document.
The source argument must **end** with the Notion page id, which is a 32-character string. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add("cfbc134ca6464fc980d0391613959196", data_type="notion")
app.add("my-page-cfbc134ca6464fc980d0391613959196", data_type="notion")
app.add("https://www.notion.so/my-page-cfbc134ca6464fc980d0391613959196", data_type="notion")
app.query("Summarize the notion doc")
```

View File

@@ -0,0 +1,22 @@
---
title: 🙌 OpenAPI
---
To add any OpenAPI spec YAML file (currently, JSON spec files will be detected as the JSON data type), use the data_type as `openapi`. `openapi` allows remote URLs and conventional file paths.
```python
from embedchain import Pipeline as App
app = App()
app.add("https://github.com/openai/openai-openapi/blob/master/openapi.yaml", data_type="openapi")
# Or add using the local file path
# app.add("configs/openai_openapi.yaml", data_type="openapi")
app.query("What can OpenAI API endpoint do? Can you list the things it can learn from?")
# Answer: The OpenAI API endpoint allows users to interact with OpenAI's models and perform various tasks such as generating text, answering questions, summarizing documents, translating languages, and more. The specific capabilities and tasks that the API can learn from may vary depending on the models and features provided by OpenAI. For more detailed information, it is recommended to refer to the OpenAI API documentation at https://platform.openai.com/docs/api-reference.
```
<Note>
The YAML file added to the app must contain the required OpenAPI fields, otherwise adding the OpenAPI spec will fail. Please refer to the [OpenAPI Spec Doc](https://spec.openapis.org/oas/v3.1.0).
</Note>

View File

@@ -0,0 +1,36 @@
---
title: Overview
---
Embedchain comes with built-in support for various data sources. We handle the complexity of loading unstructured data from these data sources, allowing you to easily customize your app through a user-friendly interface.
<CardGroup cols={4}>
<Card title="📰 PDF file" href="/components/data-sources/pdf-file"></Card>
<Card title="📊 CSV file" href="/components/data-sources/csv"></Card>
<Card title="📃 JSON file" href="/components/data-sources/json"></Card>
<Card title="📺 Youtube" href="/components/data-sources/youtube-video"></Card>
<Card title="📝 Text" href="/components/data-sources/text"></Card>
<Card title="📚 Documentation website" href="/components/data-sources/docs-site"></Card>
<Card title="📄 DOCX file" href="/components/data-sources/docx"></Card>
<Card title="📝 MDX file" href="/components/data-sources/mdx"></Card>
<Card title="📓 Notion" href="/components/data-sources/notion"></Card>
<Card title="❓💬 Q&A pair" href="/components/data-sources/qna"></Card>
<Card title="🗺️ Sitemap" href="/components/data-sources/sitemap"></Card>
<Card title="🌐 Web page" href="/components/data-sources/web-page"></Card>
<Card title="🧾 XML file" href="/components/data-sources/xml"></Card>
<Card title="🙌 OpenAPI" href="/components/data-sources/openapi"></Card>
<Card title="📬 Gmail" href="/components/data-sources/gmail"></Card>
<Card title="🐘 Postgres" href="/components/data-sources/postgres"></Card>
<Card title="🐬 MySQL" href="/components/data-sources/mysql"></Card>
<Card title="🤖 Slack" href="/components/data-sources/slack"></Card>
<Card title="🗨️ Discourse" href="/components/data-sources/discourse"></Card>
<Card title="💬 Discord" href="/components/data-sources/discord"></Card>
<Card title="📝 Github" href="/components/data-sources/github"></Card>
<Card title="⚙️ Custom" href="/components/data-sources/custom"></Card>
<Card title="📝 Substack" href="/components/data-sources/substack"></Card>
<Card title="🐝 Beehiiv" href="/components/data-sources/beehiiv"></Card>
</CardGroup>
<br />
<Snippet file="missing-data-source-tip.mdx" />

View File

@@ -0,0 +1,17 @@
---
title: '📰 PDF file'
---
To add any pdf file, use the data_type as `pdf_file`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('https://arxiv.org/pdf/1706.03762.pdf', data_type='pdf_file')
app.query("What is the paper 'attention is all you need' about?")
# Answer: The paper "Attention Is All You Need" proposes a new network architecture called the Transformer, which is based solely on attention mechanisms. It suggests moving away from complex recurrent or convolutional neural networks and instead using attention mechanisms to connect the encoder and decoder in sequence transduction models.
```
Note that we do not support password-protected PDFs.

View File

@@ -0,0 +1,64 @@
---
title: '🐘 Postgres'
---
1. Set up the Postgres loader by configuring the connection to your Postgres database.
```Python
from embedchain.loaders.postgres import PostgresLoader
config = {
"host": "host_address",
"port": "port_number",
"dbname": "database_name",
"user": "username",
"password": "password",
}
"""
config = {
"url": "your_postgres_url"
}
"""
postgres_loader = PostgresLoader(config=config)
```
You can set up the loader either by passing the PostgreSQL connection URL or by providing the individual config fields.
For more details on how to set up a valid URL and config, check the Postgres [documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING:~:text=34.1.1.%C2%A0Connection%20Strings-,%23,-Several%20libpq%20functions).
NOTE: if you provide the `url` field in the config, all other fields are ignored.
2. Once you have set up the loader, you can create an app and load data using the above Postgres loader:
```Python
import os
from embedchain.pipeline import Pipeline as App
os.environ["OPENAI_API_KEY"] = "sk-xxx"
app = App()
question = "What is Elon Musk's networth?"
response = app.query(question)
# Answer: As of September 2021, Elon Musk's net worth is estimated to be around $250 billion, making him one of the wealthiest individuals in the world. However, please note that net worth can fluctuate over time due to various factors such as stock market changes and business ventures.
app.add("SELECT * FROM table_name;", data_type='postgres', loader=postgres_loader)
# Adds `(1, 'What is your net worth, Elon Musk?', "As of October 2023, Elon Musk's net worth is $255.2 billion.")`
response = app.query(question)
# Answer: As of October 2023, Elon Musk's net worth is $255.2 billion.
```
NOTE: The `add` function of the app will accept any executable query to load data. DO NOT pass `CREATE` or `INSERT` queries to the `add` function, as they will not add any data.
3. We automatically create a chunker to chunk your Postgres data. However, if you wish to provide your own chunker class, here is how you can do that:
```Python
from embedchain.chunkers.postgres import PostgresChunker
from embedchain.config.add_config import ChunkerConfig
postgres_chunker_config = ChunkerConfig(chunk_size=1000, chunk_overlap=0, length_function=len)
postgres_chunker = PostgresChunker(config=postgres_chunker_config)
app.add("SELECT * FROM table_name;", data_type='postgres', loader=postgres_loader, chunker=postgres_chunker)
```

View File

@@ -0,0 +1,13 @@
---
title: '❓💬 Question and answer pair'
---
QnA pair is a local data type. To supply your own QnA pair, use the data_type as `qna_pair` and enter a tuple. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add(("Question", "Answer"), data_type="qna_pair")
```

View File

@@ -0,0 +1,13 @@
---
title: '🗺️ Sitemap'
---
Add all web pages from an XML sitemap. Non-text files are filtered out. Use the data_type as `sitemap`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('https://example.com/sitemap.xml', data_type='sitemap')
```

View File

@@ -0,0 +1,54 @@
---
title: '🤖 Slack'
---
## Pre-requisite
- Download required packages by running `pip install --upgrade "embedchain[slack]"`.
- Configure your Slack user token as the environment variable `SLACK_USER_TOKEN`.
- Find your user token on your [Slack Account](https://api.slack.com/authentication/token-types)
- Make sure your slack user token includes [search](https://api.slack.com/scopes/search:read) scope.
## Example
1. Set up the Slack loader by configuring the Slack WebClient.
```Python
import os
from embedchain.loaders.slack import SlackLoader
os.environ["SLACK_USER_TOKEN"] = "xoxp-*"
loader = SlackLoader()
"""
config = {
'base_url': slack_app_url,
'headers': web_headers,
'team_id': slack_team_id,
}
loader = SlackLoader(config)
"""
```
NOTE: you can also pass a `config` with `base_url`, `headers`, and `team_id` to set up your SlackLoader.
2. Once you have set up the loader, you can create an app and load data using the above Slack loader:
```Python
import os
from embedchain.pipeline import Pipeline as App
app = App()
app.add("in:random", data_type="slack", loader=loader)
question = "Which bots are available in the slack workspace's random channel?"
# Answer: The available bot in the slack workspace's random channel is the Embedchain bot.
```
3. We automatically create a chunker to chunk your Slack data. However, if you wish to provide your own chunker class, here is how you can do that:
```Python
from embedchain.chunkers.slack import SlackChunker
from embedchain.config.add_config import ChunkerConfig
slack_chunker_config = ChunkerConfig(chunk_size=1000, chunk_overlap=0, length_function=len)
slack_chunker = SlackChunker(config=slack_chunker_config)
app.add("in:random", data_type="slack", loader=loader, chunker=slack_chunker)
```

View File

@@ -0,0 +1,16 @@
---
title: "📝 Substack"
---
To add any Substack data source to your app, just add the publication's base URL as the source and set the `data_type` to `substack`.
```python
from embedchain import Pipeline as App
app = App()
# source: for any substack just add the root URL
app.add('https://www.lennysnewsletter.com', data_type='substack')
app.query("Who is Brian Chesky?")
# Answer: Brian Chesky is the co-founder and CEO of Airbnb.
```

View File

@@ -0,0 +1,17 @@
---
title: '📝 Text'
---
### Text
Text is a local data type. To supply your own text, use the data_type as `text` and pass a string. The text is not processed further, which makes this data type very versatile. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.', data_type='text')
```
Note: This data type is not used in most examples because you will usually supply a whole paragraph or a file, which would not fit here.

View File

@@ -0,0 +1,13 @@
---
title: '🌐 Web page'
---
To add any web page, use the data_type as `web_page`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('a_valid_web_page_url', data_type='web_page')
```

View File

@@ -0,0 +1,17 @@
---
title: '🧾 XML file'
---
### XML file
To add any xml file, use the data_type as `xml`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('content/data.xml', data_type='xml')
```
Note: Only the text content of the xml file will be added to the app. The tags will be ignored.

View File

@@ -0,0 +1,13 @@
---
title: '📺 Youtube'
---
To add any YouTube video to your app, use the data_type as `youtube_video`. Eg:
```python
from embedchain import Pipeline as App
app = App()
app.add('a_valid_youtube_url_here', data_type='youtube_video')
```

View File