[docs]: Revamp embedchain docs (#799)
@@ -1,14 +1,19 @@
---
title: 'CSV'
title: '📊 CSV'
---

### CSV file

To add a CSV file, set `data_type` to `csv`. Both remote URLs and local file paths are accepted. Each value is stored with its column header, so a value of `18` in an `age` column is added as `age: 18`. Eg:

```python
app.add('https://example.com/content/sheet.csv', data_type="csv")
app.add('content/sheet.csv', data_type="csv")
from embedchain import App

app = App()
app.add('https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv', data_type="csv")
# Or add using the local file path
# app.add('/path/to/file.csv', data_type="csv")

app.query("Summarize the air travel data")
# Answer: The air travel data shows the number of flights for the months of July in the years 1958, 1959, and 1960. In July 1958, there were 491 flights, in July 1959 there were 548 flights, and in July 1960 there were 622 flights.
```

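The `age: 18` behaviour described above can be pictured with a small standard-library sketch. This is only an illustration of the idea of pairing each value with its header. It is not embedchain's loader, and the function name `render_rows` and the comma-joined output format are assumptions made for this example.

```python
import csv
import io


def render_rows(csv_text: str) -> list[str]:
    """Render every CSV row as 'header: value' pairs joined by commas.

    Illustrative only: the exact text embedchain produces may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ", ".join(f"{header}: {value}" for header, value in row.items())
        for row in reader
    ]


sample = "name,age\nAda,18\n"
print(render_rows(sample))  # ['name: Ada, age: 18']
```

Keeping the header next to each value means a single row still makes sense on its own after the file has been split into pieces.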
Note: CSV files above a certain size can raise an error. The limit is set by the LLM in use, so consider splitting large CSV files into smaller ones.

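One possible way to split a large CSV file into smaller ones, as the note suggests, is sketched below using only the standard library. The function name `split_csv`, the `rows_per_chunk` default, and the `_partN` file naming are assumptions for illustration. The right chunk size depends on the LLM in use and is not specified in these docs.

```python
import csv
from pathlib import Path


def _write_chunk(path: Path, header: list[str], rows: list[list[str]]) -> None:
    """Write one chunk file that starts with the original header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def split_csv(src: str, out_dir: str, rows_per_chunk: int = 1000) -> list[Path]:
    """Split `src` into numbered CSV files of at most `rows_per_chunk` data rows."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(src).stem
    written: list[Path] = []

    with open(src, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return written

        batch: list[list[str]] = []
        for row in reader:
            batch.append(row)
            if len(batch) == rows_per_chunk:
                path = out / f"{stem}_part{len(written) + 1}.csv"
                _write_chunk(path, header, batch)
                written.append(path)
                batch = []

        if batch:  # remaining rows that did not fill a whole chunk
            path = out / f"{stem}_part{len(written) + 1}.csv"
            _write_chunk(path, header, batch)
            written.append(path)

    return written
```

Each returned path can then be added on its own, for example with `app.add(str(path), data_type="csv")`.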
@@ -1,18 +1,16 @@
---
title: 'Data Type Handling'
title: 'Data type handling'
---

## Automatic data type detection

The `add` method tries to detect the `data_type` automatically from the value you pass as the source argument. So `app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ')` is enough to embed a YouTube video.

This detection is implemented for all formats. It is based on factors such as whether the source is a URL or a local file, the source data type, etc.

### Debugging automatic detection

Set `log_level=DEBUG` (in [AppConfig](http://localhost:3000/advanced/query_configuration#appconfig)) and make sure it's working as intended.

Otherwise, you will not know when, for instance, an invalid filepath is interpreted as raw text instead.
Set `log_level: DEBUG` in the config yaml to check whether the data type is detected correctly. Otherwise, you will not know when, for instance, an invalid filepath is interpreted as raw text instead.

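To see why a mistyped path can silently end up as raw text, consider the toy heuristic below. It is a deliberately simplified sketch and NOT embedchain's actual detection logic. The function name `guess_data_type`, the order of the checks, and the extension table are all assumptions made for illustration. Only the data type names come from these docs.

```python
import os
from urllib.parse import urlparse


def guess_data_type(source: str) -> str:
    """Toy heuristic only; embedchain's real detection is more involved."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if "youtube.com" in host or "youtu.be" in host:
            return "youtube_video"
        if parsed.path.lower().endswith(".csv"):
            return "csv"
        return "web_page"

    if os.path.isfile(source):
        extension = os.path.splitext(source)[1].lower()
        by_extension = {".csv": "csv", ".docx": "docx", ".mdx": "mdx", ".xml": "xml"}
        return by_extension.get(extension, "text")

    # Nothing matched: the string is treated as raw text,
    # even if the caller meant it to be a file path.
    return "text"


print(guess_data_type("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # youtube_video
print(guess_data_type("/no/such/dir/report.csv"))  # text
```

The second call shows the pitfall: a path that does not exist falls through every check and comes back as `text`. Debug logging or an explicit `data_type` is what exposes this kind of mistake.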
### Forcing a data type

@@ -21,7 +19,7 @@ The examples below show you the keyword to force the respective `data_type`.
Forcing can also be used for edge cases, such as interpreting a sitemap as a web_page, for reading its raw text instead of following links.

## Remote Data Types
## Remote data types

<Tip>
**Use local files in remote data types**
@@ -32,7 +30,7 @@ You can pass local files by formatting the path using the `file:` [URI scheme](h
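The hunk context above mentions formatting a local path with the `file:` URI scheme. A minimal sketch of producing such a URI with the standard library follows. The helper name `to_file_uri` and the sample path are hypothetical, and how embedchain consumes the resulting string is not shown in this diff.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Return a file: URI for a local path.

    Relative paths are resolved against the current working directory first,
    because Path.as_uri() only accepts absolute paths.
    """
    return Path(path).resolve().as_uri()


# Hypothetical relative path, used only for illustration.
print(to_file_uri("content/sheet.csv"))
```

The printed value depends on the working directory and the operating system, so no fixed output is shown here.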
## Reusing a vector database

Default behavior is to create a persistent vector DB in the directory **./db**. You can split your application into two Python scripts: one to create a local vector DB and the other to reuse this local persistent vector DB. This is useful when you want to index hundreds of documents and separately implement a chat interface.
Default behavior is to create a persistent vector db in the directory **./db**. You can split your application into two Python scripts: one to create a local vector db and the other to reuse this local persistent vector db. This is useful when you want to index hundreds of documents and separately implement a chat interface.

Create a local index:
@@ -1,11 +1,14 @@
---
title: 'Code Documentation'
title: '📚🌐 Code documentation'
---

### Code documentation

To add any code documentation website as a loader, use the data_type as `docs_site`. Eg:

```python
from embedchain import App

app = App()
app.add("https://docs.embedchain.ai/", data_type="docs_site")
```
app.query("What is Embedchain?")
# Answer: Embedchain is a platform that utilizes various components, including paid/proprietary ones, to provide what is believed to be the best configuration available. It uses LLM (Language Model) providers such as OpenAI, Anthropic, Vertex_AI, GPT4ALL, Azure_OpenAI, LLAMA2, JINA, and COHERE. Embedchain allows users to import and utilize these LLM providers for their applications.
```
@@ -1,5 +1,5 @@
---
title: 'Docx File'
title: '📄 Docx file'
---

### Docx file
@@ -7,6 +7,12 @@ title: 'Docx File'
To add any doc/docx file, use the data_type as `docx`. `docx` allows remote urls and conventional file paths. Eg:

```python
from embedchain import App

app = App()
app.add('https://example.com/content/intro.docx', data_type="docx")
app.add('content/intro.docx', data_type="docx")
```
# Or add file using the local file path on your system
# app.add('content/intro.docx', data_type="docx")

app.query("Summarize the docx data?")
```
@@ -1,24 +0,0 @@
---
title: 'How to add data'
---

## Add Dataset

- This step assumes that you have already created an `App`. We are calling our app instance as `naval_chat_bot` 🤖

- Now use `.add` method to add any dataset.

```python
naval_chat_bot = App()

# Embed Online Resources
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("https://nav.al/feedback")
naval_chat_bot.add("https://nav.al/agi")

# Embed Local Resources
naval_chat_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
```

The possible formats to add data can be found on the [Supported Data Formats](/advanced/data_types) page.
@@ -1,12 +1,14 @@
---
title: 'Mdx'
title: '📝 Mdx file'
---

### Mdx file

To add any mdx file to your app, set the `data_type` argument of the `.add()` method to `mdx`. Note that only mdx files present on your machine are supported, so the source should be a file path. Eg:
To add any `.mdx` file to your app, set the `data_type` argument of the `.add()` method to `mdx`. Note that only mdx files present on your machine are supported, so the source should be a file path. Eg:

```python
from embedchain import App

app = App()
app.add('path/to/file.mdx', data_type='mdx')
```

app.query("What are the docs about?")
```
@@ -1,15 +1,20 @@
---
title: 'Notion'
title: '📓 Notion'
---

### Notion
To use notion you must install the extra dependencies with `pip install --upgrade embedchain[notion]`.

To load a notion page, use the data_type as `notion`. Since it is hard to automatically detect, forcing this is advised.
To load a notion page, use the data_type as `notion`. Since it is hard to automatically detect, it is advised to specify the `data_type` when adding a notion document.
The source argument must **end** with the `notion page id`. The id is a 32-character string. Eg:

```python
app.add("cfbc134ca6464fc980d0391613959196", "notion")
app.add("my-page-cfbc134ca6464fc980d0391613959196", "notion")
app.add("https://www.notion.so/my-page-cfbc134ca6464fc980d0391613959196", "notion")
```
from embedchain import App

app = App()

app.add("cfbc134ca6464fc980d0391613959196", data_type="notion")
app.add("my-page-cfbc134ca6464fc980d0391613959196", data_type="notion")
app.add("https://www.notion.so/my-page-cfbc134ca6464fc980d0391613959196", data_type="notion")

app.query("Summarize the notion doc")
```
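The rule that the source must end with the 32-character page id can be checked before calling `.add()`. The sketch below is a standalone illustration. The helper name `extract_notion_id` is hypothetical, and treating the id as 32 hexadecimal characters is an assumption based on the example ids in this diff, not something the docs state.

```python
import re

# Assumption: the page id is 32 hexadecimal characters at the very end of the source.
_NOTION_ID = re.compile(r"([0-9a-fA-F]{32})$")


def extract_notion_id(source: str) -> str:
    """Return the trailing 32-character page id, or raise ValueError if absent."""
    match = _NOTION_ID.search(source.strip().rstrip("/"))
    if match is None:
        raise ValueError(f"source does not end with a 32-character page id: {source!r}")
    return match.group(1).lower()


for candidate in (
    "cfbc134ca6464fc980d0391613959196",
    "my-page-cfbc134ca6464fc980d0391613959196",
    "https://www.notion.so/my-page-cfbc134ca6464fc980d0391613959196",
):
    print(extract_notion_id(candidate))  # cfbc134ca6464fc980d0391613959196
```

All three source forms from the example resolve to the same id, which is why any of them can be passed with `data_type="notion"`.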
24
docs/data-sources/overview.mdx
Normal file
@@ -0,0 +1,24 @@
---
title: Overview
---

Embedchain comes with built-in support for various data sources. We handle the complexity of loading unstructured data from these data sources, allowing you to easily customize your app through a user-friendly interface.

<CardGroup cols={4}>
<Card title="📊 csv" href="/data-sources/csv"></Card>
<Card title="📚🌐 docs site" href="/data-sources/docs-site"></Card>
<Card title="📄 docx" href="/data-sources/docx"></Card>
<Card title="📝 mdx" href="/data-sources/mdx"></Card>
<Card title="📓 notion" href="/data-sources/notion"></Card>
<Card title="📰 pdf" href="/data-sources/pdf-file"></Card>
<Card title="❓💬 q&a pair" href="/data-sources/qna"></Card>
<Card title="🗺️ sitemap" href="/data-sources/sitemap"></Card>
<Card title="📝 text" href="/data-sources/text"></Card>
<Card title="🌐📄 web page" href="/data-sources/web-page"></Card>
<Card title="🧾 xml" href="/data-sources/xml"></Card>
<Card title="🎥📺 youtube video" href="/data-sources/youtube-video"></Card>
</CardGroup>

<br />

<Snippet file="missing-data-source-tip.mdx" />
@@ -1,14 +1,17 @@
---
title: 'PDF File'
title: '📰 PDF file'
---

### PDF File

To add any pdf file, use the data_type as `pdf_file`. Eg:

```python
app.add('a_valid_url_where_pdf_file_can_be_accessed', data_type='pdf_file')
from embedchain import App

app = App()

app.add('https://arxiv.org/pdf/1706.03762.pdf', data_type='pdf_file')
app.query("What is the paper 'attention is all you need' about?")
# Answer: The paper "Attention Is All You Need" proposes a new network architecture called the Transformer, which is based solely on attention mechanisms. It suggests moving away from complex recurrent or convolutional neural networks and instead using attention mechanisms to connect the encoder and decoder in sequence transduction models.
```

Note that password-protected PDFs are not supported.
@@ -1,11 +1,13 @@
---
title: 'QnA Pair'
title: '❓💬 Question and answer pair'
---

### QnA pair

QnA pair is a local data type. To supply your own QnA pair, use the data_type as `qna_pair` and enter a tuple. Eg:

```python
from embedchain import App

app = App()

app.add(("Question", "Answer"), data_type="qna_pair")
```
@@ -1,4 +0,0 @@
---
title: 'Request New Format'
url: https://forms.gle/gB5La14tjgy4p94dA
---
@@ -1,11 +1,13 @@
---
title: 'Sitemap'
title: '🗺️ Sitemap'
---

### Sitemap

Adds all web pages listed in an XML sitemap, skipping non-text files. Use the data_type as `sitemap`. Eg:

```python
from embedchain import App

app = App()

app.add('https://example.com/sitemap.xml', data_type='sitemap')
```
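As a rough picture of what "all web pages from an xml-sitemap, minus non-text files" could mean, the sketch below collects the `<loc>` entries of a sitemap and drops URLs with obviously non-text extensions. It is not embedchain's sitemap loader. The function name `page_urls` and the extension list are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumption: these extensions count as "non-text" for this sketch.
NON_TEXT_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".mp4", ".zip")


def page_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> URLs of a sitemap, skipping non-text file extensions."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Compare the local tag name so the sitemap XML namespace does not matter.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            url = element.text.strip()
            if not urlparse(url).path.lower().endswith(NON_TEXT_EXTENSIONS):
                urls.append(url)
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/logo.png</loc></url>
</urlset>"""
print(page_urls(sample))  # ['https://example.com/']
```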
@@ -1,5 +1,5 @@
---
title: 'Text'
title: '📝 Text'
---

### Text
@@ -7,7 +7,11 @@ title: 'Text'
Text is a local data type. To supply your own text, use the data_type as `text` and enter a string. The text is not processed, which makes this type very versatile. Eg:

```python
from embedchain import App

app = App()

app.add('Seek wealth, not money or status. Wealth is having assets that earn while you sleep. Money is how we transfer time and wealth. Status is your place in the social hierarchy.', data_type='text')
```

Note: This is not used in the examples because in most cases you will supply a whole paragraph or file, which would not fit.
@@ -1,11 +1,13 @@
---
title: 'Web page'
title: '🌐📄 Web page'
---

### Web page

To add any web page, use the data_type as `web_page`. Eg:

```python
from embedchain import App

app = App()

app.add('a_valid_web_page_url', data_type='web_page')
```
@@ -1,5 +1,5 @@
---
title: 'XML File'
title: '🧾 XML file'
---

### XML file
@@ -7,7 +7,11 @@ title: 'XML File'
To add any xml file, use the data_type as `xml`. Eg:

```python
from embedchain import App

app = App()

app.add('content/data.xml', data_type="xml")
```

Note: Only the text content of the xml file will be added to the app. The tags will be ignored.
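The note above says that only the text content is kept and the tags are ignored. The sketch below shows one way to get that effect with the standard library. It is an illustration of the idea, not embedchain's XML loader. The function name `xml_text` and the choice to join the pieces with single spaces are assumptions.

```python
import xml.etree.ElementTree as ET


def xml_text(xml_string: str) -> str:
    """Return the text content of an XML document with all tags dropped."""
    root = ET.fromstring(xml_string)
    pieces = (text.strip() for text in root.itertext())
    # Skip whitespace-only fragments that sit between elements.
    return " ".join(piece for piece in pieces if piece)


sample = "<note><to>Ada</to><body>Meet at noon</body></note>"
print(xml_text(sample))  # Ada Meet at noon
```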
@@ -1,12 +1,13 @@
---
title: 'Youtube Video'
title: '🎥📺 Youtube video'
---

### Youtube video

To add any youtube video to your app, set the `data_type` argument of the `.add()` method to `youtube_video`. Eg:

```python
from embedchain import App

app = App()
app.add('a_valid_youtube_url_here', data_type='youtube_video')
```