[Docs] Revamp documentation (#1010)

Deshraj Yadav
2023-12-15 05:14:17 +05:30
committed by GitHub
parent b7a44ef472
commit d54cdc5b00
81 changed files with 1223 additions and 378 deletions

@@ -0,0 +1,44 @@
---
title: '📊 add'
---
The `add()` method loads data from different data sources into a RAG pipeline. Its parameters are described below:
### Parameters
<ParamField path="source" type="str">
The data to embed. Depending on the data type, it can be a URL, a local file, or raw content. You can find the full list of supported data sources [here](/components/data-sources/overview).
</ParamField>
<ParamField path="data_type" type="str" optional>
Type of data source. If omitted, it is detected automatically; set it explicitly to force a specific data type.
</ParamField>
<ParamField path="metadata" type="dict" optional>
Any metadata that you want to store with the data source. Metadata lets you filter results on top of semantic search, which can yield faster searches and more relevant results.
</ParamField>
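Conceptually, metadata filtering keeps only the chunks whose metadata matches every key-value pair in a filter, and semantic search then ranks what remains. The sketch below models that idea with plain dictionaries. The `Chunk` type, the `filter_chunks` helper, and the sample metadata keys are hypothetical; in Embedchain the actual filtering is performed by the configured vector database, which may support richer operators than exact matching.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored chunk and its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def filter_chunks(chunks: list[Chunk], where: dict) -> list[Chunk]:
    """Keep only chunks whose metadata equals every key-value pair in `where`.

    This models exact-match filtering only (a conceptual sketch).
    """
    return [
        chunk
        for chunk in chunks
        if all(chunk.metadata.get(key) == value for key, value in where.items())
    ]


# Hypothetical chunks with made-up metadata keys
chunks = [
    Chunk("Tesla profile ...", {"source_type": "profile", "year": 2023}),
    Chunk("SpaceX history ...", {"source_type": "history", "year": 2022}),
    Chunk("Forbes lists ...", {"source_type": "profile", "year": 2022}),
]

profiles_2023 = filter_chunks(chunks, {"source_type": "profile", "year": 2023})
print([c.text for c in profiles_2023])  # ['Tesla profile ...']
```

In this model an empty filter matches every chunk, and each extra key narrows the candidate set further.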
## Usage
### Load data from webpage
```python Code example
from embedchain import Pipeline as App
app = App()
app.add("https://www.forbes.com/profile/elon-musk")
# Inserting batches in chromadb: 100%|███████████████| 1/1 [00:00<00:00, 1.19it/s]
# Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 4
```
### Load data from sitemap
```python Code example
from embedchain import Pipeline as App
app = App()
app.add("https://python.langchain.com/sitemap.xml", data_type="sitemap")
# Loading pages: 100%|█████████████| 1108/1108 [00:47<00:00, 23.17it/s]
# Inserting batches in chromadb: 100%|█████████| 111/111 [04:41<00:00, 2.54s/it]
# Successfully saved https://python.langchain.com/sitemap.xml (DataType.SITEMAP). New chunks count: 11024
```
You can find the complete list of supported data sources [here](/components/data-sources/overview).

@@ -0,0 +1,97 @@
---
title: '💬 chat'
---
The `chat()` method lets you chat over your data sources through a conversational API. Its parameters are described below:
### Parameters
<ParamField path="input_query" type="str">
Question to ask
</ParamField>
<ParamField path="config" type="BaseLlmConfig" optional>
Configure different LLM settings such as prompt, temperature, number_documents, etc.
</ParamField>
<ParamField path="dry_run" type="bool" optional>
If `True`, tests the prompt structure without actually running LLM inference. Defaults to `False`
</ParamField>
<ParamField path="where" type="dict" optional>
A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None`
</ParamField>
<ParamField path="citations" type="bool" optional>
Return citations along with the LLM answer. Defaults to `False`
</ParamField>
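`dry_run=True` lets you inspect the prompt that would be sent to the LLM. As a rough illustration of what "prompt structure" means, the sketch below fills a template with retrieved chunks and the user's question using Python's `string.Template`. The template text and the `build_prompt` helper are assumptions for illustration, not Embedchain's actual default prompt.

```python
from string import Template

# Hypothetical prompt template; Embedchain's real default prompt may differ.
PROMPT = Template(
    "Use the following context to answer the query.\n"
    "Context:\n$context\n\n"
    "Query: $query\n"
    "Answer:"
)


def build_prompt(query: str, chunks: list[str]) -> str:
    """Join retrieved chunks as bullet lines and substitute them, plus the query, into the template."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return PROMPT.substitute(context=context, query=query)


prompt = build_prompt(
    "What is the net worth of Elon?",
    ["Elon Musk ... Real Time Net Worth ...", "Wealth History ..."],
)
print(prompt)
```

Checking a string like this before paying for inference is the point of a dry run: you can confirm that the retrieved context and the query land where you expect.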
### Returns
<ResponseField name="answer" type="str | tuple">
If `citations=False`, returns the answer to the question as a string. <br />
If `citations=True`, returns a tuple of the answer and its citations, in that order.
</ResponseField>
## Usage
### With citations
To get both the answer to your question and its citations, use the following code snippet:
```python With Citations
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Get relevant answer for your query
answer, sources = app.chat("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
print(sources)
# [
# (
# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# ),
# (
# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# ),
# (
# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# )
# ]
```
<Note>
When `citations=True`, the returned `sources` is a list of tuples, where each tuple has three elements (in the following order):
1. source chunk
2. link of the source document
3. document id (used for bookkeeping purposes)
</Note>
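Because each citation is a `(chunk, source_link, document_id)` tuple, post-processing is plain tuple unpacking. The sketch below groups cited chunks by their source link, for example to show one reference per document. `group_by_source` is a hypothetical helper, and the sample `sources` list is abbreviated from the output above.

```python
from collections import defaultdict


def group_by_source(sources: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Map each source link to the chunks cited from it, preserving order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for chunk, link, _document_id in sources:
        grouped[link].append(chunk)
    return dict(grouped)


# Abbreviated sample in the documented (chunk, link, document_id) shape
sources = [
    ("Elon Musk PROFILE ...", "https://www.forbes.com/profile/elon-musk", "4651b266--4aa78839fe97"),
    ("74% of the company ...", "https://www.forbes.com/profile/elon-musk", "4651b266--4aa78839fe97"),
]

for link, chunks in group_by_source(sources).items():
    print(f"{link} ({len(chunks)} chunks)")
```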
### Without citations
If you only want the answer without citations, use the following example:
```python Without Citations
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Chat on your data using `.chat()`
answer = app.chat("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

@@ -0,0 +1,31 @@
---
title: 🚀 deploy
---
Using the `deploy()` method, you can launch your LLM-powered applications on the [Embedchain Platform](https://app.embedchain.ai). The platform gives you access to your data's context through a free REST API. Once your pipeline is deployed, you can update your data sources at any time.
Besides deploying your pipeline, `deploy()` also manages LLMs, vector databases, embedding models, and data syncing, so you can focus on querying, chatting, or searching instead of managing infrastructure.
## Usage
```python
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Deploy your pipeline to Embedchain Platform
app.deploy()
# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
# ec-xxxxxx
# 🛠️ Creating pipeline on the platform...
# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
# 🛠️ Adding data to your pipeline...
# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
```

@@ -0,0 +1,130 @@
---
title: "Pipeline"
---
Create a RAG pipeline object on Embedchain. This is the main entry point for interacting with the Embedchain APIs. A pipeline configures the LLM, vector database, embedding model, and retrieval strategy of your choice.
### Attributes
<ParamField path="local_id" type="str">
Pipeline ID
</ParamField>
<ParamField path="name" type="str" optional>
Name of the pipeline
</ParamField>
<ParamField path="config" type="BaseConfig">
Configuration of the pipeline
</ParamField>
<ParamField path="llm" type="BaseLlm">
Configured LLM for the RAG pipeline
</ParamField>
<ParamField path="db" type="BaseVectorDB">
Configured vector database for the RAG pipeline
</ParamField>
<ParamField path="embedding_model" type="BaseEmbedder">
Configured embedding model for the RAG pipeline
</ParamField>
<ParamField path="chunker" type="ChunkerConfig">
Chunker configuration
</ParamField>
<ParamField path="client" type="Client" optional>
Client object (used to deploy a pipeline to the Embedchain Platform)
</ParamField>
<ParamField path="logger" type="logging.Logger">
Logger object
</ParamField>
## Usage
You can create an Embedchain pipeline instance in any of the following ways:
### Default setting
```python Code Example
from embedchain import Pipeline as App
app = App()
```
### Python Dict
```python Code Example
from embedchain import Pipeline as App
config_dict = {
'llm': {
'provider': 'gpt4all',
'config': {
'model': 'orca-mini-3b-gguf2-q4_0.gguf',
'temperature': 0.5,
'max_tokens': 1000,
'top_p': 1,
'stream': False
}
},
'embedder': {
'provider': 'gpt4all'
}
}
# Load the LLM and embedder configuration from the config dict
app = App.from_config(config=config_dict)
```
### YAML Config
<CodeGroup>
```python main.py
from embedchain import Pipeline as App
# Load the LLM and embedder configuration from the config.yaml file
app = App.from_config(config_path="config.yaml")
```
```yaml config.yaml
llm:
provider: gpt4all
config:
model: 'orca-mini-3b-gguf2-q4_0.gguf'
temperature: 0.5
max_tokens: 1000
top_p: 1
stream: false
embedder:
provider: gpt4all
```
</CodeGroup>
### JSON Config
<CodeGroup>
```python main.py
from embedchain import Pipeline as App
# Load the LLM and embedder configuration from the config.json file
app = App.from_config(config_path="config.json")
```
```json config.json
{
"llm": {
"provider": "gpt4all",
"config": {
"model": "orca-mini-3b-gguf2-q4_0.gguf",
"temperature": 0.5,
"max_tokens": 1000,
"top_p": 1,
"stream": false
}
},
"embedder": {
"provider": "gpt4all"
}
}
```
</CodeGroup>
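If you already have the configuration as a Python dict, you can write it to a file with the standard library and then load that file with `App.from_config(config_path=...)`. The sketch below serializes the same gpt4all configuration used in the examples above to `config.json`; the file name and location (the current working directory) are assumptions.

```python
import json
from pathlib import Path

config_dict = {
    "llm": {
        "provider": "gpt4all",
        "config": {
            "model": "orca-mini-3b-gguf2-q4_0.gguf",
            "temperature": 0.5,
            "max_tokens": 1000,
            "top_p": 1,
            "stream": False,
        },
    },
    "embedder": {"provider": "gpt4all"},
}

# Write the dict as JSON; Python's False is serialized as JSON's false.
config_path = Path("config.json")
config_path.write_text(json.dumps(config_dict, indent=4))

# Round-trip check: reading the file back yields the original dict.
assert json.loads(config_path.read_text()) == config_dict
```

The resulting file is equivalent to the `config.json` shown above and can be passed to `App.from_config(config_path="config.json")`.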

@@ -0,0 +1,97 @@
---
title: '❓ query'
---
The `query()` method lets you ask questions and receive relevant answers through a query API. Its parameters are described below:
### Parameters
<ParamField path="input_query" type="str">
Question to ask
</ParamField>
<ParamField path="config" type="BaseLlmConfig" optional>
Configure different LLM settings such as prompt, temperature, number_documents, etc.
</ParamField>
<ParamField path="dry_run" type="bool" optional>
If `True`, tests the prompt structure without actually running LLM inference. Defaults to `False`
</ParamField>
<ParamField path="where" type="dict" optional>
A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None`
</ParamField>
<ParamField path="citations" type="bool" optional>
Return citations along with the LLM answer. Defaults to `False`
</ParamField>
### Returns
<ResponseField name="answer" type="str | tuple">
If `citations=False`, returns the answer to the question as a string. <br />
If `citations=True`, returns a tuple of the answer and its citations, in that order.
</ResponseField>
## Usage
### With citations
To get both the answer to your question and its citations, use the following code snippet:
```python With Citations
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Get relevant answer for your query
answer, sources = app.query("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
print(sources)
# [
# (
# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# ),
# (
# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# ),
# (
# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
# 'https://www.forbes.com/profile/elon-musk',
# '4651b266--4aa78839fe97'
# )
# ]
```
<Note>
When `citations=True`, the returned `sources` is a list of tuples, where each tuple has three elements (in the following order):
1. source chunk
2. link of the source document
3. document id (used for bookkeeping purposes)
</Note>
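A common next step is to render the answer with numbered references. The sketch below turns the `(chunk, source_link, document_id)` tuples into footnotes and lists each link only once. `format_with_citations` is a hypothetical helper, and the sample values are abbreviated from the example output above.

```python
def format_with_citations(answer: str, sources: list[tuple[str, str, str]]) -> str:
    """Append a numbered, de-duplicated list of source links to the answer."""
    links: list[str] = []
    for _chunk, link, _document_id in sources:
        if link not in links:  # keep first-seen order, skip repeats
            links.append(link)
    footnotes = "\n".join(f"[{i}] {link}" for i, link in enumerate(links, start=1))
    return f"{answer}\n\nSources:\n{footnotes}" if links else answer


answer = "The net worth of Elon Musk is $221.9 billion."
sources = [
    ("Elon Musk PROFILE ...", "https://www.forbes.com/profile/elon-musk", "4651b266--4aa78839fe97"),
    ("founded in 2002 ...", "https://www.forbes.com/profile/elon-musk", "4651b266--4aa78839fe97"),
]
print(format_with_citations(answer, sources))
```

When `sources` is empty, the helper returns the answer unchanged.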
### Without citations
If you only want the answer without citations, use the following example:
```python Without Citations
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Get relevant answer for your query
answer = app.query("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

@@ -0,0 +1,17 @@
---
title: 🔄 reset
---
The `reset()` method wipes all data from your RAG application so you can start from scratch.
## Usage
```python
from embedchain import Pipeline as App
app = App()
app.add("https://www.forbes.com/profile/elon-musk")
# Reset the app
app.reset()
```

@@ -0,0 +1,51 @@
---
title: '🔍 search'
---
The `search()` method performs a semantic search across your data sources and returns the context most relevant to a given query. Its parameters are described below:
### Parameters
<ParamField path="query" type="str">
Question to search for
</ParamField>
<ParamField path="num_documents" type="int" optional>
Number of relevant documents to fetch. Defaults to `3`
</ParamField>
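Conceptually, semantic search ranks stored chunks by how close their embeddings are to the query embedding and keeps the top `num_documents`. The sketch below shows that idea with cosine similarity over tiny hand-made vectors. The vectors, chunk names, and `top_k` helper are illustrative assumptions; in Embedchain the embeddings come from the configured embedding model and the ranking is done by the vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query_vec, docs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real embedding models typically use hundreds of dimensions or more.
docs = {
    "net_worth_chunk": [0.9, 0.1, 0.0],
    "companies_chunk": [0.5, 0.5, 0.1],
    "personal_stats_chunk": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, docs, k=2))  # ['net_worth_chunk', 'companies_chunk']
```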
### Returns
<ResponseField name="answer" type="list[dict]">
Returns a list of dictionaries, each containing a relevant chunk and its source information.
</ResponseField>
## Usage
The following example shows how to use the search API:
```python Code example
from embedchain import Pipeline as App
# Initialize app
app = App()
# Add data source
app.add("https://www.forbes.com/profile/elon-musk")
# Get relevant context using semantic search
context = app.search("What is the net worth of Elon?", num_documents=2)
print(context)
# Context:
# [
# {
# 'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in',
# 'source': 'https://www.forbes.com/profile/elon-musk',
# 'document_id': 'some_document_id'
# },
# {
# 'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University',
# 'source': 'https://www.forbes.com/profile/elon-musk',
# 'document_id': 'some_document_id'
# }
# ]
```
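Since `search()` returns plain dictionaries with `context`, `source`, and `document_id` keys, you can feed the results into your own prompt or UI. The sketch below concatenates the retrieved chunks into one labeled context string. `join_contexts` and its `max_chars` limit are hypothetical, and the sample results are abbreviated from the output above.

```python
def join_contexts(results: list[dict], max_chars: int = 1000) -> str:
    """Concatenate search results as '[source] chunk' blocks, truncated to max_chars."""
    blocks = [f"[{r['source']}] {r['context']}" for r in results]
    return "\n\n".join(blocks)[:max_chars]


# Abbreviated sample in the documented result shape
results = [
    {
        "context": "Elon Musk PROFILE ... Real Time Net Worth ...",
        "source": "https://www.forbes.com/profile/elon-musk",
        "document_id": "some_document_id",
    },
    {
        "context": "company, which is now called X ...",
        "source": "https://www.forbes.com/profile/elon-musk",
        "document_id": "some_document_id",
    },
]
print(join_contexts(results))
```

Truncating by characters is the simplest cap; a token-based limit would match LLM context windows more closely.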