[Refactor] Change evaluation script path (#1165)
docs/api-reference/app/add.mdx
@@ -0,0 +1,44 @@
---
title: '📊 add'
---

The `add()` method loads data from different data sources into a RAG pipeline. You can find the signature below:

### Parameters

<ParamField path="source" type="str">
The data to embed. Depending on the data type, it can be a URL, a local file, or raw content. You can find the full list of supported data sources [here](/components/data-sources/overview).
</ParamField>
<ParamField path="data_type" type="str" optional>
Type of the data source. It is detected automatically, but you can force a specific data type to load the source as.
</ParamField>
<ParamField path="metadata" type="dict" optional>
Any metadata you want to store with the data source. Metadata is especially useful for metadata filtering on top of semantic search, which yields faster searches and better results.
</ParamField>

## Usage

### Load data from a webpage

```python Code example
from embedchain import App

app = App()
app.add("https://www.forbes.com/profile/elon-musk")
# Inserting batches in chromadb: 100%|███████████████| 1/1 [00:00<00:00, 1.19it/s]
# Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 4
```

### Load data from a sitemap

```python Code example
from embedchain import App

app = App()
app.add("https://python.langchain.com/sitemap.xml", data_type="sitemap")
# Loading pages: 100%|█████████████| 1108/1108 [00:47<00:00, 23.17it/s]
# Inserting batches in chromadb: 100%|█████████| 111/111 [04:41<00:00, 2.54s/it]
# Successfully saved https://python.langchain.com/sitemap.xml (DataType.SITEMAP). New chunks count: 11024
```

You can find the complete list of supported data sources [here](/components/data-sources/overview).
docs/api-reference/app/chat.mdx
@@ -0,0 +1,131 @@
---
title: '💬 chat'
---

The `chat()` method allows you to chat over your data sources using a user-friendly chat API. You can find the signature below:

### Parameters

<ParamField path="input_query" type="str">
Question to ask
</ParamField>
<ParamField path="config" type="BaseLlmConfig" optional>
Configure different LLM settings such as prompt, temperature, number_documents etc.
</ParamField>
<ParamField path="dry_run" type="bool" optional>
Test the prompt structure without actually running LLM inference. Defaults to `False`
</ParamField>
<ParamField path="where" type="dict" optional>
A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None`
</ParamField>
<ParamField path="session_id" type="str" optional>
Session ID of the chat. This can be used to maintain chat history across different user sessions. Default value: `default`
</ParamField>
<ParamField path="citations" type="bool" optional>
Return citations along with the LLM answer. Defaults to `False`
</ParamField>

### Returns

<ResponseField name="answer" type="str | tuple">
If `citations=False`, returns a stringified answer to the question asked. <br />
If `citations=True`, returns a tuple with the answer and the citations, respectively.
</ResponseField>

## Usage

### With citations

If you want both the answer and its citations, use the following code snippet:

```python With Citations
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer, sources = app.chat("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.

print(sources)
# [
#   (
#     'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.89,
#       ...
#     }
#   ),
#   (
#     '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.81,
#       ...
#     }
#   ),
#   (
#     'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.73,
#       ...
#     }
#   )
# ]
```

<Note>
When `citations=True`, the returned `sources` are a list of tuples, where each tuple has two elements (in the following order):
1. source chunk
2. dictionary with metadata about the source chunk
    - `url`: url of the source
    - `doc_id`: document id (used for bookkeeping purposes)
    - `score`: score of the source chunk with respect to the question
    - other metadata you might have added at the time of adding the source
</Note>
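Because `sources` is just a list of `(chunk, metadata)` tuples, you can post-process it with plain Python. The sketch below is hypothetical and not part of the Embedchain API: the `top_sources` helper and the sample data are made up, but the data is shaped like the documented return value.

```python
# Hypothetical helper (not part of the Embedchain API): rank the
# (chunk, metadata) tuples returned by `app.chat(..., citations=True)`.
def top_sources(sources, n=2):
    # Each element is (source_chunk, metadata_dict); sort by the 'score' key,
    # highest first, and keep the top n.
    return sorted(sources, key=lambda s: s[1]["score"], reverse=True)[:n]

# Sample data shaped like the documented return value (values illustrative).
sources = [
    ("chunk about net worth", {"url": "https://www.forbes.com/profile/elon-musk", "score": 0.89}),
    ("chunk about X", {"url": "https://www.forbes.com/profile/elon-musk", "score": 0.81}),
    ("chunk about SpaceX", {"url": "https://www.forbes.com/profile/elon-musk", "score": 0.73}),
]

best_chunk, best_meta = top_sources(sources, n=1)[0]
print(best_meta["score"])
# 0.89
```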

### Without citations

If you just want the answer and don't need citations, you can use the following example:

```python Without Citations
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Chat on your data using `.chat()`
answer = app.chat("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

### With session id

If you want to maintain chat sessions for different users, simply pass the `session_id` keyword argument. See the example below:

```python With session id
from embedchain import App

app = App()
app.add("https://www.forbes.com/profile/elon-musk")

# Chat on your data using `.chat()`
app.chat("What is the net worth of Elon Musk?", session_id="user1")
# 'The net worth of Elon Musk is $250.8 billion.'
app.chat("What is the net worth of Bill Gates?", session_id="user2")
# "I don't know the current net worth of Bill Gates."
app.chat("What was my last question", session_id="user1")
# 'Your last question was "What is the net worth of Elon Musk?"'
```
docs/api-reference/app/delete.mdx
@@ -0,0 +1,19 @@
---
title: 🗑 delete
---

The `delete_session_chat_history()` method allows you to delete all previous messages in a chat history.

## Usage

```python
from embedchain import App

app = App()

app.add("https://www.forbes.com/profile/elon-musk")

app.chat("What is the net worth of Elon Musk?")

app.delete_session_chat_history()
```
docs/api-reference/app/deploy.mdx
@@ -0,0 +1,31 @@
---
title: 🚀 deploy
---

Using the `deploy()` method, Embedchain allows developers to easily launch their LLM-powered applications on the [Embedchain Platform](https://app.embedchain.ai). The platform provides free, user-friendly REST API access to your data's context. Once your pipeline is deployed, you can update your data sources at any time.

The `deploy()` method not only deploys your pipeline but also manages LLMs, vector databases, embedding models, and data syncing for you, so you can focus on querying, chatting, or searching without the hassle of infrastructure management.

## Usage

```python
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Deploy your pipeline to the Embedchain Platform
app.deploy()

# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
# ec-xxxxxx

# 🛠️ Creating pipeline on the platform...
# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx

# 🛠️ Adding data to your pipeline...
# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
```
docs/api-reference/app/evaluate.mdx
@@ -0,0 +1,41 @@
---
title: '📝 evaluate'
---

The `evaluate()` method evaluates the performance of a RAG app. You can find the signature below:

### Parameters

<ParamField path="question" type="Union[str, list[str]]">
A question or a list of questions to evaluate your app on.
</ParamField>
<ParamField path="metrics" type="Optional[list[Union[BaseMetric, str]]]" optional>
The metrics to evaluate your app on. Defaults to all metrics: `["context_relevancy", "answer_relevancy", "groundedness"]`
</ParamField>
<ParamField path="num_workers" type="int" optional>
Number of threads to use for parallel processing.
</ParamField>

### Returns

<ResponseField name="metrics" type="dict">
Returns the chosen metrics as a dictionary.
</ResponseField>

## Usage

```python
from embedchain import App

app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Run evaluation
app.evaluate("what is the net worth of Elon Musk?")
# {'answer_relevancy': 0.958019958036268, 'context_relevancy': 0.12903225806451613}

# or evaluate on multiple questions at once:
# app.evaluate(["what is the net worth of Elon Musk?", "which companies does Elon Musk own?"])
```
docs/api-reference/app/overview.mdx
@@ -0,0 +1,130 @@
---
title: "App"
---

Create a RAG app object on Embedchain. This is the main entrypoint for a developer to interact with Embedchain APIs. An app configures the LLM, vector database, embedding model, and retrieval strategy of your choice.

### Attributes

<ParamField path="local_id" type="str">
App ID
</ParamField>
<ParamField path="name" type="str" optional>
Name of the app
</ParamField>
<ParamField path="config" type="BaseConfig">
Configuration of the app
</ParamField>
<ParamField path="llm" type="BaseLlm">
Configured LLM for the RAG app
</ParamField>
<ParamField path="db" type="BaseVectorDB">
Configured vector database for the RAG app
</ParamField>
<ParamField path="embedding_model" type="BaseEmbedder">
Configured embedding model for the RAG app
</ParamField>
<ParamField path="chunker" type="ChunkerConfig">
Chunker configuration
</ParamField>
<ParamField path="client" type="Client" optional>
Client object (used to deploy an app to the Embedchain platform)
</ParamField>
<ParamField path="logger" type="logging.Logger">
Logger object
</ParamField>

## Usage

You can create an app instance using the following methods:

### Default setting

```python Code Example
from embedchain import App
app = App()
```

### Python Dict

```python Code Example
from embedchain import App

config_dict = {
    'llm': {
        'provider': 'gpt4all',
        'config': {
            'model': 'orca-mini-3b-gguf2-q4_0.gguf',
            'temperature': 0.5,
            'max_tokens': 1000,
            'top_p': 1,
            'stream': False
        }
    },
    'embedder': {
        'provider': 'gpt4all'
    }
}

# load llm configuration from config dict
app = App.from_config(config=config_dict)
```

### YAML Config

<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>

### JSON Config

<CodeGroup>

```python main.py
from embedchain import App

# load llm configuration from config.json file
app = App.from_config(config_path="config.json")
```

```json config.json
{
  "llm": {
    "provider": "gpt4all",
    "config": {
      "model": "orca-mini-3b-gguf2-q4_0.gguf",
      "temperature": 0.5,
      "max_tokens": 1000,
      "top_p": 1,
      "stream": false
    }
  },
  "embedder": {
    "provider": "gpt4all"
  }
}
```

</CodeGroup>
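As a quick sanity check, the JSON form parses to the same Python dict used in the Python Dict section, which is why the three config formats are interchangeable. A minimal stdlib sketch (the JSON is held in a string here purely for illustration):

```python
import json

# The same configuration as config.json, embedded as a string for illustration.
config_json = """
{
  "llm": {
    "provider": "gpt4all",
    "config": {
      "model": "orca-mini-3b-gguf2-q4_0.gguf",
      "temperature": 0.5,
      "max_tokens": 1000,
      "top_p": 1,
      "stream": false
    }
  },
  "embedder": {"provider": "gpt4all"}
}
"""

config_dict = json.loads(config_json)
# JSON `false` parses to Python False, so the dict matches the Python form exactly.
print(config_dict["llm"]["config"]["stream"])
# False
```

Passing this dict to `App.from_config(config=config_dict)` should therefore build the same app as `App.from_config(config_path="config.json")`.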
docs/api-reference/app/query.mdx
@@ -0,0 +1,109 @@
---
title: '❓ query'
---

The `.query()` method empowers developers to ask questions and receive relevant answers through a user-friendly query API. The function signature is given below:

### Parameters

<ParamField path="input_query" type="str">
Question to ask
</ParamField>
<ParamField path="config" type="BaseLlmConfig" optional>
Configure different LLM settings such as prompt, temperature, number_documents etc.
</ParamField>
<ParamField path="dry_run" type="bool" optional>
Test the prompt structure without actually running LLM inference. Defaults to `False`
</ParamField>
<ParamField path="where" type="dict" optional>
A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None`
</ParamField>
<ParamField path="citations" type="bool" optional>
Return citations along with the LLM answer. Defaults to `False`
</ParamField>

### Returns

<ResponseField name="answer" type="str | tuple">
If `citations=False`, returns a stringified answer to the question asked. <br />
If `citations=True`, returns a tuple with the answer and the citations, respectively.
</ResponseField>

## Usage

### With citations

If you want both the answer and its citations, use the following code snippet:

```python With Citations
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer, sources = app.query("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.

print(sources)
# [
#   (
#     'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.89,
#       ...
#     }
#   ),
#   (
#     '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.81,
#       ...
#     }
#   ),
#   (
#     'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
#     {
#       'url': 'https://www.forbes.com/profile/elon-musk',
#       'score': 0.73,
#       ...
#     }
#   )
# ]
```

<Note>
When `citations=True`, the returned `sources` are a list of tuples, where each tuple has two elements (in the following order):
1. source chunk
2. dictionary with metadata about the source chunk
    - `url`: url of the source
    - `doc_id`: document id (used for bookkeeping purposes)
    - `score`: score of the source chunk with respect to the question
    - other metadata you might have added at the time of adding the source
</Note>

### Without citations

If you just want the answer and don't need citations, you can use the following example:

```python Without Citations
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer = app.query("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```
docs/api-reference/app/reset.mdx
@@ -0,0 +1,17 @@
---
title: 🔄 reset
---

The `reset()` method allows you to wipe all the data from your RAG application and start from scratch.

## Usage

```python
from embedchain import App

app = App()
app.add("https://www.forbes.com/profile/elon-musk")

# Reset the app
app.reset()
```
docs/api-reference/app/search.mdx
@@ -0,0 +1,57 @@
---
title: '🔍 search'
---

The `.search()` method uncovers the most relevant context by performing a semantic search across your data sources for a given query. Refer to the function signature below:

### Parameters

<ParamField path="query" type="str">
Question
</ParamField>
<ParamField path="num_documents" type="int" optional>
Number of relevant documents to fetch. Defaults to `3`
</ParamField>

### Returns

<ResponseField name="answer" type="dict">
Returns a list of dictionaries, each containing a relevant chunk and its source information.
</ResponseField>

## Usage

Refer to the following example on how to use the search API:

```python Code example
from embedchain import App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant context using semantic search
context = app.search("What is the net worth of Elon?", num_documents=2)
print(context)
# Context:
# [
#   {
#     'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worth ...',
#     'metadata': {
#       'source': 'https://www.forbes.com/profile/elon-musk',
#       'document_id': 'some_document_id',
#       'score': 0.404,
#     }
#   },
#   {
#     'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH ...',
#     'metadata': {
#       'source': 'https://www.forbes.com/profile/elon-musk',
#       'document_id': 'some_document_id',
#       'score': 0.435,
#     }
#   }
# ]
```
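The list returned by `search()` is plain Python data, so it is easy to post-process before handing it to an LLM. The snippet below is a hypothetical sketch, not part of the Embedchain API, operating on sample data shaped like the documented return value:

```python
# Sample data shaped like the documented return value of `app.search()`
# (values are illustrative, taken from the example output above).
results = [
    {"context": "Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worth ...",
     "metadata": {"source": "https://www.forbes.com/profile/elon-musk", "score": 0.404}},
    {"context": "company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH ...",
     "metadata": {"source": "https://www.forbes.com/profile/elon-musk", "score": 0.435}},
]

# Join the raw chunks into a single context string and collect the unique sources.
context_text = "\n".join(r["context"] for r in results)
unique_sources = {r["metadata"]["source"] for r in results}
print(unique_sources)
# {'https://www.forbes.com/profile/elon-musk'}
```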