diff --git a/README.md b/README.md index 46ee4fba..c2d6e660 100644 --- a/README.md +++ b/README.md @@ -98,8 +98,8 @@ Comprehensive guides and API documentation are available to help you get the mos - [Getting Started](https://docs.embedchain.ai/get-started/quickstart) - [Introduction](https://docs.embedchain.ai/get-started/introduction#what-is-embedchain) -- [Examples](https://docs.embedchain.ai/get-started/examples) -- [Supported data types](https://docs.embedchain.ai/data-sources/) +- [Examples](https://docs.embedchain.ai/examples) +- [Supported data types](https://docs.embedchain.ai/components/data-sources/overview) ## 🔗 Join the Community diff --git a/docs/_snippets/get-help.mdx b/docs/_snippets/get-help.mdx index c6193f0d..37f9a3f2 100644 --- a/docs/_snippets/get-help.mdx +++ b/docs/_snippets/get-help.mdx @@ -1,11 +1,11 @@ + + Schedule a call + Join our slack community Join our discord community - - Schedule a call with Embedchain founder - diff --git a/docs/advanced/configuration.mdx b/docs/api-reference/advanced/configuration.mdx similarity index 95% rename from docs/advanced/configuration.mdx rename to docs/api-reference/advanced/configuration.mdx index 2893e5b4..1fb3e2ff 100644 --- a/docs/advanced/configuration.mdx +++ b/docs/api-reference/advanced/configuration.mdx @@ -1,14 +1,14 @@ --- -title: '⚙️ Custom configurations' +title: 'Custom configurations' --- -Embedchain is made to work out of the box. However, for advanced users we're also offering configuration options. All of these configuration options are optional and have sane defaults. +Embedchain offers several configuration options for your LLM, vector database, and embedding model. All of these configuration options are optional and have sane defaults. You can configure different components of your app (`llm`, `embedding model`, or `vector database`) through a simple yaml configuration that Embedchain offers.
Here is a generic full-stack example of the yaml config: -Embedchain applications are configurable using YAML file, JSON file or by directly passing the config dictionary. +Embedchain applications are configurable using YAML file, JSON file or by directly passing the config dictionary. Check out the [docs here](/api-reference/pipeline/overview#usage) on how to use other formats. diff --git a/docs/api-reference/overview.mdx b/docs/api-reference/overview.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/api-reference/pipeline/add.mdx b/docs/api-reference/pipeline/add.mdx new file mode 100644 index 00000000..53ae5729 --- /dev/null +++ b/docs/api-reference/pipeline/add.mdx @@ -0,0 +1,44 @@ +--- +title: '📊 add' +--- + +The `add()` method loads data from different data sources into a RAG pipeline. You can find the signature below: + +### Parameters + + + The data to embed, can be a URL, local file or raw content, depending on the data type. You can find the full list of supported data sources [here](/components/data-sources/overview). + + + Type of data source. It can be automatically detected but user can force what data type to load as. + + + Any metadata that you want to store with the data source. Metadata is generally really useful for doing metadata filtering on top of semantic search to yield faster search and better results. + + +## Usage + +### Load data from webpage + +```python Code example +from embedchain import Pipeline as App + +app = App() +app.add("https://www.forbes.com/profile/elon-musk") +# Inserting batches in chromadb: 100%|███████████████| 1/1 [00:00<00:00, 1.19it/s] +# Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE).
New chunks count: 4 +``` + +### Load data from sitemap + +```python Code example +from embedchain import Pipeline as App + +app = App() +app.add("https://python.langchain.com/sitemap.xml", data_type="sitemap") +# Loading pages: 100%|█████████████| 1108/1108 [00:47<00:00, 23.17it/s] +# Inserting batches in chromadb: 100%|█████████| 111/111 [04:41<00:00, 2.54s/it] +# Successfully saved https://python.langchain.com/sitemap.xml (DataType.SITEMAP). New chunks count: 11024 +``` + +You can find the complete list of supported data sources [here](/components/data-sources/overview). diff --git a/docs/api-reference/pipeline/chat.mdx b/docs/api-reference/pipeline/chat.mdx new file mode 100644 index 00000000..5a606529 --- /dev/null +++ b/docs/api-reference/pipeline/chat.mdx @@ -0,0 +1,97 @@ +--- +title: '💬 chat' +--- + +`chat()` method allows you to chat over your data sources using a user-friendly chat API. You can find the signature below: + +### Parameters + + + Question to ask + + + Configure different llm settings such as prompt, temperature, number_documents etc. + + + The purpose is to test the prompt structure without actually running LLM inference. Defaults to `False` + + + A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None` + + + Return citations along with the LLM answer. Defaults to `False` + + +### Returns + + + If `citations=False`, returns the answer to the question asked as a string.
+ If `citations=True`, returns a tuple with answer and citations respectively. +
+ +## Usage + +### With citations + +If you want to get the answer to question and return both answer and citations, use the following code snippet: + +```python With Citations +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Get relevant answer for your query +answer, sources = app.chat("What is the net worth of Elon?", citations=True) +print(answer) +# Answer: The net worth of Elon Musk is $221.9 billion. + +print(sources) +# [ +# ( +# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ), +# ( +# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ), +# ( +# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ) +# ] +``` + + +When `citations=True`, note that the returned `sources` are a list of tuples where each tuple has three elements (in the following order): +1. source chunk +2. link of the source document +3. document id (used for book keeping purposes) + + + +### Without citations + +If you just want to return answers and don't want to return citations, you can use the following example: + +```python Without Citations +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Chat on your data using `.chat()` +answer = app.chat("What is the net worth of Elon?") +print(answer) +# Answer: The net worth of Elon Musk is $221.9 billion. 
+``` diff --git a/docs/api-reference/pipeline/deploy.mdx b/docs/api-reference/pipeline/deploy.mdx new file mode 100644 index 00000000..11838086 --- /dev/null +++ b/docs/api-reference/pipeline/deploy.mdx @@ -0,0 +1,31 @@ +--- +title: 🚀 deploy +--- + +Using the `deploy()` method, Embedchain allows developers to easily launch their LLM-powered applications on the [Embedchain Platform](https://app.embedchain.ai). This platform facilitates seamless access to your data's context via a free and user-friendly REST API. Once your pipeline is deployed, you can update your data sources at any time. + +The `deploy()` method not only deploys your pipeline but also efficiently manages LLMs, vector databases, embedding models, and data syncing, enabling you to focus on querying, chatting, or searching without the hassle of infrastructure management. + +## Usage + +```python +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Deploy your pipeline to Embedchain Platform +app.deploy() + +# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/ +# ec-xxxxxx + +# 🛠️ Creating pipeline on the platform... +# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx + +# 🛠️ Adding data to your pipeline... +# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully. +``` diff --git a/docs/api-reference/pipeline/overview.mdx b/docs/api-reference/pipeline/overview.mdx new file mode 100644 index 00000000..d69d80c3 --- /dev/null +++ b/docs/api-reference/pipeline/overview.mdx @@ -0,0 +1,130 @@ +--- +title: "Pipeline" +--- + +Create a RAG pipeline object on Embedchain. This is the main entrypoint for a developer to interact with Embedchain APIs. A pipeline configures the llm, vector database, embedding model, and retrieval strategy of your choice.
+ +### Attributes + + + Pipeline ID + + + Name of the pipeline + + + Configuration of the pipeline + + + Configured LLM for the RAG pipeline + + + Configured vector database for the RAG pipeline + + + Configured embedding model for the RAG pipeline + + + Chunker configuration + + + Client object (used to deploy a pipeline to Embedchain platform) + + + Logger object + + +## Usage + +You can create an embedchain pipeline instance using the following methods: + +### Default setting + +```python Code Example +from embedchain import Pipeline as App +app = App() +``` + + +### Python Dict + +```python Code Example +from embedchain import Pipeline as App + +config_dict = { + 'llm': { + 'provider': 'gpt4all', + 'config': { + 'model': 'orca-mini-3b-gguf2-q4_0.gguf', + 'temperature': 0.5, + 'max_tokens': 1000, + 'top_p': 1, + 'stream': False + } + }, + 'embedder': { + 'provider': 'gpt4all' + } +} + +# load llm configuration from config dict +app = App.from_config(config=config_dict) +``` + +### YAML Config + + + +```python main.py +from embedchain import Pipeline as App + +# load llm configuration from config.yaml file +app = App.from_config(config_path="config.yaml") +``` + +```yaml config.yaml +llm: + provider: gpt4all + config: + model: 'orca-mini-3b-gguf2-q4_0.gguf' + temperature: 0.5 + max_tokens: 1000 + top_p: 1 + stream: false + +embedder: + provider: gpt4all +``` + + + +### JSON Config + + + +```python main.py +from embedchain import Pipeline as App + +# load llm configuration from config.json file +app = App.from_config(config_path="config.json") +``` + +```json config.json +{ + "llm": { + "provider": "gpt4all", + "config": { + "model": "orca-mini-3b-gguf2-q4_0.gguf", + "temperature": 0.5, + "max_tokens": 1000, + "top_p": 1, + "stream": false + } + }, + "embedder": { + "provider": "gpt4all" + } +} +``` + + \ No newline at end of file diff --git a/docs/api-reference/pipeline/query.mdx b/docs/api-reference/pipeline/query.mdx new file mode 100644 index 
00000000..5034c81d --- /dev/null +++ b/docs/api-reference/pipeline/query.mdx @@ -0,0 +1,97 @@ +--- +title: '❓ query' +--- + +`.query()` method empowers developers to ask questions and receive relevant answers through a user-friendly query API. Function signature is given below: + +### Parameters + + + Question to ask + + + Configure different llm settings such as prompt, temperature, number_documents etc. + + + The purpose is to test the prompt structure without actually running LLM inference. Defaults to `False` + + + A dictionary of key-value pairs to filter the chunks from the vector database. Defaults to `None` + + + Return citations along with the LLM answer. Defaults to `False` + + +### Returns + + + If `citations=False`, returns the answer to the question asked as a string.
+ If `citations=True`, returns a tuple with answer and citations respectively. +
+ +## Usage + +### With citations + +If you want to get the answer to question and return both answer and citations, use the following code snippet: + +```python With Citations +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Get relevant answer for your query +answer, sources = app.query("What is the net worth of Elon?", citations=True) +print(answer) +# Answer: The net worth of Elon Musk is $221.9 billion. + +print(sources) +# [ +# ( +# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ), +# ( +# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ), +# ( +# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...', +# 'https://www.forbes.com/profile/elon-musk', +# '4651b266--4aa78839fe97' +# ) +# ] +``` + + +When `citations=True`, note that the returned `sources` are a list of tuples where each tuple has three elements (in the following order): +1. source chunk +2. link of the source document +3. document id (used for book keeping purposes) + + +### Without citations + +If you just want to return answers and don't want to return citations, you can use the following example: + +```python Without Citations +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Get relevant answer for your query +answer = app.query("What is the net worth of Elon?") +print(answer) +# Answer: The net worth of Elon Musk is $221.9 billion. 
+``` + diff --git a/docs/api-reference/pipeline/reset.mdx b/docs/api-reference/pipeline/reset.mdx new file mode 100644 index 00000000..df697b19 --- /dev/null +++ b/docs/api-reference/pipeline/reset.mdx @@ -0,0 +1,17 @@ +--- +title: 🔄 reset +--- + +`reset()` method allows you to wipe the data from your RAG application and start from scratch. + +## Usage + +```python +from embedchain import Pipeline as App + +app = App() +app.add("https://www.forbes.com/profile/elon-musk") + +# Reset the app +app.reset() +``` diff --git a/docs/api-reference/pipeline/search.mdx b/docs/api-reference/pipeline/search.mdx new file mode 100644 index 00000000..a2e618cd --- /dev/null +++ b/docs/api-reference/pipeline/search.mdx @@ -0,0 +1,51 @@ +--- +title: '🔍 search' +--- + +`.search()` enables you to uncover the most pertinent context by performing a semantic search across your data sources based on a given query. Refer to the function signature below: + +### Parameters + + + Question + + + Number of relevant documents to fetch. Defaults to `3` + + +### Returns + + + Return list of dictionaries that contain the relevant chunk and their source information. + + +## Usage + +Refer to the following example on how to use the search api: + +```python Code example +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Get relevant context using semantic search +context = app.search("What is the net worth of Elon?", num_documents=2) +print(context) +# Context: +# [ +# { +# 'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day.
1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in', +# 'source': 'https://www.forbes.com/profile/elon-musk', +# 'document_id': 'some_document_id' +# }, +# { +# 'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University', +# 'source': 'https://www.forbes.com/profile/elon-musk', +# 'document_id': 'some_document_id' +# } +# ] +``` diff --git a/docs/api-reference/store/ai-assistants.mdx b/docs/api-reference/store/ai-assistants.mdx new file mode 100644 index 00000000..09c6122a --- /dev/null +++ b/docs/api-reference/store/ai-assistants.mdx @@ -0,0 +1,54 @@ +--- +title: 'AI Assistant' +--- + +The `AIAssistant` class, an alternative to the OpenAI Assistant API, is designed for those who prefer using large language models (LLMs) other than those provided by OpenAI. It facilitates the creation of AI Assistants with several key benefits: + +- **Visibility into Citations**: It offers transparent access to the sources and citations used by the AI, enhancing the understanding and trustworthiness of its responses. + +- **Debugging Capabilities**: Users have the ability to delve into and debug the AI's processes, allowing for a deeper understanding and fine-tuning of its performance. 
+ +- **Customizable Prompts**: The class provides the flexibility to modify and tailor prompts according to specific needs, enabling more precise and relevant interactions. + +- **Chain of Thought Integration**: It supports the incorporation of a 'chain of thought' approach, which helps in breaking down complex queries into simpler, sequential steps, thereby improving the clarity and accuracy of responses. + +It is ideal for those who value customization, transparency, and detailed control over their AI Assistant's functionalities. + +### Arguments + + + Name for your AI assistant + + + + How the Assistant and model should behave or respond + + + + Load existing AI Assistant. If you pass this, you don't have to pass other arguments. + + + + Existing thread id if exists + + + + Embedchain pipeline config yaml path to use. This will define the configuration of the AI Assistant (such as configuring the LLM, vector database, and embedding model) + + + + Add data sources to your assistant. You can add in the following format: `[{"source": "https://example.com", "data_type": "web_page"}]` + + + + Anonymous telemetry (doesn't collect any user information or user's files). Used to improve the Embedchain package utilization. Default is `True`. + + + +## Usage + +For detailed guidance on creating your own AI Assistant, click the link below. It provides step-by-step instructions to help you through the process: + + + Learn how to build a customized AI Assistant using the `AIAssistant` class. + diff --git a/docs/api-reference/store/openai-assistant.mdx b/docs/api-reference/store/openai-assistant.mdx new file mode 100644 index 00000000..1ab21aa1 --- /dev/null +++ b/docs/api-reference/store/openai-assistant.mdx @@ -0,0 +1,45 @@ +--- +title: 'OpenAI Assistant' +--- + +### Arguments + + + Name for your AI assistant + + + + how the Assistant and model should behave or respond + + + + Load existing OpenAI Assistant. If you pass this, you don't have to pass other arguments. 
+ + + + Existing OpenAI thread id if exists + + + + OpenAI model to use + + + + OpenAI tools to use. Default set to `[{"type": "retrieval"}]` + + + + Add data sources to your assistant. You can add in the following format: `[{"source": "https://example.com", "data_type": "web_page"}]` + + + + Anonymous telemetry (doesn't collect any user information or user's files). Used to improve the Embedchain package utilization. Default is `True`. + + +## Usage + +For detailed guidance on creating your own OpenAI Assistant, click the link below. It provides step-by-step instructions to help you through the process: + + + Learn how to build an OpenAI Assistant using the `OpenAIAssistant` class. + diff --git a/docs/data-sources/beehiiv.mdx b/docs/components/data-sources/beehiiv.mdx similarity index 100% rename from docs/data-sources/beehiiv.mdx rename to docs/components/data-sources/beehiiv.mdx diff --git a/docs/data-sources/csv.mdx b/docs/components/data-sources/csv.mdx similarity index 100% rename from docs/data-sources/csv.mdx rename to docs/components/data-sources/csv.mdx diff --git a/docs/data-sources/custom.mdx b/docs/components/data-sources/custom.mdx similarity index 100% rename from docs/data-sources/custom.mdx rename to docs/components/data-sources/custom.mdx diff --git a/docs/data-sources/data-type-handling.mdx b/docs/components/data-sources/data-type-handling.mdx similarity index 100% rename from docs/data-sources/data-type-handling.mdx rename to docs/components/data-sources/data-type-handling.mdx diff --git a/docs/data-sources/discord.mdx b/docs/components/data-sources/discord.mdx similarity index 100% rename from docs/data-sources/discord.mdx rename to docs/components/data-sources/discord.mdx diff --git a/docs/data-sources/discourse.mdx b/docs/components/data-sources/discourse.mdx similarity index 100% rename from docs/data-sources/discourse.mdx rename to docs/components/data-sources/discourse.mdx diff --git a/docs/data-sources/docs-site.mdx 
b/docs/components/data-sources/docs-site.mdx similarity index 100% rename from docs/data-sources/docs-site.mdx rename to docs/components/data-sources/docs-site.mdx diff --git a/docs/data-sources/docx.mdx b/docs/components/data-sources/docx.mdx similarity index 100% rename from docs/data-sources/docx.mdx rename to docs/components/data-sources/docx.mdx diff --git a/docs/data-sources/github.mdx b/docs/components/data-sources/github.mdx similarity index 100% rename from docs/data-sources/github.mdx rename to docs/components/data-sources/github.mdx diff --git a/docs/data-sources/gmail.mdx b/docs/components/data-sources/gmail.mdx similarity index 100% rename from docs/data-sources/gmail.mdx rename to docs/components/data-sources/gmail.mdx diff --git a/docs/data-sources/json.mdx b/docs/components/data-sources/json.mdx similarity index 100% rename from docs/data-sources/json.mdx rename to docs/components/data-sources/json.mdx diff --git a/docs/data-sources/mdx.mdx b/docs/components/data-sources/mdx.mdx similarity index 100% rename from docs/data-sources/mdx.mdx rename to docs/components/data-sources/mdx.mdx diff --git a/docs/data-sources/mysql.mdx b/docs/components/data-sources/mysql.mdx similarity index 100% rename from docs/data-sources/mysql.mdx rename to docs/components/data-sources/mysql.mdx diff --git a/docs/data-sources/notion.mdx b/docs/components/data-sources/notion.mdx similarity index 100% rename from docs/data-sources/notion.mdx rename to docs/components/data-sources/notion.mdx diff --git a/docs/data-sources/openapi.mdx b/docs/components/data-sources/openapi.mdx similarity index 100% rename from docs/data-sources/openapi.mdx rename to docs/components/data-sources/openapi.mdx diff --git a/docs/components/data-sources/overview.mdx b/docs/components/data-sources/overview.mdx new file mode 100644 index 00000000..ec9ecade --- /dev/null +++ b/docs/components/data-sources/overview.mdx @@ -0,0 +1,36 @@ +--- +title: Overview +--- + +Embedchain comes with built-in 
support for various data sources. We handle the complexity of loading unstructured data from these data sources, allowing you to easily customize your app through a user-friendly interface. + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + diff --git a/docs/data-sources/pdf-file.mdx b/docs/components/data-sources/pdf-file.mdx similarity index 100% rename from docs/data-sources/pdf-file.mdx rename to docs/components/data-sources/pdf-file.mdx diff --git a/docs/data-sources/postgres.mdx b/docs/components/data-sources/postgres.mdx similarity index 100% rename from docs/data-sources/postgres.mdx rename to docs/components/data-sources/postgres.mdx diff --git a/docs/data-sources/qna.mdx b/docs/components/data-sources/qna.mdx similarity index 100% rename from docs/data-sources/qna.mdx rename to docs/components/data-sources/qna.mdx diff --git a/docs/data-sources/sitemap.mdx b/docs/components/data-sources/sitemap.mdx similarity index 100% rename from docs/data-sources/sitemap.mdx rename to docs/components/data-sources/sitemap.mdx diff --git a/docs/data-sources/slack.mdx b/docs/components/data-sources/slack.mdx similarity index 100% rename from docs/data-sources/slack.mdx rename to docs/components/data-sources/slack.mdx diff --git a/docs/data-sources/substack.mdx b/docs/components/data-sources/substack.mdx similarity index 100% rename from docs/data-sources/substack.mdx rename to docs/components/data-sources/substack.mdx diff --git a/docs/data-sources/text.mdx b/docs/components/data-sources/text.mdx similarity index 100% rename from docs/data-sources/text.mdx rename to docs/components/data-sources/text.mdx diff --git a/docs/data-sources/web-page.mdx b/docs/components/data-sources/web-page.mdx similarity index 100% rename from docs/data-sources/web-page.mdx rename to docs/components/data-sources/web-page.mdx diff --git a/docs/data-sources/xml.mdx b/docs/components/data-sources/xml.mdx similarity index 100% rename from docs/data-sources/xml.mdx rename to docs/components/data-sources/xml.mdx diff --git a/docs/data-sources/youtube-video.mdx b/docs/components/data-sources/youtube-video.mdx similarity index 100% rename from docs/data-sources/youtube-video.mdx rename to 
docs/components/data-sources/youtube-video.mdx diff --git a/docs/components/retrieval-methods.mdx b/docs/components/retrieval-methods.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/contribution/dev.mdx b/docs/contribution/dev.mdx index cda2f7bc..f902b5a3 100644 --- a/docs/contribution/dev.mdx +++ b/docs/contribution/dev.mdx @@ -22,17 +22,6 @@ make lint format 5. **Create a pull request**: When you are ready to contribute your changes, submit a pull request to the EmbedChain repository. Provide a clear and descriptive title for your pull request, along with a detailed description of the changes you have made. -# Tech Stack - -embedchain is built on the following stack: - -- [Langchain](https://github.com/hwchase17/langchain) as an LLM framework to load, chunk and index data -- [OpenAI's Ada embedding model](https://platform.openai.com/docs/guides/embeddings) to create embeddings -- [OpenAI's ChatGPT API](https://platform.openai.com/docs/guides/gpt/chat-completions-api) as LLM to get answers given the context -- [Chroma](https://github.com/chroma-core/chroma) as the vector database to store embeddings -- [gpt4all](https://github.com/nomic-ai/gpt4all) as an open source LLM -- [sentence-transformers](https://huggingface.co/sentence-transformers) as open source embedding model - ## Team ### Authors diff --git a/docs/customizations/chunking.mdx b/docs/customizations/chunking.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/customizations/data-loader.mdx b/docs/customizations/data-loader.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/customizations/embedding-models.mdx b/docs/customizations/embedding-models.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/customizations/llms.mdx b/docs/customizations/llms.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/customizations/retrieval.mdx b/docs/customizations/retrieval.mdx new file mode 100644 index 00000000..e69de29b diff --git 
a/docs/customizations/vector-databases.mdx b/docs/customizations/vector-databases.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/data-sources/overview.mdx b/docs/data-sources/overview.mdx deleted file mode 100644 index 8488ce40..00000000 --- a/docs/data-sources/overview.mdx +++ /dev/null @@ -1,36 +0,0 @@ ---- -title: Overview ---- - -Embedchain comes with built-in support for various data sources. We handle the complexity of loading unstructured data from these data sources, allowing you to easily customize your app through a user-friendly interface. - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- - diff --git a/docs/community/showcase.mdx b/docs/examples/community/showcase.mdx similarity index 100% rename from docs/community/showcase.mdx rename to docs/examples/community/showcase.mdx diff --git a/docs/examples/full_stack.mdx b/docs/examples/full_stack.mdx index e4158ad2..6f0bb4fd 100644 --- a/docs/examples/full_stack.mdx +++ b/docs/examples/full_stack.mdx @@ -1,5 +1,5 @@ --- -title: '🌐 Full Stack' +title: 'Full Stack' --- The Full Stack app example can be found [here](https://github.com/embedchain/embedchain/tree/main/examples/full_stack). diff --git a/docs/get-started/openai-assistant.mdx b/docs/examples/openai-assistant.mdx similarity index 68% rename from docs/get-started/openai-assistant.mdx rename to docs/examples/openai-assistant.mdx index 2316bc66..ffd312fa 100644 --- a/docs/get-started/openai-assistant.mdx +++ b/docs/examples/openai-assistant.mdx @@ -1,5 +1,5 @@ --- -title: '🤖 OpenAI Assistant' +title: 'OpenAI Assistant' --- OpenAI Logo @@ -38,40 +38,6 @@ assistant = OpenAIAssistant(assistant_id="asst_xxx") assistant = OpenAIAssistant(assistant_id="asst_xxx", thread_id="thread_xxx") ``` -### Arguments - - - Name for your AI assistant - - - - how the Assistant and model should behave or respond - - - - Load existing OpenAI Assistant. If you pass this, you don't have to pass other arguments. - - - - Existing OpenAI thread id if exists - - - - OpenAI model to use - - - - OpenAI tools to use. Default set to `[{"type": "retrieval"}]` - - - - Add data sources to your assistant. You can add in the following format: `[{"source": "https://example.com", "data_type": "web_page"}]` - - - - Anonymous telemetry (doesn't collect any user information or user's files). Used to improve the Embedchain package utilization. Default is `True`. - - ## Step-2: Add data to thread You can add any custom data source that is supported by Embedchain. Else, you can directly pass the file path on your local system and Embedchain propagates it to OpenAI Assistant.
@@ -92,4 +58,3 @@ You can try it out yourself using the following Google Colab notebook: Open in Colab - diff --git a/docs/examples/opensource-assistant.mdx b/docs/examples/opensource-assistant.mdx new file mode 100644 index 00000000..f4dcaa52 --- /dev/null +++ b/docs/examples/opensource-assistant.mdx @@ -0,0 +1,51 @@ +--- +title: 'Open-Source AI Assistant' +--- + +Embedchain also provides support for creating Open-Source AI Assistants (similar to [OpenAI Assistants API](https://platform.openai.com/docs/assistants/overview)) which allows you to build AI assistants within your own applications using any LLM (OpenAI or otherwise). An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. + +At a high level, the Open-Source AI Assistants API has the following flow: + +1. Create an AI Assistant by picking a model +2. Create a Thread when a user starts a conversation +3. Add Messages to the Thread as the user asks questions +4. Run the Assistant on the Thread to trigger responses. This automatically calls the relevant tools. + +Creating an Open-Source AI Assistant is a simple 3-step process. + +## Step 1: Instantiate AI Assistant + +```python Initialize +from embedchain.store.assistants import AIAssistant + +assistant = AIAssistant( + name="My Assistant", + data_sources=[{"source": "https://www.youtube.com/watch?v=U9mJuUkhUzk"}]) +``` + +If you want to use an existing assistant, you can do something like this: + +```python Initialize +# Load an assistant and create a new thread +assistant = AIAssistant(assistant_id="asst_xxx") + +# Load a specific thread for an assistant +assistant = AIAssistant(assistant_id="asst_xxx", thread_id="thread_xxx") +``` + +## Step-2: Add data to thread + +You can add any custom data source that is supported by Embedchain. Else, you can directly pass the file path on your local system and Embedchain propagates it to OpenAI Assistant.
+ +```python Add data +assistant.add("/path/to/file.pdf") +assistant.add("https://www.youtube.com/watch?v=U9mJuUkhUzk") +assistant.add("https://openai.com/blog/new-models-and-developer-products-announced-at-devday") +``` + +## Step-3: Chat with your AI Assistant + +```python Chat +assistant.chat("How much OpenAI credits were offered to attendees during OpenAI DevDay?") +# Response: 'Every attendee of OpenAI DevDay 2023 was offered $500 in OpenAI credits.' +``` diff --git a/docs/rest-api/add-data.mdx b/docs/examples/rest-api/add-data.mdx similarity index 100% rename from docs/rest-api/add-data.mdx rename to docs/examples/rest-api/add-data.mdx diff --git a/docs/rest-api/chat.mdx b/docs/examples/rest-api/chat.mdx similarity index 100% rename from docs/rest-api/chat.mdx rename to docs/examples/rest-api/chat.mdx diff --git a/docs/rest-api/check-status.mdx b/docs/examples/rest-api/check-status.mdx similarity index 100% rename from docs/rest-api/check-status.mdx rename to docs/examples/rest-api/check-status.mdx diff --git a/docs/rest-api/create.mdx b/docs/examples/rest-api/create.mdx similarity index 100% rename from docs/rest-api/create.mdx rename to docs/examples/rest-api/create.mdx diff --git a/docs/rest-api/delete.mdx b/docs/examples/rest-api/delete.mdx similarity index 100% rename from docs/rest-api/delete.mdx rename to docs/examples/rest-api/delete.mdx diff --git a/docs/rest-api/deploy.mdx b/docs/examples/rest-api/deploy.mdx similarity index 100% rename from docs/rest-api/deploy.mdx rename to docs/examples/rest-api/deploy.mdx diff --git a/docs/rest-api/get-all-apps.mdx b/docs/examples/rest-api/get-all-apps.mdx similarity index 100% rename from docs/rest-api/get-all-apps.mdx rename to docs/examples/rest-api/get-all-apps.mdx diff --git a/docs/rest-api/get-data.mdx b/docs/examples/rest-api/get-data.mdx similarity index 100% rename from docs/rest-api/get-data.mdx rename to docs/examples/rest-api/get-data.mdx diff --git a/docs/rest-api/getting-started.mdx 
b/docs/examples/rest-api/getting-started.mdx similarity index 100% rename from docs/rest-api/getting-started.mdx rename to docs/examples/rest-api/getting-started.mdx diff --git a/docs/rest-api/query.mdx b/docs/examples/rest-api/query.mdx similarity index 100% rename from docs/rest-api/query.mdx rename to docs/examples/rest-api/query.mdx diff --git a/docs/examples/showcase.mdx b/docs/examples/showcase.mdx new file mode 100644 index 00000000..d8b51191 --- /dev/null +++ b/docs/examples/showcase.mdx @@ -0,0 +1,115 @@ +--- +title: '🎪 Community showcase' +--- + +Embedchain community has been super active in creating demos on top of Embedchain. On this page, we showcase all the apps, blogs, videos, and tutorials created by the community. ❤️ + +## Apps + +### Open Source + +- [My GSoC23 bot- Streamlit chat](https://github.com/lucifertrj/EmbedChain_GSoC23_BOT) by Tarun Jain +- [Discord Bot for LLM chat](https://github.com/Reidond/discord_bots_playground/tree/c8b0c36541e4b393782ee506804c4b6962426dd6/python/chat-channel-bot) by Reidond +- [EmbedChain-Streamlit-Docker App](https://github.com/amjadraza/embedchain-streamlit-app) by amjadraza +- [Harry Potter Philosphers Stone Bot](https://github.com/vinayak-kempawad/Harry_Potter_Philosphers_Stone_Bot/) by Vinayak Kempawad, ([LinkedIn post](https://www.linkedin.com/feed/update/urn:li:activity:7080907532155686912/)) +- [LLM bot trained on own messages](https://github.com/Harin329/harinBot) by Hao Wu + +### Closed Source + +- [Taobot.io](https://taobot.io) - chatbot & knowledgebase hybrid by [cachho](https://github.com/cachho) +- [Create Instant ChatBot 🤖 using embedchain](https://databutton.com/v/h3e680h9) by Avra, ([Tweet](https://twitter.com/Avra_b/status/1674704745154641920/)) +- [JOBO 🤖 — The AI-driven sidekick to craft your resume](https://try-jobo.com/) by Enrico Willemse, ([LinkedIn Post](https://www.linkedin.com/posts/enrico-willemse_jobai-gptfun-embedchain-activity-7090340080879374336-ueLB/)) +- [Explore Your 
Knowledge Base: Interactive chats over various forms of documents](https://chatdocs.dkedar.com/) by Kedar Dabhadkar, ([LinkedIn Post](https://www.linkedin.com/posts/dkedar7_machinelearning-llmops-activity-7092524836639424513-2O3L/)) +- [Chatbot trained on 1000+ videos of Ester hicks the co-author behind the famous book Secret](https://ask-abraham.thoughtseed.repl.co) by Mohan Kumar + + +## Templates + +### Replit +- [Embedchain Chat Bot](https://replit.com/@taranjeet1/Embedchain-Chat-Bot) by taranjeetio +- [Embedchain Memory Chat Bot Template](https://replit.com/@taranjeetio/Embedchain-Memory-Chat-Bot-Template) by taranjeetio +- [Chatbot app to demonstrate question-answering using retrieved information](https://replit.com/@AllisonMorrell/EmbedChainlitPublic) by Allison Morrell, ([LinkedIn Post](https://www.linkedin.com/posts/allison-morrell-2889275a_retrievalbot-screenshots-activity-7080339991754649600-wihZ/)) + +## Posts + +### Blogs + +- [Customer Service LINE Bot](https://www.evanlin.com/langchain-embedchain/) by Evan Lin +- [Chatbot in Under 5 mins using Embedchain](https://medium.com/@ayush.wattal/chatbot-in-under-5-mins-using-embedchain-a4f161fcf9c5) by Ayush Wattal +- [Understanding what the LLM framework embedchain does](https://zenn.dev/hijikix/articles/4bc8d60156a436) by Daisuke Hashimoto +- [In bed with GPT and Node.js](https://dev.to/worldlinetech/in-bed-with-gpt-and-nodejs-4kh2) by Raphaël Semeteys, ([LinkedIn Post](https://www.linkedin.com/posts/raphaelsemeteys_in-bed-with-gpt-and-nodejs-activity-7088113552326029313-nn87/)) +- [Using Embedchain — A powerful LangChain Python wrapper to build Chat Bots even faster!⚡](https://medium.com/@avra42/using-embedchain-a-powerful-langchain-python-wrapper-to-build-chat-bots-even-faster-35c12994a360) by Avra, ([Tweet](https://twitter.com/Avra_b/status/1686767751560310784/)) +- [What is the Embedchain library?](https://jahaniwww.com/%da%a9%d8%aa%d8%a7%d8%a8%d8%ae%d8%a7%d9%86%d9%87-embedchain/) by Ali Jahani, 
([LinkedIn Post](https://www.linkedin.com/posts/ajahani_aepaetaeqaexaggahyaeu-aetaexaesabraeaaeqaepaeu-activity-7097605202135904256-ppU-/)) +- [LangChain is Nice, But Have You Tried EmbedChain ?](https://medium.com/thoughts-on-machine-learning/langchain-is-nice-but-have-you-tried-embedchain-215a34421cde) by FS Ndzomga, ([Tweet](https://twitter.com/ndzfs/status/1695583640372035951/)) +- [Simplest Method to Build a Custom Chatbot with GPT-3.5 (via Embedchain)](https://www.ainewsletter.today/p/simplest-method-to-build-a-custom) by Arjun, ([Tweet](https://twitter.com/aiguy_arjun/status/1696393808467091758/)) + +### LinkedIn + +- [What is embedchain](https://www.linkedin.com/posts/activity-7079393104423698432-wRyi/) by Rithesh Sreenivasan +- [Building a chatbot with EmbedChain](https://www.linkedin.com/posts/activity-7078434598984060928-Zdso/) by Lior Sinclair +- [Making chatbot without vs with embedchain](https://www.linkedin.com/posts/kalyanksnlp_llms-chatbots-langchain-activity-7077453416221863936-7N1L/) by Kalyan KS +- [EmbedChain - very intuitive, first you index your data and then query!](https://www.linkedin.com/posts/shubhamsaboo_embedchain-a-framework-to-easily-create-activity-7079535460699557888-ad1X/) by Shubham Saboo +- [EmbedChain - Harnessing power of LLM](https://www.linkedin.com/posts/uditsaini_chatbotrevolution-llmpoweredbots-embedchainframework-activity-7077520356827181056-FjTK/) by Udit S. +- [AI assistant for ABBYY Vantage](https://www.linkedin.com/posts/maximevermeir_llm-github-abbyy-activity-7081658972071424000-fXfZ/) by Maxime V. 
+- [About embedchain](https://www.linkedin.com/feed/update/urn:li:activity:7080984218914189312/) by Morris Lee +- [How to use Embedchain](https://www.linkedin.com/posts/nehaabansal_github-embedchainembedchain-framework-activity-7085830340136595456-kbW5/) by Neha Bansal +- [Youtube/Webpage summary for Energy Study](https://www.linkedin.com/posts/bar%C4%B1%C5%9F-sanl%C4%B1-34b82715_enerji-python-activity-7082735341563977730-Js0U/) by Barış Sanlı, ([Tweet](https://twitter.com/barissanli/status/1676968784979193857/)) +- [Demo: How to use Embedchain? (Contains Collab Notebook link)](https://www.linkedin.com/posts/liorsinclair_embedchain-is-getting-a-lot-of-traction-because-activity-7103044695995424768-RckT/) by Lior Sinclair + +### Twitter + +- [What is embedchain](https://twitter.com/AlphaSignalAI/status/1672668574450847745) by Lior +- [Building a chatbot with Embedchain](https://twitter.com/Saboo_Shubham_/status/1673537044419686401) by Shubham Saboo +- [Chatbot docker image behind an API with yaml configs with Embedchain](https://twitter.com/tricalt/status/1678411430192730113/) by Vasilije +- [Build AI powered PDF chatbot with just five lines of Python code with Embedchain!](https://twitter.com/Saboo_Shubham_/status/1676627104866156544/) by Shubham Saboo +- [Chatbot against a youtube video using embedchain](https://twitter.com/smaameri/status/1675201443043704834/) by Sami Maameri +- [Highlights of EmbedChain](https://twitter.com/carl_AIwarts/status/1673542204328120321/) by carl_AIwarts +- [Build Llama-2 chatbot in less than 5 minutes](https://twitter.com/Saboo_Shubham_/status/1682168956918833152/) by Shubham Saboo +- [All cool features of embedchain](https://twitter.com/DhravyaShah/status/1683497882438217728/) by Dhravya Shah, ([LinkedIn Post](https://www.linkedin.com/posts/dhravyashah_what-if-i-tell-you-that-you-can-make-an-ai-activity-7089459599287726080-ZIYm/)) +- [Read paid Medium articles for Free using 
embedchain](https://twitter.com/kumarkaushal_/status/1688952961622585344) by Kaushal Kumar + +## Videos + +- [Embedchain in one shot](https://www.youtube.com/watch?v=vIhDh7H73Ww&t=82s) by AI with Tarun +- [embedChain Create LLM powered bots over any dataset Python Demo Tesla Neurallink Chatbot Example](https://www.youtube.com/watch?v=bJqAn22a6Gc) by Rithesh Sreenivasan +- [Embedchain - NEW 🔥 Langchain BABY to build LLM Bots](https://www.youtube.com/watch?v=qj_GNQ06I8o) by 1littlecoder +- [EmbedChain -- NEW!: Build LLM-Powered Bots with Any Dataset](https://www.youtube.com/watch?v=XmaBezzGHu4) by DataInsightEdge +- [Chat With Your PDFs in less than 10 lines of code! EMBEDCHAIN tutorial](https://www.youtube.com/watch?v=1ugkcsAcw44) by Phani Reddy +- [How To Create A Custom Knowledge AI Powered Bot | Install + How To Use](https://www.youtube.com/watch?v=VfCrIiAst-c) by The Ai Solopreneur +- [Build Custom Chatbot in 6 min with this Framework [Beginner Friendly]](https://www.youtube.com/watch?v=-8HxOpaFySM) by Maya Akim +- [embedchain-streamlit-app](https://www.youtube.com/watch?v=3-9GVd-3v74) by Amjad Raza +- [🤖CHAT with ANY ONLINE RESOURCES using EMBEDCHAIN - a LangChain wrapper, in few lines of code !](https://www.youtube.com/watch?v=Mp7zJe4TIdM) by Avra +- [Building resource-driven LLM-powered bots with Embedchain](https://www.youtube.com/watch?v=IVfcAgxTO4I) by BugBytes +- [embedchain-streamlit-demo](https://www.youtube.com/watch?v=yJAWB13FhYQ) by Amjad Raza +- [Embedchain - create your own AI chatbots using open source models](https://www.youtube.com/shorts/O3rJWKwSrWE) by Dhravya Shah +- [AI ChatBot in 5 lines Python Code](https://www.youtube.com/watch?v=zjWvLJLksv8) by Data Engineering +- [Interview with Karl Marx](https://www.youtube.com/watch?v=5Y4Tscwj1xk) by Alexander Ray Williams +- [Vlog where we try to build a bot based on our content on the internet](https://www.youtube.com/watch?v=I2w8CWM3bx4) by DV, 
([Tweet](https://twitter.com/dvcoolster/status/1688387017544261632)) +- [CHAT with ANY ONLINE RESOURCES using EMBEDCHAIN|STREAMLIT with MEMORY |All OPENSOURCE](https://www.youtube.com/watch?v=TqQIHWoWTDQ&pp=ygUKZW1iZWRjaGFpbg%3D%3D) by DataInsightEdge +- [Build POWERFUL LLM Bots EASILY with Your Own Data - Embedchain - Langchain 2.0? (Tutorial)](https://www.youtube.com/watch?v=jE24Y_GasE8) by WorldofAI, ([Tweet](https://twitter.com/intheworldofai/status/1696229166922780737)) +- [Embedchain: An AI knowledge base assistant for customizing enterprise private data, which can be connected to discord, whatsapp, slack, tele and other terminals (with gradio to build a request interface) in Chinese](https://www.youtube.com/watch?v=5RZzCJRk-d0) by AIGC LINK +- [Embedchain Introduction](https://www.youtube.com/watch?v=Jet9zAqyggI) by Fahd Mirza + +## Mentions + +### Github repos + +- [Awesome-LLM](https://github.com/Hannibal046/Awesome-LLM) +- [awesome-chatgpt-api](https://github.com/reorx/awesome-chatgpt-api) +- [awesome-langchain](https://github.com/kyrolabs/awesome-langchain) +- [Awesome-Prompt-Engineering](https://github.com/promptslab/Awesome-Prompt-Engineering) +- [awesome-chatgpt](https://github.com/eon01/awesome-chatgpt) +- [Awesome-LLMOps](https://github.com/tensorchord/Awesome-LLMOps) +- [awesome-generative-ai](https://github.com/filipecalegario/awesome-generative-ai) +- [awesome-gpt](https://github.com/formulahendry/awesome-gpt) +- [awesome-ChatGPT-repositories](https://github.com/taishi-i/awesome-ChatGPT-repositories) +- [awesome-gpt-prompt-engineering](https://github.com/snwfdhmp/awesome-gpt-prompt-engineering) +- [awesome-chatgpt](https://github.com/awesome-chatgpt/awesome-chatgpt) +- [awesome-llm-and-aigc](https://github.com/sjinzh/awesome-llm-and-aigc) +- [awesome-compbio-chatgpt](https://github.com/csbl-br/awesome-compbio-chatgpt) +- [Awesome-LLM4Tool](https://github.com/OpenGVLab/Awesome-LLM4Tool) + +## Meetups + +- [Dash and ChatGPT: Future of AI-enabled 
apps 30/08/23](https://go.plotly.com/dash-chatgpt) +- [Pie & AI: Bangalore - Build end-to-end LLM app using Embedchain 01/09/23](https://www.eventbrite.com/e/pie-ai-bangalore-build-end-to-end-llm-app-using-embedchain-tickets-698045722547) diff --git a/docs/get-started/deployment.mdx b/docs/get-started/deployment.mdx new file mode 100644 index 00000000..9765c979 --- /dev/null +++ b/docs/get-started/deployment.mdx @@ -0,0 +1,53 @@ +--- +title: '🚀 Deployment' +description: 'Deploy your embedchain RAG application to production' +--- + +After successfully setting up and testing your Embedchain application locally, the next step is to deploy it to a hosting service to make it accessible to a wider audience. This section offers various options for hosting your app on the [Embedchain platform](https://app.embedchain.ai) or through [self-hosting options](#self-hosting). + +## Option 1: Deploy on Embedchain Platform + +Embedchain enables developers to deploy their LLM-powered apps in production using the [Embedchain platform](https://app.embedchain.ai). The platform offers free access to context on your data through its REST API. Once the pipeline is deployed, you can update your data sources anytime after deployment. + +See the example below on how to deploy your app (for free): + +```python +from embedchain import Pipeline as App + +# Initialize app +app = App() + +# Add data source +app.add("https://www.forbes.com/profile/elon-musk") + +# Deploy your pipeline to Embedchain Platform +app.deploy() + +# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/ +# ec-xxxxxx + +# 🛠️ Creating pipeline on the platform... +# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx + +# 🛠️ Adding data to your pipeline... +# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully. 
+``` + +## Option 2: Self-hosting + +You can also deploy Embedchain as a self-hosted service using the dockerized REST API service that we provide. Please follow the [guide here](/examples/rest-api) on how to use the REST API service. Here are some tutorials on how to deploy a containerized application to different platforms like AWS, GCP, Azure etc: + +- [AWS EKS](https://docs.aws.amazon.com/eks/latest/userguide/sample-deployment.html) +- [AWS ECS](https://docs.aws.amazon.com/codecatalyst/latest/userguide/deploy-tut-ecs.html) +- [Google GKE](https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app) +- [Azure App Service](https://learn.microsoft.com/en-us/training/modules/deploy-run-container-app-service/) +- [Fly.io](https://fly.io/docs/languages-and-frameworks/python/) +- [Render.com](https://render.com/docs/deploy-an-image) +- [Huggingface Spaces](https://huggingface.co/new-space) + + +## Seeking help? + +If you run into issues with deployment, please feel free to reach out to us via any of the following methods: + + diff --git a/docs/get-started/faq.mdx b/docs/get-started/faq.mdx index 80028e73..d5899f91 100644 --- a/docs/get-started/faq.mdx +++ b/docs/get-started/faq.mdx @@ -115,7 +115,7 @@ embedder: -#### Need more help? +#### Still have questions? If docs aren't sufficient, please feel free to reach out to us using one of the following methods: diff --git a/docs/get-started/integrations.mdx b/docs/get-started/integrations.mdx new file mode 100644 index 00000000..e69de29b diff --git a/docs/get-started/introduction.mdx b/docs/get-started/introduction.mdx index e6c73955..26c23e2d 100644 --- a/docs/get-started/introduction.mdx +++ b/docs/get-started/introduction.mdx @@ -1,221 +1,66 @@ --- title: 📚 Introduction -description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data' --- -## 🌐 What is Embedchain? +## What is Embedchain? 
-Embedchain simplifies data handling by automatically processing unstructured data, breaking it into chunks, generating embeddings, and storing it in a vector database. +Embedchain is a production ready Open-Source RAG framework - load, index, retrieve, and sync any unstructured data. -Through various APIs, you can obtain contextual information for queries, find answers to specific questions, and engage in chat conversations using your data. -## 🔍 Search +Embedchain streamlines the creation of RAG applications, offering a seamless process for managing various types of unstructured data. It efficiently segments data into manageable chunks, generates relevant embeddings, and stores them in a vector database for optimized retrieval. With a suite of diverse APIs, it enables users to extract contextual information, find precise answers, or engage in interactive chat conversations, all tailored to their own data. -Embedchain lets you get most relevant context by doing semantic search over your data sources for a provided query. See the example below: +## Who is Embedchain for? -```python -from embedchain import Pipeline as App +Embedchain is designed for a diverse range of users, from AI professionals like Data Scientists and Machine Learning Engineers to those just starting their AI journey, including college students, independent developers, and hobbyists. Essentially, it's for anyone with an interest in AI, regardless of their expertise level. -# Initialize app -app = App() +Our APIs are user-friendly yet adaptable, enabling beginners to effortlessly create LLM-powered applications with as few as 4 lines of code. At the same time, we offer extensive customization options for every aspect of the RAG pipeline. This includes the choice of LLMs, vector databases, loaders and chunkers, retrieval strategies, re-ranking, and more. 
-# Add data source -app.add("https://www.forbes.com/profile/elon-musk") +Our platform's clear and well-structured abstraction layers ensure that users can tailor the system to meet their specific needs, whether they're crafting a simple project or a complex, nuanced AI application. -# Get relevant context using semantic search -context = app.search("What is the net worth of Elon?", num_documents=2) -print(context) -# Context: -# [ -# { -# 'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in', -# 'source': 'https://www.forbes.com/profile/elon-musk', -# 'document_id': 'some_document_id' -# }, -# { -# 'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University', -# 'source': 'https://www.forbes.com/profile/elon-musk', -# 'document_id': 'some_document_id' -# } -# ] -``` +## Why Use Embedchain? -## ❓Query +Developing a robust and efficient RAG (Retrieval-Augmented Generation) pipeline for production use presents numerous complexities, such as: -Embedchain empowers developers to ask questions and receive relevant answers through a user-friendly query API. 
Refer to the following example to learn how to utilize the query API: +- Integrating and indexing data from diverse sources. +- Determining optimal data chunking methods for each source. +- Synchronizing the RAG pipeline with regularly updated data sources. +- Implementing efficient data storage in a vector store. +- Deciding whether to include metadata with document chunks. +- Handling permission management. +- Configuring Large Language Models (LLMs). +- Selecting effective prompts. +- Choosing suitable retrieval strategies. +- Assessing the performance of your RAG pipeline. +- Deploying the pipeline into a production environment, among other concerns. - +Embedchain is designed to simplify these tasks, offering conventional yet customizable APIs. Our solution handles the intricate processes of loading, chunking, indexing, and retrieving data. This enables you to concentrate on aspects that are crucial for your specific use case or business objectives, ensuring a smoother and more focused development process. -```python With Citations -from embedchain import Pipeline as App +## How it works? -# Initialize app -app = App() +Embedchain makes it easy to add data to your RAG pipeline with these straightforward steps: -# Add data source -app.add("https://www.forbes.com/profile/elon-musk") +1. **Automatic Data Handling**: It automatically recognizes the data type and loads it. +2. **Efficient Data Processing**: The system creates embeddings for key parts of your data. +3. **Flexible Data Storage**: You get to choose where to store this processed data in a vector database. -# Get relevant answer for your query -answer, sources = app.query("What is the net worth of Elon?", citations=True) -print(answer) -# Answer: The net worth of Elon Musk is $221.9 billion. 
+When a user asks a question, whether for chatting, searching, or querying, Embedchain simplifies the response process: -print(sources) -# [ -# ( -# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ), -# ( -# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ), -# ( -# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ) -# ] -``` +1. **Query Processing**: It turns the user's question into embeddings. +2. **Document Retrieval**: These embeddings are then used to find related documents in the database. +3. **Answer Generation**: The related documents are used by the LLM to craft a precise answer. +With Embedchain, you don’t have to worry about the complexities of building a RAG pipeline. It offers an easy-to-use interface for developing applications with any kind of data. -```python Without Citations -from embedchain import Pipeline as App +## Getting started -# Initialize app -app = App() +Check out our [quickstart guide](/get-started/quickstart) to start your first RAG application. -# Add data source -app.add("https://www.forbes.com/profile/elon-musk") +## Support -# Get relevant answer for your query -answer = app.query("What is the net worth of Elon?") -print(answer) -# Answer: The net worth of Elon Musk is $221.9 billion. -``` +Feel free to reach out to us if you have ideas, feedback or questions that we can help out with. - + -When `citations=True`, note that the returned `sources` are a list of tuples where each tuple has three elements (in the following order): -1. source chunk -2. link of the source document -3. 
document id (used for book keeping purposes) +## Contribute - -## 💬 Chat - -Embedchain allows easy chatting over your data sources using a user-friendly chat API. Check out the example below to understand how to use the chat API: - - - -```python With Citations -from embedchain import Pipeline as App - -# Initialize app -app = App() - -# Add data source -app.add("https://www.forbes.com/profile/elon-musk") - -# Get relevant answer for your query -answer, sources = app.chat("What is the net worth of Elon?", citations=True) -print(answer) -# Answer: The net worth of Elon Musk is $221.9 billion. - -print(sources) -# [ -# ( -# 'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ), -# ( -# '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ), -# ( -# 'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...', -# 'https://www.forbes.com/profile/elon-musk', -# '4651b266--4aa78839fe97' -# ) -# ] -``` - -```python Without Citations -from embedchain import Pipeline as App - -# Initialize app -app = App() - -# Add data source -app.add("https://www.forbes.com/profile/elon-musk") - -# Chat on your data using `.chat()` -answer = app.chat("What is the net worth of Elon?") -print(answer) -# Answer: The net worth of Elon Musk is $221.9 billion. -``` - - - -Similar to `query()` function, when `citations=True`, note that the returned `sources` are a list of tuples where each tuple has three elements (in the following order): -1. source chunk -2. link of the source document -3. document id (used for book keeping purposes) - -## 🚀 Deploy - -Embedchain enables developers to deploy their LLM-powered apps in production using the Embedchain platform. 
The platform offers free access to context on your data through its REST API. Once the pipeline is deployed, you can update your data sources anytime after deployment. - -See the example below on how to use the deploy API: - -```python -from embedchain import Pipeline as App - -# Initialize app -app = App() - -# Add data source -app.add("https://www.forbes.com/profile/elon-musk") - -# Deploy your pipeline to Embedchain Platform -app.deploy() - -# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/ -# ec-xxxxxx - -# 🛠️ Creating pipeline on the platform... -# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx - -# 🛠️ Adding data to your pipeline... -# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully. -``` - -## 🛠️ How it works? - -Embedchain abstracts out the following steps from you to easily create LLM powered apps: - -1. Detect the data type and load data -2. Create meaningful chunks -3. Create embeddings for each chunk -4. Store chunks in a vector database - -When a user asks a query, the following process happens to find the answer: - -1. Create an embedding for the query -2. Find similar documents for the query from the vector database -3. Pass the similar documents as context to LLM to get the final answer - -The process of loading the dataset and querying involves multiple steps, each with its own nuances: - -- How should I chunk the data? What is a meaningful chunk size? -- How should I create embeddings for each chunk? Which embedding model should I use? -- How should I store the chunks in a vector database? Which vector database should I use? -- Should I store metadata along with the embeddings? -- How should I find similar documents for a query? Which ranking model should I use? - -Embedchain takes care of all these nuances and provides a simple interface to create apps on any data. 
- -## [🚀 Get started](https://docs.embedchain.ai/get-started/quickstart) +- [GitHub](https://github.com/embedchain/embedchain) +- [Contribution docs](/contribution/dev) diff --git a/docs/get-started/quickstart.mdx b/docs/get-started/quickstart.mdx index f501054f..710acf9c 100644 --- a/docs/get-started/quickstart.mdx +++ b/docs/get-started/quickstart.mdx @@ -1,20 +1,14 @@ --- -title: '🚀 Quickstart' -description: '💡 Start building LLM powered apps under 30 seconds' +title: '⚡ Quickstart' +description: '💡 Start building ChatGPT like apps in a minute on your own data' --- -Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data. - -Install embedchain python package: +Install python package: ```bash pip install embedchain ``` - -Embedchain now supports OpenAI's latest `gpt-4-turbo` model. Checkout the [FAQs](/get-started/faq#how-to-use-gpt-4-turbo-model-released-on-openai-devday). - - Creating an app involves 3 steps: @@ -59,6 +53,7 @@ Creating an app involves 3 steps: app.query("What is the net worth of Elon Musk today?") # Answer: The net worth of Elon Musk today is $258.7 billion. ``` +
Embedchain provides a wide range of features to interact with your app. You can chat with your app, ask questions, search through your data, and much more. ```python @@ -88,9 +83,3 @@ Creating an app involves 3 steps:
- -Putting it together, you can run your first app using the following Google Colab. Make sure to set the `OPENAI_API_KEY` 🔑 environment variable in the code. - - - Open in Colab - diff --git a/docs/integration/langsmith.mdx b/docs/integration/langsmith.mdx index fb9e2a9f..3a29786c 100644 --- a/docs/integration/langsmith.mdx +++ b/docs/integration/langsmith.mdx @@ -48,4 +48,4 @@ app.query("How many companies did Elon found?") * Now the entire log for this will be visible in langsmith. - \ No newline at end of file + diff --git a/docs/mint.json b/docs/mint.json index 7d32a528..aef640ee 100644 --- a/docs/mint.json +++ b/docs/mint.json @@ -4,7 +4,7 @@ "logo": { "dark": "/logo/dark.svg", "light": "/logo/light.svg", - "href": "https://embedchain.ai/" + "href": "https://github.com/embedchain/embedchain" }, "favicon": "/favicon.png", "colors": { @@ -19,106 +19,145 @@ "modeToggle": { "default": "dark" }, - "openapi": ["/rest-api.json"], + "openapi": [ + "/rest-api.json" + ], "metadata": { "og:image": "/images/og.png", "twitter:site": "@embedchain" }, + "tabs": [ + { + "name": "Examples", + "url": "examples" + }, + { + "name": "API Reference", + "url": "api-reference" + } + ], "anchors": [ { - "name": "Embedchain Platform", - "icon": "tv", - "url": "https://app.embedchain.ai/" + "name": "Talk to founders", + "icon": "calendar", + "url": "https://cal.com/taranjeetio/ec" }, { "name": "Join our slack", "icon": "slack", "url": "https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw" + }, + { + "name": "Join our discord", + "icon": "discord", + "url": "https://discord.gg/CUU9FPhRNt" } ], "topbarLinks": [ { - "name": "Create account", - "url": "https://app.embedchain.ai/login/" + "name": "GitHub", + "url": "https://github.com/embedchain/embedchain" } ], "topbarCtaButton": { - "name": "Get started", - "url": "https://app.embedchain.ai" + "name": "Join our slack", + "url": 
"https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw" }, "primaryTab": { - "name": "Docs" + "name": "Documentation" }, "navigation": [ { - "group": "Get started", + "group": "Get Started", "pages": [ - "get-started/quickstart", "get-started/introduction", - "get-started/openai-assistant", - "get-started/faq", - "get-started/examples" + "get-started/quickstart", + "get-started/deployment", + { + "group": "πŸ”— Integrations", + "pages": [ + "integration/langsmith" + ] + }, + "get-started/faq" + ] + }, + { + "group": "Use cases", + "pages": [ + "use-cases/chatbots", + "use-cases/question-answering", + "use-cases/semantic-search" ] }, { "group": "Components", "pages": [ - "components/llms", - "components/embedding-models", - "components/vector-databases" - ] - }, - { - "group": "Data sources", - "pages": [ - "data-sources/overview", { - "group": "Supported data sources", + "group": "Data sources", "pages": [ - "data-sources/csv", - "data-sources/json", - "data-sources/docs-site", - "data-sources/docx", - "data-sources/mdx", - "data-sources/notion", - "data-sources/pdf-file", - "data-sources/qna", - "data-sources/sitemap", - "data-sources/text", - "data-sources/web-page", - "data-sources/openapi", - "data-sources/youtube-video", - "data-sources/discourse", - "data-sources/substack", - "data-sources/discord", - "data-sources/beehiiv" + "components/data-sources/overview", + { + "group": "Data types", + "pages": [ + "components/data-sources/csv", + "components/data-sources/json", + "components/data-sources/docs-site", + "components/data-sources/docx", + "components/data-sources/mdx", + "components/data-sources/notion", + "components/data-sources/pdf-file", + "components/data-sources/qna", + "components/data-sources/sitemap", + "components/data-sources/text", + "components/data-sources/web-page", + "components/data-sources/openapi", + "components/data-sources/youtube-video", + "components/data-sources/discourse", + 
"components/data-sources/substack", + "components/data-sources/discord", + "components/data-sources/beehiiv" + ] + }, + "components/data-sources/data-type-handling" ] }, - "data-sources/data-type-handling" + "components/llms", + "components/vector-databases", + "components/embedding-models" ] }, { - "group": "Advanced", - "pages": ["advanced/configuration"] - }, - { - "group": "REST API", + "group": "Community", "pages": [ - "rest-api/getting-started", - "rest-api/create", - "rest-api/get-all-apps", - "rest-api/add-data", - "rest-api/get-data", - "rest-api/query", - "rest-api/deploy", - "rest-api/delete", - "rest-api/check-status" + "community/connect-with-us" ] }, { - "group": "Use Cases", + "group": "Examples", "pages": [ + { + "group": "REST API Service", + "pages": [ + "examples/rest-api/getting-started", + "examples/rest-api/create", + "examples/rest-api/get-all-apps", + "examples/rest-api/add-data", + "examples/rest-api/get-data", + "examples/rest-api/query", + "examples/rest-api/deploy", + "examples/rest-api/delete", + "examples/rest-api/check-status" + ] + }, "examples/full_stack", + "examples/openai-assistant", + "examples/opensource-assistant" + ] + }, + { + "group": "Chatbots", + "pages": [ "examples/discord_bot", "examples/slack_bot", "examples/telegram_bot", @@ -127,15 +166,33 @@ ] }, { - "group": "Community", - "pages": ["community/connect-with-us", "community/showcase"] + "group": "Showcase", + "pages": [ + "examples/showcase" + ] }, { - "group": "Integrations", - "pages": ["integration/langsmith"] + "group": "API Reference", + "pages": [ + "api-reference/pipeline/overview", + { + "group": "Pipeline methods", + "pages": [ + "api-reference/pipeline/add", + "api-reference/pipeline/query", + "api-reference/pipeline/chat", + "api-reference/pipeline/search", + "api-reference/pipeline/deploy", + "api-reference/pipeline/reset" + ] + }, + "api-reference/store/openai-assistant", + "api-reference/store/ai-assistants", + "api-reference/advanced/configuration" + 
] }, { - "group": "Contribute", + "group": "Contributing", "pages": [ "contribution/guidelines", "contribution/dev", @@ -146,7 +203,9 @@ }, { "group": "Product", - "pages": ["product/release-notes"] + "pages": [ + "product/release-notes" + ] } ], "footerSocials": { @@ -175,4 +234,4 @@ "api": { "baseUrl": "http://localhost:8080" } -} +} \ No newline at end of file diff --git a/docs/platform/faq.mdx b/docs/platform/faq.mdx new file mode 100644 index 00000000..750988c6 --- /dev/null +++ b/docs/platform/faq.mdx @@ -0,0 +1,3 @@ +--- +title: 'FAQs' +--- \ No newline at end of file diff --git a/docs/platform/overview.mdx b/docs/platform/overview.mdx new file mode 100644 index 00000000..a7bc86dc --- /dev/null +++ b/docs/platform/overview.mdx @@ -0,0 +1,3 @@ +--- +title: 'Overview' +--- \ No newline at end of file diff --git a/docs/platform/quickstart.mdx b/docs/platform/quickstart.mdx new file mode 100644 index 00000000..331cc491 --- /dev/null +++ b/docs/platform/quickstart.mdx @@ -0,0 +1,3 @@ +--- +title: 'Quickstart' +--- \ No newline at end of file diff --git a/docs/platform/roadmap.mdx b/docs/platform/roadmap.mdx new file mode 100644 index 00000000..e84a0c2e --- /dev/null +++ b/docs/platform/roadmap.mdx @@ -0,0 +1,3 @@ +--- +title: 'Roadmap' +--- \ No newline at end of file diff --git a/docs/platform/security.mdx b/docs/platform/security.mdx new file mode 100644 index 00000000..3da6f817 --- /dev/null +++ b/docs/platform/security.mdx @@ -0,0 +1,3 @@ +--- +title: 'Security' +--- \ No newline at end of file diff --git a/docs/rest-api.json b/docs/rest-api.json index ecb40f86..087d7e06 100644 --- a/docs/rest-api.json +++ b/docs/rest-api.json @@ -245,7 +245,7 @@ "/{app_id}/deploy": { "post": { "tags": ["Apps"], - "summary": "Deploy App", + "summary": "Deploy app", "description": "Deploy an existing app.", "operationId": "deploy_app__app_id__deploy_post", "parameters": [ diff --git a/docs/support/get-help.mdx b/docs/support/get-help.mdx new file mode 100644 index 
00000000..e69de29b diff --git a/docs/use-cases/chatbots.mdx b/docs/use-cases/chatbots.mdx new file mode 100644 index 00000000..122abdd1 --- /dev/null +++ b/docs/use-cases/chatbots.mdx @@ -0,0 +1,41 @@ +--- +title: 'Chatbots' +--- + +Chatbots, especially those powered by Large Language Models (LLMs), have a wide range of use cases, significantly enhancing various aspects of business, education, and personal assistance. Here are some key applications: + +- **Customer Service**: Automating responses to common queries and providing 24/7 support. +- **Education**: Offering personalized tutoring and learning assistance. +- **E-commerce**: Assisting in product discovery, recommendations, and transactions. +- **Content Management**: Aiding in writing, summarizing, and organizing content. +- **Data Analysis**: Extracting insights from large datasets. +- **Language Translation**: Providing real-time multilingual support. +- **Mental Health**: Offering preliminary mental health support and conversation. +- **Entertainment**: Engaging users with games, quizzes, and humorous chats. +- **Accessibility Aid**: Enhancing information and service access for individuals with disabilities. + +Embedchain provides the tools to create chatbots for the above use cases. The following chatbot examples are a starting point that you can build on: + + + + Learn to integrate a chatbot within a full-stack application. + + + Build a tailored GPT chatbot suited for your specific needs. + + + Enhance your Slack workspace with a specialized bot. + + + Create an engaging bot for your Discord server. + + + Develop a handy assistant for Telegram users. + + + Design a WhatsApp bot for efficient communication. + + + Explore advanced bot interactions with Poe Bot.
+ + diff --git a/docs/use-cases/question-answering.mdx b/docs/use-cases/question-answering.mdx new file mode 100644 index 00000000..c476517c --- /dev/null +++ b/docs/use-cases/question-answering.mdx @@ -0,0 +1,75 @@ +--- +title: 'Question Answering' +--- + +Utilizing large language models (LLMs) for question answering is a transformative application, bringing significant benefits to various real-world situations. Embedchain extensively supports tasks related to question answering, including summarization, content creation, language translation, and data analysis. The versatility of question answering with LLMs enables solutions for numerous practical applications such as: + +- **Educational Aid**: Enhancing learning experiences and aiding with homework +- **Customer Support**: Addressing and resolving customer queries efficiently +- **Research Assistance**: Facilitating academic and professional research endeavors +- **Healthcare Information**: Providing fundamental medical knowledge +- **Technical Support**: Resolving technology-related inquiries +- **Legal Information**: Offering basic legal advice and information +- **Business Insights**: Delivering market analysis and strategic business advice +- **Language Learning Assistance**: Aiding in understanding and translating languages +- **Travel Guidance**: Supplying information on travel and hospitality +- **Content Development**: Assisting authors and creators with research and idea generation + +## Example: Build a Q&A System with Embedchain for Next.JS + +Quickly create a RAG pipeline to answer queries about the [Next.JS Framework](https://nextjs.org/) using Embedchain tools. + +### Step 1: Set Up Your RAG Pipeline + +First, let's create your RAG pipeline. Open your Python environment and enter: + +```python Create pipeline +from embedchain import Pipeline as App +app = App() +``` + +This initializes your application. + +### Step 2: Populate Your Pipeline with Data + +Now, let's add data to your pipeline.
We'll include the Next.JS website, its documentation, and the Next.JS forum: + +```python Ingest data sources +# Add Next.JS Website and docs +app.add("https://nextjs.org/sitemap.xml", data_type="sitemap") + +# Add Next.JS Forum data +app.add("https://nextjs-forum.com/sitemap.xml", data_type="sitemap") +``` + +This step incorporates over **15K pages** from the Next.JS website and forum into your pipeline. For more data source options, check the [Embedchain data sources overview](/components/data-sources/overview). + +### Step 3: Local Testing of Your Pipeline + +Test the pipeline on your local machine: + +```python Query App +app.query("Summarize the features of Next.js 14?") +``` + +Run this query to see how your pipeline responds with information about Next.js 14. + +### (Optional) Step 4: Deploying Your RAG Pipeline + +Want to go live? Deploy your pipeline with these options: + +- Deploy on the Embedchain Platform +- Self-host on your preferred cloud provider + +For detailed deployment instructions, follow these guides: + +- [Deploying on Embedchain Platform](/get-started/deployment#deploy-on-embedchain-platform) +- [Self-hosting Guide](/get-started/deployment#self-hosting) + +## Need help? + +If you are looking to configure the RAG pipeline further, feel free to check out the [API reference](/api-reference/pipeline/query). + +In case you run into issues, feel free to contact us via any of the following methods: + + diff --git a/docs/use-cases/semantic-search.mdx b/docs/use-cases/semantic-search.mdx new file mode 100644 index 00000000..3a86fc3e --- /dev/null +++ b/docs/use-cases/semantic-search.mdx @@ -0,0 +1,91 @@ +Semantic searching, which involves understanding the intent and contextual meaning behind search queries, is yet another popular use case of RAG.
It is used across many domains: + +- **Information Retrieval**: Enhances search accuracy in databases and websites +- **E-commerce**: Improves product discovery in online shopping +- **Customer Support**: Powers smarter chatbots for effective responses +- **Content Discovery**: Aids in finding relevant media content +- **Knowledge Management**: Streamlines document and data retrieval in enterprises +- **Healthcare**: Facilitates medical research and literature search +- **Legal Research**: Assists in legal document and case law search +- **Academic Research**: Aids in academic paper discovery +- **Language Processing**: Enables multilingual search capabilities + +Embedchain offers a simple yet customizable `search()` API that you can use for semantic search. See the example in the next section to learn more. + +## Example: Semantic Search over Next.JS Website + Forum + +### Step 1: Set Up Your RAG Pipeline + +First, let's create your RAG pipeline. Open your Python environment and enter: + +```python Create pipeline +from embedchain import Pipeline as App +app = App() +``` + +This initializes your application. + +### Step 2: Populate Your Pipeline with Data + +Now, let's add data to your pipeline. We'll include the Next.JS website, its documentation, and the Next.JS forum: + +```python Ingest data sources +# Add Next.JS Website and docs +app.add("https://nextjs.org/sitemap.xml", data_type="sitemap") + +# Add Next.JS Forum data +app.add("https://nextjs-forum.com/sitemap.xml", data_type="sitemap") +``` + +This step incorporates over **15K pages** from the Next.JS website and forum into your pipeline. For more data source options, check the [Embedchain data sources overview](/components/data-sources/overview).
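
As a rough illustration of what the `sitemap` data type involves: a sitemap is an XML file that lists page URLs, so ingesting one comes down to collecting those URLs and then loading each page. The sketch below covers only the URL-collection half, using the Python standard library. `extract_sitemap_urls` and the sample XML are assumptions made up for this example, not Embedchain's actual loader, which may work differently.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL listed in a sitemap XML string.

    Hypothetical helper for illustration only; not part of Embedchain.
    """
    root = ET.fromstring(sitemap_xml)
    # Each <url> entry holds one <loc> element with the page address.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample; a real sitemap would be downloaded from the site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://nextjs.org/docs</loc></url>
  <url><loc>https://nextjs.org/blog/next-14</loc></url>
</urlset>"""

for url in extract_sitemap_urls(SAMPLE_SITEMAP):
    print(url)
```

Each URL collected this way stands for one page that a sitemap ingestion would be expected to fetch and chunk, which is why a single `app.add(..., data_type="sitemap")` call can bring in thousands of pages.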
+ +### Step 3: Local Testing of Your Pipeline + +Test the pipeline on your local machine: + +```python Search App +app.search("Summarize the features of Next.js 14?") +[ + { + 'context': 'Next.js 14 | Next.jsBack to BlogThursday, October 26th 2023Next.js 14Posted byLee Robinson@leeerobTim Neutkens@timneutkensAs we announced at Next.js Conf, Next.js 14 is our most focused release with: Turbopack: 5,000 tests passing for App & Pages Router 53% faster local server startup 94% faster code updates with Fast Refresh Server Actions (Stable): Progressively enhanced mutations Integrated with caching & revalidating Simple function calls, or works natively with forms Partial Prerendering', + 'source': 'https://nextjs.org/blog/next-14', + 'document_id': '6c8d1a7b-ea34-4927-8823-daa29dcfc5af--b83edb69b8fc7e442ff8ca311b48510e6c80bf00caa806b3a6acb34e1bcdd5d5' + }, + { + 'context': 'Next.js 13.3 | Next.jsBack to BlogThursday, April 6th 2023Next.js 13.3Posted byDelba de Oliveira@delba_oliveiraTim Neutkens@timneutkensNext.js 13.3 adds popular community-requested features, including: File-Based Metadata API: Dynamically generate sitemaps, robots, favicons, and more. Dynamic Open Graph Images: Generate OG images using JSX, HTML, and CSS. Static Export for App Router: Static / Single-Page Application (SPA) support for Server Components. 
Parallel Routes and Interception: Advanced', + 'source': 'https://nextjs.org/blog/next-13-3', + 'document_id': '6c8d1a7b-ea34-4927-8823-daa29dcfc5af--b83edb69b8fc7e442ff8ca311b48510e6c80bf00caa806b3a6acb34e1bcdd5d5' + }, + { + 'context': 'Upgrading: Version 14 | Next.js MenuUsing App RouterFeatures available in /appApp Router.UpgradingVersion 14Version 14 Upgrading from 13 to 14 To update to Next.js version 14, run the following command using your preferred package manager: Terminalnpm i next@latest react@latest react-dom@latest eslint-config-next@latest Terminalyarn add next@latest react@latest react-dom@latest eslint-config-next@latest Terminalpnpm up next react react-dom eslint-config-next -latest Terminalbun add next@latest', + 'source': 'https://nextjs.org/docs/app/building-your-application/upgrading/version-14', + 'document_id': '6c8d1a7b-ea34-4927-8823-daa29dcfc5af--b83edb69b8fc7e442ff8ca311b48510e6c80bf00caa806b3a6acb34e1bcdd5d5' + } +] +``` +The `source` key contains the URL of the document from which that chunk was taken. + +If you are interested in configuring the search further, refer to our [API documentation](/api-reference/pipeline/search). + +### (Optional) Step 4: Deploying Your RAG Pipeline + +Want to go live? Deploy your pipeline with these options: + +- Deploy on the Embedchain Platform +- Self-host on your preferred cloud provider + +For detailed deployment instructions, follow these guides: + +- [Deploying on Embedchain Platform](/get-started/deployment#deploy-on-embedchain-platform) +- [Self-hosting Guide](/get-started/deployment#self-hosting) + +---- + +This guide will help you swiftly set up a semantic search pipeline with Embedchain, making it easier to access and analyze specific information from large data sources. + + +## Need help?
+ +In case you run into issues, feel free to contact us via any of the following methods: + + diff --git a/embedchain-js/README.md b/embedchain-js/README.md index b2d1c425..ba339e77 100644 --- a/embedchain-js/README.md +++ b/embedchain-js/README.md @@ -228,15 +228,6 @@ embedchain is a framework which takes care of all these nuances and provides a s In the first release, we are making it easier for anyone to get a chatbot over any dataset up and running in less than a minute. All you need to do is create an app instance, add the data sets using `.add` function and then use `.query` function to get the relevant answer. -# Tech Stack - -embedchain is built on the following stack: - -- [Langchain](https://github.com/hwchase17/langchain) as an LLM framework to load, chunk and index data -- [OpenAI's Ada embedding model](https://platform.openai.com/docs/guides/embeddings) to create embeddings -- [OpenAI's ChatGPT API](https://platform.openai.com/docs/guides/gpt/chat-completions-api) as LLM to get answers given the context -- [Chroma](https://github.com/chroma-core/chroma) as the vector database to store embeddings - # Team ## Author