docs: setup docs for embedchain (#287)

2023-07-16 16:33:30 -07:00
parent 05a4eef6ae
commit c595003481
21 changed files with 914 additions and 619 deletions
--- a/docs/introduction.mdx
+++ b/docs/introduction.mdx
@@ -0,0 +1,56 @@
+---
+title: 📚 Introduction
+description: '📝 Embedchain is a framework to easily create LLM-powered bots over any dataset.'
+---
+
+## 🤔 What is Embedchain?
+
+Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings for the chunks, and storing them in a vector database.
+
+You can add one or more datasets using the `.add()` function for online resources and the `.add_local()` function for local data. Then use the `.query()` function to find answers from the added datasets.
+
+If you want to create a Naval Ravikant bot from a YouTube video, a book in PDF format, two blog posts, and a question-and-answer pair, all you need to do is add the respective links and the pair itself. Embedchain takes care of the rest and creates the bot for you.
+
+```python
+from embedchain import App
+
+naval_chat_bot = App()
+# Embed Online Resources
+naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44")
+naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
+naval_chat_bot.add("web_page", "https://nav.al/feedback")
+naval_chat_bot.add("web_page", "https://nav.al/agi")
+
+# Embed Local Resources
+naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
+
+answer = naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
+print(answer)
+# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
+```
+
+## 🚀 How does it work?
+
+Creating a chatbot over any dataset involves the following steps:
+
+1. Load the data
+2. Create meaningful chunks
+3. Create embeddings for each chunk
+4. Store the chunks and their embeddings in a vector database
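+
+The ingestion steps above can be sketched in a few lines of plain Python. This is a toy illustration, not Embedchain's implementation: the word-based `chunk()` splitter, the bag-of-words `embed()` function, and the in-memory `store` list are hypothetical stand-ins for a real text splitter, embedding model, and vector database, and step 1 is assumed to have already produced the document text.
+
+```python
+import re
+from collections import Counter
+
+# Hypothetical helpers for illustration only; none of these are Embedchain APIs.
+
+
+def chunk(text: str, chunk_size: int = 8) -> list[str]:
+    # Step 2: split the text into consecutive chunks of at most `chunk_size` words.
+    words = text.split()
+    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
+
+
+def embed(text: str) -> Counter:
+    # Step 3 (toy): a sparse bag-of-words vector mapping each lowercase token to its count.
+    return Counter(re.findall(r"[a-z0-9]+", text.lower()))
+
+
+# Step 4 (stand-in): an in-memory list plays the role of the vector database.
+store: list[dict] = []
+
+
+def ingest(source: str, text: str) -> int:
+    """Chunk, embed, and store one already-loaded document; return the number of chunks."""
+    pieces = chunk(text)
+    for i, piece in enumerate(pieces):
+        # Keep the source and chunk position as metadata next to each embedding.
+        store.append({"embedding": embed(piece), "text": piece,
+                      "metadata": {"source": source, "chunk": i}})
+    return len(pieces)
+
+
+# Step 1 (assumed): pretend this string was loaded from a web page.
+doc = "Embedchain loads a dataset, splits it into chunks, embeds each chunk, and stores the chunks in a vector database."
+print(ingest("example-page", doc))  # prints 3 (19 words in chunks of 8)
+```
+
+A real setup would replace `embed()` with a dense embedding model and `store` with a vector database; the `metadata` dictionary shows one way to keep the source of each chunk next to its embedding.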
+
+When a user asks a question, the following steps find the answer:
+
+1. Create an embedding for the query
+2. Retrieve the documents most similar to the query from the vector database
+3. Pass the retrieved documents as context to the LLM to get the final answer
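+
+The query steps can be sketched the same way. Again, this is a hypothetical illustration rather than Embedchain's code: `embed()` is a toy bag-of-words stand-in for a real embedding model, cosine similarity is used as the ranking function, the `chunks` list stands in for data already in the vector database, and `build_prompt()` only assembles the text that would be sent to the LLM; no model is called.
+
+```python
+import math
+import re
+from collections import Counter
+
+# Hypothetical helpers for illustration only; none of these are Embedchain APIs.
+
+
+def embed(text: str) -> Counter:
+    # Toy embedding: a sparse bag-of-words vector mapping each lowercase token to its count.
+    return Counter(re.findall(r"[a-z0-9]+", text.lower()))
+
+
+def cosine(a: Counter, b: Counter) -> float:
+    # Cosine similarity between two sparse vectors; tokens missing from `b` count as 0.
+    dot = sum(count * b[token] for token, count in a.items())
+    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
+    return dot / norm if norm else 0.0
+
+
+# Stand-in for chunks that were already embedded and stored.
+chunks = [
+    "Naval Ravikant is an Indian-American entrepreneur and investor.",
+    "Vector databases store embeddings for similarity search.",
+    "Chunking splits long documents into smaller passages.",
+]
+index = [(embed(c), c) for c in chunks]
+
+
+def top_k(query: str, k: int = 2) -> list[str]:
+    # Steps 1-2: embed the query and rank the stored chunks by similarity to it.
+    q = embed(query)
+    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
+    return [text for _, text in ranked[:k]]
+
+
+def build_prompt(query: str, k: int = 2) -> str:
+    # Step 3: place the retrieved chunks in the prompt as context for the LLM.
+    context = "\n".join(top_k(query, k))
+    return f"Answer the question using this context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
+
+
+print(build_prompt("Who is Naval Ravikant?", k=1))
+```
+
+Here the query shares the tokens "is", "naval", and "ravikant" with the first chunk (cosine similarity 0.5) and none with the other two, so only the first chunk ends up in the prompt.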
+
+The process of loading the dataset and querying involves multiple steps, each with its own nuances:
+
+- How should I chunk the data? What is a meaningful chunk size?
+- How should I create embeddings for each chunk? Which embedding model should I use?
+- How should I store the chunks in a vector database? Which vector database should I use?
+- Should I store metadata along with the embeddings?
+- How should I find similar documents for a query? Which ranking model should I use?
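+
+The chunking question is largely a trade-off: smaller chunks tend to give more focused matches but carry less context each, and overlapping chunks repeat the last few words of one chunk at the start of the next, so text near a boundary appears in both. The hypothetical sliding-window chunker below (word-based, not Embedchain's own splitter) shows how `chunk_size` and `overlap` change the number of chunks for a made-up 100-word document.
+
+```python
+def sliding_chunks(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
+    # Hypothetical word-based splitter: each chunk starts `chunk_size - overlap` words after the previous one.
+    if not 0 <= overlap < chunk_size:
+        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
+    step = chunk_size - overlap
+    chunks = []
+    for start in range(0, len(words), step):
+        chunks.append(words[start:start + chunk_size])
+        if start + chunk_size >= len(words):
+            break  # this chunk already reaches the end of the document
+    return chunks
+
+
+words = [f"w{i}" for i in range(100)]  # a made-up 100-word document
+for size, overlap in [(10, 0), (20, 0), (20, 5), (50, 10)]:
+    print(f"chunk_size={size}, overlap={overlap}: {len(sliding_chunks(words, size, overlap))} chunks")
+# chunk_size=10, overlap=0: 10 chunks
+# chunk_size=20, overlap=0: 5 chunks
+# chunk_size=20, overlap=5: 7 chunks
+# chunk_size=50, overlap=10: 3 chunks
+```
+
+For a document of n words longer than `chunk_size`, this splitter produces ⌈(n − overlap) / (chunk_size − overlap)⌉ chunks; for the 20/5 setting that is ⌈95 / 15⌉ = 7.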
+
+Embedchain takes care of all these nuances and provides a simple interface to create bots over any dataset.
+
+In the first release, we make it easier for anyone to get a chatbot over any dataset up and running in less than a minute. Just create an app instance, add the datasets using the `.add()` function, and use the `.query()` function to get the relevant answers.