---
title: 📚 Introduction
description: '📝 Embedchain is a framework to easily create LLM powered apps on your data.'
---

## 🤔 What is Embedchain?

Embedchain abstracts the entire process of loading data, chunking it, creating embeddings, and storing them in a vector database.

You can add data from different sources using the `.add()` method. Then, simply use the `.query()` method to find answers from the added data.

If you want to create a Naval Ravikant bot from a YouTube video, a book in PDF format, two blog posts, and a question-and-answer pair, all you need to do is add the respective links. Embedchain will take care of the rest and create the bot for you.

```python
from embedchain import App

naval_bot = App()

# Add online data
naval_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_bot.add("https://nav.al/feedback")
naval_bot.add("https://nav.al/agi")
naval_bot.add("The Meanings of Life", 'text', metadata={'chapter': 'philosophy'})

# Add local resources
naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.

# Ask questions with specific context
naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", where={'chapter': 'philosophy'})
```
## 🚀 How does it work?

Embedchain abstracts away the following steps so you can easily create LLM-powered apps on your data:

1. Detect the data type and load the data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store the chunks in a vector database

As a concrete illustration, the sketch below walks through these four steps.
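Every name in it (`load`, `chunk`, `embed`, `VectorStore`) is a hypothetical stand-in chosen for this example, not Embedchain's actual internals.

```python
# A toy sketch of the four ingestion steps above. Every name here
# (load, chunk, embed, VectorStore) is a hypothetical stand-in for
# illustration, not Embedchain's real internals.
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """In-memory stand-in for a vector database: (embedding, chunk, metadata) rows."""
    rows: list = field(default_factory=list)

    def add(self, embedding, chunk, metadata):
        self.rows.append((embedding, chunk, metadata))


def load(source: str) -> str:
    """Step 1: detect the data type and load it as text (stubbed here)."""
    return f"text extracted from {source}"


def chunk(text: str, size: int = 500) -> list:
    """Step 2: split the text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> list:
    """Step 3: map text to a vector (a real app would call an embedding model)."""
    return [text.count(letter) / len(text) for letter in "etaoinshrd"]


def ingest(source: str, store: VectorStore) -> None:
    """Step 4: embed each chunk and store it alongside its metadata."""
    for piece in chunk(load(source)):
        store.add(embed(piece), piece, {"source": source})


store = VectorStore()
ingest("https://nav.al/agi", store)
```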
When a user asks a query, the following process happens to find the answer:

1. Create an embedding for the query
2. Find the chunks most similar to the query in the vector database
3. Pass those chunks as context to the LLM to get the final answer

Continuing the sketch, the query flow might look roughly like the code below.
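Here `cosine_similarity` and the final LLM call are again placeholders, and `embed`, `VectorStore`, and `store` come from the ingestion sketch above.

```python
# Continuation of the ingestion sketch above (reuses its embed,
# VectorStore, and store); cosine_similarity and the final LLM
# call are placeholders as well.
import math


def cosine_similarity(a: list, b: list) -> float:
    """Higher means the two embeddings point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def query(question: str, store: VectorStore, top_k: int = 3) -> str:
    # Step 1: create an embedding for the query.
    query_embedding = embed(question)
    # Step 2: rank stored chunks by similarity and keep the closest matches.
    ranked = sorted(
        store.rows,
        key=lambda row: cosine_similarity(query_embedding, row[0]),
        reverse=True,
    )
    context = "\n".join(text for _, text, _ in ranked[:top_k])
    # Step 3: hand the retrieved chunks to the LLM as context (stubbed here).
    return f"[LLM answer grounded in context]\n{context}\n[question] {question}"


print(query("What does Naval say about AGI?", store))
```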
The process of loading the dataset and querying involves multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?

Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.