From 77c90a308eed22e713715e07f2707ca11ed994ee Mon Sep 17 00:00:00 2001 From: Deshraj Yadav Date: Sat, 14 Oct 2023 19:14:24 -0700 Subject: [PATCH] [Docs]: Clean up docs (#802) --- README.md | 4 +- configs/full-stack.yaml | 2 +- docs/advanced/query_configuration.mdx | 66 ---------- docs/advanced/testing.mdx | 40 ------ docs/advanced/vector_database.mdx | 118 ------------------ docs/community/connect-with-us.mdx | 30 +++-- docs/components/embedding-models.mdx | 24 ++-- docs/components/llms.mdx | 76 ++++++----- docs/components/vector-databases.mdx | 36 ++++-- docs/contribution/dev.mdx | 9 +- docs/get-started/introduction.mdx | 5 +- docs/get-started/quickstart.mdx | 10 +- .../full_stack/frontend/src/pages/index.js | 2 +- pyproject.toml | 2 +- 14 files changed, 120 insertions(+), 304 deletions(-) delete mode 100644 docs/advanced/query_configuration.mdx delete mode 100644 docs/advanced/testing.mdx delete mode 100644 docs/advanced/vector_database.mdx diff --git a/README.md b/README.md index d3179d75..68474792 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ [![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/138lMWhENGeEu7Q1-6lNbNTHGLZXBBz_B?usp=sharing) [![codecov](https://codecov.io/gh/embedchain/embedchain/graph/badge.svg?token=EMRRHZXW1Q)](https://codecov.io/gh/embedchain/embedchain) -Embedchain is a framework to easily create LLM powered bots over any dataset. If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js) +Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data. 
If you want a javascript version, check out [embedchain-js](https://github.com/embedchain/embedchain/tree/main/embedchain-js) ## Community @@ -94,7 +94,7 @@ If you utilize this repository, please consider citing it with: ``` @misc{embedchain, author = {Taranjeet Singh, Deshraj Yadav}, - title = {Embedchain: Framework to easily create LLM powered bots over any dataset}, + title = {Embedchain: Data platform for LLMs - load, index, retrieve, and sync any unstructured data}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, diff --git a/configs/full-stack.yaml b/configs/full-stack.yaml index 13b661f8..4c9d4bba 100644 --- a/configs/full-stack.yaml +++ b/configs/full-stack.yaml @@ -25,7 +25,7 @@ llm: vectordb: provider: chroma config: - collection_name: 'full-stack-app' + collection_name: 'my-collection-name' dir: db allow_reset: true diff --git a/docs/advanced/query_configuration.mdx b/docs/advanced/query_configuration.mdx deleted file mode 100644 index f1c9c31c..00000000 --- a/docs/advanced/query_configuration.mdx +++ /dev/null @@ -1,66 +0,0 @@ ---- -title: 'πŸ” Query configurations' ---- - -## AppConfig - -| option | description | type | default | -|-----------|-----------------------|---------------------------------|------------------------| -| log_level | log level | string | WARNING | -| embedding_fn| embedding function | chromadb.utils.embedding_functions | \{text-embedding-ada-002\} | -| db | vector database (experimental) | BaseVectorDB | ChromaDB | -| collection_name | initial collection name for the database | string | embedchain_store | -| collect_metrics | collect anonymous telemetry data to improve embedchain | boolean | true | - - -## AddConfig - -|option|description|type|default| -|---|---|---|---| -|chunker|chunker config|ChunkerConfig|Default values for chunker depends on the `data_type`. Please refer [ChunkerConfig](#chunker-config)| -|loader|loader config|LoaderConfig|None| - -Yes, you are passing `ChunkerConfig` to `AddConfig`, like so: - -```python -chunker_config = ChunkerConfig(chunk_size=100) -add_config = AddConfig(chunker=chunker_config) -app.add("lorem ipsum", config=add_config) -``` - -### ChunkerConfig - -|option|description|type|default| -|---|---|---|---| -|chunk_size|Maximum size of chunks to return|int|Default value for various `data_type` mentioned below| -|chunk_overlap|Overlap in characters between chunks|int|Default value for various `data_type` mentioned below| -|length_function|Function that measures the length of given chunks|typing.Callable|Default value for various `data_type` mentioned below| - -Default values of chunker config parameters for different `data_type`: - -|data_type|chunk_size|chunk_overlap|length_function| -|---|---|---|---| -|docx|1000|0|len| -|text|300|0|len| -|qna_pair|300|0|len| -|web_page|500|0|len| -|pdf_file|1000|0|len| -|youtube_video|2000|0|len| -|docs_site|500|50|len| -|notion|300|0|len| - -## BaseLlmConfig - -|option|description|type|default| -|---|---|---|---| -|number_documents|Absolute number of documents to pull from the database as context.|int|1 -|template|custom template for prompt. If history is used with query, $history has to be included as well.|Template|Template("Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. \$context Query: \$query Helpful Answer:")| -|model|name of the model used.|string|depends on app type| -|temperature|Controls the randomness of the model's output. 
Higher values (closer to 1) make output more random, lower values make it more deterministic.|float|0| -|max_tokens|Controls how many tokens are used. Exact implementation (whether it counts prompt and/or response) depends on the model.|int|1000| -|top_p|Controls the diversity of words. Higher values (closer to 1) make word selection more diverse, lower values make words less diverse.|float|1| -|history|include conversation history from your client or database.|any (recommendation: list[str])|None| -|stream|control if response is streamed back to the user.|bool|False| -|deployment_name|t.b.a.|str|None| -|system_prompt|System prompt string. Unused if none.|str|None| -|where|filter for context search.|dict|None| diff --git a/docs/advanced/testing.mdx b/docs/advanced/testing.mdx deleted file mode 100644 index 1a8dc22c..00000000 --- a/docs/advanced/testing.mdx +++ /dev/null @@ -1,40 +0,0 @@ ---- -title: 'πŸ§ͺ Testing' ---- - -## Methods for testing - -### Dry Run - -Before you consume valueable tokens, you should make sure that data chunks are properly created and the embedding you have done works and that it's receiving the correct document from the database. - -- For `query` or `chat` method, you can add this to your script: - -```python -print(naval_chat_bot.query('Can you tell me who Naval Ravikant is?', dry_run=True)) - -''' -Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. - Q: Who is Naval Ravikant? -A: Naval Ravikant is an Indian-American entrepreneur and investor. - Query: Can you tell me who Naval Ravikant is? - Helpful Answer: -''' -``` - -_The embedding is confirmed to work as expected. It returns the right document, even if the question is asked slightly different. No prompt tokens have been consumed._ - -The dry run will still consume tokens to embed your query, but it is only **~1/15 of the prompt.** - - -- For `add` method, you can add this to your script: - -```python -print(naval_chat_bot.add('https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', dry_run=True)) - -''' -{'chunks': ['THE ALMANACK OF NAVAL RAVIKANT', 'GETTING RICH IS NOT JUST ABOUT LUCK;', 'HAPPINESS IS NOT JUST A TRAIT WE ARE'], 'metadata': [{'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 0, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}, {'source': 'C:\\Users\\Dev\\AppData\\Local\\Temp\\tmp3g5mjoiz\\tmp.pdf', 'page': 2, 'url': 'https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf', 'data_type': 'pdf_file'}], 'count': 7358, 'type': } - -# less items to show for readability -''' -``` \ No newline at end of file diff --git a/docs/advanced/vector_database.mdx b/docs/advanced/vector_database.mdx deleted file mode 100644 index 964a3512..00000000 --- a/docs/advanced/vector_database.mdx +++ /dev/null @@ -1,118 +0,0 @@ ---- -title: 'πŸ’Ύ Vector Database' ---- - -We support `Chroma`, `Elasticsearch` and `OpenSearch` as vector databases. -`Chroma` is used as a default database. - -## Elasticsearch - -### Minimal Example - -In order to use `Elasticsearch` as vector database we need to use App type `CustomApp`. 
- -1. Set the environment variables in a `.env` file. -``` -OPENAI_API_KEY=sk-SECRETKEY -ELASTICSEARCH_API_KEY=SECRETKEY== -ELASTICSEARCH_URL=https://secret-domain.europe-west3.gcp.cloud.es.io:443 -``` -Please note that the key needs certain privileges. For testing you can just toggle off `restrict privileges` under `/app/management/security/api_keys/` in your web interface. - -2. Load the app -```python -from embedchain import CustomApp -from embedchain.embedder.openai import OpenAIEmbedder -from embedchain.llm.openai import OpenAILlm -from embedchain.vectordb.elasticsearch import ElasticsearchDB - -es_app = CustomApp( - llm=OpenAILlm(), - embedder=OpenAIEmbedder(), - db=ElasticsearchDB(), -) -``` - -### More custom settings - -You can get a URL for elasticsearch in the cloud, or run it locally. -The following example shows you how to configure embedchain to work with a locally running elasticsearch. - -Instead of using an API key, we use http login credentials. The localhost url can be defined in .env or in the config. - -```python -import os - -from embedchain import CustomApp -from embedchain.config import CustomAppConfig, ElasticsearchDBConfig -from embedchain.embedder.openai import OpenAIEmbedder -from embedchain.llm.openai import OpenAILlm -from embedchain.vectordb.elasticsearch import ElasticsearchDB - -es_config = ElasticsearchDBConfig( - # elasticsearch url or list of nodes url with different hosts and ports. - es_url='https://localhost:9200', - # pass named parameters supported by Python Elasticsearch client - http_auth=("elastic", "secret"), - ca_certs="~/binaries/elasticsearch-8.7.0/config/certs/http_ca.crt" # your cert path - # verify_certs=False # Alternative, if you aren't using certs -) # pass named parameters supported by elasticsearch-py - -es_app = CustomApp( - config=CustomAppConfig(log_level="INFO"), - llm=OpenAILlm(), - embedder=OpenAIEmbedder(), - db=ElasticsearchDB(config=es_config), -) -``` -3. This should log your connection details to the console. -4. Alternatively to a URL, you `ElasticsearchDBConfig` accepts `es_url` as a list of nodes url with different hosts and ports. -5. Additionally we can pass named parameters supported by Python Elasticsearch client. - - -## OpenSearch πŸ” - -To use OpenSearch as a vector database with a CustomApp, follow these simple steps: - -1. Set the `OPENAI_API_KEY` environment variable: - -``` -OPENAI_API_KEY=sk-xxxx -``` - -2. Define the OpenSearch configuration in your Python code: - -```python -from embedchain import CustomApp -from embedchain.config import OpenSearchDBConfig -from embedchain.embedder.openai import OpenAIEmbedder -from embedchain.llm.openai import OpenAILlm -from embedchain.vectordb.opensearch import OpenSearchDB - -opensearch_url = "https://localhost:9200" -http_auth = ("username", "password") - -db_config = OpenSearchDBConfig( - opensearch_url=opensearch_url, - http_auth=http_auth, - collection_name="embedchain-app", - use_ssl=True, - timeout=30, -) -db = OpenSearchDB(config=db_config) -``` - -2. Instantiate the app and add data: - -```python -app = CustomApp(llm=OpenAILlm(), embedder=OpenAIEmbedder(), db=db) -app.add("https://en.wikipedia.org/wiki/Elon_Musk") -app.add("https://www.forbes.com/profile/elon-musk") -app.add("https://www.britannica.com/biography/Elon-Musk") -``` - -3. You're all set! 
Start querying using the following command: - -```python -app.query("What is the net worth of Elon Musk?") -``` diff --git a/docs/community/connect-with-us.mdx b/docs/community/connect-with-us.mdx index 36b09e9f..d4bd2abb 100644 --- a/docs/community/connect-with-us.mdx +++ b/docs/community/connect-with-us.mdx @@ -4,15 +4,25 @@ title: 🀝 Connect with Us We believe in building a vibrant and supportive community around embedchain. There are various channels through which you can connect with us, stay updated, and contribute to the ongoing discussions: - -* Slack: Our Slack workspace provides a platform for more structured discussions and channels dedicated to different topics. Feel free to jump in and start contributing. [Join Slack](https://join.slack.com/t/embedchain/shared_invite/zt-22uwz3c46-Zg7cIh5rOBteT_xe1jwLDw). - -* Discord: Join our Discord server to engage in real-time conversations with the community members and the project maintainers. It’s a great place to seek help and discuss anything related to the project. [Join Discord](https://discord.gg/CUU9FPhRNt). - -* Twitter: Follow us on Twitter for the latest news, announcements, and highlights from our community. It’s also a quick way to reach out to us. [Follow @embedchain](https://twitter.com/embedchain). - -* LinkedIn: Connect with us on LinkedIn to stay updated on official announcements, job openings, and professional networking opportunities within our community. [Follow Our Page](https://www.linkedin.com/company/embedchain/). - -* Newsletter: Subscribe to our newsletter for a curated list of project updates, community contributions, and upcoming events. It’s a compact way to stay in the loop with what’s happening in our community. [Subscribe Now](https://embedchain.substack.com/). + + + Follow us on Twitter + + + Join our slack community + + + Join our discord community + + + Connect with us on LinkedIn + + + Schedule a call with Embedchain founder + + + Subscribe to our newsletter + + We look forward to connecting with you and seeing how we can create amazing things together! 
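The component hunks that follow standardize every per-provider YAML example (`openai.yaml`, `gpt4all.yaml`, `huggingface.yaml`, `vertexai.yaml`, and friends) on a single `config.yaml` name. As a minimal sketch of the pattern those hunks converge on — one YAML file feeding `App.from_config` — assuming embedchain is installed, `config.yaml` sits in the working directory with the `llm`/`embedder` sections shown below, and a placeholder API key:

```python
import os

from embedchain import App

os.environ["OPENAI_API_KEY"] = "sk-xxx"  # placeholder, not a real key

# config.yaml carries the llm/embedder/vectordb sections shown in the
# hunks below, whatever the provider; the filename no longer varies.
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")  # load + index
print(app.query("What is OpenAI?"))              # retrieve + answer
```

The script stays the same whichever provider block `config.yaml` contains, which is the point of the rename.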
diff --git a/docs/components/embedding-models.mdx b/docs/components/embedding-models.mdx index 9966e99f..592bae7a 100644 --- a/docs/components/embedding-models.mdx +++ b/docs/components/embedding-models.mdx @@ -27,14 +27,14 @@ from embedchain import App os.environ['OPENAI_API_KEY'] = 'xxx' -# load embedding model configuration from openai.yaml file -app = App.from_config(yaml_path="openai.yaml") +# load embedding model configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") app.add("https://en.wikipedia.org/wiki/OpenAI") app.query("What is OpenAI?") ``` -```yaml openai.yaml +```yaml config.yaml embedder: provider: openai config: @@ -52,11 +52,11 @@ GPT4All supports generating high quality embeddings of arbitrary length document ```python main.py from embedchain import App -# load embedding model configuration from gpt4all.yaml file -app = App.from_config(yaml_path="gpt4all.yaml") +# load embedding model configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml gpt4all.yaml +```yaml config.yaml llm: provider: gpt4all model: 'orca-mini-3b.ggmlv3.q4_0.bin' @@ -83,11 +83,11 @@ Hugging Face supports generating embeddings of arbitrary length documents of tex ```python main.py from embedchain import App -# load embedding model configuration from huggingface.yaml file -app = App.from_config(yaml_path="huggingface.yaml") +# load embedding model configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml huggingface.yaml +```yaml config.yaml llm: provider: huggingface model: 'google/flan-t5-xxl' @@ -114,11 +114,11 @@ Embedchain supports Google's VertexAI embeddings model through a simple interfac ```python main.py from embedchain import App -# load embedding model configuration from vertexai.yaml file -app = App.from_config(yaml_path="vertexai.yaml") +# load embedding model configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml vertexai.yaml +```yaml config.yaml llm: provider: vertexai model: 'chat-bison' diff --git a/docs/components/llms.mdx b/docs/components/llms.mdx index a4d72556..8c644d6c 100644 --- a/docs/components/llms.mdx +++ b/docs/components/llms.mdx @@ -35,7 +35,7 @@ app.add("https://en.wikipedia.org/wiki/OpenAI") app.query("What is OpenAI?") ``` -If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/embedchain/yaml/chroma.yaml) file. +If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file. @@ -45,11 +45,11 @@ from embedchain import App os.environ['OPENAI_API_KEY'] = 'xxx' -# load llm configuration from openai.yaml file -app = App.from_config(yaml_path="openai.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml openai.yaml +```yaml config.yaml llm: provider: openai model: 'gpt-3.5-turbo' @@ -79,11 +79,11 @@ from embedchain import App os.environ["ANTHROPIC_API_KEY"] = "xxx" -# load llm configuration from anthropic.yaml file -app = App.from_config(yaml_path="anthropic.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml anthropic.yaml +```yaml config.yaml llm: provider: anthropic model: 'claude-instant-1' @@ -96,15 +96,14 @@ llm: -
- - -You may also have to set the `OPENAI_API_KEY` if you use the OpenAI's embedding model. - - - ## Cohere +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[cohere]' +``` + Set the `COHERE_API_KEY` as environment variable which you can find on their [Account settings page](https://dashboard.cohere.com/api-keys). Once you have the API key, you are all set to use it with Embedchain. @@ -117,11 +116,11 @@ from embedchain import App os.environ["COHERE_API_KEY"] = "xxx" -# load llm configuration from cohere.yaml file -app = App.from_config(yaml_path="cohere.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml cohere.yaml +```yaml config.yaml llm: provider: cohere model: large @@ -135,6 +134,12 @@ llm: ## GPT4ALL +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[opensource]' +``` + GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet required. You can use this with Embedchain using the following code: @@ -142,11 +147,11 @@ GPT4all is a free-to-use, locally running, privacy-aware chatbot. No GPU or inte ```python main.py from embedchain import App -# load llm configuration from gpt4all.yaml file -app = App.from_config(yaml_path="gpt4all.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml gpt4all.yaml +```yaml config.yaml llm: provider: gpt4all model: 'orca-mini-3b.ggmlv3.q4_0.bin' @@ -177,11 +182,11 @@ import os from embedchain import App os.environ["JINACHAT_API_KEY"] = "xxx" -# load llm configuration from jina.yaml file -app = App.from_config(yaml_path="jina.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml jina.yaml +```yaml config.yaml llm: provider: jina config: @@ -195,6 +200,13 @@ llm: ## Hugging Face + +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[huggingface_hub]' +``` + First, set `HUGGINGFACE_ACCESS_TOKEN` in environment variable which you can obtain from [their platform](https://huggingface.co/settings/tokens). 
Once you have the token, load the app using the config yaml file: @@ -207,11 +219,11 @@ from embedchain import App os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx" -# load llm configuration from huggingface.yaml file -app = App.from_config(yaml_path="huggingface.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml huggingface.yaml +```yaml config.yaml llm: provider: huggingface model: 'google/flan-t5-xxl' @@ -237,11 +249,11 @@ from embedchain import App os.environ["REPLICATE_API_TOKEN"] = "xxx" -# load llm configuration from llama2.yaml file -app = App.from_config(yaml_path="llama2.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml llama2.yaml +```yaml config.yaml llm: provider: llama2 model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5' @@ -262,11 +274,11 @@ Setup Google Cloud Platform application credentials by following the instruction ```python main.py from embedchain import App -# load llm configuration from vertexai.yaml file -app = App.from_config(yaml_path="vertexai.yaml") +# load llm configuration from config.yaml file +app = App.from_config(yaml_path="config.yaml") ``` -```yaml vertexai.yaml +```yaml config.yaml llm: provider: vertexai model: 'chat-bison' diff --git a/docs/components/vector-databases.mdx b/docs/components/vector-databases.mdx index dfcb5845..e6caa6ef 100644 --- a/docs/components/vector-databases.mdx +++ b/docs/components/vector-databases.mdx @@ -25,10 +25,10 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you from embedchain import App # load chroma configuration from yaml file -app = App.from_config(yaml_path="chroma-config-1.yaml") +app = App.from_config(yaml_path="config1.yaml") ``` -```yaml chroma-config-1.yaml +```yaml config1.yaml vectordb: provider: chroma config: @@ -37,7 +37,7 @@ vectordb: allow_reset: true ``` -```yaml chroma-config-2.yaml +```yaml config2.yaml vectordb: provider: chroma config: @@ -52,16 +52,22 @@ vectordb: ## Elasticsearch +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[elasticsearch]' +``` + ```python main.py from embedchain import App # load elasticsearch configuration from yaml file -app = App.from_config(yaml_path="elasticsearch.yaml") +app = App.from_config(yaml_path="config.yaml") ``` -```yaml elasticsearch.yaml +```yaml config.yaml vectordb: provider: elasticsearch config: @@ -74,16 +80,22 @@ vectordb: ## OpenSearch +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[opensearch]' +``` + ```python main.py from embedchain import App # load opensearch configuration from yaml file -app = App.from_config(yaml_path="opensearch.yaml") +app = App.from_config(yaml_path="config.yaml") ``` -```yaml opensearch.yaml +```yaml config.yaml vectordb: provider: opensearch config: @@ -101,16 +113,22 @@ vectordb: ## Zilliz +Install related dependencies using the following command: + +```bash +pip install --upgrade 'embedchain[milvus]' +``` + ```python main.py from embedchain import App # load zilliz configuration from yaml file -app = App.from_config(yaml_path="zilliz.yaml") +app = App.from_config(yaml_path="config.yaml") ``` -```yaml zilliz.yaml +```yaml config.yaml vectordb: provider: zilliz config: diff --git a/docs/contribution/dev.mdx b/docs/contribution/dev.mdx index 6a850961..cda2f7bc 100644 --- a/docs/contribution/dev.mdx +++ 
b/docs/contribution/dev.mdx
@@ -35,12 +35,9 @@ embedchain is built on the following stack:
 
 ## Team
 
-### Author
+### Authors
 
 - Taranjeet Singh ([@taranjeetio](https://twitter.com/taranjeetio))
-
-### Maintainer
-
 - Deshraj Yadav ([@deshrajdry](https://twitter.com/deshrajdry))
 
 ### Citation
 
@@ -49,8 +46,8 @@ If you utilize this repository, please consider citing it with:
 
 ```
 @misc{embedchain,
-  author = {Taranjeet Singh},
-  title = {Embechain: Framework to easily create LLM powered bots over any dataset},
+  author = {Taranjeet Singh, Deshraj Yadav},
+  title = {Embedchain: Data platform for LLMs - load, index, retrieve, and sync any unstructured data},
   year = {2023},
   publisher = {GitHub},
   journal = {GitHub repository},
diff --git a/docs/get-started/introduction.mdx b/docs/get-started/introduction.mdx
index 422b4633..4a78c361 100644
--- a/docs/get-started/introduction.mdx
+++ b/docs/get-started/introduction.mdx
@@ -1,6 +1,6 @@
 ---
 title: πŸ“š Introduction
-description: 'πŸ“ Embedchain is a framework to easily create LLM powered apps on your data.'
+description: 'πŸ“ Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
 ---
 
 ## πŸ€” What is Embedchain?
@@ -27,9 +27,6 @@ naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American e
 
 naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
 # Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
-
-# Ask questions with specific context
-naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?", where={'chapter': 'philosophy'})
 ```
 
 ## πŸš€ How it works?
diff --git a/docs/get-started/quickstart.mdx b/docs/get-started/quickstart.mdx
index 71d37bfe..c6344ebf 100644
--- a/docs/get-started/quickstart.mdx
+++ b/docs/get-started/quickstart.mdx
@@ -3,6 +3,8 @@ title: 'πŸš€ Quickstart'
 description: 'πŸ’‘ Start building LLM powered apps under 30 seconds'
 ---
 
+Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data. Using embedchain, you can easily create LLM powered apps over any data.
+
 Install embedchain python package:
 
 ```bash
@@ -20,9 +22,11 @@ app = App()
 
 
 ```python
-# Embed online resources
+# Add different data sources
 elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
 elon_bot.add("https://www.forbes.com/profile/elon-musk")
+# You can also add local data sources such as PDF, CSV files, etc.
+# elon_bot.add("/path/to/file.pdf")
 ```
 
 
@@ -42,9 +46,11 @@ from embedchain import App
 
 os.environ["OPENAI_API_KEY"] = "xxx"
 elon_bot = App()
 
-# Embed online resources
+# Add different data sources
 elon_bot.add("https://en.wikipedia.org/wiki/Elon_Musk")
 elon_bot.add("https://www.forbes.com/profile/elon-musk")
+# You can also add local data sources such as PDF, CSV files, etc.
+# elon_bot.add("/path/to/file.pdf")
 
 response = elon_bot.query("What is the net worth of Elon Musk today?")
 print(response)
diff --git a/examples/full_stack/frontend/src/pages/index.js b/examples/full_stack/frontend/src/pages/index.js
index 37d94838..3307dbd4 100644
--- a/examples/full_stack/frontend/src/pages/index.js
+++ b/examples/full_stack/frontend/src/pages/index.js
@@ -28,7 +28,7 @@ export default function Home() {
             Welcome to Embedchain Playground

-          embedchain is a framework to easily create LLM powered bots over any
-          dataset
+          Embedchain is a Data Platform for LLMs - load, index, retrieve, and
+          sync any unstructured data

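The landing-page copy above belongs to the full-stack example, which is driven by `configs/full-stack.yaml` (its `collection_name` is renamed at the top of this patch). A minimal sketch of consuming that file directly, assuming the repository root as working directory and `OPENAI_API_KEY` exported; the URL is illustrative:

```python
from embedchain import App

# Loads the llm and vectordb sections from the full-stack config,
# including the renamed collection_name: 'my-collection-name'.
app = App.from_config(yaml_path="configs/full-stack.yaml")

app.add("https://en.wikipedia.org/wiki/Elon_Musk")  # indexed into 'my-collection-name'
print(app.query("Who is Elon Musk?"))
```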
diff --git a/pyproject.toml b/pyproject.toml
index cb71000e..734e9eb8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.poetry]
 name = "embedchain"
 version = "0.0.70"
-description = "Embedchain is a framework to easily create LLM powered apps over any dataset"
+description = "Data platform for LLMs - load, index, retrieve, and sync any unstructured data"
 authors = ["Taranjeet Singh, Deshraj Yadav"]
 license = "Apache License"
 readme = "README.md"
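Once 0.0.70 ships with the description above, the same wording surfaces as the installed package's `Summary` metadata; a stdlib-only sketch to check it, assuming the package is installed from this revision:

```python
from importlib.metadata import metadata

# Reads the metadata of the installed embedchain distribution.
meta = metadata("embedchain")
print(meta["Summary"])
# Expected: Data platform for LLMs - load, index, retrieve, and sync any unstructured data
```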