Add support for Hugging Face Inference Endpoint as LLM (#1143)

2024-01-08 13:20:04 -05:00
parent e36198dcc2
commit 62c0c52e31
5 changed files with 93 additions and 1 deletions
--- a/docs/components/llms.mdx
+++ b/docs/components/llms.mdx
@@ -494,6 +494,49 @@ llm:
 ```
 </CodeGroup>

+### Custom Endpoints
+
+
+You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to run a model behind your own custom endpoint. First, set the `HUGGINGFACE_ACCESS_TOKEN` environment variable as described above.
+
+Then load the app from the YAML config file:
+
+<CodeGroup>
+
+```python main.py
+import os
+from embedchain import App
+
+os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"
+
+# load llm configuration from config.yaml file
+app = App.from_config(config_path="config.yaml")
+```
+
+```yaml config.yaml
+llm:
+  provider: huggingface
+  config:
+    endpoint: https://api-inference.huggingface.co/models/gpt2 # replace with your personal endpoint
+```
+</CodeGroup>
+
+If your endpoint requires additional parameters, you can pass them in the `model_kwargs` field:
+
+```yaml
+llm:
+  provider: huggingface
+  config:
+    endpoint: <YOUR_ENDPOINT_URL_HERE>
+    model_kwargs:
+      max_new_tokens: 100
+      temperature: 0.5
+```
+
+Currently, only the `text-generation` and `text2text-generation` tasks are supported [[ref](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html?highlight=huggingfaceendpoint#)].
+
+See LangChain's [Hugging Face endpoint documentation](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfaceendpoint) for more information.
+
 ## Llama2

 Llama2 is integrated through [Replicate](https://replicate.com/).  Set `REPLICATE_API_TOKEN` in environment variable which you can obtain from [their platform](https://replicate.com/account/api-tokens).