Added support for vision input

docs/features/multimodal-support.mdx (new file, 114 lines)
@@ -0,0 +1,114 @@
---
title: Multimodal Support
---

Mem0 extends its capabilities beyond text by supporting multimodal data, including images. Users can seamlessly integrate images into their interactions, allowing Mem0 to extract pertinent information from visual content and enrich the memory system.

## How It Works

When a user provides an image, Mem0 processes the image to extract textual information and relevant details, which are then added to the user's memory. This feature enhances the system's ability to understand and remember details based on visual inputs.

<CodeGroup>

```python Code
from mem0 import MemoryClient

# Initialize the MemoryClient with your API key
client = MemoryClient(api_key="your_api_key_here")

messages = [
    {
        "role": "user",
        "content": "Hi, my name is Alice."
    },
    {
        "role": "assistant",
        "content": "Nice to meet you, Alice! What do you like to eat?"
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://www.superhealthykids.com/wp-content/uploads/2021/10/best-veggie-pizza-featured-image-square-2.jpg"
            }
        }
    },
]

# Calling the add method to ingest messages into the memory system
client.add(messages, user_id="alice")
```

```json Output
{
    "results": [
        {
            "memory": "Name is Alice",
            "event": "ADD",
            "id": "7ae113a3-3cb5-46e9-b6f7-486c36391847"
        },
        {
            "memory": "Likes large pizza with toppings including cherry tomatoes, black olives, green spinach, yellow bell peppers, diced ham, and sliced mushrooms",
            "event": "ADD",
            "id": "56545065-7dee-4acf-8bf2-a5b2535aabb3"
        }
    ]
}
```

</CodeGroup>

## Image Integration Methods

Mem0 allows you to add images to user interactions through two primary methods: by providing an image URL or by using a Base64-encoded image. Below are examples demonstrating each approach.

### 1. Using an Image URL (Recommended)

You can include an image by passing its direct URL. This method is simple and efficient for online images.

```python
# Define the image URL
image_url = "https://www.superhealthykids.com/wp-content/uploads/2021/10/best-veggie-pizza-featured-image-square-2.jpg"

# Create the message dictionary with the image URL
image_message = {
    "role": "user",
    "content": {
        "type": "image_url",
        "image_url": {
            "url": image_url
        }
    }
}
```
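
Once constructed, the message can be ingested like any other conversation turn. The snippet below is a minimal sketch, assuming the `MemoryClient` instance from the earlier example and the `image_message` built above (the accompanying text message is illustrative):

```python
# Combine ordinary text messages with the image message and ingest them together
messages = [
    {"role": "user", "content": "What do you think of this pizza?"},
    image_message,  # the image-URL message constructed above
]

# Mem0 extracts details from the image and stores them as memories for this user
client.add(messages, user_id="alice")
```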

### 2. Using Base64 Image Encoding for Local Files

For local images or scenarios where embedding the image directly is preferable, you can use a Base64-encoded string.

```python
import base64

# Path to the image file
image_path = "path/to/your/image.jpg"

# Encode the image in Base64
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Create the message dictionary with the Base64-encoded image
image_message = {
    "role": "user",
    "content": {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
        }
    }
}
```

By utilizing these methods, you can effectively incorporate images into user interactions, enhancing the multimodal capabilities of your Mem0 instance.

If you have any questions, please feel free to reach out to us using one of the following methods:

<Snippet file="get-help.mdx" />
docs/features/platform-overview.mdx (new file, 40 lines)
@@ -0,0 +1,40 @@
---
title: Overview
---

Learn about the key features and capabilities that make Mem0 a powerful platform for memory management and retrieval.

## Core Features

<CardGroup>
  <Card title="Advanced Retrieval" icon="magnifying-glass" href="/features/advanced-retrieval">
    Superior search results using state-of-the-art algorithms, including keyword search, reranking, and filtering capabilities.
  </Card>
  <Card title="Multimodal Support" icon="photo-film" href="/features/multimodal-support">
    Process and analyze various types of content, including images.
  </Card>
  <Card title="Memory Customization" icon="filter" href="/features/selective-memory">
    Customize and curate stored memories to focus on relevant information while excluding unnecessary data, enabling improved accuracy, privacy control, and resource efficiency.
  </Card>
  <Card title="Custom Categories" icon="tags" href="/features/custom-categories">
    Create and manage custom categories to organize memories based on your specific needs and requirements.
  </Card>
  <Card title="Custom Instructions" icon="list-check" href="/features/custom-instructions">
    Define specific guidelines for your project to ensure consistent handling of information and requirements.
  </Card>
  <Card title="Direct Import" icon="message-bot" href="/features/direct-import">
    Tailor the behavior of your Mem0 instance with custom prompts for specific use cases or domains.
  </Card>
  <Card title="Async Client" icon="bolt" href="/features/async-client">
    Asynchronous client for non-blocking operations and high-concurrency applications.
  </Card>
  <Card title="Memory Export" icon="file-export" href="/features/memory-export">
    Export memories in structured formats using customizable Pydantic schemas.
  </Card>
</CardGroup>

## Getting Help

If you have any questions about these features or need assistance, our team is here to help:

<Snippet file="get-help.mdx" />
@@ -75,7 +75,7 @@
         "platform/quickstart",
         {
           "group": "Features",
-          "pages": ["features/advanced-retrieval", "features/selective-memory", "features/custom-categories", "features/custom-instructions", "features/direct-import", "features/async-client", "features/memory-export"]
+          "pages": ["features/platform-overview", "features/advanced-retrieval", "features/multimodal-support", "features/selective-memory", "features/custom-categories", "features/custom-instructions", "features/direct-import", "features/async-client", "features/memory-export"]
         }
       ]
     },
@@ -149,7 +149,7 @@
       },
       {
         "group": "Features",
-        "pages": ["features/openai_compatibility", "features/custom-prompts"]
+        "pages": ["features/openai_compatibility", "features/custom-prompts", "open-source/multimodal-support"]
       }
     ]
   },
docs/open-source/multimodal-support.mdx (new file, 117 lines)
@@ -0,0 +1,117 @@
---
title: Multimodal Support
---

Mem0 extends its capabilities beyond text by supporting multimodal data, including images. Users can seamlessly integrate images into their interactions, allowing Mem0 to extract pertinent information from visual content and enrich the memory system.

## How It Works

When a user provides an image, Mem0 processes the image to extract textual information and relevant details, which are then added to the user's memory. This feature enhances the system's ability to understand and remember details based on visual inputs.

<CodeGroup>

```python Code
from mem0 import Memory

client = Memory()

messages = [
    {
        "role": "user",
        "content": "Hi, my name is Alice."
    },
    {
        "role": "assistant",
        "content": "Nice to meet you, Alice! What do you like to eat?"
    },
    {
        "role": "user",
        "content": {
            "type": "image_url",
            "image_url": {
                "url": "https://www.superhealthykids.com/wp-content/uploads/2021/10/best-veggie-pizza-featured-image-square-2.jpg"
            }
        }
    },
]

# Calling the add method to ingest messages into the memory system
client.add(messages, user_id="alice")
```

```json Output
{
    "results": [
        {
            "memory": "Name is Alice",
            "event": "ADD",
            "id": "7ae113a3-3cb5-46e9-b6f7-486c36391847"
        },
        {
            "memory": "Likes large pizza with toppings including cherry tomatoes, black olives, green spinach, yellow bell peppers, diced ham, and sliced mushrooms",
            "event": "ADD",
            "id": "56545065-7dee-4acf-8bf2-a5b2535aabb3"
        }
    ]
}
```

</CodeGroup>

## Image Integration Methods

Mem0 allows you to add images to user interactions through two primary methods: by providing an image URL or by using a Base64-encoded image. Below are examples demonstrating each approach.

### 1. Using an Image URL (Recommended)

You can include an image by passing its direct URL. This method is simple and efficient for online images.

```python
# Define the image URL
image_url = "https://www.superhealthykids.com/wp-content/uploads/2021/10/best-veggie-pizza-featured-image-square-2.jpg"

# Create the message dictionary with the image URL
image_message = {
    "role": "user",
    "content": {
        "type": "image_url",
        "image_url": {
            "url": image_url
        }
    }
}
```

### 2. Using Base64 Image Encoding for Local Files

For local images or scenarios where embedding the image directly is preferable, you can use a Base64-encoded string.

```python
import base64

# Path to the image file
image_path = "path/to/your/image.jpg"

# Encode the image in Base64
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Create the message dictionary with the Base64-encoded image
image_message = {
    "role": "user",
    "content": {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
        }
    }
}
```
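
As a minimal sketch (assuming the `Memory` instance named `client` from the earlier example), the Base64-encoded message is then ingested exactly like a URL-based one; the text message here is illustrative:

```python
# Pass the data-URL message through the same add() call as any other turn
messages = [
    {"role": "user", "content": "Here is a photo of my lunch."},
    image_message,  # the Base64 data-URL message constructed above
]

result = client.add(messages, user_id="alice")
print(result)  # e.g. {"results": [{"memory": "...", "event": "ADD", ...}]}
```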

By utilizing these methods, you can effectively incorporate images into user interactions, enhancing the multimodal capabilities of your Mem0 instance.

<Note>
Currently, we support only OpenAI models for image description.
</Note>

If you have any questions, please feel free to reach out to us using one of the following methods:

<Snippet file="get-help.mdx" />
@@ -63,6 +63,7 @@ class OpenAILLM(LLMBase):
         response_format=None,
         tools: Optional[List[Dict]] = None,
         tool_choice: str = "auto",
+        max_tokens: int = 100,
     ):
         """
         Generate a response based on the given messages using OpenAI.
@@ -80,7 +81,7 @@ class OpenAILLM(LLMBase):
             "model": self.config.model,
             "messages": messages,
             "temperature": self.config.temperature,
-            "max_tokens": self.config.max_tokens,
+            "max_tokens": max_tokens,
             "top_p": self.config.top_p,
         }
@@ -9,7 +9,7 @@ from typing import Any, Dict

 import pytz
 from pydantic import ValidationError

+from mem0.memory.utils import parse_vision_messages
 from mem0.configs.base import MemoryConfig, MemoryItem
 from mem0.configs.prompts import get_update_memory_messages
 from mem0.memory.base import MemoryBase
@@ -114,6 +114,8 @@ class Memory(MemoryBase):
         if isinstance(messages, str):
             messages = [{"role": "user", "content": messages}]

+        messages = parse_vision_messages(messages)
+
         with concurrent.futures.ThreadPoolExecutor() as executor:
             future1 = executor.submit(self._add_to_vector_store, messages, metadata, filters)
             future2 = executor.submit(self._add_to_graph, messages, filters)
@@ -143,7 +145,7 @@

         if self.custom_prompt:
             system_prompt = self.custom_prompt
-            user_prompt = f"Input: {parsed_messages}"
+            user_prompt = f"Input:\n{parsed_messages}"
         else:
             system_prompt, user_prompt = get_fact_retrieval_messages(parsed_messages)
@@ -1,10 +1,10 @@
 import re

 from mem0.configs.prompts import FACT_RETRIEVAL_PROMPT
+from mem0.llms.openai import OpenAILLM


 def get_fact_retrieval_messages(message):
-    return FACT_RETRIEVAL_PROMPT, f"Input: {message}"
+    return FACT_RETRIEVAL_PROMPT, f"Input:\n{message}"


 def parse_messages(messages):
@@ -43,3 +43,45 @@ def remove_code_blocks(content: str) -> str:
     pattern = r"^```[a-zA-Z0-9]*\n([\s\S]*?)\n```$"
     match = re.match(pattern, content.strip())
     return match.group(1).strip() if match else content.strip()
+
+
+def get_image_description(image_url):
+    """
+    Get the description of the image
+    """
+    llm = OpenAILLM()
+    response = llm.generate_response(
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": "Provide a description of the image and do not include any additional text."},
+                    {"type": "image_url", "image_url": {"url": image_url}}
+                ],
+            },
+        ],
+        max_tokens=100,
+    )
+    return response
+
+
+def parse_vision_messages(messages):
+    """
+    Parse the vision messages from the messages
+    """
+    returned_messages = []
+    for msg in messages:
+        if msg["role"] != "system":
+            if not isinstance(msg["content"], str) and msg["content"]["type"] == "image_url":
+                image_url = msg["content"]["image_url"]["url"]
+                try:
+                    description = get_image_description(image_url)
+                    msg["content"]["text"] = description
+                    returned_messages.append({"role": msg["role"], "content": description})
+                except Exception:
+                    raise Exception(f"Error while downloading {image_url}.")
+            else:
+                returned_messages.append(msg)
+        else:
+            returned_messages.append(msg)
+    return returned_messages
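
In effect, `parse_vision_messages` replaces each image message with a plain-text message carrying the model-generated description before fact extraction runs. A hedged illustration of the before/after shapes (the URL and the description text are made up):

```python
# Input passed to Memory.add(): a mix of text and image messages
messages = [
    {"role": "user", "content": "Hi, my name is Alice."},
    {
        "role": "user",
        "content": {"type": "image_url", "image_url": {"url": "https://example.com/pizza.jpg"}},
    },
]

# After parse_vision_messages(messages), conceptually:
# [
#     {"role": "user", "content": "Hi, my name is Alice."},
#     {"role": "user", "content": "A vegetable pizza topped with olives, tomatoes, and spinach."},
# ]
```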
poetry.lock (generated, 1068 lines changed): file diff suppressed because it is too large.
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "mem0ai"
-version = "0.1.50"
+version = "0.1.51"
 description = "Long-term memory for AI Agents"
 authors = ["Mem0 <founders@mem0.ai>"]
 exclude = [