Improve multimodal functionality (#2297)

Dev Khant
2025-03-10 23:33:18 +05:30
committed by GitHub
parent e9a0be66d8
commit 9c0954133f
4 changed files with 123 additions and 26 deletions

@@ -10,11 +10,25 @@ Mem0 extends its capabilities beyond text by supporting multimodal data, includi
When a user provides an image, Mem0 processes the image to extract textual information and relevant details, which are then added to the user's memory. This feature enhances the system's ability to understand and remember details based on visual inputs.
<Note>
To enable multimodal support, set `enable_vision` to `True` in the `llm` section of your configuration. The `vision_details` parameter controls the level of detail in image processing and accepts "auto" (default), "low", or "high".
</Note>
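The allowed values listed in the note can be checked before a configuration is built. The following is a minimal sketch, not part of Mem0: `make_vision_config` and `ALLOWED_VISION_DETAILS` are hypothetical names introduced here for illustration, and the returned dictionary simply mirrors the configuration layout used in this guide.

```python
# Hypothetical helper (not a Mem0 API): builds the vision config used in this guide.
ALLOWED_VISION_DETAILS = {"auto", "low", "high"}


def make_vision_config(vision_details: str = "auto") -> dict:
    """Return an LLM config dict with vision enabled, rejecting unknown detail levels."""
    if vision_details not in ALLOWED_VISION_DETAILS:
        raise ValueError(
            f"vision_details must be one of {sorted(ALLOWED_VISION_DETAILS)}, "
            f"got {vision_details!r}"
        )
    return {
        "llm": {
            "provider": "openai",
            "config": {
                "enable_vision": True,
                "vision_details": vision_details,
            },
        }
    }
```

The resulting dictionary is intended to be passed to `Memory.from_config(config=...)`; failing early on a misspelled detail level is a design choice of this sketch, not documented Mem0 behavior.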
<CodeGroup>
```python Code
from mem0 import Memory

# Enable vision on the LLM so that images are described before being stored
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "enable_vision": True,
            "vision_details": "high"
        }
    }
}
client = Memory.from_config(config=config)

messages = [
    {
@@ -182,11 +196,72 @@ await client.add([imageMessage], { userId: "alice" })
```
</CodeGroup>
## 3. OpenAI-Compatible Message Format
<Note>
Currently, we support only OpenAI models for image description.
</Note>
You can also use the OpenAI-compatible format to combine text and images in a single message:
<CodeGroup>
```python Python
import base64

# Path to the image file
image_path = "path/to/your/image.jpg"

# Encode the image in Base64
with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Create the message using OpenAI-compatible format
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What is in this image?",
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
        },
    ],
}

# Add the message to memory
client.add([message], user_id="alice")
```
```typescript TypeScript
import * as fs from "fs";
import { Memory, Message } from "mem0ai/oss";

const client = new Memory();

// Read the image file and encode it in Base64
const imagePath = "path/to/your/image.jpg";
const base64Image = fs.readFileSync(imagePath, { encoding: "base64" });

// Create the message using OpenAI-compatible format
const message: Message = {
  role: "user",
  content: [
    {
      type: "text",
      text: "What is in this image?",
    },
    {
      type: "image_url",
      image_url: {
        url: `data:image/jpeg;base64,${base64Image}`,
      },
    },
  ],
};

// Add the message to memory
await client.add([message], { userId: "alice" });
```
</CodeGroup>
This format allows you to combine text and images in a single message, making it easier to provide context along with visual content.
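The examples above hardcode `image/jpeg` in the data URL. For other image types, the MIME type can be derived from the file extension. The following is a minimal sketch under that assumption: `build_image_message` is a hypothetical helper, not part of Mem0, and it relies only on the Python standard library.

```python
import base64
import mimetypes
from pathlib import Path


def build_image_message(image_path: str, text: str) -> dict:
    """Hypothetical helper: build an OpenAI-compatible user message with text and an image."""
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png")
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")

    # Encode the raw file bytes in Base64 and embed them in a data URL
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary has the same shape as the `message` in the Python example, so it can be passed to `client.add([message], user_id="alice")` in the same way.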
By utilizing these methods, you can effectively incorporate images into user interactions, enhancing the multimodal capabilities of your Mem0 instance.
If you have any questions, please feel free to reach out to us using one of the following methods: