Doc: Update API reference section and Add elevenlabs example (#2447)

This commit is contained in:
Dev Khant
2025-03-27 00:13:09 +05:30
committed by GitHub
parent 32ba13b3ea
commit 45e5f2af93
5 changed files with 515 additions and 9 deletions

View File

@@ -0,0 +1,454 @@
---
title: ElevenLabs
---
Create voice-based conversational AI agents with memory capabilities by integrating ElevenLabs and Mem0. This integration enables persistent, context-aware voice interactions that remember past conversations.
## Overview
In this guide, we'll build a voice agent that:
1. Uses ElevenLabs Conversational AI for voice interaction
2. Leverages Mem0 to store and retrieve memories from past conversations
3. Provides personalized responses based on user history
## Setup and Configuration
Install necessary libraries:
```bash
pip install elevenlabs mem0 python-dotenv
```
Configure your environment variables:
<Note>You'll need both an ElevenLabs API key and a Mem0 API key to use this integration.</Note>
```bash
# Create a .env file with these variables
AGENT_ID=your-agent-id
USER_ID=unique-user-identifier
ELEVENLABS_API_KEY=your-elevenlabs-api-key
MEM0_API_KEY=your-mem0-api-key
```
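The walkthrough code reads these variables from the process environment. If you keep them in a `.env` file, load it before `main()` runs - a minimal sketch using python-dotenv (installed above):
```python
from dotenv import load_dotenv

# Read key-value pairs from the .env file into os.environ
load_dotenv()
```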
## Integration Code Breakdown
Let's break down the implementation into manageable parts:
### 1. Imports and Environment Setup
First, we import required libraries and set up the environment:
```python
import os
import signal
import sys
from mem0 import AsyncMemoryClient
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface
from elevenlabs.conversational_ai.conversation import ClientTools
```
These imports provide:
- Standard Python libraries for system operations and signal handling
- `AsyncMemoryClient` from Mem0 for memory operations
- ElevenLabs components for voice interaction
### 2. Environment Variables and Validation
Next, we validate the required environment variables:
```python
def main():
    # Required environment variables
    AGENT_ID = os.environ.get('AGENT_ID')
    USER_ID = os.environ.get('USER_ID')
    API_KEY = os.environ.get('ELEVENLABS_API_KEY')
    MEM0_API_KEY = os.environ.get('MEM0_API_KEY')

    # Validate required environment variables
    if not AGENT_ID:
        sys.stderr.write("AGENT_ID environment variable must be set\n")
        sys.exit(1)

    if not USER_ID:
        sys.stderr.write("USER_ID environment variable must be set\n")
        sys.exit(1)

    if not API_KEY:
        sys.stderr.write("ELEVENLABS_API_KEY not set, assuming the agent is public\n")

    if not MEM0_API_KEY:
        sys.stderr.write("MEM0_API_KEY environment variable must be set\n")
        sys.exit(1)

    # Set up Mem0 API key in the environment
    os.environ['MEM0_API_KEY'] = MEM0_API_KEY
```
This section:
- Retrieves required environment variables
- Performs validation to ensure required variables are present
- Exits the application with an error message if required variables are missing
- Sets the Mem0 API key in the environment for the Mem0 client to use
### 3. Client Initialization
Initialize both the ElevenLabs and Mem0 clients:
```python
    # Initialize ElevenLabs client
    client = ElevenLabs(api_key=API_KEY)

    # Initialize memory client and tools
    client_tools = ClientTools()
    mem0_client = AsyncMemoryClient()
```
Here we:
- Create an ElevenLabs client with the API key
- Initialize a ClientTools object for registering function tools
- Create an AsyncMemoryClient instance for Mem0 interactions
### 4. Memory Function Definitions
Define the two key memory functions that will be registered as tools:
```python
    # Define memory-related functions for the agent
    async def add_memories(parameters):
        """Add a message to the memory store"""
        message = parameters.get("message")
        await mem0_client.add(
            messages=message,
            user_id=USER_ID,
            output_format="v1.1",
            version="v2"
        )
        return "Memory added successfully"

    async def retrieve_memories(parameters):
        """Retrieve relevant memories based on the input message"""
        message = parameters.get("message")

        # Set up filters to retrieve memories for this specific user
        filters = {
            "AND": [
                {
                    "user_id": USER_ID
                }
            ]
        }

        # Search for relevant memories using the message as a query
        results = await mem0_client.search(
            query=message,
            version="v2",
            filters=filters
        )

        # Extract and join the memory texts
        memories = ' '.join([result["memory"] for result in results])
        print("[ Memories ]", memories)

        if memories:
            return memories
        return "No memories found"
```
These functions:
#### `add_memories`:
- Takes a message parameter containing information to remember
- Stores the message in Mem0 using the `add` method
- Associates the memory with the specific USER_ID
- Returns a success message to the agent
#### `retrieve_memories`:
- Takes a message parameter as the search query
- Sets up filters to only retrieve memories for the current user
- Uses semantic search to find relevant memories
- Joins all retrieved memories into a single text
- Prints retrieved memories to the console for debugging
- Returns the memories or a "No memories found" message if none are found
### 5. Registering Memory Functions as Tools
Register the memory functions with the ElevenLabs ClientTools system:
```python
    # Register the memory functions as tools for the agent
    client_tools.register("addMemories", add_memories, is_async=True)
    client_tools.register("retrieveMemories", retrieve_memories, is_async=True)
```
This allows the ElevenLabs agent to:
- Access these functions through function calling
- Wait for asynchronous results (is_async=True)
- Call these functions by name ("addMemories" and "retrieveMemories")
### 6. Conversation Setup
Configure the conversation with ElevenLabs:
```python
    # Initialize the conversation
    conversation = Conversation(
        client,
        AGENT_ID,
        # Assume auth is required when API_KEY is set
        requires_auth=bool(API_KEY),
        audio_interface=DefaultAudioInterface(),
        client_tools=client_tools,
        callback_agent_response=lambda response: print(f"Agent: {response}"),
        callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
        callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
        # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
    )
```
This sets up the conversation with:
- The ElevenLabs client and Agent ID
- Authentication requirements based on API key presence
- DefaultAudioInterface for handling audio I/O
- The client_tools with our memory functions
- Callback functions for:
  - Displaying agent responses
  - Showing corrected responses (when the agent self-corrects)
  - Displaying user transcripts for debugging
  - (Commented out) latency measurements
### 7. Conversation Management
Start and manage the conversation:
```python
    # Start the conversation
    print(f"Starting conversation with user_id: {USER_ID}")
    conversation.start_session()

    # Handle Ctrl+C to gracefully end the session
    signal.signal(signal.SIGINT, lambda sig, frame: conversation.end_session())

    # Wait for the conversation to end and get the conversation ID
    conversation_id = conversation.wait_for_session_end()
    print(f"Conversation ID: {conversation_id}")


if __name__ == '__main__':
    main()
```
This final section:
- Prints a message indicating the conversation has started
- Starts the conversation session
- Sets up a signal handler to gracefully end the session on Ctrl+C
- Waits for the session to end and gets the conversation ID
- Prints the conversation ID for reference
## Memory Tools Overview
This integration provides two key memory functions to your conversational AI agent:
### 1. Adding Memories (`addMemories`)
The `addMemories` tool allows your agent to store important information during a conversation, including:
- User preferences
- Important facts shared by the user
- Decisions or commitments made during the conversation
- Action items to follow up on
When the agent identifies information worth remembering, it calls this function to store it in the Mem0 database with the appropriate user ID.
#### How it works:
1. The agent identifies information that should be remembered
2. It formats the information as a message string
3. It calls the `addMemories` function with this message
4. The function stores the memory in Mem0 linked to the user's ID
5. Later conversations can retrieve this memory
#### Example usage in agent prompt:
```
When the user shares important information like preferences or personal details,
use the addMemories function to store this information for future reference.
```
### 2. Retrieving Memories (`retrieveMemories`)
The `retrieveMemories` tool allows your agent to search for and retrieve relevant memories from previous conversations. The agent can:
- Search for context related to the current topic
- Recall user preferences
- Remember previous interactions on similar topics
- Create continuity across multiple sessions
#### How it works:
1. The agent needs context for the current conversation
2. It calls `retrieveMemories` with the current conversation topic or question
3. The function performs a semantic search in Mem0
4. Relevant memories are returned to the agent
5. The agent incorporates these memories into its response
#### Example usage in agent prompt:
```
At the beginning of each conversation turn, use retrieveMemories to check if we've
discussed this topic before or if the user has shared relevant preferences.
```
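Before wiring these tools into a live voice session, it can help to exercise them directly. A minimal sketch, assuming `add_memories`, `retrieve_memories`, and `USER_ID` from the walkthrough above are available at module level (in the walkthrough they are defined inside `main()`):
```python
import asyncio

async def smoke_test():
    # Store a fact, then search for it with a related query
    print(await add_memories({"message": "The user's favorite color is green"}))
    print(await retrieve_memories({"message": "favorite color"}))

asyncio.run(smoke_test())
```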
## Configuring Your ElevenLabs Agent
To enable your agent to effectively use memory:
1. Add function calling capabilities to your agent in the ElevenLabs platform:
   - Go to your agent settings in the ElevenLabs platform
   - Navigate to the "Tools" section
   - Enable function calling for your agent
   - Add the memory tools as described below
2. Add the `addMemories` and `retrieveMemories` tools to your agent with these specifications:
For `addMemories`:
```json
{
"name": "addMemories",
"description": "Stores important information from the conversation to remember for future interactions",
"parameters": {
"type": "object",
"properties": {
"message": {
"type": "string",
"description": "The important information to remember"
}
},
"required": ["message"]
}
}
```
For `retrieveMemories`:
```json
{
"name": "retrieveMemories",
"description": "Retrieves relevant information from past conversations",
"parameters": {
"type": "object",
"properties": {
"message": {
"type": "string",
"description": "The query to search for in past memories"
}
},
"required": ["message"]
}
}
```
3. Update your agent's prompt to instruct it to use these memory functions. For example:
```
You are a helpful voice assistant that remembers past conversations with the user.
You have access to memory tools that allow you to remember important information:
- Use retrieveMemories at the beginning of the conversation to recall relevant context from prior conversations
- Use addMemories to store new important information such as:
* User preferences
* Personal details the user shares
* Important decisions made
* Tasks or follow-ups promised to the user
Before responding to complex questions, always check for relevant memories first.
When the user shares important information, make sure to store it for future reference.
```
## Example Conversation Flow
Here's how a typical conversation with memory might flow:
1. **User speaks**: "Hi, do you remember my favorite color?"
2. **Agent retrieves memories**:
   ```python
   # Agent calls retrieve_memories (an async function, so the call is awaited)
   memories = await retrieve_memories({"message": "user's favorite color"})
   # If found: "The user's favorite color is blue"
   ```
3. **Agent processes with context**:
   - If memories found: Prepares a personalized response
   - If no memories: Prepares to ask for and store the information
4. **Agent responds**:
   - With memory: "Yes, your favorite color is blue!"
   - Without memory: "I don't think you've told me your favorite color before. What is it?"
5. **User responds**: "It's actually green."
6. **Agent stores new information**:
   ```python
   # Agent calls add_memories
   await add_memories({"message": "The user's favorite color is green"})
   ```
7. **Agent confirms**: "Thanks, I'll remember that your favorite color is green."
## Example Use Cases
- **Personal Assistant** - Remember user preferences, past requests, and important dates
```
User: "What restaurants did I say I liked last time?"
Agent: *retrieves memories* "You mentioned enjoying Bella Italia and The Golden Dragon."
```
- **Customer Support** - Recall previous issues a customer has had
```
User: "I'm having that same problem again!"
Agent: *retrieves memories* "Is this related to the login issue you reported last week?"
```
- **Educational AI** - Track student progress and tailor teaching accordingly
```
User: "Let's continue our math lesson."
Agent: *retrieves memories* "Last time we were working on quadratic equations. Would you like to continue with that?"
```
- **Healthcare Assistant** - Remember symptoms, medications, and health concerns
```
User: "Have I told you about my allergy medication?"
Agent: *retrieves memories* "Yes, you mentioned you're taking Claritin for your pollen allergies."
```
## Troubleshooting
- **Missing API Keys**:
  - Error: "AGENT_ID environment variable must be set" (or a similar message naming the missing variable)
  - Solution: Ensure all required environment variables are set in your `.env` file or system environment
- **Connection Issues**:
  - Error: "Failed to connect to API"
  - Solution: Check your network connection and verify that your API keys are valid and have the necessary permissions
- **Empty Memory Results**:
  - Symptom: Agent always responds with "No memories found"
  - Solution: This is normal for new users; the memory store builds up as conversations occur. It's also possible your query isn't semantically similar to any stored memory - try different phrasing. A quick way to confirm that memories are being stored at all is shown after this list.
- **Agent Not Using Memories**:
  - Symptom: The agent retrieves memories but doesn't incorporate them in responses
  - Solution: Update the agent's prompt to explicitly instruct it to use retrieved memories in its responses
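For the empty-results case, you can list everything stored for the user and confirm that memories are being written at all. A minimal sketch, assuming the Mem0 client's `get_all` method and the `mem0_client`/`USER_ID` from the walkthrough above:
```python
# Sanity check: list every memory stored for this user
all_memories = await mem0_client.get_all(user_id=USER_ID)
print(all_memories)
```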
## Conclusion
By integrating ElevenLabs Conversational AI with Mem0, you can create voice agents that maintain context across conversations and provide personalized responses based on user history. This powerful combination enables:
- More natural, context-aware conversations
- Personalized user experiences that improve over time
- Reduced need for users to repeat information
- Long-term relationship building between users and AI agents
## Help
- For more details on ElevenLabs, visit the [ElevenLabs Conversational AI Documentation](https://elevenlabs.io/docs/api-reference/conversational-ai)
- For Mem0 documentation, refer to the [Mem0 Platform](https://app.mem0.ai/)
- If you need further assistance, please feel free to reach out to us through the following methods:
<Snippet file="get-help.mdx" />

View File

@@ -0,0 +1,353 @@
---
title: LiveKit
---
This guide demonstrates how to create a memory-enabled voice assistant using LiveKit, Deepgram, OpenAI, and Mem0, focusing on creating an intelligent, context-aware travel planning agent.
## Prerequisites
Before you begin, make sure you have:
1. Installed the LiveKit Agents SDK along with the Silero, Deepgram, and OpenAI voice plugins:
```bash
pip install livekit \
livekit-agents \
livekit-plugins-silero \
livekit-plugins-deepgram \
livekit-plugins-openai
```
2. Installed the Mem0 SDK:
```bash
pip install mem0ai
```
3. Set up your API keys in a `.env` file:
```sh
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
DEEPGRAM_API_KEY=your_deepgram_api_key
MEM0_API_KEY=your_mem0_api_key
OPENAI_API_KEY=your_openai_api_key
```
> **Note**: Make sure you have LiveKit and Deepgram accounts. You can find `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` in the [LiveKit Cloud Console](https://cloud.livekit.io/); see the [LiveKit Documentation](https://docs.livekit.io/home/cloud/keys-and-tokens/) for more information. You can get `DEEPGRAM_API_KEY` from the [Deepgram Console](https://console.deepgram.com/); see the [Deepgram Documentation](https://developers.deepgram.com/docs/create-additional-api-keys) for details.
## Code Breakdown
Let's break down the key components of this implementation:
### 1. Setting Up Dependencies and Environment
```python
import asyncio
import logging
import os
from typing import List, Dict, Any, Annotated
import aiohttp
from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit import rtc, api
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero
from mem0 import AsyncMemoryClient
# Load environment variables
load_dotenv()
# Configure logging
logger = logging.getLogger("memory-assistant")
logger.setLevel(logging.INFO)
# Define a global user ID for simplicity
USER_ID = "voice_user"
# Initialize Mem0 client
mem0 = AsyncMemoryClient()
```
This section handles:
- Importing required modules
- Loading environment variables
- Setting up logging
- Defining a global user ID for the session
- Initializing the Mem0 client
### 2. Memory Enrichment Function
```python
async def _enrich_with_memory(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    """Add memories and augment chat context with relevant memories"""
    if not chat_ctx.messages:
        return

    # Store user message in Mem0
    user_msg = chat_ctx.messages[-1]
    await mem0.add(
        [{"role": "user", "content": user_msg.content}],
        user_id=USER_ID
    )

    # Search for relevant memories
    results = await mem0.search(
        user_msg.content,
        user_id=USER_ID,
    )

    # Augment context with retrieved memories
    if results:
        memories = ' '.join([result["memory"] for result in results])
        logger.info(f"Enriching with memory: {memories}")
        rag_msg = llm.ChatMessage.create(
            text=f"Relevant Memory: {memories}\n",
            role="assistant",
        )
        # Insert the memory message before the user's message so the
        # LLM sees the retrieved context first
        chat_ctx.messages[-1] = rag_msg
        chat_ctx.messages.append(user_msg)
```
This function:
- Stores user messages in Mem0
- Performs semantic search for relevant memories
- Augments the chat context with retrieved memories
- Enables contextually aware responses
### 3. Prewarm and Entrypoint Functions
```python
def prewarm_process(proc: JobProcess):
    # Preload Silero VAD in memory to speed up session start
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # Connect to the LiveKit room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for a participant to join
    participant = await ctx.wait_for_participant()

    # Define initial system context
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            """
            You are a helpful voice assistant.
            You are a travel guide named George and will help the user to plan a travel trip of their dreams.
            You should help the user plan for various adventures like work retreats, family vacations or solo backpacking trips.
            You should be careful to not suggest anything that would be dangerous, illegal or inappropriate.
            You can remember past interactions and use them to inform your answers.
            Use semantic memory retrieval to provide contextually relevant responses.
            """
        ),
    )

    # Create VoicePipelineAgent with memory capabilities
    agent = VoicePipelineAgent(
        chat_ctx=initial_ctx,
        vad=ctx.proc.userdata["vad"],  # reuse the VAD preloaded in prewarm_process
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        before_llm_cb=_enrich_with_memory,
    )

    # Start agent and initial greeting
    agent.start(ctx.room, participant)
    await agent.say(
        "Hello! I'm George. Can I help you plan an upcoming trip?",
        allow_interruptions=True
    )


# Run the application
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm_process))
```
The entrypoint function:
- Connects to the LiveKit room and waits for a participant to join
- Sets up the initial system context for the travel-guide persona
- Creates a VoicePipelineAgent with memory enrichment
- Starts the agent with an initial greeting
## Create a Memory-Enabled Voice Agent
Now that we've explained each component, here's the complete implementation that combines LiveKit's voice pipeline with Mem0's memory capabilities:
```python
import asyncio
import logging
import os
from typing import List, Dict, Any, Annotated

import aiohttp
from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit import rtc, api
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero
from mem0 import AsyncMemoryClient

# Load environment variables
load_dotenv()

# Configure logging
logger = logging.getLogger("memory-assistant")
logger.setLevel(logging.INFO)

# Define a global user ID for simplicity
USER_ID = "voice_user"

# Initialize Mem0 memory client
mem0 = AsyncMemoryClient()


def prewarm_process(proc: JobProcess):
    # Preload Silero VAD in memory to speed up session start
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # Connect to the LiveKit room
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for a participant to join
    participant = await ctx.wait_for_participant()

    async def _enrich_with_memory(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
        """Add memories and augment chat context with relevant memories"""
        if not chat_ctx.messages:
            return

        # Store user message in Mem0
        user_msg = chat_ctx.messages[-1]
        await mem0.add(
            [{"role": "user", "content": user_msg.content}],
            user_id=USER_ID
        )

        # Search for relevant memories
        results = await mem0.search(
            user_msg.content,
            user_id=USER_ID,
        )

        # Augment context with retrieved memories
        if results:
            memories = ' '.join([result["memory"] for result in results])
            logger.info(f"Enriching with memory: {memories}")
            rag_msg = llm.ChatMessage.create(
                text=f"Relevant Memory: {memories}\n",
                role="assistant",
            )
            # Insert the memory message before the user's message so the
            # LLM sees the retrieved context first
            chat_ctx.messages[-1] = rag_msg
            chat_ctx.messages.append(user_msg)

    # Define initial system context
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            """
            You are a helpful voice assistant.
            You are a travel guide named George and will help the user to plan a travel trip of their dreams.
            You should help the user plan for various adventures like work retreats, family vacations or solo backpacking trips.
            You should be careful to not suggest anything that would be dangerous, illegal or inappropriate.
            You can remember past interactions and use them to inform your answers.
            Use semantic memory retrieval to provide contextually relevant responses.
            """
        ),
    )

    # Create VoicePipelineAgent with memory capabilities
    agent = VoicePipelineAgent(
        chat_ctx=initial_ctx,
        vad=ctx.proc.userdata["vad"],  # reuse the VAD preloaded in prewarm_process
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=openai.TTS(),
        before_llm_cb=_enrich_with_memory,
    )

    # Start agent and initial greeting
    agent.start(ctx.room, participant)
    await agent.say(
        "Hello! I'm George. Can I help you plan an upcoming trip?",
        allow_interruptions=True
    )


# Run the application
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm_process))
```
## Key Features of This Implementation
1. **Semantic Memory Retrieval**: Uses Mem0 to store and retrieve contextually relevant memories
2. **Voice Interaction**: Leverages LiveKit for voice communication
3. **Intelligent Context Management**: Augments conversations with past interactions
4. **Travel Planning Specialization**: Focused on creating a helpful travel guide assistant
## Running the Example
To run this example:
1. Install all required dependencies
2. Set up your `.env` file with the necessary API keys
3. Ensure your microphone and audio setup are configured
4. Run the script with Python 3.11 or newer using the following command:
```sh
python mem0-livekit-voice-agent.py start
```
5. After the script starts, you can interact with the voice agent through [LiveKit's Agents Playground](https://agents-playground.livekit.io/); connect to the agent to start a conversation.
## Best Practices for Voice Agents with Memory
1. **Context Preservation**: Store enough context with each memory for effective retrieval
2. **Privacy Considerations**: Implement secure memory management
3. **Relevant Memory Filtering**: Use semantic search to retrieve only the most pertinent memories
4. **Error Handling**: Implement robust error handling for memory operations
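For the error-handling point in particular: memory calls go over the network and can fail mid-conversation, and the voice session should keep going regardless. A minimal sketch - a hypothetical `safe_search` wrapper around the `mem0` client and `logger` from the example above:
```python
async def safe_search(query: str, user_id: str) -> str:
    """Search Mem0, but never let a memory failure break the voice session."""
    try:
        results = await mem0.search(query, user_id=user_id)
        return ' '.join(result["memory"] for result in results)
    except Exception as exc:
        # Fall back to an empty context instead of crashing the agent
        logger.warning(f"Memory search failed, continuing without context: {exc}")
        return ""
```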
## Debugging Function Tools
- To run the script in debug mode, start the assistant in `dev` mode:
```sh
python mem0-livekit-voice-agent.py dev
```
- When working with memory-enabled voice agents, use Python's `logging` module for effective debugging:
```python
import logging
# Set up logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger("memory_voice_agent")
```

View File

@@ -220,4 +220,49 @@ Here are the available integrations for Mem0:
>
Integrate Mem0 as an MCP Server in Cursor.
</Card>
<Card
title="Livekit"
icon={
<svg
viewBox="0 0 24 24"
xmlns="http://www.w3.org/2000/svg"
width="24"
height="24"
>
<text
x="12"
y="16"
fontFamily="Arial"
fontSize="12"
textAnchor="middle"
fill="currentColor"
fontWeight="bold"
>
LK
</text>
</svg>
}
href="/integrations/livekit"
>
Integrate Mem0 with LiveKit for voice agents.
</Card>
<Card
title="ElevenLabs"
icon={
<svg
xmlns="http://www.w3.org/2000/svg"
width="24"
height="24"
viewBox="0 0 24 24"
fill="none"
>
<rect width="24" height="24" fill="white"/>
<rect x="8" y="4" width="2" height="16" fill="black"/>
<rect x="14" y="4" width="2" height="16" fill="black"/>
</svg>
}
href="/integrations/elevenlabs"
>
Build voice agents with memory using ElevenLabs Conversational AI.
</Card>
</CardGroup>