Prompting gpt-realtime
Our voice agents are built on top of ‘gpt-realtime’, a small, realtime OpenAI model designed to be conversational. To build the best voice agents, you need to familiarize yourself with gpt-realtime’s capabilities, limitations, and best practices. Remember: most of the classic use cases for voice agents (banking, insurance, telecom, etc.) are already solved by some architectural pattern. You just need to find the right one and adapt it to your use case.
Start Here: The Bible
- You MUST read and internalize this guide: OpenAI Realtime Prompting Guide.
- This guide contains the exact patterns the model was trained on and how to prompt it for those patterns.
- It also contains example solutions to many use cases.
- Important note: This guide contains instructions for prompting the model’s “Voice”. These are only relevant if you are using the Voice-To-Voice orchestrator.
Mental Model: Realtime Model Is a Thin Conversation Model
- Short context window: Treat gpt-realtime as having very limited working memory.
- The model has a context window of about 32k tokens, but performance degrades after about 10k tokens.
- Conversational, not reasoning:
- Great at turn-by-turn conversation, style, pacing, and making the interaction feel natural.
- Performs badly in these situations:
- When given a long, strict prompt
- When too many tools are available
- When tools return too much data
- When the model needs to format data (such as dates)
- When running pre-scripted conversations
- Instruction following is not perfect:
- If you have a flow of “do X only if Y happens”, the model will sometimes do X even if Y did not happen.
- Use tools to enforce exact policy before doing an action.
- The model likes simple words:
- Don’t use complex symbols. Use the most common words.
- The prompt should be written so a five-year-old can understand it.
- The model was trained on text, so give it the words it most likely saw in its training data: the most common English words.
- Example: technician_scheduling -> service_booking
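The “do X only if Y” and data-formatting points above can be sketched as a tool handler that checks the condition in code and hands the model a ready-to-speak sentence. This is a minimal illustration, not part of any SDK: `CallState`, `spoken_date`, and `cancel_booking` are hypothetical names, and the actual cancellation call is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CallState:
    """Server-side state for one call (hypothetical); the model cannot edit it."""
    identity_verified: bool = False


def spoken_date(d: date) -> str:
    """Format a date as plain words so the model does not have to."""
    return f"{d.strftime('%A')}, {d.strftime('%B')} {d.day}"


def cancel_booking(state: CallState, booking_day: date) -> str:
    """Hypothetical tool handler: do X (cancel) only if Y (caller is verified)."""
    if not state.identity_verified:
        # The policy check lives in code, so a model mistake cannot skip it.
        return "Not allowed yet. Ask the caller to verify their identity first."
    # The real cancellation call would go here (omitted in this sketch).
    return f"Done. The booking on {spoken_date(booking_day)} is cancelled."
```

Even if the model calls the tool too early, the tool refuses and tells the model what to do next in short, simple words. The date arrives already formatted, so the model only has to read it out.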
Context Engineering ‘gpt-realtime’: A Checklist
- Short prompt: only behavior-interaction instructions and tool-use logic.
- 8-10 tools at most.
- All heavy lifting offloaded to tools or background LLMs inside those tools.
- Do not trust the model to follow exact policy before doing an action. Offload these decisions to a smarter LLM inside tools.
- Keep the context short, clean and high-signal.
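Parts of this checklist can be checked automatically before an agent ships. The sketch below is one possible pre-deploy check under loud assumptions: the 4-characters-per-token estimate is a rough rule of thumb for English text, `PROMPT_SHARE` is an arbitrary illustrative value, and all names are hypothetical. Only the tool limit and the 10k-token figure come from this page.

```python
MAX_TOOLS = 10                # upper end of the 8-10 range in the checklist
CONTEXT_SOFT_LIMIT = 10_000   # tokens; where performance is said to degrade
PROMPT_SHARE = 0.25           # assumption: prompt may use a quarter of that


def rough_token_count(text: str) -> int:
    """Crude estimate, assuming about 4 characters per token."""
    return len(text) // 4


def check_agent_config(prompt: str, tool_names: list[str]) -> list[str]:
    """Return checklist warnings; an empty list means no issues were found."""
    warnings = []
    if len(tool_names) > MAX_TOOLS:
        warnings.append(
            f"{len(tool_names)} tools configured; keep it to {MAX_TOOLS} or fewer."
        )
    budget = int(CONTEXT_SOFT_LIMIT * PROMPT_SHARE)
    tokens = rough_token_count(prompt)
    if tokens > budget:
        warnings.append(
            f"Prompt is roughly {tokens} tokens; budget is {budget}."
        )
    return warnings
```

A check like this only catches the countable items. Whether the prompt is clean and high-signal still needs a human read.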