7 Steps to a Human-Like Sounding AI Voice Agent
This Retell AI tutorial shows how to improve the sound and pronunciation of AI voice agents. For the full video, visit: https://youtu.be/U1nFU67xRaY
Arief Rachman
2/26/2025


Introduction
Creating an AI voice agent that truly resembles human speech has long presented challenges for developers and researchers alike. Issues such as rapid speech delivery, unclear pronunciation of numbers, and mispronounced words often render these voice agents less effective. Moreover, the artificial sound of AI when articulating dates or web addresses can be quite striking. Fortunately, innovative solutions are emerging to address these concerns, thanks to pioneers like Evie Wang, co-founder and CMO of Retell AI. This blog outlines seven essential steps to transform your AI voice agent from sounding mechanical to delivering a more natural and human-like sound.
Step 1: Use Custom Voice
Using a custom voice can improve how a voice agent sounds. The better the accent and emotion in the voice recording, the better the results.
Step 2: Add Pauses for Clarity
A common issue with AI voice agents is speaking too rapidly. To resolve this, adjust the speech rate so that users stay engaged and can follow along. At the right pace, listeners can absorb the information without feeling overwhelmed by the speed of delivery.
In Retell AI, use a dash in the text to insert a pause:
Use “ - ” for a short pause
Or use multiple dashes for a longer pause: “ - - - - - ”
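If the agent's lines are assembled in code, the dash convention can be applied programmatically. The following is a minimal sketch; the helper names pause and join_with_pauses are hypothetical and are not part of Retell AI.

```python
def pause(length: int = 1) -> str:
    """Return a pause marker: one spaced dash per unit of length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return " " + " ".join(["-"] * length) + " "


def join_with_pauses(phrases: list[str], length: int = 1) -> str:
    """Join phrases so the agent pauses between them."""
    return pause(length).join(p.strip() for p in phrases)


line = join_with_pauses(["Your appointment is confirmed", "see you on Monday"])
print(line)  # Your appointment is confirmed - see you on Monday
```

Passing a larger length, for example pause(5), produces the longer “ - - - - - ” marker.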
Step 3: Number and Date Prompt
Another critical step is focusing on the pronunciation of dates and numbers. Ensuring accurate and clear enunciation of these elements can greatly enhance the user experience. Use phonetic data and database lookups to enable better pronunciation accuracy, helping your voice agent communicate with greater authenticity.
Here is an example prompt for phone numbers:
When speaking the phone number, transform the format as follows:
- Input formats like 4158923245, (415) 892-3245, or 415-892-3245
- Should be pronounced as: "four one five - eight nine two - three two four five"
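The same transformation can also be done in code before the text reaches the agent, which removes any dependence on the model following the prompt. This is a minimal sketch that assumes 10-digit US-style numbers; the function name speak_phone_number is hypothetical.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_phone_number(raw: str) -> str:
    """Turn a 10-digit number into digit words, grouped 3-3-4 with pauses."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, and parentheses
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    groups = [digits[:3], digits[3:6], digits[6:]]
    # " - " between groups is the pause marker described in Step 2
    return " - ".join(" ".join(DIGIT_WORDS[int(d)] for d in g) for g in groups)


print(speak_phone_number("(415) 892-3245"))
# four one five - eight nine two - three two four five
```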
Here is an example prompt for dates:
When speaking the date, transform the format as follows:
- Input formats like 24th July, Monday 14th, or Feb 9, 2025
- Should be pronounced as: "twenty fourth July, Monday fourteenth, February nine - twenty twenty five"
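When the date is already available as structured data, it can be spelled out in code instead. The sketch below follows the "February nine - twenty twenty five" pattern from the prompt; the function names and the handling of years such as 2005 ("twenty oh five") are assumptions, not something Retell AI prescribes.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def speak_year(year: int) -> str:
    """Read a four-digit year as two pairs, e.g. 2025 -> twenty twenty five."""
    if not 1000 <= year <= 9999:
        raise ValueError("expected a four-digit year")
    if year % 1000 == 0:
        return f"{ONES[year // 1000]} thousand"
    high, low = divmod(year, 100)
    if low == 0:
        return f"{two_digits(high)} hundred"
    if low < 10:
        return f"{two_digits(high)} oh {ONES[low]}"
    return f"{two_digits(high)} {two_digits(low)}"


def speak_date(d: date) -> str:
    """Month name, day in words, a pause, then the year in words."""
    return f"{MONTHS[d.month - 1]} {two_digits(d.day)} - {speak_year(d.year)}"


print(speak_date(date(2025, 2, 9)))  # February nine - twenty twenty five
```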
Step 4: State Times Slowly
For a better experience, convert the numbers in a time into words. Use this exact prompt:
For 1:00 PM, say “One P M.”
For 3:30 PM, say “Three thirty P M.”
For 8:45 AM, say “Eight forty-five A M.”
Never say “O’Clock.” Instead, say “O-Clock,” never “O'clock.” This is non-negotiable: always say “A M” or “P M.”
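The three examples in the prompt follow one rule, which can be sketched in code for cases where the time is formatted before it reaches the agent. The function name speak_time and the "oh five" reading for minutes under ten are assumptions added for illustration.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]


def number_words(n: int) -> str:
    """Spell out an integer from 0 to 59, hyphenated as in 'forty-five'."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def speak_time(hour24: int, minute: int) -> str:
    """Spell a 24-hour time as words with a spaced 'A M' or 'P M' suffix."""
    if not (0 <= hour24 <= 23 and 0 <= minute <= 59):
        raise ValueError("time out of range")
    suffix = "A M" if hour24 < 12 else "P M"
    hour12 = hour24 % 12 or 12  # 0 and 12 are both spoken as twelve
    words = [number_words(hour12)]
    if 0 < minute < 10:
        words.append("oh " + ONES[minute])
    elif minute >= 10:
        words.append(number_words(minute))
    words.append(suffix)  # minute == 0 adds nothing, so "o'clock" never appears
    text = " ".join(words)
    return text[0].upper() + text[1:]


print(speak_time(8, 45))  # Eight forty-five A M
```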
Step 5: Email and Website Prompt
Focus on the pronunciation of email addresses and websites. Clear, accurate enunciation of these elements can greatly enhance the user experience.
Use this prompt for email and website pronunciation:
Identify each segment of the email and domain name.
If a segment consists of individual letters (e.g., "NK"), pronounce each letter using its spoken form in English (e.g., "N" → "en," "K" → "kay").
If a segment is a recognizable word (e.g., "laundry"), pronounce it normally as that word.
Pronounce “@” as “at” and “.” as “dot” before stating the top-level domain (e.g., “at gmail dot com,” “at yahoo dot net,” “at hotmail dot org,” etc.).
Website Example:
“nklaundry.com” → “en - kay - laundry dot com”
“abctest.net” → “A B C test dot net”
“xyzco.org” → “ex-why-zee-co dot org”
Email Example:
“mike@nklaundry.com” → “mike at en-kay-laundry dot com”
“a.laura@abctest.net” → “ay dot laura at A B C test dot net”
“jason@xyzco.org” → “jason at ex-why-zee-co dot org”
Adhere to this phonetic breakdown carefully to ensure clarity and proper pronunciation for customers.
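The segmentation rule in this prompt can also be approximated in code. The sketch below assumes a caller-supplied set of recognizable words (known_words); anything not in that set is spelled letter by letter. The function names, the letter spellings, and the greedy longest-match strategy are illustrative choices, not part of Retell AI.

```python
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee", "e": "ee", "f": "ef",
    "g": "jee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "el",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double-you",
    "x": "ex", "y": "why", "z": "zee",
}


def speak_segment(segment: str, known_words: set[str]) -> str:
    """Say known words as words and spell everything else out."""
    parts = []
    i = 0
    while i < len(segment):
        # Try the longest known word that starts at position i.
        match = next((segment[i:j] for j in range(len(segment), i, -1)
                      if segment[i:j] in known_words), None)
        if match:
            parts.append(match)
            i += len(match)
        else:
            # Unknown characters (digits, symbols) are passed through as-is.
            parts.append(LETTER_NAMES.get(segment[i], segment[i]))
            i += 1
    return "-".join(parts)


def speak_address(address: str, known_words: set[str]) -> str:
    """Spell an email address or bare domain for text-to-speech."""
    local, at, domain = address.lower().rpartition("@")
    spoken = []
    if at:
        spoken.append(" dot ".join(speak_segment(s, known_words)
                                   for s in local.split(".")))
        spoken.append("at")
    *labels, tld = domain.split(".")
    # The top-level domain (com, net, org) is always spoken as a word.
    spoken.append(" dot ".join([speak_segment(l, known_words) for l in labels] + [tld]))
    return " ".join(spoken)


print(speak_address("mike@nklaundry.com", {"mike", "laundry"}))
# mike at en-kay-laundry dot com
```

A greedy match can mis-split some names, so the word set should be kept small and specific to the business.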
Step 6: Add Natural Reactions
Use the backchanneling feature in Retell AI to make the agent sound more natural.
- Use phrases like “uh-huh”, “mmh”, “yeah”, and “I see”
- Adjust frequency to keep it natural
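For agents configured through an API rather than the dashboard, these settings might look like the sketch below. The field names enable_backchannel, backchannel_frequency, and backchannel_words, the 0 to 1 frequency range, and the value 0.6 are assumptions to check against the current Retell AI agent reference; the validation helper is hypothetical.

```python
# Assumed field names; verify against the current Retell AI API reference.
agent_settings = {
    "enable_backchannel": True,
    "backchannel_frequency": 0.6,  # assumed scale: 0 = never, 1 = as often as possible
    "backchannel_words": ["uh-huh", "mmh", "yeah", "I see"],
}


def check_backchannel(settings: dict) -> None:
    """Fail early on values that would make the agent sound unnatural."""
    freq = settings["backchannel_frequency"]
    if not 0.0 <= freq <= 1.0:
        raise ValueError("backchannel_frequency must be between 0 and 1")
    if settings["enable_backchannel"] and not settings["backchannel_words"]:
        raise ValueError("backchanneling is enabled but no phrases are set")


check_backchannel(agent_settings)
```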
Step 7: Add Custom Pronunciation for Technical Words
You can add custom pronunciations in the speech settings. Use IPA or CMU phonemes to specify exactly how a word should sound.
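As an illustration, a pronunciation entry pairs a word with an alphabet and a phoneme string. The sketch below assumes entries shaped as word, alphabet, and phoneme, with "ipa" and "cmu" as the alphabet values; this shape and the helper name pronunciation_entry are assumptions to confirm in the Retell AI documentation. The example teaches the agent to say "SQL" as "sequel".

```python
ALLOWED_ALPHABETS = {"ipa", "cmu"}  # assumed identifiers for the two alphabets


def pronunciation_entry(word: str, alphabet: str, phoneme: str) -> dict:
    """Build one pronunciation entry, rejecting unknown alphabets."""
    if alphabet not in ALLOWED_ALPHABETS:
        raise ValueError(f"alphabet must be one of {sorted(ALLOWED_ALPHABETS)}")
    if not word or not phoneme:
        raise ValueError("word and phoneme must be non-empty")
    return {"word": word, "alphabet": alphabet, "phoneme": phoneme}


pronunciation_dictionary = [
    pronunciation_entry("SQL", "ipa", "ˈsiːkwəl"),  # spoken as "sequel"
]
```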