
The tone, pace, and delivery of your AI Voice Agent directly impact whether a contact stays on the line, engages, or opts out.
With Regal, voice configuration is a core performance lever, on par with script optimization and routing logic.
This article breaks down what voice tuning actually is, what it involves, and how different traits affect outcomes. Whether you're optimizing for conversion, retention, or simply sounding more human, this guide will show you what to adjust and why.
Voice tuning in Regal is a structured configuration process, designed to give you precise control over how AI Voice Agents sound and perform, from initial design through testing and long-term optimization.
This is done using individual toggles for each setting within our Voice AI Agent builder.
While Regal Voice AI Agents come with smart defaults, every parameter can be adjusted and tested at whatever level needed (and as much as needed) to perfect each use case.
We’ll take a look at each setting below.
Voice speed controls how fast the agent speaks. Regal supports a wide range, from 0.50 (very slow) to 2.00 (very fast). Most teams stay within the moderate range (1.00–1.18) so the agent sounds natural and human.
Total Available Range: 0.50–2.00
Regal Default Setting: Varies depending on agent type.
Slow (<1.00): Deliberate and steady. Slower speech can help improve clarity for older demographics or high-stakes conversations, but may feel sluggish for general audiences.
Slower speech can also lead to increased call duration, and therefore increased cost (at scale).
Here's an AI Agent with a voice speed setting of 0.70:
Moderate (1.00–1.18): Natural and conversational. The sweet spot for most call types. Keeps calls human-sounding and efficient. Fast enough to maintain energy, slow enough to avoid sounding robotic.
Here's an agent with the most common overall voice speed setting in Regal (1.08):
Fast (1.18+): Brisk and efficient. Great for transactional flows where speed is key (and less emotion/patience is required), or for longer conversations that can run up higher costs at scale. Think collections calls, confirmations, or inbound lead capture.
This agent has a very high speed setting of 1.50:
Faster cadence (within reasonable limits) reduces dead air and shortens average handle time, which can increase throughput and lower per-call cost, but runs the risk of sounding rushed or robotic.
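As a rough illustration of the cost trade-off, the sketch below estimates how the agent's speaking time changes with the speed setting. It assumes that speaking time scales inversely with the speed setting, which is a simplification made here for illustration and not a documented Regal behavior. The helper `estimate_agent_talk_seconds` is hypothetical and not part of Regal's product.

```python
def estimate_agent_talk_seconds(baseline_seconds: float, speed: float) -> float:
    """Estimate agent speaking time at a given speed setting.

    Assumes (for illustration only) that speaking time is inversely
    proportional to the speed setting, with 1.00 as the baseline.
    """
    if not 0.50 <= speed <= 2.00:
        raise ValueError("speed must be within the available range 0.50-2.00")
    return baseline_seconds / speed


# Compare the three example settings from this article, starting from
# 120 seconds of agent speech measured at speed 1.00.
for speed in (0.70, 1.08, 1.50):
    seconds = estimate_agent_talk_seconds(120.0, speed)
    print(f"speed {speed:.2f}: ~{seconds:.0f} s of agent speech")
```

Under that assumption, 120 seconds of agent speech at 1.00 becomes roughly 171 seconds at 0.70 and 80 seconds at 1.50. Real calls also include customer speech and pauses, so the effect on total call duration will be smaller.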
Voice temperature controls the variation of voice tone and expressiveness on calls. It's a subtle but powerful setting that affects emotional variability and perceived warmth/empathy.
Warmer tones (higher temperature), for example, help agents sound more human and empathetic, which is valuable in high-trust environments like healthcare, financial services, insurance, etc.
Total Available Range: 0.00–2.00
Regal Default Setting: Varies depending on agent type.
Low (<1.00): Consistent and flat. Best for use cases where precision and predictability matter (IVR flows, compliance language, legal disclaimers). Limits variation in tone, which can make the agent sound more robotic but reduces risk of misinterpretation.
An agent with a temperature setting of 0.50, for example, might sound something like:
Moderate (1.00–1.25): Balanced and professional. The ideal default for most calls. Adds just enough emotional variance to sound natural and expressive without losing clarity or control.
An example of the most commonly used temperature setting in Regal (1.10):
High (1.25+): Expressive and animated. Can be useful for open-ended support or conversational sales calls where emotional range builds trust. But risks sounding exaggerated or inconsistent, especially if your prompt includes strict scripting.
A high setting of 1.80 might sound like this:
Volume is straightforward, but can be important depending on the audience, their device, and their surrounding environment.
Most Regal users (86%) keep this setting at the default (1.00).
Available Range: 0.00–2.00
Default Setting: 1.00
Quieter (<1.00): Useful for calls that benefit from a softer tone, like when you know the caller will be in a quieter/private environment.
Moderate (1.00–1.50): Best for most use cases. Balanced, clear, and easy to understand across a range of devices and backgrounds.
Louder (1.50+): More assertive and bold. Can help in noisy environments, mobile-heavy audiences, or when the caller may be hard of hearing.
Responsiveness controls how quickly the agent starts speaking after the customer finishes. Lower values create longer pauses; higher values make the agent respond almost immediately.
For most users, this setting is on the higher end of the scale so conversations are well-paced and feel natural (especially combined with moderate-high interruption sensitivity).
To control call times, teams much more commonly adjust voice speed than responsiveness.
Available Range: 0.00–1.00
Default Setting: 1.00
Less Responsive (<0.80): Patient and considerate. Ideal for older or slower-speaking audiences (for example, paired with certain accents or languages to match natural style). Helps prevent the AI from speaking over the customer, but may make the agent feel less dynamic to general audiences.
More Responsive (0.80+): Sharp and well-paced. Feels more real-time and efficient, while remaining natural-sounding. Can make the agent feel more “live,” but may occasionally interrupt slow speakers.
Interruption sensitivity determines how easily the agent detects and stops when a customer tries to interrupt it mid-sentence.
Most Regal users (88%) stick with the default setting (0.80).
Available Range: 0.00–1.00
Default Setting: 0.80
Less Sensitive (<0.70): Focused and steady. Better for interactions that are more one-sided (reminders, confirmations, etc.). May ignore minor background noise and quick interjections from the caller.
More Sensitive (0.70+): Alert and reactive. Better for a more natural, conversational feel. Higher sensitivity means the AI will “listen” better on calls, letting callers interject freely.
Very high sensitivity (0.95+) introduces the risk of false pauses triggered by background chatter or other non-communicative sounds made by the caller.
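Taken together, the slider settings above can be thought of as a small configuration with fixed ranges. The sketch below is one way to model and sanity-check such a configuration before testing it. The `VoiceSettings` class and its field names are hypothetical and do not reflect Regal's actual API or builder. The ranges, the stated defaults, and the warning thresholds come from this article. Because the speed and temperature defaults vary by agent type, the most commonly used values (1.08 and 1.10) stand in for them here.

```python
from dataclasses import dataclass

# Available ranges as stated in this article: (minimum, maximum).
RANGES = {
    "speed": (0.50, 2.00),
    "temperature": (0.00, 2.00),
    "volume": (0.00, 2.00),
    "responsiveness": (0.00, 1.00),
    "interruption_sensitivity": (0.00, 1.00),
}


@dataclass
class VoiceSettings:
    # Speed and temperature defaults vary by agent type; these stand-in
    # values are the most commonly used settings cited in the article.
    speed: float = 1.08
    temperature: float = 1.10
    # The remaining defaults are the ones stated in the article.
    volume: float = 1.00
    responsiveness: float = 1.00
    interruption_sensitivity: float = 0.80

    def validate(self) -> list[str]:
        """Raise on out-of-range values; otherwise return advisory warnings."""
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low:.2f}-{high:.2f}")

        warnings = []
        if self.speed > 1.18:
            warnings.append("speed above 1.18 may sound rushed or robotic")
        if self.temperature > 1.25:
            warnings.append("temperature above 1.25 may sound exaggerated")
        if self.interruption_sensitivity >= 0.95:
            warnings.append("interruption sensitivity of 0.95+ risks false pauses")
        return warnings


settings = VoiceSettings(speed=1.50, interruption_sensitivity=0.97)
for warning in settings.validate():
    print("warning:", warning)
```

Separating hard range errors from softer warnings mirrors how the article treats the settings: values outside the available range are invalid, while values in the outer bands are allowed but carry trade-offs worth testing.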
Accent isn’t a slider, but is part of your voice selection. This can impact regional trust and engagement, and make callers feel more comfortable. It’s great for local services.
Most Regal users stick with Neutral American voices across all call types.
Beyond these settings, there are additional structural settings that improve the reliability and perceived personality of your AI Agent.
Voice fallback settings ensure resilience in the rare case of provider outages, so even if your primary TTS provider goes down, the agent continues without interruption. Regal supports automatic fallback routing across providers like ElevenLabs, OpenAI, and Cartesia.
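To make the idea of fallback routing concrete, here is a minimal sketch of the general pattern: try each provider in order and use the first one that succeeds. This is not Regal's implementation. The function `synthesize_with_fallback` and the two stand-in providers are hypothetical, and no real provider SDK calls are used.

```python
from typing import Callable, Sequence

# A synthesizer takes text and returns audio bytes. Real provider clients
# would go here; these are hypothetical stand-ins, not actual SDK calls.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, providers: Sequence[tuple[str, Synthesizer]]
) -> tuple[str, bytes]:
    """Try each provider in order and return the first successful result."""
    errors = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


def primary_down(text: str) -> bytes:
    # Simulates a provider outage.
    raise ConnectionError("provider unavailable")


def backup_ok(text: str) -> bytes:
    # Simulates a healthy provider by returning placeholder audio bytes.
    return text.encode("utf-8")


used, audio = synthesize_with_fallback(
    "Hello, thanks for calling.",
    [("primary", primary_down), ("backup", backup_ok)],
)
print(f"served by: {used}")
```

In this simulated outage the call is served by the backup provider. A production system would also need timeouts and a matching voice on each provider so the switch is not noticeable to the caller.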
Silence handling controls whether the agent fills gaps in conversation or waits patiently, impacting how talkative or reserved the agent feels during calls.
Voicemail detection ensures your agent doesn’t waste time talking to a machine, or delivers the right pre-scripted message once voicemail is reached.
While subtle, these settings can meaningfully influence how the agent is perceived, especially depending on how they are paired with tuning parameters like speed, tone, and responsiveness.
Each voice setting can directly impact how an AI Agent is perceived: how human it sounds, how naturally it interacts, and how well it contains calls and drives specific outcomes.
Certain characteristics predictably influence caller behavior:
Human-likeness improves long-term trust and reduces opt-outs after the initial greeting.
Familiarity matters. Callers are more likely to stay engaged with voices that sound like them demographically or regionally.
Natural tone and pace help calls feel lifelike and keep them moving steadily, increasing comprehension, comfort, and caller agreeability.
Voice settings also work together, not in isolation.
While there are certainly edge cases (e.g., slower voices for older audiences), in general, Regal's production data shows that you want voices to sound natural and approachable. That means a natural tone, a steady pace, and moderate warmth.
When in doubt, use a female voice: 78% of companies using Regal use female voices (across all use cases), which tend to test higher for warmth and approachability.
For Lead Qualification and Collections Agents, that number goes up to 90%.
What voice should I choose for my Voice AI Agent?
In the end, the right combination of speed, temperature, responsiveness, and interruption handling makes all the difference in how your agent is perceived and how well it performs.
With Regal’s production data, smart defaults, and expert support, fine-tuning your agent’s voice to match your audience, use case, and brand tone is well within reach.