
September 2023 Releases
AI agents can hold natural, dynamic conversations with customers at scale, across dozens of different use cases.
And while that flexibility is what makes them so impactful, it also introduces risk and complexity to the development process. AI agent conversations can branch off in dozens of different directions—some predictable, some not at all.
When it comes to testing, manual QA can no longer keep up with the complexity, non-determinism, and scale of enterprise AI agent deployments. It’s too time-consuming, and it leaves too many blind spots in scenario coverage.
You can’t role-play every scenario. With manual testing alone, you have to start from scratch every time you update the agent prompt. And because conversations are non-deterministic, a live test that goes well the first time doesn’t guarantee it’ll go well the next time through.
So it’s either spend weeks manually testing as many cases as possible, or find a happy path quickly (say, 70-80% scenario coverage), launch, and let real customers be the ones to discover conversational flaws.
AI Simulations are Regal’s answer to that tradeoff.
Simulations let you auto-generate and simultaneously run dozens of scenario-aware end-to-end conversations, simulating hours of conversations in seconds. The AI keeps regenerating test cases as you update your agent prompt, so every future iteration can be validated instantly without rebuilding from scratch.
All in all, your iteration cycles shrink, you cover more edge cases, and you harden your agents against risk before a single live call is made.
With Simulation Testing, Regal uses an LLM to simulate the customer side of a call, interacting with your AI agent end-to-end.
How does it work? Mini-prompts define the specific scenario being tested, and the personality and traits of the contact on the call.
Regal uses AI to auto-generate test cases in bulk, but you can also manually configure and delete tests as desired.
After you run tests, you’re left with the simulated transcript for proof.
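To make the mechanism concrete, here is a minimal sketch of how a simulated call could work: one LLM plays the customer (driven by a contact prompt), the other is the agent under test, and the loop records the transcript. This is an illustration, not Regal’s implementation; `call_llm` is a stand-in for a real chat-completion call, and all names are hypothetical.

```python
# Hypothetical sketch: one model plays the customer, one is the agent under
# test. call_llm is a stand-in for a real LLM API call.

def call_llm(system_prompt: str, transcript: list[tuple[str, str]]) -> str:
    # Stand-in: a real implementation would send the prompt and transcript
    # to an LLM and return its reply.
    if "customer" in system_prompt.lower():
        return "I'm interested, but what does it cost?"
    return "Pricing starts at $85/month. Want me to walk you through the tiers?"

def simulate_call(agent_prompt: str, contact_prompt: str,
                  max_turns: int = 4) -> list[tuple[str, str]]:
    """Alternate agent and simulated-customer turns; return the transcript."""
    transcript: list[tuple[str, str]] = []
    for _ in range(max_turns):
        transcript.append(("agent", call_llm(agent_prompt, transcript)))
        transcript.append(("customer", call_llm(contact_prompt, transcript)))
    return transcript

transcript = simulate_call(
    agent_prompt="You are a helpful insurance sales agent.",
    contact_prompt="You are a skeptical customer asking about pricing.",
)
```

The transcript produced by a run like this is exactly the artifact you review after each simulation.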
So, overall, with Simulations you can quickly pinpoint where an agent mishandled an objection or question, potentially misfired on retrieval, missed a required action, or disregarded a step in the conversation.
And, by simulating these test cases before an agent is live, no real customers are exposed to faulty interactions.
LLMs can’t generate meaningful test cases on their own. They can analyze language, but they don’t know your objections, your workflows, or your branded style.
With Regal, you can auto-generate test cases with AI.
Regal’s AI-generated test cases are built using the contents of your existing agent prompt—so they’re completely rooted in your voice settings, your objection handling, your guardrails, your branded style, and your knowledge bases.
In the Test Cases page, simply select “Get Me Started,” and the AI will generate 10 test cases spanning your use cases in seconds:
Once generated, you can run these test cases individually, or dozens at a time.
Test cases can easily be regenerated and rerun as your agents evolve, as well. So coverage always stays current, and the cases always stay relevant to the most up-to-date prompt.
While building your AI agent, you’ll use Test Logic and Test Audio. These interfaces let you have a conversation with your AI Agent in-platform (either via live chat or audio) to test basic prompt logic, common objection handling, pronunciation, and how the voice sounds, quickly rewinding and iterating as needed.
Once you’ve confirmed baseline conversation paths with Test Logic and Test Audio, you’ll move to Simulations for end-to-end scenario testing—running cases in bulk, and re-running test cases when needed to uncover reliability issues in LLM responses or prompt design.
Each test case contains two prompts.
First, a Contact Prompt, which defines the specific scenario being tested along with the personality and traits of the simulated contact.
Then, the Success Criteria—a plain language description of what the correct outcome looks like (e.g. “Agent addresses concerns, explains pricing clearly, and attempts to schedule an appointment”).
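Putting those two pieces together, a test case might be represented something like the sketch below. The field names and schema here are illustrative assumptions, not Regal’s actual data model.

```python
# Hypothetical sketch of a test case: a contact prompt (scenario + persona)
# plus plain-language success criteria. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    scenario: str          # what the simulated contact is trying to do
    persona: str           # personality and traits of the contact
    success_criteria: str  # plain-language description of a correct outcome

    @property
    def contact_prompt(self) -> str:
        return (f"You are a customer. Persona: {self.persona}. "
                f"Scenario: {self.scenario}.")

pricing_case = TestCase(
    name="Customer Requests More Information on Pricing",
    scenario="Ask how much the policy costs before committing",
    persona="Polite but budget-conscious",
    success_criteria=(
        "Agent addresses concerns, explains pricing clearly, "
        "and attempts to schedule an appointment"
    ),
)
```

Keeping the success criteria in plain language means an LLM judge (or a human reviewer) can score any transcript against it.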
The power of Simulation Testing isn’t just in running one-off conversations; it’s in refining them.
Every test transcript helps surface where an AI Agent succeeds and where it falls short. From there, you can quickly adjust prompts, knowledge bases, or guardrails, regenerate your test cases, and re-run them until the agent consistently meets expectations.
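Because conversations are non-deterministic, that refinement loop is best run as repeated regression passes rather than single runs. Here is a minimal sketch of the idea, assuming stand-in functions: `run_simulation` and `judge` stand in for real LLM-backed calls, and the 0.8 pass-rate bar is an arbitrary example, not a Regal default.

```python
# Hypothetical regression loop: run each test case several times, judge each
# transcript against its success criteria, and require a minimum pass rate.
# run_simulation and judge are stand-ins for real LLM-backed calls.

def run_simulation(contact_prompt: str) -> str:
    # Stand-in: would return the transcript of a full simulated call.
    return "Agent explained pricing tiers and offered to book an appointment."

def judge(transcript: str, success_criteria: str) -> bool:
    # Stand-in for an LLM judge comparing the transcript to the criteria.
    return "pricing" in transcript.lower()

def pass_rate(contact_prompt: str, success_criteria: str,
              runs: int = 5) -> float:
    passed = sum(
        judge(run_simulation(contact_prompt), success_criteria)
        for _ in range(runs)
    )
    return passed / runs

rate = pass_rate(
    contact_prompt="You are a customer asking about pricing.",
    success_criteria="Agent explains pricing clearly and tries to book",
)
meets_bar = rate >= 0.8  # e.g. require 4 of 5 runs to pass
```

Gating on a pass rate across repeated runs is what turns one-off spot checks into the “consistently meets expectations” standard described above.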
That refinement loop gives enterprises full coverage across the core dimensions of AI Agent performance, and that coverage extends into more nuanced observability of each conversation as well.
Testing is a critical part of the agent development process because it greatly impacts both the speed of deployment, and the performance of the AI.
Enterprises win by iterating faster than the competition. Simulations are an accelerator.
Simulations pinpoint exactly where an agent fails, so you can make immediate, targeted fixes that dramatically lower risk, without sitting through hours of role-play calls.
In enterprise contact centers, a missed “branch” can mean lost revenue or broken customer trust.
AI agents are handling more complex, dynamic conversations than ever (lead qualification, scheduling, inbound support). These conversations don’t typically fail on the “happy path.”
They fail in the edge cases.
Simulations provide the safety net that manual QA can’t. Instead of roleplaying calls one by one, you validate entire conversation flows end-to-end, across hundreds of branches, edge cases, and personas—all in bulk.
This means scalable coverage not only across different conversation types, but also across every potential path within each one.
Simulation Testing comes to life when you see how it exposes breakdowns in real conversation flows:
The Test Case: Customer Requests More Information on Pricing
Result: Simulation shows the AI agent responds vaguely: “Pricing depends on your plan.” The agent doesn’t explain the structure or try to move the conversation forward. You want your agent to be more informative and proactive.
Action Taken: You update the prompt to instruct the agent to refer to your “Policy Pricing” knowledge base and explain pricing tiers clearly, and then offer to explain coverage tiers for more context on pricing.
Re-run Result: The updated simulation shows the agent now responds with: “Pricing can be as low as $85 per month, but varies depending on your personal details, vehicle, driving habits and history, and the plan you’re approved for. Would you like me to walk you through our coverage options?” From here, the agent dives into specific policy rates.
The Test Case: Callback Scheduling Request
Action Taken: You refine the task prompt to more explicitly confirm with the contact what time would work best for them, and then add in custom actions to gather_date/time and then end the call once the time is confirmed.
Re-run Result: The updated simulation shows the agent now responds with: “I totally understand! Is there a day and time that works best to call back?”
Upon the contact responding with a day and time, the AI confirms the time with custom action “gather_date/time” and ends the call.
The Test Case: Unsupported Service Handling
Action Taken: You update the guardrail and prompt to ensure unsupported services are addressed clearly and offer a transfer path when necessary.
Re-run Result: The updated simulation shows the agent now responds with: “I want to be clear that our clinic doesn’t provide physical therapy services. Would you like me to transfer you to a representative who can recommend trusted providers in your area?”
In enterprise environments, scaling up isn’t just about taking more calls. It’s about consistently driving results across every possible branch.
Testing with AI Simulations compresses iteration cycles while reducing risk. With automated test case generation, simulated conversations tailored to your prompt and customer base, and comprehensive visibility into the agent’s performance on the call, you can move faster without exposing customers to broken flows, compliance gaps, or off-brand dialogue.
You can ship updates quickly, prove correctness at scale, and keep agents aligned with your brand—all while running continuous regression testing that safeguards every critical flow before a single customer hears a word.
If you’re looking to generate AI test coverage at scale, talk with one of our experts today to see it live.
Ready to see Regal in action?
Book a personalized demo.