Transcription Settings
Transcription settings define how the agent listens, processes, and interprets user speech during a conversation. These controls significantly impact responsiveness, fluidity, and interruptions. You can configure these under Agent Editor > Config > Transcription Settings.
Provider, Model, Language
- Provider: Choose between supported transcription providers.
- Model: Select a model best suited for your language or use case.
- Language: Set the spoken language (e.g., en-IN, hi-IN).

Turn settings
| Setting | Description |
|---|---|
| Silence Duration | Minimum silence required to detect the end of speech. |
| Wait in Case of No Punctuation | Extra time to wait if the transcript ends without punctuation. |
| Wait in Case of Number | Extra time to wait if the transcript ends with a number. |
| No Transcript Timeout | Maximum time to wait after detecting speech if no transcript is received. |
| Smart Turn Detection | Uses context and a classifier model to detect the end of speech, and decides a dynamic wait duration. |

How extra wait works:
After the base silence duration ends, the agent checks the transcript. If the transcript ends without punctuation, or ends with a number, the agent keeps listening for the corresponding extra wait duration. If a new transcript arrives or speech starts (as defined in the interruption settings) within that wait, listening continues. Otherwise, once the wait ends, the transcript is sent to the LLM.
Interim transcripts are received every 1 second, and the speech duration required to interrupt is usually 1 second. Extra wait values should therefore be greater than 1s to be meaningful. With a shorter value, the extra wait has no effect and the agent ends up interrupting the user.
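The extra-wait rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the function name, parameter names, and punctuation set are hypothetical, and taking the larger of the two waits when both conditions hold is an assumption the documentation does not confirm.

```python
import re

# Assumed set of end-of-utterance punctuation marks (hypothetical).
SENTENCE_END = (".", "?", "!")


def extra_wait_seconds(transcript: str, wait_no_punctuation: float, wait_number: float) -> float:
    """Return how long to keep listening after the base silence duration ends."""
    text = transcript.strip()
    if not text:
        return 0.0
    waits = [0.0]
    # "Wait in Case of Number": the last token is made of digits.
    # Assumption: trailing punctuation is ignored, and spelled-out numbers are not detected.
    last_token = text.split()[-1].rstrip(".,?!")
    if re.fullmatch(r"\d+", last_token):
        waits.append(wait_number)
    # "Wait in Case of No Punctuation": the transcript has no terminal punctuation.
    if not text.endswith(SENTENCE_END):
        waits.append(wait_no_punctuation)
    # Assumption: when both conditions apply, use the longer wait.
    return max(waits)
```

For example, `extra_wait_seconds("I ordered it yesterday", 1.5, 2.0)` yields the no-punctuation wait, while a transcript ending in "." that does not end with a number yields no extra wait. Both sample values are above 1s, in line with the guidance above.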
Interruption settings
These determine how and when a user can interrupt the agent.
| Setting | Description |
|---|---|
| Interruption | Enable/disable user interruptions |
| Number of Words | Minimum word count needed to trigger an interruption |
| Speech Duration | Minimum speech length in seconds required to interrupt |
| Wait Duration | Time agent waits before resuming after being interrupted |
| Block First Message Interruption | Prevents interruption of agent’s first message |

A user interrupt is triggered if either the number of words or the speech duration threshold is met.
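The either-or trigger can be expressed as a small predicate. The sketch below is illustrative only: the class, field names, and default values are hypothetical stand-ins for the settings in the table, and treating "met" as greater-than-or-equal is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InterruptionSettings:
    # Field names are illustrative, not the platform's actual configuration keys.
    enabled: bool = True
    min_words: int = 2
    min_speech_seconds: float = 0.8
    block_first_message: bool = False


def should_interrupt(settings: InterruptionSettings, word_count: int,
                     speech_seconds: float, is_first_message: bool = False) -> bool:
    """Return True when the user's speech should interrupt the agent."""
    if not settings.enabled:
        return False
    # "Block First Message Interruption" protects the agent's opening message.
    if settings.block_first_message and is_first_message:
        return False
    # Either threshold being met is enough to trigger an interruption.
    return (word_count >= settings.min_words
            or speech_seconds >= settings.min_speech_seconds)
```

With these sample defaults, two words spoken in 0.1 seconds interrupt the agent, and so does a single word held for 0.9 seconds. One word lasting 0.3 seconds does not.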
Transcript settings
| Setting | Provider | Description |
|---|---|---|
| Boosted Keywords | Deepgram | Helps transcription engine better recognize domain terms |
| Smart Format | Deepgram | Converts phone numbers, dates into structured format |
| Numerals | Deepgram | Transcribes spoken numbers as digits |
| Punctuation | Deepgram | Applies punctuation to utterance endings |
| Finalize Transcript | Deepgram | Forces model to give final transcript |
| Prompt | Sarvam (Saaras) | Provide context prompt for better accuracy |

Ideal settings by use case
1. Short Responses (Yes/No, Acknowledgements)
- Silence Duration: < 0.25s
- Why: User is expected to speak in short bursts with minimal pause.
- Examples:
  - Agent: “Are you available tomorrow?” → User: “Yes”
  - Agent: “Your OTP is 4382. Can you repeat that?” → User: “4382”
- Effect: Quicker turn-taking, snappier interactions.
2. Long, Detailed Responses (Complaints, Explanations, Address Capture)
- Silence Duration: 0.8–1.2s (or higher)
- Why: User might take pauses mid-sentence to think.
- Examples:
  - Agent: “Can you describe what issue you’re facing?” → User: “Yes… I ordered on the 5th and still haven’t received it.”
  - Agent: “What’s your full address?” → User: “It’s 221B Baker Street… uh… near the station.”
- Effect: Prevents premature agent response or interruption.
3. Interruptibility
- Enable interruption when the user may need to stop a long agent message (e.g., in support calls).
- Start with speech duration = 0.8s and word count = 2, then fine-tune the values based on the interruption patterns you see in your conversations.
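The recommendations above can be collected into reusable starting points. This is a sketch only: the preset names and dictionary keys are hypothetical rather than the platform's configuration schema, and the values are sample picks from the ranges suggested above.

```python
# Hypothetical presets; keys are illustrative and values follow the guidance above.
PRESETS = {
    # Short responses: silence duration below 0.25s.
    "short_responses": {"silence_duration_s": 0.2},
    # Long, detailed responses: silence duration in the 0.8-1.2s range.
    "long_responses": {"silence_duration_s": 1.0},
    # Suggested starting point for interruptible agents.
    "interruptible": {
        "interruption": True,
        "speech_duration_s": 0.8,
        "number_of_words": 2,
    },
}


def build_settings(*preset_names: str) -> dict:
    """Merge the named presets, with later presets overriding earlier ones."""
    merged = {}
    for name in preset_names:
        merged.update(PRESETS[name])
    return merged
```

A support agent that collects addresses might start from `build_settings("long_responses", "interruptible")` and adjust the values after reviewing real calls.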
Providers
Deepgram (nova-2, nova-3)
- Supports boosting, smart formatting, and numerals.
- Use boosted keywords (e.g., "refund", "warranty") to improve domain accuracy.
- Does not support transcript prompt context.
Sarvam (Saaras, Saarika)
- Native support for Indic languages like Hindi, Tamil, Telugu, Bengali, etc.
- Transcript prompt available to improve recognition accuracy.
- Ideal for multilingual and regional use cases.
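Because transcript settings are provider-specific, it can help to check a configuration before applying it. The sketch below encodes the provider column of the transcript settings table. The setting keys, the mapping name, and the function are hypothetical and do not come from the platform's API.

```python
# Which transcript settings each provider accepts, per the transcript settings table.
# Keys are illustrative; the table lists Prompt for Sarvam's Saaras model specifically.
SUPPORTED_TRANSCRIPT_SETTINGS = {
    "deepgram": {
        "boosted_keywords",
        "smart_format",
        "numerals",
        "punctuation",
        "finalize_transcript",
    },
    "sarvam": {"prompt"},
}


def unsupported_settings(provider: str, settings: dict) -> list:
    """Return the setting names the given provider does not support, sorted."""
    supported = SUPPORTED_TRANSCRIPT_SETTINGS[provider.lower()]
    return sorted(key for key in settings if key not in supported)
```

For instance, passing a `prompt` together with `boosted_keywords` for Deepgram reports `prompt` as unsupported, which matches the note that Deepgram does not accept transcript prompt context.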