Transcription Settings

Transcription settings define how the agent listens to, processes, and interprets user speech during a conversation. These controls directly affect how responsive the agent feels, how smoothly turns flow, and how often the agent and the user interrupt each other. You can configure them under Agent Editor > Config > Transcription Settings.

Provider, Model, Language

Provider: Choose one of the supported transcription providers (Deepgram or Sarvam).
Model: Select the model best suited to your language or use case.
Language: Set the spoken language as a locale code (e.g., en-IN, hi-IN).

Turn settings

Setting	Description
Silence Duration	Minimum silence (in seconds) required to detect the end of speech.
Wait in Case of No Punctuation	Extra time to wait if the transcript ends without punctuation.
Wait in Case of Number	Extra time to wait if the transcript ends with a number.
No Transcript Timeout	Maximum time to wait for a transcript after speech is detected.
Smart Turn Detection	Uses conversation context and a classifier model to detect the end of speech and choose a dynamic wait duration.

How extra wait works:
After the base silence duration ends, the agent checks the transcript. If the transcript ends without punctuation, or ends with a number, the agent keeps listening for the corresponding extra wait duration. If new transcript text arrives, or new speech starts (as defined in the interruption settings), within that wait, listening continues. Otherwise, once the wait ends, the transcript is sent to the LLM.
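The extra-wait decision can be sketched in Python. This is a minimal sketch, not the platform's implementation: the helper names `extra_wait_s` and `end_of_turn` are hypothetical, only `.`, `?` and `!` are treated as ending punctuation, and a trailing number is assumed to take precedence over the no-punctuation rule (a transcript ending in a digit also ends without punctuation).

```python
import re


def extra_wait_s(transcript: str, no_punct_wait_s: float, number_wait_s: float) -> float:
    """Extra listening time after the base silence duration (hypothetical helper)."""
    text = transcript.rstrip()
    # Assumption: the number rule is checked first, since "4382" also lacks punctuation.
    if re.search(r"\d$", text):
        return number_wait_s
    # Assumption: only sentence-ending marks count as punctuation.
    if not text.endswith((".", "?", "!")):
        return no_punct_wait_s
    return 0.0


def end_of_turn(transcript: str, new_speech_during_wait: bool,
                no_punct_wait_s: float, number_wait_s: float) -> bool:
    """Return True if the transcript should be sent to the LLM now."""
    wait = extra_wait_s(transcript, no_punct_wait_s, number_wait_s)
    if wait == 0.0:
        return True  # Ends with punctuation: no extra wait is applied.
    # Any new transcript or speech during the extra wait keeps the turn open.
    return not new_speech_during_wait


print(extra_wait_s("My OTP is 4382", 1.5, 2.0))      # number wait applies
print(end_of_turn("I ordered on the", True, 1.5, 2.0))  # user kept talking
```

Here "My OTP is 4382" gets the number wait (2.0s), and "I ordered on the" stays open because the user resumed speaking within the no-punctuation wait.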

Interim transcripts arrive roughly once per second, and the speech duration required to register an interruption is usually about 1 second. Extra wait values should therefore be greater than 1s to be meaningful. A shorter wait can expire before the next interim transcript or speech signal arrives, so the agent responds anyway and ends up interrupting the user.
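The timing argument above can be checked with a small worst-case model. This is an illustrative sketch under assumptions: times are measured from the moment the base silence duration ends, the 1s interim interval and 1s speech duration are the typical values quoted above, and the function name `continuation_caught` is hypothetical.

```python
def continuation_caught(extra_wait_s: float, resume_after_s: float = 0.0,
                        interim_interval_s: float = 1.0,
                        speech_duration_s: float = 1.0) -> bool:
    """Worst case: does any sign of resumed speech arrive before the extra wait expires?"""
    # The agent learns the user resumed either from the next interim transcript
    # (up to one full interval later) or once speech lasts speech_duration_s.
    # Whichever signal comes first counts.
    signal_lag_s = min(interim_interval_s, speech_duration_s)
    # The signal must land strictly before the wait window closes.
    return resume_after_s + signal_lag_s < extra_wait_s


for wait in (0.5, 1.0, 1.5):
    print(wait, continuation_caught(wait))
```

With both signals taking about 1s, waits of 0.5s and 1.0s close before the agent can notice the user resumed; 1.5s leaves room for the signal to arrive.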

Interruption settings

These determine how and when a user can interrupt the agent.

Setting	Description
Interruption	Enables or disables user interruptions
Number of Words	Minimum word count required to trigger an interruption
Speech Duration	Minimum speech length (in seconds) required to trigger an interruption
Wait Duration	Time the agent waits before resuming after being interrupted
Block First Message Interruption	Prevents the user from interrupting the agent's first message

An interruption is triggered as soon as either threshold is met: the number of words or the speech duration. Both do not need to be met.
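The either-threshold rule, together with the Interruption and Block First Message Interruption toggles from the table, can be sketched as follows. This is a hedged illustration: `InterruptionSettings` and `should_interrupt` are hypothetical names, the defaults reuse the starting values recommended under "Interruptibility" below, and treating "minimum" as inclusive (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class InterruptionSettings:
    # Hypothetical field names mirroring the settings table.
    enabled: bool = True
    min_words: int = 2
    min_speech_s: float = 0.8
    block_first_message: bool = False


def should_interrupt(settings: InterruptionSettings, word_count: int,
                     speech_s: float, during_first_message: bool) -> bool:
    """Return True if the user's speech should interrupt the agent."""
    if not settings.enabled:
        return False
    if settings.block_first_message and during_first_message:
        return False
    # Either threshold is enough (OR, not AND). Inclusive comparison is an assumption.
    return word_count >= settings.min_words or speech_s >= settings.min_speech_s
```

For example, a single word spoken for 0.9s interrupts under these defaults because the speech-duration threshold alone is met.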

Transcript settings

Setting	Provider	Description
Boosted Keywords	Deepgram	Helps the transcription engine recognize domain-specific terms
Smart Format	Deepgram	Formats entities such as phone numbers and dates into a readable, structured form
Numerals	Deepgram	Transcribes spoken numbers as digits
Punctuation	Deepgram	Adds punctuation to the transcript, including at the end of utterances
Finalize Transcript	Deepgram	Forces the model to return a final transcript
Prompt	Sarvam (Saaras)	Provides a context prompt to improve accuracy
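For reference, the Deepgram toggles correspond to query parameters on Deepgram's streaming endpoint (`smart_format`, `numerals`, `punctuate`, `keywords`). The sketch below assumes the platform maps its toggles onto these parameters, which this page does not confirm; `deepgram_stream_url` is a hypothetical helper. Finalize Transcript is a control message sent over the open connection rather than a query parameter, so it is not modeled here. Deepgram documents keyword boosting for nova-2; check Deepgram's current docs for the nova-3 equivalent.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_url(model: str, language: str, smart_format: bool,
                        numerals: bool, punctuate: bool,
                        boosted_keywords: dict[str, int]) -> str:
    """Build a Deepgram streaming URL from transcript toggles (hypothetical mapping)."""
    params = [
        ("model", model),
        ("language", language),
        ("smart_format", str(smart_format).lower()),
        ("numerals", str(numerals).lower()),
        ("punctuate", str(punctuate).lower()),
    ]
    # Keyword boosting: one "keywords=TERM:INTENSIFIER" parameter per term.
    params += [("keywords", f"{term}:{boost}") for term, boost in boosted_keywords.items()]
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


print(deepgram_stream_url("nova-2", "en-IN", True, True, True, {"OTP": 1}))
```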

Ideal settings by use case

1. Short Responses (Yes/No, Acknowledgements)

Silence Duration: < 0.25s
Why: The user is expected to reply in a short burst with minimal pauses.
Examples:
- Agent: “Are you available tomorrow?” → User: “Yes”
- Agent: “Your OTP is 4382. Can you repeat that?” → User: “4382”
Effect: Quicker turn-taking, snappier interactions.

2. Long, Detailed Responses (Complaints, Explanations, Address Capture)

Silence Duration: 0.8–1.2s (or higher)
Why: The user may pause mid-sentence to think.
Examples:
- Agent: “Can you describe what issue you’re facing?” → User: “Yes… I ordered on the 5th and still haven’t received it.”
- Agent: “What’s your full address?” → User: “It’s 221B Baker Street… uh… near the station.”
Effect: Prevents the agent from responding prematurely or cutting the user off.

3. Interruptibility

Enable interruptions when the user may need to cut off a long agent message (e.g., in support calls).
Start with speech duration = 0.8s and word count = 2, then fine-tune these values based on the interruption patterns you observe in your conversations.
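The use-case guidance above can be captured as composable presets. This is an illustrative sketch: the field names are hypothetical rather than the Agent Editor's actual config keys, 0.2s and 1.0s are picked from within the recommended ranges, and `None` is assumed to mean "leave the current value unchanged".

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AgentPreset:
    # Hypothetical field names; None means "keep the existing value".
    silence_duration_s: Optional[float] = None
    interruption: Optional[bool] = None
    interrupt_min_words: Optional[int] = None
    interrupt_speech_s: Optional[float] = None


SHORT_RESPONSES = AgentPreset(silence_duration_s=0.2)  # guidance: below 0.25s
LONG_RESPONSES = AgentPreset(silence_duration_s=1.0)   # guidance: 0.8 to 1.2s or higher
INTERRUPTIBLE = AgentPreset(interruption=True, interrupt_min_words=2,
                            interrupt_speech_s=0.8)    # suggested starting values


def combine(base: AgentPreset, overlay: AgentPreset) -> AgentPreset:
    """Apply every field the overlay sets explicitly on top of the base preset."""
    changes = {name: value for name, value in vars(overlay).items() if value is not None}
    return replace(base, **changes)


# A support call: long, detailed answers, and the user may need to cut the agent off.
SUPPORT_CALL = combine(LONG_RESPONSES, INTERRUPTIBLE)
print(SUPPORT_CALL)
```

Keeping turn and interruption presets separate lets you mix them per use case instead of maintaining one hand-tuned profile per agent.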

Providers

Deepgram: nova-2, nova-3
Sarvam: Saaras, Saarika