
Transcription Settings

Transcription settings define how the agent listens to, processes, and interprets user speech during a conversation. They directly affect how quickly the agent responds, how fluid the conversation feels, and how interruptions are handled. Configure them under Agent Editor > Config > Transcription Settings.

Provider, Model, Language

  • Provider: Choose between supported transcription providers.

  • Model: Select a model best suited for your language or use case.

  • Language: Set the spoken language (e.g., en-IN, hi-IN).

Turn settings

| Setting | Description |
| --- | --- |
| Silence Duration | Minimum silence required to detect the end of speech. |
| Wait in Case of No Punctuation | Extra time to wait if the transcript ends without punctuation. |
| Wait in Case of Number | Extra time to wait if transcript ends with a number. |
| No Transcript Timeout | Maximum time to wait after detecting speech if no transcript is received. |
| Smart Turn Detection | Uses context and classifier model to detect end of speech. Decides a dynamic wait duration. |

How extra wait works:
After the base silence duration ends, the agent checks the transcript. If the transcript ends without punctuation or ends with a number, the agent keeps listening for the corresponding extra wait duration. If a new transcript arrives or speech starts (as defined in the interruption settings) within that wait, listening continues. Otherwise, once the wait ends, the transcript is sent to the LLM.

Interim transcripts are received every 1 second, and the speech duration required to interrupt is usually 1 second. Extra wait values should therefore be greater than 1s to be meaningful. With a shorter value, the extra wait has no practical effect and the agent ends up interrupting the user.
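The extra-wait rule described above can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the names `TurnSettings`, `extra_wait`, and `is_meaningful`, the example values, and the set of characters treated as punctuation are all assumptions. The sketch also assumes that when both conditions apply, the longer of the two waits is used.

```python
from dataclasses import dataclass

# Characters treated as sentence-ending punctuation (an assumption for this sketch).
SENTENCE_END = ".?!।"


@dataclass
class TurnSettings:
    # Hypothetical field names and example values, all in seconds.
    silence_duration: float = 0.5
    wait_no_punctuation: float = 1.5
    wait_number: float = 2.0


def extra_wait(transcript: str, settings: TurnSettings) -> float:
    """Extra seconds to keep listening after the base silence duration."""
    text = transcript.strip()
    if not text:
        return 0.0
    wait = 0.0
    # Transcript ends without punctuation: the sentence may be unfinished.
    if text[-1] not in SENTENCE_END:
        wait = max(wait, settings.wait_no_punctuation)
    # Transcript ends with a number: the user may still be dictating digits.
    last_token = text.split()[-1].rstrip(SENTENCE_END + ",")
    if last_token.isdigit():
        wait = max(wait, settings.wait_number)
    return wait


def is_meaningful(wait_seconds: float) -> bool:
    """Extra waits only help when they exceed the 1s interim transcript interval."""
    return wait_seconds > 1.0
```

With these example values, `extra_wait("I wanted to", TurnSettings())` yields the no-punctuation wait, while a transcript such as "my number is 98" yields the longer number wait.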

Interruption settings

These determine how and when a user can interrupt the agent.

| Setting | Description |
| --- | --- |
| Interruption | Enable/disable user interruptions |
| Number of Words | Minimum word count needed to trigger an interruption |
| Speech Duration | Minimum speech length in seconds required to interrupt |
| Wait Duration | Time agent waits before resuming after being interrupted |
| Block First Message Interruption | Prevents interruption of agent’s first message |

A user interrupt is triggered if either the number of words or the speech duration threshold is met.
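As a rough sketch of the rule above, the check is an OR over the two thresholds, gated by the enable flag and the first-message block. The function name, parameter names, and default values below are hypothetical, and treating "threshold is met" as greater-than-or-equal is an assumption.

```python
def should_interrupt(
    word_count: int,
    speech_seconds: float,
    *,
    enabled: bool = True,
    min_words: int = 2,
    min_speech_seconds: float = 0.8,
    is_first_message: bool = False,
    block_first_message: bool = False,
) -> bool:
    """Return True if the user's speech should interrupt the agent."""
    # Interruptions are switched off entirely.
    if not enabled:
        return False
    # The agent's first message is protected from interruption.
    if is_first_message and block_first_message:
        return False
    # Either threshold being met is enough to trigger the interrupt.
    return word_count >= min_words or speech_seconds >= min_speech_seconds
```

Because the thresholds combine with OR, lowering either one makes the agent easier to interrupt. A single long word can trigger on duration alone, and two quick words can trigger on word count alone.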

Transcript settings

| Setting | Provider | Description |
| --- | --- | --- |
| Boosted Keywords | Deepgram | Helps transcription engine better recognize domain terms |
| Smart Format | Deepgram | Converts phone numbers, dates into structured format |
| Numerals | Deepgram | Transcribes spoken numbers as digits |
| Punctuation | Deepgram | Applies punctuation to utterance endings |
| Finalize Transcript | Deepgram | Forces model to give final transcript |
| Prompt | Sarvam (Saaras) | Provide context prompt for better accuracy |

Ideal settings by use case

1. Short Responses (Yes/No, Acknowledgements)

  • Silence Duration: < 0.25s

  • Why: User is expected to speak in short bursts with minimal pause.

  • Examples:

    • Agent: “Are you available tomorrow?” → User: “Yes”

    • Agent: “Your OTP is 4382. Can you repeat that?” → User: “4382”

  • Effect: Quicker turn-taking, snappier interactions.

2. Long, Detailed Responses (Complaints, Explanations, Address Capture)

  • Silence Duration: 0.8–1.2s (or higher)

  • Why: User might take pauses mid-sentence to think.

  • Examples:

    • Agent: “Can you describe what issue you’re facing?” → User: “Yes… I ordered on the 5th and still haven’t received it.”

    • Agent: “What’s your full address?” → User: “It’s 221B Baker Street… uh… near the station.”

  • Effect: Prevents premature agent response or interruption.

3. Interruptibility

  • Enable interruption when the user may need to stop a long agent message (e.g., in support calls).

  • Start with speech duration = 0.8s and word count = 2, then fine-tune both values based on the interruption patterns in your conversations.
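The recommendations in this section can be captured as a small lookup for checking a chosen value against the guidance. This is a sketch only: the use-case keys, dictionary names, and the `silence_duration_fits` helper are hypothetical, and the ranges simply restate the numbers given above.

```python
# Recommended Silence Duration ranges in seconds, as (low, high).
# None means there is no bound on that side.
SILENCE_RANGES = {
    "short_responses": (None, 0.25),  # "< 0.25s"
    "long_responses": (0.8, None),    # "0.8-1.2s (or higher)"
}

# Suggested starting point for interruption tuning (hypothetical key names).
INTERRUPTION_START = {
    "enabled": True,
    "speech_duration": 0.8,
    "number_of_words": 2,
}


def silence_duration_fits(use_case: str, seconds: float) -> bool:
    """Check a Silence Duration value against the range for a use case."""
    low, high = SILENCE_RANGES[use_case]
    if low is not None and seconds < low:
        return False
    # The upper bound is exclusive, matching "< 0.25s".
    if high is not None and seconds >= high:
        return False
    return True
```

A check like this can be run before saving a configuration, for example to flag a 0.5s silence duration on an agent that collects addresses.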

Providers

Deepgram (nova-2, nova-3)

  • Supports boosting, smart formatting, and numerals.

  • Use boosted keywords (e.g., "refund", "warranty") to improve domain accuracy.

  • Does not support transcript prompt context.

Sarvam (Saaras, Sarika)

  • Native support for Indic languages such as Hindi, Tamil, Telugu, and Bengali.

  • Transcript prompt available to improve recognition accuracy.

  • Ideal for multilingual and regional use cases.
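Since several transcript settings apply to only one provider, it can help to check a configuration against the provider before applying it. The sketch below encodes the provider column of the Transcript settings table. The identifiers (`PROVIDER_SETTINGS`, `unsupported_settings`, and the lowercase setting keys) are hypothetical and are not part of any documented API.

```python
from typing import Iterable, List

# Provider-specific transcript settings, taken from the Transcript settings table.
# All keys are hypothetical identifiers chosen for this sketch.
PROVIDER_SETTINGS = {
    "deepgram": {
        "boosted_keywords",
        "smart_format",
        "numerals",
        "punctuation",
        "finalize_transcript",
    },
    # The table lists Prompt for Sarvam (Saaras).
    "sarvam": {"prompt"},
}


def unsupported_settings(provider: str, requested: Iterable[str]) -> List[str]:
    """Return the requested settings that the provider does not list, sorted."""
    supported = PROVIDER_SETTINGS.get(provider, set())
    return sorted(set(requested) - supported)
```

For example, requesting `prompt` together with `numerals` for `deepgram` would report `prompt` as unsupported, which matches the note that Deepgram does not support transcript prompt context.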