Transcription Settings

Transcription settings define how your AI agent listens, processes, and understands user speech during a conversation. These controls significantly impact the agent's responsiveness, fluidity, and how it handles user interruptions.

1. Provider, Model, and Language

This is where you set the foundation for your agent's "ears".

Provider: Choose your Speech-to-Text (STT) provider based on the languages you need to support and the accuracy you require.
- Deepgram: Excellent for general use cases, offering high accuracy, smart formatting, and fast transcription speeds.
- Sarvam: Highly recommended for native support of Indic languages (Hindi, Tamil, Telugu, Bengali, etc.).
- Soniox: The ideal choice if your use case requires robust, out-of-the-box support for multiple languages.
- Other providers include ElevenLabs, Navana, and AssemblyAI.
Model: Select the specific model tier (e.g., nova-3 for Deepgram or saaras for Sarvam) best suited for your use case.
Language: Set the primary spoken language expected from the user (e.g., hi-IN for Hindi).
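
As a rough illustration of how these three choices fit together, the sketch below models them as a small Python object with a sanity check. The class name, field names, and provider keys are hypothetical and do not reflect the product's actual configuration format; only the example values (nova-3, saaras, hi-IN) come from this guide.

```python
from dataclasses import dataclass

# Hypothetical provider keys mapped to the example models named in this guide.
# Real providers offer more models; this table is an assumption for illustration.
EXAMPLE_MODELS = {
    "deepgram": {"nova-3"},
    "sarvam": {"saaras"},
}


@dataclass(frozen=True)
class TranscriptionConfig:
    provider: str  # e.g. "deepgram" or "sarvam"
    model: str     # e.g. "nova-3" or "saaras"
    language: str  # language tag such as "hi-IN"

    def validate(self) -> list[str]:
        """Return a list of human-readable problems (empty if none are found)."""
        problems = []
        known_models = EXAMPLE_MODELS.get(self.provider)
        if known_models is None:
            problems.append(f"unknown provider: {self.provider}")
        elif self.model not in known_models:
            problems.append(f"model {self.model} not listed for {self.provider}")
        # Expect a tag shaped like "hi-IN": two lowercase letters, a hyphen,
        # then two uppercase letters.
        lang, _, region = self.language.partition("-")
        if not (len(lang) == 2 and lang.islower()
                and len(region) == 2 and region.isupper()):
            problems.append(f"unexpected language tag: {self.language}")
        return problems


# A Hindi-speaking agent using the Sarvam example from the text.
hindi_agent = TranscriptionConfig(provider="sarvam", model="saaras", language="hi-IN")
print(hindi_agent.validate())
```

Checking the provider, model, and language together catches mismatches (such as a model paired with the wrong provider) before a call starts.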

2. Turn Settings (When Should the Agent Speak?)

Turn settings determine how the agent decides that the user has finished speaking and it is time to reply.

Mode: Generally set to "Heuristic" for standard conversational flows.
Sensitivity: This controls how long the agent waits after the caller stops speaking before responding.
- Low: Shorter wait times. Feels highly responsive but may cut off slow speakers. Ideal for quick, transactional yes/no responses.
- Medium: The balanced default. Ideal for regular, everyday conversations.
- High: Longer wait times to accommodate thoughtful pauses. Best when users need to explain complex issues (like filing a complaint).
- Custom: Allows you to manually dial in exact silence duration thresholds.
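
The sensitivity levels above can be pictured as silence thresholds: the agent replies once the caller has been quiet for at least the threshold duration. The sketch below is a minimal illustration of that idea, not the product's actual logic. The millisecond values and function names are assumptions chosen only to show the ordering Low < Medium < High.

```python
from typing import Optional

# Hypothetical silence thresholds in milliseconds. The real values are
# product-specific and are not stated in this guide.
SILENCE_THRESHOLD_MS = {
    "low": 300,      # shorter wait: responsive, may cut off slow speakers
    "medium": 700,   # balanced default
    "high": 1500,    # longer wait: tolerates thoughtful pauses
}


def silence_threshold_ms(sensitivity: str, custom_ms: Optional[int] = None) -> int:
    """Resolve a sensitivity level to a silence threshold in milliseconds."""
    key = sensitivity.lower()
    if key == "custom":
        # Custom mode requires the caller to supply an explicit duration.
        if custom_ms is None or custom_ms <= 0:
            raise ValueError("custom sensitivity needs a positive custom_ms")
        return custom_ms
    if key not in SILENCE_THRESHOLD_MS:
        raise ValueError(f"unknown sensitivity: {sensitivity}")
    return SILENCE_THRESHOLD_MS[key]


def should_respond(silence_ms: int, sensitivity: str,
                   custom_ms: Optional[int] = None) -> bool:
    """Return True once the caller has been silent for at least the threshold."""
    return silence_ms >= silence_threshold_ms(sensitivity, custom_ms)


# After a 500 ms pause, only the shorter thresholds trigger a reply.
for level in ("low", "medium", "high"):
    print(level, should_respond(500, level))
```

With these illustrative numbers, a half-second pause ends the turn at Low but not at Medium or High, which is the trade-off between responsiveness and cutting off slower speakers.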

Pronunciation

Speech Settings