Transcription Settings

Transcription settings define how your AI agent listens, processes, and understands user speech during a conversation. These controls directly affect the agent's responsiveness, its conversational fluidity, and the way it handles user interruptions.

1. Provider, Model, and Language

This is where you set the foundation for your agent's "ears".

  • Provider: Choose your Speech-to-Text (STT) provider based on your specific language support and accuracy requirements.

    • Deepgram: Excellent for general use cases, offering high accuracy, smart formatting, and fast transcription speeds.

    • Sarvam: Highly recommended for native support of Indic languages (Hindi, Tamil, Telugu, Bengali, etc.).

    • Soniox: The ideal choice if your use case requires robust, out-of-the-box support for multiple languages.

    • Other providers include ElevenLabs, Navana, and AssemblyAI.

  • Model: Select the specific model tier (e.g., nova-3 for Deepgram or saaras for Sarvam) best suited for your use case.

  • Language: Set the primary spoken language expected from the user (e.g., hi-IN for Hindi).
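
As a rough illustration of how the three fields above fit together, the following Python sketch models them as a small config object with a sanity check. The class name, the KNOWN_MODELS table, and the validation rules are assumptions made for this example, not the platform's actual schema or API. Only the model names (nova-3, saaras) and the hi-IN language code come from the text above.

```python
from dataclasses import dataclass

# Hypothetical provider/model pairing table. Only the model names come from
# the examples in this section; the real catalogue may differ.
KNOWN_MODELS = {
    "deepgram": {"nova-3"},
    "sarvam": {"saaras"},
}


@dataclass(frozen=True)
class TranscriptionConfig:
    """Illustrative container for the Provider, Model, and Language fields."""
    provider: str
    model: str
    language: str

    def validate(self):
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        models = KNOWN_MODELS.get(self.provider)
        if models is None:
            problems.append(f"unknown provider: {self.provider}")
        elif self.model not in models:
            problems.append(
                f"model {self.model!r} is not listed for {self.provider}"
            )
        # Assumed language format: two lowercase letters, a hyphen, and two
        # uppercase letters, as in hi-IN.
        parts = self.language.split("-")
        well_formed = (
            len(parts) == 2
            and len(parts[0]) == 2 and parts[0].isalpha() and parts[0].islower()
            and len(parts[1]) == 2 and parts[1].isalpha() and parts[1].isupper()
        )
        if not well_formed:
            problems.append(f"language {self.language!r} does not look like hi-IN")
        return problems
```

For example, TranscriptionConfig("sarvam", "saaras", "hi-IN").validate() reports no problems, while pairing "deepgram" with "saaras" is flagged because that model is not listed for that provider in the hypothetical table.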

2. Turn Settings (When Should the Agent Speak?)

Turn settings determine how the agent decides that the user has finished speaking and it is time to reply.

  • Mode: Generally set to "Heuristic" for standard conversational flows.

  • Sensitivity: This controls how long the agent waits after the caller stops speaking before responding.

    • Low: Shorter wait times. Feels highly responsive but may cut off slow speakers. Ideal for quick, transactional yes/no responses.

    • Medium: The balanced default. Ideal for regular, everyday conversations.

    • High: Longer wait times to accommodate thoughtful pauses. Best when users need to explain complex issues (like filing a complaint).

    • Custom: Allows you to manually dial in exact silence duration thresholds.
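
To make the effect of Sensitivity concrete, here is a minimal Python sketch of silence-based end-of-turn detection. It is not the platform's actual heuristic. The millisecond values in SILENCE_MS, the function names, and the frame-based input are all assumptions chosen for illustration. The only behavior taken from the text above is that Low waits the shortest time, High the longest, and Custom accepts a hand-picked duration.

```python
# Hypothetical silence thresholds in milliseconds. The real presets are not
# documented in this section; these numbers are placeholders.
SILENCE_MS = {"low": 400, "medium": 800, "high": 1500}


def silence_threshold_ms(sensitivity, custom_ms=None):
    """Map a Sensitivity setting to a silence duration in milliseconds."""
    if sensitivity == "custom":
        if custom_ms is None or custom_ms <= 0:
            raise ValueError("custom sensitivity needs a positive custom_ms")
        return custom_ms
    return SILENCE_MS[sensitivity]


def end_of_turn_index(frames, frame_ms, threshold_ms):
    """Return the index of the frame at which the agent would start replying.

    frames is a sequence of booleans, True where speech was detected.
    Returns None if the silence threshold is never reached.
    """
    silent_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= threshold_ms:
                return i
    return None


# A caller speaks, pauses for 500 ms, adds a few more words, then stops.
frames = [True] * 3 + [False] * 5 + [True] * 2 + [False] * 20
low_reply = end_of_turn_index(frames, 100, silence_threshold_ms("low"))
high_reply = end_of_turn_index(frames, 100, silence_threshold_ms("high"))
```

With these placeholder thresholds, the Low setting triggers a reply during the caller's 500 ms pause (before the caller has finished), while the High setting waits until the longer silence at the end. That is the trade-off between responsiveness and cutting off slow speakers described above.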