Speech Settings
The Speech Settings section lets you configure how your agent speaks during conversations.
Providers
You can choose from the following text-to-speech (TTS) providers:
- ElevenLabs – Ultra-realistic voices optimized for expressive delivery
- Smallest – Lightweight and fast TTS engine
- Cartesia – Balanced between speed and realism
- Google – Strong multi-language support
- Azure – Microsoft’s robust voice platform
- Polly – Amazon’s neural TTS with a wide range of voice options
Configuration
Once a provider is selected, you can configure:
- Model (e.g., flash v2.5)
- Language (e.g., hi for Hindi)
- Gender (e.g., female)
- Voice (select from available voices provided by the provider)
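
As a rough illustration of how these choices fit together, the sketch below models one agent's speech configuration as a small Python record. The class name, field names, lowercase provider identifiers, and the voice value are all hypothetical and chosen for this example; they are not the platform's actual API or schema.

```python
from dataclasses import dataclass

# Providers named in the list above. The lowercase identifiers are an
# assumption made for this sketch, not official values.
SUPPORTED_PROVIDERS = {"elevenlabs", "smallest", "cartesia", "google", "azure", "polly"}


@dataclass(frozen=True)
class SpeechConfig:
    """Hypothetical container for the options configured after picking a provider."""

    provider: str
    model: str
    language: str
    gender: str
    voice: str

    def __post_init__(self):
        # Reject providers that are not in the documented list.
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported provider: {self.provider!r}")


# Example values mirror the ones mentioned above; the voice name is a placeholder.
config = SpeechConfig(
    provider="elevenlabs",
    model="flash v2.5",
    language="hi",
    gender="female",
    voice="example-voice",
)
```

Validating the provider up front means a typo surfaces when the configuration is built rather than at call time.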

Speech Parameters
You can fine-tune the voice output using the following settings:
- Speed – Controls how fast the voice speaks
- Stability – Adjusts how consistent the voice sounds across different utterances
- Similarity Boost – Controls how closely the voice sticks to its reference tone
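
The sketch below shows one way to keep these three settings within sensible limits before saving them. The numeric bounds and the parameter keys are assumptions made purely for illustration; this page does not state the accepted ranges, and they may differ per provider.

```python
# Assumed bounds, for illustration only. Check the selected provider for
# the ranges it actually accepts.
ASSUMED_BOUNDS = {
    "speed": (0.5, 2.0),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
}


def clamp_parameters(params: dict) -> dict:
    """Return a copy of params with each value limited to its assumed range."""
    clamped = {}
    for name, value in params.items():
        if name not in ASSUMED_BOUNDS:
            raise KeyError(f"unknown speech parameter: {name!r}")
        low, high = ASSUMED_BOUNDS[name]
        clamped[name] = min(max(value, low), high)
    return clamped


# The stability value exceeds the assumed upper bound, so it is capped.
settings = clamp_parameters({"speed": 1.1, "stability": 1.4, "similarity_boost": 0.75})
```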
Pronunciation Dictionary
The pronunciation dictionary helps improve how specific words or phrases are spoken.
- Dictionaries can be created once and reused across multiple agents
- Each dictionary contains rules where you define a phrase and how it should be pronounced
- Useful for handling brand names, abbreviations, or uncommon words that are often mispronounced
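
To make the idea of phrase-to-pronunciation rules concrete, here is a minimal sketch that applies a dictionary to text before it is sent to a TTS engine. It assumes rules are plain respellings matched case-insensitively on whole words; the rule entries and the `apply_rules` helper are hypothetical, and the platform's own matching behaviour may differ.

```python
import re

# Hypothetical rules: each phrase maps to the spelling that should be spoken.
RULES = {
    "SQL": "sequel",
    "nginx": "engine x",
}


def apply_rules(text: str, rules: dict) -> str:
    """Replace each whole-word occurrence of a phrase with its pronunciation."""
    # Longer phrases go first so a short rule cannot pre-empt a longer one.
    # Rules run in sequence, so a later rule can also match text produced
    # by an earlier replacement.
    for phrase in sorted(rules, key=len, reverse=True):
        # Lookarounds act as word boundaries that also work for phrases
        # ending in punctuation.
        pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)
        replacement = rules[phrase]
        # A function replacement avoids backslash interpretation in the rule text.
        text = pattern.sub(lambda match: replacement, text)
    return text


spoken = apply_rules("Our nginx server talks to SQL.", RULES)
print(spoken)  # prints "Our engine x server talks to sequel."
```

Because the rules live in one mapping, the same dictionary can be passed to any number of agents, which mirrors the create-once, reuse-everywhere behaviour described above.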
