Speech Settings

The Speech Settings section lets you configure how your agent speaks during conversations.

Providers

You can choose from the following text-to-speech (TTS) providers:

  • ElevenLabs – Ultra-realistic voices optimized for expressive delivery

  • Smallest – Lightweight and fast TTS engine

  • Cartesia – Balanced between speed and realism

  • Google – Strong multi-language support

  • Azure – Microsoft’s robust voice platform

  • Polly – Amazon’s neural TTS with a wide range of voice options

Configuration

Once a provider is selected, you can configure:

  • Model (e.g., flash v2.5)

  • Language (e.g., hi for Hindi)

  • Gender (e.g., female)

  • Voice (select from available voices provided by the provider)
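As a rough illustration, the four options above can be thought of as one small configuration record. The sketch below is not the platform's actual schema: the class name, field names, lowercase provider identifiers, and the voice value are all assumptions made for the example.

```python
from dataclasses import dataclass

# Provider names come from the list in this guide; the lowercase
# identifiers themselves are assumed, not taken from the product.
PROVIDERS = {"elevenlabs", "smallest", "cartesia", "google", "azure", "polly"}


@dataclass(frozen=True)
class SpeechConfig:
    """Hypothetical container for the per-agent speech settings."""
    provider: str
    model: str
    language: str
    gender: str
    voice: str

    def __post_init__(self):
        # Reject providers that are not in the documented list.
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")


# Example values mirror the ones mentioned above; "example-voice" is a placeholder.
config = SpeechConfig(
    provider="elevenlabs",
    model="flash v2.5",
    language="hi",
    gender="female",
    voice="example-voice",
)
```

Which models, languages, and voices are valid depends on the selected provider, so a real configuration would be checked against that provider's catalogue rather than a fixed set like this one.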

Speech Parameters

You can fine-tune the voice output using the following settings:

  • Speed – Controls how fast the voice speaks

  • Stability – Adjusts how consistent the voice sounds across different utterances

  • Similarity Boost – Controls how closely the voice sticks to its reference tone
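To make the three settings concrete, here is a minimal sketch that keeps each value inside a range before it is used. The ranges and defaults are assumptions for illustration only (speed between 0.5 and 2.0, stability and similarity boost between 0.0 and 1.0); this guide does not state the actual limits, and they may differ by provider.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class SpeechParameters:
    """Hypothetical holder for the fine-tuning settings; defaults are assumed."""
    speed: float = 1.0
    stability: float = 0.5
    similarity_boost: float = 0.75

    def normalized(self):
        # Assumed ranges, not documented limits.
        return SpeechParameters(
            speed=clamp(self.speed, 0.5, 2.0),
            stability=clamp(self.stability, 0.0, 1.0),
            similarity_boost=clamp(self.similarity_boost, 0.0, 1.0),
        )


requested = SpeechParameters(speed=3.0, stability=-0.2, similarity_boost=0.8)
safe = requested.normalized()
```

Here `safe` ends up with a speed of 2.0 and a stability of 0.0, while the similarity boost of 0.8 is left unchanged because it is already within the assumed range.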

Pronunciation Dictionary

The pronunciation dictionary helps improve how specific words or phrases are spoken.

  • Dictionaries can be created once and reused across multiple agents

  • Each dictionary contains rules, and each rule maps a phrase to the way it should be pronounced

  • Useful for handling brand names, abbreviations, or uncommon words that are often mispronounced
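One way to picture a dictionary is as a set of phrase-to-pronunciation rules applied to the text before it is sent to the TTS provider. The sketch below shows that idea with plain text substitution. It is an assumption about how such rules could work, not a description of how the platform applies them, and the function name and sample rules are invented for the example.

```python
import re


def apply_dictionary(text, rules):
    """Replace each phrase in `rules` with its spoken form.

    Matching is case-insensitive and limited to whole words, so a rule
    for "KYC" does not touch "KYCs". Longer phrases are tried first.
    """
    if not rules:
        return text
    lookup = {phrase.lower(): spoken for phrase, spoken in rules.items()}
    alternation = "|".join(
        re.escape(phrase) for phrase in sorted(rules, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    # Fall back to the matched text if a case-folded lookup ever misses.
    return pattern.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), text)


# Made-up rules: one brand name and one abbreviation.
rules = {"Acme": "ack-mee", "KYC": "K Y C"}
spoken = apply_dictionary("Welcome to Acme. Your kyc is complete.", rules)
```

Because the rules live in a plain mapping, the same `rules` object can be passed to several agents, which mirrors the create-once, reuse-everywhere behaviour described above.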