Input format

The TTS API accepts raw UTF-8 text bytes. This page describes the character set the model recognizes, the tts_preprocess() normalization pipeline that converts raw text into that form, and the SSML markup the model understands.

Required format

Pass text as UTF-8-encoded bytes to push(). The model’s phoneme dictionary operates on a restricted character set:

| Accepted | Examples |
| --- | --- |
| Letters (upper and lowercase) | a-z, A-Z |
| Space | |
| Apostrophe | ' |
| Comma, period | , . |
| Question mark, exclamation | ? ! |
| Hyphen | - |
| Angle brackets (SSML delimiters) | < > |

Digits, accented characters, currency symbols, and most other punctuation are not in the model’s vocabulary. Passing them produces silent gaps or garbled output, so run raw text through tts_preprocess() before encoding it.
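A quick pre-flight check can show which characters in a string fall outside this set. The sketch below is illustrative only: `ACCEPTED` is transcribed from the table above, and `find_unsupported` is a hypothetical helper, not part of abr_sdk.

```python
import string

# Character set transcribed from the table above. This helper is hypothetical
# and is not part of abr_sdk.
ACCEPTED = set(string.ascii_letters) | set(" ',.?!-<>")


def find_unsupported(text: str) -> list[str]:
    """Return the characters outside the accepted set, in order of first appearance."""
    seen: list[str] = []
    for ch in text:
        if ch not in ACCEPTED and ch not in seen:
            seen.append(ch)
    return seen


print(find_unsupported("Hello, world!"))   # []
print(find_unsupported("Café costs $42"))  # ['é', '$', '4', '2']
```

An empty result suggests the string is already within the model alphabet; anything else is a candidate for normalization.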

Preprocessing with tts_preprocess

tts_preprocess() normalizes a Python string into a form the TTS model can pronounce. It returns a str; encode it to bytes before pushing.

from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."
pcm_input = text.encode("utf-8")

What the pipeline does

The default pipeline runs these steps in order:

| Step | Example |
| --- | --- |
| Spell out acronyms | WHO → W H O, HTTP2 → H T T P two |
| Normalize dates | 12/25/2024 → December twenty fifth, two thousand twenty four |
| Numbers to words | $42.50 → forty two dollars and fifty cents, 3.14 → three point one four |
| Unicode normalization | é → e, among others |
| Collapse repeated punctuation | !!!! → ! |
| Collapse whitespace | multiple spaces → single space |
| Remove special characters | drops anything outside the model alphabet |
| Expand abbreviations | Dr. → doctor, St. → street, etc. → et cetera |
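To make a few of these steps concrete, here is a rough sketch of how the punctuation, whitespace, and special-character steps might be written as regular expressions. The `sketch_*` functions are hypothetical stand-ins written for this page; the actual implementations in abr_sdk.tts_preprocess may differ in detail.

```python
import re


def sketch_collapse_punctuation(text: str) -> str:
    """Replace a run of the same punctuation mark with a single mark."""
    return re.sub(r"([,.?!])\1+", r"\1", text)


def sketch_collapse_whitespace(text: str) -> str:
    """Replace any run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def sketch_remove_special(text: str) -> str:
    """Drop every character outside the accepted set listed under Required format."""
    return re.sub(r"[^A-Za-z ',.?!<>-]", "", text)


print(sketch_collapse_punctuation("Wait!!!! Really??"))       # Wait! Really?
print(sketch_collapse_whitespace("too    many \t spaces"))    # too many spaces
print(sketch_remove_special("hello #world"))                  # hello world
```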

A trailing period is added when the text has no sentence-ending punctuation, so that the synthesis pipeline has a terminal punctuation mark to flush on.
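The trailing-period rule can be pictured with a few lines of Python. This is a minimal sketch, assuming that "sentence-ending punctuation" means a period, question mark, or exclamation mark; `ensure_terminal` is a hypothetical name, and the check inside tts_preprocess may be more involved.

```python
def ensure_terminal(text: str) -> str:
    """Append a period when the text does not already end a sentence.

    Assumption: '.', '?' and '!' are the sentence-ending marks.
    """
    stripped = text.rstrip()
    if stripped and stripped[-1] not in ".?!":
        return stripped + "."
    return stripped


print(ensure_terminal("hello world"))  # hello world.
print(ensure_terminal("really?"))      # really?
```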

Custom pipelines

Pass a list of callables as pipeline to replace the default order or use a subset of steps:

from abr_sdk.tts_preprocess import tts_preprocess, collapse_whitespace, expand_abbreviations

raw = "Dr.   Smith   lives on Main St."  # example input
# Run only these two steps, in this order
text = tts_preprocess(raw, pipeline=[collapse_whitespace, expand_abbreviations])

Each function in the pipeline takes a str and returns a str. The steps in abr_sdk.tts_preprocess are plain functions you can import individually.
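Because every step shares the same str-to-str contract, writing your own step is straightforward. The sketch below illustrates that contract without importing abr_sdk: `run_pipeline` is a hypothetical stand-in that applies steps in list order (the internals of tts_preprocess are not shown on this page), and `expand_ampersand` and `squeeze_spaces` are made-up example steps.

```python
from typing import Callable, Iterable

Step = Callable[[str], str]


def run_pipeline(text: str, pipeline: Iterable[Step]) -> str:
    """Apply each step to the text in list order (hypothetical stand-in)."""
    for step in pipeline:
        text = step(text)
    return text


def expand_ampersand(text: str) -> str:
    """Example custom step: read '&' aloud as 'and'."""
    return text.replace("&", " and ")


def squeeze_spaces(text: str) -> str:
    """Example custom step: collapse whitespace runs to single spaces."""
    return " ".join(text.split())


print(run_pipeline("R&D   team", [expand_ampersand, squeeze_spaces]))  # R and D team
```

Assuming tts_preprocess accepts any callable with this signature, as the paragraph above states, a function such as `expand_ampersand` could be placed in the `pipeline` list alongside the imported steps. Order matters: here the ampersand step runs first so that the spaces it introduces are collapsed afterwards.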

SSML markup

The model supports a subset of SSML (Speech Synthesis Markup Language) for controlling prosody and emotion. Tags are passed through tts_preprocess untouched: only the spoken text between tags is normalized.

text = tts_preprocess('<abr:emotion type="calm">Hello!</abr:emotion>')
# spoken text normalized; tags preserved verbatim
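One way to picture the pass-through behaviour is to split the input on tags and normalize only the pieces in between. The following is a sketch under that assumption, not the library's actual code: `normalize_between_tags` is a hypothetical helper, and `str.upper` stands in for the real normalization so that the effect is easy to see.

```python
import re
from typing import Callable

# Anything between '<' and the next '>' is treated as a tag (simplifying assumption).
TAG_RE = re.compile(r"(<[^>]+>)")


def normalize_between_tags(text: str, normalize: Callable[[str], str]) -> str:
    """Apply normalize to spoken text only; copy tags through verbatim."""
    parts = TAG_RE.split(text)  # the capturing group keeps the tags in the result
    return "".join(p if TAG_RE.fullmatch(p) else normalize(p) for p in parts)


marked_up = '<abr:emotion type="calm">Hello!</abr:emotion>'
print(normalize_between_tags(marked_up, str.upper))
# <abr:emotion type="calm">HELLO!</abr:emotion>
```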

Supported tags

| Tag | Description |
| --- | --- |
| <speak> | Optional top-level wrapper. A closing </speak> forces a pipeline flush. |
| <prosody rate="..." pitch="..." volume="..."> | Controls speech rate, pitch, and volume. Accepts named presets (x-slow, slow, medium, fast, x-fast for rate; x-low through x-high for pitch; silent through x-loud for volume) or percentage values. Tags are nestable. |
| <abr:emotion type="..."> | Applies a composite prosody preset. Nestable with <prosody>. |

Supported <abr:emotion> types: apologetic, calm, empathetic, firm, lively.

Unsupported tags and attributes are ignored.
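Since unsupported values are ignored rather than rejected, a typo in a tag can go unnoticed. A small client-side builder can catch such mistakes before the text is sent. The helpers below are hypothetical and not part of abr_sdk; the emotion types and rate presets are copied from this page, and the percentage check is a loose assumption because the exact percentage syntax is not specified here.

```python
EMOTIONS = {"apologetic", "calm", "empathetic", "firm", "lively"}
RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def emotion(text: str, kind: str) -> str:
    """Wrap text in <abr:emotion>, rejecting types this page does not list."""
    if kind not in EMOTIONS:
        raise ValueError(f"unsupported emotion type: {kind!r}")
    return f'<abr:emotion type="{kind}">{text}</abr:emotion>'


def prosody_rate(text: str, rate: str) -> str:
    """Wrap text in <prosody rate=...>.

    Assumption: any value ending in '%' is passed through as a percentage.
    """
    if rate not in RATES and not rate.endswith("%"):
        raise ValueError(f"unsupported rate: {rate!r}")
    return f'<prosody rate="{rate}">{text}</prosody>'


print(emotion(prosody_rate("Please hold.", "slow"), "calm"))
# <abr:emotion type="calm"><prosody rate="slow">Please hold.</prosody></abr:emotion>
```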

When SSML tags are present, tts_preprocess does not add a trailing period between or after the tagged text runs. Use explicit sentence-ending punctuation, a </speak> tag, or the flush byte to end a tagged utterance.
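Because no period is added automatically in this case, it can help to check tagged input before sending it. The sketch below uses a hypothetical helper, `ends_utterance`, and assumes that terminal punctuation counts even when it sits inside the last closing tag. It does not cover the flush byte, which is sent separately from the text.

```python
import re


def ends_utterance(text: str) -> bool:
    """Return True if tagged text ends with </speak> or with '.', '?' or '!'."""
    stripped = text.rstrip()
    if stripped.endswith("</speak>"):
        return True
    # Remove all tags, then look at the last spoken character.
    spoken = re.sub(r"<[^>]+>", "", stripped).rstrip()
    return spoken.endswith((".", "?", "!"))


print(ends_utterance('<abr:emotion type="calm">Hello!</abr:emotion>'))  # True
print(ends_utterance('<abr:emotion type="calm">Hello</abr:emotion>'))   # False
print(ends_utterance("<speak>Hello</speak>"))                           # True
```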

Next steps

  • Examples: full examples showing how to preprocess and synthesize text.

  • Overview: overview of the TTS streaming model and output format.