Input format#

The TTS API accepts raw UTF-8 text bytes. This page describes the character set the model recognizes, the tts_preprocess() normalization pipeline that converts raw text into that form, and the SSML markup the model understands.

Required format#

Pass text as UTF-8-encoded bytes to push(). The model’s phoneme dictionary operates on a restricted character set:

Accepted	Examples
Letters (upper and lowercase)	`a`–`z`, `A`–`Z`
Space
Apostrophe	`'`
Comma, period	`,` `.`
Question mark, exclamation	`?` `!`
Hyphen	`-`
Angle brackets (SSML delimiters)	`<` `>`

Digits, accented letters, currency symbols, and most other punctuation are not in the model’s vocabulary. Passing them produces silent gaps or garbled output. Use tts_preprocess() to convert raw text before encoding it.
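To see which characters in a string would fall outside the accepted set, a small check like the one below can help. It is only a sketch built from the table above: `unsupported_chars` is a hypothetical helper, not part of abr_sdk, and it is meant for plain spoken text (the attribute text inside SSML tags, such as quotes and `=`, would be flagged too).

```python
import re

# Character class built from the "Accepted" table: ASCII letters, space,
# apostrophe, comma, period, question mark, exclamation mark, hyphen,
# and the angle brackets used as SSML delimiters.
_ACCEPTED_CHAR = re.compile(r"[A-Za-z ',.?!<>-]")


def unsupported_chars(text: str) -> list[str]:
    """Return the distinct characters outside the accepted set,
    in order of first appearance. Hypothetical helper for illustration."""
    found: list[str] = []
    for ch in text:
        if not _ACCEPTED_CHAR.fullmatch(ch) and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    print(unsupported_chars("Pay $42 now"))
```

An empty result suggests the text can be encoded as-is; anything else is a sign that the text should go through tts_preprocess() first.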

Preprocessing with `tts_preprocess`#

tts_preprocess() normalizes a Python string into a form the TTS model can pronounce. It returns a str; encode it to bytes before pushing.

from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."
pcm_input = text.encode("utf-8")

What the pipeline does#

The default pipeline runs these steps in order:

Step	Example
Spell out acronyms	`WHO` → `W H O`, `HTTP2` → `H T T P two`
Normalize dates	`12/25/2024` → `December twenty fifth, two thousand twenty four`
Numbers to words	`$42.50` → `forty two dollars and fifty cents`, `3.14` → `three point one four`
Unicode normalization	`é` → `e`, `—` → `,`, `…` → `, , ,`
Collapse repeated punctuation	`!!!!` → `!`
Collapse whitespace	multiple spaces → single space
Remove special characters	drops anything outside the model alphabet
Expand abbreviations	`Dr.` → `doctor`, `St.` → `street`, `etc.` → `et cetera`

A trailing period is added when the text has no sentence-ending punctuation, so the synthesis pipeline has a sentence terminator to flush on.
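The simpler steps in the table, and the trailing-period rule, can be approximated in a few lines of standard-library Python. The functions below are illustrative stand-ins written for this page, not the abr_sdk implementations; their names and exact behavior (for example, trimming leading and trailing whitespace) are assumptions.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_punctuation(text: str) -> str:
    """Reduce a run of the same punctuation mark to a single mark."""
    return re.sub(r"([,.?!])\1+", r"\1", text)


def collapse_spaces(text: str) -> str:
    """Replace runs of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def ensure_terminal(text: str) -> str:
    """Append a period unless the text already ends in '.', '?' or '!'."""
    if text and text[-1] not in ".?!":
        return text + "."
    return text


if __name__ == "__main__":
    raw = "Un  café   please!!!!"
    step = collapse_spaces(collapse_punctuation(strip_accents(raw)))
    print(ensure_terminal(step))
```

Steps such as number, date, and abbreviation expansion need language-specific rules and are not sketched here.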

Custom pipelines#

Pass a list of callables as the pipeline argument to reorder the default steps or run only a subset of them:

from abr_sdk.tts_preprocess import tts_preprocess, collapse_whitespace, expand_abbreviations

text = tts_preprocess(raw, pipeline=[collapse_whitespace, expand_abbreviations])

Each function in the pipeline takes a str and returns a str. The steps in abr_sdk.tts_preprocess are plain functions you can import individually.
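Because every step is a str-to-str function, a custom step is just another function with that shape. The sketch below shows two hypothetical steps and a stand-in loop, `run_steps`, that applies them in list order; it does not call abr_sdk, and it assumes tts_preprocess applies the callables in the order given.

```python
from typing import Callable, Iterable

Step = Callable[[str], str]


def replace_ampersand(text: str) -> str:
    """Hypothetical custom step: speak '&' as the word 'and'."""
    return text.replace("&", " and ")


def squeeze_spaces(text: str) -> str:
    """Hypothetical custom step: collapse runs of whitespace into one space."""
    return " ".join(text.split())


def run_steps(text: str, steps: Iterable[Step]) -> str:
    """Stand-in for a pipeline loop: feed each step the previous result."""
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    print(run_steps("salt & pepper", [replace_ampersand, squeeze_spaces]))
```

Order matters: running `squeeze_spaces` before `replace_ampersand` would leave the extra spaces that the replacement introduces.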

SSML markup#

The model supports a subset of SSML (Speech Synthesis Markup Language) for controlling prosody and emotion. Tags are passed through tts_preprocess untouched: only the spoken text between tags is normalized.

text = tts_preprocess('<abr:emotion type="calm">Hello!</abr:emotion>')
# spoken text normalized; tags preserved verbatim
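One way to picture the pass-through behavior is to split the input on tags and normalize only the text runs between them. The sketch below does that with a regular expression; `normalize_between_tags` is a hypothetical illustration, not how abr_sdk is known to implement it, and the normalizer is passed in as a parameter so the example runs without the SDK.

```python
import re
from typing import Callable

# Anything between '<' and '>' is treated as a tag. The capturing group
# makes re.split keep the tags in its result list.
_TAG = re.compile(r"(<[^<>]+>)")


def normalize_between_tags(text: str, normalize: Callable[[str], str]) -> str:
    """Apply `normalize` to text runs only; copy tags through verbatim."""
    parts = _TAG.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)             # odd positions hold the tags
        elif part:
            out.append(normalize(part))  # even positions hold text runs
    return "".join(out)


if __name__ == "__main__":
    ssml = '<abr:emotion type="calm">HELLO!</abr:emotion>'
    print(normalize_between_tags(ssml, str.lower))
```

Here `str.lower` stands in for the real normalization, so it is easy to see that the tag and its attribute value are left unchanged.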

Supported tags#

Tag	Description
`<speak>`	Optional top-level wrapper. A closing `</speak>` forces a pipeline flush.
`<prosody rate="..." pitch="..." volume="...">`	Controls speech rate, pitch, and volume. Accepts named presets (`x-slow`, `slow`, `medium`, `fast`, `x-fast` for rate; `x-low` through `x-high` for pitch; `silent` through `x-loud` for volume) or percentage values. Tags are nestable.
`<abr:emotion type="...">`	Applies a composite prosody preset. Nestable with `<prosody>`.

Supported <abr:emotion> types: apologetic, calm, empathetic, firm, lively.

Unsupported tags and attributes are ignored.
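Since unsupported markup is ignored rather than reported, a misspelled tag or attribute can go unnoticed. A small builder that validates names before producing the markup is one way to catch this early. The helpers below are hypothetical and not part of abr_sdk; the emotion list is copied from this page, and rejecting an unknown type is a choice made for this sketch.

```python
SUPPORTED_EMOTIONS = ("apologetic", "calm", "empathetic", "firm", "lively")
PROSODY_ATTRS = ("rate", "pitch", "volume")


def emotion(text: str, kind: str) -> str:
    """Wrap text in <abr:emotion>, rejecting types not listed on this page."""
    if kind not in SUPPORTED_EMOTIONS:
        raise ValueError(f"unsupported emotion type: {kind!r}")
    return f'<abr:emotion type="{kind}">{text}</abr:emotion>'


def prosody(text: str, **attrs: str) -> str:
    """Wrap text in <prosody>, allowing only rate, pitch, and volume."""
    unknown = sorted(set(attrs) - set(PROSODY_ATTRS))
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {unknown}")
    # Emit attributes in a fixed order so the output is predictable.
    rendered = "".join(
        f' {name}="{attrs[name]}"' for name in PROSODY_ATTRS if name in attrs
    )
    return f"<prosody{rendered}>{text}</prosody>"


if __name__ == "__main__":
    print(emotion(prosody("Sorry about that.", rate="slow"), "apologetic"))
```

The helpers do not validate attribute values such as preset names or percentages.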

When SSML tags are present, tts_preprocess does not add a trailing period, either between the text runs that the tags separate or after the last one. Use explicit sentence-ending punctuation, a </speak> tag, or the flush byte to end a tagged utterance.

Next steps

Examples: full examples showing how to preprocess and synthesize text.
Overview: overview of the TTS streaming model and output format.
