Input format
The TTS API accepts raw UTF-8 text bytes. This page describes the character set the model
recognizes, the tts_preprocess() normalization pipeline that
converts raw text into that form, and the SSML markup the model understands.
Required format
Pass text as UTF-8-encoded bytes to push(). The model’s phoneme
dictionary operates on a restricted character set:
| Accepted | Examples |
|---|---|
| Letters (upper and lowercase) | |
| Space | |
| Apostrophe | |
| Comma, period | |
| Question mark, exclamation | |
| Hyphen | |
| Angle brackets (SSML delimiters) | |
Digits, accented letters, currency symbols, and most other punctuation are not in the model’s
vocabulary. Passing them produces silent gaps or garbled output. Use tts_preprocess()
to convert raw text before encoding it.
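As a rough illustration of the restriction above, the sketch below reports which characters in a string fall outside the accepted set. It is not part of the SDK: `ALLOWED_CHARS` and `find_unsupported` are hypothetical names, and the set assumes ASCII letters and the ASCII apostrophe, which the table does not spell out.

```python
import string

# Assumption: the accepted set from the table above, read as ASCII letters,
# space, ASCII apostrophe, comma, period, ?, !, hyphen, and angle brackets.
ALLOWED_CHARS = frozenset(string.ascii_letters + " ',.?!-<>")


def find_unsupported(text: str) -> list[str]:
    """Return the distinct characters outside ALLOWED_CHARS, in first-seen order."""
    seen: dict[str, None] = {}
    for ch in text:
        if ch not in ALLOWED_CHARS:
            seen.setdefault(ch, None)
    return list(seen)


print(find_unsupported("Dr. Smith owed $42 to the WHO."))
# ['$', '4', '2']
print(find_unsupported("doctor Smith owed forty two dollars to the W H O."))
# []
```

A check like this can be useful in tests to confirm that text is clean before it is encoded and pushed.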
Preprocessing with tts_preprocess
tts_preprocess() normalizes a Python string into a form the TTS
model can pronounce. It returns a str; encode it to bytes before pushing.
```python
from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."

# Encode the normalized string to UTF-8 bytes before pushing it.
pcm_input = text.encode("utf-8")
```
What the pipeline does
The default pipeline runs these steps in order:
| Step | Example |
|---|---|
| Spell out acronyms | |
| Normalize dates | |
| Numbers to words | |
| Unicode normalization | |
| Collapse repeated punctuation | |
| Collapse whitespace | multiple spaces → single space |
| Remove special characters | drops anything outside the model alphabet |
| Expand abbreviations | |
If the text has no sentence-ending punctuation, a trailing period is added so that the synthesis pipeline has a terminal punctuation mark to flush on.
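To make a few of these steps concrete, here is a minimal sketch of the punctuation, whitespace, and trailing-period behavior. These are stand-ins written for illustration, not the SDK's implementations: every function name ending in `_sketch` is hypothetical, the regular expressions are assumptions, and applying the trailing-period rule last is also an assumption. SSML tags are ignored here for simplicity.

```python
import re


def collapse_repeated_punctuation_sketch(text: str) -> str:
    # Reduce a run of the same mark to a single one, e.g. "??" -> "?".
    return re.sub(r"([,.?!])\1+", r"\1", text)


def collapse_whitespace_sketch(text: str) -> str:
    # Replace any run of whitespace with a single space.
    return re.sub(r"\s+", " ", text)


def ensure_terminal_sketch(text: str) -> str:
    # Append a period when the text does not already end a sentence.
    stripped = text.rstrip()
    if stripped and stripped[-1] not in ".?!":
        return stripped + "."
    return stripped


def normalize_sketch(text: str) -> str:
    # Run the stand-in steps in the same relative order as the table above.
    text = collapse_repeated_punctuation_sketch(text)
    text = collapse_whitespace_sketch(text)
    return ensure_terminal_sketch(text)


print(normalize_sketch("Wait...   what??"))  # Wait. what?
print(normalize_sketch("hello   there"))     # hello there.
```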
Custom pipelines
Pass a list of callables as the pipeline argument to replace the default order or to run only a subset of steps:
```python
from abr_sdk.tts_preprocess import tts_preprocess, collapse_whitespace, expand_abbreviations

raw = "Dr.   Smith  arrives at noon"  # example input
text = tts_preprocess(raw, pipeline=[collapse_whitespace, expand_abbreviations])
```
Each function in the pipeline takes a str and returns a str. The steps in
abr_sdk.tts_preprocess are plain functions you can import individually.
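Because every step is a plain str-to-str function, a pipeline amounts to function composition. The sketch below shows that idea without the SDK: `run_steps`, `spell_out_percent`, and `squeeze_spaces` are hypothetical names invented for this example, and `run_steps` only illustrates how a list of steps might be applied left to right, not how tts_preprocess is actually implemented.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


def spell_out_percent(text: str) -> str:
    # Hypothetical custom step: "%" is outside the model alphabet,
    # so replace it with a pronounceable word.
    return text.replace("%", " percent")


def squeeze_spaces(text: str) -> str:
    # Stand-in for a whitespace-collapsing step.
    return " ".join(text.split())


def run_steps(text: str, steps: Iterable[Step]) -> str:
    # Feed the output of each step into the next, in list order.
    return reduce(lambda acc, step: step(acc), steps, text)


print(run_steps("up  fifty% today", [spell_out_percent, squeeze_spaces]))
# up fifty percent today
```

Order matters: here the whitespace step runs last so that it also cleans up the space introduced by the replacement. A function shaped like `spell_out_percent` should fit in the pipeline list next to the imported SDK steps, assuming tts_preprocess accepts any callable with this signature.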
SSML markup
The model supports a subset of SSML (Speech Synthesis Markup Language) for controlling prosody and
emotion. Tags are passed through tts_preprocess untouched: only the spoken text between tags is
normalized.
```python
text = tts_preprocess('<abr:emotion type="calm">Hello!</abr:emotion>')
# spoken text normalized; tags preserved verbatim
```
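One way to picture the pass-through behavior is to split the input on tags and normalize only the pieces in between. The sketch below does this with a regular expression. It is an assumption about how such behavior could be implemented, not the SDK's code: `TAG_PATTERN`, `normalize_outside_tags`, and `squeeze_spaces` are hypothetical names.

```python
import re
from typing import Callable

# Assumption: a tag is anything between "<" and ">" with no nested brackets.
TAG_PATTERN = re.compile(r"(<[^<>]+>)")


def squeeze_spaces(text: str) -> str:
    # Stand-in normalizer used only for this demonstration.
    return " ".join(text.split())


def normalize_outside_tags(text: str, normalize: Callable[[str], str]) -> str:
    # With one capture group, re.split puts tags at odd indices
    # and the text between them at even indices.
    parts = TAG_PATTERN.split(text)
    return "".join(
        part if i % 2 else normalize(part) for i, part in enumerate(parts)
    )


marked_up = '<abr:emotion type="calm">Hello   there!</abr:emotion>'
print(normalize_outside_tags(marked_up, squeeze_spaces))
# <abr:emotion type="calm">Hello there!</abr:emotion>
```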