abr_sdk.tts_preprocess#
Text preprocessing functions for TTS.
Module Contents#
Classes#
NumberToWords | Convert numbers to English words.
ItoNormalization | Number normalization for TTS.
Functions#
unicode_normalization | Normalize unicode characters.
remove_special_characters | Drop characters outside _SPOKEN_CHARS.
spell_out_acronyms | Spell out acronyms for better pronunciation.
expand_abbreviations | Expand common abbreviations to words.
_fraction_denominator_word | Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).
normalize_slash_numbers | Convert slash-joined numbers to words.
collapse_repeated_punctuation | Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).
collapse_whitespace | Collapse multiple spaces/tabs to single space.
add_final_period | Ensure text ends with a period.
final_cleanup | Trim and collapse repeated whitespace to single spaces.
_americanize_word | Map one word to its American spelling, preserving leading capitalization.
normalize_british_spelling | Map common British spellings to American (e.g. “favourite” -> “favorite”).
_ssml_split_tag | Split a tag like <prosody rate="slow"> into (is_closing, name, attrs).
_ssml_tag_error | Check one SSML tag for structural problems.
find_unpreprocessed_chars | Return the spoken characters that tts_preprocess would have removed.
tts_preprocess | Preprocess text for TTS synthesis.
Data#
API#
- class abr_sdk.tts_preprocess.NumberToWords#
Convert numbers to English words.
Based on the JavaScript implementation in text-preprocessor.js.
- ONES: ClassVar#
[‘’, ‘one’, ‘two’, ‘three’, ‘four’, ‘five’, ‘six’, ‘seven’, ‘eight’, ‘nine’]
- TEENS: ClassVar#
[‘ten’, ‘eleven’, ‘twelve’, ‘thirteen’, ‘fourteen’, ‘fifteen’, ‘sixteen’, ‘seventeen’, ‘eighteen’, ‘…
- TENS: ClassVar#
[‘’, ‘’, ‘twenty’, ‘thirty’, ‘forty’, ‘fifty’, ‘sixty’, ‘seventy’, ‘eighty’, ‘ninety’]
- _ORDINAL_WORD: ClassVar#
None
- SCALES: ClassVar#
[‘’, ‘thousand’, ‘million’, ‘billion’, ‘trillion’, ‘quadrillion’, ‘quintillion’, ‘sextillion’, ‘sept…
- DIGIT_FALLBACK_MIN: ClassVar#
None
- _convert_hundreds(num: int) str#
Convert a number (0-999) to words.
- _ordinalize_word(word: str) str#
Turn one cardinal word into its ordinal form.
Applied to the final word of a cardinal reading, which is the only word an English ordinal inflects (“forty two” -> “forty second”, “one hundred” -> “one hundredth”, “two thousand” -> “two thousandth”).
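The final-word rule above can be sketched as follows. This is a minimal illustration, not the abr_sdk code: the irregular table `_IRREGULAR_ORDINALS` and both function names are hypothetical, and the suffix rules are ordinary English ones assumed for the example.

```python
# Illustrative sketch only; names and table contents are assumptions.
_IRREGULAR_ORDINALS = {
    "one": "first", "two": "second", "three": "third", "five": "fifth",
    "eight": "eighth", "nine": "ninth", "twelve": "twelfth",
}


def ordinalize_word(word: str) -> str:
    """Turn a single cardinal word into its ordinal form."""
    if word in _IRREGULAR_ORDINALS:
        return _IRREGULAR_ORDINALS[word]
    if word.endswith("y"):
        # twenty -> twentieth, forty -> fortieth
        return word[:-1] + "ieth"
    # four -> fourth, hundred -> hundredth, thousand -> thousandth
    return word + "th"


def ordinalize_phrase(cardinal: str) -> str:
    """Inflect only the last word of a cardinal reading."""
    words = cardinal.split()
    if not words:
        return cardinal
    words[-1] = ordinalize_word(words[-1])
    return " ".join(words)


print(ordinalize_phrase("forty two"))
print(ordinalize_phrase("one hundred"))
```

Because only the last word changes, the cardinal reading can be produced first and inflected afterwards, with no separate ordinal code path per magnitude.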
- _spell_digits(digits: str) str#
Read a digit string one digit at a time (e.g. “905” -> “nine zero five”).
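A digit-by-digit reading can be sketched like this. The sketch is not the library code: `DIGIT_NAMES` and `spell_digits` are hypothetical names, and the input is assumed to hold only ASCII digits. A separate table is used because the ONES list shown above starts with an empty string rather than “zero”.

```python
# Illustrative sketch only; assumes the input contains ASCII digits 0-9.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(digits: str) -> str:
    """Read each digit on its own, separated by single spaces."""
    return " ".join(DIGIT_NAMES[int(ch)] for ch in digits)


print(spell_digits("905"))
```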
- _grouped_words(num: int) list[str]#
Cardinal words for the thousands-groups of a positive number, high to low.
Splitting on base 1000 keeps every group in 0-999, which is exactly what _convert_hundreds accepts, so no input magnitude can push a group out of range. Groups that are zero contribute nothing.
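The base-1000 split can be illustrated with a short sketch. It is not the abr_sdk implementation: `thousands_groups` and the `scale_index` it returns are hypothetical, with the index meant to point into a list shaped like SCALES.

```python
# Illustrative sketch only; function and field names are assumptions.
def thousands_groups(num: int) -> list[tuple[int, int]]:
    """Return (group, scale_index) pairs, high to low, skipping zero groups."""
    if num <= 0:
        raise ValueError("expects a positive integer")
    pairs = []
    scale_index = 0
    while num > 0:
        # divmod by 1000 guarantees 0 <= group <= 999
        num, group = divmod(num, 1000)
        if group:
            pairs.append((group, scale_index))
        scale_index += 1
    pairs.reverse()
    return pairs


print(thousands_groups(1_000_005))
```

Each pair would then be rendered as the words for the 0-999 group followed by the scale word at that index.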
- to_cardinal(num: int | str) str#
Convert integer to cardinal words.
- to_year(num: int | str) str#
Read a 4-digit integer as a spoken year, not a plain cardinal.
- to_ordinal(num: int | str) str#
Convert integer to ordinal words.
- to_decimal(num: float | str) str#
Convert decimal number to words (reads out digits after decimal).
- to_currency(num: float | str) str#
Convert currency to words (USD).
- class abr_sdk.tts_preprocess.ItoNormalization#
Number normalization for TTS.
Adapted from Keith Ito’s Tacotron preprocessing: https://github.com/keithito/tacotron/blob/master/text/numbers.py
Initialization
- _remove_commas(match: re.Match) str#
- _convert_currency(match: re.Match) str#
- _convert_decimal(match: re.Match) str#
- _convert_ordinal(match: re.Match) str#
- _convert_cardinal(match: re.Match) str#
- __call__(text: str) str#
Apply all number-normalization substitutions to the text.
- abr_sdk.tts_preprocess.unicode_normalization(text: str) str#
Normalize unicode characters.
Handles accents, dashes, ellipsis, parentheses, and other special punctuation for better TTS pronunciation.
- abr_sdk.tts_preprocess._SPOKEN_CHARS#
‘frozenset(…)’
- abr_sdk.tts_preprocess.remove_special_characters(text: str) str#
Drop characters outside _SPOKEN_CHARS.
A “/” first becomes a space so a slash-joined pair reads as two words (“and/or” -> “and or”) instead of running together.
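The slash-then-filter order can be sketched as below. The allowed set here is a stand-in: the real contents of _SPOKEN_CHARS are not shown on this page, so `ALLOWED` and `drop_unspoken` are hypothetical.

```python
import string

# Hypothetical allowed set; the real _SPOKEN_CHARS contents are not documented here.
ALLOWED = frozenset(string.ascii_letters + " .,?!'")


def drop_unspoken(text: str, allowed: frozenset = ALLOWED) -> str:
    """Replace '/' with a space, then keep only characters in the allowed set."""
    text = text.replace("/", " ")
    return "".join(ch for ch in text if ch in allowed)


print(drop_unspoken("and/or"))
```

Replacing the slash before filtering matters: filtering first would delete it and fuse the two words into “andor”.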
- abr_sdk.tts_preprocess.spell_out_acronyms(text: str) str#
Spell out acronyms for better pronunciation.
Examples: “VUI” -> “V U I”, “VUIs” -> “V U I zz”, “API” -> “ay P I”
- abr_sdk.tts_preprocess.expand_abbreviations(text: str) str#
Expand common abbreviations to words.
- abr_sdk.tts_preprocess._MONTHS#
[‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘…
- abr_sdk.tts_preprocess._NUM2WORDS#
‘NumberToWords(…)’
- abr_sdk.tts_preprocess._FRACTION_DENOMINATORS#
None
- abr_sdk.tts_preprocess._fraction_denominator_word(denominator: int, *, plural: bool) str#
Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).
- abr_sdk.tts_preprocess._SLASH_NUMBER_RE#
‘compile(…)’
- abr_sdk.tts_preprocess._replace_slash_number(match: re.Match) str#
- abr_sdk.tts_preprocess.normalize_slash_numbers(text: str) str#
Convert slash-joined numbers to words.
A trailing year marks a date (“12/25/2024” -> “December twenty fifth, two thousand twenty four”). A bare “N/M” is read as a fraction (“1/2” -> “one half”), never as a date, so “1/2 a cup” reads as “one half a cup” rather than “January second a cup”.
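The date-versus-fraction decision can be sketched as a small classifier. This is an assumption-laden illustration, not the pattern behind _SLASH_NUMBER_RE: it assumes one- or two-digit parts and a four-digit trailing year, and `classify_slash_token` is a hypothetical name.

```python
import re

# Hypothetical pattern: N/M with an optional four-digit year.
_SLASH_TOKEN = re.compile(r"(\d{1,2})/(\d{1,2})(?:/(\d{4}))?")


def classify_slash_token(token: str) -> str:
    """Return 'date' when a trailing year is present, 'fraction' for bare N/M."""
    match = _SLASH_TOKEN.fullmatch(token)
    if match is None:
        return "other"
    return "date" if match.group(3) else "fraction"


print(classify_slash_token("12/25/2024"))
print(classify_slash_token("1/2"))
```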
- abr_sdk.tts_preprocess.collapse_repeated_punctuation(text: str) str#
Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).
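A run of one repeated mark can be collapsed with a backreference, as in this sketch. It covers only “!” and “?”, the two marks in the examples above; which other marks the library collapses is not stated here, and `collapse_runs` is a hypothetical name.

```python
import re

# Matches a '!' or '?' followed by one or more copies of the same character.
_REPEATED_MARK = re.compile(r"([!?])\1+")


def collapse_runs(text: str) -> str:
    """Reduce each run of the same mark to a single occurrence."""
    return _REPEATED_MARK.sub(r"\1", text)


print(collapse_runs("No!!!! Really???"))
```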
- abr_sdk.tts_preprocess.collapse_whitespace(text: str) str#
Collapse multiple spaces/tabs to single space.
- abr_sdk.tts_preprocess.add_final_period(text: str) str#
Ensure text ends with a period.
- abr_sdk.tts_preprocess.final_cleanup(text: str) str#
Trim and collapse repeated whitespace to single spaces.
- abr_sdk.tts_preprocess._BRITISH_TO_AMERICAN#
None
- abr_sdk.tts_preprocess._americanize_word(word: str) str#
Map one word to its American spelling, preserving leading capitalization.
- abr_sdk.tts_preprocess.normalize_british_spelling(text: str) str#
Map common British spellings to American (e.g. “favourite” -> “favorite”).
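The word-level mapping with preserved leading capitalization can be sketched as follows. The two-entry table is a hypothetical stand-in for _BRITISH_TO_AMERICAN, whose contents are not listed on this page, and both function names are illustrative.

```python
import re

# Hypothetical two-entry table standing in for the real mapping.
SPELLINGS = {"favourite": "favorite", "colour": "color"}


def americanize_word(word: str) -> str:
    """Look the word up case-insensitively and restore a leading capital."""
    american = SPELLINGS.get(word.lower())
    if american is None:
        return word
    if word[0].isupper():
        return american[0].upper() + american[1:]
    return american


def americanize_text(text: str) -> str:
    """Apply the word mapping to every alphabetic run in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: americanize_word(m.group(0)), text)


print(americanize_text("My Favourite colour."))
```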
- abr_sdk.tts_preprocess._DEFAULT_PIPELINE#
None
- abr_sdk.tts_preprocess._SSML_TAG_RE#
‘compile(…)’
- abr_sdk.tts_preprocess._SSML_TAGS: dict[str, set[str]]#
None
- abr_sdk.tts_preprocess._ssml_split_tag(tag: str) tuple[bool, str, str]#
Split a tag like <prosody rate="slow"> into (is_closing, name, attrs).
The tag always has a name: the matcher that produced it requires one.
- abr_sdk.tts_preprocess._ssml_tag_error(tag: str) str | None#
Check one SSML tag for structural problems.
Returns a human-readable reason if the tag should be dropped (unknown tag, unknown attribute, or an unquoted/unterminated attribute value), or None if it is well-formed and known. Attribute values must be wrapped in matching quotes, the same rule libexpat enforces; rate=fast is rejected here so that it never reaches the backend, where it would halt playback.
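The three checks (known tag, known attribute, quoted value) can be sketched like this. It is not the library's validator: `KNOWN_TAGS` is a hypothetical table standing in for _SSML_TAGS, and the regular expressions are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical tag table: tag name -> allowed attribute names.
KNOWN_TAGS = {"speak": set(), "prosody": {"rate", "pitch"}}

_TAG_PARTS = re.compile(r"^<(/?)([A-Za-z][\w:-]*)\s*(.*?)\s*/?>$", re.S)
# One attribute: name = value, where the value sits in matching quotes.
_QUOTED_ATTR = re.compile(r"""([A-Za-z][\w:-]*)\s*=\s*("[^"]*"|'[^']*')\s*""")


def tag_error(tag: str) -> Optional[str]:
    """Return a reason to drop the tag, or None if it passes all checks."""
    parts = _TAG_PARTS.match(tag)
    if parts is None:
        return "not a tag"
    _, name, attrs = parts.groups()
    if name not in KNOWN_TAGS:
        return f"unknown tag <{name}>"
    pos = 0
    while pos < len(attrs):
        attr = _QUOTED_ATTR.match(attrs, pos)
        if attr is None:
            # Covers unquoted values (rate=fast) and unterminated quotes.
            return f"malformed attribute near {attrs[pos:]!r}"
        if attr.group(1) not in KNOWN_TAGS[name]:
            return f"unknown attribute {attr.group(1)!r}"
        pos = attr.end()
    return None


print(tag_error('<prosody rate="slow">'))
print(tag_error("<prosody rate=fast>"))
```

Returning a reason string rather than a boolean lets the caller include it in the warning it prints when dropping the tag.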
- abr_sdk.tts_preprocess.find_unpreprocessed_chars(text: str) set[str]#
Return the spoken characters that tts_preprocess would have removed.
Inspects only the runs between SSML tags, so tag syntax (=, quotes, digits in attribute values) is not flagged; an empty result means the text looks preprocessed. Assumes whole tags: if a tag is split mid-stream, its internals may be flagged.
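Checking only the runs between tags can be sketched as below. The spoken-character set and the tag pattern are assumptions (the real _SPOKEN_CHARS and _SSML_TAG_RE are not shown on this page), and `find_leftovers` is a hypothetical name.

```python
import re
import string

# Hypothetical stand-ins for the library's spoken-character set and tag matcher.
SPOKEN = frozenset(string.ascii_letters + " .,?!'")
_TAG = re.compile(r"<[^<>]+>")


def find_leftovers(text: str, spoken: frozenset = SPOKEN) -> set[str]:
    """Collect characters outside the spoken set, ignoring tag internals."""
    leftovers: set[str] = set()
    # Splitting on the tag pattern yields only the text between tags.
    for run in _TAG.split(text):
        leftovers.update(ch for ch in run if ch not in spoken)
    return leftovers


print(find_leftovers('call 911 <break time="1s">now'))
```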
- abr_sdk.tts_preprocess.tts_preprocess(text: str, *, pipeline: list[collections.abc.Callable] | None = None) str#
Preprocess text for TTS synthesis.
Normalizes raw text into a form the TTS phoneme dictionary can pronounce: spells out acronyms, turns numbers/dates/currency into words, strips accents and fancy punctuation, and expands abbreviations. Sentence-ending .?! the caller wrote are kept (the model renders each with distinct prosody).
Well-formed SSML markup (<prosody>, </speak>, <abr:emotion>, …) passes through untouched: the input is split on tags and only the spoken runs between them are normalized. Normalizing the whole string would strip the ="/ characters and the attribute values the tags need to stay well-formed.
Structurally malformed tags (an unquoted value like rate=fast, an unknown tag, an unknown attribute) are dropped with a warning printed to stdout, keeping the surrounding text. A single malformed tag would otherwise abort the whole utterance inside the backend’s XML parser. When a malformed opening tag is dropped, its matching close (tracked by a LIFO name stack) is dropped too, so nesting stays balanced; an unmatched closing tag is left alone, since in a streamed push its opening tag may have arrived in an earlier call.
Tag-free text gets a trailing . when it has no sentence-ending punctuation, so libtfs has a terminal to flush on. Tagged input gets no such synthetic terminator: its runs may be one sentence split across tags, or only part of a sentence in a streaming push, so a period between them would be wrong; the caller’s own punctuation, a </speak>, or the flush byte ends it.
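The split-on-tags flow described above can be sketched end to end. This is a simplified illustration under stated assumptions, not the abr_sdk implementation: `is_well_formed` only checks that attribute values are quoted (it has no table of known tags), and `preprocess_sketch` and the caller-supplied `pipeline` list are hypothetical.

```python
import re
from collections.abc import Callable

_TAG = re.compile(r"(<[^<>]+>)")  # capture group keeps the tags in the split
_WELL_FORMED = re.compile(r"""</?[\w:-]+(\s+[\w:-]+=("[^"]*"|'[^']*'))*\s*/?>""")


def is_well_formed(tag: str) -> bool:
    """Hypothetical check: every attribute value must sit in matching quotes."""
    return _WELL_FORMED.fullmatch(tag) is not None


def preprocess_sketch(text: str, pipeline: list[Callable[[str], str]]) -> str:
    parts = _TAG.split(text)  # even indices: text runs, odd indices: tags
    out = []
    open_tags = []  # LIFO stack of (name, kept) for opening tags seen so far
    for index, part in enumerate(parts):
        if index % 2 == 0:
            for step in pipeline:  # normalize spoken runs only
                part = step(part)
            out.append(part)
            continue
        found = re.match(r"</?([\w:-]+)", part)
        name = found.group(1) if found else ""
        if part.startswith("</"):
            if open_tags and open_tags[-1][0] == name:
                _, kept = open_tags.pop()
                if not kept:
                    continue  # the opener was dropped, so drop its close too
            out.append(part)  # unmatched closes are left alone
        elif is_well_formed(part):
            if not part.endswith("/>"):
                open_tags.append((name, True))
            out.append(part)
        else:
            print(f"warning: dropping malformed tag {part!r}")
            open_tags.append((name, False))
    result = "".join(out)
    # Only tag-free text receives a synthetic terminator.
    if len(parts) == 1 and result and result[-1] not in ".?!":
        result += "."
    return result


print(preprocess_sketch("<prosody rate=fast>Hi</prosody> there", [str.lower]))
```

The stack records kept openers as well as dropped ones, so a well-formed tag nested inside a dropped one still pairs with its own close.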