abr_sdk.tts_preprocess#

Text preprocessing functions for TTS.

Module Contents#

Classes#

NumberToWords

Convert numbers to English words.

ItoNormalization

Number normalization for TTS.

Functions#

unicode_normalization

Normalize unicode characters.

remove_special_characters

Drop characters outside _SPOKEN_CHARS.

spell_out_acronyms

Spell out acronyms for better pronunciation.

expand_abbreviations

Expand common abbreviations to words.

_fraction_denominator_word

Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).

_replace_slash_number

normalize_slash_numbers

Convert slash-joined numbers to words.

collapse_repeated_punctuation

Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).

collapse_whitespace

Collapse runs of spaces and tabs to a single space.

add_final_period

Ensure text ends with a period.

final_cleanup

Trim and collapse repeated whitespace to single spaces.

_americanize_word

Map one word to its American spelling, preserving leading capitalization.

normalize_british_spelling

Map common British spellings to American (e.g. “favourite” -> “favorite”).

_ssml_split_tag

Split a tag like <prosody rate="slow"> into (is_closing, name, attrs).

_ssml_tag_error

Check one SSML tag for structural problems.

find_unpreprocessed_chars

Return the spoken characters that tts_preprocess would have removed.

tts_preprocess

Preprocess text for TTS synthesis.

Data#

API#

class abr_sdk.tts_preprocess.NumberToWords#

Convert numbers to English words.

Based on the JavaScript implementation in text-preprocessor.js.

ONES: ClassVar#

[‘’, ‘one’, ‘two’, ‘three’, ‘four’, ‘five’, ‘six’, ‘seven’, ‘eight’, ‘nine’]

TEENS: ClassVar#

[‘ten’, ‘eleven’, ‘twelve’, ‘thirteen’, ‘fourteen’, ‘fifteen’, ‘sixteen’, ‘seventeen’, ‘eighteen’, ‘…

TENS: ClassVar#

[‘’, ‘’, ‘twenty’, ‘thirty’, ‘forty’, ‘fifty’, ‘sixty’, ‘seventy’, ‘eighty’, ‘ninety’]

_ORDINAL_WORD: ClassVar#

None

SCALES: ClassVar#

[‘’, ‘thousand’, ‘million’, ‘billion’, ‘trillion’, ‘quadrillion’, ‘quintillion’, ‘sextillion’, ‘sept…

DIGIT_FALLBACK_MIN: ClassVar#

None

_convert_hundreds(num: int) str#

Convert a number (0-999) to words.

_ordinalize_word(word: str) str#

Turn one cardinal word into its ordinal form.

Applied to the final word of a cardinal reading, which is the only word an English ordinal inflects (“forty two” -> “forty second”, “one hundred” -> “one hundredth”, “two thousand” -> “two thousandth”).
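The last-word rule can be sketched as follows. This is an illustration only, not the module's code: the `IRREGULAR` table and both function names are assumptions, since the contents of `_ORDINAL_WORD` are not shown in this reference.

```python
# Hypothetical irregular forms; the real _ORDINAL_WORD mapping is not shown here.
IRREGULAR = {
    "one": "first", "two": "second", "three": "third", "five": "fifth",
    "eight": "eighth", "nine": "ninth", "twelve": "twelfth",
}


def ordinalize_word(word: str) -> str:
    """Turn one cardinal word into its ordinal form."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("y"):          # twenty -> twentieth
        return word[:-1] + "ieth"
    return word + "th"              # hundred -> hundredth


def ordinalize_phrase(cardinal: str) -> str:
    """Inflect only the final word of a space-separated cardinal reading."""
    head, _, last = cardinal.rpartition(" ")
    tail = ordinalize_word(last)
    return f"{head} {tail}" if head else tail


print(ordinalize_phrase("forty two"))     # forty second
print(ordinalize_phrase("two thousand"))  # two thousandth
```

Only the last word changes, so the cardinal reading can be produced first and inflected afterwards.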

_spell_digits(digits: str) str#

Read a digit string one digit at a time (e.g. “905” -> “nine zero five”).
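A minimal sketch of digit-by-digit reading, assuming a plain lookup table. `DIGIT_NAMES` and the function name are illustrative, not the module's own.

```python
# Illustrative table; index i holds the spoken name of digit i.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(digits: str) -> str:
    """Read each digit separately, joined by single spaces."""
    return " ".join(DIGIT_NAMES[int(ch)] for ch in digits)


print(spell_digits("905"))  # nine zero five
```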

_grouped_words(num: int) list[str]#

Cardinal words for the thousands-groups of a positive number, high to low.

Splitting on base 1000 keeps every group in 0-999, which is exactly what _convert_hundreds accepts, so no input magnitude can push a group out of range. Groups that are zero contribute nothing.
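The base-1000 split described above might look like the sketch below. The word tables are shortened stand-ins written for this example (the sketch's `SCALES` stops at "billion", so it only covers values below 10^12), and the function bodies are assumptions rather than the module's implementation.

```python
# Stand-in tables for this sketch only.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]


def convert_hundreds(num: int) -> str:
    """Words for a number in 0-999 (empty string for 0)."""
    words = []
    hundreds, rest = divmod(num, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if 10 <= rest < 20:
        words.append(TEENS[rest - 10])
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    return " ".join(words)


def grouped_words(num: int) -> list[str]:
    """Cardinal words per thousands-group of a positive number, high to low."""
    groups = []
    while num > 0:
        num, group = divmod(num, 1000)  # every group is in 0-999
        groups.append(group)
    words = []
    for index in range(len(groups) - 1, -1, -1):
        if groups[index] == 0:
            continue                    # zero groups contribute nothing
        text = convert_hundreds(groups[index])
        if SCALES[index]:
            text += " " + SCALES[index]
        words.append(text)
    return words


print(grouped_words(1_234_567))
```

Because `divmod(num, 1000)` always yields a remainder below 1000, `convert_hundreds` never sees an out-of-range value regardless of the input's size.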

to_cardinal(num: int | str) str#

Convert integer to cardinal words.

to_year(num: int | str) str#

Read a 4-digit integer as a spoken year, not a plain cardinal.

to_ordinal(num: int | str) str#

Convert integer to ordinal words.

to_decimal(num: float | str) str#

Convert a decimal number to words, reading the digits after the decimal point one at a time.

to_currency(num: float | str) str#

Convert currency to words (USD).

class abr_sdk.tts_preprocess.ItoNormalization#

Number normalization for TTS.

Adapted from Keith Ito’s Tacotron preprocessing: https://github.com/keithito/tacotron/blob/master/text/numbers.py

Initialization

_remove_commas(match: re.Match) str#
_convert_currency(match: re.Match) str#
_convert_decimal(match: re.Match) str#
_convert_ordinal(match: re.Match) str#
_convert_cardinal(match: re.Match) str#
__call__(text: str) str#

Apply all number-normalization substitutions to the text.

abr_sdk.tts_preprocess.unicode_normalization(text: str) str#

Normalize unicode characters.

Handles accents, dashes, ellipses, parentheses, and other special punctuation for better TTS pronunciation.

abr_sdk.tts_preprocess._SPOKEN_CHARS#

‘frozenset(…)’

abr_sdk.tts_preprocess.remove_special_characters(text: str) str#

Drop characters outside _SPOKEN_CHARS.

A “/” first becomes a space so a slash-joined pair reads as two words (“and/or” -> “and or”) instead of running together.
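The slash-then-filter order can be sketched like this. `SPOKEN_CHARS` below is a hypothetical stand-in, because the actual contents of `_SPOKEN_CHARS` are not shown in this reference.

```python
import string

# Hypothetical allow-list; the real _SPOKEN_CHARS may differ.
SPOKEN_CHARS = frozenset(string.ascii_letters + string.digits + " .,?!'-")


def remove_special_characters(text: str) -> str:
    """Turn "/" into a space, then drop anything outside the allow-list."""
    text = text.replace("/", " ")
    return "".join(ch for ch in text if ch in SPOKEN_CHARS)


print(remove_special_characters("and/or"))  # and or
```

Replacing the slash before filtering matters: filtering first would delete it and fuse the two words into "andor".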

abr_sdk.tts_preprocess.spell_out_acronyms(text: str) str#

Spell out acronyms for better pronunciation.

Examples: “VUI” -> “V U I”, “VUIs” -> “V U I zz”, “API” -> “ay P I”
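A sketch that reproduces the three examples above. The regular expression, the `LETTER_OVERRIDES` table, and the handling of the plural "s" are guesses made to match those examples; the module's actual rules are not documented here and may cover more cases.

```python
import re

# Assumption: "A" is respelled so it is not read as the article "a".
LETTER_OVERRIDES = {"A": "ay"}

# Two or more capitals, optionally followed by a plural "s".
ACRONYM_RE = re.compile(r"\b([A-Z]{2,})(s?)\b")


def spell_out_acronyms(text: str) -> str:
    def replace(match: re.Match) -> str:
        letters = " ".join(LETTER_OVERRIDES.get(ch, ch) for ch in match.group(1))
        return letters + (" zz" if match.group(2) else "")

    return ACRONYM_RE.sub(replace, text)


print(spell_out_acronyms("The API works"))  # The ay P I works
```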

abr_sdk.tts_preprocess.expand_abbreviations(text: str) str#

Expand common abbreviations to words.

abr_sdk.tts_preprocess._MONTHS#

[‘January’, ‘February’, ‘March’, ‘April’, ‘May’, ‘June’, ‘July’, ‘August’, ‘September’, ‘October’, ‘…

abr_sdk.tts_preprocess._NUM2WORDS#

‘NumberToWords(…)’

abr_sdk.tts_preprocess._FRACTION_DENOMINATORS#

None

abr_sdk.tts_preprocess._fraction_denominator_word(denominator: int, *, plural: bool) str#

Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).

abr_sdk.tts_preprocess._SLASH_NUMBER_RE#

‘compile(…)’

abr_sdk.tts_preprocess._replace_slash_number(match: re.Match) str#

abr_sdk.tts_preprocess.normalize_slash_numbers(text: str) str#

Convert slash-joined numbers to words.

A trailing year marks a date (“12/25/2024” -> “December twenty fifth, two thousand twenty four”). A bare “N/M” is read as a fraction (“1/2” -> “one half”), never as a date, so “1/2 a cup” reads as “one half a cup” rather than “January second a cup”.
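The date-versus-fraction decision can be illustrated with a small classifier. This sketch only decides how a token would be read and does not produce the spoken words. The pattern, the `MONTHS` table, and the function name are assumptions for the example, not the module's `_SLASH_NUMBER_RE` or `_MONTHS`.

```python
import re

# Stand-in month table for this sketch.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# N/M with an optional trailing 4-digit year.
SLASH_NUMBER_RE = re.compile(r"(\d{1,2})/(\d{1,2})(?:/(\d{4}))?")


def classify_slash_number(token: str) -> tuple:
    """Return ("date", month, day, year), ("fraction", n, m), or ("other", token)."""
    match = SLASH_NUMBER_RE.fullmatch(token)
    if match is None:
        return ("other", token)
    first, second = int(match.group(1)), int(match.group(2))
    year = match.group(3)
    if year is None:
        return ("fraction", first, second)   # bare N/M is never a date
    if 1 <= first <= 12:
        return ("date", MONTHS[first - 1], second, int(year))
    return ("other", token)


print(classify_slash_number("12/25/2024"))
print(classify_slash_number("1/2"))
```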

abr_sdk.tts_preprocess.collapse_repeated_punctuation(text: str) str#

Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).
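One way to do this is a backreference that matches a mark followed by copies of itself. The sketch below covers only "!" and "?", the two marks in the example; which other marks the module collapses is not stated here.

```python
import re

# A "!" or "?" followed by one or more copies of the same character.
REPEATED_RE = re.compile(r"([!?])\1+")


def collapse_repeated_punctuation(text: str) -> str:
    return REPEATED_RE.sub(r"\1", text)


print(collapse_repeated_punctuation("Wow!!!! Really???"))  # Wow! Really?
```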

abr_sdk.tts_preprocess.collapse_whitespace(text: str) str#

Collapse runs of spaces and tabs to a single space.

abr_sdk.tts_preprocess.add_final_period(text: str) str#

Ensure text ends with a period.

abr_sdk.tts_preprocess.final_cleanup(text: str) str#

Trim and collapse repeated whitespace to single spaces.

abr_sdk.tts_preprocess._BRITISH_TO_AMERICAN#

None

abr_sdk.tts_preprocess._americanize_word(word: str) str#

Map one word to its American spelling, preserving leading capitalization.
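A minimal sketch of the lookup with capitalization carried over. The two-entry `BRITISH_TO_AMERICAN` table is a hypothetical subset (the real `_BRITISH_TO_AMERICAN` contents are not shown), and the word-matching pattern is an assumption.

```python
import re

# Hypothetical subset of the spelling map.
BRITISH_TO_AMERICAN = {"favourite": "favorite", "colour": "color"}


def americanize_word(word: str) -> str:
    """Look the word up in lowercase, then restore a leading capital."""
    replacement = BRITISH_TO_AMERICAN.get(word.lower())
    if replacement is None:
        return word
    if word[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def normalize_british_spelling(text: str) -> str:
    """Apply americanize_word to every alphabetic word in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: americanize_word(m.group(0)), text)


print(normalize_british_spelling("My Favourite colour."))  # My Favorite color.
```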

abr_sdk.tts_preprocess.normalize_british_spelling(text: str) str#

Map common British spellings to American (e.g. “favourite” -> “favorite”).

abr_sdk.tts_preprocess._DEFAULT_PIPELINE#

None

abr_sdk.tts_preprocess._SSML_TAG_RE#

‘compile(…)’

abr_sdk.tts_preprocess._SSML_TAGS: dict[str, set[str]]#

None

abr_sdk.tts_preprocess._ssml_split_tag(tag: str) tuple[bool, str, str]#

Split a tag like <prosody rate="slow"> into (is_closing, name, attrs).

The tag always has a name: the matcher that produced it requires one.
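The three-part split can be sketched with plain string operations. This is an illustrative version that assumes the input is one complete tag; the module's own parsing may differ, in particular for self-closing tags.

```python
def ssml_split_tag(tag: str) -> tuple[bool, str, str]:
    """Split "<name attrs>" or "</name>" into (is_closing, name, attrs)."""
    inner = tag.strip()[1:-1].strip()     # drop the angle brackets
    if inner.endswith("/"):               # self-closing, e.g. <break/>
        inner = inner[:-1].rstrip()
    is_closing = inner.startswith("/")
    if is_closing:
        inner = inner[1:].lstrip()
    parts = inner.split(None, 1)          # name, then everything after it
    name = parts[0]
    attrs = parts[1].strip() if len(parts) > 1 else ""
    return is_closing, name, attrs


print(ssml_split_tag('<prosody rate="slow">'))  # (False, 'prosody', 'rate="slow"')
print(ssml_split_tag("</speak>"))               # (True, 'speak', '')
```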

abr_sdk.tts_preprocess._ssml_tag_error(tag: str) str | None#

Check one SSML tag for structural problems.

Returns a human-readable reason if the tag should be dropped (unknown tag, unknown attribute, or an unquoted/unterminated attribute value), or None if it is well-formed and known. Attribute values must be wrapped in matching quotes, the same rule libexpat enforces; rate=fast is rejected here so that it never reaches the backend, where it would halt playback.
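The three checks can be sketched as below. For brevity this version takes an already-split name and attribute string rather than the raw tag, and its `SSML_TAGS` table is a hypothetical allow-list; the real `_SSML_TAGS` contents and error wording are not shown in this reference.

```python
import re
from typing import Optional

# Hypothetical allow-list: tag name -> permitted attribute names.
SSML_TAGS: dict[str, set[str]] = {
    "speak": set(),
    "prosody": {"rate", "pitch", "volume"},
    "break": {"time"},
}

# One attribute: name = value, with the value in matching quotes.
ATTR_RE = re.compile(r"""\s*([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')""")


def ssml_tag_error(name: str, attrs: str) -> Optional[str]:
    """Return a reason to drop the tag, or None if it looks well-formed."""
    if name not in SSML_TAGS:
        return f"unknown tag <{name}>"
    pos = 0
    while attrs[pos:].strip():
        match = ATTR_RE.match(attrs, pos)
        if match is None:                 # unquoted or unterminated value
            return f"malformed attribute in <{name}>: {attrs[pos:].strip()!r}"
        if match.group(1) not in SSML_TAGS[name]:
            return f"unknown attribute {match.group(1)!r} on <{name}>"
        pos = match.end()
    return None


print(ssml_tag_error("prosody", 'rate="slow"'))  # None
print(ssml_tag_error("prosody", "rate=fast"))
```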

abr_sdk.tts_preprocess.find_unpreprocessed_chars(text: str) set[str]#

Return the spoken characters that tts_preprocess would have removed.

Inspects only the runs between SSML tags, so tag syntax (=, quotes, digits in attribute values) is not flagged; an empty result means the text looks preprocessed. Assumes whole tags: if a tag is split across streamed chunks, its internals may be flagged.
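A sketch of the check, assuming a simple tag pattern and a hypothetical allow-list of characters that survive preprocessing. Neither is the module's `_SSML_TAG_RE` or `_SPOKEN_CHARS`.

```python
import re
import string

# Assumed tag pattern and allow-list for this example only.
SSML_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
SPOKEN_CHARS = frozenset(string.ascii_letters + " .,?!'")


def find_unpreprocessed_chars(text: str) -> set[str]:
    """Collect characters outside the allow-list, ignoring tag internals."""
    flagged: set[str] = set()
    for run in SSML_TAG_RE.split(text):   # runs between tags only
        flagged.update(ch for ch in run if ch not in SPOKEN_CHARS)
    return flagged


print(find_unpreprocessed_chars('<prosody rate="80%">hello there.</prosody>'))  # set()
print(sorted(find_unpreprocessed_chars("call 911 now")))  # ['1', '9']
```

A caller could use an empty result to skip a second preprocessing pass on text that has already been normalized.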

abr_sdk.tts_preprocess.tts_preprocess(text: str, *, pipeline: list[collections.abc.Callable] | None = None) str#

Preprocess text for TTS synthesis.

Normalizes raw text into a form the TTS phoneme dictionary can pronounce: spells out acronyms, turns numbers/dates/currency into words, strips accents and fancy punctuation, and expands abbreviations. Sentence-ending punctuation the caller wrote (. ? !) is kept, because the model renders each mark with distinct prosody.

Well-formed SSML markup (<prosody>, </speak>, <abr:emotion>, …) passes through untouched: the input is split on tags and only the spoken runs between them are normalized. Normalizing the whole string would strip the = " / and attribute values the tags need to stay well-formed.

Structurally malformed tags (unquoted value like rate=fast, unknown tag, unknown attribute) are dropped with a warning printed to stdout, keeping the surrounding text. A single malformed tag would otherwise abort the whole utterance inside the backend’s XML parser. When a malformed opening tag is dropped, its matching close (tracked by a LIFO name stack) is dropped too, so nesting stays balanced; an unmatched closing tag is left alone, since in a streamed push its opening tag may have arrived in an earlier call.

Tag-free text gets a trailing . when it has no sentence-ending punctuation, so libtfs has a terminal to flush on. Tagged input gets no such synthetic terminator: its runs may be one sentence split across tags, or only part of a sentence in a streaming push, so a period between them would be wrong; the caller’s own punctuation, a </speak>, or the flush byte ends it.
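The tag-aware flow described above can be sketched end to end. Everything in this block is an assumption made for illustration: the tag pattern, the `KNOWN_TAGS` set, the stand-in `normalize_run` (which only collapses spaces), and the stack bookkeeping. It also checks tag names only, not attribute names, and it is not the module's implementation.

```python
import re

TAG_RE = re.compile(r"(</?[A-Za-z][^<>]*>)")       # capture group keeps tags in split()
KNOWN_TAGS = {"speak", "prosody", "break"}          # hypothetical allow-list
ATTRS_RE = re.compile(r"""(\s*[\w:.-]+\s*=\s*("[^"]*"|'[^']*'))*\s*""")


def normalize_run(run: str) -> str:
    """Stand-in for the real pipeline: collapse spaces and tabs."""
    return re.sub(r"[ \t]+", " ", run)


def split_tag(tag: str) -> tuple[bool, str, str]:
    inner = tag[1:-1].strip()
    closing = inner.startswith("/")
    name, _, attrs = inner.strip("/").strip().partition(" ")
    return closing, name, attrs.strip()


def preprocess(text: str) -> str:
    pieces = TAG_RE.split(text)
    if len(pieces) == 1:                            # tag-free text
        out = normalize_run(text)
        if out and not out.endswith((".", "?", "!")):
            out += "."                              # synthetic terminator
        return out
    result = []
    open_tags: list[tuple[str, bool]] = []          # LIFO stack of (name, kept)
    for index, piece in enumerate(pieces):
        if index % 2 == 0:                          # spoken run between tags
            result.append(normalize_run(piece))
            continue
        closing, name, attrs = split_tag(piece)
        if closing:
            if open_tags and open_tags[-1][0] == name:
                _, kept = open_tags.pop()
                if kept:
                    result.append(piece)            # close of a kept opening tag
            else:
                result.append(piece)                # unmatched close is left alone
            continue
        ok = name in KNOWN_TAGS and ATTRS_RE.fullmatch(attrs) is not None
        if not ok:
            print(f"warning: dropping malformed tag {piece}")
        if not piece.endswith("/>"):
            open_tags.append((name, ok))            # remember whether it was kept
        if ok:
            result.append(piece)
    return "".join(result)


print(preprocess("hello   world"))                          # hello world.
print(preprocess("<prosody rate=fast>hi</prosody> there"))  # hi there
```

Recording whether each opening tag was kept lets the matching close be dropped with it, which keeps nesting balanced without re-parsing the output.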