abr_sdk.tts_preprocess

Text preprocessing functions for TTS.

Module Contents

Classes

`NumberToWords`	Convert numbers to English words.
`ItoNormalization`	Number normalization for TTS.

Functions

`unicode_normalization`	Normalize unicode characters.
`remove_special_characters`	Drop characters outside _SPOKEN_CHARS.
`spell_out_acronyms`	Spell out acronyms for better pronunciation.
`expand_abbreviations`	Expand common abbreviations to words.
`_fraction_denominator_word`	Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).
`_replace_slash_number`
`normalize_slash_numbers`	Convert slash-joined numbers to words.
`collapse_repeated_punctuation`	Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).
`collapse_whitespace`	Collapse multiple spaces/tabs to single space.
`add_final_period`	Ensure text ends with a period.
`final_cleanup`	Trim and collapse repeated whitespace to single spaces.
`_americanize_word`	Map one word to its American spelling, preserving leading capitalization.
`normalize_british_spelling`	Map common British spellings to American (e.g. “favourite” -> “favorite”).
`_ssml_split_tag`	Split a tag like `<prosody rate="slow">` into (is_closing, name, attrs).
`_ssml_tag_error`	Check one SSML tag for structural problems.
`find_unpreprocessed_chars`	Return the spoken characters that tts_preprocess would have removed.
`tts_preprocess`	Preprocess text for TTS synthesis.

Data

`_SPOKEN_CHARS`
`_MONTHS`
`_NUM2WORDS`
`_FRACTION_DENOMINATORS`
`_SLASH_NUMBER_RE`
`_BRITISH_TO_AMERICAN`
`_DEFAULT_PIPELINE`
`_SSML_TAG_RE`
`_SSML_TAGS`

API

class abr_sdk.tts_preprocess.NumberToWords

Convert numbers to English words.

Based on the JavaScript implementation in text-preprocessor.js.

ONES: ClassVar = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']

TEENS: ClassVar = ['ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', '…

TENS: ClassVar = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

_ORDINAL_WORD: ClassVar = None

SCALES: ClassVar = ['', 'thousand', 'million', 'billion', 'trillion', 'quadrillion', 'quintillion', 'sextillion', 'sept…

DIGIT_FALLBACK_MIN: ClassVar = None

_convert_hundreds(num: int) → str

Convert a number (0-999) to words.

_ordinalize_word(word: str) → str

Turn one cardinal word into its ordinal form.

Applied to the final word of a cardinal reading, which is the only word an English ordinal inflects (“forty two” -> “forty second”, “one hundred” -> “one hundredth”, “two thousand” -> “two thousandth”).
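The last-word rule can be sketched as follows. This is a hypothetical illustration, not the SDK's source: `ordinalize_word`, `cardinal_to_ordinal`, and the `_IRREGULAR` table are assumed names, and the irregular forms are simply the standard English ones.

```python
# Hypothetical sketch of last-word ordinalization; not the abr_sdk implementation.
_IRREGULAR = {
    "one": "first", "two": "second", "three": "third", "five": "fifth",
    "eight": "eighth", "nine": "ninth", "twelve": "twelfth",
}


def ordinalize_word(word: str) -> str:
    """Turn one cardinal word into its ordinal form."""
    if word in _IRREGULAR:
        return _IRREGULAR[word]
    if word.endswith("y"):        # twenty -> twentieth
        return word[:-1] + "ieth"
    return word + "th"            # four -> fourth, hundred -> hundredth


def cardinal_to_ordinal(cardinal: str) -> str:
    """Ordinalize a multi-word cardinal by inflecting only its final word."""
    *head, last = cardinal.split()
    return " ".join([*head, ordinalize_word(last)])


print(cardinal_to_ordinal("forty two"))     # forty second
print(cardinal_to_ordinal("two thousand"))  # two thousandth
```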

_spell_digits(digits: str) → str

Read a digit string one digit at a time (e.g. “905” -> “nine zero five”).

_grouped_words(num: int) → list[str]

Cardinal words for the thousands-groups of a positive number, high to low.

Splitting on base 1000 keeps every group in 0-999, which is exactly what _convert_hundreds accepts, so no input magnitude can push a group out of range. Groups that are zero contribute nothing.
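A minimal sketch of the base-1000 split, assuming word tables shaped like the `ONES`, `TEENS`, `TENS`, and `SCALES` class variables (with `SCALES` cut short here). `convert_hundreds` and `grouped_words` are hypothetical stand-ins for the private methods, not their actual code.

```python
# Hypothetical sketch of thousands-group conversion; not the abr_sdk source.
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]  # truncated for the sketch


def convert_hundreds(num: int) -> str:
    """Convert 0-999 to words ('' for 0)."""
    words = []
    hundreds, rest = divmod(num, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if 10 <= rest < 20:
        words.append(TEENS[rest - 10])
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    return " ".join(words)


def grouped_words(num: int) -> list[str]:
    """Words for each nonzero thousands-group of num > 0, high to low."""
    groups = []
    scale = 0
    while num > 0:
        num, group = divmod(num, 1000)  # group is always in 0-999
        if group:                        # zero groups contribute nothing
            words = convert_hundreds(group)
            if SCALES[scale]:
                words += " " + SCALES[scale]
            groups.append(words)
        scale += 1
    return groups[::-1]


print(grouped_words(1_002_015))  # ['one million', 'two thousand', 'fifteen']
```

Because `divmod(num, 1000)` can only ever yield a remainder in 0-999, `convert_hundreds` never sees an out-of-range group, which is the property the docstring relies on.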

to_cardinal(num: int | str) → str

Convert an integer to cardinal words.

to_year(num: int | str) → str

Read a 4-digit integer as a spoken year rather than a plain cardinal.

to_ordinal(num: int | str) → str

Convert an integer to ordinal words.

to_decimal(num: float | str) → str

Convert a decimal number to words, reading out the digits after the decimal point.

to_currency(num: float | str) → str

Convert a currency amount (USD) to words.

class abr_sdk.tts_preprocess.ItoNormalization

Number normalization for TTS.

Adapted from Keith Ito’s Tacotron preprocessing: https://github.com/keithito/tacotron/blob/master/text/numbers.py


_remove_commas(match: re.Match) → str

_convert_currency(match: re.Match) → str

_convert_decimal(match: re.Match) → str

_convert_ordinal(match: re.Match) → str

_convert_cardinal(match: re.Match) → str

__call__(text: str) → str

Apply all number-normalization substitutions to the text.
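The ordered substitution chain can be sketched as below, modeled on the `normalize_numbers` function in Ito's `numbers.py` (commas, then decimals, then ordinals, then bare cardinals; currency omitted). Whether the SDK keeps exactly this order is an assumption. `normalize_numbers` here and its `cardinal`/`ordinal` callbacks are hypothetical; in the SDK the word conversions would presumably come from `NumberToWords`.

```python
import re
from typing import Callable

# Patterns modeled on Keith Ito's tacotron text/numbers.py; the SDK's exact
# patterns and ordering are assumptions. Currency handling is omitted.
_COMMA_NUMBER_RE = re.compile(r"([0-9][0-9,]+[0-9])")
_DECIMAL_NUMBER_RE = re.compile(r"([0-9]+\.[0-9]+)")
_ORDINAL_RE = re.compile(r"([0-9]+)(st|nd|rd|th)\b")
_NUMBER_RE = re.compile(r"[0-9]+")


def normalize_numbers(text: str,
                      cardinal: Callable[[str], str],
                      ordinal: Callable[[str], str]) -> str:
    # 1. "1,234" -> "1234" so later patterns see one number, not two.
    text = _COMMA_NUMBER_RE.sub(lambda m: m.group(1).replace(",", ""), text)
    # 2. "3.5" -> "3 point 5"; each side becomes a cardinal in step 4.
    text = _DECIMAL_NUMBER_RE.sub(lambda m: m.group(1).replace(".", " point "), text)
    # 3. "21st" -> ordinal words before the bare-number pass can claim "21".
    text = _ORDINAL_RE.sub(lambda m: ordinal(m.group(1)), text)
    # 4. Any remaining digit run is read as a cardinal.
    return _NUMBER_RE.sub(lambda m: cardinal(m.group(0)), text)
```

With placeholder converters, `normalize_numbers("1,234.5 on the 21st", cardinal=lambda s: f"<{s}>", ordinal=lambda s: "twenty first")` yields `"<1234> point <5> on the twenty first"`, which makes the effect of the ordering visible.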

abr_sdk.tts_preprocess.unicode_normalization(text: str) → str

Normalize unicode characters.

Handles accents, dashes, ellipsis, parentheses, and other special punctuation for better TTS pronunciation.
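A rough standard-library sketch of this kind of normalization. The replacement table `_PUNCT_MAP` and the function name are hypothetical (the SDK's handling of parentheses, for example, is not documented here); accent stripping uses NFKD decomposition followed by dropping combining marks.

```python
import unicodedata

# Hypothetical replacement table; the SDK's real mapping is not shown here.
_PUNCT_MAP = str.maketrans({
    "\u2013": "-",                  # en dash
    "\u2014": "-",                  # em dash
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2026": "...",                # horizontal ellipsis
})


def unicode_normalization_sketch(text: str) -> str:
    text = text.translate(_PUNCT_MAP)
    # NFKD splits "é" into "e" + U+0301; dropping combining marks strips the accent.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(unicode_normalization_sketch("Café\u2014naïve\u2026"))  # Cafe-naive...
```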

abr_sdk.tts_preprocess._SPOKEN_CHARS = frozenset(…)

abr_sdk.tts_preprocess.remove_special_characters(text: str) → str

Drop characters outside _SPOKEN_CHARS.

A “/” first becomes a space so a slash-joined pair reads as two words (“and/or” -> “and or”) instead of running together.
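A minimal sketch of the slash-then-filter behavior, assuming an allow-list of ASCII letters, space, and basic punctuation. The real `_SPOKEN_CHARS` is elided above, so `SPOKEN_CHARS` and the function name are hypothetical; digits are left out on the assumption that numbers have already been spelled out by this stage.

```python
import string

# Hypothetical allow-list; the real _SPOKEN_CHARS frozenset is elided in this reference.
SPOKEN_CHARS = frozenset(string.ascii_letters + " .,?!'-")


def remove_special_characters_sketch(text: str) -> str:
    # Turn "/" into a space first so "and/or" reads as two words.
    text = text.replace("/", " ")
    return "".join(ch for ch in text if ch in SPOKEN_CHARS)


print(remove_special_characters_sketch("and/or #tags"))  # and or tags
```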

abr_sdk.tts_preprocess.spell_out_acronyms(text: str) → str

Spell out acronyms for better pronunciation.

Examples: “VUI” -> “V U I”, “VUIs” -> “V U I zz”, “API” -> “ay P I”
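The three examples above can be reproduced with a regex over runs of two or more capitals. This is a hypothetical sketch: the minimum run length, the `A` -> `ay` respelling (assumed to keep a lone "A" from being read as the article), and the plural `zz` handling are inferred from the examples, not taken from the SDK.

```python
import re

# Hypothetical sketch; matching rules are inferred from the documented examples.
_ACRONYM_RE = re.compile(r"\b([A-Z]{2,})(s?)\b")
_LETTER_RESPELLINGS = {"A": "ay"}  # assumed purpose: avoid the article reading of "A"


def spell_out_acronyms_sketch(text: str) -> str:
    def repl(m: re.Match) -> str:
        letters = [_LETTER_RESPELLINGS.get(ch, ch) for ch in m.group(1)]
        if m.group(2):          # plural "s" -> "zz", as in "VUIs" -> "V U I zz"
            letters.append("zz")
        return " ".join(letters)

    return _ACRONYM_RE.sub(repl, text)


for word in ["VUI", "VUIs", "API"]:
    print(word, "->", spell_out_acronyms_sketch(word))
```

A real implementation would also need to decide what to do with all-caps words that are not acronyms (for example shouted text); this sketch spells them out too.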

abr_sdk.tts_preprocess.expand_abbreviations(text: str) → str

Expand common abbreviations to words.

abr_sdk.tts_preprocess._MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', '…

abr_sdk.tts_preprocess._NUM2WORDS = NumberToWords(…)

abr_sdk.tts_preprocess._FRACTION_DENOMINATORS = None

abr_sdk.tts_preprocess._fraction_denominator_word(denominator: int, *, plural: bool) → str

Spell a fraction denominator (e.g. 4 -> “quarter”, 8 plural -> “eighths”).
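A sketch of the denominator lookup, assuming special-cased words for halves and quarters and plain ordinals otherwise. The `_SPECIAL` and `_ORDINALS` tables are hypothetical stand-ins, since the contents of `_FRACTION_DENOMINATORS` are not shown.

```python
# Hypothetical tables; the SDK's _FRACTION_DENOMINATORS contents are not shown here.
_SPECIAL = {2: ("half", "halves"), 4: ("quarter", "quarters")}
_ORDINALS = {3: "third", 5: "fifth", 6: "sixth", 7: "seventh", 8: "eighth",
             9: "ninth", 10: "tenth"}


def fraction_denominator_word(denominator: int, *, plural: bool) -> str:
    if denominator in _SPECIAL:
        singular, plural_form = _SPECIAL[denominator]
        return plural_form if plural else singular
    # Raises KeyError above 10; a fuller version might fall back to an ordinal converter.
    word = _ORDINALS[denominator]
    return word + "s" if plural else word


print(fraction_denominator_word(4, plural=False))  # quarter
print(fraction_denominator_word(8, plural=True))   # eighths
```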

abr_sdk.tts_preprocess._SLASH_NUMBER_RE = compile(…)

abr_sdk.tts_preprocess._replace_slash_number(match: re.Match) → str

abr_sdk.tts_preprocess.normalize_slash_numbers(text: str) → str

Convert slash-joined numbers to words.

A trailing year marks a date (“12/25/2024” -> “December twenty fifth, two thousand twenty four”). A bare “N/M” is read as a fraction (“1/2” -> “one half”), never as a date, so “1/2 a cup” reads as “one half a cup” rather than “January second a cup”.
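The date-versus-fraction decision can be sketched with one regex whose year group is optional. `SLASH_RE` and `classify_slash_number` are hypothetical (the real `_SLASH_NUMBER_RE` is elided), the sketch stops at classification rather than producing words, and month numbers are not range-checked.

```python
import re

# Hypothetical pattern: M/D with an optional /YYYY.
SLASH_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})(?:/(\d{4}))?\b")
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def classify_slash_number(match: re.Match) -> tuple:
    first, second, year = match.group(1), match.group(2), match.group(3)
    if year:  # a trailing 4-digit year marks a date (M/D/YYYY)
        return ("date", MONTHS[int(first) - 1], int(second), int(year))
    return ("fraction", int(first), int(second))  # a bare N/M is never a date


for text in ["12/25/2024", "1/2 a cup"]:
    print(classify_slash_number(SLASH_RE.search(text)))
# ('date', 'December', 25, 2024)
# ('fraction', 1, 2)
```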

abr_sdk.tts_preprocess.collapse_repeated_punctuation(text: str) → str

Collapse runs of repeated punctuation (e.g. “!!!!” -> “!”, “???” -> “?”).

abr_sdk.tts_preprocess.collapse_whitespace(text: str) → str

Collapse runs of spaces and tabs to a single space.

abr_sdk.tts_preprocess.add_final_period(text: str) → str

Ensure the text ends with a period.

abr_sdk.tts_preprocess.final_cleanup(text: str) → str

Trim the text and collapse repeated whitespace to single spaces.
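These four cleanup steps are small enough to sketch with `re` and `str` methods. The `_sketch` suffix marks them as illustrations rather than the SDK's code; in particular, which punctuation gets collapsed and whether `?` or `!` already count as an ending for `add_final_period` are assumptions.

```python
import re


def collapse_repeated_punctuation_sketch(text: str) -> str:
    # Assumed character set; periods are left out so an ellipsis "..." survives.
    return re.sub(r"([!?,;:])\1+", r"\1", text)


def collapse_whitespace_sketch(text: str) -> str:
    return re.sub(r"[ \t]+", " ", text)


def add_final_period_sketch(text: str) -> str:
    # Assumption: existing sentence-ending punctuation already counts as a terminator.
    text = text.rstrip()
    if not text or text.endswith((".", "?", "!")):
        return text
    return text + "."


def final_cleanup_sketch(text: str) -> str:
    # str.split() with no argument splits on any whitespace and drops the ends.
    return " ".join(text.split())


raw = "  Wow!!!   that   works  "
print(add_final_period_sketch(final_cleanup_sketch(collapse_repeated_punctuation_sketch(raw))))
# Wow! that works.
```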

abr_sdk.tts_preprocess._BRITISH_TO_AMERICAN = None

abr_sdk.tts_preprocess._americanize_word(word: str) → str

Map one word to its American spelling, preserving leading capitalization.

abr_sdk.tts_preprocess.normalize_british_spelling(text: str) → str

Map common British spellings to American (e.g. “favourite” -> “favorite”).
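A sketch of word-by-word mapping with leading-capital preservation. `BRITISH_TO_AMERICAN` here is a tiny hypothetical subset, not the SDK's `_BRITISH_TO_AMERICAN` table, and both function names are illustrative.

```python
import re

# Tiny hypothetical subset; the real _BRITISH_TO_AMERICAN table is not shown here.
BRITISH_TO_AMERICAN = {"favourite": "favorite", "colour": "color", "centre": "center"}


def americanize_word(word: str) -> str:
    american = BRITISH_TO_AMERICAN.get(word.lower())
    if american is None:
        return word
    # Preserve leading capitalization: "Colour" -> "Color".
    return american.capitalize() if word[0].isupper() else american


def normalize_british_spelling_sketch(text: str) -> str:
    return re.sub(r"[A-Za-z]+", lambda m: americanize_word(m.group(0)), text)


print(normalize_british_spelling_sketch("Favourite colour: centre."))  # Favorite color: center.
```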

abr_sdk.tts_preprocess._DEFAULT_PIPELINE = None

abr_sdk.tts_preprocess._SSML_TAG_RE = compile(…)

abr_sdk.tts_preprocess._SSML_TAGS: dict[str, set[str]] = None

abr_sdk.tts_preprocess._ssml_split_tag(tag: str) → tuple[bool, str, str]

Split a tag like `<prosody rate="slow">` into (is_closing, name, attrs).

The tag always has a name: the matcher that produced it requires one.
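One way to split a tag, assuming the whole tag text is already in hand. `_TAG_PARTS_RE` and `ssml_split_tag` are hypothetical; the pattern requires a name (matching the guarantee above), accepts namespaced names such as `abr:emotion`, and keeps the trailing `/` of a self-closing tag out of `attrs`.

```python
import re

# Hypothetical parser: optional "/", a required name, then the raw attribute text.
_TAG_PARTS_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w:.-]*)(.*?)/?\s*>", re.DOTALL)


def ssml_split_tag(tag: str) -> tuple[bool, str, str]:
    m = _TAG_PARTS_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a tag: {tag!r}")
    return bool(m.group(1)), m.group(2), m.group(3).strip()


print(ssml_split_tag('<prosody rate="slow">'))  # (False, 'prosody', 'rate="slow"')
print(ssml_split_tag("</speak>"))               # (True, 'speak', '')
```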

abr_sdk.tts_preprocess._ssml_tag_error(tag: str) → str | None

Check one SSML tag for structural problems.

Returns a human-readable reason if the tag should be dropped (unknown tag, unknown attribute, or an unquoted/unterminated attribute value), or None if it is well-formed and known. Attribute values must be wrapped in matching quotes, the same rule libexpat enforces; `rate=fast` is rejected here so that it never reaches the backend, where it would halt playback.
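A sketch of these checks, assuming an allow-list shaped like `_SSML_TAGS` (tag name mapped to its permitted attributes). The `SSML_TAGS` contents below are hypothetical, as are the function name and the exact reason strings; the quoting rule requires each value to open and close with the same quote character.

```python
import re
from typing import Optional

# Hypothetical allow-list in the shape of _SSML_TAGS; the SDK's real table is not shown.
SSML_TAGS: dict[str, set[str]] = {
    "speak": set(),
    "prosody": {"rate", "pitch", "volume"},
    "break": {"time"},
}
_TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w:.-]*)(.*?)/?\s*>", re.DOTALL)
# name="value" or name='value'; the opening and closing quotes must match.
_ATTR_RE = re.compile(r"""\s*([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')""")


def ssml_tag_error(tag: str) -> Optional[str]:
    m = _TAG_RE.fullmatch(tag)
    if m is None:
        return "not a tag"
    name, attrs = m.group(2), m.group(3).rstrip()
    if name not in SSML_TAGS:
        return f"unknown tag <{name}>"
    pos = 0
    while pos < len(attrs):
        attr = _ATTR_RE.match(attrs, pos)
        if attr is None:  # e.g. rate=fast or rate="fast (no closing quote)
            return f"unquoted, unterminated, or malformed attribute in {tag!r}"
        if attr.group(1) not in SSML_TAGS[name]:
            return f"unknown attribute {attr.group(1)!r} on <{name}>"
        pos = attr.end()
    return None


print(ssml_tag_error('<prosody rate="fast">'))  # None
print(ssml_tag_error("<prosody rate=fast>"))    # reason string: the value is unquoted
```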

abr_sdk.tts_preprocess.find_unpreprocessed_chars(text: str) → set[str]

Return the spoken characters that tts_preprocess would have removed.

Inspects only the runs between SSML tags, so tag syntax (`=`, quotes, digits in attribute values) is not flagged; an empty result means the text looks preprocessed. Assumes whole tags: a tag split across streamed chunks may have its internals flagged.
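A sketch of the run-only inspection. `_TAG_RE` and `SPOKEN_CHARS` are hypothetical stand-ins for `_SSML_TAG_RE` and `_SPOKEN_CHARS`, with digits assumed to be outside the spoken set so that an unconverted number in a run gets flagged while digits inside attribute values do not.

```python
import re
import string

_TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")  # hypothetical stand-in for _SSML_TAG_RE
SPOKEN_CHARS = frozenset(string.ascii_letters + " .,?!'-")  # assumed allow-list


def find_unpreprocessed_chars_sketch(text: str) -> set[str]:
    flagged: set[str] = set()
    for run in _TAG_RE.split(text):  # only the spoken runs between tags
        flagged.update(ch for ch in run if ch not in SPOKEN_CHARS)
    return flagged


print(sorted(find_unpreprocessed_chars_sketch('<prosody rate="slow">Café at 5</prosody>')))
# ['5', 'é']
```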

abr_sdk.tts_preprocess.tts_preprocess(text: str, *, pipeline: list[collections.abc.Callable] | None = None) → str

Preprocess text for TTS synthesis.

Normalizes raw text into a form the TTS phoneme dictionary can pronounce: spells out acronyms, turns numbers/dates/currency into words, strips accents and fancy punctuation, and expands abbreviations. Sentence-ending `.`, `?`, and `!` written by the caller are kept (the model renders each with distinct prosody).

Well-formed SSML markup (`<prosody>`, `</speak>`, `<abr:emotion>`, …) passes through untouched: the input is split on tags, and only the spoken runs between them are normalized. Normalizing the whole string would strip the `=`, `"`, and `/` characters and the attribute values that the tags need to stay well-formed.

Structurally malformed tags (an unquoted value like `rate=fast`, an unknown tag, an unknown attribute) are dropped with a warning printed to stdout, and the surrounding text is kept. Left in place, a single malformed tag would abort the whole utterance inside the backend’s XML parser. When a malformed opening tag is dropped, its matching close (tracked by a LIFO name stack) is dropped too, so nesting stays balanced; an unmatched closing tag is left alone, since in a streamed push its opening tag may have arrived in an earlier call.
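The drop-and-balance behavior described above can be sketched with a stack of `(name, dropped)` pairs. `drop_malformed_tags` is hypothetical and takes the per-tag checker as a parameter (in the SDK that role belongs to `_ssml_tag_error`); self-closing tags are never pushed because no close will follow them.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-in for _SSML_TAG_RE: a tag must start with a name.
# The capture group makes re.split() keep the tags at odd indices.
_TAG_RE = re.compile(r"(</?[A-Za-z][^<>]*>)")
_NAME_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)")


def drop_malformed_tags(text: str, tag_error: Callable[[str], Optional[str]]) -> str:
    out: list[str] = []
    stack: list[tuple[str, bool]] = []  # (open tag name, was it dropped?), innermost last
    for i, piece in enumerate(_TAG_RE.split(text)):
        if i % 2 == 0:  # even indices are the spoken runs between tags
            out.append(piece)
            continue
        is_closing, name = _NAME_RE.match(piece).groups()
        if is_closing:
            if stack and stack[-1][0] == name:
                _, was_dropped = stack.pop()
                if was_dropped:
                    continue  # drop the close of a dropped opener
            out.append(piece)  # close of a kept tag, or unmatched: leave alone
            continue
        reason = tag_error(piece)
        if reason is None:
            out.append(piece)
        else:
            print(f"warning: dropping SSML tag {piece!r}: {reason}")
        if not piece[:-1].rstrip().endswith("/"):  # self-closing tags never get a close
            stack.append((name, reason is not None))
    return "".join(out)


def toy_error(tag: str) -> Optional[str]:
    # Toy checker for the demo: flag unquoted attribute values only.
    return "unquoted attribute value" if re.search(r"=\s*[^\"'\s]", tag) else None


print(drop_malformed_tags("<prosody rate=fast>Hello</prosody> world", toy_error))
```

The demo prints one warning, then `Hello world`: both the bad opener and its close disappear, while the spoken text between them survives.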

Tag-free text gets a trailing `.` when it has no sentence-ending punctuation, so libtfs has a terminator to flush on. Tagged input gets no such synthetic terminator: its runs may be one sentence split across tags, or only part of a sentence in a streaming push, so a period between them would be wrong; the caller’s own punctuation, a `</speak>`, or the flush byte ends it.
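A minimal sketch of the split-normalize-rejoin flow together with the terminator rule. `preprocess_sketch` and its `_TAG_RE` are hypothetical; the real function also drops malformed tags and has a default pipeline (`_DEFAULT_PIPELINE`), neither of which is modeled here.

```python
import re
from typing import Callable, Iterable

_TAG_RE = re.compile(r"(</?[A-Za-z][^<>]*>)")  # hypothetical stand-in for _SSML_TAG_RE


def preprocess_sketch(text: str, pipeline: Iterable[Callable[[str], str]]) -> str:
    steps = list(pipeline)
    pieces = _TAG_RE.split(text)
    for i in range(0, len(pieces), 2):  # even indices: spoken runs; tags stay untouched
        for step in steps:
            pieces[i] = step(pieces[i])
    result = "".join(pieces)
    has_tags = len(pieces) > 1
    if not has_tags and result and not result.endswith((".", "?", "!")):
        result += "."  # only tag-free text gets a synthetic terminator
    return result


steps = [
    lambda s: s.replace("&", "and"),
    lambda s: re.sub(r"[^A-Za-z .?!']", "", s),  # crude filter that would mangle tags
]
print(preprocess_sketch("Salt & pepper", steps))
print(preprocess_sketch('<prosody rate="slow">Salt & pepper</prosody>', steps))
```

The second call shows why splitting matters: the crude filter would strip `<`, `=`, `"`, and `/` from the tag if it ran over the whole string, but applied only to the runs it leaves the markup intact, and no period is appended.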