---
myst:
  html_meta:
    'description': 'Text input requirements for the ABR SDK TTS API: character set, the tts_preprocess preprocessing pipeline, and SSML markup.'
    'keywords': 'tts, text input, preprocessing, tts_preprocess, ssml, normalization'
---

# Input format

The TTS API accepts raw UTF-8 text bytes. This page describes the character set the model
recognizes, the {py:func}`~abr_sdk.tts_preprocess.tts_preprocess` normalization pipeline that
converts raw text into that form, and the SSML markup the model understands.

## Required format

Pass text as UTF-8-encoded `bytes` to {py:meth}`~abr_sdk.tts.Tts.push`. The model's phoneme
dictionary operates on a restricted character set:

| Accepted                         | Examples         |
| -------------------------------- | ---------------- |
| Letters (upper and lowercase)    | `a`–`z`, `A`–`Z` |
| Space                            | ` `              |
| Apostrophe                       | `'`              |
| Comma, period                    | `,` `.`          |
| Question mark, exclamation       | `?` `!`          |
| Hyphen                           | `-`              |
| Angle brackets (SSML delimiters) | `<` `>`          |

Digits, accented letters, currency symbols, and most other punctuation are outside the model's
vocabulary. Passing them produces silent gaps or garbled output. Use
{py:func}`~abr_sdk.tts_preprocess.tts_preprocess` to convert raw text before encoding it.
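
To spot characters that would fall outside the table above before synthesis, a small check like the following can help. This is a minimal sketch, not part of the SDK: `ACCEPTED_CHARS` and `unsupported_chars` are hypothetical names, and the set is simply transcribed from the table.

```python
import string

# Transcribed from the "Accepted" table above; this constant is illustrative
# and is not exported by abr_sdk.
ACCEPTED_CHARS = frozenset(string.ascii_letters + " ',.?!-<>")


def unsupported_chars(text: str) -> list[str]:
    """Return the distinct characters in `text` that are outside the accepted set."""
    return sorted(set(text) - ACCEPTED_CHARS)


print(unsupported_chars("Dr. Smith owed $42 to the WHO."))
# ['$', '2', '4']
print(unsupported_chars("doctor Smith owed forty two dollars to the W H O."))
# []
```

An empty result means every character is in the accepted set; it does not check that SSML tags are well formed.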

## Preprocessing with `tts_preprocess`

{py:func}`~abr_sdk.tts_preprocess.tts_preprocess` normalizes a Python string into a form the TTS
model can pronounce. It returns a `str`; encode it to `bytes` before pushing.

```python
from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."
tts_input = text.encode("utf-8")  # UTF-8 bytes, ready to push to the TTS API
```

### What the pipeline does

The default pipeline runs these steps in order:

| Step                          | Example                                                                         |
| ----------------------------- | ------------------------------------------------------------------------------- |
| Spell out acronyms            | `WHO` → `W H O`, `HTTP2` → `H T T P two`                                        |
| Normalize dates               | `12/25/2024` → `December twenty fifth, two thousand twenty four`                |
| Numbers to words              | `$42.50` → `forty two dollars and fifty cents`, `3.14` → `three point one four` |
| Unicode normalization         | `é` → `e`, `—` → `,`, `…` → `, , ,`                                             |
| Collapse repeated punctuation | `!!!!` → `!`                                                                    |
| Collapse whitespace           | multiple spaces → single space                                                  |
| Remove special characters     | drops anything outside the model alphabet                                       |
| Expand abbreviations          | `Dr.` → `doctor`, `St.` → `street`, `etc.` → `et cetera`                        |

A trailing period is added when the text has no sentence-ending punctuation, so the synthesis
pipeline has a terminal punctuation mark to flush on.

### Custom pipelines

Pass a list of callables as `pipeline` to replace the default order or use a subset of steps:

```python
from abr_sdk.tts_preprocess import tts_preprocess, collapse_whitespace, expand_abbreviations

raw = "Meet   Dr. Smith   on Main St."  # illustrative input
text = tts_preprocess(raw, pipeline=[collapse_whitespace, expand_abbreviations])
```

Each function in the pipeline takes a `str` and returns a `str`. The steps in
`abr_sdk.tts_preprocess` are plain functions you can import individually.
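
Because a step is just a `str` to `str` function, a project-specific rule can sit alongside the built-in ones. The sketch below is self-contained and does not import the SDK: `expand_ampersand`, `squeeze_spaces`, and `run_steps` are hypothetical names, and `run_steps` only approximates how a pipeline might be applied, assuming steps run left to right with each one receiving the previous step's output.

```python
import re
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def expand_ampersand(text: str) -> str:
    """Hypothetical custom step: speak '&' as the word 'and'."""
    return re.sub(r"\s*&\s*", " and ", text)


def squeeze_spaces(text: str) -> str:
    """Local stand-in for a whitespace-collapsing step."""
    return " ".join(text.split())


def run_steps(text: str, steps: list[Step]) -> str:
    """Apply each step to the result of the previous one, left to right."""
    return reduce(lambda acc, step: step(acc), steps, text)


print(run_steps("R&D  budget", [expand_ampersand, squeeze_spaces]))
# R and D budget
```

Under the same assumption, a function like `expand_ampersand` could be listed in `pipeline=[...]` next to the SDK's own steps, placed before any step that removes special characters.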

## SSML markup

The model supports a subset of SSML (Speech Synthesis Markup Language) for controlling prosody and
emotion. Tags are passed through `tts_preprocess` untouched: only the spoken text between tags is
normalized.

```python
from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess('<abr:emotion type="calm">Hello!</abr:emotion>')
# spoken text normalized; tags preserved verbatim
```

### Supported tags

| Tag                                             | Description                                                                                                                                                                                                                               |
| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `<speak>`                                       | Optional top-level wrapper. A closing `</speak>` forces a pipeline flush.                                                                                                                                                                 |
| `<prosody rate="..." pitch="..." volume="...">` | Controls speech rate, pitch, and volume. Accepts named presets (`x-slow`, `slow`, `medium`, `fast`, `x-fast` for rate; `x-low` through `x-high` for pitch; `silent` through `x-loud` for volume) or percentage values. Tags are nestable. |
| `<abr:emotion type="...">`                      | Applies a composite prosody preset. Nestable with `<prosody>`.                                                                                                                                                                            |

Supported `<abr:emotion>` types: `apologetic`, `calm`, `empathetic`, `firm`, `lively`.

Unsupported tags and attributes are ignored.

When SSML tags are present, `tts_preprocess` does **not** add a trailing period, either between
the text runs separated by tags or after the last one. Use explicit sentence-ending punctuation, a
`</speak>` tag, or the flush byte to end a tagged utterance.
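
One way to avoid an unterminated tagged utterance is to add the punctuation while building the markup. The helper below is a minimal sketch under stated assumptions: `emotion_ssml` and `SUPPORTED_EMOTIONS` are hypothetical names, "sentence-ending punctuation" is taken to mean `.`, `?`, or `!`, and rejecting unknown emotion types is a choice made here rather than SDK behavior (the model ignores unsupported attributes).

```python
# Emotion types listed under "Supported tags"; this set is illustrative
# and is not exported by abr_sdk.
SUPPORTED_EMOTIONS = {"apologetic", "calm", "empathetic", "firm", "lively"}


def emotion_ssml(text: str, emotion: str) -> str:
    """Wrap `text` in <abr:emotion>, appending a period if it has no terminal mark."""
    if emotion not in SUPPORTED_EMOTIONS:
        raise ValueError(f"unsupported emotion type: {emotion!r}")
    body = text.strip()
    # Assumption: '.', '?' and '!' count as sentence-ending punctuation.
    if body and body[-1] not in ".?!":
        body += "."
    return f'<abr:emotion type="{emotion}">{body}</abr:emotion>'


print(emotion_ssml("Sorry for the wait", "apologetic"))
# <abr:emotion type="apologetic">Sorry for the wait.</abr:emotion>
```

The returned string can then be passed to `tts_preprocess` like any other tagged input.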

:::{admonition} Next steps
:class: hint

- {doc}`/tts/examples`: full examples showing how to preprocess and synthesize text.
- {doc}`/tts/overview`: overview of the TTS streaming model and output format.
:::
