---
myst:
  html_meta:
    'description': 'Overview of text-to-speech in the ABR SDK, covering on-device synthesis, the streaming push model, and PCM output format.'
    'keywords': 'tts, text-to-speech, streaming, on-device, pcm, speech synthesis'
---

# Overview

The ABR SDK synthesizes speech from text on-device, with no network round-trip. You push UTF-8 text
bytes and receive synthesized PCM (uncompressed audio) back through a callback as the model produces
it.

The TTS API is provided by the {py:mod}`~abr_sdk.tts` module. The main class you interact with is
{py:class}`~abr_sdk.tts.Tts`.

## How synthesis works

TTS accepts raw UTF-8 text bytes and produces mono PCM audio. You push text in pieces, and a
callback fires each time the model produces new audio. Output begins arriving before the full text
has been consumed: the model streams audio as it synthesizes, sentence by sentence.

```python title="synthesize.py"
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice

text = tts_preprocess("Hello, world.").encode("utf-8")

chunks: list[bytes] = []

with Tts(LIBRARY_PATH) as tts:
    tts.push(text, on_pcm=chunks.append)
    tts.wait_for_completion()

pcm = b"".join(chunks)
```

The {py:meth}`~abr_sdk.tts.Tts.push` call is non-blocking: it feeds text to the model and returns as
soon as the input is accepted. The `on_pcm` callback fires as PCM becomes available.
{py:meth}`~abr_sdk.tts.Tts.wait_for_completion` signals end-of-input, flushes the synthesis
pipeline, and blocks until all audio has been delivered through the callback.
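
Because audio arrives incrementally, the callback is a natural place to measure how quickly the
first audio shows up. The sketch below is illustrative and not part of the SDK: `PcmCollector` is a
hypothetical helper, and it assumes `on_pcm` is called with a single bytes-like argument, as the
`chunks.append` usage above suggests.

```python
import time
from typing import Optional


class PcmCollector:
    """Hypothetical helper: accumulates PCM chunks and records time to first audio."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []
        self.started = time.monotonic()
        # Seconds between construction and the first chunk; None until audio arrives.
        self.first_chunk_delay: Optional[float] = None

    def __call__(self, pcm: bytes) -> None:
        if self.first_chunk_delay is None:
            self.first_chunk_delay = time.monotonic() - self.started
        # Copy defensively in case the caller reuses its buffer (an assumption).
        self.chunks.append(bytes(pcm))

    @property
    def pcm(self) -> bytes:
        """All audio received so far, concatenated."""
        return b"".join(self.chunks)
```

To try it, create `collector = PcmCollector()` just before the push and pass `on_pcm=collector` in
place of `chunks.append`. After `wait_for_completion()` returns, `collector.pcm` holds the full
audio and `collector.first_chunk_delay` the time to the first chunk.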

## Output format

The PCM bytes delivered to `on_pcm` are in the same format as the audio the ASR API consumes:

| Property    | Value                       |
| ----------- | --------------------------- |
| Encoding    | Signed 16-bit integer (S16) |
| Byte order  | Little-endian (LE)          |
| Channels    | Mono (1 channel)            |
| Sample rate | 16,000 Hz                   |
| Container   | Raw bytes, no file headers  |

One second of output is 32,000 bytes (16,000 samples × 2 bytes per sample). Each `on_pcm` call may
deliver any number of samples.
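
The byte arithmetic above can be checked with a short sketch that uses only the Python standard
library. The helper names `pcm_duration_seconds` and `pcm_to_wav_bytes` are illustrative, not SDK
functions. The second helper wraps the raw bytes in a WAV header so that ordinary audio tools can
open the result.

```python
import io
import wave

SAMPLE_RATE = 16_000  # Hz, from the table above
SAMPLE_WIDTH = 2      # bytes per S16 sample
CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of raw S16LE mono PCM at 16 kHz."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container.

    The wave module treats 16-bit input as native-endian, so this
    assumes a little-endian host.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For example, `pcm_duration_seconds(pcm)` on the joined output of the earlier example gives the
length of the utterance in seconds, and writing `pcm_to_wav_bytes(pcm)` to a file produces a
playable `.wav`.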

## Text preprocessing

The TTS model expects clean, speakable text: no numbers written as digits, no acronyms, no accented
characters. {py:func}`~abr_sdk.tts_preprocess.tts_preprocess` normalizes raw text into a form the
model can pronounce.

```python
from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."
```

Call `tts_preprocess` on every string before encoding and pushing it. See {doc}`/tts/input-format`
for what it does and when to skip it.

## What you need to use TTS

To run TTS you need three things:

1. **The SDK Python package** (`pip install abr-sdk`).
2. **An application package** for your target platform. This archive contains the compiled model,
   the network weights, and supporting files. You download it from the
   [ABR developer portal](https://dev.appliedbrainresearch.com) and extract it on the device. See
   {doc}`/concepts/application-packages`.
3. **Text in the expected form.** The model requires UTF-8 text with numbers and abbreviations
   already converted to spoken words. {py:func}`~abr_sdk.tts_preprocess.tts_preprocess` handles this
   automatically.

:::{admonition} Next steps
:class: hint

- New to the SDK? Start with {doc}`/getting-started/tts-quickstart`.
- {doc}`/tts/input-format`: text requirements, the preprocessing pipeline, and SSML markup.
- {doc}`/tts/examples`: full, runnable examples for file output, speaker playback, and more.
:::
