Overview

The ABR SDK synthesizes speech from text on-device, with no network round-trip. You push UTF-8 text bytes and receive synthesized PCM (uncompressed audio) back through a callback as the model produces it.

The TTS API lives in the abr_sdk.tts module. The main class you interact with is Tts.

How synthesis works

TTS accepts raw UTF-8 text bytes and produces mono PCM audio. You push text in pieces and a callback fires each time the model produces new audio. Output begins arriving before the full text has been consumed: the model streams audio as it synthesizes, sentence by sentence.

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice

text = tts_preprocess("Hello, world.").encode("utf-8")

chunks: list[bytes] = []

with Tts(LIBRARY_PATH) as tts:
    tts.push(text, on_pcm=chunks.append)
    tts.wait_for_completion()

pcm = b"".join(chunks)

The push() call is non-blocking: it feeds text to the model and returns as soon as the input is accepted. The on_pcm callback fires as PCM becomes available. wait_for_completion() signals end-of-input, flushes the synthesis pipeline, and blocks until all audio has been delivered through the callback.
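
Because the example above passes chunks.append as on_pcm, the callback appears to be any callable that takes one bytes argument. Under that assumption, a small callable object can collect the audio and also record how quickly the first chunk arrived. The sketch below is illustrative only: PcmCollector is a hypothetical helper, not part of the SDK, and the chunks are simulated so the code runs without a model.

```python
import time
from typing import Optional


class PcmCollector:
    """Hypothetical helper: accumulates PCM chunks and records streaming stats."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._started = time.monotonic()
        self.first_chunk_latency: Optional[float] = None

    def __call__(self, pcm: bytes) -> None:
        # Record how long the first audio took to arrive, then keep the bytes.
        if self.first_chunk_latency is None:
            self.first_chunk_latency = time.monotonic() - self._started
        self._chunks.append(pcm)

    @property
    def total_bytes(self) -> int:
        return sum(len(chunk) for chunk in self._chunks)

    def pcm(self) -> bytes:
        return b"".join(self._chunks)


# Simulated callbacks standing in for the SDK. With the real API this would be
#   tts.push(text, on_pcm=collector) followed by tts.wait_for_completion()
collector = PcmCollector()
for fake_chunk in (b"\x00\x00" * 160, b"\x01\x00" * 320):
    collector(fake_chunk)

print(collector.total_bytes)  # 960
```

Whether on_pcm runs on the calling thread or on a worker thread is not stated in this overview, so treat the collector as something to read only after wait_for_completion() returns.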

Output format

The PCM bytes delivered to on_pcm are in the same format as the audio the ASR API consumes:

Property	Value
Encoding	Signed 16-bit integer (S16)
Byte order	Little-endian (LE)
Channels	Mono (1 channel)
Sample rate	16,000 Hz
Container	Raw bytes, no file headers

One second of output is 32,000 bytes (16,000 samples × 2 bytes per sample). Each on_pcm call may deliver any number of samples, so code that consumes the stream should not assume a fixed chunk size.
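
Since the output is headerless, most audio tools need it wrapped in a container before they can open it. The following is a minimal sketch, using only Python's standard wave module, that computes a buffer's duration from the properties in the table and wraps the raw PCM in a WAV header. The helper names pcm_duration_seconds and pcm_to_wav_bytes are illustrative and not part of the SDK; a second of silence stands in for real synthesized audio.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # signed 16-bit
CHANNELS = 1  # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw S16 mono buffer at 16 kHz."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


silence = b"\x00\x00" * SAMPLE_RATE_HZ  # one second of silence as stand-in audio
wav_bytes = pcm_to_wav_bytes(silence)
print(pcm_duration_seconds(silence))  # 1.0
```

To write a file instead, pass a path to wave.open in place of the in-memory buffer.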

Text preprocessing

The TTS model expects clean, speakable text: no numbers written as digits, no acronyms, no accented characters. tts_preprocess() normalizes raw text into a form the model can pronounce.

from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."

Call tts_preprocess on every string before encoding and pushing it. See Input format for what it does and when to skip it.
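
When text arrives in pieces, the same preprocess-then-encode step applies to each piece. The sketch below shows one way to structure that, assuming the preprocessing function is passed in as a parameter. encode_for_tts and fake_preprocess are hypothetical names: fake_preprocess is a stand-in so the code runs without the SDK and does not reproduce what tts_preprocess() does. How the model treats the boundary between two pushed pieces is not covered in this overview.

```python
from typing import Callable, Iterable, Iterator


def encode_for_tts(
    lines: Iterable[str],
    preprocess: Callable[[str], str],
) -> Iterator[bytes]:
    """Yield preprocessed, UTF-8-encoded pieces, skipping blank input."""
    for line in lines:
        if not line.strip():
            continue  # nothing speakable on this line
        yield preprocess(line).encode("utf-8")


def fake_preprocess(text: str) -> str:
    # Hypothetical stand-in for tts_preprocess(); it only trims whitespace.
    return text.strip()


pieces = list(encode_for_tts(["Hello, world.\n", "\n", "Goodbye.\n"], fake_preprocess))
print(pieces)  # [b'Hello, world.', b'Goodbye.']
```

With the SDK installed, tts_preprocess would be passed in place of fake_preprocess, and each yielded piece handed to push().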

What you need to use TTS

To run TTS you need three things:

The SDK Python package (pip install abr-sdk).
An application package for your target platform. This archive contains the compiled model, the network weights, and supporting files. You download it from the ABR developer portal and extract it on the device. See Application packages.
Text in the expected form. The model requires UTF-8 text with numbers and abbreviations already converted to spoken words. tts_preprocess() handles this automatically.

Next steps

New to the SDK? Start with TTS quickstart.
Input format: text requirements, the preprocessing pipeline, and SSML markup.
Examples: full, runnable examples for file output, speaker playback, and more.