Overview#
The ABR SDK synthesizes speech from text on-device, with no network round-trip. You push UTF-8 text bytes and receive synthesized PCM (uncompressed audio) back through a callback as the model produces it.
The TTS API is provided by the tts module. The main class you interact with is Tts.
How synthesis works#
TTS accepts raw UTF-8 text bytes and produces mono PCM audio. You push text in pieces and a callback fires each time the model produces new audio. Output begins arriving before the full text has been consumed: the model streams audio as it synthesizes, sentence by sentence.
```python
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice

# Normalize the text, then encode it to the UTF-8 bytes that push() expects.
text = tts_preprocess("Hello, world.").encode("utf-8")
chunks: list[bytes] = []

with Tts(LIBRARY_PATH) as tts:
    tts.push(text, on_pcm=chunks.append)  # returns once the input is accepted
    tts.wait_for_completion()             # blocks until all audio is delivered

pcm = b"".join(chunks)
```
The push() call is non-blocking: it feeds text to the model and returns as
soon as the input is accepted. The on_pcm callback fires as PCM becomes available.
wait_for_completion() signals end-of-input, flushes the synthesis
pipeline, and blocks until all audio has been delivered through the callback.
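Any callable that accepts a bytes object can serve as on_pcm, not only list.append. The sketch below is one possible collector, not part of the SDK: the class name PcmCollector and its members are hypothetical. The source does not say which thread invokes the callback, so the lock is a precaution under the assumption that it may be a worker thread.

```python
import threading


class PcmCollector:
    """Hypothetical helper: accumulates PCM chunks passed to it as a callback."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._chunks: list[bytes] = []
        self._num_bytes = 0

    def __call__(self, chunk: bytes) -> None:
        # Copy the chunk in case the caller reuses its buffer (an assumption).
        data = bytes(chunk)
        with self._lock:
            self._chunks.append(data)
            self._num_bytes += len(data)

    @property
    def num_bytes(self) -> int:
        """Total number of PCM bytes received so far."""
        with self._lock:
            return self._num_bytes

    def pcm(self) -> bytes:
        """Return everything received so far as one bytes object."""
        with self._lock:
            return b"".join(self._chunks)
```

With the example above, this would be used as collector = PcmCollector(), then tts.push(text, on_pcm=collector), and collector.pcm() after wait_for_completion() returns.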
Output format#
The PCM bytes delivered to on_pcm are in the same format as the audio the ASR API consumes:
| Property | Value |
|---|---|
| Encoding | Signed 16-bit integer (S16) |
| Byte order | Little-endian (LE) |
| Channels | Mono (1 channel) |
| Sample rate | 16,000 Hz |
| Container | Raw bytes, no file headers |
One second of output is 32,000 bytes (16,000 samples × 2 bytes per sample). Each on_pcm call may
deliver any number of samples.
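The byte arithmetic above, and the fact that the output has no container, can be sketched with the Python standard library alone. The helper names pcm_duration_seconds and write_wav are hypothetical and not part of the SDK; the constants come from the table in this section. Because WAV stores 16-bit PCM as signed little-endian samples, the raw bytes can be written as frames without conversion.

```python
import wave

SAMPLE_RATE_HZ = 16_000   # from the output format table
BYTES_PER_SAMPLE = 2      # signed 16-bit
NUM_CHANNELS = 1          # mono


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer in the format described above."""
    return len(pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * NUM_CHANNELS)


def write_wav(pcm: bytes, destination) -> None:
    """Wrap raw PCM in a WAV header.

    destination is a file path or a seekable binary file object.
    """
    with wave.open(destination, "wb") as wav_file:
        wav_file.setnchannels(NUM_CHANNELS)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)


one_second_of_silence = bytes(32_000)
print(pcm_duration_seconds(one_second_of_silence))  # 1.0
```

Calling write_wav(pcm, "hello.wav") on the joined callback output would produce a file that ordinary audio players can open.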
Text preprocessing#
The TTS model expects clean, speakable text: no numbers written as digits, no acronyms, no accented
characters. tts_preprocess() normalizes raw text into a form the
model can pronounce.
```python
from abr_sdk.tts_preprocess import tts_preprocess

text = tts_preprocess("Dr. Smith owed $42 to the WHO.")
# -> "doctor Smith owed forty two dollars to the W H O."
```
Call tts_preprocess on every string before encoding and pushing it. See Input format
for what it does and when to skip it.
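When text is pushed in pieces, the preprocess-then-encode order still applies to each piece. The sketch below is one way to arrange that, under several assumptions: the helper iter_pieces is hypothetical, the sentence splitter is a naive regular expression, and the preprocessed text is assumed to keep its sentence-ending punctuation, as the example output above suggests. Preprocessing runs before splitting so that abbreviations such as "Dr." are already expanded and do not trigger a false sentence break. The preprocess function is passed in as a parameter so the sketch runs without the SDK; in real use you would pass tts_preprocess.

```python
import re
from typing import Callable, Iterator

# Split at whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_pieces(raw_text: str, preprocess: Callable[[str], str]) -> Iterator[bytes]:
    """Yield the preprocessed text as UTF-8 encoded sentences."""
    clean = preprocess(raw_text)
    for sentence in _SENTENCE_END.split(clean):
        sentence = sentence.strip()
        if sentence:
            # Splitting happens on str, never on bytes, so no piece can end
            # in the middle of a multi-byte character. The trailing space
            # keeps consecutive pieces from running together (an assumption
            # about how the model joins pushed input).
            yield (sentence + " ").encode("utf-8")
```

Each yielded piece could then be handed to tts.push(piece, on_pcm=...) in a loop, followed by a single wait_for_completion().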
What you need to use TTS#
To run TTS you need three things:
- The SDK Python package (pip install abr-sdk).
- An application package for your target platform. This archive contains the compiled model, the network weights, and supporting files. You download it from the ABR developer portal and extract it on the device. See Application packages.
- Text in the expected form. The model requires UTF-8 text with numbers and abbreviations already converted to spoken words. tts_preprocess() handles this automatically.
Next steps
New to the SDK? Start with TTS quickstart.
Input format: text requirements, the preprocessing pipeline, and SSML markup.
Examples: full, runnable examples for file output, speaker playback, and more.