Examples

All examples assume the TTS application package has been extracted and LIBRARY_PATH points to the shared library inside it. Replace the path with the actual location on your device.

Two English voices are available: nith-5m-live.en-f-quill (female) and nith-5m-live.en-m-slate (male). Both packages contain a library named libnith_5m_live.so; you select a voice by pointing LIBRARY_PATH at the library inside the corresponding package directory. The examples below alternate between the two.

Every example calls tts_preprocess() before encoding. Skip that step only when the input is already in the expected format.

Synthesize to a raw PCM file

Collect all audio chunks and write them to a .pcm file. The file contains raw signed 16-bit little-endian (S16 LE) mono samples at 16 kHz, with no container header. On Linux, play it back with aplay -f S16_LE -r 16000 -c 1 output.pcm.

Run with:

python synthesize_pcm.py
aplay -f S16_LE -r 16000 -c 1 output.pcm

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice

text = tts_preprocess("Hello, world. This is on-device text-to-speech.").encode("utf-8")

chunks: list[bytes] = []

with Tts(LIBRARY_PATH) as tts:
    tts.push(text, on_pcm=chunks.append)
    tts.wait_for_completion()

with open("output.pcm", "wb") as f:
    f.write(b"".join(chunks))
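Most players and editors expect a container rather than headerless PCM. The following is a minimal sketch, not part of the SDK, that wraps the raw bytes in a WAV header with Python's standard wave module. The helper name pcm_to_wav is hypothetical; the sample rate, sample width, and channel count are taken from the output format described above, and a little-endian host is assumed.

```python
import wave

SAMPLE_RATE = 16000   # Hz, as stated for the raw output
SAMPLE_WIDTH = 2      # bytes per sample (signed 16-bit)
CHANNELS = 1          # mono


def pcm_to_wav(pcm: bytes, wav_path: str) -> float:
    """Write raw S16 LE mono PCM to a WAV file and return its duration in seconds.

    Hypothetical helper, not part of abr_sdk. Assumes a little-endian host,
    because the wave module byte-swaps frames on big-endian machines.
    """
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # Each frame is SAMPLE_WIDTH * CHANNELS bytes; SAMPLE_RATE frames per second.
    return len(pcm) / (SAMPLE_WIDTH * CHANNELS * SAMPLE_RATE)
```

In the example above, calling pcm_to_wav(b"".join(chunks), "output.wav") in place of the final open() block would produce a file that ordinary audio players can open.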

Stream audio directly to speakers

Use sounddevice to stream PCM bytes to the default audio output device as they arrive. Playback starts before synthesis has finished.

Run with:

python stream_speakers.py

import sounddevice as sd
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice
SAMPLE_RATE = 16000

text = tts_preprocess("Streaming speech output to speakers.").encode("utf-8")

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        tts.push(text, on_pcm=stream.write)
        tts.wait_for_completion()
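To check how soon the first audio actually arrives, you can wrap the sink in a small timing callable. This is a sketch only: FirstChunkTimer is a hypothetical helper, not part of the SDK, and it assumes on_pcm accepts any callable that takes one bytes chunk, as the examples here suggest.

```python
import time
from typing import Callable, Optional


class FirstChunkTimer:
    """Forward PCM chunks to a sink and record when the first one arrives.

    Hypothetical helper, not part of abr_sdk. The clock starts when the
    timer is constructed, so create it immediately before calling push().
    """

    def __init__(self, sink: Callable[[bytes], object]) -> None:
        self._sink = sink
        self._start = time.monotonic()
        self.first_chunk_s: Optional[float] = None  # seconds until first chunk
        self.total_bytes = 0

    def __call__(self, chunk: bytes) -> None:
        if self.first_chunk_s is None:
            self.first_chunk_s = time.monotonic() - self._start
        self.total_bytes += len(chunk)
        self._sink(chunk)  # pass the audio on unchanged
```

With the streaming example above, this would be used as timer = FirstChunkTimer(stream.write) followed by tts.push(text, on_pcm=timer); after wait_for_completion(), timer.first_chunk_s holds the delay in seconds.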

Synthesize multiple sentences

Push each sentence separately. The model flushes synthesis at sentence boundaries, so splitting input at natural breaks reduces latency to first audio.

Run with:

python synthesize_sentences.py

import sounddevice as sd
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice
SAMPLE_RATE = 16000

sentences = [
    "The ABR SDK runs entirely on-device.",
    "No audio data is sent to an external server.",
    "It uses a state-space model architecture.",
]

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        for sentence in sentences:
            tts.push(tts_preprocess(sentence).encode("utf-8"), on_pcm=stream.write)
        tts.wait_for_completion()
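If the input arrives as one block of text rather than a prepared list, it has to be split first. The sketch below uses a simple regular expression; split_sentences is a hypothetical helper, not part of the SDK, and the rule is deliberately naive.

```python
import re

# Split on whitespace that follows '.', '!' or '?'. This is a naive
# heuristic: abbreviations such as "Dr." or "e.g." also trigger a split.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences of text, in order (hypothetical helper)."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
```

Each returned string can then be passed through tts_preprocess() and pushed in a loop like the one above. For text with many abbreviations or decimal numbers, a dedicated sentence segmenter is likely a better fit.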

Read text from stdin

Read lines from stdin and synthesize each one. Useful for testing or for piping text from another process.

Run with:

echo "Hello world" | python synthesize_stdin.py

import sys
import sounddevice as sd
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice
SAMPLE_RATE = 16000

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        for line in sys.stdin:
            line = line.strip()  # drop the newline; whitespace-only lines become empty and are skipped
            if line:
                tts.push(tts_preprocess(line).encode("utf-8"), on_pcm=stream.write)
        tts.wait_for_completion()

Synthesize LLM output token by token

Buffer LLM tokens into sentences and push each complete sentence. This keeps latency low: the first spoken sentence plays before the LLM has produced the full response.

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice

SENTENCE_ENDS = {".", "!", "?"}

def speak_llm_stream(token_iterator, on_pcm):
    buf = ""
    with Tts(LIBRARY_PATH) as tts:
        for token in token_iterator:
            buf += token
            if buf.rstrip().endswith(tuple(SENTENCE_ENDS)):
                tts.push(tts_preprocess(buf).encode("utf-8"), on_pcm=on_pcm)
                buf = ""
        if buf.strip():
            tts.push(tts_preprocess(buf).encode("utf-8"), on_pcm=on_pcm)
        tts.wait_for_completion()

Pass any iterator of token strings and a callback that receives raw PCM bytes:

# token_iterator: any iterable of string tokens from an LLM
# on_pcm: called with each bytes chunk as audio is synthesized
# llm and audio_output are placeholders for your own LLM client and audio sink

speak_llm_stream(
    token_iterator=llm.stream("Tell me a story."),
    on_pcm=audio_output.write,
)
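The buffering logic can also be separated from the SDK calls, which makes it possible to test it without a device or a model. The generator below is a sketch of that idea under the same punctuation rule; iter_sentences is a hypothetical name, and unlike the function above it strips surrounding whitespace from each sentence before yielding it.

```python
from typing import Iterable, Iterator

SENTENCE_ENDS = (".", "!", "?")


def iter_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield the buffered text each time it ends with sentence punctuation.

    Hypothetical helper, not part of abr_sdk.
    """
    buf = ""
    for token in tokens:
        buf += token
        if buf.rstrip().endswith(SENTENCE_ENDS):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush trailing text that has no terminator
```

Inside the with Tts(...) block, a loop over iter_sentences(token_iterator) could then call tts.push(tts_preprocess(sentence).encode("utf-8"), on_pcm=on_pcm) for each sentence, followed by a single wait_for_completion().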

Next steps

  • Input format: text requirements and the full preprocessing pipeline.

  • Overview: output format and how the streaming model works.