Examples
All examples assume the TTS application package has been extracted and LIBRARY_PATH points to the
shared library inside it. Replace the path with the actual location on your device.
Two English voices are available: nith-5m-live.en-f-quill (female) and nith-5m-live.en-m-slate
(male). Both packages contain a library named libnith_5m_live.so; you select a voice by pointing
LIBRARY_PATH at the library inside the corresponding package directory. The examples below
alternate between the two.
Every example calls tts_preprocess() before encoding. Skip that step
only when the input is already in the expected format.
Synthesize to a raw PCM file
Collect all audio chunks and write them to a .pcm file. The file contains raw S16 LE mono 16 kHz
samples with no container headers. Play it back with aplay -f S16_LE -r 16000 -c 1 output.pcm on
Linux.
Run with:
python synthesize_pcm.py
aplay -f S16_LE -r 16000 -c 1 output.pcm
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice

# Normalize the text, then encode it as UTF-8 bytes for the engine.
text = tts_preprocess("Hello, world. This is on-device text-to-speech.").encode("utf-8")

# Collect every PCM chunk the engine delivers.
chunks: list[bytes] = []
with Tts(LIBRARY_PATH) as tts:
    tts.push(text, on_pcm=chunks.append)
    tts.wait_for_completion()

# Write the concatenated samples as a headerless PCM file.
with open("output.pcm", "wb") as f:
    f.write(b"".join(chunks))
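Because the .pcm file has no container headers, most desktop players will not open it. As a minimal sketch, the standard-library wave module can wrap the same bytes in a WAV header, assuming the S16 LE mono 16 kHz format described above. The helper name pcm_to_wav is hypothetical and is not part of abr_sdk.

```python
import wave

# Output format as described in this section (assumed unchanged).
SAMPLE_RATE = 16000  # Hz
SAMPLE_WIDTH = 2     # bytes per sample (signed 16-bit)
CHANNELS = 1         # mono


def pcm_to_wav(pcm: bytes, wav_path: str) -> None:
    """Write raw S16 LE mono PCM bytes to a WAV file (hypothetical helper)."""
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # The wave module stores samples little-endian, matching S16 LE.
        wav.writeframes(pcm)
```

With the chunks list from the example above, calling pcm_to_wav(b"".join(chunks), "output.wav") should produce a file that ordinary audio players can open.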
Stream audio directly to speakers
Use sounddevice to stream PCM bytes to your default audio output device as they arrive. Playback
starts before synthesis has finished.
Run with:
python stream_speakers.py
import sounddevice as sd

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice
SAMPLE_RATE = 16000

text = tts_preprocess("Streaming speech output to speakers.").encode("utf-8")

# Open a raw 16-bit mono output stream and hand its write method to the engine,
# so each PCM chunk is played as soon as it is produced.
with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        tts.push(text, on_pcm=stream.write)
        tts.wait_for_completion()
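To check how early the first audio arrives, one option is to wrap the on_pcm callback. The sketch below assumes on_pcm accepts any callable that takes a single bytes argument, as chunks.append and stream.write do in the examples here. PcmMeter is a hypothetical name and is not part of abr_sdk. The audio length is derived from the byte count at 16 kHz, mono, 2 bytes per sample.

```python
import time
from typing import Callable, Optional

BYTES_PER_SECOND = 16000 * 2  # 16 kHz mono, 2 bytes per S16 sample


class PcmMeter:
    """Forward PCM chunks to a sink while recording simple timing statistics."""

    def __init__(self, sink: Callable[[bytes], object]) -> None:
        self._sink = sink
        self._start = time.monotonic()  # the clock starts at construction
        self.first_chunk_latency: Optional[float] = None
        self.total_bytes = 0

    def __call__(self, chunk: bytes) -> None:
        # Record the delay until the first chunk only once.
        if self.first_chunk_latency is None:
            self.first_chunk_latency = time.monotonic() - self._start
        self.total_bytes += len(chunk)
        self._sink(chunk)

    @property
    def audio_seconds(self) -> float:
        """Duration of the audio received so far, in seconds."""
        return self.total_bytes / BYTES_PER_SECOND
```

Create the meter immediately before the push, for example meter = PcmMeter(stream.write), pass on_pcm=meter, and read meter.first_chunk_latency and meter.audio_seconds after tts.wait_for_completion() returns.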
Synthesize multiple sentences
Push each sentence separately. The model flushes synthesis at sentence boundaries, so splitting input at natural breaks reduces latency to first audio.
Run with:
python synthesize_sentences.py
import sounddevice as sd

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice
SAMPLE_RATE = 16000

sentences = [
    "The ABR SDK runs entirely on-device.",
    "No audio data is sent to an external server.",
    "It uses a state-space model architecture.",
]

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        # Push one sentence at a time so audio for the first one can start early.
        for sentence in sentences:
            tts.push(tts_preprocess(sentence).encode("utf-8"), on_pcm=stream.write)
        # Block until every pushed sentence has been synthesized.
        tts.wait_for_completion()
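When the input arrives as a single paragraph rather than a prepared list, it has to be split first. The following is a rough sketch that uses a regular expression to break text after ".", "!" or "?" followed by whitespace. The function name split_sentences is hypothetical, and the rule is deliberately naive: abbreviations such as "Dr. Smith" will be split incorrectly.

```python
import re

# Split wherever whitespace follows sentence-ending punctuation.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the sentences in text, keeping their closing punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]
```

Each returned string can then be passed through tts_preprocess() and pushed in turn, exactly as the loop above does with the sentences list.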
Read text from stdin
Read lines from stdin and synthesize each one. Useful for testing or for piping text from another process.
Run with:
echo "Hello world" | python synthesize_stdin.py
import sys

import sounddevice as sd

from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-m-slate/libnith_5m_live.so"  # male voice
SAMPLE_RATE = 16000

with sd.RawOutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    with Tts(LIBRARY_PATH) as tts:
        for line in sys.stdin:
            line = line.rstrip("\n")
            # Skip empty lines; push everything else as its own utterance.
            if line:
                tts.push(tts_preprocess(line).encode("utf-8"), on_pcm=stream.write)
        tts.wait_for_completion()
Synthesize LLM output token by token
Buffer LLM tokens into sentences and push each complete sentence. This keeps latency low: the first spoken sentence plays before the LLM has produced the full response.
from abr_sdk.tts import Tts
from abr_sdk.tts_preprocess import tts_preprocess

LIBRARY_PATH = "/path/to/nith-5m-live.en-f-quill/libnith_5m_live.so"  # female voice
SENTENCE_ENDS = {".", "!", "?"}


def speak_llm_stream(token_iterator, on_pcm):
    buf = ""
    with Tts(LIBRARY_PATH) as tts:
        for token in token_iterator:
            buf += token
            # Flush the buffer as soon as it ends with sentence punctuation.
            if any(buf.rstrip().endswith(p) for p in SENTENCE_ENDS):
                tts.push(tts_preprocess(buf).encode("utf-8"), on_pcm=on_pcm)
                buf = ""
        # Push any trailing text that never reached a sentence end.
        if buf.strip():
            tts.push(tts_preprocess(buf).encode("utf-8"), on_pcm=on_pcm)
        tts.wait_for_completion()
Pass any iterator of token strings and a callback that receives raw PCM bytes:
# token_iterator: any iterable of string tokens from an LLM
# on_pcm: called with each bytes chunk as audio is synthesized
speak_llm_stream(
    token_iterator=llm.stream("Tell me a story."),
    on_pcm=audio_output.write,
)
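The buffering logic inside speak_llm_stream can also be pulled out into a standalone generator, which makes it possible to test the sentence grouping without loading the TTS library. This is only a sketch of the same rule (flush when the buffer ends in ".", "!" or "?"). The name iter_sentences is hypothetical and is not part of abr_sdk.

```python
from typing import Iterable, Iterator

SENTENCE_ENDS = (".", "!", "?")


def iter_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences, yielding each one once complete."""
    buf = ""
    for token in tokens:
        buf += token
        # str.endswith accepts a tuple of candidate suffixes.
        if buf.rstrip().endswith(SENTENCE_ENDS):
            yield buf.strip()
            buf = ""
    # Emit leftover text that never reached a sentence end.
    if buf.strip():
        yield buf.strip()
```

Like the original loop, this rule flushes on any token that ends in a period, so a number split across tokens (for example "3." followed by "14") would be cut in two. Whether that matters depends on how the LLM tokenizes its output.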
Next steps
Input format: text requirements and the full preprocessing pipeline.
Overview: output format and how the streaming model works.