Examples#

Each example below is a complete, runnable script. Set LIBRARY_PATH to the path of the .so file inside your extracted application package directory before running.

Live microphone#

Transcribes audio from a live microphone and displays the transcript in the terminal as speech is recognized. chunk.update(buf) applies each overwrite in place, so the transcript refines on the same line as ACCURATE and post-processing revisions arrive.

Run with:

arecord -f S16_LE -c 1 -r 16000 -t raw -q | python microphone.py

import sys
from abr_sdk.asr import Asr, AsrChunk

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_BYTES = 16000 * 2 // 10  # 100 ms of 16 kHz mono S16_LE audio

buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    chunk.update(buf)
    print(buf.decode("utf-8"), end="\r", flush=True)

with Asr(LIBRARY_PATH) as asr:
    print("Listening... press Ctrl+C to stop.", flush=True)
    try:
        while chunk := sys.stdin.buffer.read(CHUNK_BYTES):
            asr.push(chunk, on_chunk=on_chunk)
    except KeyboardInterrupt:
        pass
    asr.wait_for_completion()

print()
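
The CHUNK_BYTES constant follows from the input format: bytes per second is the sample rate times the channel count times the bytes per sample, scaled by the chunk duration. A minimal sketch of that arithmetic follows; the helper name chunk_bytes is illustrative and not part of the SDK.

```python
def chunk_bytes(sample_rate: int = 16000, channels: int = 1,
                sample_width: int = 2, chunk_ms: int = 100) -> int:
    """Return the number of bytes in chunk_ms milliseconds of raw PCM audio."""
    bytes_per_second = sample_rate * channels * sample_width
    return bytes_per_second * chunk_ms // 1000


print(chunk_bytes())  # 3200, the same value as 16000 * 2 // 10
```

The scripts on this page hard-code the result because the input format is fixed; a helper like this is only useful if you experiment with other chunk durations.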

Audio file via ffmpeg#

Transcribes any audio file by piping it through ffmpeg. ffmpeg handles format conversion, so this works with WAV, MP3, FLAC, or any other format it supports.

Run with:

ffmpeg -loglevel quiet -i recording.wav -f s16le -ar 16000 -ac 1 - | python stream_file.py

import sys
from abr_sdk.asr import Asr, AsrTranscript

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_BYTES = 16000 * 2 // 10  # 100 ms of 16 kHz mono S16_LE audio

transcript = AsrTranscript()

with Asr(LIBRARY_PATH) as asr:
    while chunk := sys.stdin.buffer.read(CHUNK_BYTES):
        asr.push(chunk, on_chunk=transcript.chunks.append)
    asr.wait_for_completion()

print(transcript.text)
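
If you would rather start ffmpeg from Python than from a shell pipe, the same conversion can be sketched with the standard library subprocess module. The helper names below (ffmpeg_command, iter_chunks, pcm_chunks_from_file) are illustrative and not part of the SDK; the ffmpeg flags are the ones from the command above.

```python
import io
import subprocess
from typing import BinaryIO, Iterator

CHUNK_BYTES = 16000 * 2 // 10  # 100 ms of 16 kHz mono S16_LE audio


def ffmpeg_command(path: str) -> list[str]:
    """Build the ffmpeg invocation shown above, writing raw PCM to stdout."""
    return ["ffmpeg", "-loglevel", "quiet", "-i", path,
            "-f", "s16le", "-ar", "16000", "-ac", "1", "-"]


def iter_chunks(stream: BinaryIO, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive blocks of up to size bytes until the stream ends."""
    while block := stream.read(size):
        yield block


def pcm_chunks_from_file(path: str) -> Iterator[bytes]:
    """Decode path with ffmpeg and yield 100 ms PCM chunks."""
    with subprocess.Popen(ffmpeg_command(path), stdout=subprocess.PIPE) as proc:
        yield from iter_chunks(proc.stdout)


# iter_chunks works on any binary stream, so it can be tried without ffmpeg:
demo = io.BytesIO(bytes(7000))
print([len(block) for block in iter_chunks(demo)])  # [3200, 3200, 600]
```

Each chunk yielded by pcm_chunks_from_file("recording.wav") can be passed to asr.push() exactly as the script above does with chunks read from standard input.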

WAV file with the `wave` module#

Reads a WAV file using the standard library wave module and streams it through the ASR pipeline in 100-millisecond chunks. The assertions confirm the file is already in the required format before transcription begins. See Input format for format requirements.

Run with:

python from_wav.py

import wave
from abr_sdk.asr import Asr, AsrTranscript

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_FRAMES = 1600  # 100 ms at 16 kHz

transcript = AsrTranscript()

with wave.open("recording.wav", "rb") as wf:
    assert wf.getsampwidth() == 2, "expected 16-bit PCM"
    assert wf.getnchannels() == 1, "expected mono"
    assert wf.getframerate() == 16000, "expected 16 kHz"

    with Asr(LIBRARY_PATH) as asr:
        while frames := wf.readframes(CHUNK_FRAMES):
            asr.push(bytes(frames), on_chunk=transcript.chunks.append)
        asr.wait_for_completion()

print(transcript.text)
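
Python skips assert statements when run with the -O flag, so a script that must reject unsuitable input may prefer explicit exceptions. The sketch below performs the same three checks; check_wav_format is a hypothetical helper, not part of the SDK.

```python
import wave


def check_wav_format(path: str) -> None:
    """Raise ValueError unless path is 16-bit mono PCM at 16 kHz."""
    with wave.open(path, "rb") as wf:
        problems = []
        if wf.getsampwidth() != 2:
            problems.append(f"sample width is {wf.getsampwidth()} bytes, expected 2")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected 1")
        if wf.getframerate() != 16000:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected 16000")
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
```

Calling check_wav_format("recording.wav") before opening the Asr instance reports every mismatch at once instead of stopping at the first failed assertion.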

WAV or FLAC file with `soundfile`#

Reads a WAV or FLAC file using the soundfile library. soundfile returns the file’s native sample rate alongside the data, and the example passes that rate directly to the Asr constructor. The SDK does not resample: if the file is not at 16,000 Hz, resample it before pushing the audio.

Run with:

python from_soundfile.py

import soundfile as sf
from abr_sdk.asr import Asr, AsrTranscript

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"

data, sample_rate = sf.read("recording.wav", dtype="int16", always_2d=False)
assert data.ndim == 1, "expected mono audio"

CHUNK_FRAMES = sample_rate // 10  # 100 ms

transcript = AsrTranscript()

with Asr(LIBRARY_PATH, sample_rate=sample_rate) as asr:
    for i in range(0, len(data), CHUNK_FRAMES):
        asr.push(data[i : i + CHUNK_FRAMES].tobytes(), on_chunk=transcript.chunks.append)
    asr.wait_for_completion()

print(transcript.text)
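
To make the resampling step concrete, here is a rough, standard-library-only sketch that converts 16-bit mono samples to 16,000 Hz by linear interpolation. It is an illustration, not an SDK feature: resample_linear is a hypothetical helper, it applies no anti-aliasing filter (so downsampled audio can alias), and the per-sample Python loop is slow. For real workloads a dedicated resampler, such as the ffmpeg -ar 16000 conversion shown earlier, is likely the better choice.

```python
from array import array


def resample_linear(samples: array, src_rate: int, dst_rate: int = 16000) -> array:
    """Resample 16-bit mono samples by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or len(samples) == 0:
        return array("h", samples)
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = array("h")
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional index into the source
        left = int(pos)
        right = min(left + 1, last)     # clamp at the final sample
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


print(list(resample_linear(array("h", [0, 100, 200, 300]), 32000)))  # [0, 200]
print(list(resample_linear(array("h", [0, 100]), 8000)))             # [0, 50, 100, 100]
```

With the soundfile data above, array("h", data.tobytes()) would give a suitable input on hosts whose native byte order matches the int16 data, and the tobytes() method of the result gives the bytes to push.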

Language model feed (FAST stream)#

When the transcript feeds a downstream language model, the FAST stream is recommended: language models absorb minor transcription errors, and low-latency FAST output keeps the downstream response time short. This example keeps only CAUSAL chunks and discards ACCURATE and post-processing revisions.

Run with:

arecord -f S16_LE -c 1 -r 16000 -t raw -q | python llm_feed.py

import sys
from abr_sdk.asr import Asr, AsrChunk, AsrTextChunkType

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_BYTES = 16000 * 2 // 10

def send_to_llm(text: str) -> None:
    # Placeholder so the script runs as-is; replace with your LLM handler.
    print(text, flush=True)

def on_chunk(chunk: AsrChunk) -> None:
    if chunk.type == AsrTextChunkType.CAUSAL:
        send_to_llm(chunk.data.decode("utf-8"))

with Asr(LIBRARY_PATH) as asr:
    try:
        while chunk := sys.stdin.buffer.read(CHUNK_BYTES):
            asr.push(chunk, on_chunk=on_chunk)
    except KeyboardInterrupt:
        pass
    asr.wait_for_completion()
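
How send_to_llm is implemented is left to you. If the call to the language model is slow, it may be worth keeping it out of the callback. The sketch below assumes on_chunk should return quickly (the SDK's threading model is not described on this page) and hands text to a worker thread through a queue. call_llm is a hypothetical stand-in for your client code.

```python
import queue
import threading

pending: "queue.Queue[str | None]" = queue.Queue()
received: list[str] = []


def call_llm(text: str) -> None:
    """Hypothetical stand-in for a real LLM client call."""
    received.append(text)


def send_to_llm(text: str) -> None:
    """Called from on_chunk: enqueue the text and return immediately."""
    pending.put(text)


def worker() -> None:
    """Forward queued text to the LLM until a None sentinel arrives."""
    while (text := pending.get()) is not None:
        call_llm(text)


thread = threading.Thread(target=worker, daemon=True)
thread.start()

send_to_llm("hello")
send_to_llm("world")
pending.put(None)  # sentinel: stop the worker once the queue has drained
thread.join()
print(received)    # ['hello', 'world']
```

In the script above, the sentinel and thread.join() would go after asr.wait_for_completion(), so that text from the final chunks is still delivered.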

Subtitle display (POST-PROC stream)#

For subtitle or caption display, where text stability matters most, update the display only on POSTPROC chunks. This trades some latency for the highest-accuracy, punctuated output.

Run with:

arecord -f S16_LE -c 1 -r 16000 -t raw -q | python subtitle.py

import sys
from abr_sdk.asr import Asr, AsrChunk, AsrTextChunkType

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_BYTES = 16000 * 2 // 10

buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    if chunk.type == AsrTextChunkType.POSTPROC:
        chunk.update(buf)
        print(buf.decode("utf-8"), end="\r", flush=True)

with Asr(LIBRARY_PATH) as asr:
    try:
        while chunk := sys.stdin.buffer.read(CHUNK_BYTES):
            asr.push(chunk, on_chunk=on_chunk)
    except KeyboardInterrupt:
        pass
    asr.wait_for_completion()

print()
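
A single terminal line overflows once the transcript grows, and captions usually show only the most recent text. One possible approach, sketched with the standard library textwrap module, is to wrap the transcript and keep the last few lines. The helper name last_caption_lines and the default width and line count are arbitrary choices for illustration.

```python
import textwrap


def last_caption_lines(text: str, width: int = 40, max_lines: int = 2) -> list[str]:
    """Wrap text to width columns and keep only the most recent max_lines lines."""
    return textwrap.wrap(text, width=width)[-max_lines:]


print(last_caption_lines("the quick brown fox jumps over the lazy dog", width=16))
# ['fox jumps over', 'the lazy dog']
```

Inside on_chunk, the input would be buf.decode("utf-8") after chunk.update(buf) has been applied.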

Terminal live display with `rich`#

Uses the rich library to display the full transcript in the terminal as it arrives, showing revision through all three passes. The transcript visibly improves as FAST output is overwritten by ACCURATE and then post-processed text.

Run with:

arecord -f S16_LE -c 1 -r 16000 -t raw -q | python rich_display.py

import sys
from rich.live import Live
from rich.text import Text
from abr_sdk.asr import Asr, AsrChunk

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
CHUNK_BYTES = 16000 * 2 // 10

buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    chunk.update(buf)

with Asr(LIBRARY_PATH) as asr:
    with Live() as live:
        try:
            while chunk := sys.stdin.buffer.read(CHUNK_BYTES):
                asr.push(chunk, on_chunk=on_chunk)
                live.update(Text(buf.decode("utf-8")))
        except KeyboardInterrupt:
            pass
        asr.wait_for_completion()
        live.update(Text(buf.decode("utf-8")))

Benchmarking#

For benchmarking the model or batch-processing a collection of pre-loaded audio clips, process() takes a complete clip as a single bytes object and returns the final AsrTranscript. It blocks until the neural network has finished.

Run with:

python benchmark.py

from abr_sdk.asr import Asr

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"

with open("recording.pcm", "rb") as f:
    pcm_bytes = f.read()

with Asr(LIBRARY_PATH) as asr:
    transcript = asr.process(pcm_bytes)
    print(transcript.text)

recording.pcm must contain raw PCM audio in the required format: little-endian 16-bit mono at 16,000 Hz. For batch workloads, call asr.process() multiple times on the same Asr instance; each call is independent.
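
A common benchmark figure is the real-time factor: processing time divided by audio duration. Because the required format is 16-bit mono at 16,000 Hz, the duration follows directly from the byte count. The sketch below shows the calculation; the helpers audio_seconds, real_time_factor, and timed are illustrative names, and len stands in for asr.process so the snippet runs without the SDK.

```python
import time

BYTES_PER_SECOND = 16000 * 2  # 16 kHz mono, 2 bytes per sample


def audio_seconds(pcm_bytes: bytes) -> float:
    """Duration of a raw PCM clip in the required input format."""
    return len(pcm_bytes) / BYTES_PER_SECOND


def real_time_factor(elapsed_s: float, pcm_bytes: bytes) -> float:
    """Processing time over audio duration; below 1.0 is faster than real time."""
    return elapsed_s / audio_seconds(pcm_bytes)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


clip = bytes(BYTES_PER_SECOND * 10)  # 10 s of silence as a stand-in clip
_, elapsed = timed(len, clip)        # replace len with asr.process
print(f"RTF: {real_time_factor(elapsed, clip):.6f}")
```

When timing several clips on one Asr instance, it may be worth discarding the first measurement, since any one-time warm-up cost would otherwise skew the average.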
