ASR quickstart

This tutorial takes you from a fresh installation to a running transcript. By the end you will have a Python script that streams audio from your microphone and prints the transcribed text to stdout.

Prerequisites

You need:

  • The ABR SDK: pip install abr-sdk

  • An ASR application package, downloaded from the ABR developer portal and extracted. The extracted directory should contain a .so shared library file.

  • sounddevice, used to capture audio from your microphone: pip install sounddevice

  • A working microphone connected to your device.

If you have not downloaded a package yet, follow the Installation guide first.
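Before writing the script, you can optionally confirm that both Python packages are importable. The sketch below is not part of the SDK; it uses only importlib.util.find_spec from the standard library, and the module names abr_sdk and sounddevice are taken from the quickstart script's own imports. It does not test the microphone itself.

```python
import importlib.util

# Top-level module names taken from the quickstart script's imports.
REQUIRED_MODULES = ("abr_sdk", "sounddevice")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisite packages are importable.")
```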

Write the script

The script below captures microphone audio in 100 ms chunks, streams each chunk to the ASR model, and prints the assembled transcript when you stop it.

Create a file called asr_quickstart.py:

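import sounddevice as sd
from abr_sdk.asr import Asr, AsrTranscript

# Absolute path to the .so file inside your extracted package.
LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
SAMPLE_RATE = 16000  # Hz; the model expects 16 kHz mono int16 PCM
CHUNK_FRAMES = SAMPLE_RATE // 10  # 1600 frames = 100 ms

# Collects text chunks from the model and assembles the final transcript.
transcript = AsrTranscript()

with Asr(LIBRARY_PATH, sample_rate=SAMPLE_RATE) as asr:
    print("Listening… press Ctrl-C to stop.")
    with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
        try:
            while True:
                # Waits until CHUNK_FRAMES frames are recorded; the overflow flag is ignored.
                data, _ = stream.read(CHUNK_FRAMES)
                # Non-blocking: text chunks arrive later through the on_chunk callback.
                asr.push(bytes(data), on_chunk=transcript.chunks.append)
        except KeyboardInterrupt:
            pass  # Ctrl-C ends the capture loop
    # Signal end-of-audio and wait for every remaining text chunk.
    asr.wait_for_completion()

print(transcript.text)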

Set LIBRARY_PATH to the absolute path of the .so file inside your extracted package directory. For example, the niagara-38m-live.en package contains libniagara_38m_live.so.
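If you would rather point the script at the package directory than type the full file name, a small helper can locate the library and fail early when the path is wrong. find_shared_library is a hypothetical helper, not part of abr_sdk, and it assumes the .so file sits at the top level of the extracted directory, as it does in the niagara-38m-live.en example path.

```python
from pathlib import Path


def find_shared_library(package_dir):
    """Return the absolute path of the single .so file in an extracted package.

    Hypothetical helper: assumes the library sits at the top level of
    package_dir. Raises FileNotFoundError if no .so file is found and
    ValueError if there are several, so a wrong path fails immediately.
    """
    candidates = sorted(Path(package_dir).expanduser().glob("*.so"))
    if not candidates:
        raise FileNotFoundError(f"no .so file in {package_dir}")
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise ValueError(f"several .so files in {package_dir}: {names}")
    return str(candidates[0].resolve())
```

In the script you would then write LIBRARY_PATH = find_shared_library("/path/to/niagara-38m-live.en").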

Run it

python asr_quickstart.py

Speak into your microphone, then press Ctrl-C to stop. The final transcript is printed to stdout.

What just happened

sd.RawInputStream(samplerate=16000, channels=1, dtype="int16") opens your default microphone and configures it to deliver raw 16-bit mono PCM samples at 16 kHz, the format the ASR model expects.
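The arithmetic behind this format: 16,000 samples per second × 2 bytes per sample × 1 channel is 32,000 bytes of audio per second. If your audio instead comes from a source that produces floating-point samples, it would need converting before it is pushed. The sketch below shows one such conversion; float_to_pcm16 is a hypothetical helper, and it assumes the model accepts little-endian samples, which is the byte order dtype="int16" delivers on typical x86-64 and ARM machines.

```python
import struct

SAMPLE_RATE = 16000  # samples per second
SAMPLE_WIDTH = 2     # bytes per int16 sample
CHANNELS = 1         # mono

BYTES_PER_SECOND = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS  # 32000


def float_to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 16-bit PCM bytes.

    Hypothetical helper. Values outside the range are clipped before scaling.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```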

stream.read(CHUNK_FRAMES) waits until the next 100 ms of audio has been recorded and returns it. The second return value, bound to _, is an overflow flag that is True when input audio was dropped since the previous read; this example ignores it.
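If dropped audio matters for your application, you can count overflows instead of discarding the flag. So that it runs without a microphone, the sketch below uses a hypothetical FakeInputStream that mimics the (data, overflowed) return shape of RawInputStream.read; read_chunks itself only calls stream.read(frames), so it should work unchanged on a real sd.RawInputStream.

```python
class FakeInputStream:
    """Hypothetical stand-in mimicking RawInputStream.read(frames) -> (data, overflowed)."""

    def __init__(self, overflow_pattern):
        self._pattern = list(overflow_pattern)

    def read(self, frames):
        overflowed = self._pattern.pop(0) if self._pattern else False
        return bytes(frames * 2), overflowed  # int16 mono: 2 bytes per frame


def read_chunks(stream, frames, n_chunks):
    """Read n_chunks chunks of `frames` frames, counting reported overflows."""
    chunks, overflows = [], 0
    for _ in range(n_chunks):
        data, overflowed = stream.read(frames)
        if overflowed:
            overflows += 1  # input was discarded before this read
        chunks.append(bytes(data))
    return chunks, overflows
```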

CHUNK_FRAMES = SAMPLE_RATE // 10 is 1,600 frames, or 100 milliseconds of audio per push() call. The model does not require a specific chunk size; smaller chunks reduce latency, because audio waits in the microphone buffer until a full chunk has been read.
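To see how a chunk duration translates into a CHUNK_FRAMES value and the number of bytes handed to each push() call, the sketch below computes both for a few durations, assuming the quickstart's 16 kHz mono int16 format. chunk_frames is an illustrative helper, not an SDK function. The trade-off of shorter chunks is more push() calls per second.

```python
SAMPLE_RATE = 16000  # Hz, as in the quickstart
BYTES_PER_FRAME = 2  # one int16 sample per frame (mono)


def chunk_frames(chunk_ms, sample_rate=SAMPLE_RATE):
    """Number of frames in a chunk of chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


for ms in (20, 50, 100, 200):
    frames = chunk_frames(ms)
    pushes_per_second = 1000 // ms
    print(f"{ms:>4} ms -> {frames:>5} frames, "
          f"{frames * BYTES_PER_FRAME:>5} bytes, {pushes_per_second:>2} pushes/s")
```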

AsrTranscript() collects the text chunks produced by the model and assembles them into the final transcript, handling any chunk that overwrites earlier text automatically. Reading .text after wait_for_completion() returns the final string.
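The text above does not describe what an overwrite looks like. One common scheme in streaming ASR is that provisional text for the current segment is replaced by later text until the segment is finalized. The toy below assumes that scheme, with a hypothetical ToyChunk(text, final) shape; it is an illustration of the idea, not the SDK's AsrTranscript or its chunk format.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    # Hypothetical chunk shape; the real SDK's chunk fields are not documented here.
    text: str
    final: bool  # False = provisional text that a later chunk may overwrite


@dataclass
class ToyTranscript:
    chunks: list = field(default_factory=list)

    @property
    def text(self):
        committed = []  # finalized segments, in order
        pending = ""    # latest provisional text for the current segment
        for chunk in self.chunks:
            if chunk.final:
                committed.append(chunk.text)
                pending = ""
            else:
                pending = chunk.text  # overwrite the previous provisional text
        return " ".join(committed + ([pending] if pending else []))
```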

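asr.push(..., on_chunk=transcript.chunks.append) hands each audio chunk to the model and passes transcript.chunks.append as the callback that receives text chunks, so each one is appended to transcript.chunks. push() is non-blocking; it returns as soon as the input bytes are accepted.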

asr.wait_for_completion() signals end-of-audio, flushes the model pipeline, and blocks until every pending text chunk has been delivered.
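Together, push() and wait_for_completion() follow a familiar producer/consumer pattern. The sketch below is a toy model of that pattern using queue and threading from the standard library; ToyStreamingRecognizer is hypothetical, performs no recognition, and the real SDK's internal threading is not documented here. push() only enqueues and returns, a worker thread invokes on_chunk, and wait_for_completion() enqueues an end-of-audio sentinel and joins the worker, so every queued chunk is delivered before it returns.

```python
import queue
import threading

_END = object()  # sentinel marking end-of-audio


class ToyStreamingRecognizer:
    """Hypothetical stand-in for the push/wait_for_completion pattern (not the SDK)."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def push(self, audio, on_chunk):
        # Non-blocking: enqueue the audio and return; the worker does the processing.
        self._inbox.put((audio, on_chunk))

    def wait_for_completion(self):
        # Signal end-of-audio, then block until the worker has drained the queue.
        self._inbox.put(_END)
        self._worker.join()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is _END:
                return
            audio, on_chunk = item
            # A real model would emit text; this toy reports the byte count.
            on_chunk(f"{len(audio)} bytes")


received = []
rec = ToyStreamingRecognizer()
for _ in range(3):
    rec.push(bytes(3200), on_chunk=received.append)
rec.wait_for_completion()
print(received)  # ['3200 bytes', '3200 bytes', '3200 bytes']
```

This is also why the quickstart reads transcript.text only after wait_for_completion(): until then, text for audio that has already been pushed may still be in flight.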

Next steps