ASR quickstart
This tutorial takes you from a fresh installation to a running transcript. By the end you will have a Python script that streams audio from your microphone and prints the transcribed text to stdout.
Prerequisites
You need:
The ABR SDK:
    pip install abr-sdk
An ASR application package downloaded and extracted from the ABR developer portal. You should have a directory containing a .so shared library file.
sounddevice, used to capture audio from your microphone:
    pip install sounddevice
A working microphone connected to your device.
If you have not downloaded a package yet, follow Installation first.
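Before writing the script, it can help to confirm that both Python packages are importable. The following is a minimal sketch that uses only the standard library. The module names abr_sdk and sounddevice are taken from the imports used later in this tutorial; missing_modules is a hypothetical helper name and is not part of the SDK.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the imports used in the quickstart script.
    missing = missing_modules(["abr_sdk", "sounddevice"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Both packages are importable.")
```

This only checks that the packages can be found; it does not load the .so library or open the microphone.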
Write the script
Create a file called asr_quickstart.py with the following contents:
import sounddevice as sd
from abr_sdk.asr import Asr, AsrTranscript

LIBRARY_PATH = "/path/to/niagara-38m-live.en/libniagara_38m_live.so"
SAMPLE_RATE = 16000
CHUNK_FRAMES = SAMPLE_RATE // 10  # 100 ms

# Collects text chunks and assembles the final transcript.
transcript = AsrTranscript()

with Asr(LIBRARY_PATH, sample_rate=SAMPLE_RATE) as asr:
    print("Listening… press Ctrl-C to stop.")
    # Open the default microphone as raw 16-bit mono PCM.
    with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
        try:
            while True:
                data, _ = stream.read(CHUNK_FRAMES)
                asr.push(bytes(data), on_chunk=transcript.chunks.append)
        except KeyboardInterrupt:
            pass
    # Flush the pipeline and wait for the remaining text chunks.
    asr.wait_for_completion()

print(transcript.text)
Set LIBRARY_PATH to the absolute path of the .so file inside your extracted package directory.
For example, the niagara-38m-live.en package contains libniagara_38m_live.so.
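If you would rather not hard-code the file name, the path can be derived from the package directory. The sketch below assumes the extracted package directory holds exactly one .so file at its top level, which may not be true for every package; find_shared_library is a hypothetical helper and is not part of the SDK.

```python
from pathlib import Path


def find_shared_library(package_dir):
    """Return the absolute path of the single .so file in package_dir.

    Assumption: the package directory contains exactly one .so file
    at its top level. Anything else raises FileNotFoundError.
    """
    candidates = sorted(Path(package_dir).glob("*.so"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one .so file in {package_dir}, "
            f"found {len(candidates)}"
        )
    return str(candidates[0].resolve())
```

With this helper, LIBRARY_PATH could be set to find_shared_library("/path/to/niagara-38m-live.en") instead of spelling out the file name.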
Run it
python asr_quickstart.py
Speak into your microphone, then press Ctrl-C to stop. The final transcript is printed to stdout.
What just happened
sd.RawInputStream(samplerate=16000, channels=1, dtype="int16") opens your default microphone
and configures it to deliver raw 16-bit mono PCM samples at 16 kHz, the format the ASR model expects.
stream.read(CHUNK_FRAMES) pulls the next 100 ms of audio from the microphone buffer. The _
return value is an overflow flag that can be ignored for this example.
CHUNK_FRAMES = SAMPLE_RATE // 10 gives 100 milliseconds of audio per push call. The model does
not require a specific chunk size; smaller chunks reduce latency.
AsrTranscript() collects chunks and assembles the final transcript, handling all overwrites
automatically. Calling .text after wait_for_completion() returns the final string.
asr.push(..., on_chunk=transcript.chunks.append) hands each audio chunk to the model.
push() is non-blocking; it returns as soon as the input bytes are accepted.
asr.wait_for_completion() signals end-of-audio, flushes the model pipeline, and blocks until
every pending text chunk has been delivered.
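To make the chunk-size arithmetic and the overflow flag concrete, here is a small sketch that runs without a microphone or the SDK. FakeStream, chunk_frames, and pump are hypothetical names introduced for illustration. FakeStream only mimics the (data, overflowed) pair that stream.read returns in the script above. Counting overflows is one possible way to use the flag, not something the tutorial script requires.

```python
SAMPLE_RATE = 16000     # samples per second, as in the quickstart script
BYTES_PER_SAMPLE = 2    # int16 mono: one 2-byte sample per frame


def chunk_frames(sample_rate, chunk_ms):
    """Number of frames in a chunk that lasts chunk_ms milliseconds."""
    return sample_rate * chunk_ms // 1000


def pump(stream, frames, sink, max_chunks):
    """Read max_chunks chunks from stream, pass each to sink as bytes,
    and return how many reads reported an overflow."""
    overflows = 0
    for _ in range(max_chunks):
        data, overflowed = stream.read(frames)
        if overflowed:
            overflows += 1
        sink(bytes(data))
    return overflows


class FakeStream:
    """Stand-in for a microphone stream: returns silence, and reports
    an overflow on the read numbers listed in overflow_on."""

    def __init__(self, overflow_on=()):
        self.calls = 0
        self.overflow_on = set(overflow_on)

    def read(self, frames):
        self.calls += 1
        return bytes(frames * BYTES_PER_SAMPLE), self.calls in self.overflow_on


if __name__ == "__main__":
    frames = chunk_frames(SAMPLE_RATE, 100)
    print(frames, "frames per chunk,", frames * BYTES_PER_SAMPLE, "bytes")
    received = []
    dropped = pump(FakeStream(overflow_on={2}), frames, received.append, 3)
    print(len(received), "chunks read,", dropped, "overflow(s) reported")
```

In the real script, the sink would be a function that calls asr.push(chunk, on_chunk=transcript.chunks.append), and the loop would run until Ctrl-C rather than for a fixed number of chunks.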
Next steps
Read the Overview for a map of the full ASR API.
Understand how the model produces and revises output in Transcription stages.