abr_sdk.asr#

Automatic speech recognition (ASR) wrapper and chunk parser.

Author: Andreas Stöckel (Applied Brain Research)

Author: Pawel Jaworski (Applied Brain Research)

Module Contents#

Classes#

AsrMode

Decoder mode selecting the latency / accuracy trade-off.

Asr

An ABR instance with automatic speech recognition support.

AsrChunk

A text chunk produced by the ASR subsystem.

AsrTranscript

Collected ASR output chunks with text assembly.

Processor

Event loop for streaming PCM audio through an Asr application.

API#

class abr_sdk.asr.AsrMode(*args, **kwds)#

Bases: enum.Enum

Decoder mode selecting the latency / accuracy trade-off.

Initialization

FAST#

‘fast’

ACCURATE#

‘accurate’

__str__() str#
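As a hedged illustration of how a mode might be selected from a command-line string, the sketch below uses a stand-in enum, `DemoAsrMode`, that mirrors the two documented values. The stand-in class, its `__str__` returning the member value, the `--mode` flag, and the chosen default are all assumptions made for this example; the behaviour of the real `AsrMode.__str__` is not described on this page.

```python
import argparse
import enum


class DemoAsrMode(enum.Enum):
    """Hypothetical stand-in mirroring the documented AsrMode values."""

    FAST = "fast"
    ACCURATE = "accurate"

    def __str__(self) -> str:
        # Assumption: print the plain value so help text shows "fast"/"accurate".
        return self.value


def parse_mode(argv: list[str]) -> DemoAsrMode:
    """Parse a hypothetical --mode flag into an enum member."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mode",
        type=DemoAsrMode,               # looks the member up by value, e.g. "fast"
        choices=list(DemoAsrMode),
        default=DemoAsrMode.ACCURATE,   # arbitrary default for this sketch
    )
    return parser.parse_args(argv).mode
```

With the real SDK, the resulting member would presumably be passed through the documented `mode` keyword of the `Asr` constructor.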
class abr_sdk.asr.Asr(lib_or_path: str | pathlib.Path | abr_sdk.core.Library, *, mode: abr_sdk.asr.AsrMode | None = None, enable_spellcheck: bool | None = None, enable_punctuation: bool | None = None, lib_search_paths: list[str | pathlib.Path] | None = None, use_default_lib_search_paths: bool = True, resources_dir: str | pathlib.Path | None = None, logger: logging.Logger | None = None)#

Bases: abr_sdk.core.Application

An ABR instance with automatic speech recognition support.

Can be constructed from a library path with keyword arguments:

with Asr("libabr-asr.so") as asr:
    ...

Simple (blocking) API – process an entire audio clip at once:

transcript = asr.process(pcm_bytes)
print(transcript.text)

Streaming API – push audio incrementally:

transcript = AsrTranscript()
asr.push(chunk1, on_chunk=transcript.chunks.append)
asr.push(chunk2, on_chunk=transcript.chunks.append)
asr.wait_for_completion()
print(transcript.text)

For finer control over the streaming event loop, use Processor directly.

Initialization

Create a new object instance wrapping the given handle and ABI instance.

Arguments

cabi CABI instance providing access to the low-level C functions. A C ABI object may be obtained by loading an ABR SDK shared library.

handle Pointer to the ABR SDK object that should be wrapped by the new Handle instance. Must be non-None unless handle_may_be_none is True.

handle_may_be_none If True, handle may be None. Used for access to the library metadata.

class_ String containing the expected object class. If not set to None, then the given string is compared to the “class” property of the handle.

input_buffer: abr_sdk.core.Buffer#

None

FIFO byte queue that receives the raw PCM audio pushed into the model.

text_chunk_output_buffer: abr_sdk.core.Buffer#

None

FIFO byte queue of serialized AsrChunk records the model produces.

__enter__() abr_sdk.asr.Asr#
flush() None#

Flush the ASR pipeline to finish processing remaining audio.

process(data: bytes) abr_sdk.asr.AsrTranscript#

Process PCM audio data and return the complete transcript.

This is a synchronous (blocking) call that pushes all data through the ASR pipeline, waits for the neural network to finish, and returns an AsrTranscript containing the result. It cannot be used while a streaming session started with push() is in progress.

Parameters

data PCM audio as a little-endian 16-bit byte array.
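The helpers below are a minimal sketch of two ways to obtain little-endian 16-bit PCM bytes using only the standard library: synthesising a test tone and reading the frames of a WAV file. The function names, the 16 kHz default rate, and the mono output are assumptions for illustration; this page does not state which sample rate or channel layout the model expects.

```python
import math
import struct
import wave


def sine_pcm16(freq_hz: float, seconds: float, sample_rate: int = 16000) -> bytes:
    """Return a mono sine tone as little-endian signed 16-bit PCM bytes."""
    n = int(seconds * sample_rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer.
    return b"".join(struct.pack("<h", s) for s in samples)


def read_wav_pcm16(source) -> bytes:
    """Read all frames from a 16-bit WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        # WAV stores 16-bit PCM little-endian, so the frames can be used as-is.
        return wav.readframes(wav.getnframes())


# Hypothetical usage with an open Asr instance named `asr`:
#     transcript = asr.process(read_wav_pcm16("clip.wav"))
#     print(transcript.text)
```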

push(data: bytes, *, on_chunk: collections.abc.Callable[[abr_sdk.asr.AsrChunk], None] | None = None, output_poll_timeout_ms: int = 0) None#

Push PCM audio data into the ASR network (streaming API).

On the first call an internal Processor is created with on_chunk as the listener callback. Subsequent calls reuse the same processor (the on_chunk argument is ignored after the first call). Call wait_for_completion() after the last audio chunk has been pushed.

Parameters

data PCM audio as little-endian 16-bit bytes.

on_chunk Callback invoked for each transcribed text chunk. Only used on the first call (when the internal processor is created).

output_poll_timeout_ms Extra time in milliseconds to spend waiting for output after the input has been pushed. 0 (the default) returns as soon as all input bytes have been consumed.

wait_for_completion() None#

Block until all previously pushed data is fully processed.

The on_chunk callback may be invoked during this call. When this method returns, the internal processor is closed and a new streaming session can be started by calling push() again.
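The push and wait_for_completion sequence described above can be sketched as a small driver that slices a PCM byte string into pieces and feeds them one at a time. The helper names `iter_pcm_chunks` and `stream_pcm` are hypothetical, and slicing on whole-sample boundaries is a cautious assumption; this page does not say whether pushes must be sample-aligned. The `asr` argument is duck-typed, so any object exposing the documented `push` and `wait_for_completion` methods works.

```python
from collections.abc import Iterator


def iter_pcm_chunks(pcm: bytes, chunk_samples: int) -> Iterator[bytes]:
    """Yield consecutive slices of 16-bit PCM, chunk_samples samples each."""
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    step = chunk_samples * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]  # the last slice may be shorter


def stream_pcm(asr, pcm: bytes, chunk_samples: int, on_chunk=None) -> None:
    """Push pcm through asr in pieces, then wait for the pipeline to drain."""
    for piece in iter_pcm_chunks(pcm, chunk_samples):
        # on_chunk only takes effect on the first push of a session.
        asr.push(piece, on_chunk=on_chunk)
    asr.wait_for_completion()
```

In a live setting the pieces would come from a microphone or network source rather than a pre-loaded byte string; the loop structure stays the same.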

close() None#

Release all resources held by this instance.

class abr_sdk.asr.AsrChunk#

A text chunk produced by the ASR subsystem.

Parse from raw buffer output with parse(). Apply to a running transcript with update().

SIZE#

‘sizeof(…)’

type: abr_sdk.cabi.AsrTextChunkType#

None

replace_byte_offset_begin: int#

None

replace_byte_offset_end: int#

None

data: bytes#

None

static parse(raw: bytes | bytearray) abr_sdk.asr.AsrChunk#

Parse raw bytes from the ASR output buffer into an AsrChunk.

raw must be exactly SIZE bytes.
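Because parse accepts exactly one fixed-size record, bytes drained from the output queue have to be cut into SIZE-byte pieces first. The helper below is a generic sketch of that step; `split_records` and the `pending` accumulator are hypothetical names, and how bytes are actually read from `text_chunk_output_buffer` is not covered on this page.

```python
def split_records(pending: bytearray, record_size: int) -> list[bytes]:
    """Remove and return every complete record from the front of pending.

    Any trailing partial record stays in pending until more bytes arrive.
    """
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    count = len(pending) // record_size
    records = [
        bytes(pending[i * record_size:(i + 1) * record_size])
        for i in range(count)
    ]
    del pending[:count * record_size]  # keep only the incomplete tail
    return records


# Hypothetical usage with the real class:
#     chunks = [AsrChunk.parse(r) for r in split_records(pending, AsrChunk.SIZE)]
```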

update(buf: bytearray) None#

Apply this chunk to a running transcript bytearray.
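The exact semantics of update are not spelled out here. One plausible reading of the `replace_byte_offset_begin`, `replace_byte_offset_end`, and `data` fields is that a chunk overwrites the half-open byte range [begin, end) of the running transcript with its payload, which would let later chunks revise earlier text. The sketch below implements that assumed behaviour with a stand-in class, `DemoChunk`; it is not the SDK's implementation and ignores the `type` field entirely.

```python
from dataclasses import dataclass


@dataclass
class DemoChunk:
    """Hypothetical stand-in for AsrChunk carrying only the splice fields."""

    replace_byte_offset_begin: int
    replace_byte_offset_end: int
    data: bytes

    def update(self, buf: bytearray) -> None:
        # Assumed semantics: replace bytes [begin, end) with this chunk's data.
        begin, end = self.replace_byte_offset_begin, self.replace_byte_offset_end
        buf[begin:end] = self.data


def assemble(chunks: list[DemoChunk]) -> str:
    """Fold chunks into a transcript and decode it as UTF-8 (assumed encoding)."""
    buf = bytearray()
    for chunk in chunks:
        chunk.update(buf)
    return buf.decode("utf-8")
```

Under this reading, an empty range with begin equal to the current length is a plain append, while a non-empty range rewrites text that was emitted earlier.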

class abr_sdk.asr.AsrTranscript#

Collected ASR output chunks with text assembly.

Initialization

property text: str#

Assemble and return the full transcript text from all chunks.

class abr_sdk.asr.Processor(asr: abr_sdk.asr.Asr, on_chunk: collections.abc.Callable[[abr_sdk.asr.AsrChunk], None] | None = None)#

Event loop for streaming PCM audio through an Asr application.

Feeds audio into the ASR pipeline and delivers transcribed text chunks. Attach to an Asr instance and push audio data incrementally. Output chunks are delivered via the on_chunk callback:

with Processor(asr, on_chunk=my_callback) as proc:
    proc.push(chunk1)
    proc.push(chunk2)
    proc.wait_for_completion()

This class is also used internally by Asr.process().

Initialization

process_and_wait_for_output(data: bytes | None, timeout_ms: int, flush: bool) None#

Push input data and wait for output text chunks.

This is the core event loop. The higher-level methods push() and wait_for_completion() delegate to this method.

Parameters

data PCM input bytes (little-endian 16-bit), or None to push no new data.

timeout_ms Maximum time in milliseconds to spend waiting for output after all input has been pushed. 0 means return immediately once input is consumed. The timeout is measured from when this method is called.

flush If True, flush the ASR pipeline after all input is streamed and wait until the neural network becomes idle.

push(data: bytes, output_poll_timeout_ms: int = 0) None#

Push PCM audio data into the ASR network.

This may block briefly if the input buffer is full. The on_chunk callback may be invoked during this call.

Parameters

data PCM audio as little-endian 16-bit bytes.

output_poll_timeout_ms Extra time in milliseconds to spend waiting for output after the input has been consumed. 0 (the default) returns as soon as the input is pushed.

wait_for_completion() None#

Block until all previously pushed data is fully processed.

The on_chunk callback may be invoked during this call.

close() None#

Release all event resources held by this processor.

__enter__() abr_sdk.asr.Processor#
__exit__(type_: type[BaseException] | None, value: BaseException | None, traceback: types.TracebackType | None) None#