abr_sdk.asr#
Automatic speech recognition (ASR) wrapper and chunk parser.
Author: Andreas Stöckel (Applied Brain Research)
Author: Pawel Jaworski (Applied Brain Research)
Module Contents#
Classes#
AsrMode | Decoder mode selecting the latency / accuracy trade-off.
Asr | An ABR instance with automatic speech recognition support.
AsrChunk | A text chunk produced by the ASR subsystem.
AsrTranscript | Collected ASR output chunks with text assembly.
Processor | Event loop for streaming PCM audio through an :class:`Asr` application.
API#
- class abr_sdk.asr.AsrMode(*args, **kwds)#
Bases: enum.Enum
Decoder mode selecting the latency / accuracy trade-off.
Initialization
- FAST#
‘fast’
- ACCURATE#
‘accurate’
- __str__() str#
- class abr_sdk.asr.Asr(lib_or_path: str | pathlib.Path | abr_sdk.core.Library, *, mode: abr_sdk.asr.AsrMode | None = None, enable_spellcheck: bool | None = None, enable_punctuation: bool | None = None, lib_search_paths: list[str | pathlib.Path] | None = None, use_default_lib_search_paths: bool = True, resources_dir: str | pathlib.Path | None = None, logger: logging.Logger | None = None)#
Bases: abr_sdk.core.Application
An ABR instance with automatic speech recognition support.
Can be constructed from a library path with keyword arguments::

    with Asr("libabr-asr.so") as asr:
        ...  # use asr inside this block

Simple (blocking) API – process an entire audio clip at once::

    transcript = asr.process(pcm_bytes)
    print(transcript.text)

Streaming API – push audio incrementally::

    transcript = AsrTranscript()
    asr.push(chunk1, on_chunk=transcript.chunks.append)
    asr.push(chunk2, on_chunk=transcript.chunks.append)
    asr.wait_for_completion()
    print(transcript.text)

For finer control over the streaming event loop, use :class:`Processor` directly.
Initialization
Create a new object instance wrapping the given handle and ABI instance.
Arguments
cabi CABI instance providing access to the low-level C functions. A C ABI object may be obtained by loading an ABR SDK shared library.
handle Pointer to the ABR SDK object that should be wrapped by the new Handle instance. Must be non-None unless handle_may_be_none is True.
handle_may_be_none If True, handle may be None. Used for access to the library metadata.
class_ String containing the expected object class. If not set to None, the given string is compared to the "class" property of the handle.
- input_buffer: abr_sdk.core.Buffer#
None
FIFO byte queue that receives the raw PCM audio pushed into the model.
- text_chunk_output_buffer: abr_sdk.core.Buffer#
None
FIFO byte queue of serialized :class:`AsrChunk` records the model produces.
- __enter__() abr_sdk.asr.Asr#
- flush() None#
Flush the ASR pipeline to finish processing remaining audio.
- process(data: bytes) abr_sdk.asr.AsrTranscript#
Process PCM audio data and return the complete transcript.
This is a synchronous/blocking call that pushes all data through the ASR pipeline, waits for the neural network to finish, and returns a :class:`AsrTranscript` containing the result. Cannot be used while a streaming session started with :meth:`push` is in progress.
Parameters
data PCM audio as a little-endian 16-bit byte array.
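The data argument is raw PCM: signed 16-bit samples, little-endian, packed back to back. As a minimal sketch of how such a buffer can be built with the standard library only, the helper below encodes a sine tone. The function name sine_pcm16le and the 16 kHz mono sample rate are assumptions made for illustration; this page does not state the sample rate or channel layout the model expects.

```python
import math
import struct


def sine_pcm16le(freq_hz: float, duration_s: float,
                 sample_rate: int = 16000, amplitude: float = 0.5) -> bytes:
    """Return a sine tone as signed 16-bit little-endian PCM bytes.

    sample_rate=16000 is an assumed value, not taken from the SDK docs.
    """
    n = round(duration_s * sample_rate)  # number of samples
    samples = (
        int(round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    )
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return struct.pack(f"<{n}h", *samples)


pcm_bytes = sine_pcm16le(440.0, 0.1)
```

Each sample occupies two bytes, so the length of a valid buffer is always even.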
- push(data: bytes, *, on_chunk: collections.abc.Callable[[abr_sdk.asr.AsrChunk], None] | None = None, output_poll_timeout_ms: int = 0) None#
Push PCM audio data into the ASR network (streaming API).
On the first call an internal :class:`Processor` is created with on_chunk as the listener callback. Subsequent calls reuse the same processor (the on_chunk argument is ignored after the first call). Call :meth:`wait_for_completion` after the last audio chunk has been pushed.
Parameters
data PCM audio as little-endian 16-bit bytes.
on_chunk Callback invoked for each transcribed text chunk. Only used on the first call (when the internal processor is created).
output_poll_timeout_ms Extra time in milliseconds to spend waiting for output after the input has been pushed. 0 (the default) returns as soon as all input bytes have been consumed.
- wait_for_completion() None#
Block until all previously pushed data is fully processed.
The on_chunk callback may be invoked during this call. When this method returns, the internal processor is closed and a new streaming session can be started by calling :meth:`push` again.
- close() None#
Release all resources held by this instance.
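The streaming methods :meth:`push` and :meth:`wait_for_completion` described above can be wrapped in a small helper. The sketch below is not part of the SDK: the name stream_pcm and the default chunk size of 3200 bytes are assumptions. It relies only on the two documented calls, so it accepts any object that provides them.

```python
from typing import Any, List


def stream_pcm(asr: Any, pcm: bytes, chunk_bytes: int = 3200) -> List[Any]:
    """Push pcm through asr in fixed-size pieces and collect output chunks.

    asr is expected to offer push(data, on_chunk=...) and
    wait_for_completion() as documented for abr_sdk.asr.Asr.
    chunk_bytes=3200 is an arbitrary illustrative value.
    """
    if chunk_bytes <= 0 or chunk_bytes % 2:
        # Keep every piece aligned to whole 16-bit samples.
        raise ValueError("chunk_bytes must be a positive multiple of 2")
    if not pcm:
        return []  # nothing to push, so no streaming session is started

    received: List[Any] = []
    for start in range(0, len(pcm), chunk_bytes):
        # on_chunk is only honoured on the first call; passing it every
        # time mirrors the example in the class docstring.
        asr.push(pcm[start:start + chunk_bytes], on_chunk=received.append)
    asr.wait_for_completion()  # remaining callbacks may fire during this call
    return received
```

With a real instance this would be called as stream_pcm(asr, pcm_bytes) inside a with Asr(...) as asr: block.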
- class abr_sdk.asr.AsrChunk#
A text chunk produced by the ASR subsystem.
Parse from raw buffer output with :meth:`parse`. Apply to a running transcript with :meth:`update`.
- SIZE#
‘sizeof(…)’
- type: abr_sdk.cabi.AsrTextChunkType#
None
- replace_byte_offset_begin: int#
None
- replace_byte_offset_end: int#
None
- data: bytes#
None
- static parse(raw: bytes | bytearray) abr_sdk.asr.AsrChunk#
Parse raw bytes from the ASR output buffer into an :class:`AsrChunk`.
raw must be exactly :attr:`SIZE` bytes.
- update(buf: bytearray) None#
Apply this chunk to a running transcript bytearray.
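The field names replace_byte_offset_begin, replace_byte_offset_end, and data suggest that a chunk overwrites a byte range of the running transcript. The sketch below illustrates that reading with a hypothetical stand-in class, ReplaceChunk, and a hypothetical function, apply_chunk. It is a guess at the semantics, not the SDK's implementation: it ignores the type field, and it assumes half-open offsets and UTF-8 text, none of which this page confirms.

```python
from dataclasses import dataclass


@dataclass
class ReplaceChunk:
    """Hypothetical stand-in for abr_sdk.asr.AsrChunk (type field omitted)."""
    replace_byte_offset_begin: int
    replace_byte_offset_end: int
    data: bytes


def apply_chunk(buf: bytearray, chunk: ReplaceChunk) -> None:
    # Assumed semantics: replace the half-open byte range [begin, end)
    # of the transcript with the chunk payload, in place.
    buf[chunk.replace_byte_offset_begin:chunk.replace_byte_offset_end] = chunk.data


buf = bytearray()
apply_chunk(buf, ReplaceChunk(0, 0, b"helo world"))  # first hypothesis, appended
apply_chunk(buf, ReplaceChunk(0, 4, b"hello"))       # later correction of bytes 0..3
print(buf.decode("utf-8"))  # prints "hello world"
```

Under this reading, a chunk whose begin and end offsets are equal inserts text, and a chunk covering earlier bytes revises text that was already emitted.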
- class abr_sdk.asr.AsrTranscript#
Collected ASR output chunks with text assembly.
Initialization
- property text: str#
Assemble and return the full transcript text from all chunks.
- class abr_sdk.asr.Processor(asr: abr_sdk.asr.Asr, on_chunk: collections.abc.Callable[[abr_sdk.asr.AsrChunk], None] | None = None)#
Event loop for streaming PCM audio through an :class:`Asr` application.
Feeds audio into the ASR pipeline and delivers transcribed text chunks. Attach to an :class:`Asr` instance and push audio data incrementally. Output chunks are delivered via the on_chunk callback::

    with Processor(asr, on_chunk=my_callback) as proc:
        proc.push(chunk1)
        proc.push(chunk2)
        proc.wait_for_completion()

This class is also used internally by :meth:`Asr.process`.
Initialization
- process_and_wait_for_output(data: bytes | None, timeout_ms: int, flush: bool) None#
Push input data and wait for output text chunks.
This is the core event loop. Higher-level methods :meth:`push` and :meth:`wait_for_completion` delegate to this method.
Parameters
data PCM input bytes (little-endian 16-bit), or None to push no new data.
timeout_ms Maximum time in milliseconds to spend waiting for output after all input has been pushed. 0 means return immediately once input is consumed. The timeout is measured from when this method is called.
flush If True, flush the ASR pipeline after all input is streamed and wait until the neural network becomes idle.
- push(data: bytes, output_poll_timeout_ms: int = 0) None#
Push PCM audio data into the ASR network.
This may block briefly if the input buffer is full. The on_chunk callback may be invoked during this call.
Parameters
data PCM audio as little-endian 16-bit bytes.
output_poll_timeout_ms Extra time in milliseconds to spend waiting for output after the input has been consumed. 0 (the default) returns as soon as the input is pushed.
- wait_for_completion() None#
Block until all previously pushed data is fully processed.
The on_chunk callback may be invoked during this call.
- close() None#
Release all event resources held by this processor.
- __enter__() abr_sdk.asr.Processor#