---
myst:
  html_meta:
    'description': 'How the ABR SDK produces and revises transcripts through FAST and ACCURATE model outputs and post-processing, and how to handle chunk overwrites in your application.'
    'keywords': 'asr, transcription, streaming, chunks, fast, accurate, post-processing, overwrite, replace'
---

# Transcription stages

The ASR model begins processing audio immediately with the first input and emits text incrementally
as audio arrives. Each audio segment produces a model output chunk followed by a post-processing
chunk that overwrites it.

This page is for developers building applications that display or forward text as it arrives: a
terminal renderer, a subtitle overlay, a real-time feed to a downstream service.

If you only need the final transcript after all audio has been processed,
{py:class}`~abr_sdk.asr.AsrTranscript` handles the bookkeeping automatically. See
{doc}`/asr/examples` for complete runnable implementations.

## Model output and post-processing

For each segment of audio, the SDK emits two chunks in sequence.

The first is the **model output**, determined by your {py:class}`~abr_sdk.asr.AsrMode` setting:

- {py:attr}`AsrTextChunkType.CAUSAL <abr_sdk.cabi.AsrTextChunkType.CAUSAL>` (**FAST** mode): emitted
  as soon as possible, roughly 100 milliseconds after the audio arrives. The model uses no
  look-ahead. Lowest latency.
- {py:attr}`AsrTextChunkType.NONCAUSAL <abr_sdk.cabi.AsrTextChunkType.NONCAUSAL>` (**ACCURATE**
  mode): emitted after the model has used a small window of subsequent audio as context. Higher
  accuracy, higher latency.

The second is the **post-processing output**
({py:attr}`AsrTextChunkType.POSTPROC <abr_sdk.cabi.AsrTextChunkType.POSTPROC>`), applied to the model
output regardless of mode. It adds punctuation, capitalization, and spelling correction, and it
replaces the model output chunk for the same segment.

| Chunk type                   | Contents                    | Relative latency |
| ---------------------------- | --------------------------- | ---------------- |
| `AsrTextChunkType.CAUSAL`    | FAST model output           | Lowest           |
| `AsrTextChunkType.NONCAUSAL` | ACCURATE model output       | Medium           |
| `AsrTextChunkType.POSTPROC`  | Post-processed model output | Highest          |

See [Choosing a mode](#choosing-a-mode) for guidance on which model output to use.

## How replace ranges work

Each {py:class}`~abr_sdk.asr.AsrChunk` carries two byte offset fields that tell you where its text
belongs in the running transcript. All text is encoded as UTF-8, and the byte offsets always index
to UTF-8 code-point boundaries: you will never receive an offset that splits a multi-byte character.

- `replace_byte_offset_begin` and `replace_byte_offset_end` are zero or **negative** byte offsets
  measured from the current end of the transcript buffer.
- A value of `0` means "the current end," so a chunk with both offsets at `0` appends new text
  without replacing anything.
- A chunk with `replace_byte_offset_begin = -N` and `replace_byte_offset_end = 0` replaces the last
  `N` bytes.
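
The offset arithmetic in the list above can be sketched in plain Python. This is an illustration only, not the SDK's implementation: `apply_replace` is a hypothetical helper, and in practice `AsrChunk.update` does this work for you. Note that the offsets are converted to absolute positions first, because a Python slice such as `buf[-10:0]` is always empty.

```python
def apply_replace(buf: bytearray, begin: int, end: int, data: bytes) -> None:
    """Replace buf[len+begin : len+end] with data, in place.

    Hypothetical helper for illustration; begin and end are the chunk's
    zero-or-negative byte offsets measured from the current end of buf.
    """
    if not (-len(buf) <= begin <= end <= 0):
        raise ValueError(f"offsets out of range: begin={begin}, end={end}")
    size = len(buf)
    # Convert to absolute positions: buf[begin:end] would be wrong when end == 0.
    buf[size + begin : size + end] = data


buf = bytearray()
apply_replace(buf, 0, 0, b"helo wrold")       # both offsets 0: append
apply_replace(buf, -10, 0, b"Hello, world.")  # replace the last 10 bytes
print(buf.decode("utf-8"))

# Offsets count bytes, not characters: "cafe" is 4 bytes, "Café" is 5 in UTF-8.
apply_replace(buf, 0, 0, b" cafe")
apply_replace(buf, -4, 0, "Café".encode("utf-8"))
print(buf.decode("utf-8"))
```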

### Worked example

Consider the phrase "hello world" arriving in FAST mode. The transcript starts as an empty buffer.

**Step 1: CAUSAL chunk arrives.** Both offsets are `0`, so the text is appended:

```text
buffer: b"helo wrold"   (10 bytes)
```

**Step 2: POSTPROC chunk arrives.** `replace_byte_offset_begin = -10`, `replace_byte_offset_end = 0`.
The last 10 bytes are replaced:

```text
buffer: b"Hello, world."  (13 bytes)
```

In ACCURATE mode the steps are the same, but the chunk in step 1 is a NONCAUSAL chunk, which is
typically closer to the final text before the POSTPROC chunk arrives.
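
To see how the relative offsets behave across more than one segment, the worked example can be replayed as data. The sketch below uses plain tuples rather than real `AsrChunk` objects, and the second segment (its text and its leading space) is invented for illustration. It assumes, as described above, that each POSTPROC chunk replaces exactly the model output for its own segment.

```python
# Each step: (label, replace_begin, replace_end, text). Stand-in tuples, not SDK objects.
seg1_model, seg1_post = b"helo wrold", b"Hello, world."
seg2_model, seg2_post = b" how ar you", b" How are you?"  # invented second segment

steps = [
    ("CAUSAL", 0, 0, seg1_model),
    ("POSTPROC", -len(seg1_model), 0, seg1_post),
    ("CAUSAL", 0, 0, seg2_model),
    ("POSTPROC", -len(seg2_model), 0, seg2_post),
]

buf = bytearray()
history = []  # buffer contents after each step
for label, begin, end, text in steps:
    size = len(buf)
    buf[size + begin : size + end] = text  # offsets are relative to the current end
    history.append(bytes(buf))
    print(f"{label:<8} -> {bytes(buf)!r} ({len(buf)} bytes)")
```

Because the offsets are measured from the end of the buffer, the second POSTPROC chunk does not need to know how long the already-corrected first segment is.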

### Applying chunks to a buffer

{py:meth}`~abr_sdk.asr.AsrChunk.update` applies a chunk to a `bytearray`, handling the slice
replacement in place:

```python
buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    chunk.update(buf)
```

This is equivalent to what {py:class}`~abr_sdk.asr.AsrTranscript` does internally when you call
`.text`. Use `AsrChunk.update()` directly only when you are maintaining a custom buffer for a
renderer that needs per-chunk control.

## Rendering examples

Different output targets need different policies for which chunks to accept.

### FAST output

For a downstream language model, use FAST mode and forward only
{py:attr}`~abr_sdk.cabi.AsrTextChunkType.CAUSAL` chunks. Language models are robust to minor
transcription errors, and low-latency FAST output keeps the downstream response time short.

```python
from abr_sdk.asr import Asr, AsrChunk, AsrTextChunkType

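def on_chunk(chunk: AsrChunk) -> None:
    # Forward only the low-latency model output; skip POSTPROC chunks.
    if chunk.type == AsrTextChunkType.CAUSAL:
        # send_to_llm is a placeholder for your own forwarding function.
        send_to_llm(chunk.data.decode("utf-8"))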
```

### ACCURATE output

For terminal rendering, use ACCURATE mode and accept only
{py:attr}`~abr_sdk.cabi.AsrTextChunkType.NONCAUSAL` chunks. This gives the higher-accuracy model
output without the additional latency of post-processing.

```python
from abr_sdk.asr import Asr, AsrChunk, AsrTextChunkType

buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    if chunk.type == AsrTextChunkType.NONCAUSAL:
        chunk.update(buf)
        print(buf.decode("utf-8"), end="\r", flush=True)
```
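
One practical caveat with the `\r` approach: a carriage return only moves the cursor to the start of the current row, so a transcript wider than the terminal wraps and leaves stale rows behind. A minimal sketch of one workaround, redrawing only the tail that fits on a single row, is shown here. The helper names `format_status_line` and `redraw` are hypothetical, the erase-line sequence assumes an ANSI-capable terminal, and character count is used as a rough proxy for display width.

```python
import shutil

ERASE_LINE = "\x1b[2K"  # ANSI "erase entire line"; assumes an ANSI-capable terminal


def format_status_line(text: str, width: int) -> str:
    """Return a string that redraws one terminal row with the tail of text."""
    usable = max(width - 1, 1)  # leave one column so the cursor does not wrap
    return "\r" + ERASE_LINE + text[-usable:]


def redraw(buf: bytearray) -> None:
    """Redraw the current row with as much of the transcript tail as fits."""
    width = shutil.get_terminal_size(fallback=(80, 24)).columns
    print(format_status_line(buf.decode("utf-8"), width), end="", flush=True)
```

In the handler above, calling `redraw(buf)` in place of the `print(...)` line would keep the display to a single row.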

### Post-processed output

For subtitle or caption display, accept only {py:attr}`~abr_sdk.cabi.AsrTextChunkType.POSTPROC`
chunks. Post-processing is applied regardless of mode; use ACCURATE mode for the highest-accuracy
base before post-processing.

```python
from abr_sdk.asr import Asr, AsrChunk, AsrTextChunkType

buf = bytearray()

def on_chunk(chunk: AsrChunk) -> None:
    if chunk.type == AsrTextChunkType.POSTPROC:
        chunk.update(buf)
        print(buf.decode("utf-8"), end="\r", flush=True)
```

## Choosing a mode

The {py:class}`~abr_sdk.asr.AsrMode` enum controls whether you receive FAST
({py:attr}`~abr_sdk.cabi.AsrTextChunkType.CAUSAL`) or ACCURATE
({py:attr}`~abr_sdk.cabi.AsrTextChunkType.NONCAUSAL`) model output. Post-processing is applied in
both modes. Pass `mode` to the {py:class}`~abr_sdk.asr.Asr` constructor:

```python
from abr_sdk.asr import Asr, AsrMode

with Asr(library_path=LIBRARY_PATH, mode=AsrMode.ACCURATE) as asr:
    ...
```

If `mode` is not specified, the library defaults to {py:attr}`~abr_sdk.asr.AsrMode.ACCURATE`.

| Situation                                                 | Recommended mode |
| --------------------------------------------------------- | ---------------- |
| Transcript consumed by a language model                   | `FAST`           |
| Transcript displayed to a user (subtitles, live captions) | `ACCURATE`       |
| Latency is the primary constraint                         | `FAST`           |
| Accuracy is the primary constraint                        | `ACCURATE`       |

Both modes use the same underlying model. The difference is in when the model commits to its output.

:::{admonition} Next steps
:class: hint

- {doc}`/asr/examples`: Full, runnable examples for each rendering strategy.
- {doc}`/asr/overview`: Overview of the ASR API and transcription modes.
:::