Python API

The core inference class is Muaalem in src/quran_muaalem/inference.py. It runs the multi‑level CTC model and returns phoneme predictions plus per‑phoneme sifat attributes.

Class signature

python

class Muaalem:
    def __init__(
        self,
        model_name_or_path: str = "obadx/muaalem-model-v3_2",
        device: str = "cpu",
        dtype=torch.bfloat16,
    ):
        ...

    @torch.no_grad()
    def __call__(
        self,
        waves: list[list[float] | torch.FloatTensor | NDArray],
        ref_quran_phonetic_script_list: list[QuranPhoneticScriptOutput],
        sampling_rate: int,
    ) -> list[MuaalemOutput]:
        ...

Inputs

1) Audio (`waves`)

A list of waveforms (batch).
Each waveform can be:
- list[float]
- torch.FloatTensor
- numpy.ndarray
Required sample rate: 16000 Hz. The implementation raises ValueError if it is not 16000.

2) Reference phonetic scripts (`ref_quran_phonetic_script_list`)

A list of QuranPhoneticScriptOutput objects.
Generate them using quran_transcript.quran_phonetizer(..., remove_spaces=True) to match the expected alignment.

Example reference generation:

python

from quran_transcript import Aya, quran_phonetizer, MoshafAttributes

uthmani_ref = Aya(8, 75).get_by_imlaey_words(17, 9).uthmani
moshaf = MoshafAttributes(
    rewaya="hafs",
    madd_monfasel_len=4,
    madd_mottasel_len=4,
    madd_mottasel_waqf=4,
    madd_aared_len=4,
)
ref = quran_phonetizer(uthmani_ref, moshaf, remove_spaces=True)

Outputs

The call returns list[MuaalemOutput] (one per input waveform). See src/quran_muaalem/muaalem_typing.py:

Unit — decoded sequence with text, probs, and ids.
Sifa — per‑phoneme group of phonetic attributes (SingleUnit | None).
MuaalemOutput — container with phonemes and sifat.

For a detailed schema and example output, see Outputs.

Minimal example

python

from librosa.core import load
import torch
from quran_transcript import Aya, quran_phonetizer, MoshafAttributes
from quran_muaalem import Muaalem

sampling_rate = 16000
wave, _ = load("./assets/test.wav", sr=sampling_rate, mono=True)

uthmani_ref = Aya(8, 75).get_by_imlaey_words(17, 9).uthmani
moshaf = MoshafAttributes(
    rewaya="hafs",
    madd_monfasel_len=4,
    madd_mottasel_len=4,
    madd_mottasel_waqf=4,
    madd_aared_len=4,
)
ref = quran_phonetizer(uthmani_ref, moshaf, remove_spaces=True)

model = Muaalem(device="cuda" if torch.cuda.is_available() else "cpu")
outs = model([wave], [ref], sampling_rate=sampling_rate)
print(outs[0].phonemes.text)

Error handling and edge cases

sampling_rate != 16000 → ValueError.
If alignment lengths mismatch, the decoder may insert padding tokens; some Sifa attributes can be None.
Model runs in torch.no_grad() by design (inference only).

Performance notes

Default dtype is torch.bfloat16. You can override it (e.g., torch.float16) if your GPU does not support BF16.
The model is loaded on construction; reuse the same Muaalem instance for multiple calls to avoid reload cost.

Python API ​

Class signature ​

Inputs ​

1) Audio (waves) ​

2) Reference phonetic scripts (ref_quran_phonetic_script_list) ​

Outputs ​

Minimal example ​

Error handling and edge cases ​

Performance notes ​

Python API

Class signature

Inputs

1) Audio (`waves`)

2) Reference phonetic scripts (`ref_quran_phonetic_script_list`)

Outputs

Minimal example

Error handling and edge cases

Performance notes