How it works
diarization-js ports pyannote.audio's community-1 pipeline to TypeScript + ONNX so the entire diarization graph runs client-side. It uses the same three building blocks as the Python reference:
- Speaker segmentation — pyannote/segmentation-3.0 (5.6 MB ONNX). A 10 s window slides over the audio; the model emits per-frame powerset activations covering all subsets of up to 3 speakers (at most 2 active simultaneously). Argmax plus a lookup table flatten that to a (frame × speaker) multilabel grid.
- Speaker embedding — WeSpeaker ResNet34-LM (26.5 MB ONNX). For each (chunk, local-speaker) pair, the ResNet ingests Kaldi-compliant 80-mel fbank features (computed in pure TypeScript, validated to 1e-4 against torchaudio.compliance.kaldi.fbank) and emits a 256-d speaker fingerprint.
- VBx clustering — a direct port of pyannote.audio 4.0.4's utils/vbx.py (no HMM variant). Centroid-linkage AHC seeds Variational Bayes-x clustering with the pre-computed PLDA matrices shipped with the community-1 checkpoint. Convergence is judged on the ELBO; speakers whose prior vanishes are pruned.
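The powerset decoding in the segmentation step can be sketched as follows. This is an illustrative standalone snippet, not the library's actual code: the class ordering in the lookup table and the function name are assumptions.

```typescript
// 7 powerset classes for 3 speakers with at most 2 active at once:
// silence, three singletons, three two-speaker overlaps.
const POWERSET: number[][] = [
  [0, 0, 0],                        // silence
  [1, 0, 0], [0, 1, 0], [0, 0, 1],  // one active speaker
  [1, 1, 0], [1, 0, 1], [0, 1, 1],  // two overlapping speakers
];

/** Decode per-frame powerset activations (frames x 7) into a (frames x 3) 0/1 grid. */
function powersetToMultilabel(activations: number[][]): number[][] {
  return activations.map((frame) => {
    let best = 0;
    for (let k = 1; k < frame.length; k++) {
      if (frame[k] > frame[best]) best = k; // argmax over powerset classes
    }
    return POWERSET[best];                  // lookup table: class -> speaker set
  });
}
```

Because each frame is a single softmax over speaker subsets, overlap detection comes for free: picking class 4 above directly asserts that speakers 0 and 1 are both active.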
Reconstruction (sliding-window aggregation + top-K by instantaneous speaker count) merges per-chunk results into a single annotation. End-to-end DER against the Python reference on a 7 min mono recording: 1.73 % (Hungarian optimal mapping).
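The last step of reconstruction is turning a speaker's aggregated per-frame activity back into time segments. A minimal sketch, assuming a fixed frame step in seconds (the interface and function names here are illustrative, not diarization-js's API):

```typescript
interface Segment { start: number; end: number; speaker: string; }

/** Convert one speaker's per-frame 0/1 activity into { start, end } segments. */
function framesToSegments(active: number[], frameStep: number, speaker: string): Segment[] {
  const segments: Segment[] = [];
  let onset = -1;
  active.forEach((a, i) => {
    if (a === 1 && onset < 0) onset = i;   // segment opens
    if (a === 0 && onset >= 0) {           // segment closes
      segments.push({ start: onset * frameStep, end: i * frameStep, speaker });
      onset = -1;
    }
  });
  if (onset >= 0) {                        // activity runs to the end of the grid
    segments.push({ start: onset * frameStep, end: active.length * frameStep, speaker });
  }
  return segments;
}
```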
Use it as a developer
The library is browser- and Node-compatible. The same code runs in either runtime; you just supply the matching onnxruntime-* module:
npm install diarization-js onnxruntime-web
import * as ort from "onnxruntime-web/webgpu";
import { DiarizationPipeline } from "diarization-js";
const pipeline = await DiarizationPipeline.create({
ort,
segmentationModel: await fetch("/models/segmentation-3.0.onnx").then(r => r.arrayBuffer()),
embeddingModel: await fetch("/models/embedding-resnet34.onnx").then(r => r.arrayBuffer()),
pldaParamsJson: await fetch("/models/plda-params-vbx.json").then(r => r.json()),
executionProviders: ["webgpu", "wasm"],
});
const { result, metrics } = await pipeline.run(samples, sampleRate);
// result.segments: Array<{ start, end, speaker }>
// metrics.rtf, metrics.totalProcessingMs, etc.
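Downstream tooling in the pyannote ecosystem usually consumes RTTM, so a small helper for serializing the segments is handy. This helper is not part of diarization-js; it only assumes the `{ start, end, speaker }` segment shape shown above:

```typescript
interface Segment { start: number; end: number; speaker: string; }

/** Serialize segments to RTTM lines (one SPEAKER record per segment). */
function toRttm(segments: Segment[], fileId: string): string {
  return segments
    .map((s) =>
      `SPEAKER ${fileId} 1 ${s.start.toFixed(3)} ${(s.end - s.start).toFixed(3)} ` +
      `<NA> <NA> ${s.speaker} <NA> <NA>`
    )
    .join("\n");
}
```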
See the GitHub repo for the full source, the benchmark suite (apps/bench/) for batch evaluation, and the export scripts (scripts/export-models/) for rebuilding the ONNX artifacts from the upstream PyTorch checkpoints.
Licenses & credit