What is Whisper Large V3 Turbo?

Whisper Large V3 Turbo is a faster variant of Whisper Large V3. It keeps the same encoder, reduces decoder layers from 32 to 4, and drops parameters from 1.55B to 809M for much faster transcription with near-identical word error rate.

How much faster is Whisper Large V3 Turbo than V3?

In Whisper Notes benchmarks on Apple Silicon, Whisper Large V3 Turbo processes the same audio about 5× faster than Large V3. Community GPU benchmarks commonly show a smaller but still meaningful speedup.

Does Whisper Large V3 Turbo support translation?

No. Whisper Large V3 Turbo was trained for transcription, not the audio-to-English translation task. Whisper Notes handles translation separately after transcription.

Whisper Large V3 Turbo vs V3: 5× Faster on Mac (Benchmark)

OpenAI's Whisper Large-v3 Turbo cuts the decoder from 32 layers to 4, dropping parameters from 1.55B to 809M. The result: 2–5× faster transcription with near-identical accuracy. Whisper Notes ships it on Mac with Apple Silicon.

[Figure: Whisper Large V3 Turbo vs V3 architecture comparison]

V3 Turbo vs V3: What Changed

Turbo is not a new architecture. It's the exact same Whisper Large-v3 model with the decoder pruned from 32 layers to 4, then fine-tuned to recover accuracy. The encoder is untouched.

Attribute	Large-v3 Turbo	Large-v3
Parameters	809M	1,550M
Decoder layers	4	32
Languages	99	99
Translation task	Not supported	Supported
License	MIT	Apache 2.0

The translation task was explicitly excluded from Turbo's training data. The full Large-v3 model supports it, but Whisper Notes ships Turbo only — translation is handled separately via Apple Intelligence.
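As a rough sanity check on the parameter counts in the table above, the sketch below estimates the weights in one decoder layer and multiplies by the 28 removed layers. It assumes a decoder width of 1280 with a 4× feed-forward expansion and ignores biases and layer norms. Those figures are assumptions not stated in this article, so treat the result as approximate.

```python
# Rough estimate of how many parameters the 28 pruned decoder layers account for.
# Assumptions (not from the article): decoder width d_model = 1280,
# feed-forward width = 4 * d_model, biases and layer norms ignored.

D_MODEL = 1280
FFN_DIM = 4 * D_MODEL
LAYERS_V3 = 32
LAYERS_TURBO = 4


def decoder_layer_params(d_model: int, ffn_dim: int) -> int:
    """Approximate weight count of one Transformer decoder layer."""
    self_attn = 4 * d_model * d_model     # query, key, value, output projections
    cross_attn = 4 * d_model * d_model    # same four projections, attending to encoder output
    feed_forward = 2 * d_model * ffn_dim  # up-projection and down-projection
    return self_attn + cross_attn + feed_forward


per_layer = decoder_layer_params(D_MODEL, FFN_DIM)
removed = (LAYERS_V3 - LAYERS_TURBO) * per_layer
reported_gap = 1_550_000_000 - 809_000_000  # parameter counts from the table above

print(f"per decoder layer: {per_layer / 1e6:.1f}M")
print(f"28 layers removed: {removed / 1e6:.0f}M")
print(f"reported gap:      {reported_gap / 1e6:.0f}M")
```

Under these assumptions the estimate lands within about 1% of the reported gap, which is consistent with decoder pruning accounting for nearly all of the size reduction.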

The Base Model: What Is Whisper Large-v3?

Whisper Large-v3 is OpenAI's flagship open-source speech recognition model, released in November 2023. It has 1.55B parameters, uses a 128 mel-bin spectrogram input, was trained on 5 million hours of audio (1M weakly labeled + 4M pseudo-labeled), and supports 99 languages including Cantonese. On the Hugging Face Open ASR Leaderboard it averages ~7.4% word error rate — the accuracy ceiling that Turbo is measured against throughout this article. For how Large-v3 stacks up against every other on-device model, see our Whisper models comparison.

Speed Benchmark: Whisper Notes on Apple Silicon

In Whisper Notes, Turbo runs via CoreML on the Neural Engine. Time to process 10 minutes of audio on iPhone, iPad, and Mac:

Device	Whisper V3	V3 Turbo	Speedup
iPhone 15 Pro	425 s	82 s	5.2×
iPad Pro M2	380 s	71 s	5.4×
MacBook Pro M2	316 s	63 s	5.0×

The 5× speedup is specific to Whisper Notes on Apple Silicon, where the smaller decoder benefits from Neural Engine optimization. On GPU with frameworks like faster-whisper, the gap narrows to ~2.7× (see community benchmarks below).
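The Speedup column can be reproduced directly from the two timing columns. The short sketch below does that division; the variable names are illustrative only.

```python
# Recompute the Speedup column from the timings in the table above.
# Each pair is (Large-v3 seconds, Turbo seconds) for 10 minutes of audio.

timings = {
    "iPhone 15 Pro": (425, 82),
    "iPad Pro M2": (380, 71),
    "MacBook Pro M2": (316, 63),
}

# Speedup = time taken by Large-v3 divided by time taken by Turbo.
speedups = {device: v3 / turbo for device, (v3, turbo) in timings.items()}

for device, ratio in speedups.items():
    print(f"{device}: {ratio:.1f}x")
```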

Accuracy: WER Comparison

The Hugging Face Open ASR Leaderboard tests both models on the same English datasets. Turbo's word error rate is within half a point of V3 across every benchmark:

Dataset	V3 Turbo WER	V3 WER
LibriSpeech Clean	2.10%	2.01%
LibriSpeech Other	4.24%	3.91%
GigaSpeech	10.14%	10.02%
Earnings22	11.63%	11.29%
AMI	16.13%	15.95%
Mean WER	7.83%	7.44%

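V3 is slightly more accurate on every dataset, but the gap is small: 0.39 percentage points on the leaderboard's overall mean. That Mean WER row is the leaderboard average (the ~7.4% cited above for Large-v3), not the mean of the five rows shown. For most real-world transcription, you won't hear the difference.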
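For readers less familiar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below is a minimal version that splits on whitespace. It assumes no text normalization, whereas benchmark harnesses typically normalize casing and punctuation first, so it will not reproduce leaderboard numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

Here one word is substituted and one is dropped, so two errors over nine reference words.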

On the YouTube-commons long-form evaluation (one of the largest open-source ASR benchmarks), Turbo scores 13.40% WER vs V3's 13.20% — while running at 129.5× real-time factor vs 55.3×. That's 2.3× faster with nearly identical accuracy on real-world audio.
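To make those real-time factors concrete, the sketch below converts them into wall-clock time for one hour of audio. It assumes the figures mean audio duration divided by processing time (higher is faster), which is how the comparison above reads.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds implied by a real-time factor of audio time / compute time."""
    return audio_seconds / rtfx


one_hour = 3600
turbo_seconds = processing_seconds(one_hour, 129.5)
v3_seconds = processing_seconds(one_hour, 55.3)

print(f"Turbo: {turbo_seconds:.1f} s per hour of audio")
print(f"V3:    {v3_seconds:.1f} s per hour of audio")
print(f"Ratio: {v3_seconds / turbo_seconds:.1f}x")
```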

How Accurate Is Turbo in Korean, Russian, and Other Languages?

The benchmarks above are English. Per OpenAI's model card, Turbo's pruned 4-layer decoder costs slightly more accuracy in non-English languages than in English, with the largest degradation on lower-resource languages. For Russian and most European languages, Turbo stays close to full Large-v3 — and if you're on Whisper Notes, Parakeet V3 covers Russian and 24 other European languages at 10× Whisper's speed.

For Korean, Japanese, Chinese, and Cantonese, a purpose-built model is both faster and better-punctuated: SenseVoice transcribes CJK at 52× real-time. Whisper Notes ships SenseVoice alongside Turbo on both Mac and iOS, so you can pick the right model per language instead of forcing everything through one.

Community Benchmarks: GPU & CPU

Independent benchmarks from the faster-whisper and whisper.cpp communities show consistent results across hardware. Transcribing 13 minutes of audio with faster-whisper on GPU:

Model	Precision	Time	GPU Memory	WER
Large-v3 Turbo	fp16	19.2 s	2,537 MB	1.92%
Large-v3	fp16	52.0 s	4,521 MB	2.88%
Large-v3 Turbo	int8	19.6 s	1,545 MB	1.92%
Distil-Large-v3	fp16	26.1 s	2,409 MB	2.39%

Source: faster-whisper benchmark on NVIDIA GPU, LibriSpeech clean validation split. Turbo int8 uses only 1.5 GB VRAM — it fits on a 2 GB GPU.
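If you want to try this yourself, a minimal faster-whisper sketch might look like the following. It assumes a recent faster-whisper release that recognizes the `large-v3-turbo` model name and a CUDA GPU. The helper names and the idea of choosing precision from free VRAM are illustrative, with thresholds taken from the GPU Memory column above. Actual memory use varies with audio length and settings.

```python
# Sketch: load Large-v3 Turbo in faster-whisper, choosing precision from free VRAM.
# Thresholds come from the GPU Memory column above; helper names are hypothetical.

FP16_VRAM_MB = 2537  # Turbo fp16 in the benchmark table
INT8_VRAM_MB = 1545  # Turbo int8 in the benchmark table


def pick_compute_type(free_vram_mb: int) -> str:
    """Return a CTranslate2 compute type that fits the given VRAM budget."""
    if free_vram_mb >= FP16_VRAM_MB:
        return "float16"
    if free_vram_mb >= INT8_VRAM_MB:
        return "int8"
    raise ValueError("not enough VRAM for Turbo at fp16 or int8 per the table")


def transcribe(path: str, free_vram_mb: int) -> str:
    """Transcribe one audio file and return the text as a single string."""
    # Imported lazily so pick_compute_type works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        "large-v3-turbo",  # assumed model name; requires a recent faster-whisper
        device="cuda",
        compute_type=pick_compute_type(free_vram_mb),
    )
    segments, _info = model.transcribe(path)
    return " ".join(segment.text.strip() for segment in segments)


print(pick_compute_type(2048))  # a 2 GB card falls back to int8
```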

Batched inference on an RTX 3060 Laptop (6 GB VRAM, int8 precision) pushes the advantage further:

Model	Sequential	Batched (10)	Batched WER
Large-v3 Turbo	46.1 s	18.7 s	7.7%
Large-v3	230.8 s	43.0 s	7.9%
Large-v2	178.3 s	43.2 s	8.8%
Medium	113.3 s	26.3 s	8.9%

Source: NilaierMusic benchmark, Intel i7-12650H + RTX 3060 Laptop 6 GB, French audio, int8 precision.

With batched processing, Turbo achieves the best WER of any model tested (7.7%) while also being the fastest. On this hardware and test set, it is the clear sweet spot for production use.
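Batched inference of this kind can be sketched with faster-whisper's `BatchedInferencePipeline`. The example assumes a faster-whisper release that ships that class and recognizes the `large-v3-turbo` model name. The batch size of 10 mirrors the table above, and `join_segments` is a hypothetical helper.

```python
from types import SimpleNamespace


def join_segments(segments) -> str:
    """Concatenate the text of transcribed segments into one string."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_batched(path: str, batch_size: int = 10) -> str:
    """Transcribe one file with batched decoding on a CUDA GPU at int8 precision."""
    # Imported lazily so join_segments can be used without faster-whisper installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model = WhisperModel("large-v3-turbo", device="cuda", compute_type="int8")
    batched = BatchedInferencePipeline(model=model)
    segments, _info = batched.transcribe(path, batch_size=batch_size)
    return join_segments(segments)


# Stand-in segments so the helper can be exercised without a GPU or model download.
fake_segments = [
    SimpleNamespace(text=" Bonjour "),
    SimpleNamespace(text="tout le monde."),
]
print(join_segments(fake_segments))
```

Larger batch sizes trade VRAM for throughput, so the right value depends on the card.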

Turbo vs Medium vs Every Whisper Model Size

Before Turbo, Medium was the usual compromise: acceptable accuracy at tolerable speed. Turbo makes that trade-off obsolete — at 809M parameters it is barely larger than Medium (769M), yet delivers large-class accuracy at several times the speed. Here's the full model family side by side:

Model	Parameters	Disk Size	Relative Speed	Accuracy Tier
tiny	39M	~75 MB	~10×	Lowest
base	74M	~142 MB	~7×	Low
small	244M	~466 MB	~4×	Moderate
medium	769M	~1.5 GB	~2×	High
large-v3	1,550M	~2.9 GB	1× (baseline)	Highest
large-v3-turbo	809M	~1.6 GB	~8×	Near-highest

Turbo was released on September 30, 2024. If you were choosing Medium to save disk space or time, Turbo now beats it on both accuracy and speed at roughly the same footprint. The Relative Speed column uses OpenAI's published estimates; the measured speedups over Large-v3 elsewhere in this article range from about 2× to 5×, depending on hardware and runtime.

Known Limitations (and How Whisper Notes Handles Them)

No built-in translation

Turbo was trained without translation data. It transcribes in the source language only — unlike Large-v3, which supports audio→English translation.

Whisper Notes — Apple Intelligence auto-translates transcripts into your chosen language, giving you bilingual output regardless of which model you use.

More hallucination on noisy audio

Community reports indicate that Turbo hallucinates more than V3 on very short clips and noisy recordings. This is plausible given the much shallower decoder (4 layers vs 32).

Whisper Notes — runs Pyannote VAD before transcription, detecting speech segments and stripping silence/noise so the model only processes real voice.
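To illustrate the general idea of gating audio before transcription, here is a deliberately simple energy-threshold detector. It is not Pyannote, which uses a neural model, and it is not how Whisper Notes is implemented. The frame length and threshold are arbitrary assumptions chosen for the synthetic example.

```python
import math


def rms(frame) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_regions(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_s, end_s) spans whose frame RMS exceeds the threshold."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    regions = []
    start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = rms(frame) > threshold
        t = i * frame_len / sample_rate
        if voiced and start is None:
            start = t                   # a loud span begins at this frame
        elif not voiced and start is not None:
            regions.append((start, t))  # the span ended before this frame
            start = None
    if start is not None:
        # Audio ended while still inside a loud span.
        regions.append((start, n_frames * frame_len / sample_rate))
    return regions


# Synthetic check: 1 s of silence, 1 s of a 220 Hz tone, 1 s of silence at 16 kHz.
sr = 16000
silence = [0.0] * sr
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
regions = speech_regions(silence + tone + silence, sr)
print(regions)
```

Only the detected spans would then be passed to the transcription model, so silent stretches never reach the decoder.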

Which Model Should You Use?

Language	Recommended model
English / European	Parakeet V3: 10× faster than Whisper, with better accuracy
Chinese / Japanese / Korean	SenseVoice: purpose-built for CJK, 52× real-time
Other languages	Whisper Large V3 Turbo: 99 languages, high accuracy, slower than the two above
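If you are wiring up your own pipeline, the routing above can be expressed as a small lookup. This is a sketch only: the language codes are ISO 639 codes chosen for illustration, the European set is a small subset rather than the full list Parakeet V3 covers, and `pick_model` is a hypothetical helper.

```python
# Languages routed to SenseVoice: Chinese, Cantonese, Japanese, Korean.
SENSEVOICE_LANGS = {"zh", "yue", "ja", "ko"}

# Illustrative subset only (assumption): English, Russian, and a few other
# European languages. The real coverage list is longer.
PARAKEET_LANGS = {"en", "ru", "de", "fr", "es"}


def pick_model(language_code: str) -> str:
    """Return the recommended model name for a language code."""
    code = language_code.lower()
    if code in SENSEVOICE_LANGS:
        return "SenseVoice"
    if code in PARAKEET_LANGS:
        return "Parakeet V3"
    # Everything else falls back to the 99-language general model.
    return "Whisper Large V3 Turbo"


for code in ("ko", "ru", "ar"):
    print(code, "->", pick_model(code))
```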

Whisper Large-v3 Turbo FAQ

What is the difference between Whisper Large-v3 and Large-v3 Turbo?

Large-v3 Turbo keeps the Large-v3 encoder but reduces the decoder from 32 layers to 4. That is why it is much faster while staying close to Large-v3 accuracy for transcription. The trade-off is that Turbo does not support Whisper's built-in translation task.

Does faster-whisper support Large-v3 Turbo?

Yes. faster-whisper supports Large-v3 Turbo through CTranslate2 conversions, and community benchmarks show Turbo is a strong choice when VRAM is limited. In the benchmark above, Turbo int8 used about 1.5 GB VRAM.

Does whisper.cpp support Large-v3 Turbo?

Yes. whisper.cpp can run converted GGML/GGUF versions of Whisper Large-v3 Turbo. If you are building your own local transcription pipeline, Turbo is often easier to fit on consumer hardware than full Large-v3.

Where can I download openai/whisper-large-v3-turbo?

The official model weights are available from OpenAI on Hugging Face. Whisper Notes users do not need to download them manually: the Mac app handles local model setup through the app interface.

Comparing all the local options? Every on-device speech-to-text model — Whisper variants, Parakeet V3, SenseVoice, and Voxtral — is compared side by side on our Whisper models comparison page. New to Whisper itself? Start with the Whisper Transcription Guide — what the model is, every way to run it, and what it costs.
