Whisper transcription means converting speech to text with OpenAI's Whisper — an open-source AI model you can run in the cloud, on a server, or entirely on your own device. This guide covers how Whisper works, which model size to pick, how accurate it really is, and the fastest way to run it offline on a Mac or iPhone.
What Is Whisper, Exactly?
Whisper is an automatic speech recognition (ASR) model that OpenAI released in September 2022 under the MIT license. It's an encoder-decoder transformer, originally trained on 680,000 hours of multilingual audio, and it handles transcription in roughly 100 languages plus translation into English.
The part that matters for you: the model weights are open. Unlike Google's or Amazon's speech APIs, Whisper doesn't have to run on someone else's server. An entire ecosystem exists to run it locally — whisper.cpp, faster-whisper, and native apps like Whisper Notes. That's what makes truly offline, private transcription possible.
Whisper Model Sizes: Which One to Use
Whisper comes in six main sizes. As a rule, bigger means more accurate and slower; large-v3-turbo is the exception:
| Model | Parameters | Speed | Best for |
|---|---|---|---|
| tiny | 39M | Fastest | Quick drafts, weak hardware |
| base | 74M | Very fast | Simple, clean audio |
| small | 244M | Fast | Good speed/accuracy balance on mobile |
| medium | 769M | Moderate | Rarely the right pick now that large-v3-turbo exists |
| large-v3 | 1.55B | Slowest | Maximum accuracy, difficult audio |
| large-v3-turbo | 809M | ~5x faster than large-v3 | The default choice in 2026 |
For almost everyone, large-v3-turbo is the answer: it keeps large-v3's encoder but cuts decoder layers from 32 to 4, delivering near-identical accuracy at a fraction of the compute. We benchmarked it in detail in Whisper Large V3 Turbo vs V3.
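A back-of-the-envelope count shows why dropping decoder layers roughly halves the model. The sketch below assumes the published large-v3 dimensions (hidden width 1280, 32 encoder layers, a vocabulary of 51,866 tokens), none of which are stated in this article. It also ignores biases, layer norms, positional embeddings, and the convolutional audio stem, so treat the totals as approximations.

```python
# Rough parameter estimate for a Whisper-style encoder-decoder transformer.
# Assumed dimensions (not from this article): width 1280, 32 encoder layers,
# vocabulary of 51,866 tokens. Biases, layer norms, positional embeddings,
# and the convolutional audio stem are ignored.

D_MODEL = 1280
ENCODER_LAYERS = 32
VOCAB_SIZE = 51_866


def estimate_params(decoder_layers: int) -> int:
    attention = 4 * D_MODEL * D_MODEL      # Q, K, V, and output projections
    mlp = 8 * D_MODEL * D_MODEL            # two linear layers with 4x expansion
    encoder_layer = attention + mlp        # self-attention + MLP
    decoder_layer = 2 * attention + mlp    # self-attention + cross-attention + MLP
    embedding = VOCAB_SIZE * D_MODEL       # token embedding table
    return (ENCODER_LAYERS * encoder_layer
            + decoder_layers * decoder_layer
            + embedding)


large_v3 = estimate_params(decoder_layers=32)
turbo = estimate_params(decoder_layers=4)

print(f"large-v3       ~ {large_v3 / 1e9:.2f}B parameters")
print(f"large-v3-turbo ~ {turbo / 1e6:.0f}M parameters")
```

The estimates land close to the 1.55B and 809M figures in the table, and the entire gap comes from the 28 removed decoder layers. The encoder, which does the heavy acoustic work, is untouched.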
How Accurate Is Whisper Transcription?
On clean English audio, the large models reach a word error rate (WER) of roughly 5-8% — comparable to professional human transcription for most practical purposes. Accuracy drops with background noise, heavy accents, crosstalk, and low-resource languages.
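To make the WER figure concrete, here is a minimal sketch of how it is typically computed: a word-level edit distance divided by the number of words in the reference transcript. The function name and sample sentences are illustrative. Real evaluations usually normalize punctuation and number formats first, while this version only lowercases and splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "please send the report to the whole team by friday"
hypothesis = "please send the report to the hole team friday"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

The sample has one substitution ("whole" became "hole") and one deletion ("by") against a 10-word reference, so the WER is 20%. A 5% WER means about one wrong word in every twenty.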
Whisper's best-known failure mode is hallucination during silence: its autoregressive decoder sometimes invents repeated phrases or subtitle credits when nobody is speaking. Some newer models address this directly. NVIDIA's Parakeet V3 was explicitly trained on non-speech audio and produced no silence hallucinations in our tests (see our full Parakeet V3 vs Whisper benchmark).
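If you are working with Whisper output anyway, one simple and admittedly crude post-processing heuristic is to collapse runs of identical consecutive segments, since looping phrases are a typical symptom. The sketch below assumes segments arrive as plain strings. The function name and the `max_repeats` threshold are illustrative and not part of any Whisper tool.

```python
def collapse_repeats(segments: list[str], max_repeats: int = 2) -> list[str]:
    """Keep at most `max_repeats` consecutive copies of the same segment text."""
    cleaned: list[str] = []
    run_length = 0
    previous_key = None
    for text in segments:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        run_length = run_length + 1 if key == previous_key else 1
        previous_key = key
        if key and run_length <= max_repeats:
            cleaned.append(text.strip())
    return cleaned


raw = [
    "So that's the plan for Q3.",
    "Thank you.",
    "Thank you.",
    "Thank you.",
    "Thank you.",
    "Any questions?",
]
print(collapse_repeats(raw, max_repeats=1))
```

This only catches loops. An invented line that appears once, such as a stray subtitle credit, passes straight through, so filtering out non-speech audio before transcription is generally the more robust approach.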
For Chinese, Japanese, Korean, and Cantonese, a specialized model beats Whisper on both speed and punctuation: see SenseVoice vs Whisper for CJK languages.
5 Ways to Run Whisper Transcription
| Method | Cost | Privacy | Setup |
|---|---|---|---|
| OpenAI API | Pay per audio minute | Audio uploaded | API key + code |
| openai-whisper (reference Python) | Free | 100% local | Python env, GPU recommended |
| whisper.cpp / faster-whisper | Free | 100% local | Command line |
| Native app (Whisper Notes) | $6.99 once, free trial on Mac | 100% on-device | None |
| Web demo tools | Free tiers | Audio uploaded | None |
The rule of thumb: if you live in a terminal, faster-whisper is excellent. If you're building a product, the API makes sense. If you just want your recordings transcribed privately without touching Python, use a native app — that's the entire reason Whisper Mac apps exist.
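For the terminal route, a minimal faster-whisper script might look like the sketch below. It assumes `pip install faster-whisper`, a hypothetical input file named `meeting.m4a`, and an installed version that recognizes the `large-v3-turbo` model name. The `device` and `compute_type` values are conservative guesses, so adjust them to your hardware.

```python
def format_line(start: float, end: float, text: str) -> str:
    """Render one segment as '[start -> end] text' with times in seconds."""
    return f"[{start:7.2f}s -> {end:7.2f}s] {text.strip()}"


def transcribe_file(path: str, model_name: str = "large-v3-turbo") -> list[str]:
    # Imported lazily so format_line works without the dependency installed.
    from faster_whisper import WhisperModel

    # int8 on CPU is an assumed, conservative default, not a recommendation
    # from this article.
    model = WhisperModel(model_name, device="cpu", compute_type="int8")

    # vad_filter asks faster-whisper to skip non-speech audio before decoding.
    segments, info = model.transcribe(path, beam_size=5, vad_filter=True)
    print(f"Detected language: {info.language}")

    # `segments` is a generator; transcription runs as it is consumed.
    return [format_line(s.start, s.end, s.text) for s in segments]


# Usage (hypothetical file name):
#     for line in transcribe_file("meeting.m4a"):
#         print(line)
```

The first call downloads the model weights. After that the script runs without a network connection.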
Weighing offline tools more broadly — including Windows and Android options? See our complete offline speech-to-text guide.
Whisper vs Newer Local Models (2026)
Whisper started the local transcription era, but it's no longer alone. The speeds below were measured on an M4 Pro Mac:
| Model | Languages | Speed | Standout |
|---|---|---|---|
| Whisper Large V3 Turbo | ~100 | ~12x realtime | Widest language coverage |
| Parakeet V3 | 25 (European) | ~100x realtime | 6.32% WER, no silence hallucinations |
| SenseVoice Small | zh, ja, ko, yue, en | ~52x realtime | Best for Chinese, Japanese, Korean |
All three run locally in Whisper Notes, and you can switch per recording. Side-by-side benchmarks live on our Whisper models comparison page.
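To translate "x realtime" into waiting time, divide the audio length by the speed factor. The short calculation below uses the factors from the table above. The 60-minute recording is just an example, and actual times will vary with hardware and audio.

```python
# Speed factors taken from the table above (measured on an M4 Pro Mac).
REALTIME_FACTORS = {
    "Whisper Large V3 Turbo": 12,
    "Parakeet V3": 100,
    "SenseVoice Small": 52,
}


def processing_seconds(audio_minutes: float, realtime_factor: float) -> float:
    """Time to transcribe: audio duration divided by the realtime factor."""
    return audio_minutes * 60 / realtime_factor


for model, factor in REALTIME_FACTORS.items():
    seconds = processing_seconds(60, factor)
    print(f"{model}: about {seconds:.0f} s for a 60-minute recording")
```

For an hour of audio that works out to roughly 5 minutes with Whisper Large V3 Turbo, 36 seconds with Parakeet V3, and 69 seconds with SenseVoice Small.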
How to Run Whisper Transcription Offline on Mac & iPhone
No command line, no Python, no cloud:
- Download Whisper Notes for Mac (free trial) or for iPhone ($6.99 once).
- Pick a model: Whisper Large V3 Turbo for broad language coverage, Parakeet V3 for fast English and European-language transcription, SenseVoice for CJK. The model downloads once and then works offline.
- Record directly, dictate system-wide by holding Fn, or drop in audio and video files (MP3, WAV, M4A, MP4).
- Text streams in as it processes. Export as TXT or SRT.
Skeptical about "offline"? Turn on airplane mode first. Transcription runs at full speed — nothing is uploaded, ever.
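If you want to post-process an export, it helps to know what an SRT file contains: numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the text. The sketch below builds that format from timestamped segments. It illustrates the file format only and is not how Whisper Notes implements export; the function names and sample segments are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


example = [(0.0, 2.4, "Welcome back."), (2.4, 5.05, "Let's get started.")]
print(to_srt(example))
```

A TXT export is the same segments with the numbering and timestamps dropped.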
Frequently Asked Questions
Is Whisper transcription free?
The model itself is free and open source (MIT license). Running it via command-line tools like whisper.cpp costs nothing but requires setup. OpenAI's API charges per audio minute. Native apps package the models for a small fee — Whisper Notes is $6.99 once, with a free trial on Mac.
Can Whisper transcription run offline?
Yes — that's the point of open weights. Once the model file is on your device, no internet is needed. Whisper Notes runs Whisper Large V3 Turbo on Apple Silicon via CoreML/Metal, fully offline. You can verify with airplane mode.
Which Whisper model is the most accurate?
large-v3 has the best raw accuracy. large-v3-turbo matches it within a fraction of a percent WER while running about 5x faster, which is why it's the default in most tools today.
Does Whisper support my language?
Whisper covers roughly 100 languages, strongest in high-resource ones (English, Spanish, German, French, etc.). For Chinese, Japanese, Korean, and Cantonese, SenseVoice delivers better punctuation and much higher speed on Apple Silicon.
Is there a Whisper transcription app for iPhone?
Yes. Whisper Notes runs Whisper models optimized for the iPhone's Neural Engine (iPhone 12 and newer) — record, import from Voice Memos or Files, and transcribe entirely on-device for $6.99, no subscription.