Introducing Whisper Large V3 Turbo

2024-11-06
6 min read

OpenAI has released Whisper large-v3-turbo, an optimized variant of the Whisper large-v3 model. It keeps accuracy close to that of the larger model while delivering significantly faster inference.

[Figure: Whisper Large V3 Turbo architecture overview]

Architecture & Improvements

The key change in Whisper Large V3 Turbo is its streamlined architecture. The model reduces the number of decoder layers from 32 to just 4, the same decoder depth as the tiny model, while keeping accuracy close to that of the full model. This approach was inspired by Distil-Whisper's finding that a smaller decoder can significantly boost speed with only a minor loss in accuracy.
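As a rough sanity check on the size reduction, the sketch below estimates how many parameters 28 decoder layers account for. It assumes a hidden width of 1280 (the published width of the large Whisper models, which this post does not state) and counts only the weight matrices in each layer's self-attention, cross-attention, and feed-forward blocks, ignoring biases and layer norms.

```python
# Back-of-the-envelope estimate; D_MODEL is an assumption, not a figure from this post.
D_MODEL = 1280        # hidden width assumed for the large Whisper models
LAYERS_BEFORE = 32    # decoder layers in Large V3
LAYERS_AFTER = 4      # decoder layers in Large V3 Turbo


def decoder_layer_params(d_model: int) -> int:
    """Approximate weight count of one decoder layer (biases and layer norms ignored)."""
    self_attention = 4 * d_model * d_model      # query, key, value, output projections
    cross_attention = 4 * d_model * d_model     # the same four projections over encoder output
    feed_forward = 2 * d_model * (4 * d_model)  # two linear maps with a 4x hidden expansion
    return self_attention + cross_attention + feed_forward


removed = (LAYERS_BEFORE - LAYERS_AFTER) * decoder_layer_params(D_MODEL)
print(f"Per layer: {decoder_layer_params(D_MODEL) / 1e6:.1f}M parameters")
print(f"Removed:   {removed / 1e6:.0f}M parameters")
```

Under these assumptions the estimate comes to roughly 734M parameters, which is close to the gap between the 1550M and 809M totals quoted in this post.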

Key Technical Features

  • Reduced decoder layers (32 → 4)
  • Significantly faster inference speed
  • Smaller model size (809M vs. 1550M parameters for Large V3)
  • Fine-tuned on the same multilingual transcription data as Large V3, with translation data excluded
  • Compatible with Whisper Large V3 inference code
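
Because the model runs through the same inference code, switching to it can be as small as changing the model name. The following is a minimal sketch, assuming the open-source `openai-whisper` Python package is installed and ffmpeg is available; the `transcribe` wrapper and the `audio.mp3` path are placeholders for illustration.

```python
def transcribe(audio_path: str, model_name: str = "turbo") -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module can be loaded without the package installed.
    import whisper  # assumption: provided by `pip install -U openai-whisper`

    # "turbo" is the package's alias for large-v3-turbo; "large-v3" would also work here.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]


# Example call (placeholder file name):
# print(transcribe("audio.mp3"))
```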

[Figure: Performance comparison between Whisper variants]

Performance Analysis

The model's inference speed falls between that of the tiny and base models, while its accuracy stays comparable to Large V2. There are some trade-offs, however:

  • Translation performance has decreased because translation data was excluded from training
  • Slight accuracy drops in certain languages (e.g., Thai and Cantonese)
  • Strong performance is maintained for major languages such as English and Japanese
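
One way to work around the translation trade-off is to route translation jobs to a model that was trained with translation data. The helper below is a hypothetical sketch: the function name and the choice of `large-v3` as the fallback are assumptions, not something prescribed by OpenAI or Whisper Notes.

```python
def choose_model(task: str) -> str:
    """Pick a Whisper model name for the given task ("transcribe" or "translate")."""
    if task == "transcribe":
        return "turbo"      # faster; trained on transcription data only
    if task == "translate":
        return "large-v3"   # assumed fallback that saw translation data during training
    raise ValueError(f"unknown task: {task!r}")


print(choose_model("transcribe"))
print(choose_model("translate"))
```

The returned name can then be passed to whichever loader the inference code uses.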

Technical Capabilities

The model retains core Whisper features:

  • ✨ Multilingual transcription support
  • 🚀 Timestamp prediction
  • 📊 Zero-shot transfer capabilities
  • ⚡️ Batch processing support
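
To illustrate timestamp prediction, the sketch below formats segment-level timestamps into readable lines. It assumes segments shaped like those returned by the `openai-whisper` package's `transcribe` call (dictionaries with `start`, `end`, and `text` keys); the helper names are hypothetical and the sample data is made up for demonstration.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: list[dict]) -> list[str]:
    """Turn segment dictionaries into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


# Made-up sample in the assumed segment shape.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 65.25, "text": " This is a test."},
]
for line in format_segments(sample):
    print(line)
```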

Current Status in Whisper Notes

While Whisper Large V3 Turbo represents an exciting advancement in speech recognition technology, Whisper Notes currently does not support this model due to its significant size and resource requirements for mobile devices. We are actively monitoring its development and optimization progress, and we plan to integrate it into our app once further speed improvements make it more suitable for mobile deployment.

In the meantime, Whisper Notes continues to use our optimized models that provide the best balance of accuracy and performance for mobile users. Stay tuned for updates as we work to bring you the latest advancements in speech recognition technology!