26 February 2025 / NEWS

ElevenLabs' Breakthrough in Speech-to-Text Technology is called Scribe

ElevenLabs has introduced Scribe, a state-of-the-art Automatic Speech Recognition (ASR) model that transcribes speech in 99 languages with unparalleled accuracy.

ElevenLabs has unveiled Scribe, its first Speech-to-Text model, billed as the world's most accurate transcription system. Engineered to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages and offers word-level timestamps, speaker diarization, and audio-event tagging, all delivered in a structured format for straightforward integration.

Scribe's precision is evident in its benchmark performance. On both the FLEURS and Common Voice benchmarks, across 99 languages, Scribe consistently outperformed leading models such as Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. Notably, it achieved the lowest word error rates of the models compared, which corresponds to accuracy of 98.7% in Italian and 96.7% in English, and it posted the lowest error rates in 97 other languages as well.
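For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. The percentages quoted above read as accuracy figures; assuming accuracy means 1 minus WER, 98.7% accuracy would be a WER of about 1.3%. The sketch below is a generic illustration of that calculation, not the evaluation code behind the reported benchmarks, and the sample sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("jumps" -> "jumped") and one deletion ("the").
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, word accuracy: {1 - wer:.1%}")
```

Real evaluations usually normalize casing and punctuation before scoring, which this sketch omits.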

One of Scribe's significant contributions is its ability to make ASR universally accessible. It dramatically reduces errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often exhibit word error rates exceeding 40%. This advancement ensures more inclusive and accurate transcription services across diverse linguistic communities.

Developers can integrate Scribe into their applications via ElevenLabs' Speech-to-Text API, receiving structured JSON transcripts that include speaker diarization, word-level timestamps, and non-speech event markers like laughter. A low-latency version tailored for real-time applications is slated for future release, expanding Scribe's utility in various contexts.
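As a rough illustration of what working with such a structured transcript might look like, the sketch below groups word-level entries into speaker turns and collects non-speech events. The payload is a hand-written sample, and every field name in it (`words`, `type`, `start`, `end`, `speaker_id`, `audio_event`) is an assumption made for this example rather than something stated in the announcement; consult the API reference for the actual schema. No network request is made.

```python
import json
from itertools import groupby

# Hypothetical payload: the structure and field names are assumptions for illustration.
SAMPLE_RESPONSE = """
{
  "language_code": "en",
  "text": "Hi there. (laughter) Hello!",
  "words": [
    {"text": "Hi", "type": "word", "start": 0.00, "end": 0.21, "speaker_id": "speaker_0"},
    {"text": "there.", "type": "word", "start": 0.25, "end": 0.60, "speaker_id": "speaker_0"},
    {"text": "(laughter)", "type": "audio_event", "start": 0.70, "end": 1.40, "speaker_id": "speaker_1"},
    {"text": "Hello!", "type": "word", "start": 1.50, "end": 1.90, "speaker_id": "speaker_1"}
  ]
}
"""


def speaker_turns(transcript: dict) -> list[dict]:
    """Merge consecutive entries from the same speaker into one turn."""
    turns = []
    for speaker, items in groupby(transcript["words"], key=lambda w: w["speaker_id"]):
        items = list(items)
        turns.append({
            "speaker": speaker,
            "start": items[0]["start"],  # start time of the first entry in the turn
            "end": items[-1]["end"],     # end time of the last entry in the turn
            "text": " ".join(w["text"] for w in items if w["type"] == "word"),
            "events": [w["text"] for w in items if w["type"] == "audio_event"],
        })
    return turns


transcript = json.loads(SAMPLE_RESPONSE)
turns = speaker_turns(transcript)
for turn in turns:
    print(f'[{turn["start"]:.2f}-{turn["end"]:.2f}] {turn["speaker"]}: '
          f'{turn["text"]} {turn["events"]}')
```

Keeping events separate from spoken text makes it easy to drop or restyle markers such as laughter when rendering captions.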

Creators and businesses also stand to benefit from Scribe's capabilities. Through the ElevenLabs dashboard, users can upload audio or video files and generate formatted transcripts, streamlining content creation processes such as meeting summaries, movie subtitles, or even song lyrics. This user-friendly interface ensures that Scribe's advanced features are accessible to a broad audience.

In summary, Scribe represents a significant advancement in speech-to-text technology. Its unparalleled accuracy across a vast array of languages, coupled with features like speaker diarization and audio-event tagging, positions it as a valuable tool for developers, creators, and businesses seeking reliable transcription solutions. By making ASR more accessible and reducing errors in underserved languages, Scribe contributes to a more inclusive and efficient digital communication landscape.
