In today’s connected world, language should never be a barrier. Yet, out of more than 7,000 languages spoken worldwide, only a small fraction are supported by artificial intelligence. This gap has left millions of people unable to fully benefit from advanced tools like speech recognition and translation.
Now, NVIDIA is stepping in to close that gap. The tech giant has released a massive new dataset and two powerful models aimed at boosting multilingual speech AI. These tools support 25 European languages, including less common ones such as Croatian, Estonian, and Maltese.
For developers, this is more than just another AI release; it’s a set of building blocks for creating faster, more accurate, and more inclusive speech technology. One of the biggest problems in speech AI has been the lack of data for smaller languages.
While English, French, and Spanish enjoy huge datasets, many languages are underrepresented. Without enough training data, speech AI systems can’t accurately understand or translate them.
NVIDIA’s new release aims to change that. It includes:

Granary dataset: over 1 million hours of multilingual audio, with 650,000 hours for speech recognition and 350,000 hours for speech translation.
NVIDIA Canary-1b-v2: a billion-parameter model for high-quality transcription and translation between English and 24 European languages.
NVIDIA Parakeet-tdt-0.6b-v3: a lighter, faster model for real-time or large-scale transcription tasks.
All of these are now freely available on Hugging Face, making them accessible to researchers, businesses, and independent developers.
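For developers who want to try the models right away, both checkpoints can be pulled down through NVIDIA’s open-source NeMo toolkit. The snippet below is a minimal sketch of that workflow, assuming the NeMo ASR package is installed and that the Parakeet checkpoint is published on Hugging Face under an ID like nvidia/parakeet-tdt-0.6b-v3; the exact repository name and output format may differ depending on your NeMo version, so check the model card first.

```python
# Minimal sketch: transcribing a local audio file with the Parakeet model.
# Assumes `pip install "nemo_toolkit[asr]"` and that the checkpoint is
# published on Hugging Face under the ID used below (see the model card).
import nemo.collections.asr as nemo_asr

# Download the model from Hugging Face (cached locally after the first call).
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Transcribe one or more 16 kHz mono WAV files.
results = asr_model.transcribe(["meeting_recording.wav"])

# Depending on the NeMo version, entries are plain strings or hypothesis
# objects exposing a .text attribute.
first = results[0]
print(first.text if hasattr(first, "text") else first)
```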
Why Granary Is Different
Creating speech datasets is not just about recording people talking. The audio must be cleaned, labeled, and structured in a way that AI models can learn from. For smaller languages, this process is often too expensive and time-consuming.
To solve this, NVIDIA worked with Carnegie Mellon University and Fondazione Bruno Kessler. They used the NVIDIA NeMo Speech Data Processor toolkit to turn large amounts of unlabeled audio into high-quality, structured data without requiring huge amounts of manual work.
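The heavy lifting in NVIDIA’s pipeline lives inside the Speech Data Processor configurations, but the underlying idea of pseudo-labeling is easy to illustrate: an existing model transcribes raw clips, obviously weak outputs are filtered out, and what remains becomes structured training data. The sketch below is illustrative only and does not reproduce the actual Granary pipeline; the folder layout, model ID, and length-based filter are placeholder assumptions.

```python
# Illustrative pseudo-labeling sketch (NOT the actual Granary / SDP pipeline):
# an existing ASR model transcribes unlabeled clips, low-value outputs are
# dropped, and the rest is written to a NeMo-style JSONL manifest.
import json
from pathlib import Path

import nemo.collections.asr as nemo_asr

UNLABELED_DIR = Path("unlabeled_audio")         # hypothetical folder of WAV clips
OUTPUT_MANIFEST = Path("pseudo_labeled.jsonl")  # manifest consumed by training

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

clips = sorted(str(p) for p in UNLABELED_DIR.glob("*.wav"))
hypotheses = model.transcribe(clips)

with OUTPUT_MANIFEST.open("w", encoding="utf-8") as out:
    for path, hyp in zip(clips, hypotheses):
        text = hyp.text if hasattr(hyp, "text") else str(hyp)
        # Crude quality filter: drop empty or suspiciously short transcripts.
        if len(text.split()) < 3:
            continue
        record = {"audio_filepath": path, "text": text}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The real pipeline applies far more careful quality control than a three-word filter, but the shape of the process, automated labeling followed by automated filtering, is what makes smaller languages affordable to cover.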
Dr. Elena Tamburini, a computational linguist, explains: “Transforming raw speech into usable AI training data is one of the most challenging steps. NVIDIA’s automated approach with Granary is a breakthrough for languages with limited resources.”

Better multilingual speech AI opens up endless possibilities. Here are a few examples:
Customer service: voice agents can serve customers in their native language with better accuracy.
Education: schools can offer real-time translations so students can learn in the language they’re most comfortable with.
Healthcare: doctors can communicate effectively with patients from different language backgrounds.
Emergency services: in crises, accurate translation can save lives.
A European fintech company struggled to provide voice-based support in Maltese because existing AI tools misunderstood words and phrases. After adopting NVIDIA’s Parakeet model, transcription accuracy jumped to 90%, and call resolution time dropped by 25%.

The Two New Models
Canary-1b-v2 is designed for maximum accuracy. It’s ideal for demanding tasks like medical or legal transcription, international meetings, and high-quality media translation.
Parakeet-tdt-0.6b-v3 is smaller and faster, making it well suited to live events, real-time captions, and high-volume call centers. By offering both options, NVIDIA gives developers the flexibility to choose the right balance between speed and precision.
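To see the accuracy-versus-speed trade-off in code, the heavier Canary model can be asked to translate speech rather than only transcribe it. The sketch below assumes the checkpoint is published as nvidia/canary-1b-v2 and that its transcribe call accepts source_lang and target_lang arguments as described on the model card; argument names can vary between NeMo releases, so treat this as a starting point rather than a definitive recipe.

```python
# Minimal sketch: speech-to-text translation with the Canary model.
# Assumes the checkpoint ID and the source_lang/target_lang arguments below
# match the published model card for your installed NeMo version.
from nemo.collections.asr.models import EncDecMultiTaskModel

canary = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b-v2")

# Translate a German recording into English text.
outputs = canary.transcribe(
    ["interview_de.wav"],
    source_lang="de",   # language spoken in the audio
    target_lang="en",   # language of the generated text
)

first = outputs[0]
print(first.text if hasattr(first, "text") else first)
```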
This release is not just about Europe. The techniques used to create Granary can be applied to other regions and languages around the world. By focusing on automation and open access, NVIDIA is showing that AI can be inclusive and practical at the same time.
As someone who has worked on speech recognition for less common languages, I’ve seen how frustrating it can be when a system constantly misunderstands a speaker. This technology could finally change that, giving smaller languages the same AI support as global ones.
Of course, there are still obstacles:

Dialects and accents: even within one language, variation can affect accuracy.
Ethics and privacy: real-time speech translation could be misused for surveillance.
Computing costs: running large AI models can still be expensive for small companies.
Voices from the AI Community
Prof. Samuel Richter, an AI researcher, believes this is just the beginning: “We’re moving toward a future where multilingual speech AI is a core part of everyday life. Open datasets like Granary will make that future arrive faster.”
Linda Carvajal, a voice AI entrepreneur, adds: “Within a few years, near-perfect speech recognition will be available for hundreds of languages. NVIDIA’s release is a big push toward that goal.”
The release of the Granary dataset and the Canary and Parakeet models marks a significant step toward a truly connected world. By making multilingual speech AI more accurate and more accessible, NVIDIA is enabling developers to build tools that can break language barriers for millions of people.
If the global AI community uses these tools responsibly, we could soon see a future where every language, no matter how small, is represented in the digital space. Language is more than just words; it’s culture, identity, and connection.
With Granary, NVIDIA is not only advancing technology but also helping preserve linguistic diversity in an increasingly digital world. The next time you speak to someone in another language and get a seamless real-time translation, it might just be powered by this very breakthrough.