The first family of open science models for speech recognition and speech translation

A speech recognition and translation system developed entirely from scratch, without relying on pre-trained models from major tech companies and built exclusively with open-source data and tools: this is the achievement of SpeechTek and Machine Translation, two research units at Fondazione Bruno Kessler, through the project “FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian.” The project stands out for its innovative approach, clear vision, and significant impact, and is part of the broader efforts of the FAIR Foundation – Future Artificial Intelligence Research.
The true innovation lies not only in the model’s performance but also in its complete transparency: it was trained on over 150,000 hours of freely available audio data, all distributed under permissive licenses. In addition to this open data, the team created a large volume of “synthetic data” (automatically generated transcriptions and translations in English and Italian) specifically for the project and released it through the MOSEL dataset.
“All the code, data, and procedures are fully public and well-documented, enabling anyone to replicate or build upon the system. The expertise developed through this collaborative effort, along with the model’s potential applications and ongoing evolution, makes FAMA a valuable asset for FBK,” explain Alessio Brutti, head of the SpeechTek unit, and Luisa Bentivogli, head of the Machine Translation unit.