Meta has created an AI model, ‘SeamlessM4T,’ that can translate and transcribe close to 100 languages across text and speech
Aug. 24, 2023.
“First all-in-one multilingual multimodal AI translation and transcription model,” says Meta
The model is “the first all-in-one multilingual multimodal AI translation and transcription model,” Meta claims.
“It can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task … without having to first convert to text behind the scenes. We’re developing AI to eliminate language barriers in the physical world and in the metaverse.”
“In keeping with our approach to open science, we’re publicly releasing SeamlessM4T under a research license to allow researchers and developers to build on this work. We’re also releasing the metadata of SeamlessAlign, the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.”
“Beyond the wealth of commercial services and open-source models already available from Amazon, Microsoft, OpenAI and a number of startups, Google is creating what it calls the Universal Speech Model, a part of the tech giant’s larger effort to build a model that can understand the world’s 1,000 most-spoken languages,” says TechCrunch.
“Mozilla, meanwhile, spearheaded Common Voice, one of the largest multi-language collections of voices for training automatic speech recognition algorithms. But SeamlessM4T is among the more ambitious efforts to date to combine translation and transcription capabilities into a single model.
In developing it, Meta says that it scraped publicly available text (on the order of ‘tens of billions’ of sentences) and speech (4 million hours) from the web.”