Meta unveils 'Seamless' speech-to-speech translator

Meta, owner of Facebook, Instagram, and WhatsApp, on Tuesday unveiled its latest effort in machine translation, this one geared toward speech translation.

The program, SeamlessM4T, surpasses existing models that are trained specifically for speech-to-speech translation between languages, as well as models that convert between speech and text in multiple language pairs. Hence, SeamlessM4T is an example not just of generality but of what is called multi-modality -- the ability for one program to operate on multiple data types, in this case, both speech and text data.

Previously, Meta has focused on large language models that can translate text between 200 different languages. That focus on text is a problem, say lead author Loïc Barrault and colleagues at both Meta and the University of California at Berkeley.

"While single, unimodal models such as No Language Left Behind (NLLB) push text-to-text translation (T2TT) coverage to more than 200 languages, unified S2ST [speech-to-speech translation] models are far from achieving similar scope or performance," write Barrault and team.

The formal paper, "SeamlessM4T -- Massively Multilingual & Multimodal Machine Translation," is posted on Meta's dedicated site for the overall project, Seamless Communication. There is also a companion GitHub site.

Speech has been left behind partly because less speech data is readily available in the public domain to train neural networks, write the authors. But there's a deeper point: Speech data is fundamentally richer as a signal for neural networks.

"The very challenge around why speech is harder to tackle from a machine translation standpoint -- that it encodes more information and expressive components -- is also why it is superior at conveying intent and forging stronger social bonds between interlocutors," they write.

The goal of SeamlessM4T is to create one program that is trained on both speech data and text data at the same time. The "M4T" stands for "Massively Multilingual & Multimodal Machine Translation." Multi-modality is an explicit part of the program.

Such a program is sometimes referred to as an "end-to-end" program because it doesn't split the text-related and speech-related parts into separate functions, as "cascaded models" do, where the program is first trained on one thing, such as speech to text, and then on another, such as speech to speech.

As the program's authors put it, "most S2ST [speech-to-speech translation] systems today rely heavily on cascaded systems composed of multiple subsystems that perform translation progressively -- e.g., from automatic speech recognition (ASR) to T2TT [text-to-text translation], and subsequently text-to-speech (TTS) synthesis in a 3-stage system."
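The 3-stage cascade described in that quote can be pictured as three independently built functions chained together. The following is only a toy sketch: the function names `asr`, `t2tt`, and `tts`, the two-word lexicon, and the fake audio samples are all hypothetical stand-ins, not code from Meta's release.

```python
from typing import List

# Hypothetical stubs standing in for three separately trained subsystems.
def asr(audio: List[float]) -> str:
    """Automatic speech recognition: source audio -> source text (stub)."""
    return "hola mundo"

def t2tt(text: str) -> str:
    """Text-to-text translation: source text -> target text (toy lexicon)."""
    lexicon = {"hola": "hello", "mundo": "world"}
    return " ".join(lexicon.get(word, word) for word in text.split())

def tts(text: str) -> List[float]:
    """Text-to-speech: target text -> fake waveform, one sample per character."""
    return [ord(ch) / 255.0 for ch in text]

def cascaded_s2st(audio: List[float]) -> List[float]:
    """Chain the three stages; each stage sees only the previous stage's output."""
    source_text = asr(audio)
    target_text = t2tt(source_text)
    return tts(target_text)

waveform = cascaded_s2st([0.0, 0.1, 0.2])
print(len(waveform))
```

Because each stage sees only the output of the stage before it, a recognition mistake in the first step is passed along to translation and synthesis, which is one commonly cited weakness of cascades.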

Instead, the authors built a program that combines multiple existing parts trained together. They included "SeamlessM4T-NLLB a massively multilingual T2TT model," plus a program called w2v-BERT 2.0, "a speech representation learning model that leverages unlabeled speech audio data," plus T2U, "a text-to-unit sequence-to-sequence model," and multilingual HiFi-GAN, a "unit vocoder for synthesizing speech from units."

All four components are plugged together like a Lego set into a single program, also introduced this year by Meta, called UnitY, which can be described as "a two-pass modeling framework that first generates text and subsequently predicts discrete acoustic units."

The whole organization is visible in the diagram below.

[Diagram: The authors built a program that combines multiple existing parts trained together, all of which are plugged together like a Lego set in a single program. Credit: Meta AI Research 2023]
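As a rough illustration of the "two-pass" idea quoted above (text first, discrete acoustic units second, then a vocoder), here is a minimal sketch. Every function is a hypothetical stub with made-up logic; the real components (the speech encoder, the T2U model, and the HiFi-GAN vocoder) are neural networks whose interfaces are not described in this article.

```python
from typing import List, Tuple

def first_pass_text(speech_features: List[float]) -> str:
    """Pass 1 (stub): produce translated text from encoded speech."""
    return "hello world"

def second_pass_units(text: str, n_units: int = 100) -> List[int]:
    """Pass 2 (stub): map text to discrete acoustic unit IDs."""
    return [ord(ch) % n_units for ch in text]

def unit_vocoder(units: List[int], n_units: int = 100) -> List[float]:
    """Stub vocoder: turn unit IDs into a fake waveform."""
    return [u / n_units for u in units]

def two_pass_translate(
    speech_features: List[float],
) -> Tuple[str, List[int], List[float]]:
    """Run text generation, then unit prediction, then synthesis."""
    text = first_pass_text(speech_features)
    units = second_pass_units(text)
    waveform = unit_vocoder(units)
    return text, units, waveform

text, units, waveform = two_pass_translate([0.3, 0.7])
print(text, len(units), len(waveform))
```

The point of the sketch is only the ordering: one call yields both a text translation and a speech output, rather than requiring separate systems for each.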

The program manages to do better than multiple other kinds of programs on tests of speech recognition, speech translation, and speech-to-text, the authors report. That includes beating both programs that are also end-to-end and programs designed explicitly for speech:

We find that SeamlessM4T-Large, the larger model of the two we release, outperforms the previous state-of-the-art (SOTA) end-to-end S2TT model (AudioPaLM-2-8B-AST [Rubenstein et al., 2023]) by 4.2 BLEU points on Fleurs [Conneau et al., 2022] when translating into English (i.e., an improvement of 20%). Compared to cascaded models, SeamlessM4T-Large improves translation accuracy by over 2 BLEU points. When translating from English, SeamlessM4T-Large improves on the previous SOTA (XLS-R-2B-S2T [Babu et al., 2022]) by 2.8 BLEU points on CoVoST 2 [Wang et al., 2021c], and its performance is on par with cascaded systems on Fleurs. On the S2ST task, SeamlessM4T-Large outperforms strong 3-stage cascaded models (ASR, T2TT and TTS) by 2.6 ASR-BLEU points on Fleurs. On CVSS, SeamlessM4T-Large outperforms a 2-stage cascaded model (Whisper-Large-v2 + YourTTS [Casanova et al., 2022]) by a large margin of 8.5 ASR-BLEU points (a 50% improvement). Preliminary human evaluations of S2TT outputs evinced similarly impressive results. For translations from English, XSTS scores for 24 evaluated languages are consistently above 4 (out of 5); for into English directions, we see significant improvement over Whisper-Large-v2's baseline for 7 out of 24 languages.
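The excerpt gives two gains both in absolute points and as percentages, which makes it possible to back out the approximate baseline scores they imply. The short calculation below is a back-of-envelope sketch: the helper name `implied_baseline` is made up for illustration, and because the quoted percentages are presumably rounded, the results are rough implied values, not scores reported in the paper.

```python
def implied_baseline(gain_points: float, relative_gain: float) -> float:
    """Baseline score implied by an absolute gain and its relative size.

    If new = baseline + gain_points and gain_points / baseline = relative_gain,
    then baseline = gain_points / relative_gain.
    """
    return gain_points / relative_gain

# Figures quoted in the excerpt: (absolute gain, relative gain).
quoted = {
    "S2TT into English on Fleurs (BLEU)": (4.2, 0.20),
    "S2ST on CVSS (ASR-BLEU)": (8.5, 0.50),
}

for task, (points, relative) in quoted.items():
    baseline = implied_baseline(points, relative)
    print(f"{task}: baseline ~{baseline:.1f}, new ~{baseline + points:.1f}")
```

Under those assumptions, the quoted gains correspond to baselines of roughly 21 BLEU and 17 ASR-BLEU, respectively.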

The companion GitHub site offers not just the program code but also SONAR, a new technology for "embedding" multi-modal data, and BLASER 2.0, a new version of a metric by which to automatically evaluate multi-modal tasks.
