Direct Speech-to-Speech Translation

Model architecture that generates English speech from Spanish speech

Systems that translate between spoken languages typically take the intermediate step of translating speech into text. A new approach shows that neural networks can translate speech directly without first representing the words as text.

What’s new: Researchers at Google built an end-to-end model that translates speech in one language directly into speech in another. Their approach not only translates but also renders the output in a rough facsimile of the speaker’s voice. Google has published audio examples of the system’s output.

How it works: Known as Translatotron, the system has three main components. An attention-based sequence-to-sequence model takes spectrograms of source-language speech as input and generates spectrograms in the target language. A neural vocoder converts the output spectrograms into audio waveforms. And a pretrained speaker encoder preserves the character of the speaker’s voice. The sequence-to-sequence model was trained end-to-end on a large corpus of matched spoken phrases in Spanish and English, along with phoneme transcripts that serve as auxiliary training targets.
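The flow of data among those three components can be sketched in a few lines of Python with NumPy. This is a toy illustration with random, untrained weights, not Google's implementation. The layer sizes, the single-layer dot-product attention, the choice to concatenate the speaker embedding with the attention context, the fixed number of output frames, and the additive-synthesis stand-in for the neural vocoder are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MELS = 80    # spectrogram channels per frame (assumed)
HIDDEN = 64    # encoder/decoder width (assumed)
SPK_DIM = 16   # speaker-embedding size (assumed)
HOP = 200      # waveform samples generated per output frame (assumed)

# Random weights: this sketch shows data flow only; nothing here is trained.
W_enc = rng.normal(0, 0.1, (N_MELS, HIDDEN))
W_spk = rng.normal(0, 0.1, (N_MELS, SPK_DIM))
W_query = rng.normal(0, 0.1, (N_MELS, HIDDEN))
W_out = rng.normal(0, 0.1, (HIDDEN + SPK_DIM, N_MELS))


def speaker_encoder(ref_spec):
    """Stand-in for a pretrained speaker encoder: one unit-length vector per utterance."""
    emb = np.tanh(ref_spec @ W_spk).mean(axis=0)
    return emb / (np.linalg.norm(emb) + 1e-8)


def encode(src_spec):
    """Map each source frame (T_src, N_MELS) to a hidden state (T_src, HIDDEN)."""
    return np.tanh(src_spec @ W_enc)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def decode(enc_states, spk_emb, n_frames):
    """Autoregressive decoder: each step attends over all source frames,
    using the previous output frame as the query, and emits one target frame."""
    prev = np.zeros(N_MELS)
    frames = []
    for _ in range(n_frames):
        query = np.tanh(prev @ W_query)            # (HIDDEN,)
        weights = softmax(enc_states @ query)      # attention over source frames
        context = weights @ enc_states             # (HIDDEN,)
        frame = np.concatenate([context, spk_emb]) @ W_out  # voice conditioning
        frames.append(frame)
        prev = frame
    return np.stack(frames)                        # (n_frames, N_MELS)


def toy_vocoder(spec, sample_rate=16000):
    """Toy additive synthesis standing in for a neural vocoder:
    each spectrogram channel drives the amplitude of one sinusoid."""
    freqs = np.linspace(100, 4000, spec.shape[1])
    t = np.arange(HOP) / sample_rate
    chunks = []
    for i, frame in enumerate(spec):
        phase_t = t + i * HOP / sample_rate
        amps = np.abs(frame)[:, None]
        chunks.append((amps * np.sin(2 * np.pi * freqs[:, None] * phase_t)).sum(axis=0))
    return np.concatenate(chunks)


src = rng.normal(size=(120, N_MELS))  # fake Spanish spectrogram, 120 frames
spk = speaker_encoder(src)            # voice taken from the source utterance itself
tgt = decode(encode(src), spk, n_frames=90)
wave = toy_vocoder(tgt)
print(tgt.shape, wave.shape)  # prints (90, 80) (18000,)
```

The point of the sketch is what is missing: a source spectrogram goes in and a target spectrogram and waveform come out, with no text representation in between. A real system would replace each stand-in with trained networks and would learn when to stop decoding rather than run for a fixed number of frames.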

Why it matters: The architecture devised by Ye Jia, Ron J. Weiss, and their colleagues offers a number of advantages:

  • It retains the speaker’s vocal characteristics in the spoken output.
  • It doesn't trip over words that require no translation, such as proper names.
  • It delivers faster translations, since it skips the separate decoding stages that a cascaded system runs.
  • Training end-to-end avoids the compounding of errors across separate speech-to-text, translation, and text-to-speech stages.

Results: On Spanish-to-English translation, the end-to-end system scores slightly below a conventional cascaded system that translates via text. But it produces more realistic audio than previous systems and plants a stake in the ground for the end-to-end approach.

The hitch: Training it requires an immense corpus of matched phrases. That may not be so easy to come by, depending on the languages you need.

Takeaway: Automatic speech-to-speech translation is a sci-fi dream come true. Google’s work suggests that such systems could become faster and more accurate before long.

Subscribe to The Batch
