A Lost Voice Regained

Brain implants paired with neural network reconstruct speech for ALS patient

Image: A man with electrodes implanted through his skull is connected to a machine.

A man who lost the ability to speak four years ago is sounding like his earlier self, thanks to a collection of brain implants and machine learning models.

What’s new: Researchers built a system that decodes speech signals from the brain of a man who lost the ability to speak clearly due to amyotrophic lateral sclerosis, also known as ALS, and enables him to speak through a synthetic version of his former voice. At the start of the study, his efforts to speak were intelligible only to his personal caregiver. Now he converses regularly with family and friends, The New York Times reported. Nicholas Card built the system with colleagues at the University of California-Davis, Stanford University, Washington University, Brown University, VA Providence Healthcare, and Harvard Medical School.

How it works: The authors surgically implanted four electrode arrays into areas of the brain that are responsible for speech. The system learned to decode the patient’s brain signals, identify the most likely phonemes he intended to speak, determine the words those phonemes express, and display the words and speak them aloud using a personalized speech synthesizer.

  • After the patient recovered from the implantation surgery, the authors collected data for training and evaluating the system. They recorded his brain signals while he tried to speak during 84 sessions, each between 5 and 30 minutes, over 32 weeks. The sessions were split into two tasks: copying, in which the patient spoke sentences shown on a screen, and conversation, in which he spoke about whatever he wanted. Initial sessions focused on copying. Later, when the authors had accrued paired brain signals and known sentences, they focused on conversation.
  • A gated recurrent unit (GRU) learned to translate brain signals into a sequence of phonemes. The authors trained the model after each session on all recordings made during that session. To adapt it to day-to-day changes in brain activity, they also fine-tuned it during later sessions: After they recorded a new sentence, they fine-tuned the GRU on a 60/40 mix of sentences from the current session and previous sessions (a sketch of this mixing step appears after this list).
  • A weighted finite-state transducer (WFST), based on a pretrained 5-gram language model and described in the paper’s supplementary information, translated sequences of phonemes into sentences. Given a sequence, it generated the 100 most likely sentences.
  • Given the candidate sentences, the system ranked them according to the probability that the GRU, the WFST, and OPT, a pretrained large language model, would generate them (the decoding sketch after this list illustrates the chain).
  • A pretrained StyleTTS 2 text-to-speech model turned the highest-ranking sentence into speech. The authors fine-tuned the model on recordings of the patient’s voice from before the onset of his illness, such as podcasts.
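
The overall decoding chain can be pictured in code. The sketch below is illustrative only, not the authors’ implementation: the layer sizes, phoneme count, scoring weights, and the helper functions wfst_nbest and llm_logprob are assumptions standing in for the WFST search and the OPT scorer described above.

```python
import torch
import torch.nn as nn

class PhonemeDecoder(nn.Module):
    """GRU that maps binned neural features to per-frame phoneme logits (sizes are illustrative)."""
    def __init__(self, n_channels=256, hidden=512, n_phonemes=41):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, num_layers=5, batch_first=True)
        self.head = nn.Linear(hidden, n_phonemes)

    def forward(self, neural_features):            # (batch, time, channels)
        states, _ = self.gru(neural_features)
        return self.head(states)                   # (batch, time, n_phonemes)

def decode_utterance(neural_features, gru, wfst_nbest, llm_logprob, weights=(1.0, 1.0, 0.5)):
    """Hypothetical decode: GRU phoneme logits -> WFST n-best sentences -> joint rescoring."""
    logits = gru(neural_features.unsqueeze(0))[0]             # (time, n_phonemes)
    log_probs = torch.log_softmax(logits, dim=-1)
    # A WFST built from a pretrained 5-gram language model proposes the 100 most likely
    # sentences, each with a score from the GRU output and a score from the n-gram model.
    candidates = wfst_nbest(log_probs, n_best=100)            # [(sentence, gru_score, ngram_score), ...]
    w_gru, w_ngram, w_llm = weights                           # placeholder mixing weights
    best_sentence, best_score = None, float("-inf")
    for sentence, gru_score, ngram_score in candidates:
        score = w_gru * gru_score + w_ngram * ngram_score + w_llm * llm_logprob(sentence)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence   # handed to the personalized text-to-speech model
```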
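
The day-to-day adaptation amounts to brief fine-tuning on a blend of fresh and older data. Below is a minimal sketch of that 60/40 mixing step; the batch size, number of steps, learning rate, and data format are assumptions, and loss_fn stands in for whatever phoneme-level objective the decoder is trained with.

```python
import random
import torch

def build_adaptation_batch(current_session, previous_sessions, batch_size=32, current_frac=0.6):
    """Sample a mini-batch that is roughly 60% sentences from today's session, 40% from earlier ones."""
    n_current = min(round(batch_size * current_frac), len(current_session))
    batch = random.sample(current_session, n_current)
    batch += random.sample(previous_sessions, batch_size - n_current)
    return batch

def adapt_after_new_sentence(gru, loss_fn, current_session, previous_sessions, steps=20, lr=1e-4):
    """Short fine-tuning pass run after each newly recorded sentence (schedule is assumed)."""
    optimizer = torch.optim.Adam(gru.parameters(), lr=lr)
    for _ in range(steps):
        batch = build_adaptation_batch(current_session, previous_sessions)
        # Assumes each example holds fixed-length, pre-padded tensors.
        features = torch.stack([ex["neural_features"] for ex in batch])   # (batch, time, channels)
        targets = torch.stack([ex["phoneme_targets"] for ex in batch])    # (batch, time)
        loss = loss_fn(gru(features), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```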

Results: After two hours of recording the patient’s brain signals and training on that data, the system achieved 90.2 percent accuracy on the copying task. By the final session, it achieved 97.5 percent accuracy and enabled the patient to speak at an average of 31.6 words per minute using a vocabulary of 125,000 words.

Behind the news: Previous work either had much lower accuracy or generated a limited vocabulary. The new work improved upon a 2023 study that enabled ALS patients to speak with 76.2 percent accuracy using a vocabulary of equal size. 

Why it matters: Relative to the 2023 study on which this one was based, the authors changed the positions of the electrodes in the brain and continued to update the GRU throughout the recording/training sessions. It’s unclear which changes contributed most to the improved outcome. As language models improve, new models potentially could act as drop-in replacements for the models in the authors’ system, further improving accuracy. Likewise, improvements in text-to-speech systems could increase the similarity between the synthetic voice and the patient’s former voice.

We’re thinking: Enabling someone to speak again restores agency. Enabling someone to speak again in their own voice restores identity.
