New AI noise-canceling headphone tech could let you pick which sounds you hear
Nov. 09, 2023.
2 min. read
“Semantic hearing” deep learning algorithms could allow you to hear selected sounds in real time
If you’ve used noise-canceling headphones, you know that hearing the right noise at the right time can be vital. You might want to block car horns when working indoors, but not when walking along busy streets. No current headphones offer that kind of selective filtering.
So a team led by researchers at the University of Washington has developed deep-learning algorithms that could let you pick which sounds filter through your headphones in real time. Using voice commands or a smartphone app, you could select which of 20 sound classes you want to hear, such as sirens, baby cries, speech, vacuum cleaners and bird chirps.
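The class-selection idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not the team's actual system: it assumes a separation network has already produced one waveform per sound class, and simply remixes only the classes the listener opted to hear.

```python
import numpy as np

# Hypothetical per-class outputs of a source-separation network:
# each entry maps a sound class to its separated waveform.
separated = {
    "siren":  np.array([0.5, 0.4, 0.3]),
    "speech": np.array([0.1, 0.2, 0.1]),
    "vacuum": np.array([0.3, 0.3, 0.3]),
}

def mix_selected(separated, selected):
    """Sum only the waveforms of the classes the listener chose to keep."""
    keep = [wav for cls, wav in separated.items() if cls in selected]
    if not keep:
        # Nothing selected: output silence of the right length.
        return np.zeros_like(next(iter(separated.values())))
    return np.sum(keep, axis=0)

out = mix_selected(separated, {"siren", "speech"})  # vacuum noise is dropped
```

Here `mix_selected` and the three example classes are illustrative names; the real system works on streaming binaural audio rather than short arrays.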
“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise-canceling headphones haven’t achieved,” explained senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering.
“The sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second.”
So instead of relying on cloud servers, the system must run on a local device such as a connected smartphone. And because sounds from different directions arrive at each ear at slightly different times, the system must also preserve these delays and other spatial cues so the listener can still tell where sounds are coming from.
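Two of those constraints can be made concrete with a short sketch. The numbers below are assumptions for illustration (the article only states the sub-hundredth-of-a-second bound; the 44.1 kHz sample rate is not from the source): first a latency budget in samples, then a standard cross-correlation estimate of the interaural time difference (ITD), the per-ear delay cue the system must preserve.

```python
import numpy as np

# Assumed sample rate; the article only gives the latency bound.
SAMPLE_RATE = 44_100
LATENCY_BOUND_S = 0.01  # "under a hundredth of a second"

# Upper bound on audio samples the pipeline could buffer per inference
# if the whole budget went to buffering alone (a real system must also
# leave time for the neural network's forward pass).
max_chunk = int(SAMPLE_RATE * LATENCY_BOUND_S)  # 441 samples

def itd_samples(left: np.ndarray, right: np.ndarray) -> int:
    """Estimate the interaural time difference in samples via
    cross-correlation; a negative value means the left ear leads."""
    corr = np.correlate(left, right, mode="full")
    return int(np.argmax(corr)) - (len(right) - 1)
```

A sound source slightly to the listener's left reaches the left microphone a fraction of a millisecond earlier; `itd_samples` recovers that offset, which the processing chain must reapply so extracted sounds stay anchored in space.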
Training on real-world data
Tested in environments such as offices, streets and parks, the system was able to extract sirens, bird chirps, alarms and other target sounds while removing all other real-world noise. On average, the 22 participants who rated the system’s audio output said the quality of the target sound improved compared with the original recording.
In some cases, the system struggled to distinguish between sounds that share many properties, such as vocal music and human speech. The researchers note that training the models on more real-world data might improve these outcomes.
Additional co-authors on the paper were Bandhav Veluri and Malek Itani, both UW doctoral students in the Allen School; Justin Chan, who completed this research as a doctoral student in the Allen School and is now at Carnegie Mellon University; and Takuya Yoshioka, director of research at AssemblyAI.
Citation: Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota. 29 October 2023. Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables. UIST ’23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. https://doi.org/10.1145/3586183.3606779 (open access)