MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a deep-learning system that can watch (yes, watch - read on) a video of a musical performance and isolate the sounds of specific instruments.
The ‘PixelPlayer’ system was trained on more than 60 hours of videos and can now identify instruments ‘at pixel level’ after viewing a video just once. It can then extract the sounds associated with those instruments and play them back without any other noise.
At the moment PixelPlayer can identify sounds from more than 20 instruments, a number that could grow with more training data.
The system uses three neural networks: one to analyse the visuals of the video; one to analyse the audio; and a third, the ‘synthesiser’, which associates specific pixels with specific soundwaves to separate the different sounds.
So far we have only seen videos of two instruments being played at once, so we do wonder how the system would handle a larger-scale performance like an orchestra.
Lead author Hang Zhao said that the system may have trouble handling subtle differences between subclasses of instruments (such as an alto sax versus a tenor).
The system can adjust the volume of individual instruments after the fact, which has implications for cleaning up old recordings, or previewing how certain instruments sound in a new composition.
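Once each instrument has been separated into its own track, changing its volume comes down to scaling that track and summing the results. The sketch below is our own illustration of that idea, not MIT's code: the function name `remix`, the instrument names and the sample values are all hypothetical.

```python
def remix(stems, gains):
    """Mix separated instrument tracks back together with per-instrument gains.

    stems: dict mapping an instrument name to a list of audio samples.
    gains: dict mapping the same names to a volume multiplier
           (0.0 mutes the instrument, 1.0 leaves it unchanged).
    Both dicts are illustrative assumptions, not PixelPlayer's data format.
    """
    if set(stems) != set(gains):
        raise ValueError("stems and gains must name the same instruments")
    if not stems:
        return []
    length = len(next(iter(stems.values())))
    if any(len(samples) != length for samples in stems.values()):
        raise ValueError("all stems must have the same number of samples")

    mix = [0.0] * length
    for name, samples in stems.items():
        gain = gains[name]
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


# Hypothetical example: mute the violin and double the guitar.
stems = {
    "violin": [0.5, -0.5, 0.25],
    "guitar": [0.25, 0.25, -0.25],
}
louder_guitar = remix(stems, {"violin": 0.0, "guitar": 2.0})
```

The hard part, of course, is producing the separated tracks in the first place; the remix itself is simple arithmetic.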
A similar system could even be used on robots, to better understand environmental sounds made by animals or vehicles.
In the past, attempts to isolate sounds have focused on audio alone, which MIT says ‘often requires extensive human labeling’. PixelPlayer's reliance on vision makes such labelling unnecessary.
The system locates the image regions that produce sounds, then separates the input sounds into a set of components that represent the sound from each pixel. That approach implies concert recordings, where the players are harder to pick out visually, would be a much more difficult proposition; we have asked MIT about this.
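One common way to get ‘the sound from each pixel’ is to compute a mask for that pixel and multiply it against the mixed audio. The sketch below assumes that approach; it is a toy illustration, not MIT's implementation. The names `pixel_mask` and `separate`, the use of a sigmoid, and the flat list of frequency bins standing in for a spectrogram are all our assumptions.

```python
import math


def pixel_mask(pixel_weights, audio_components):
    """Build a mask in the range 0..1 for one pixel.

    pixel_weights: K numbers describing what the video shows at this pixel
                   (hypothetical stand-in for the visual network's output).
    audio_components: K lists, each with one value per frequency bin
                   (hypothetical stand-in for the audio network's output).
    """
    if len(pixel_weights) != len(audio_components):
        raise ValueError("need one weight per audio component")
    n_bins = len(audio_components[0])
    mask = []
    for b in range(n_bins):
        # Weighted sum of the components at this bin, squashed by a sigmoid.
        score = sum(w * comp[b] for w, comp in zip(pixel_weights, audio_components))
        mask.append(1.0 / (1.0 + math.exp(-score)))
    return mask


def separate(mixture, pixel_weights, audio_components):
    """Keep only the share of each frequency bin attributed to this pixel."""
    mask = pixel_mask(pixel_weights, audio_components)
    return [m * x for m, x in zip(mask, mixture)]


# Hypothetical two-bin mixture and two audio components.
mixture = [2.0, 4.0]
components = [[3.0, -3.0], [-3.0, 3.0]]

# A pixel with all-zero weights gets a neutral mask of 0.5 everywhere.
neutral = separate(mixture, [0.0, 0.0], components)

# A pixel weighted towards the first component keeps mostly the first bin.
first = separate(mixture, [1.0, 0.0], components)
```

In this toy setup the pixel weighted towards the first component keeps most of the first bin's energy and little of the second's, which is the behaviour a per-pixel separation needs.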
The downside to this ‘self-supervised’ deep learning is that MIT does not fully understand how PixelPlayer learns which instruments make which sounds.