Dr. Paris Smaragdis
In his MIT Technology Review article "Hearing Machines" (subtitled "While hearing in machines lags far behind vision in machines, the potential is great, and researchers are beginning to make impressive progress"), he wrote:
Our knowledge of human hearing is relatively limited. We know how our ears work, but we mostly improvise in our descriptions as neural signals move deeper into the brain. The study of auditory psychology is not even close to where we want it to be. Machine learning, AI, and classical computer-science algorithms are deeply rooted in a visual way of thinking that does not extend naturally to reasoning about sound. Our own ability to describe sounds and the process of hearing is predominantly limited to vocabulary developed for music. These problems share the common cause that hearing (whether human or machine) is not something that has attracted adequate attention.
Because of this, the process of creating a new technology in this field (from finding bibliographical references and abstracting to simpler problems, to actually explaining the point of it all in a business or technical meeting) is a fight against the unknown. Things are getting better, though: in the past few years an increasing number of researchers interested in computational perception have started showing interest in hearing (as well as in taste and olfaction), and we have seen some amazing progress in our field as well as the slow emergence of relevant products in the mainstream.
So keep your mind and ears open. You might not see much of hearing machines today, but you’ll be hearing about them soon.
Dr. Paris Smaragdis, the author of this article, is a research scientist at Mitsubishi Electric Research Laboratories (MERL). His main interests are auditory scene analysis and self-organizing computational perception. Before coming to MERL he was a postdoctoral associate at MIT, where he also obtained his PhD degree in perceptual computing. His most recent work has been on sound source separation and multimodal statistics.
His goal is to make computational audition as mainstream as computer vision. He is best known for some of the earliest work on source separation using ICA in the frequency domain, as well as work on self-organizing auditory perception (preprocessing, grouping, and scene analysis). His recent projects include Acoustic Doppler for Denoising Speech Signals, Audio-Assisted Cameras, MPEG-7 Music Player, Secret Audio, Sound-Based Traffic Incident Detection, and Sound Spotter: Recognition and Extraction from Mixed Audio.
Paris authored Convolutive Speech Bases and their Application to Supervised Speech Separation, Discovering Auditory Objects Through Non-Negativity Constraints, and Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs, and coauthored Position and Trajectory Learning for Microphone Arrays, Secure Sound Classification: Gaussian Mixture Models, Learning Source Trajectories Using Wrapped-Phase Hidden Markov Models, and Bandwidth Expansion of Narrowband Speech Using Non-Negative Matrix Factorization. Read his full list of publications!
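For readers unfamiliar with non-negative matrix factorization, the technique named in several of the titles above, here is a minimal illustrative sketch in Python with NumPy. This is not Smaragdis's own implementation; it is the standard multiplicative-update NMF of Lee and Seung, applied the way audio work typically applies it: the input matrix plays the role of a magnitude spectrogram, the columns of W act as spectral bases (one per source), and the rows of H give each basis's activation over time. All function and variable names here are my own.

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0):
    """Factor a non-negative matrix V ~= W @ H using multiplicative
    updates that minimize the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-6   # spectral bases (columns)
    H = rng.random((rank, m)) + 1e-6   # activations over time (rows)
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy "spectrogram": two fixed spectra that switch on and off over time.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])
acts = np.array([[1.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]])
V = basis @ acts

W, H = nmf(V, rank=2)
print(np.linalg.norm(W @ H - V))  # reconstruction error shrinks toward zero
```

In a separation setting one would reconstruct each source by keeping a single basis column and its activation row (e.g. `np.outer(W[:, 0], H[0])`) and inverting back to the time domain; the papers above extend this basic idea in various directions (convolutive bases, deconvolution, bandwidth expansion).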
He completed his postdoc and earned his Ph.D. and M.S. at the Machine Listening Group (currently known as the Music, Mind, and Machine group) of the MIT Media Lab. Watch the MIT Technology Review vlog of computer scientist Paris Smaragdis on "machine listening". (Works best in Internet Explorer.)
Lifeboat Tidbit: His first name has nothing to do with France, the city, or the omnipresent heiress du jour. It has a lot to do with Greek mythology instead. Paris was born in Athens, Greece and grew up in Hania, Crete.