Dr. Paris Smaragdis
The MIT Technology Review article “Hearing Machines: While hearing in machines lags far behind vision in machines, the potential is great, and researchers are beginning to make impressive progress” said:
Our knowledge of human hearing is relatively limited. We know how our ears work, but we mostly improvise in our descriptions as neural signals move deeper into the brain. The study of auditory psychology is not even close to where we want it to be. Machine learning, AI, and classical computer-science algorithms are deeply rooted in a visual way of thinking that does not extend naturally to reasoning about sound. Our own ability to describe sounds and the process of hearing is predominantly limited to vocabulary developed for music. These problems share the common cause that hearing (whether human or machine) is not something that has attracted adequate attention.
Because of this, the process of creating a new technology in this field, from finding bibliographical references and abstracting to simpler problems to actually explaining the point of it all in a business or technical meeting, is a fight against the unknown. Things are getting better, though: in the past few years an increasing number of researchers interested in computational perception have started showing interest in hearing (as well as in the senses of taste and olfaction), and we have seen some amazing progress in our field as well as the slow emergence of relevant products in the mainstream.
So keep your mind and ears open. You might not see much of hearing machines today, but you’ll be hearing about them soon.
Dr. Paris Smaragdis was the author of this article and is a
research scientist at
Mitsubishi Electric Research Laboratories (MERL).
His main interests are
auditory scene analysis and self-organizing computational perception.
Before coming to MERL he was a postdoctoral associate at MIT, where he
also obtained his PhD degree in perceptual computing. His most recent
work has been on sound source separation, multimodal statistics and
audio classification.
His goal is to make computational audition as mainstream as computer
vision.
He most famously did some of the earliest work on source separation
using ICA in the frequency domain, and some more work on
self-organizing auditory perception (preprocessing, grouping and scene
analysis).
His recent projects include Acoustic Doppler for Denoising Speech
Signals, Audio-Assisted Cameras, MPEG-7 Music Player, Secret Audio,
Sound-Based Traffic Incident Detection, and Sound Spotter: Recognition
and Extraction from Mixed Audio.
Paris authored
Convolutive Speech Bases and their Application to Supervised Speech
Separation,
Discovering Auditory Objects Through Non-Negativity
Constraints, and
Non-negative Matrix Factor Deconvolution; Extraction of
Multiple Sound Sources from Monophonic Inputs, and
coauthored
Position and Trajectory Learning for Microphone Arrays,
Secure Sound Classification: Gaussian Mixture Models,
Learning Source Trajectories Using Wrapped-Phase Hidden Markov
Models, and
Bandwidth Expansion of Narrowband Speech Using Non-Negative Matrix
Factorization. Read his
full list of
publications!
He completed his M.S., his Ph.D., and his postdoctoral work at the
Machine Listening Group (currently known as the Music, Mind, and
Machine group) of the MIT Media Lab.
Watch the MIT Technology Review vlog of computer scientist Paris
Smaragdis on “machine listening”. (Works best in Internet Explorer.)
Lifeboat Tidbit: His first name has nothing to do with France, the
city, or the omnipresent heiress du jour. It has a lot to do with
Greek mythology instead.
Paris was born in Athens, Greece and grew up in Hania, Crete.
