{"id":77923,"date":"2018-04-19T10:23:06","date_gmt":"2018-04-19T17:23:06","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2018\/04\/google-ai-can-pick-out-voices-in-a-crowd"},"modified":"2018-04-19T10:23:06","modified_gmt":"2018-04-19T17:23:06","slug":"google-ai-can-pick-out-voices-in-a-crowd","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2018\/04\/google-ai-can-pick-out-voices-in-a-crowd","title":{"rendered":"Google AI can pick out voices in a crowd"},"content":{"rendered":"<p><\/p>\n<p><iframe style=\"display: block; margin: 0 auto; width: 100%; aspect-ratio: 4\/3; object-fit: contain;\" src=\"https:\/\/www.youtube.com\/embed\/NzZDnRni-8A?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope;\n   picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Humans are usually good at isolating a single voice in a crowd, but computers? Not so much \u2014 just ask anyone trying to talk to a smart speaker at a house party. Google may have a surprisingly straightforward solution, however. Its researchers have <a href=\"https:\/\/research.googleblog.com\/2018\/04\/looking-to-listen-audio-visual-speech.html\">developed<\/a> a deep learning system that can pick out specific voices by looking at people\u2019s faces when they\u2019re speaking. The team trained its neural network model to recognize individual people speaking by themselves, and then created virtual \u201cparties\u201d (complete with background noise) to teach the AI how to isolate multiple voices into distinct audio tracks.<\/p>\n<p>The results, as you can see below, are uncanny. Even when people are clearly trying to compete with each other (such as comedians Jon Dore and Rory Scovel in the <a href=\"https:\/\/www.youtube.com\/watch?v=UT7h4nRcWjU\">Team Coco clip<\/a> above), the AI can generate a clean audio track for one person just by focusing on their face. That\u2019s true even if the person partially obscures their face with hand gestures or a microphone.<\/p>\n<p>Google is currently \u201cexploring opportunities\u201d to use this feature in its products, but there are more than a few prime candidates. It\u2019s potentially ideal for video chat services like <a href=\"https:\/\/www.engadget.com\/2018\/01\/22\/googles-hangouts-meet-video-calls-are-now-available-on-tablets\/\">Hangouts<\/a> or <a href=\"https:\/\/www.engadget.com\/2017\/12\/12\/google-finally-integrates-duo-android-call-screen\/\">Duo<\/a>, where it could help you understand someone talking in a crowded room. It could also be helpful for speech enhancement in video recording. And there are big implications for accessibility: it could lead to camera-linked hearing aids that boost the sound of whoever\u2019s in front of you, and more effective closed captioning. There are potential privacy issues (this could be used for public eavesdropping), but it wouldn\u2019t be too difficult to limit the voice separation to people who\u2019ve clearly given their consent.<\/p>\n<p><!-- Link: <a href=\"https:\/\/www.engadget.com\/2018\/04\/12\/google-ai-looking-to-listen-voice-separation\/\">https:\/\/www.engadget.com\/2018\/04\/12\/google-ai-looking-to-lis...eparation\/<\/a> --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Humans are usually good at isolating a single voice in a crowd, but computers? Not so much \u2014 just ask anyone trying to talk to a smart speaker at a house party. Google may have a surprisingly straightforward solution, however. Its researchers have developed a deep learning system that can pick out specific voices by [\u2026]<\/p>\n","protected":false},"author":367,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,6],"tags":[],"class_list":["post-77923","post","type-post","status-publish","format-standard","hentry","category-biotech-medical","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/77923","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/367"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=77923"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/77923\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=77923"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=77923"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=77923"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}