
This article was published as a part of the Data Science Blogathon.

Introduction

Computer vision has moved beyond its early, emerging stage, and the results are proving useful across many applications. It is in mobile phone cameras that can recognize faces. It is in self-driving cars, which use it to recognize traffic signals, signs, and pedestrians. It is also in industrial robots, which use it to monitor for problems and navigate around co-workers.

You’ve probably heard us say this countless times: GPT-3, the gargantuan AI that spews uncannily human-like language, is a marvel. It’s also largely a mirage. You can tell with a simple trick: Ask it the color of sheep, and it will suggest “black” as often as “white”—reflecting the phrase “black sheep” in our vernacular.

That’s the problem with language models: because they’re only trained on text, they lack common sense. Now researchers from the University of North Carolina, Chapel Hill, have designed a new technique to change that. They call it “vokenization,” and it gives language models like GPT-3 the ability to “see.”

It’s not the first time people have sought to combine language models with computer vision. This is actually a rapidly growing area of AI research. The idea is that both types of AI have different strengths. Language models like GPT-3 are trained through unsupervised learning, which requires no manual data labeling, making them easy to scale. Image models like object recognition systems, by contrast, learn more directly from reality. In other words, their understanding doesn’t rely on the kind of abstraction of the world that text provides. They can “see” from pictures of sheep that they are in fact white.
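The sheep example can be made concrete with a toy sketch. The corpus below is invented purely for illustration and has nothing to do with GPT-3's training data. It shows how counting which color word precedes "sheep" in text reflects common phrases rather than what sheep actually look like. The names `TOY_CORPUS` and `color_counts_before` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
TOY_CORPUS = [
    "he was the black sheep of the family",
    "every family has a black sheep",
    "the white sheep grazed on the hill",
    "a flock of sheep crossed the road",
    "she felt like the black sheep at work",
    "the farmer sheared the white sheep",
]

COLORS = {"black", "white"}


def color_counts_before(word, corpus):
    """Count color words that appear immediately before `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        # Walk over adjacent token pairs (previous word, current word).
        for prev, cur in zip(tokens, tokens[1:]):
            if cur == word and prev in COLORS:
                counts[prev] += 1
    return counts


counts = color_counts_before("sheep", TOY_CORPUS)
print(counts)  # "black" outnumbers "white" in this toy corpus
```

A model that learns only from statistics like these has no way to tell that the idiom is overrepresented relative to real sheep, which is the gap that pairing text with images is meant to close.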

Recently, a team of researchers from Facebook AI and Tel Aviv University proposed an AI system that solves the multiple-choice intelligence test, Raven’s Progressive Matrices. The proposed AI system is a neural network model that combines multiple advances in generative models, including employing multiple pathways through the same network.

Raven’s Progressive Matrices, also known as Raven’s Matrices, is a multiple-choice intelligence test. It is used to measure abstract reasoning and is regarded as a non-verbal estimate of fluid intelligence.

In this test, a person tries to fill in the missing cell in a 3x3 grid of abstract images. According to the researchers, prior work on the task has focused entirely on choosing the right answer from the given choices. In this work, by contrast, the researchers focused on generating a correct answer from the grid alone, without seeing the choices.
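The difference between selecting and generating an answer can be illustrated with a toy sketch. This is not the researchers' neural network. It replaces abstract images with integers and assumes a single, simple rule (a constant step along each row). The grid, the rule, and the function names are all hypothetical.

```python
# Toy stand-in for a Raven's-style puzzle: each cell is an integer
# (for example, a shape count) instead of an abstract image.
GRID = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, None],  # the bottom-right cell is missing
]


def row_step(grid):
    """Infer the assumed rule: a constant step between cells in a row."""
    return grid[0][1] - grid[0][0]


def select_answer(grid, choices):
    """Selection setting: pick the candidate that fits the row rule."""
    step = row_step(grid)
    for candidate in choices:
        if candidate - grid[2][1] == step:
            return candidate
    return None  # no candidate is consistent with the rule


def generate_answer(grid):
    """Generation setting: produce the missing cell without any choices."""
    return grid[2][1] + row_step(grid)


print(select_answer(GRID, [3, 7, 5, 4]))
print(generate_answer(GRID))
```

In the selection setting the candidate list itself narrows the search, whereas the generation setting has to construct the answer from the grid alone, which is the harder problem the article describes.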

Imagine this scenario: You’re deployed to some godforsaken hellhole downrange, in desperate need of additional ammo and chow. You call your superior officer for a very special airdrop: with approval from the U.S. Transportation Command, your cargo is launched into low earth orbit in a rocket-assisted payload. Within an hour, voila — fresh 5.56mm rounds and some delicious pizza MRE, ready to refresh your arsenal and renew your spirit.

The idea of space-borne resupply pods may seem like something out of science fiction, but it could someday be a reality for troops downrange, according to Army officials.

If Facebook’s AI research objectives are successful, it may not be long before home assistants take on a whole new range of capabilities. Last week the company announced new work focused on advancing what it calls “embodied AI”: basically, a smart robot that will be able to move around your house to help you remember things, find things, and maybe even do things.

Robots That Hear, Home Assistants That See

In Facebook’s blog post about audio-visual navigation for embodied AI, the authors point out that most of today’s robots are “deaf”; they move through spaces based purely on visual perception. The company’s new research aims to train AI using both visual and audio data, letting smart robots detect and follow objects that make noise as well as use sounds to understand a physical space.

A team of scientists and students from the University of Sheffield has designed and built a specialist microscope, and shared the build instructions to help make this equipment available to many labs across the world.

The microscope, called the smfBox, is capable of single-molecule measurements, which let scientists look at one molecule at a time rather than generating an average result from bulk samples. It works just as well as commercially available instruments.

This single-molecule method is currently only available at a few specialist labs throughout the world due to the cost of commercially available microscopes.