Perception-action loops are at the core of most of our daily-life activities. Subconsciously, our brains use sensory inputs to trigger specific motor actions in real time, and this continuous process plays out in everything from playing sports to watching TV. In the context of artificial intelligence (AI), perception-action loops are the cornerstone of autonomous systems such as self-driving vehicles. While disciplines such as imitation learning and reinforcement learning have certainly made progress in this area, the current generation of autonomous systems is still nowhere near human skill at making these decisions directly from visual data. Recently, AI researchers from Microsoft published a paper proposing a transfer learning method that learns perception-action policies in a simulated environment and applies that knowledge to fly an autonomous drone.
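To make the idea concrete, the sketch below shows a minimal perception-action loop in Python. It is purely illustrative and is not taken from the paper: `ToyCamera`, `ToyPolicy`, and `ToyDrone` are hypothetical stand-ins for a sensor, a learned policy, and an actuator.

```python
import random
import time


class ToyCamera:
    """Hypothetical sensor: returns a fake 'frame' (here, just a number)."""
    def get_frame(self):
        return random.random()


class ToyPolicy:
    """Hypothetical perception-action policy: maps an observation to an action."""
    def predict(self, observation):
        # A real policy would be a trained model; this toy version
        # simply steers left or right based on the observation.
        return "left" if observation < 0.5 else "right"


class ToyDrone:
    """Hypothetical actuator: 'executes' the chosen command."""
    def send_command(self, action):
        print(f"executing action: {action}")


def perception_action_loop(camera, policy, drone, hz=30, steps=5):
    """Run the sense -> decide -> act cycle at a fixed control rate."""
    period = 1.0 / hz
    for _ in range(steps):
        observation = camera.get_frame()      # perception
        action = policy.predict(observation)  # decision
        drone.send_command(action)            # action
        time.sleep(period)


if __name__ == "__main__":
    perception_action_loop(ToyCamera(), ToyPolicy(), ToyDrone())
```

In a real autonomous system the same cycle runs continuously, with the policy replaced by a learned model and the toy objects replaced by actual camera and flight-control interfaces.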
The challenge of learning which actions to take based on sensory input is less a matter of theory than of practical implementation. In recent years, methods like reinforcement learning and imitation learning have shown tremendous promise in this area, but they remain constrained by the need for large amounts of difficult-to-collect, labeled real-world data. Simulated data, on the other hand, is easy to generate, but policies trained on it generally do not yield safe behavior in diverse real-life scenarios. Being able to learn policies in simulated environments and transfer that knowledge to real-world environments remains one of the main challenges of autonomous systems. To advance research in this area, the AI community has created many benchmarks for real-world autonomous systems. One of the most challenging is known as first-person-view drone racing.
In first-person-view (FPV) drone racing, expert pilots are able to plan and control a quadrotor with high agility using a potentially noisy monocular camera feed, without compromising safety. The Microsoft Research team set out to build an autonomous agent that can control a drone in FPV racing.