Gambling Meets Quantum Physics — New “Bandit” Algorithm Uses Light for Better Bets

How does a gambler maximize winnings from a row of slot machines? This question inspired the “multi-armed bandit problem,” a common task in reinforcement learning in which “agents” make choices to earn rewards. Recently, an international team of researchers, led by Hiroaki Shinkawa from the University of Tokyo, introduced an advanced photonic reinforcement learning method that moves from the static bandit problem to a more intricate dynamic setting. Their findings were published in the journal Intelligent Computing.
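To make the slot-machine picture concrete, here is a minimal sketch of a classic bandit strategy known as epsilon-greedy: mostly play the machine that looks best so far, and occasionally try a random one. It is a textbook illustration, not the method from the study; the function name `run_bandit`, the payout probabilities, and the `epsilon` and `seed` parameters are all assumptions chosen for the example.

```python
import random

def run_bandit(payout_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a row of simulated slot machines with an epsilon-greedy strategy.

    payout_probs: hypothetical win probability of each machine (unknown to the player).
    Returns (estimated values, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms    # how often each machine was played
    values = [0.0] * n_arms  # running average reward per machine
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any machine at random
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit best estimate
        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, counts, total

if __name__ == "__main__":
    values, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated payouts:", [round(v, 2) for v in values])
    print("pulls per machine:", counts)
```

The `epsilon` parameter sets the trade-off between exploring unfamiliar machines and exploiting the one that currently looks best, which is the tension at the heart of any bandit problem.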

The success of the scheme relies on both a photonic system that enhances the learning quality and a supporting algorithm. With a “potential photonic implementation” in mind, the authors developed a modified bandit Q-learning algorithm and validated its effectiveness through numerical simulations. They also tested their algorithm with a parallel architecture, in which multiple agents operate at the same time, and found that the key to accelerating the parallel learning process is to avoid conflicting decisions by taking advantage of the quantum interference of photons.
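The idea of avoiding conflicting decisions can be illustrated in ordinary software. The sketch below is a purely classical stand-in: it assumes each agent holds its own value estimates over a shared set of options and simply forbids two agents from taking the same option in the same round. It does not model photon interference and is not the authors' algorithm; `conflict_free_choices`, `q_values`, and `epsilon` are hypothetical names introduced for this example.

```python
import random

def conflict_free_choices(q_values, epsilon=0.1, rng=None):
    """Pick one option per agent so that no two agents choose the same option.

    q_values: one list per agent of estimated values over the shared options
              (an assumed data layout, for illustration only).
    Returns a list where entry i is the option index chosen by agent i.
    """
    rng = rng or random.Random()
    n_agents = len(q_values)
    n_options = len(q_values[0])
    if n_agents > n_options:
        raise ValueError("need at least as many options as agents")
    free = set(range(n_options))  # options nobody has taken yet this round
    choices = [None] * n_agents
    order = list(range(n_agents))
    rng.shuffle(order)  # random priority so no agent always chooses first
    for agent in order:
        candidates = sorted(free)
        if rng.random() < epsilon:
            pick = rng.choice(candidates)  # explore among remaining options
        else:
            pick = max(candidates, key=lambda o: q_values[agent][o])  # best remaining
        choices[agent] = pick
        free.remove(pick)
    return choices

if __name__ == "__main__":
    # Three agents that all prefer option 0 still end up on distinct options.
    q = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.2], [0.8, 0.1, 0.4]]
    print(conflict_free_choices(q, rng=random.Random(1)))
```

Here the conflict is resolved by taking turns in a random order, a deliberately simple choice. Because no two agents spend the same round on the same option, each round gathers information about as many distinct options as there are agents.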

Although using the quantum interference of photons is not new in this field, the authors believe this study is “the first to connect the notion of photonic cooperative decision-making with Q-learning and apply it to a dynamic environment.” Reinforcement learning problems are generally set in a dynamic environment that changes with the agents’ actions, which makes them more complex than bandit problems, where the environment is static.
