A new type of transformer can check its work. Instead of guessing the next output token in one shot like a typical transformer, it starts with a rough version of the token and improves it step by step.
What’s new: Alexi Gladstone and colleagues at the University of Virginia, University of Illinois Urbana-Champaign, Amazon, Stanford, and Harvard proposed the Energy-Based Transformer (EBT). Early experiments show that it scales more efficiently than conventional transformers at relatively small sizes.
Energy-based model basics: For a given input context paired with a candidate response (for example, a prompt and a potential next token), an energy-based model produces a number called “energy” that scores how compatible the pair is: the lower the energy, the more likely the candidate is to follow the context. During training, the model learns to assign low energy to context/response pairs that are very likely and high energy to pairs that aren’t.
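To make the idea concrete, here’s a minimal sketch of the two ingredients described above: a small network that assigns an energy to a context/candidate pair, and an inference loop that starts from a rough candidate and improves it step by step by descending the energy. This is an illustrative toy in PyTorch with made-up layer sizes and embeddings, not the authors’ EBT architecture, and it omits training.

```python
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    """Maps a (context, candidate) pair of embeddings to a scalar energy.
    Lower energy means the candidate is a more plausible continuation."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, context, candidate):
        return self.net(torch.cat([context, candidate], dim=-1)).squeeze(-1)

def refine(model, context, steps=10, lr=0.1, dim=32):
    """Start from a rough (random) candidate and improve it step by step
    by gradient descent on the energy -- the model 'checking its work'."""
    candidate = torch.randn(context.shape[0], dim, requires_grad=True)
    for _ in range(steps):
        energy = model(context, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        candidate = (candidate - lr * grad).detach().requires_grad_(True)
    return candidate.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyEnergyModel()
    context = torch.randn(4, 32)      # a batch of 4 toy "prompt" embeddings
    answer = refine(model, context)   # iteratively refined candidate outputs
    print(model(context, answer))     # energies after refinement (lower is better)
```

In a trained model, the refined candidate would land in a low-energy region corresponding to a likely continuation; here the weights are random, so the loop only illustrates the mechanics of the descent.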