Human preferences on almost any topic are diverse, and coming up with a statement that a majority of the population agrees with is a genuine challenge. Researchers at DeepMind, an AI company, took on this challenge by training and fine-tuning a large language model. Most language models are built under the assumption that human preferences are static and homogeneous; this work instead tries to accommodate genuinely diverse views.
The model generates statements designed to maximize approval among a group of people with diverse preferences. The research team fine-tuned a 70-billion-parameter model on thousands of moral and political questions, together with human-written responses to those questions. A reward model was then trained to weight the opinions of different group members. Their best model's consensus statements achieved a preference rate of more than 65 percent.
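The pipeline described above lends itself to a short sketch. The snippet below is a minimal illustration, not DeepMind's code: generate_candidates, predicted_approval, and the max-min aggregation are all hypothetical stand-ins, assuming a best-of-n scheme in which a per-member reward model scores each candidate statement and an aggregate score picks the winner.

```python
import random

# Hypothetical stand-ins: a candidate generator and a per-member reward
# model. In the actual work both are fine-tuned LLM components; here they
# are stubs so the selection logic runs end to end.
def generate_candidates(question, n=16):
    return [f"Candidate consensus {i} for: {question!r}" for i in range(n)]

def predicted_approval(statement, opinion):
    # Placeholder score in [0, 1]; a real reward model would predict how
    # strongly the member holding `opinion` endorses `statement`.
    return random.random()

def select_consensus(question, opinions):
    """Best-of-n selection: score each candidate by the approval of the
    least-satisfied member (max-min aggregation) and keep the winner."""
    best, best_score = None, float("-inf")
    for cand in generate_candidates(question):
        score = min(predicted_approval(cand, op) for op in opinions)
        if score > best_score:
            best, best_score = cand, score
    return best

opinions = ["Opinion of member A ...", "Opinion of member B ...",
            "Opinion of member C ..."]
print(select_consensus("Should voting be compulsory?", opinions))
```

The choice of aggregation function is the key design decision here: a mean rewards statements that please the majority, while the max-min shown above refuses to ignore a dissenting member. Which aggregation DeepMind ultimately used is not specified in this summary.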
The model turned out to be sensitive to the composition of its input. When the researchers fed it the written opinions of only part of the group, the resulting consensus statement diverged significantly from the views of the members who had been left out. Each individual's contribution to the consensus therefore matters. The model builds on foundations laid by many complicated NLP tasks, such as reading comprehension and fluent language generation.
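That sensitivity test can likewise be sketched. The snippet below is a hypothetical reconstruction, assuming the test leaves each member's opinion out in turn, regenerates the consensus from the remainder, and measures the excluded member's predicted approval; consensus_from and predicted_approval are stubs, not the paper's actual components.

```python
import random

def predicted_approval(statement, opinion):
    # Hypothetical per-member reward; the real model is learned.
    return random.random()

def consensus_from(opinions):
    # Stand-in for the full generate-and-select pipeline sketched above.
    return " / ".join(op[:12] for op in opinions)

def holdout_scores(opinions):
    """Leave each member out in turn and measure how well the consensus
    built without them serves their (excluded) opinion."""
    scores = []
    for i, excluded in enumerate(opinions):
        subset = opinions[:i] + opinions[i + 1:]
        partial = consensus_from(subset)
        scores.append(predicted_approval(partial, excluded))
    return scores

scores = holdout_scores(["Opinion A ...", "Opinion B ...", "Opinion C ..."])
print(scores)  # low or widely varying scores indicate high sensitivity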