Nov 23, 2023

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Q* appears to apply an RL technique that uses AI-generated data to teach LLMs how to solve multi-step logic problems. Q* techniques can be applied to GPT-5, endowing it with excellent reasoning and retrieval skills. This may not be AGI, but it is an extremely powerful LLM.

  1. Lance says:

    What happens when “it” causes you to do something you don’t want to do?

