Comments on: Over-optimization in RL is well-known, but it even occurs when KL(policy || base model) is constrained fairly tightly
https://lifeboat.com/blog/2024/10/over-optimization-in-rl-is-well-known-but-it-even-occurs-when-klpolicy-base-model-is-constrained-fairly-tightly
Safeguarding Humanity
Sat, 12 Oct 2024 04:25:09 +0000