Offline reinforcement learning, where an agent tries to improve on a behavior policy using only recorded play from another agent, without ever acting in the environment itself, is a harder problem than it appears. The challenge isn’t to mimic the recorded play, but to learn something better than what you have seen.
Offline Reinforcement Learning: Beyond Mimicking to Improvement