Offline RL is a mess: unclear goals, tangled code, and sneaky online tuning. A team from the University of Oxford fixes it with Unifloral: one clean framework, shared hyperparameters.
Result? New SOTA algorithms: TD3-AWR & MoBRAC.
Oxford Unifloral Framework Advances Offline Reinforcement Learning