#PaperADay 9
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
https://arxiv.org/pdf/2509.06863

In theory, value-based reinforcement learning is a regression problem, which is most naturally addressed with an MSE loss. However, there are a bunch of subtle