It turns out we can easily apply the material we’ve covered to get the policy gradient for fine-tuning a diffusion model (see e.g. https://arxiv.org/pdf/2305.13301). It becomes a multi-step RL problem in which the reward arrives only at the end. It’s not very efficient, I think, but I’d love to
Policy Gradient for Diffusion Model Fine-tuning
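To make the multi-step, terminal-reward setup concrete, here is a minimal sketch of a REINFORCE-style policy gradient on a toy one-dimensional "denoising" chain. It is not the method of the linked paper. Everything in it is an assumption chosen for illustration: each reverse step is a Gaussian `x_{t-1} ~ N(x_t + theta[t], sigma^2)` with one learnable shift per step, the reward is `-(x_0 - target)^2` and is evaluated only on the final sample, and the names `sample_trajectory`, `reward`, `policy_gradient_step`, and `train` are hypothetical. The gradient estimate is the terminal reward (minus a batch-mean baseline) times the sum over steps of the score `d log p / d theta[t]`.

```python
import random


def sample_trajectory(theta, sigma, rng):
    """Run the toy reverse chain x_T -> ... -> x_0 once.

    Each step is Gaussian: x_{t-1} ~ N(x_t + theta[t], sigma^2).
    Returns the final sample and the per-step scores d log p / d theta[t].
    """
    x = rng.gauss(0.0, 1.0)  # x_T drawn from a standard normal prior
    scores = []
    for shift in theta:
        mean = x + shift
        x = rng.gauss(mean, sigma)
        # Score of a Gaussian with respect to its mean (and hence the shift).
        scores.append((x - mean) / sigma**2)
    return x, scores


def reward(x0, target=2.0):
    """Terminal reward: only the final sample x_0 is scored (toy assumption)."""
    return -(x0 - target) ** 2


def policy_gradient_step(theta, rng, n=256, sigma=0.3, lr=0.01):
    """One REINFORCE update from n sampled trajectories."""
    batch = [sample_trajectory(theta, sigma, rng) for _ in range(n)]
    rewards = [reward(x0) for x0, _ in batch]
    baseline = sum(rewards) / n  # batch-mean baseline to reduce variance
    grad = [0.0] * len(theta)
    for (_, scores), r in zip(batch, rewards):
        for t, s in enumerate(scores):
            # The same terminal reward is credited to every step of the chain.
            grad[t] += (r - baseline) * s / n
    new_theta = [th + lr * g for th, g in zip(theta, grad)]  # gradient ascent
    return new_theta, baseline


def train(steps=200, T=5, seed=0):
    """Train the per-step shifts and record the mean reward per update."""
    rng = random.Random(seed)
    theta = [0.0] * T
    history = []
    for _ in range(steps):
        theta, mean_reward = policy_gradient_step(theta, rng)
        history.append(mean_reward)
    return theta, history
```

Calling `theta, history = train()` should move the sum of the shifts toward the target, so the mean reward in `history` rises over training. Because a single end-of-chain reward is shared across all steps, the estimate is noisy and needs many samples per update, which is one way to see why this approach can be inefficient.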