One-Step RL with Policy Gradients and Majority Voting
From a quick look, this is 1-step RL using policy gradients to regress towards the majority voter using 0-1 loss.
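
To make that reading concrete, here is a minimal sketch of what such an update could look like. Everything in it is an assumption made for illustration, not the method's actual code: the policy is a softmax over a small fixed set of candidate answers, the names (`majority_vote`, `zero_one_rewards`, `policy_gradient_step`, `train`) and the learning rate are hypothetical, and no baseline or variance reduction is used. "1-step" is taken to mean each episode is a single action followed by a single reward, with no bootstrapping.

```python
import math
import random
from collections import Counter


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def majority_vote(samples):
    # Most frequent sampled answer; ties go to the answer seen first.
    return Counter(samples).most_common(1)[0][0]


def zero_one_rewards(samples):
    # 0-1 signal: reward 1 if a sample agrees with the majority, else 0.
    target = majority_vote(samples)
    return [1.0 if s == target else 0.0 for s in samples]


def policy_gradient_step(logits, samples, lr=0.5):
    # REINFORCE estimate for a softmax policy:
    # d/d logit_k of log pi(a) = 1[k == a] - pi(k), weighted by the reward.
    probs = softmax(logits)
    rewards = zero_one_rewards(samples)
    n = len(samples)
    grad = [0.0] * len(logits)
    for a, r in zip(samples, rewards):
        for k in range(len(logits)):
            indicator = 1.0 if k == a else 0.0
            grad[k] += r * (indicator - probs[k]) / n
    # Gradient ascent on expected agreement with the majority.
    return [l + lr * g for l, g in zip(logits, grad)]


def train(logits, steps=100, n_samples=16, lr=0.5, seed=0):
    # Each step is a one-step episode: sample answers, vote, update.
    rng = random.Random(seed)
    for _ in range(steps):
        probs = softmax(logits)
        samples = rng.choices(range(len(logits)), weights=probs, k=n_samples)
        logits = policy_gradient_step(logits, samples, lr=lr)
    return logits
```

In this sketch the reward depends only on the policy's own samples, so each update pushes probability mass towards whichever answer is currently the most common one.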