It may not be of general interest, but for mathematicians there is valuable knowledge. The authors introduce Retrace: an off-policy return-based RL algorithm that has low variance and is proven theoretically to converge (in the tabular case) to the value function of the target