DoLa was just merged in transformers! This new decoding method works by contrasting token logits between the final layer and earlier layers, with the premise that high level knowledge builds in top layers rather than the first ones. It significantly reduces hallucinations!
