Redundancy and Error Correction in Autoregressive Language Model Learning

Redundancy in language, e.g. "I am Sam" vs. "am Sam", makes communication robust to errors. Is there a crisp argument for why this error correction helps when learning autoregressive language models? The intuition seems right, but I'd love to see a formal proof or experiment.
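One way to make the intuition concrete, short of a proof about language models, is the classic repetition-code argument: if each symbol passes through a noisy channel that flips it with probability p < 1/2, then sending it k times and taking a majority vote drives the error rate below p (for k = 3, the error probability is 3p²(1−p) + p³). The sketch below simulates this; it is a toy illustration of the error-correction intuition only, not an experiment on an actual autoregressive model, and all names (`noisy`, `trial`) are made up for this example.

```python
import random

random.seed(0)

def noisy(bit, p):
    # Transmit a bit through a binary symmetric channel:
    # flip it with probability p.
    return bit ^ (random.random() < p)

def trial(p, k, n=100_000):
    # Fraction of bits decoded incorrectly when each bit is
    # sent k times and the receiver takes a majority vote.
    errors = 0
    for _ in range(n):
        bit = random.random() < 0.5
        votes = sum(noisy(bit, p) for _ in range(k))
        decoded = votes > k // 2
        if decoded != bit:
            errors += 1
    return errors / n

p = 0.2
single = trial(p, 1)  # no redundancy: error rate ≈ p = 0.2
triple = trial(p, 3)  # 3x repetition: ≈ 3p²(1−p) + p³ ≈ 0.104
print(single, triple)
```

Whether this style of argument transfers to learning (i.e. whether redundant training text lets a model recover the underlying structure from noisy or partial contexts faster) is exactly the open part of the question.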