What if LLMs could tune their own decoding, with no more guesswork over temperature and top-p? Enter AutoDeco: the first architecture that makes LLM decoding truly end-to-end. By adding lightweight heads, the model predicts its own context-aware temperature and top-p at every token.
AutoDeco: LLMs Learn Self-Tuning Decoding Parameters
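To make the idea of per-token, self-predicted decoding parameters concrete, here is a minimal NumPy sketch. It is not AutoDeco's actual implementation: the single-layer linear heads, the sigmoid squashing, the `max_temp` bound, and every function and parameter name below are assumptions chosen for illustration. It only shows how a temperature and a top-p value predicted from the current hidden state could replace hand-set constants in an ordinary sampling step.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_decoding_params(hidden, w_temp, w_top_p, max_temp=2.0):
    """Hypothetical lightweight heads: one linear map each on the hidden state.

    Assumption: outputs are squashed so that temperature lies in (0, max_temp)
    and top-p lies in (0, 1). The real heads may be shaped differently.
    """
    temperature = max_temp * sigmoid(hidden @ w_temp)
    top_p = sigmoid(hidden @ w_top_p)
    return float(temperature), float(top_p)


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    order = np.argsort(-probs)              # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Index of the first token at which cumulative mass >= top_p, inclusive.
    cutoff = min(int(np.searchsorted(cumulative, top_p)) + 1, len(probs))
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()        # renormalize the kept tokens


def sample_next_token(hidden, logits, w_temp, w_top_p, rng):
    """One decoding step using parameters predicted for this token."""
    temperature, top_p = predict_decoding_params(hidden, w_temp, w_top_p)
    probs = softmax(logits / max(temperature, 1e-6))
    probs = top_p_filter(probs, top_p)
    token = int(rng.choice(len(probs), p=probs))
    return token, temperature, top_p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=8)     # stand-in for the model's last hidden state
    logits = rng.normal(size=16)    # stand-in for next-token logits
    w_temp = rng.normal(size=8)     # untrained, random head weights
    w_top_p = rng.normal(size=8)
    print(sample_next_token(hidden, logits, w_temp, w_top_p, rng))
```

In this sketch the head weights are random, so the predicted values are meaningless. The point is only the data flow: the same hidden state that produces the logits also produces the temperature and top-p used to sample from them, so both change from token to token.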
