Good catch! The numbers in the old table were pointwise estimates: pointwise performance is a bucketed estimate at each context length, while the paper reports a cumulative average over context lengths. Pointwise and cumulative metrics are not directly comparable, and the pointwise
Pointwise vs Cumulative Metrics in Language Model Evaluation
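To make the distinction concrete, here is a minimal sketch of how the two kinds of numbers could be computed from the same per-position losses. The function name `pointwise_and_cumulative`, the bucket boundaries, and the loss values are all hypothetical and chosen for illustration; the exact bucketing used for the table and in the paper is not specified here.

```python
def pointwise_and_cumulative(losses, bucket_ends):
    """Compare bucketed (pointwise) and running (cumulative) averages.

    losses[i] is an illustrative loss at context position i.
    bucket_ends are ascending exclusive end positions, one per bucket.
    """
    pointwise = []   # average over positions inside each bucket only
    cumulative = []  # average over all positions up to the bucket end
    start = 0
    for end in bucket_ends:
        bucket = losses[start:end]
        pointwise.append(sum(bucket) / len(bucket))
        prefix = losses[:end]
        cumulative.append(sum(prefix) / len(prefix))
        start = end
    return pointwise, cumulative


# Made-up losses that improve with longer context.
losses = [4.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
pointwise, cumulative = pointwise_and_cumulative(losses, [2, 4, 8])
print(pointwise)   # [4.0, 2.0, 1.0]
print(cumulative)  # [4.0, 3.0, 2.0]
```

In this toy example the last bucket has a pointwise value of 1.0 but a cumulative value of 2.0, because the cumulative average still includes the high-loss early positions. That gap is why a pointwise number and a cumulative number at the same context length should not be placed side by side.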