Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Very interesting, if somewhat controversial. The authors claim that a model can memorize about 2 bits of knowledge per parameter once it has seen the data 1000 times during training (they arrive at this number via quantization with AutoGPTQ).
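To get a feel for what 2 bits/param would mean in practice, here is a rough back-of-the-envelope sketch. It simply multiplies a parameter count by the claimed ratio; the function names and the model sizes are hypothetical picks for illustration, not figures taken from the paper.

```python
# Back-of-the-envelope: knowledge capacity implied by a bits-per-parameter ratio.
# BITS_PER_PARAM is the ratio quoted in the note above; everything else here
# (function names, model sizes) is an illustrative assumption.

BITS_PER_PARAM = 2.0


def capacity_bits(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated amount of knowledge a model could store, in bits."""
    return num_params * bits_per_param


def capacity_gigabytes(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Same estimate expressed in gigabytes (8 bits per byte, 1e9 bytes per GB)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the scale of the estimate.
    for num_params in (1_000_000_000, 7_000_000_000, 70_000_000_000):
        gb = capacity_gigabytes(num_params)
        print(f"{num_params:>14,} params -> {gb:.2f} GB of knowledge")
```

Note that this only restates the claimed ratio as arithmetic. It assumes the 1000-exposure training condition holds and says nothing about whether the ratio survives with less training.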
Language Models Memory Capacity Scaling Laws Physics
