I agree, but it’s tricky for models to learn this. It would make a good PhD thesis Curiously, the old PAQ8 language models back in 2010 used sequences of bits, as you suggest, because we had to rely on binary C operations to make them efficient enough. See
PAQ8 bit sequences language models efficiency 2010
By
–
Leave a Reply