I'd be interested too, but I don't expect much. Still a cool approach, with lots of potential gains in scaling more efficiently (training compute =>> inference compute).
Efficient scaling: training compute versus inference compute gains