5/ While the experiments done here were on small scale models (100M parameters), the fundamental effects we’re seeing here are likely to show up over time on the larger models too. For example, most of today’s models can’t produce a blog post in the style of Slate Star Codex,
Small Language Models Show Emerging Style Imitation Capabilities
By
–