Interesting, do you mean applying torch.compile here? If so, why and how does that give a speed-up, given that nn.Embedding is basically just a weight tensor used for a cheap look-up? (Also, if you are already compiling the model, why would you skip the embedding layer in the first place?)
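To make concrete what I mean by "just a tensor for cheap look-up", here is a minimal sketch. The sizes and variable names are made up for illustration, and it assumes a default nn.Embedding (no padding_idx or max_norm). It only shows that the layer's forward pass matches plain indexing into its weight matrix; it does not benchmark anything or say whether torch.compile helps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
token_ids = torch.tensor([[1, 3, 5], [0, 2, 9]])

# Forward pass through the module.
out_module = emb(token_ids)

# The same result via plain row indexing into the weight matrix.
out_index = emb.weight[token_ids]

same = torch.equal(out_module, out_index)
print(same, tuple(out_module.shape))
```

Since the forward pass is a single gather over the rows of emb.weight, it is not obvious to me where a compiler would find much to fuse or optimize, which is why I am asking.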