Comparing Decoder-Only Models to Encoder-Decoder Architectures for Translation

The original transformer is an encoder-decoder architecture designed for translation. T5 is an encoder-decoder model that performs well at translation. ChatGPT / GPT-4 is a decoder-only model that is also quite good at translation. How does it compare to encoder-decoder architectures of similar size?