When we started this project, our goal was to reproduce Chinchilla, so we stayed close to the original architecture. In retrospect, we should probably have done what PaLM did: use multi-query attention and increase the feed-forward hidden size to make up for the lost parameters.
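To make the trade-off concrete, here is a rough sketch of the per-layer parameter arithmetic. It assumes a plain two-matrix feed-forward block, no biases, and a single shared key/value head under multi-query attention. The function names and the dimensions in the example are illustrative only and are not taken from Chinchilla, PaLM, or our own configuration.

```python
def attention_params(d_model: int, n_heads: int, d_head: int,
                     multi_query: bool = False) -> int:
    """Weights in the Q, K, V and output projections of one layer (biases ignored)."""
    q = d_model * n_heads * d_head
    # Multi-query attention shares one K/V head across all query heads.
    kv_heads = 1 if multi_query else n_heads
    kv = 2 * d_model * kv_heads * d_head
    out = n_heads * d_head * d_model
    return q + kv + out


def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a two-matrix feed-forward block (assumed: no gating, no biases)."""
    return 2 * d_model * d_ff


def compensating_d_ff(d_model: int, n_heads: int, d_head: int, d_ff: int) -> int:
    """Feed-forward hidden size that gives back the weights saved by multi-query."""
    saved = (attention_params(d_model, n_heads, d_head)
             - attention_params(d_model, n_heads, d_head, multi_query=True))
    # Each extra hidden unit costs 2 * d_model weights in this two-matrix block.
    return d_ff + saved // (2 * d_model)


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to illustrate the arithmetic.
    d_model, n_heads, d_head, d_ff = 1024, 16, 64, 4096
    new_d_ff = compensating_d_ff(d_model, n_heads, d_head, d_ff)
    print("multi-head attention weights:",
          attention_params(d_model, n_heads, d_head))
    print("multi-query attention weights:",
          attention_params(d_model, n_heads, d_head, multi_query=True))
    print("feed-forward hidden size:", d_ff, "->", new_d_ff)
```

With these illustrative dimensions, switching to multi-query attention removes 1,966,080 weights per layer, and widening the feed-forward hidden size from 4096 to 5056 restores exactly that amount. A gated feed-forward block has three weight matrices rather than two, so the divisor would change accordingly.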
Chinchilla Reproduction: Architectural Decisions and Retrospective Improvements