Not that I am aware of. On top of that, several models don't even have thorough papers themselves because people are currently rushing them out. I guess the best way is really some empirical analysis coupled with a few bits of architecture knowledge, e.g. Falcon uses multi-query attention and the larger Llama 2 models (34B and 70B) use grouped-query attention.
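As a rough illustration of the "knowledge bits" part: if a model's config exposes both the number of query heads and the number of key/value heads, you can tell the attention variant apart from those two numbers alone. The function name and the head counts below are made up for the example and are not taken from any particular model card.

```python
def classify_attention(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Guess the attention variant from query-head and key/value-head counts."""
    if num_attention_heads < 1 or num_key_value_heads < 1:
        raise ValueError("head counts must be positive")
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    if num_key_value_heads == num_attention_heads:
        return "multi-head"      # every query head has its own K/V head
    if num_key_value_heads == 1:
        return "multi-query"     # all query heads share a single K/V head
    return "grouped-query"       # query heads share K/V heads in groups


# Illustrative head counts only (hypothetical values).
for heads, kv_heads in [(32, 32), (64, 8), (64, 1)]:
    print(heads, kv_heads, classify_attention(heads, kv_heads))
```

This only covers what the config claims; it is no substitute for actually measuring things like KV-cache size or throughput.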
Model Documentation Gaps and Empirical Analysis Importance