Can we merge the query and key weight matrices in an LLM into a single covariance matrix and still train effectively? Here are some promising early results from a reader: https://github.com/rasbt/LLMs-from-scratch/discussions/517
…
Anyone else familiar with projects that tried this?
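To make the question concrete, here is a minimal sketch of why the merge is algebraically possible for the unnormalized attention scores: (X W_q)(X W_k)^T equals X (W_q W_k^T) X^T. It is written in plain Python with no dependencies; the sizes and the names `W_qk`, `rand_matrix`, and `max_diff` are assumptions chosen for illustration and are not taken from the linked discussion. Scaling, masking, and softmax are left out.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def rand_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_tokens, d_model, d_head = 4, 8, 8  # illustrative sizes, not from the source

X = rand_matrix(n_tokens, d_model, rng)    # token embeddings
W_q = rand_matrix(d_model, d_head, rng)    # query projection
W_k = rand_matrix(d_model, d_head, rng)    # key projection

# Usual formulation: project to queries and keys, then take dot products.
Q = matmul(X, W_q)
K = matmul(X, W_k)
scores_separate = matmul(Q, transpose(K))

# Merged formulation: fold both projections into one d_model x d_model matrix.
W_qk = matmul(W_q, transpose(W_k))
scores_merged = matmul(matmul(X, W_qk), transpose(X))

# Largest elementwise difference between the two score matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(scores_separate, scores_merged)
    for a, b in zip(row_a, row_b)
)
print(f"max abs difference: {max_diff:.2e}")
```

The two score matrices agree up to floating-point error, so the open question is about training rather than algebra. One thing to keep in mind: when `d_head` is smaller than `d_model`, the product `W_q W_k^T` has rank at most `d_head`, whereas a directly trained `d_model x d_model` matrix has no such constraint and holds `d_model**2` parameters instead of `2 * d_model * d_head`.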
Merging Query and Key Weight Matrices in LLM Training