I'd say it barely works tbh… 🙂 It really depends on the type of merge and whether you retrain the merged model or not. In this case, it's not really "working". The additional layers seem to push the hidden representations into new regions of the latent space, which may or may not be interesting, depending on
Model Merging Techniques and Hidden Representation Dynamics