@drummatick the power of these transformer models often comes from the data they're trained on, so I'd try fine-tuning from an OSS checkpoint. The power of the transformer architecture is that (a) it's easy to adapt to different problems/modalities, and (b) it lends itself well to broad pre-training.
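A minimal sketch of the fine-tune-from-a-checkpoint idea, using a toy PyTorch model in place of a real OSS transformer checkpoint (in practice you'd load one with a library call such as `from_pretrained` in Hugging Face transformers; everything here is illustrative):

```python
import io

import torch
import torch.nn as nn

# Toy "pretrained" backbone standing in for an OSS transformer checkpoint.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
buf = io.BytesIO()
torch.save(backbone.state_dict(), buf)  # pretend this buffer is the checkpoint

# Fine-tuning: restore the pretrained weights, freeze them,
# and train only a new task-specific head on top.
buf.seek(0)
backbone.load_state_dict(torch.load(buf))
for p in backbone.parameters():
    p.requires_grad = False  # keep pretrained features fixed

head = nn.Linear(32, 2)  # new classifier for the downstream task
model = nn.Sequential(backbone, head)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)  # optimize head only
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 16)            # stand-in downstream inputs
y = torch.randint(0, 2, (64,))     # stand-in downstream labels
for _ in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Freezing the backbone is the cheapest variant; unfreezing some or all of it (often with a smaller learning rate) is the usual next step when the downstream data is large enough.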
Fine-tuning Transformers: Leveraging OSS Checkpoints for Model Adaptation