By "help from SAM to make it happen," do you mean that SAM was used as part of the training process, but the visual grounding features now run on Muse Spark alone, with no SAM involvement at all?
SAM Training Integration in Muse Spark Visual Grounding