Meta presents MoDE: CLIP Data Experts via Clustering
The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data.
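The pairing supervision mentioned above is typically realized as a symmetric contrastive (InfoNCE) objective: in a batch, the i-th image and i-th caption form the only positive pair, and all other pairings act as negatives. A minimal NumPy sketch of that objective follows; the function name, temperature value, and embedding shapes are illustrative, not taken from the paper.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; every
    other entry in the same row or column serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # shape: (batch, batch)
    labels = np.arange(len(logits))           # i-th image pairs with i-th caption

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
loss = clip_contrastive_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```

Noisy web-crawled captions hurt exactly here: a mismatched caption still sits on the diagonal and is pushed toward its image as if it were a true positive.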