Mar 9, 2024

Enhancing Vision-Language Pre-training with Rich Supervisions

Posted in category: futurism

We propose Strongly Supervised pre-training with ScreenShots (S4) — a novel pre-training paradigm for Vision-Language Models using data from large-scale web screenshot rendering.

