
Mar 9, 2024

Enhancing Vision-Language Pre-training with Rich Supervisions

We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-training paradigm for Vision-Language Models that uses data from large-scale web screenshot rendering.

