From SenseTime, Shanghai #AI Lab, & Tsinghua U
InternLM-XComposer2-4KHD
A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD https://huggingface.co/papers/2404.
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression…