Toggle light / dark theme

Google announces Streaming Dense Video Captioning.

An ideal model for dense video captioning — predicting captions localized temporally in a video — should be able to handle long input videos, predict rich, detailed textual descriptions, and be able to…


Join the discussion on this paper page.

Unsolvable problem detection evaluating trustworthiness of vision language models.

Evaluating trustworthiness of vision language models.

This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD).


Join the discussion on this paper page.