Today we are releasing the Crossmodal 3,600 dataset, which provides 261,375 reference captions in 36 languages for a geographically diverse set of 3,600 images.
Image captioning datasets can be limited for languages other than English.