
Meta announces MA-LMM

Memory-augmented large multimodal model for long-term video understanding.

With the success of large language models (LLMs), integrating vision models into LLMs to build vision-language foundation models has gained increasing interest…



“Larger Model Is Not All You Need”

Google announces Scaling (Down) CLIP

A comprehensive analysis of data, architecture, and training strategies.

This paper investigates the performance of Contrastive Language-Image Pre-training (CLIP) when scaled down to…
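For readers unfamiliar with the objective being scaled, CLIP trains image and text encoders jointly with a symmetric contrastive loss: matched image-text pairs sit on the diagonal of a batch similarity matrix and are pulled together, while mismatched pairs are pushed apart. Below is a minimal NumPy sketch of that loss; the function name, batch size, and embedding dimension are illustrative assumptions, not from the paper.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    temperature: softmax temperature (0.07 is a common initial value).
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, scaled by temperature
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(l):
        # softmax cross-entropy with the diagonal (matched pairs) as targets
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Tiny example: near-identical pairs should give a loss close to zero
rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
txts = imgs + 0.01 * rng.normal(size=(4, 8))  # nearly matched pairs
loss = clip_contrastive_loss(imgs, txts)
```

Scaling CLIP down mainly varies the encoder sizes, the dataset, and the training schedule around this same objective, which is why the loss itself stays fixed across the paper's configurations.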