AI systems such as GPT-4 can now learn and use human language, but only from astronomical amounts of language input, far more than children receive when learning to understand and speak a language. The best AI systems train on trillions of words of text, whereas children receive just millions of words per year.
Because of this enormous data gap, researchers have been skeptical that recent AI advances can tell us much about human learning and development. An ideal test of the connection would involve training an AI model not on massive data from the web, but only on the input that a single child receives. What would the model be able to learn then?
A team of New York University researchers ran this exact experiment. They trained a multimodal AI system through the eyes and ears of a single child, using headcam video recordings collected from when the child was six months old through their second birthday. They examined whether the AI model could learn the words and concepts present in a child's everyday experience.
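To make the idea of learning words from a child's audiovisual experience concrete, here is a minimal sketch, not the researchers' actual code, of one common way a multimodal model can link video frames to the words spoken around them: embed both into a shared space and use a contrastive objective that pulls co-occurring frame-word pairs together and pushes mismatched pairs apart. The feature dimensions, projection heads, and temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical pre-extracted features: 8 frame embeddings and the embeddings
# of 8 words that co-occurred with those frames (dimensions are illustrative).
frame_features = torch.randn(8, 512)   # e.g. from a vision encoder
word_features = torch.randn(8, 300)    # e.g. from a word embedding table

# Small projection heads map both modalities into one shared 128-d space.
frame_proj = torch.nn.Linear(512, 128)
word_proj = torch.nn.Linear(300, 128)

frames = F.normalize(frame_proj(frame_features), dim=-1)
words = F.normalize(word_proj(word_features), dim=-1)

# Similarity of every frame to every word; the diagonal holds the true pairs.
logits = frames @ words.T / 0.07       # 0.07 is a typical temperature choice
targets = torch.arange(len(logits))

# Symmetric contrastive loss: each frame should match its word and vice versa.
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
loss.backward()                        # gradients flow to both projection heads
```

Trained this way, the model never receives labels; it only sees which words tended to be spoken while particular scenes were in view, which is roughly the kind of co-occurrence signal available in the headcam recordings.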