By Watching Unlabeled Videos.


Recent advances in machine learning (ML) and artificial intelligence (AI) are increasingly being adopted by people worldwide to make decisions in their daily lives. Many studies now focus on developing ML agents that can make acceptable predictions about the future over various timescales. This would help them anticipate changes in the world around them, including the actions of other agents, and plan their next steps. Making judgments that depend on accurate future prediction requires both capturing important environmental transitions and responding to how changes develop over time.

Previous work in visual observation-based future prediction has been limited by its output format or by a manually defined set of human activities. These outputs are either overly detailed and difficult to forecast, or they miss crucial information about the richness of the real world. Predicting “someone jumping” does not account for why they are jumping, what they are jumping onto, and so on. Previous models were also designed to make predictions at a fixed offset into the future, which is a limiting assumption because we rarely know when relevant future states will occur.

A new Google study introduces a Multi-Modal Cycle Consistency (MMCC) method, which uses narrated instructional video to train a strong future prediction model. It is a self-supervised technique developed using a large unlabeled dataset of various human actions. The resulting model operates at a high degree of abstraction, can anticipate arbitrarily far into the future, and decides how far to predict based on context.

Are we governed by donkeys? COP26 was just a farce of vested interests kissing the butts of fossil fuel legacy industries that are so out of date that they cannot compete anymore and need underhanded, secret-handshake deals just to keep themselves in the luxury they enjoy…at our expense. So here is my Manifesto for the next decade. It is time to start voting for the right people and harassing your representatives to get them to make the right decisions that will benefit the majority, not a few CEOs who are so corrupt it is like the plot of a new film…

Laughter is a ubiquitous social signal. Recent work has highlighted distinctions between spontaneous and volitional laughter, which differ in terms of both production mechanisms and perceptual features. Here, we test listeners’ ability to infer group identity from volitional and spontaneous laughter, as well as the perceived positivity of these laughs across cultures. Dutch (n = 273) and Japanese (n = 131) participants listened to decontextualized laughter clips and judged (i) whether the laughing person was from their cultural in-group or an out-group; and (ii) whether they thought the laughter was produced spontaneously or volitionally. They also rated the positivity of each laughter clip. Using frequentist and Bayesian analyses, we show that listeners were able to infer group membership from both spontaneous and volitional laughter, and that performance was equivalent for both types of laughter. Spontaneous laughter was rated as more positive than volitional laughter across the two cultures, and in-group laughs were perceived as more positive than out-group laughs by Dutch but not Japanese listeners. Our results demonstrate that both spontaneous and volitional laughter can be used by listeners to infer laughers’ cultural group identity.

This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.

Laughter is a frequently occurring and socially potent nonverbal vocalization, which is frequently used to signal affiliation, reward or cooperative intent, and often helps to maintain and strengthen social bonds [1,2]. A key distinction is whether laughs are spontaneous or volitional [3,4]. Spontaneous and volitional laughs are thought to be generated by different vocal production mechanisms. We often laugh spontaneously with little volitional control, which is thought to typically reflect an internal emotional state. Yet laughter can also be produced with volitional modulation of vocal output, which is more likely to express polite agreement in conversation [5,6]. Recent research has shown that listeners’ ability to differentiate individual speakers is impaired for spontaneous, as compared to volitional, laughter [7,8].