Project Starline is becoming Google Beam, a platform that uses AI to turn 2D video streams into realistic, immersive video calls.

Two RIKEN researchers have used a data-simplification scheme that mimics how the brain of a fruit fly reduces the complexity of the smell information it perceives. The approach could also deepen our understanding of how the human brain processes sensory data.
The work is published in the journal Science Advances.
Sensors related to our five senses are constantly providing huge amounts of information to the human brain. It would quickly become overloaded if it tried to process that sensory information without first simplifying it by reducing its number of dimensions.
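The article does not detail the RIKEN scheme itself, but a well-known fly-inspired way to simplify sensory data is a sparse random projection followed by a winner-take-all step, loosely mimicking the random wiring from olfactory glomeruli to Kenyon cells. The sketch below is purely illustrative; the function name `fly_hash` and all dimensions and parameters are arbitrary choices, not details from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def fly_hash(x, n_out=2000, k=32, density=0.1):
    """Illustrative fly-inspired coding: a sparse binary random projection
    followed by winner-take-all, keeping only the k strongest responses.
    (Not the RIKEN method -- a generic sketch of the idea.)"""
    d = x.shape[0]
    # Sparse binary projection matrix, mimicking random wiring from
    # olfactory input channels to a large population of neurons.
    M = (rng.random((n_out, d)) < density).astype(float)
    y = M @ x
    # Winner-take-all: zero out everything except the top-k activations,
    # so the stimulus is summarized by a small set of active units.
    out = np.zeros(n_out)
    top = np.argsort(y)[-k:]
    out[top] = y[top]
    return out

odor = rng.random(50)          # a toy 50-dimensional "smell" vector
code = fly_hash(odor)
print(np.count_nonzero(code))  # only k = 32 of 2000 units stay active
```

The key point the passage makes is visible here: the brain does not carry the full input forward; it keeps a compact, sparse summary that is cheap to process and compare.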
A team of roboticists at the University of Canberra’s Collaborative Robotics Lab, working with a sociologist colleague from The Australian National University, has found that people interacting with an LLM-enabled humanoid robot had mixed reactions. In their paper published in the journal Scientific Reports, the group describes what they observed while watching attendees interact with an LLM-enabled humanoid robot stationed at an innovation festival, along with the feedback those participants gave.
Over the past couple of years, LLMs such as ChatGPT have taken the world by storm, with some going so far as to suggest that the new technology will soon make many human workers obsolete. Despite such fears, scientists continue to improve such technology, sometimes employing it in new places—such as inside an existing humanoid robot. That is what the team in Australia did—they added ChatGPT to the interaction facilities of a robot named Pepper and then posted the robot at an innovation festival in Canberra, where attendees were encouraged to interact with it.
Before it was given an LLM, Pepper was already capable of moving around autonomously and interacting with people on a relatively simple level. One of its hallmarks is its ability to maintain eye contact. Such abilities, the team suggested, made the robot a good target for testing human interactions with LLM-enabled humanoid robots “in the wild.”
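The paper's actual Pepper–ChatGPT integration details are not given here, but the architecture described (an existing robot's speech pipeline with an LLM dropped in as the dialogue engine) can be sketched as a simple loop. The function names `hear`, `llm_reply`, and `speak`, and the system prompt, are hypothetical stand-ins, not the researchers' code.

```python
# Minimal sketch of an LLM-in-the-loop robot dialogue cycle:
# speech in -> LLM reply -> speech out. All components are stubs.

def hear():
    """Stand-in for the robot's speech-to-text front end."""
    return "What can you do?"

def llm_reply(history):
    """Stand-in for a chat-model API call (e.g. a ChatGPT request
    that would send `history` and return the assistant message)."""
    return "I can chat, keep eye contact, and move around the festival."

def speak(text):
    """Stand-in for the robot's text-to-speech output."""
    print(f"Pepper: {text}")

history = [{"role": "system",
            "content": "You are Pepper, a friendly humanoid robot."}]
utterance = hear()
history.append({"role": "user", "content": utterance})
reply = llm_reply(history)
history.append({"role": "assistant", "content": reply})
speak(reply)
```

Keeping the running `history` is what lets the LLM hold a coherent multi-turn conversation, which is the new capability layered on top of Pepper's existing autonomy and eye-contact behaviors.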
Over the past few decades, robots have gradually started making their way into various real-world settings, including some malls, airports and hospitals, as well as a few offices and households.
For robots to be deployed on a larger scale, serving as reliable everyday assistants, they should be able to complete a wide range of common manual tasks and chores, such as cleaning, washing the dishes, cooking and doing the laundry.
Training machine learning algorithms that allow robots to successfully complete these tasks can be challenging, as it often requires extensive annotated data and/or demonstration videos showing humans performing the tasks. Devising more effective methods to collect training data for robotics algorithms could thus be highly advantageous, as it could help further broaden the capabilities of robots.
Facial morphology is a distinctive biometric marker, offering invaluable insights into personal identity, especially in forensic science. In the context of high-throughput sequencing, the reconstruction of 3D human facial images from DNA is becoming a revolutionary approach for identifying individuals from unknown biological specimens. Inspired by artificial intelligence techniques for text-to-image synthesis, the authors propose Difface, a multi-modality model designed to reconstruct 3D facial images from DNA alone. Specifically, Difface first uses a transformer network and a spiral convolution network to map high-dimensional Single Nucleotide Polymorphisms (SNPs) and 3D facial images, respectively, into the same low-dimensional feature space, aligning the two modalities in that latent space in a contrastive manner; it then incorporates a diffusion model to reconstruct facial structures from the SNP features. Applied to a Han Chinese database of 9,674 paired SNP genotypes and 3D facial images, Difface demonstrates excellent performance in DNA-to-3D image alignment and reconstruction and characterizes individual genomic features. Including phenotype information in Difface further improves the quality of the 3D reconstructions; that is, Difface can generate 3D facial images of individuals solely from their DNA data, projecting their appearance at various future ages. This work represents pioneering research in the de novo generation of human facial images from individual genomic information.
This study introduced Difface, a de novo multi-modality model that reconstructs 3D facial images from DNA with remarkable precision via a generative diffusion process and a contrastive learning scheme. Through comprehensive analysis and SNP-FACE matching tasks, Difface demonstrated superior performance in generating accurate facial reconstructions from genetic data. In particular, Difface could generate 3D facial images of individuals solely from their DNA data, predicting their appearance at various future ages. Notably, the model’s integration of transformer networks with spiral convolution and diffusion networks has set a new benchmark for the fidelity of generated images to their real counterparts, as evidenced by its outstanding accuracy on critical facial landmarks and its reproduction of diverse facial features.
Difface’s novel approach, combining advanced neural network architectures, significantly outperforms existing models in genetic-to-phenotypic facial reconstruction. This superiority is attributed to its unique contrastive learning method of aligning high-dimensional SNP data with 3D facial point clouds in a unified low-dimensional feature space, a process further enhanced by adopting diffusion networks for phenotypic characteristic generation. Such advancements contribute to the model’s exceptional precision and ability to capture the subtle genetic variations influencing facial morphology, a feat less pronounced in previous methodologies.
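The contrastive alignment step described above, pulling each SNP embedding toward its paired face embedding in a shared low-dimensional space, can be sketched with a symmetric InfoNCE-style loss on a toy batch. This is a generic illustration under assumed shapes; Difface's actual loss, embedding dimensions, and training details may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def info_nce(snp_emb, face_emb, tau=0.1):
    """Symmetric contrastive loss: each SNP embedding is attracted to its
    paired face embedding and repelled from all others in the batch.
    (Illustrative sketch only -- not Difface's exact objective.)"""
    # L2-normalize both modalities so similarity is cosine similarity.
    s = snp_emb / np.linalg.norm(snp_emb, axis=1, keepdims=True)
    f = face_emb / np.linalg.norm(face_emb, axis=1, keepdims=True)
    logits = s @ f.T / tau            # batch x batch similarity matrix
    labels = np.arange(len(s))        # i-th SNP matches i-th face

    def xent(l):
        # Cross-entropy with the matching pair as the correct "class".
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # Average the SNP->face and face->SNP directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: 8 paired embeddings in an assumed shared 16-dim latent space.
snp = rng.normal(size=(8, 16))
face = snp + 0.05 * rng.normal(size=(8, 16))   # nearly aligned pairs
print(f"{info_nce(snp, face):.3f}")
```

When the two modalities are well aligned, the diagonal of the similarity matrix dominates and the loss is near zero; mismatched pairs drive it up, which is exactly the signal that teaches the encoders a unified latent space.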
Despite Difface’s demonstrated strengths, there remain directions for improvement. Addressing these limitations will require a focused effort to increase the model’s robustness and adaptability to diverse datasets. Future research should incorporate variables such as age and BMI: age would allow Difface to simulate age-related changes, enabling the generation of facial images at different life stages, an application with significant potential in both forensic science and medical diagnostics. Similarly, BMI could help the model account for variations in body composition, improving its ability to generate accurate facial reconstructions across a range of body types.
Researchers at Apple have released an eyebrow-raising paper that throws cold water on the “reasoning” capabilities of the latest, most powerful large language models.
In the paper, a team of machine learning experts makes the case that the AI industry is grossly overstating the ability of its top AI models, including OpenAI’s o3, Anthropic’s Claude 3.7, and Google’s Gemini.