How programmers turned the internet into a paintbrush. DALL-E 2, Midjourney, Imagen, explained.

Beginning in January 2021, advances in AI research have produced a plethora of deep-learning models capable of generating original images from simple text prompts, effectively extending the human imagination. Researchers at OpenAI, Google, Facebook, and others have developed text-to-image tools that they have not yet released to the public, and similar models have proliferated online in the open-source arena and at smaller companies like Midjourney.

These tools represent a massive cultural shift because they remove the requirement for technical labor from the process of image-making. Instead, they select for creative ideation, skillful use of language, and curatorial taste. The ultimate consequences are difficult to predict, but — like the invention of the camera, and the digital camera thereafter — these algorithms herald a new, democratized form of expression that will commence another explosion in the volume of imagery produced by humans. But, like other automated systems trained on historical data and internet images, they also come with risks that have not been resolved.

The video above is a primer on how we got here, how this technology works, and some of the implications. And for an extended discussion about what this means for human artists, designers, and illustrators, check out this bonus video: https://youtu.be/sFBfrZ-N3G4

Midjourney: www.midjourney.com.

The laws of physics do not exist, a theoretical physicist named Sankar Das Sarma argues in a new column published by New Scientist. While we define the laws as the “ultimate laws” of our universe, Sarma says they are merely working descriptions, and that they are nothing more than mathematical equations that match with parts of nature.

Both animals and people use high-dimensional inputs (like eyesight) to accomplish various shifting survival-related objectives. A crucial aspect of this is learning from mistakes. A brute-force approach to trial and error, performing every action for every potential goal, is intractable even in the smallest contexts. Memory-based methods for compositional thinking are motivated by the difficulty of this search. These processes include, for instance, the ability to: (i) recall pertinent portions of prior experience; (ii) reassemble them into new counterfactual plans; and (iii) carry out such plans as part of a focused search strategy. Compared to uniformly sampling every action, such techniques for recycling prior successful behavior can considerably speed up trial and error. This is because the intrinsic compositional structure of real-world objectives and the similarity of the physical laws that control real-world settings allow the same behavior (i.e., sequence of actions) to remain valid for many purposes and situations. What guiding principles enable memory processes to retain and reassemble experience fragments? This question is strongly connected to the idea of dynamic programming (DP), which uses the principle of optimality to significantly lower the computing cost of trial and error. This idea may be expressed informally as treating new, complicated problems as a recomposition of previously solved, smaller subproblems.
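As a rough illustration of the principle of optimality, the sketch below computes the cheapest cost to a goal on a tiny hand-made graph. The graph, the state names, and the function `cost_to_goal` are hypothetical and chosen only for illustration; the point is that each state's answer is built from the already-solved answers of its successors instead of re-enumerating every path.

```python
from functools import lru_cache

# Hypothetical toy graph (an assumption for illustration): edge costs
# between named states. It must be acyclic for this recursion to terminate.
EDGES = {
    "start": {"a": 1, "b": 4},
    "a": {"b": 2, "goal": 6},
    "b": {"goal": 1},
    "goal": {},
}


@lru_cache(maxsize=None)
def cost_to_goal(state: str) -> float:
    """Cheapest total cost from `state` to "goal" (inf if unreachable)."""
    if state == "goal":
        return 0.0
    # Principle of optimality: the best path from `state` is one edge
    # followed by the best path from the successor, which is cached.
    return min(
        (step + cost_to_goal(nxt) for nxt, step in EDGES[state].items()),
        default=float("inf"),
    )


if __name__ == "__main__":
    print(cost_to_goal("start"))
```

Because results are cached, the subproblem for state `b` is solved once and reused by both `start` and `a`.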

This viewpoint has recently been used to create hierarchical reinforcement learning (RL) algorithms for goal-achieving tasks. These techniques build edges between states in a planning graph using a distance regression model, compute the shortest paths across it using DP-based graph search, and then use a learning-based local policy to follow those paths. The researchers' paper advances this line of work. Its contributions can be summarized as follows: they provide a strategy for long-term planning that acts directly on high-dimensional sensory data that an agent can observe on its own (e.g., images from an onboard camera). Their solution blends traditional sampling-based planning algorithms with learning-based perceptual representations to recover and reassemble previously recorded state transitions in a replay buffer.

A two-step method makes this possible. First, they learn a latent space in which the distance between two states measures how many timesteps an optimal policy needs to move from one state to the other. They learn contrastive representations using goal-conditioned Q-values acquired through offline hindsight relabeling. Second, they threshold this learned latent distance metric to establish a neighborhood criterion across states. They then design sampling-based planning algorithms that scan the replay buffer for trajectory segments (previously recorded successions of transitions) whose ends are adjacent states.
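The idea of chaining trajectory segments whose ends are adjacent can be sketched very loosely as follows. This is not the researchers' algorithm: it swaps in a plain breadth-first search, uses Euclidean distance between made-up 2-D latent vectors as a stand-in for the learned metric, and the names `stitch_segments` and `threshold` are assumptions introduced here.

```python
import math
from collections import deque


def stitch_segments(segments, start, goal, threshold):
    """Find a chain of stored segments leading from `start` to `goal`.

    segments:  list of (first_latent, last_latent) pairs, one per recorded
               trajectory segment (hypothetical stand-in for a replay buffer).
    threshold: two latent states count as adjacent when their distance
               is at most this value.
    Returns a list of segment indices, or None if no chain exists.
    """
    def near(u, v):
        return math.dist(u, v) <= threshold

    parent = {}      # segment index -> previous segment index in the chain
    queue = deque()
    for i, (first, _) in enumerate(segments):
        if near(start, first):
            parent[i] = None
            queue.append(i)

    while queue:
        i = queue.popleft()
        if near(segments[i][1], goal):
            chain = []
            while i is not None:
                chain.append(i)
                i = parent[i]
            return chain[::-1]
        # A segment can follow segment i if it begins next to where i ends.
        for j, (first, _) in enumerate(segments):
            if j not in parent and near(segments[i][1], first):
                parent[j] = i
                queue.append(j)
    return None


if __name__ == "__main__":
    buffer = [((0, 0), (1, 0)), ((1.1, 0), (2, 0)),
              ((5, 5), (6, 6)), ((2, 0.1), (3, 0))]
    print(stitch_segments(buffer, (0, 0.05), (3, 0.05), 0.2))
```

Breadth-first search returns the chain with the fewest segments, which is not necessarily the chain with the fewest timesteps; a weighted search would be needed for that.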

Switch-Science has just announced a trio of quantum computing products that the company claims are the world’s first portable quantum computers. Sourced from SpinQ Technology, a Chinese quantum computing company based in Shenzhen, the new quantum computing products have been designed for educational purposes. The aim is to democratize access to physical quantum computing solutions that can be deployed (and redeployed) at will. But considering the actual quantum machinery on offer, none of these (which we’re internally calling “quantops”) are likely to be a part of the future of quantum.

That these products were developed with education in mind shows in their qubit counts, which top out at three (compare that to Google’s Sycamore or IBM’s 433-qubit Osprey Quantum Processing Unit [QPU], both based on superconducting qubits). That’s not enough for any viable, problem-solving quantum computing to take place within these machines, but it’s enough that users can program and run quantum circuits, either the integrated, educational ones or a single custom algorithm.
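To give a sense of what a circuit on a machine this small looks like, here is a minimal state-vector simulation of a standard two-qubit Bell-state circuit (a Hadamard gate followed by a CNOT). It is a classical simulation in plain Python, not SpinQ's software or API, and the function names are assumptions made for this sketch.

```python
import math


def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to `qubit` (bit position, 0 = least significant)."""
    s = 1 / math.sqrt(2)
    new = list(state)
    for i in range(len(state)):
        if not (i >> qubit) & 1:          # visit each pair of amplitudes once
            j = i | (1 << qubit)
            a, b = state[i], state[j]
            new[i] = s * (a + b)
            new[j] = s * (a - b)
    return new


def apply_cnot(state, control, target):
    """Flip `target` in every basis state where `control` is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            new[i ^ (1 << target)] = state[i]
    return new


def bell_state_probabilities():
    """Measurement probabilities for |00>, |01>, |10>, |11> after H then CNOT."""
    state = [1.0, 0.0, 0.0, 0.0]          # two qubits, both starting in |0>
    state = apply_hadamard(state, 0)
    state = apply_cnot(state, control=0, target=1)
    return [abs(amplitude) ** 2 for amplitude in state]


if __name__ == "__main__":
    print(bell_state_probabilities())
```

The result puts roughly half the probability on 00 and half on 11, the signature of an entangled pair. A state vector for n qubits has 2^n entries, so a three-qubit device corresponds to only eight amplitudes.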

Self-supervised learning is a form of unsupervised learning in which the supervised learning task is constructed from raw, unlabeled data. Supervised learning is effective but usually requires a large amount of labeled data. Getting high-quality labeled data is time-consuming and resource-intensive, especially for sophisticated tasks like object detection and instance segmentation, where more in-depth annotations are sought.

Self-supervised learning aims to first learn usable representations of the data from an unlabeled pool of data by self-supervision and then to refine these representations with few labels for the supervised downstream tasks such as image classification, semantic segmentation, etc.
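One common way to construct a supervised task from unlabeled data is to hide part of each input and ask the model to predict the hidden part. The sketch below builds such (input, label) pairs from raw sentences; the function name `make_masked_examples` and the `[MASK]` token are illustrative assumptions, not a reference to any particular system.

```python
def make_masked_examples(sentences, mask_token="[MASK]"):
    """Turn unlabeled sentences into (masked_sentence, hidden_word) pairs.

    The "label" for each example is simply the word that was hidden,
    so no human annotation is required.
    """
    examples = []
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            masked = words[:i] + [mask_token] + words[i + 1:]
            examples.append((" ".join(masked), word))
    return examples


if __name__ == "__main__":
    for masked, label in make_masked_examples(["cats chase mice"]):
        print(masked, "->", label)
```

A model trained on pairs like these learns representations from the raw text alone, which can then be refined with a small labeled set for a downstream task.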

Self-supervised learning is at the heart of many recent advances in artificial intelligence. However, existing algorithms focus on a particular modality (such as images or text) and have high computational resource requirements. Humans, on the other hand, appear to learn significantly more efficiently than existing AI and to learn from diverse types of information consistently rather than requiring distinct learning systems for text, speech, and other modalities.

DALL-E 2 transformed the world of art in 2022.

DALL-E is a system that has been around for years, but its successor, DALL-E 2, was launched this year.


Ibrahim Can/Interesting Engineering.

DALL-E and DALL-E 2 are machine-learning models created by OpenAI to produce images from language descriptions. These text descriptions are known as prompts. The system can generate realistic images just from a description of the scene. DALL-E is a neural network that creates accurate pictures from short phrases provided by the user. It comprehends language through textual descriptions and from “learning” information provided in its datasets by users and developers.

Researchers have developed a new all-optical method for driving multiple highly dense nanolaser arrays. The approach could enable chip-based optical communication links that process and move data faster than today’s electronic-based devices.

“The development of optical interconnects equipped with high-density nanolasers would improve information processing in the [data centers] that move information across the internet,” said research team leader Myung-Ki Kim from Korea University.

“This could allow streaming of ultra-high-definition movies, enable larger-scale interactive online encounters and games, accelerate the expansion of the Internet of Things and provide the fast connectivity needed for big data analytics.”

Turing Award winner and deep learning pioneer Geoffrey Hinton, one of the original proponents of backpropagation, has argued in recent years that backpropagation does not explain how the brain works. In his NeurIPS 2022 keynote speech, Hinton proposes a new approach to neural network learning: the Forward-Forward algorithm.

After six decades we have finally reached controlled fusion “ignition.” Here is how it works and what it means (and doesn’t mean):

At the Lawrence Livermore National Lab (LLNL), the National Ignition Facility (NIF) starts with the Injection Laser System (ILS), a ytterbium-doped optical fiber laser (Master Oscillator) that produces a single very low-power, 1,053-nanometer (infrared) beam. This single beam is split among 48 Preamplifier Modules (PAMs) that create four beams each (192 total). Each PAM conducts a two-stage amplification process via xenon flash lamps.


Self-coding and self-updating AI algorithms appear to be on the horizon. There is talk of Pitchfork AI, a top-secret Google Labs project that can independently write, refactor, and use both its own and other people’s code.

This type of AI has actually been discussed for a long time, and DeepMind mentioned it at the beginning of the year along with its AlphaCode AI, which, according to the company, can “code programs in competitive level,” comparable to a mid-level developer. Since February, however, there hasn’t been any more notable news.

Deep-learning models have proven to be highly valuable tools for making predictions and solving real-world tasks that involve the analysis of data. Despite their advantages, before they are deployed in real software and devices such as cell phones, these models require extensive training in physical data centers, which can be both time- and energy-consuming.

Researchers at Texas A&M University, Rain Neuromorphics and Sandia National Laboratories have recently devised a new system for training deep learning models more efficiently and on a larger scale. This system, introduced in a paper published in Nature Electronics, relies on new training algorithms and memristor crossbars that can carry out multiple operations at once.
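For readers unfamiliar with crossbars, the sketch below shows the textbook idealization of why such an array performs many operations at once: input voltages drive the rows, each crosspoint passes a current equal to voltage times conductance (Ohm's law), and the currents sum along each column (Kirchhoff's current law), which amounts to a matrix-vector product in a single step. This is a generic, idealized model and not the system described in the paper; the function name and values are assumptions.

```python
def crossbar_output_currents(voltages, conductances):
    """Idealized crossbar read-out: I_j = sum_i V_i * G[i][j].

    voltages:     one input voltage per row.
    conductances: conductances[i][j] is the crosspoint between row i and
                  column j (the stored weight in this idealization).
    Returns one summed current per column.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 array, arbitrary units.
    print(crossbar_output_currents([1.0, 2.0], [[0.5, 1.0], [0.25, 0.0]]))
```

In hardware, every column current appears simultaneously, whereas this software loop computes them one at a time; real devices also have noise and nonlinearity that this sketch ignores.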

“Most people associate AI with health monitoring in smart watches, face recognition in smart phones, etc., but most of AI, in terms of energy spent, entails the training of AI models to perform these tasks,” Suhas Kumar, the senior author of the study, told TechXplore.