
Since the DeepSpeed optimization library was introduced last year, it has rolled out numerous novel optimizations for training large AI models—improving scale, speed, cost, and usability. As large models have quickly evolved over the last year, so too has DeepSpeed. Whether enabling researchers to create the 17-billion-parameter Microsoft Turing Natural Language Generation (Turing-NLG) with state-of-the-art accuracy, achieving the fastest BERT training record, or supporting 10x larger model training using a single GPU, DeepSpeed continues to tackle challenges in AI at Scale with the latest advancements for large-scale model training. Now, the novel memory optimization technology ZeRO (Zero Redundancy Optimizer), included in DeepSpeed, is undergoing a further transformation of its own. The improved ZeRO-Infinity offers the system capability to go beyond the GPU memory wall and train models with tens of trillions of parameters, an order of magnitude bigger than state-of-the-art systems can support. It also offers a promising path toward training 100-trillion-parameter models.
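To give a sense of why the GPU memory wall matters at these scales, the sketch below estimates the memory needed just for model states. It assumes the commonly cited accounting for mixed-precision Adam training of 16 bytes per parameter (fp16 weights and gradients, plus fp32 weights, momentum, and variance) and ignores activations and buffers. The 32 GB device size and the function names are illustrative assumptions, not figures from this article.

```python
# Rough model-state memory estimate for mixed-precision Adam training.
# Assumption: 2 bytes (fp16 weights) + 2 bytes (fp16 gradients)
# + 12 bytes (fp32 weights, momentum, variance) per parameter.
BYTES_PER_PARAM = 2 + 2 + 12
GB = 10**9


def model_state_bytes(num_params: int) -> int:
    """Bytes needed for weights, gradients, and optimizer states."""
    return num_params * BYTES_PER_PARAM


def min_devices(num_params: int, bytes_per_device: int) -> int:
    """Smallest number of devices whose combined memory holds the model states."""
    total = model_state_bytes(num_params)
    return -(-total // bytes_per_device)  # ceiling division


if __name__ == "__main__":
    turing_nlg = 17 * 10**9      # 17 billion parameters
    ten_trillion = 10 * 10**12   # 10 trillion parameters

    print(model_state_bytes(turing_nlg) / GB)   # 272.0 (GB)
    print(min_devices(ten_trillion, 32 * GB))   # 5000 hypothetical 32 GB devices
```

Under these assumptions, a 10-trillion-parameter model needs about 160 TB for model states alone, which is why drawing on CPU and NVMe memory in addition to GPU memory becomes attractive.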

ZeRO-Infinity at a glance: ZeRO-Infinity is a novel deep learning (DL) training technology for scaling model training, from a single GPU to massive supercomputers with thousands of GPUs. It powers unprecedented model sizes by leveraging the full memory capacity of a system, concurrently exploiting all heterogeneous memory (GPU, CPU, and Non-Volatile Memory Express, or NVMe for short). Learn more in our paper, “ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning.” The highlights of ZeRO-Infinity include:

Just as microelectronics transformed the modern world through the creation of the integrated circuit, which is now at the heart of most electronic devices, quantum photonics needs an equivalent platform to fulfil its application potential. In this special focus issue of Nature Photonics, we report on the progress in making this a reality with the developments in integrated quantum photonics (IQP).

In a Review Article, Jianwei Wang and colleagues provide a general overview and introduction to IQP circuits and summarize the present development of quantum hardware based on IQP chips. They remark that the challenge for measurement-based quantum computation may shift from the need for deterministic gates to constructing a generic entangled cluster-state, on which any quantum computation could be mapped by a sequence of measurements.

IQP circuits are also a desirable platform for chip-based quantum communications. However, fully integrated chip-based quantum communication has not yet been realized, largely because of the integration difficulties between silicon wafers that feature optical waveguides and other passive components and light sources and photodetectors that are made from different semiconductors. Key components such as transmitters and receivers for quantum key distribution and quantum random number generators are instead individually fabricated.

A free-floating planet (FFP) is a planetary-mass object that orbits around a non-stellar massive object (e.g. a brown dwarf) or around the Galactic Centre. The presence of exomoons orbiting FFPs has been theoretically predicted by several models. Under specific conditions, these moons are able to retain an atmosphere capable of ensuring the long-term thermal stability of liquid water on their surface. We model this environment with a one-dimensional radiative-convective code coupled to a gas-phase chemical network including cosmic rays and ion-neutral reactions. We find that, under specific conditions and assuming stable orbital parameters over time, liquid water can be formed on the surface of the exomoon. The final amount of water for an Earth-mass exomoon is smaller than the amount of water in Earth oceans, but enough to host the potential development of primordial life.

Dark matter may self-interact through a continuum of low-mass states. This happens if dark matter couples to a strongly-coupled nearly-conformal hidden sector. This type of theory is holographically described by brane-localized dark matter interacting with bulk fields in a slice of 5D anti-de Sitter space. The long-range potential in this scenario depends on a non-integer power of the spatial separation, in contrast to the Yukawa potential generated by the exchange of a single 4D mediator. The resulting self-interaction cross section scales like a non-integer power of velocity. We identify the Born, classical and resonant regimes and investigate them using state-of-the-art numerical methods. We demonstrate the viability of our continuum-mediated framework to address the astrophysical small-scale structure anomalies. Investigating the continuum-mediated Sommerfeld enhancement, we demonstrate that a pattern of resonances can occur depending on the non-integer power. We conclude that continuum mediators introduce novel power-law scalings which open new possibilities for dark matter self-interaction phenomenology.
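The contrast between a single mediator and a continuum can be illustrated with a short worked equation. This is a sketch under simplifying assumptions that are not taken from the article: an attractive interaction with coupling $\alpha$, and a gapless, scale-invariant spectral density $\rho(\mu^2) = A\,(\mu^2)^{\Delta-2}$ with $\Delta > 1$, where $A$ and $\Delta$ are illustrative symbols. The full model may include a mass gap and other features that modify this form.

```latex
% Single 4D mediator of mass m: Yukawa potential
V_{\text{Yukawa}}(r) = -\frac{\alpha}{r}\, e^{-m r}

% Continuum of mediators: superposition of Yukawa potentials
% weighted by the spectral density \rho(\mu^2)
V_{\text{cont}}(r) = -\frac{\alpha}{r} \int_0^\infty d\mu^2\, \rho(\mu^2)\, e^{-\mu r}

% With \rho(\mu^2) = A\,(\mu^2)^{\Delta-2} and d\mu^2 = 2\mu\, d\mu:
V_{\text{cont}}(r) = -\frac{2 A \alpha}{r} \int_0^\infty d\mu\, \mu^{2\Delta-3}\, e^{-\mu r}
                   = -\frac{2 A \alpha\, \Gamma(2\Delta-2)}{r^{\,2\Delta-1}}
```

For non-integer $\Delta$, the exponent $2\Delta - 1$ is itself non-integer, which is the kind of power-law behaviour described above, in contrast to the exponentially screened Yukawa form.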

A preprint version of the article is available at arXiv.

The creation, transfer, and stabilization of localized excitations are studied in a donor–acceptor Frenkel exciton model in an atomistic treatment of reduced-size double quantum dots (QDs) of various sizes. The explicit time-dependent dynamics simulations carried out by hybrid time-dependent density functional theory/configuration interaction show that laser-controlled hole trapping in stacked, coupled germanium/silicon quantum dots can be achieved by a UV/IR pump–dump pulse sequence. The first UV excitation creates an exciton localized on the topmost QD and after some coherent transfer time, an IR pulse dumps and localizes an exciton in the bottom QD. While hole trapping is observed in each excitation step, we show that the stability of the localized electron depends on its multiexcitonic character.

In principle, any pitch-shifting technique may be employed, provided that the frequency-dependent parameters analysed from the ultrasonic sound-field are mapped correctly to the frequency scale of the pitch-shifted signal. Since the spatial parameters are averaged over frequency in the currently employed configuration of the device, the frequency mapping is not required in this case. Instead, each time frame of the pitch-shifted signal is spatialised according to a frequency-averaged direction. The pitch-shifting technique used for the application targeted in this article should be capable of large pitch-shifting ratios, while also operating within an acceptable latency. Based on these requirements, the phase-vocoder approach [15,16] was selected for the real-time rendering in this study, due to its low processing latency and acceptable signal quality with large pitch-shifting ratios. However, the application of other pitch-shifting methods is also demonstrated with recordings processed off-line and described in the Results section.
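The frequency mapping mentioned above can be sketched as a simple division by the pitch-shifting ratio. This is a minimal illustration, not the processing used in the device: the ratio of 8 (three octaves), the 40 kHz source frequency, and the function names are hypothetical values chosen for the example.

```python
import math


def map_to_shifted_scale(freq_hz: float, shift_ratio: float) -> float:
    """Frequency at which a component lands after shifting down by shift_ratio.

    A parameter analysed at freq_hz in the ultrasonic signal would be applied
    at the returned frequency in the pitch-shifted signal.
    """
    if shift_ratio <= 0:
        raise ValueError("shift_ratio must be positive")
    return freq_hz / shift_ratio


def shift_in_octaves(shift_ratio: float) -> float:
    """Size of the downward shift, expressed in octaves."""
    return math.log2(shift_ratio)


if __name__ == "__main__":
    ratio = 8.0  # hypothetical pitch-shifting ratio
    print(map_to_shifted_scale(40_000.0, ratio))  # 5000.0
    print(shift_in_octaves(ratio))                # 3.0
```

When the spatial parameters are averaged over frequency, as in the configuration described here, this per-band mapping is unnecessary because a single direction is applied to each time frame.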

In summary, the proposed processing approach permits frequency-modified signals to be synthesised with plausible binaural and monaural cues, which may subsequently be delivered to the listener to enable the localisation of ultrasonic sound sources. Furthermore, since the super-hearing device turns with the head of the listener, and the processing latency of the device was constrained to 44 ms, the dynamic cues should also be preserved. Note that the effect of processing latency has been previously studied in the context of head-tracked binaural reproduction systems, where it has been found that a system latency above 50–100 ms can impair the spatial perception [17,18]. Therefore, a trade-off must be made between attaining high spatial image and timbral quality (which are improved through longer temporal windows and a higher level of overlapping) and having low processing latency (which relies on shorter windows and reduced overlapping). The current processing latency has been engineered so that both the spatial image and audio quality after pitch-shifting, as determined based on informal listening, remain reasonably high.
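The window-length side of this trade-off can be made concrete with a rough calculation. The sketch below assumes that the latency of frame-based processing is approximately one analysis window, and it uses a hypothetical 192 kHz sampling rate, 8192-sample window, and 75% overlap. None of these settings are taken from the article.

```python
def window_latency_ms(window_samples: int, sample_rate_hz: float) -> float:
    """Approximate latency of frame-based processing: one full window."""
    return 1000.0 * window_samples / sample_rate_hz


def hop_size(window_samples: int, overlap: float) -> int:
    """Samples advanced between consecutive frames for a given overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return round(window_samples * (1.0 - overlap))


if __name__ == "__main__":
    fs = 192_000   # hypothetical sampling rate for ultrasonic capture
    window = 8192  # hypothetical analysis window length

    print(round(window_latency_ms(window, fs), 1))      # 42.7
    print(hop_size(window, 0.75))                       # 2048
    # Doubling the window would roughly double the latency,
    # pushing it into the 50-100 ms range noted above.
    print(round(window_latency_ms(2 * window, fs), 1))  # 85.3
```

A smaller hop (higher overlap) produces more frames per second, which increases the computational load while the window length continues to set the approximate latency.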

One additional advantage of the proposed approach is that only a single signal is pitch-shifted, which is inherently more computationally efficient than pitch-shifting multiple signals, as would be required by the three alternative suggestions described in the Introduction section. Furthermore, imprinting the spatial information onto the signal only after pitch-shifting ensures that the directional cues reproduced for the listener are not distorted by the pitch-shifting operation. The requirements for the size of the microphone array are also less stringent than those for an Ambisonics-based system. In this work, an array with a diameter of 11 mm was employed, which has a spatial aliasing frequency of approximately 17 kHz. This prohibits the use of Ambisonics for the ultrasonic frequencies with the present array. By contrast, the employed spatial parameter analysis can be conducted above the spatial aliasing frequency, provided that the geometry of the array is known and that the sensors are arranged uniformly on the sphere.

When you put these three factors together (the bounty of technological advances, the compressed restructuring timetable due to covid-19, and an economy finally running at full capacity), the ingredients are in place for a productivity boom. This will not only boost living standards directly, but also free up resources for a more ambitious policy agenda.


AI and other digital technologies have been surprisingly slow to improve economic growth. But that could be about to change.