Sep 14, 2022

The AI Hardware Problem

Posted by in category: robotics/AI

▶ Check out Brilliant with this link to receive a 20% discount!

The millennia-old idea of expressing signals and data as a series of discrete states had ignited a revolution in the semiconductor industry during the second half of the 20th century. This new information age thrived on the robust and rapidly evolving field of digital electronics. The abundance of automation and tooling made it relatively manageable to scale designs in complexity and performance as demand grew. However, the power being consumed by AI and machine learning applications cannot feasibly grow as is on existing processing architectures.

In a digital neural network implementation, the weights and input data are stored in system memory and must be fetched and stored continuously through the sea of multiple-accumulate operations within the network. This approach results in most of the power being dissipated in fetching and storing model parameters and input data to the arithmetic logic unit of the CPU, where the actual multiply-accumulate operation takes place. A typical multiply-accumulate operation within a general-purpose CPU consumes more than two orders of magnitude greater than the computation itself.

Their ability to processes 3D graphics requires a larger number of arithmetic logic units coupled to high-speed memory interfaces. This characteristic inherently made them far more efficient and faster for machine learning by allowing hundreds of multiple-accumulate operations to process simultaneously. GPUs tend to utilize floating-point arithmetic, using 32 bits to represent a number by its mantissa, exponent, and sign. Because of this, GPU targeted machine learning applications have been forced to use floating-point numbers.

These dedicated AI chips are offer dramatically larger amounts of data movement per joule when compared to GPUs and general-purpose CPUs. This came as a result of the discovery that with certain types of neural networks, the dramatic reduction in computational precision only reduced network accuracy by a small amount. It will soon become infeasible to increase the number of multiply-accumulate units integrated onto a chip, or reduce bit-precision further.


Leave a reply