Computing scheme accelerates machine learning while improving energy efficiency of traditional data operations

Artificial intelligence (AI) models like ChatGPT run on algorithms and have great appetites for data, which they process through machine learning. But what limits their data-processing abilities? Researchers led by Professor Sun Zhong from Peking University’s School of Integrated Circuits and Institute for Artificial Intelligence set out to overcome the von Neumann bottleneck, which limits data processing.

In their paper published in the journal Device on September 12, 2024, the team developed the dual-IMC (in-memory computing) scheme, which not only accelerates the machine learning process, but also improves the energy efficiency of traditional data operations.

When designing algorithms, software engineers and computer scientists rely on a data operation known as matrix-vector multiplication (MVM), which underpins neural networks. A neural network is a computing architecture, often found in AI models, that mimics the function and structure of a human brain.
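To make the operation concrete, here is a minimal Python sketch of an MVM as it might appear in one layer of a neural network. The function name `matvec` and the example numbers are assumptions made for illustration; they do not come from the paper.

```python
def matvec(weights, inputs):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each weight row must match the input length")
    # Each output element is the dot product of one weight row with the input.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Hypothetical 2x3 weight matrix and 3-element input, chosen for illustration only.
W = [[1.0, 0.0, -1.0],
     [0.5, 2.0, 1.0]]
x = [2.0, 1.0, 3.0]

y = matvec(W, x)
print(y)  # [-1.0, 6.0]
```

Every output value needs one full row of weights plus the whole input vector, which is why moving those operands between memory and processor can dominate the cost of the computation.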

As datasets grow rapidly in scale, computing performance is often limited by data movement and by the mismatch between the speed of processing data and the speed of transferring it. This is known as the von Neumann bottleneck. The conventional solution is a single in-memory computing (single-IMC) scheme, in which neural network weights are stored in the memory chip while the input (such as an image) is provided externally.

However, the single-IMC scheme has drawbacks: data must still be shuttled between on-chip and off-chip memory, and the scheme relies on digital-to-analog converters (DACs), which take up a large circuit footprint and consume a lot of power.

Dual in-memory computing enables fully in-memory MVM operations. © Device (2024). DOI: 10.1016/j.device.2024.100546

To fully realize the potential of the IMC principle, the team developed a dual-IMC scheme that stores both the weight and input of a neural network in the memory array, thus performing data operations in a fully in-memory manner.

The team then tested the dual-IMC on resistive random-access memory (RRAM) devices for signal recovery and image processing. These are some benefits of the dual-IMC scheme when applied to MVM operations:

Greater efficiency is achieved because computations are performed fully in memory, which saves the time and energy otherwise spent accessing off-chip dynamic random-access memory (DRAM) and on-chip static random-access memory (SRAM).
Computing performance is optimized because data movement, previously a limiting factor, is eliminated by performing operations fully in memory.
Production cost is lower because DACs, which are required in the single-IMC scheme, are eliminated. This also reduces chip area, computing latency and power requirements.

With a rapidly growing demand for data-processing in today’s digital era, the discoveries made in this research could bring about new breakthroughs in computing architecture and artificial intelligence.

More information:
Shiqing Wang et al, Dual in-memory computing of matrix-vector multiplication for accelerating neural networks, Device (2024). DOI: 10.1016/j.device.2024.100546

Provided by
Peking University

Citation:
Computing scheme accelerates machine learning while improving energy efficiency of traditional data operations (2024, September 26)
