Workshop #2 (RP1)
Computing-in-Memory for Large-scale AI Models: Circuits to System
"Toward Reliable and Cost-effective Compound AI Systems"
Dr. Abbas RAHIMI (IBM Zurich Research Laboratory)
Abstract
Advances in AI today are undeniably impressive, yet modern AI solutions still falter on out-of-distribution data, even as their computational and energy demands soar. This challenge calls for a paradigm shift, particularly for power-constrained, safety-critical applications. To address it, we demonstrate how designing compound AI systems can ensure reliable operation while optimizing both cost and scalability. These systems seamlessly orchestrate neural networks with robust tools and external verifiers, enabling highly integrated and unified neural-symbolic computations and interfaces that collectively reduce cost and improve scalability. Crucially, the realization of such compound AI systems can be informed by, and benefit from, the unconventional physical properties of emerging hardware technologies through in-memory computing. This synergy unlocks powerful computational primitives, allowing compound AI systems to excel at learning and complex reasoning, even under out-of-distribution conditions, with unprecedented computational and energy efficiency.

"Scalable Compute-in/near-Memory Systems with 2.5D/3D/3.5D Integration"
Prof. Chixiao CHEN (Fudan University (FDU))
Abstract
The rapid expansion of large AI models, such as ChatGPT, has strained traditional computing architectures reliant on von Neumann-based GPUs and CPUs. To break the memory wall, memory-centric architectures such as computing-in-memory and processing-near-memory (CIM/PNM) have emerged, integrating computation and memory to reduce latency and energy consumption. This talk explores methods for scaling CIM through 2.5D/3D/3.5D heterogeneous integration using advanced packaging. For example, active interposers enable high-density vertical stacking of memory and compute units, bypassing reliance on transistor scaling. This architecture promises enhanced bandwidth, reduced interconnect delays, and scalable performance for AI workloads. The discussion will address architectural and circuit-level challenges in designing active interposer-based systems. Practical insights from prototype implementations will be shared, highlighting performance benchmarks and energy efficiency gains. By bridging memory and computation in 3D space, this approach offers a viable path to sustaining scaling-law advancements in the post-transistor-scaling era, with implications for AI infrastructure, edge computing, and high-performance systems.

"Transforming AI: The Impact of Computing-in-Memory on Future Technologies"
Prof. Tony Tae-Hyoung KIM (Nanyang Technological University (NTU))
Abstract
Recent developments in neural networks require massive data transfer between memory and processing elements. This heavy data transfer incurs substantial energy overhead and limits the overall performance of neural networks. Computing-in-memory (CIM) has attracted the research community's attention because it significantly improves energy efficiency by minimizing energy-hungry data transfer. CIM designs can employ either analog or digital computing, each with its own pros and cons. In this talk, I will present the basics of CIM design and its various challenges. After that, several state-of-the-art CIM macros will be introduced. I will also discuss the pros and cons of analog and digital CIM macros.
