Workshop #2
Circuit and Architecture Design Techniques for AI Applications
"In-sensor Computing for Artificial Vision"
Prof. Yang CHAI (PolyU)
Abstract
According to projections by the Semiconductor Research Corporation and the Semiconductor Industry Association, the number of sensor nodes is increasing exponentially with the development of the Internet of Things. By 2032, the number of sensors is expected to reach ~45 trillion, generating more than 1 million zettabytes (10²⁷ bytes) of data per year. This massive volume of sensor data obscures the valuable information we need most. Abundant data movement between sensors and processing units greatly increases power consumption and latency, posing grand challenges for the power-constrained and widely distributed sensor nodes of the Internet of Things. There is therefore an urgent need for a computation paradigm that can efficiently process information near or inside sensors, eliminate redundant data, reduce frequent data transfer, and enhance data security and privacy. We propose a bioinspired in-sensor computing paradigm that reduces data transfer and lowers computing complexity by processing data locally. In this talk, we will discuss the hardware implementation of in-sensor computing paradigms at the device and array levels. We will illustrate the physical mechanisms that lead to unique sensory response characteristics and their corresponding computing functions. In particular, bioinspired device characteristics enable the fusion of sensing and computation functionalities, providing a route to intelligent information processing with low power consumption.
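As a rough, hedged illustration of the data-reduction argument (not the speaker's implementation), the sketch below models in-sensor computing as a weighted summation performed by the pixel array itself, so only a few feature values, rather than the full frame, cross the sensor/processor boundary. All names and dimensions are hypothetical.

```python
# Illustrative sketch only: models in-sensor computing as a weighted sum
# performed inside the pixel array, so only a few feature values (not the
# full frame) are transferred off the sensor. Names and dimensions are
# hypothetical, not taken from the talk.
import numpy as np

H, W, N_FEATURES = 128, 128, 8          # hypothetical frame size / outputs

rng = np.random.default_rng(0)
light = rng.random((H, W))              # incident light intensity per pixel

# Conventional pipeline: the whole frame is read out, then processed off-sensor.
frame_readout = light.reshape(-1)                    # H*W values transferred
weights = rng.normal(size=(N_FEATURES, H * W))       # off-sensor weights
features_conventional = weights @ frame_readout

# In-sensor computing: each pixel's responsivity acts as a programmable weight,
# and photocurrents are summed along shared lines, so the array itself outputs
# the weighted sums. Only N_FEATURES values leave the sensor.
responsivity = weights.reshape(N_FEATURES, H, W)     # programmed per pixel
features_in_sensor = (responsivity * light).sum(axis=(1, 2))

assert np.allclose(features_conventional, features_in_sensor)
print(f"values transferred: {H * W} (conventional) vs {N_FEATURES} (in-sensor)")
```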
References
- Nature Electronics, 2022, 5, 84-91
- Nature, 2022, 602, 364
- Nature Electronics, 2020, 3, 664-671
- Nature, 2020, 579, 32-33
- Nature Nanotechnology, 2019, 14, 776-782
- Nature Electronics, 2022, 5, 483-484
- Nature Nanotechnology, 2023, DOI: https://doi.org/10.1038/s41565-023-01379-2
"Digital Computing-In-Memory Architecture and Design Automation"
Prof. Fengbin TU (HKUST)
Abstract
With the increasing size of AI models, AI chips suffer from massive data movement between compute and memory. Computing-In-Memory (CIM) eliminates this bottleneck by integrating compute into memory, and has proved to be a promising architecture for energy-efficient AI chips. However, mainstream analog CIM architectures have limited accuracy due to non-idealities. In recent years, digital CIM has emerged with a good balance of efficiency and accuracy, and has been applied to many modern AI applications such as Transformers and recommendation models. In this talk, Prof. Tu will first discuss advances in digital CIM architecture, and then introduce AutoDCIM, the first automated DCIM compiler. AutoDCIM generates digital CIM macros from user-defined architecture parameters, with an optimized layout that best satisfies the given hardware constraints. With the growing interest in digital CIM, AutoDCIM will play an important role in building an ecosystem for digital CIM-based AI computing.
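To make the digital CIM idea concrete, here is a small, hedged functional model (a toy sketch, not the TensorCIM or AutoDCIM design): weights sit in the memory array as bit planes, multiplication is done bitwise in the array, partial sums are accumulated by an adder tree, and the bit-plane results are shifted and added according to their significance.

```python
# Toy functional model of a digital CIM dot product (illustrative only; not
# the architecture presented in the talk). Weights are stored as bit planes
# in the memory array; the "in-memory multiply" is a bitwise AND, partial
# sums come from an adder tree, and shift-add recombines the bit planes.
import numpy as np

rng = np.random.default_rng(1)
N_ROWS, W_BITS = 64, 8                       # hypothetical macro dimensions

weights = rng.integers(0, 2 ** W_BITS, size=N_ROWS)     # unsigned 8-bit weights
acts = rng.integers(0, 2 ** W_BITS, size=N_ROWS)        # unsigned 8-bit activations

# Memory array holds one bit plane of the weights per pass.
bit_planes = [(weights >> b) & 1 for b in range(W_BITS)]

macc = 0
for b, plane in enumerate(bit_planes):
    partial = int(np.sum(acts * plane))      # in-array AND + adder-tree sum
    macc += partial << b                     # shift by weight-bit significance

assert macc == int(np.dot(acts, weights))    # matches a conventional MAC
print("digital CIM dot product:", macc)
```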
References
- [DAC'23] J. Chen, F. Tu, K. Shao, F. Tian, X. Huo, C.-Y. Tsui, K.-T. Cheng, "AutoDCIM: An Automated Digital CIM Compiler," Design Automation Conference (DAC), 2023.
- [ISSCC'23] F. Tu, Y. Wang, Z. Wu, W. Wu, L. Liu, Y. Hu, S. Wei, S. Yin, "TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-based Beyond-NN Acceleration," International Solid-State Circuits Conference (ISSCC), 2023.
- [ISSCC'22] F. Tu, Y. Wang, Z. Wu, L. Liang, Y. Ding, B. Kim, L. Liu, S. Wei, Y. Xie, S. Yin, "A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise in-Memory Booth Multiplication for Cloud Deep Learning Acceleration," International Solid-State Circuits Conference (ISSCC), 2022.
"Mixing Signed Digit Representations for Low-bitwidth DNN Inference on FPGAs"
Prof. Hayden SO (HKU)
Abstract
While quantizing deep neural network (DNN) weights to 8-bit fixed-point representations has become the de facto technique in modern inference accelerator designs, the quest to further improve hardware efficiency by reducing the bitwidth remains ongoing. In this talk, a novel restricted signed digit (RSD) representation of weights is introduced that enables hardware-efficient, low-bitwidth DNN inference with accuracy on par with or exceeding that of an equivalent network quantized to standard 8-bit fixed point (int8). RSD uses ternary signed digits to represent fixed-point numbers and can be flexibly restricted to employ only a small number of effectual bits. This flexibility allows efficient bit-serial operations to be implemented in hardware that exploit bit-level sparsity to reduce latency. In addition, a system-level optimization framework is developed that allows RSD weights with bit-serial operations to run in tandem with traditional int8 weights on existing bit-parallel hardware on FPGAs. Experiments show that the proposed mixed signed digit (MSD) framework achieves a 1.23x speedup on the ResNet-18 model over the state of the art, and a remarkable 4.91% higher accuracy on MobileNet-V2 compared to equivalent state-of-the-art int8 implementations.
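The sketch below illustrates the general signed-digit idea in a hedged way: the exact RSD restriction rule is defined in the talk and paper, so standard non-adjacent-form (canonical signed digit) recoding plus a simple "keep the k most significant nonzero digits" rule stands in here as an assumption, along with a bit-serial multiply that skips zero digits to exploit bit-level sparsity.

```python
# Illustrative sketch of signed-digit weight recoding and bit-serial multiply.
# The RSD restriction rule used in the talk is not reproduced here; a standard
# non-adjacent-form recoding and a hypothetical "keep k most significant
# nonzero digits" rule are used instead, purely for illustration.

def to_signed_digits(w):
    """Recode an integer weight into ternary digits {-1, 0, +1}, LSB first."""
    digits = []
    while w != 0:
        if w & 1:
            d = 2 - (w % 4)      # +1 or -1, so no two adjacent nonzero digits
            w -= d
        else:
            d = 0
        digits.append(d)
        w //= 2
    return digits

def restrict(digits, k):
    """Keep only the k most significant nonzero (effectual) digits."""
    kept, out = 0, [0] * len(digits)
    for i in reversed(range(len(digits))):
        if digits[i] != 0 and kept < k:
            out[i] = digits[i]
            kept += 1
    return out

def bit_serial_mul(activation, digits):
    """Multiply by shift-and-add/subtract, skipping zero digits (bit-level sparsity)."""
    acc, cycles = 0, 0
    for pos, d in enumerate(digits):
        if d != 0:                       # zero digits cost no cycles
            acc += d * (activation << pos)
            cycles += 1
    return acc, cycles

w, x, k = -105, 7, 3                      # hypothetical weight, activation, digit budget
digits = to_signed_digits(w)
approx = restrict(digits, k)
exact, n1 = bit_serial_mul(x, digits)
approx_prod, n2 = bit_serial_mul(x, approx)
print(f"exact {w}*{x} = {exact} in {n1} add/sub cycles")
print(f"restricted to {k} digits: {approx_prod} in {n2} cycles")
```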