Workshop #2
Hardware Architectures and Designs for Machine Learning and Beyond
"Design Automation for Processing-in-Memory Architectures"
Prof. Xiaoming CHEN (CAS)
Abstract
In the past decade, processing-in-memory (PIM) architectures have been widely studied for deep neural network (DNN) acceleration. The technology is also gradually being commercialized, and PIM products are beginning to emerge. Currently, the design of PIM accelerators and the deployment of DNN workloads face great challenges. The scale of DNN models, the diversity of PIM devices and micro-architectures, the huge design space, and the complexity of the deployment problem are far beyond what human designers can handle manually. Automatic tools are indispensable for designing PIM accelerators and deploying DNN models on PIM hardware. This talk will introduce a toolchain for PIM accelerators. The toolchain includes an architecture synthesizer, an algorithm compiler, and a system-level simulator. The synthesizer automatically generates PIM architectures under given constraints. During synthesis, the design space is explored to optimize performance and energy efficiency. The compiler converts an ONNX-described DNN model into an instruction stream for the auto-generated architecture. During compilation, workload partitioning, task mapping, and operator scheduling are optimized. The simulator estimates the performance, power dissipation, and energy consumption of the instruction stream running on the architecture. The toolchain can adapt to various devices, PIM architectures, and DNN models.
"RaDe-GS: Rasterizing Depth in Gaussian Splatting"
Prof. Ping TAN (HKUST)
Abstract
Gaussian Splatting (GS) has proved highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates shape extraction. While recent techniques such as 2D GS have attempted to improve shape reconstruction, they often revise the Gaussian primitives, which reduces both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to Neuralangelo [Li et al. 2023] on the DTU dataset, with training and rendering times similar to those of traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods.
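For readers unfamiliar with the Chamfer distance cited above, the following is a minimal sketch of one common symmetric definition: the average of the two directed mean nearest-neighbour distances between point sets. The function names and sample points are illustrative assumptions; the DTU evaluation protocol has its own details (such as masking and outlier thresholds) that are not reproduced here, and this is not the speaker's evaluation code.

```python
import math

def directed_mean_nn(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: average of the two directed terms.

    Brute force, O(len(a) * len(b)); real evaluations use a spatial index.
    """
    return 0.5 * (directed_mean_nn(a, b) + directed_mean_nn(b, a))

# Hypothetical toy point sets: a reconstruction and a reference scan.
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0)]

if __name__ == "__main__":
    print(chamfer_distance(reconstructed, reference))
```

A lower value means the reconstructed surface points lie closer to the reference points, and vice versa.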
"Stochastic Multivariate Universal-Radix Finite-State Machine: A New and Hardware-Friendly Architecture for Multivariate Nonlinear Function Approximation"
Prof. Ngai WONG (HKU)
Abstract
Nonlinear relationships are critical for modelling complex systems, but they often come at the cost of higher hardware complexity and computational overhead. To address this issue, stochastic computing has emerged as a promising solution that leverages probabilistic bitstreams to generate nonlinear functions with reduced hardware complexity and energy consumption. In this talk, we present SMURF, a first-of-its-kind Stochastic Multivariate Universal-Radix Finite-state machine that uses stochastic computing to approximate multivariate nonlinear functions with tunable accuracy.
We will discuss the theoretical underpinnings of SMURF and provide practical insights into its hardware implementation. Moreover, we will compare SMURF's performance against conventional Taylor-series approximations and look-up table schemes through a series of experiments. Our results demonstrate that SMURF outperforms these schemes in terms of both accuracy and hardware simplicity.
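As background on the probabilistic bitstreams mentioned above, here is a minimal sketch of the basic stochastic computing primitive, not of SMURF itself. It assumes unipolar encoding, where a value in [0, 1] is the probability that a bit is 1; under that encoding, a bitwise AND of two independent streams estimates the product. All function names, the stream length, and the seed are illustrative assumptions.

```python
import random

def encode(p, length, rng):
    """Unipolar encoding: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sc_multiply(x, y, length=4096, seed=0):
    """Estimate x * y by ANDing two independently generated bitstreams."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    a = encode(x, length, rng)
    b = encode(y, length, rng)
    return decode([u & v for u, v in zip(a, b)])

if __name__ == "__main__":
    for n in (64, 1024, 16384):
        print(n, sc_multiply(0.5, 0.4, length=n))  # exact product is 0.2
```

In this basic scheme a single AND gate replaces a multiplier, and longer streams give a lower-variance estimate at the price of more clock cycles.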