Workshop #1 (RP1)
Advanced AI Chip Architecture
"Next-Generation Memory and Computing Enabled by 3D Integration Technology"
Prof. Jianguo YANG (Fudan University)
Abstract
The rapid advancement of large language models and artificial intelligence has created unprecedented demands for computing density and memory bandwidth in next-generation architectures. Traditional integration faces a severe “memory wall” bottleneck and struggles to meet the requirements of high throughput and energy efficiency. 3D integration offers a crucial solution by vertically stacking chips with high-density interconnects. This talk focuses on 3D integration as an enabling approach and presents our team’s latest progress in high-bandwidth memory, near-memory computing architectures, and heterogeneous integration processes. We demonstrate the co-design of 3D-stacked computing arrays and memory chips, which effectively overcomes bandwidth and power-efficiency constraints, thereby laying a solid foundation for next-generation high-performance, low-energy computing systems.
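The “memory wall” constraint described above can be illustrated with a simple roofline-style estimate: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity. The figures below are hypothetical placeholders for illustration, not numbers from the talk.

```python
# Roofline-style sketch of the "memory wall" (all figures hypothetical).

def attainable_tflops(peak_tflops, bandwidth_tb_s, flops_per_byte):
    """Attainable throughput = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)

# A memory-bound workload (e.g. LLM token generation at ~1 FLOP/byte):
planar = attainable_tflops(peak_tflops=100.0, bandwidth_tb_s=1.0, flops_per_byte=1.0)
stacked = attainable_tflops(peak_tflops=100.0, bandwidth_tb_s=10.0, flops_per_byte=1.0)
print(planar, stacked)  # bandwidth, not peak compute, sets the ceiling in both cases
```

At low arithmetic intensity the ceiling scales with bandwidth alone, which is why vertically stacked memory with high-density interconnects, rather than more compute, is the lever emphasized in the abstract.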
"An Energy-Efficient Accelerator for PNN-based Semantic LiDAR-SLAM and 3D Interaction in Autonomous Vehicles"
Prof. Kyuho LEE (Yonsei University)
Abstract
Emerging mobile robots require SLAM systems with advanced 3-D perception and long-range 360-degree interaction. While LiDAR offers precise depth and environmental robustness, real-time SoC implementation of semantic LiDAR SLAM remains challenging due to the memory-intensive and compute-intensive nature of executing multiple algorithms simultaneously, which often overwhelms even high-performance CPU+GPU setups. This talk presents the LSPU, a fully integrated, real-time semantic LiDAR SLAM processor featuring the LP-SLAM system. It provides PNN-based 3-D segmentation, localization, and mapping simultaneously through four key architectural innovations:
1. K-Nearest Neighbor Cluster: Uses 2-D/3-D spherical coordinate-based bin searching and dynamic memory allocation to eliminate external memory access.
2. PNN Engine: Features a global point-level task scheduler that maximizes core utilization via two-step workload balancing.
3. Keypoint Extraction Core: Optimizes sorting operations to skip redundant computations.
4. Optimization Cluster: Supports reconfigurable computation modes for keypoint-level pipelining and parallel processing in non-linear optimization.

By integrating these features, the LSPU achieves the real-time performance necessary for autonomous driving systems, which was previously unattainable with conventional architectures.
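The bin-searching idea in item 1 can be sketched as follows: each point is hashed into an (azimuth, elevation) bin, and neighbor candidates are drawn only from the query's bin and its adjacent bins, avoiding a full scan of the cloud. The bin counts, function names, and 3×3 neighborhood below are assumptions for illustration, not the LSPU's actual design.

```python
import math
from collections import defaultdict

# Hedged sketch of spherical-coordinate bin searching for point-cloud
# neighbor lookup (bin sizes and names are illustrative assumptions).

def spherical_bin(point, az_bins=360, el_bins=64):
    """Map an (x, y, z) point to an (azimuth, elevation) bin index."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    az = math.atan2(y, x)                     # azimuth in [-pi, pi)
    el = math.asin(z / r) if r > 0 else 0.0   # elevation in [-pi/2, pi/2]
    ai = int((az + math.pi) / (2 * math.pi) * az_bins) % az_bins
    ei = min(int((el + math.pi / 2) / math.pi * el_bins), el_bins - 1)
    return ai, ei

def build_bins(points):
    """Bucket every point into its spherical bin."""
    bins = defaultdict(list)
    for p in points:
        bins[spherical_bin(p)].append(p)
    return bins

def neighbor_candidates(bins, query):
    """Gather candidates from the query's bin and the 8 adjacent bins only."""
    ai, ei = spherical_bin(query)
    cands = []
    for da in (-1, 0, 1):
        for de in (-1, 0, 1):
            cands.extend(bins.get(((ai + da) % 360, ei + de), []))
    return cands
```

Restricting the search to a small, angularly local neighborhood is what lets such a scheme keep the working set on-chip, in the spirit of the abstract's claim of eliminating external memory access.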