
Workshop #4

Hardware-Software Co-design

"Automatic Optimization for Efficient LLM Systems"

Prof. Bingsheng HE (National University of Singapore)


Abstract

Modern LLM systems are bottlenecked not by a single layer, but by mismatches across compute, memory, and hardware. This talk explores how automatic optimization—across arithmetic units, data representation, and system execution—can close these gaps. We present a unified MAC design for mixed precision, near-optimal lossless compression for model weights, and Mars Compute, an agentic cross-platform kernel optimization engine spanning GPUs and TPUs. Together, these techniques achieve substantial improvements in efficiency, utilization, and scalability.

 


Prof. Bingsheng HE (National University of Singapore)

"The Physical Intelligence of Legged Robots"

Prof. Peng LU (The University of Hong Kong)


Abstract

Unlike traditional wheeled robots, legged robots leverage sophisticated reinforcement learning algorithms and compliant structures that enable them to navigate challenging environments, mimic biological locomotion, and respond dynamically to external disturbances. This talk will introduce several developments in legged robots. It will emphasize how to improve the agility, dexterity, stability, and safety of these robots so that they can better adapt to their environment, perform more complex tasks, protect themselves when executing tasks under failure conditions, and avoid harming their surroundings.

 

Prof. Peng LU (HKU)

"Towards Efficient LLM Inference with Speculative Computation"

Prof. Meng LI (Peking University)


Abstract

The rapid development of large language models (LLMs), e.g., ChatGPT, has brought significant technological innovations to fields such as Natural Language Processing (NLP) and multi-modal AI. However, the autoregressive decoding nature of LLMs, together with the scaling law, leads to severe memory and bandwidth bottlenecks, particularly for emerging architectures and agent paradigms. Speculative computation provides a viable way to reduce decoding iterations and significantly mitigate these bandwidth bottlenecks by decoupling draft generation from target verification.
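As a rough illustration of the draft/verify decoupling the abstract describes, the toy Python sketch below runs a cheap draft model for several tokens per round and lets the target model verify the proposals in a single pass over them. The `target` and `draft` callables are hypothetical stand-ins for real language models (here, deterministic greedy next-token functions), and greedy acceptance is a simplification of the probabilistic verification schemes used in practice; this is a sketch of the idea, not any speaker's actual system.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=16):
    """Greedy speculative decoding sketch.

    `target` and `draft` map a token sequence to the next token
    (toy stand-ins for real LMs). Each round, the draft proposes
    `k` tokens; the target keeps the longest prefix it agrees
    with, then appends its own token (a correction on mismatch,
    or a "bonus" token when all k proposals are accepted).
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft proposes k tokens autoregressively (cheap model).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies: accept the longest matching prefix.
        for t in proposal:
            expected = target(tokens)
            if t == expected:
                tokens.append(t)          # accepted draft token
            else:
                tokens.append(expected)   # target's correction
                break
        else:
            # All k accepted; target emits one bonus token for free.
            tokens.append(target(tokens))
    return tokens[len(prompt):][:max_new]
```

With greedy acceptance, the output is identical to plain greedy decoding from the target alone; the draft only changes how many target invocations are needed per emitted token, which is where the bandwidth savings come from.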

 

Prof. Meng LI (Peking University)
