
Workshop #4

Hardware-Software Co-design

"Automatic Optimization for Efficient LLM Systems"

Prof. Bingsheng HE (National University of Singapore)


Abstract

Modern LLM systems are bottlenecked not by a single layer, but by mismatches across compute, memory, and hardware. This talk explores how automatic optimization—across arithmetic units, data representation, and system execution—can close these gaps. We present a unified MAC design for mixed precision, near-optimal lossless compression for model weights, and Mars Compute, an agentic cross-platform kernel optimization engine spanning GPUs and TPUs. Together, these techniques achieve substantial improvements in efficiency, utilization, and scalability.

 


Prof. Bingsheng HE (National University of Singapore)

"The Physical Intelligence of Legged Robots"

Prof. Peng LU (The University of Hong Kong)


Abstract

Unlike traditional wheeled robots, legged robots leverage sophisticated reinforcement learning algorithms and compliant structures that enable them to navigate challenging environments, mimic biological locomotion, and respond dynamically to external disturbances. This talk will introduce several developments in legged robots. It will emphasize how to improve the agility, dexterity, stability, and safety of these robots so that they can better adapt to their environment, perform more complex tasks, protect themselves when executing tasks under failure conditions, and avoid harming their surroundings.

 

Prof. Peng LU (HKU)

"Towards Efficient LLM Inference with Speculative Computation"

Prof. Meng LI (Peking University)


Abstract

The rapid development of large language models (LLMs), e.g., ChatGPT, has brought significant technological innovations to fields such as Natural Language Processing (NLP) and multi-modal AI. However, the autoregressive decoding nature of LLMs, together with the scaling law, leads to severe memory and bandwidth bottlenecks, particularly for emerging architectures and agent paradigms. Speculative computation provides a viable way to reduce decoding iterations and significantly mitigate these bandwidth bottlenecks by decoupling draft generation from target verification.
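As a rough illustration of the draft/verify decoupling the abstract describes, the toy Python sketch below runs a cheap draft model for several tokens per round and lets the target model verify the proposals in a single pass over them. The `target` and `draft` callables are hypothetical stand-ins for real language models (here, deterministic greedy next-token functions), and greedy acceptance is a simplification of the probabilistic verification schemes used in practice; this is a sketch of the idea, not any speaker's actual system.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=16):
    """Greedy speculative decoding sketch.

    `target` and `draft` map a token sequence to the next token
    (toy stand-ins for real LMs). Each round, the draft proposes
    `k` tokens; the target keeps the longest prefix it agrees
    with, then appends its own token (a correction on mismatch,
    or a "bonus" token when all k proposals are accepted).
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft proposes k tokens autoregressively (cheap model).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies: accept the longest matching prefix.
        for t in proposal:
            expected = target(tokens)
            if t == expected:
                tokens.append(t)          # accepted draft token
            else:
                tokens.append(expected)   # target's correction
                break
        else:
            # All k accepted; target emits one bonus token for free.
            tokens.append(target(tokens))
    return tokens[len(prompt):][:max_new]
```

With greedy acceptance, the output is identical to plain greedy decoding from the target alone; the draft only changes how many target invocations are needed per emitted token, which is where the bandwidth savings come from.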

 

Prof. Meng LI (Peking University)
