
Doctoral Dissertation Oral Defense, Shaoyi Huang

Monday, July 8, 2024 2:00–3:00 PM
  • Description
    Abstract: In recent years, significant advancements in artificial intelligence have been driven by the development of Deep Neural Networks (DNNs) and Transformer-based models, including BERT, GPT-3, and other Large Language Models (LLMs). These technologies have catalyzed innovations in fields such as autonomous driving, recommendation systems, and chatbot applications. The models are increasingly designed with deeper, more complex structures and require ever larger computational resources. As computational demands escalate, model sparsification has emerged as a promising method for reducing model size and computational load during execution. Even given the evolution of high-performance computing platforms, particularly advanced GPUs, end-to-end DNN runtime speedup with model sparsification is an ideal but difficult goal due to the intricacies of sparsity, which may require changes to matrix and kernel settings. In this dissertation, I present my work on model inference and training acceleration at both the algorithm and hardware levels. It focuses on three innovative aspects: (1) an advanced sparse progressive pruning method, which shows for the first time that reducing the risk of overfitting can improve the effectiveness of pruning on language models; (2) a novel self-attention architecture with attention-specific primitives and an attention-aware pruning design for accelerating inference in Transformer-based models; (3) recent work on sparse training via weight-importance exploitation and weight-coverage exploration, which unlocks the sparsity potential of CNN and GNN models, enabling them to reach extremely high sparsity, together with its application to spiking neural networks. (A minimal pruning sketch, for illustration, appears after the event details below.)
  • Website
    https://events.uconn.edu/engineering/event/67167-doctoral-dissertation-oral-defense-shaoyi-huang
  • Categories
    Conferences & Speakers
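
The abstract centers on model sparsification, i.e., zeroing out weights to shrink model size and compute. As a rough illustration of the basic idea only (not the methods defended in this dissertation), below is a minimal sketch of unstructured magnitude pruning, assuming PyTorch; the helper name magnitude_prune, the 90% sparsity level, and the toy model are hypothetical choices for illustration.

    # Minimal sketch of unstructured magnitude pruning (illustrative only).
    import torch
    import torch.nn as nn

    def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> None:
        """Zero out the smallest-magnitude weights in every Linear layer."""
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight.data
                k = int(w.numel() * sparsity)  # number of weights to zero out
                if k == 0:
                    continue
                # k-th smallest absolute value serves as the pruning threshold
                threshold = w.abs().flatten().kthvalue(k).values
                mask = w.abs() > threshold     # keep only larger-magnitude weights
                module.weight.data *= mask     # apply the sparsity mask in place

    # Usage on a toy model: roughly 90% of each Linear layer's weights become zero.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
    magnitude_prune(model, sparsity=0.9)
    nonzero = model[0].weight.count_nonzero().item() / model[0].weight.numel()
    print(f"nonzero fraction after pruning: {nonzero:.2f}")

Note that zeroed weights alone do not speed up execution on GPUs; as the abstract points out, realizing end-to-end runtime gains additionally requires sparse matrix formats and kernels suited to the resulting sparsity pattern.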