About
I am a third-year Ph.D. student in the Microelectronics Thrust at the Hong Kong University of Science and Technology (Guangzhou), supervised by Prof. Jiayi Huang and Prof. Xiaowen Chu. My research interests lie in ML Systems, Efficient AI, and HPC, with a particular focus on system-algorithm co-design for Mixture-of-Experts models.
Awards
2025 - 🏅 HKUST(GZ) FUNH Dean's Award for PhD Research Excellence 2025/26, Merit Prize
2021 - 🥇 The 5th "Sunway Cup" China Parallel Application Challenge on Domestic CPU (CPC2021), National Champion
2021 - 🥉 The 9th "Intel Cup" Parallel Application Challenge (PAC2021), Third Prize
Publications
[MLSys '26] FlexTrain: Scalable Hybrid-Parallel Training with Elastic Resource Utilization and Consistent Accuracy
Weilin Cai*, Diandian Gu*, Baoquan Zhong*, Jun Wang*, Zhuolin Zheng*, Gaohong Liu, Kaihua Jiang, Shuguang Wang, Wencong Xiao, Jiayi Huang (*: Equal contribution)
Ninth Annual Conference on Machine Learning and Systems, May 2026.
[ICLR '26] Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Shwai He, Weilin Cai, Jiayi Huang, Ang Li
The Fourteenth International Conference on Learning Representations, April 2026.
[MICRO '25] Optimizing All-to-All Collective Communication with Fault Tolerance on Torus Networks
Le Qin, Junwei Cui, Weilin Cai, Meng Niu, Yan Yang, Jiayi Huang
IEEE/ACM International Symposium on Microarchitecture, October 2025.
[ICML '25] Shortcut-Connected Expert Parallelism for Accelerating Mixture of Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
Forty-Second International Conference on Machine Learning, July 2025.
Impact: Adopted in Meituan's LongCat-Flash language model
[ISCA '25] Chimera: Communication Fusion for Hybrid Parallelism in Large Language Models
Le Qin, Junwei Cui, Weilin Cai, Jiayi Huang
ACM/IEEE International Symposium on Computer Architecture, June 2025.
Distinguished Artifact Award
[ASPLOS '25] MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training
Weilin Cai, Le Qin, Jiayi Huang
ACM International Conference on Architectural Support for Programming Languages and Operating Systems, April 2025.
[TKDE '25] A Survey on Mixture of Experts in Large Language Models
Weilin Cai*, Juyong Jiang*, Fan Wang*, Jing Tang, Sunghun Kim, Jiayi Huang (*: Equal contribution)
IEEE Transactions on Knowledge and Data Engineering, 37(7):3896–3915, March 2025.
[NPC '22] Flexible Supervision System: A Fast Fault-Tolerance Strategy for Cloud Applications in Cloud-Edge Collaborative Environments
Weilin Cai, Heng Chen, Zhimin Zhuo, Ziheng Wang, Ninggang An
IFIP International Conference on Network and Parallel Computing, 2022.
[TJSC '22] Implementation and Optimization of ChaCha20 Stream Cipher on Sunway TaihuLight Supercomputer
Weilin Cai, Heng Chen, Ziheng Wang, Xingjun Zhang
The Journal of Supercomputing, pp. 4199–4216, Springer, 2022.
Preprints
[ArXiv '25] DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang
arXiv preprint arXiv:2508.18376. 2025.
[ArXiv '25] Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
Zhibin Wang, Zhonghui Zhang, Yuhang Zhou, Zibo Wang, Mo Zhou, Peng Jiang, Weilin Cai, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian
arXiv preprint arXiv:2508.21706. 2025.
