About
I’m a performance engineer focused on squeezing every last cycle out of GPU workloads. My interests span high-performance computing, deep learning optimization, and computer architecture: pushing state-of-the-art models to peak performance across speech recognition, machine translation, image classification, and generative AI.
This blog is where I document my learning notes in my spare time: GPU performance insights, source code deep dives, and hard-won optimization techniques. I also build developer tools in Python, CUDA, and PyTorch to make deep learning research faster and more productive.
Deep Learning Models
Selected training optimizations I’ve contributed to:
- MLPerf Flux.1 (2025): MLPerf Training Benchmark Suite, round v5.1
- MLPerf Stable Diffusion (2023–2025): MLPerf Training Benchmark Suite, rounds v3.1–v5.0
- SE(3)-Transformer (2022): DGLPyTorch/DrugDiscovery/SE3Transformer
- EfficientNet & EfficientDet (2020–2021): TensorFlow2/Classification/ConvNets, PyTorch/Detection/Efficientdet
- MLPerf GNMT (2018–2020): MLPerf Training Benchmark Suite, rounds v0.5–v0.7
Open Source Contributions
Key deep learning building blocks I’ve developed:
- Focal Loss (2021): apex/contrib/focal_loss (see the sketch after this list)
- Distributed Fused Adam (2019): DistributedFusedAdam
- Softmax Cross Entropy & Label Smoothing (2019): apex/contrib/xentropy
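For readers unfamiliar with focal loss, here is a minimal PyTorch sketch of the standard formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). This is not the apex/contrib implementation (which is a fused CUDA kernel with its own interface); the function name, signature, and alpha/gamma defaults below are illustrative only.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Multi-class focal loss: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    logits:  (N, C) raw class scores
    targets: (N,)   integer class labels
    """
    log_p = F.log_softmax(logits, dim=-1)                         # (N, C) log-probabilities
    log_pt = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log p_t of the true class
    pt = log_pt.exp()
    # The modulating factor (1 - p_t)^gamma down-weights easy, well-classified
    # examples so training focuses on the hard ones.
    return (-alpha * (1.0 - pt).pow(gamma) * log_pt).mean()

# Example: 8 samples, 5 classes
loss = focal_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)))
```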
Contact
Feel free to reach out via Zhihu or LinkedIn, or leave a comment on any post.
The views and opinions expressed in this blog are my own and do not represent those of my employer, NVIDIA.