I’m a senior engineer at NVIDIA, focused on accelerating Deep Learning training applications on modern GPUs. My current research interests include High Performance Computing, Artificial Intelligence, and Computer Architecture. I’m actively working on pushing the performance of popular DL models to industry-leading levels, in areas such as Self-Driving Cars, Diffusion Models, and large-scale training. In my spare time, I’m a keen tool developer working with Python, CUDA, and PyTorch, aiming to improve the efficiency of both daily work and Deep Learning.

Some of the Deep Learning training models I have optimized that are publicly available:

Some of the key Deep Learning building blocks I have developed or researched that are publicly available:

You can contact me on Zhihu or LinkedIn, or by leaving a comment.