
About

I’m a performance engineer focused on squeezing every last cycle out of GPU workloads. My interests span high-performance computing, deep learning optimization, and computer architecture: pushing state-of-the-art models to peak performance across speech recognition, machine translation, image classification, and generative AI.

This blog is where I document my learning notes in my spare time: GPU performance insights, source code deep dives, and hard-won optimization techniques. I also build developer tools in Python, CUDA, and PyTorch to make deep learning research faster and more productive.

🚀 Deep Learning Models

Selected training optimizations I’ve contributed to:

🔧 Open Source Contributions

Key deep learning building blocks I’ve developed:

📬 Contact

Feel free to reach out via Zhihu, LinkedIn, or leave a comment on any post.


The views and opinions expressed in this blog are my own and do not represent those of my employer, NVIDIA.