Sheng Zha

I build foundation models, open-source AI frameworks, and the teams behind them.

Head of Model Architecture & Training Advancement at Amazon AGI. VP of Apache MXNet. Founder of GluonNLP. Algorithm-systems co-designer.

About

I lead the research and development of foundation models at Amazon AGI, where I focus on the co-evolution of algorithms and systems—the critical intersection that makes AI more capable, efficient, and accessible. My team built the models behind Amazon Nova, Amazon Q, Titan, and the distributed training infrastructure powering Amazon Bedrock and SageMaker HyperPod.

I built this team from zero, starting in 2018 with a focus on distributed training and shared representations. What started as a small group of tech leaders and hackers grew into the engine behind foundation models serving millions through AWS.

Before that, I shaped the open-source AI ecosystem as VP and PMC Chair of Apache MXNet, where I co-authored the Gluon interface. I founded GluonNLP—the first toolkit to reproduce BERT with record-setting training speeds. I served on the ONNX Steering Committee and co-founded the Python Data API Standards Consortium. I believe accessible tools and open standards are essential for an AI future that benefits everyone.

I hold an MS in Computer Science from the University of Maryland and a BS from Shanghai Jiao Tong University.


What I’ve Built

Amazon Nova & Foundation Model Stack

Amazon AGI, 2024–present

Leading model architecture and training advancement for Amazon's next-generation foundation models. Driving the co-design of algorithms and systems to reduce the cost of intelligence across the stack.

Foundation Models for AWS AI Services

AWS, 2018–2024

Built the team and models from zero. Developed and deployed foundation models underpinning Amazon Q (CodeWhisperer), Titan, Lex, Comprehend, and Kendra. Led LLM pre-training, RLHF, and large-scale distributed training.

Distributed Training Infrastructure

AWS, 2018–2024

Contributed core technology to the scalable, fault-resilient training infrastructure behind Amazon Bedrock and SageMaker HyperPod, designing systems for large-scale training with efficient resource utilization.

Gluon Interface for Apache MXNet

Apache MXNet, 2016–2018

Co-authored the Gluon API—an imperative, Pythonic interface for deep learning that became the standard for MXNet. Made deep learning more accessible to researchers and engineers.

GluonNLP

Founded, 2018

Created a deep-learning NLP toolkit for the Gluon interface. First to reproduce BERT with record-setting training speeds, accelerating NLP research across the community.

ML Platform for Fraud Detection

Amazon TRMS, 2013–2015

Designed a horizontally scalable machine learning platform and graph-based ML solutions for fraud and abuse detection. Built high-availability key-value stores with an expressive transformation DSL for real-time feature engineering.


Selected Publications

Dive into Deep Learning for Natural Language Processing

Tutorial · EMNLP

Differentially Private Pre-training with Limited Data

Privacy-preserving foundation model training

Sequence-Level Training for Large Language Models

LLM training methodology

Sparse Mixture of Expert Models for Code Completion

Efficient model architectures


Talks & Interviews

Deep Learning Lectures

JSALT 2018 · Johns Hopkins University

Dive into Deep Learning for NLP

Conference Tutorial

More talks and podcast appearances coming soon.


Open Source