Tips and Tricks
Getting Started at CAI (Students)
Some places to start learning AI & ML by Thilo Stadelmann
A Recipe for Training Neural Networks by Andrej Karpathy
Understanding Deep Learning (Book) by Simon J.D. Prince
Advice for (prospective) students by Thilo Stadelmann
When to publish one’s work by Thilo Stadelmann
Writing a draft by Thilo Stadelmann
Great methodology delivers great theses by Thilo Stadelmann
For the PhD students:
- A Survival Guide to a PhD by Andrej Karpathy
- Just know stuff. (Or, how to achieve success in a machine learning PhD.) by Patrick Kidger
Deep Learning: Advanced
Improve model speed and/or performance:
Speed up your model: Making Deep Learning Go Brrrr From First Principles by Horace He
PyTorch Performance Tuning Guide (Video): PyTorch Performance Tuning Guide by Szymon Migacz (NVIDIA), some notes from this video below:
- `DataLoader` has bad default settings: tune `num_workers > 0` and default to `pin_memory = True`
- use `torch.backends.cudnn.benchmark = True` to autotune the cuDNN kernel choice
- max out the batch size for each GPU to amortize compute
- do not forget `bias=False` in weight layers before `BatchNorm` layers; the bias is a no-op there that bloats the model
- use `for p in model.parameters(): p.grad = None` instead of `model.zero_grad()`
- be careful to disable debug APIs in production (`detect_anomaly`, `profiler`, `emit_nvtx`, `gradcheck`, …)
- use `DistributedDataParallel`, not `DataParallel`, even if not running distributed
- be careful to load-balance compute across all GPUs if inputs are variably sized, or GPUs will idle
- use an `apex` fused optimizer (the default PyTorch optimizer for-loop iterates over individual parameters, yikes)
- use checkpointing to recompute memory-intensive but compute-efficient ops in the backward pass (e.g. activations, upsampling, …)
- use `@torch.jit.script`, e.g. to fuse long sequences of pointwise operations like in `GELU`
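A few of the tips above can be sketched in one short training-setup snippet. This is an illustrative sketch, not code from the talk: the dataset, model, and sizes are made up, and `pin_memory`/`num_workers` values depend on your hardware.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Autotune cuDNN kernel choice (helps when input sizes are fixed)
torch.backends.cudnn.benchmark = True

# Toy dataset, stand-in for a real one
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=2,    # > 0: load batches in background worker processes
    pin_memory=True,  # faster host-to-GPU copies
)

model = nn.Sequential(
    nn.Linear(16, 32, bias=False),  # bias is redundant right before BatchNorm
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x, y = next(iter(loader))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Zero gradients by dropping them instead of filling zero tensors
for p in model.parameters():
    p.grad = None
```

On recent PyTorch versions, `optimizer.zero_grad(set_to_none=True)` achieves the same effect as the final loop.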
A good example of how to train efficiently (i.e. training GPT-2 on our A100 cluster in 38 hours with 300 lines of code): nanoGPT by Andrej Karpathy
How to tune your deep learning models (maximize performance, hyperparameter tuning): Deep Learning Tuning Playbook by Godbole et al. (Google Brain, Harvard)
PyTorch Profilers: Introducing PyTorch Profiler - the new and improved performance tool, PyTorch Trace Analysis for the Masses, and Performance Debugging of Production PyTorch Models at Meta
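Before digging into the articles above, the basic `torch.profiler` workflow fits in a few lines. This is a minimal sketch with a made-up model and input; on a GPU machine you would also add `ProfilerActivity.CUDA`.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

model = nn.Linear(128, 64)   # illustrative model
x = torch.randn(32, 128)

# Record op-level timings and input shapes for one forward pass
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Summarize the most expensive ops
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```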