Tips and Tricks
Getting Started at CAI (Students)
Some places to start learning AI & ML by Thilo Stadelmann
A Recipe for Training Neural Networks by Andrej Karpathy
Understanding Deep Learning (Book) by Simon J.D. Prince
Advice for (prospective) students by Thilo Stadelmann
When to publish one’s work by Thilo Stadelmann
Writing a draft by Thilo Stadelmann
Great methodology delivers great theses by Thilo Stadelmann
For the PhD students:
- A Survival Guide to a PhD by Andrej Karpathy
- Just know stuff. (Or, how to achieve success in a machine learning PhD.) by Patrick Kidger
Deep Learning: Advanced
Improve model speed and/or performance:
Speed up your model: Making Deep Learning Go Brrrr From First Principles by Horace He
PyTorch Performance Tuning Guide (Video): PyTorch Performance Tuning Guide by Szymon Migacz (NVIDIA), some notes from this video below:
- `DataLoader` has bad default settings: tune `num_workers > 0` and default to `pin_memory = True`
- use `torch.backends.cudnn.benchmark = True` to autotune the cuDNN kernel choice
- max out the batch size for each GPU to amortize compute
- do not forget `bias=False` in weight layers before `BatchNorm` layers; the bias is a no-op there that bloats the model
- use `for p in model.parameters(): p.grad = None` instead of `model.zero_grad()`
- be careful to disable debug APIs in production (`detect_anomaly`, `profiler`, `emit_nvtx`, `gradcheck`, …)
- use `DistributedDataParallel`, not `DataParallel`, even if not running distributed
- be careful to load-balance compute across all GPUs if inputs are variably sized, or GPUs will idle
- use an `apex` fused optimizer (the default PyTorch optimizer for-loop iterates over individual parameters, yikes)
- use checkpointing to recompute memory-intensive but compute-efficient ops in the backward pass (e.g. activations, upsampling, …)
- use `@torch.jit.script`, e.g. to fuse long sequences of pointwise operations like in `GELU`
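A few of the tips above can be sketched in one short training-setup snippet. This is an illustrative sketch, not code from the talk: the dataset, model, and sizes are made up, and `pin_memory`/`num_workers` values depend on your hardware.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Autotune cuDNN kernel choice (helps when input sizes are fixed)
torch.backends.cudnn.benchmark = True

# Toy dataset, stand-in for a real one
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=2,    # > 0: load batches in background worker processes
    pin_memory=True,  # faster host-to-GPU copies
)

model = nn.Sequential(
    nn.Linear(16, 32, bias=False),  # bias is redundant right before BatchNorm
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

x, y = next(iter(loader))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

# Zero gradients by dropping them instead of filling zero tensors
for p in model.parameters():
    p.grad = None
```

On recent PyTorch versions, `optimizer.zero_grad(set_to_none=True)` achieves the same effect as the final loop.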
A good example of how to train efficiently (i.e. training GPT-2 on our A100 cluster in 38 hours with 300 lines of code): nanoGPT by Andrej Karpathy
How to tune your deep learning models (maximize performance, hyperparameter tuning): Deep Learning Tuning Playbook by Godbole et al. (Google Brain, Harvard)
PyTorch Profilers: Introducing PyTorch Profiler - the new and improved performance tool, PyTorch Trace Analysis for the Masses, and Performance Debugging of Production PyTorch Models at Meta
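Before digging into the articles above, the basic `torch.profiler` workflow fits in a few lines. This is a minimal sketch with a made-up model and input; on a GPU machine you would also add `ProfilerActivity.CUDA`.

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile

model = nn.Linear(128, 64)   # illustrative model
x = torch.randn(32, 128)

# Record op-level timings and input shapes for one forward pass
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

# Summarize the most expensive ops
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```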