Tips and Tricks

Getting Started at CAI (Students)

For the PhD students:

  • A Survival Guide to a PhD by Andrej Karpathy

  • Just know stuff. (Or, how to achieve success in a machine learning PhD.) by Patrick Kidger

Deep Learning: Advanced

Improve model speed and/or performance:

  • Speed up your model: Making Deep Learning Go Brrrr From First Principles by Horace He

  • PyTorch Performance Tuning Guide (video): PyTorch Performance Tuning Guide by Szymon Migacz (NVIDIA); some notes from the talk below:

    • DataLoader has suboptimal default settings: tune num_workers > 0 and default to pin_memory = True when training on GPU

    • use torch.backends.cudnn.benchmark = True to autotune cudnn kernel choice

    • max out the batch size for each GPU to amortize compute

    • do not forget bias = False in weight layers that feed into BatchNorm layers; the bias is a no-op there (BatchNorm re-centers the activations anyway) and only bloats the model

    • use for p in model.parameters(): p.grad = None instead of model.zero_grad() (recent PyTorch versions also offer model.zero_grad(set_to_none=True))

    • be careful to disable debug APIs in production (detect_anomaly, profiler, emit_nvtx, gradcheck, …)

    • use DistributedDataParallel, not DataParallel, even when running on a single machine

    • be careful to load-balance compute across all GPUs when inputs vary in size, or some GPUs will idle

    • use an apex fused optimizer (the default PyTorch optimizers loop over individual parameters, which is slow)

    • use gradient checkpointing to recompute memory-intensive but cheap-to-recompute ops in the backward pass (e.g. activations, upsampling, …)

    • use @torch.jit.script, e.g. to fuse long sequences of pointwise operations like those in GELU

  • A good example of efficient training (i.e., training GPT-2 on our A100 cluster in 38 hours with ~300 lines of code): nanoGPT by Andrej Karpathy

  • How to tune your deep learning models (maximize performance, hyperparameter tuning): Deep Learning Tuning Playbook by Godbole et al. (Google Brain, Harvard)

  • PyTorch Profilers: Introducing PyTorch Profiler - the new and improved performance tool, PyTorch Trace Analysis for the Masses, and Performance Debugging of Production PyTorch Models at Meta
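Several of the talk's tips above can be combined in a few lines of setup code. The sketch below is a minimal, CPU-runnable illustration (the model, tensor shapes, and learning rate are placeholders, not recommendations):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Autotune cuDNN kernel choice (helps most when input shapes are fixed).
torch.backends.cudnn.benchmark = True

# bias=False: BatchNorm re-centers activations, so a conv bias before it is a no-op.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 2),
)

# DataLoader: num_workers > 0 for asynchronous loading; pin memory only when a GPU is present.
data = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8, num_workers=2,
                    pin_memory=torch.cuda.is_available())

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for xb, yb in loader:
    # Set grads to None instead of zeroing (avoids a memset per parameter).
    for p in model.parameters():
        p.grad = None
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
```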
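The checkpointing and @torch.jit.script tips can also be sketched concretely. Below, a small MLP block (sizes are arbitrary) is wrapped in torch.utils.checkpoint so its activations are recomputed in the backward pass, and the tanh approximation of GELU, a long chain of pointwise ops, is scripted so TorchScript can fuse it:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# TorchScript can fuse this long chain of pointwise ops (tanh GELU approximation).
@torch.jit.script
def fused_gelu(x):
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608 * (x + 0.044715 * x * x * x)))

# Checkpointing: do not store this block's activations; recompute them on backward.
block = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))

x = torch.randn(8, 64, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # trades extra compute for less memory
loss = fused_gelu(y).sum()
loss.backward()
```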
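To get started with the profiler tools covered in the articles above, a minimal torch.profiler invocation looks roughly like this (the profiled workload and iteration count are arbitrary):

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(128, 128)
# Profile a few matmuls on CPU; add ProfilerActivity.CUDA when running on GPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(5):
        y = x @ x
# Aggregate statistics per operator, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The resulting traces can also be exported with prof.export_chrome_trace(...) and inspected in a trace viewer, as the linked posts demonstrate.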