Deep Learning

These notes follow Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola's Dive into Deep Learning and emphasize the book's central style: concepts, mathematics, and runnable code together. The path starts with tensors, data preparation, linear algebra, calculus, probability, and automatic differentiation, then builds complete training loops for regression and classification before moving to modern architectures.

An artificial neural network diagram shows input, hidden, and output layers connected by weights.

Figure: Layered neural networks make differentiable function approximation visible. Image: Wikimedia Commons, Cburnett, CC BY-SA 3.0/GFDL.

A grid of MNIST handwritten digits shows the small grayscale examples used in many ML tutorials.

Figure: MNIST gives classification, vision, and neural-network pages a familiar benchmark image. Image: Wikimedia Commons, Suvanjanprasai, CC BY-SA 4.0.

A simplified neural network diagram shows units connected from input to output.

Figure: A compact network diagram gives deep-learning pages a quick visual model of learned weights. Image: Wikimedia Commons, Dake and Mysid, CC BY 1.0.

The later pages cover the main deep learning families: multilayer perceptrons, convolutional networks, recurrent networks, attention, transformers, NLP applications, computer vision systems, recommender systems, GANs, reinforcement learning, Gaussian processes, and hyperparameter optimization. Code examples use PyTorch for portability. For classical context, compare these notes with the machine learning notes; for prerequisites, see the linear algebra and probability notes.

This overview diagram shows the repeated contract behind the chapter sequence. Data enters as shaped tensors, a differentiable model produces task-specific predictions, a scalar loss drives automatic differentiation, and an optimizer updates parameters for the next minibatch. The validation loop is separate from the gradient path because evaluation should measure behavior rather than train the model.
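The contract described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not code taken from the book: the synthetic data, the single `nn.Linear` model, the hyperparameters, and the helper names `train_epoch` and `validate` are all assumptions chosen to keep the example small.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data (an assumption for illustration): y = 2x - 1 + noise.
X = torch.randn(256, 1)
y = 2.0 * X - 1.0 + 0.1 * torch.randn(256, 1)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

model = nn.Linear(1, 1)                      # differentiable model
loss_fn = nn.MSELoss()                       # reduces predictions to a scalar loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_epoch(batch_size=32):
    """One pass over the training set in shuffled minibatches (hypothetical helper)."""
    model.train()
    perm = torch.randperm(X_train.shape[0])
    for start in range(0, X_train.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X_train[idx], y_train[idx]  # shaped tensors: (batch, 1)
        loss = loss_fn(model(xb), yb)        # forward pass and scalar loss
        optimizer.zero_grad()                # clear gradients from the previous step
        loss.backward()                      # automatic differentiation
        optimizer.step()                     # parameter update for the next minibatch


def validate():
    """Measure held-out loss without building a gradient graph (hypothetical helper)."""
    model.eval()
    with torch.no_grad():
        return loss_fn(model(X_val), y_val).item()


initial_val_loss = validate()
for epoch in range(20):
    train_epoch()
final_val_loss = validate()
print(f"validation loss: {initial_val_loss:.4f} -> {final_val_loss:.4f}")
```

Here `validate` runs under `torch.no_grad()` and never calls the optimizer, which is one way to keep evaluation off the gradient path: it only measures behavior on held-out data.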

  1. Tensors and Data Preprocessing
  2. Math for Deep Learning
  3. Linear Regression and Training Loops
  4. Softmax Classification and Generalization
  5. Multilayer Perceptrons and Regularization
  6. PyTorch Builders Guide
  7. Convolutional Neural Networks
  8. Modern CNNs
  9. Sequence Modeling and RNNs
  10. Gated RNNs and Sequence-to-Sequence
  11. Attention and Transformers
  12. Efficient Sequence Modeling
  13. Pretrained Transformers and BERT
  14. Optimization Algorithms
  15. Computational Performance
  16. Computer Vision Applications
  17. NLP Pretraining and Applications
  18. Generative Adversarial Networks
  19. Recommender Systems
  20. Reinforcement Learning and Bayesian Tuning