Deep Learning
These notes follow Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola's Dive into Deep Learning and emphasize the book's central style: concepts, mathematics, and runnable code together. The path starts with tensors, data preparation, linear algebra, calculus, probability, and automatic differentiation, then builds complete training loops for regression and classification before moving to modern architectures.
Figure: Layered neural networks make differentiable function approximation visible. Image: Wikimedia Commons, Cburnett, CC BY-SA 3.0/GFDL.

Figure: MNIST gives classification, vision, and neural-network pages a familiar benchmark image. Image: Wikimedia Commons, Suvanjanprasai, CC BY-SA 4.0.
Figure: A compact network diagram gives deep-learning pages a quick visual model of learned weights. Image: Wikimedia Commons, Dake and Mysid, CC BY 1.0.
The later pages cover the main deep learning families: multilayer perceptrons, convolutional networks, recurrent networks, attention, transformers, NLP applications, computer vision systems, recommender systems, GANs, reinforcement learning, Gaussian processes, and hyperparameter optimization. Code examples use PyTorch for portability. For classical context, compare these notes with the machine learning notes; for prerequisites, see the linear algebra and probability notes.
This overview diagram shows the contract that repeats across the chapter sequence. Data enters as shaped tensors, a differentiable model produces task-specific predictions, a scalar loss drives automatic differentiation, and an optimizer updates the parameters before the next minibatch. The validation loop stays off the gradient path because evaluation should measure how the model behaves on held-out data without changing its parameters.
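The contract can be sketched as a minimal PyTorch loop. This is an illustrative sketch, not code from the book. It assumes a synthetic linear-regression task, and the data, the helper names `train_epoch` and `evaluate`, and the hyperparameters (learning rate 0.03, batch size 32, five epochs) are hypothetical choices. Training computes a scalar loss, calls `backward()`, and steps the optimizer. Validation runs under `torch.no_grad()` and never touches the optimizer.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical synthetic data: y = X @ w_true + b_true + small noise
w_true = torch.tensor([2.0, -3.4])
b_true = 4.2
X = torch.randn(1000, 2)
y = (X @ w_true + b_true + 0.01 * torch.randn(1000)).reshape(-1, 1)

# Hold out the last 200 examples for validation
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

model = nn.Linear(2, 1)                # differentiable model
loss_fn = nn.MSELoss()                 # maps predictions to a scalar loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.03)

def train_epoch(model, loader, loss_fn, optimizer):
    """Gradient path: forward, scalar loss, backward, parameter update."""
    model.train()
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()          # clear gradients from the previous minibatch
        loss.backward()                # autograd fills each parameter's .grad
        optimizer.step()               # update parameters in place

def evaluate(model, X, y, loss_fn):
    """Validation path: no gradient tracking and no optimizer step."""
    model.eval()
    with torch.no_grad():
        return loss_fn(model(X), y).item()

history = []
for epoch in range(5):
    train_epoch(model, train_loader, loss_fn, optimizer)
    history.append(evaluate(model, X_val, y_val, loss_fn))
    print(f"epoch {epoch + 1}: validation loss {history[-1]:.6f}")
```

Keeping `evaluate` as a separate function makes the split in the diagram explicit. Calling it any number of times leaves the parameters unchanged, so the validation curve reports behavior without influencing training.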
- Tensors and Data Preprocessing
- Math for Deep Learning
- Linear Regression and Training Loops
- Softmax Classification and Generalization
- Multilayer Perceptrons and Regularization
- PyTorch Builders Guide
- Convolutional Neural Networks
- Modern CNNs
- Sequence Modeling and RNNs
- Gated RNNs and Sequence-to-Sequence
- Attention and Transformers
- Efficient Sequence Modeling
- Pretrained Transformers and BERT
- Optimization Algorithms
- Computational Performance
- Computer Vision Applications
- NLP Pretraining and Applications
- Generative Adversarial Networks
- Recommender Systems
- Reinforcement Learning and Bayesian Tuning