Week 1: Vision Transformers, Diffusion Models & Graph Neural Networks
Advanced deep learning at MSc level: Vision Transformers, Diffusion Models, Graph Neural Networks, and cutting-edge optimization techniques.
- Implement a Vision Transformer (ViT) from scratch
- Train a diffusion model for image generation
- Build and train a Graph Neural Network
- Implement advanced optimizers: Adam, LAMB, Lion
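To preview the flavor of the optimizer work, here is a minimal sketch of a single Lion update step (Chen et al., 2023) in NumPy. The function name, hyperparameter defaults, and NumPy (rather than PyTorch) formulation are illustrative choices, not part of the assignment spec:

```python
import numpy as np

def lion_update(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step: the update direction is the *sign* of an
    interpolated momentum, which makes every coordinate move by
    the same magnitude lr (plus decoupled weight decay)."""
    # Direction: interpolate momentum and current gradient, then take the sign.
    c = beta1 * m + (1 - beta1) * g
    w_new = w - lr * (np.sign(c) + wd * w)
    # Momentum itself is updated with a different interpolation factor.
    m_new = beta2 * m + (1 - beta2) * g
    return w_new, m_new
```

Note the contrast with Adam: Lion stores only one momentum buffer (no second-moment estimate), and the sign operation decouples step size from gradient magnitude.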
This first lecture establishes the foundational framework for Deep Learning & Neural Networks. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: Vision Transformers (ViT, Swin), diffusion models (DDPM, DDIM), Graph Neural Networks, and advanced optimization algorithms. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering Vision Transformers (ViT, Swin) and diffusion models (DDPM, DDIM). These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
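The core architectural idea behind ViT is treating an image as a sequence of flattened patches, each linearly projected to the model dimension. A minimal NumPy sketch (patch size 16 and model dimension 512 are example values; the projection matrix is random here, whereas in a real ViT it is learned):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    # (H//p, p, W//p, p, C) -> (num_patches, p*p*C)
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

img = np.random.rand(224, 224, 3)
tokens = patchify(img)                 # (196, 768): 14x14 patches of 16*16*3 values
E = np.random.randn(768, 512) * 0.02   # stand-in for the learned patch projection
embedded = tokens @ E                  # (196, 512) patch embeddings fed to the Transformer
```

Everything after this step (adding a class token, positional embeddings, and the Transformer encoder) operates on these 196 tokens exactly as in NLP.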
DS505 Project 1: Custom Diffusion Model
Implement a denoising diffusion probabilistic model (DDPM) from scratch in PyTorch. Train on a small image dataset and generate novel samples. Analyze sample quality using the FID (Fréchet Inception Distance) score.
- Full DDPM implementation in PyTorch
- Training procedure with noise schedule analysis
- Generated image samples (20+)
- FID score computation and quality analysis
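A useful starting point for the implementation is the closed-form forward process: x_t can be sampled directly from x_0 without iterating through all intermediate steps. A NumPy sketch with the linear beta schedule from the original DDPM paper (the schedule endpoints 1e-4 and 0.02 and T = 1000 follow Ho et al., 2020; your project will do this in PyTorch):

```python
import numpy as np

# Linear beta schedule, as in the original DDPM paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product: signal retained at step t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps
```

Plotting `alpha_bar` against t is the "noise schedule analysis" deliverable in miniature: it shows how quickly the signal is destroyed, and by t = T the sample is essentially pure Gaussian noise.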
These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.
Explain the forward and reverse processes in a DDPM. What is being maximized during training?
How does a Vision Transformer differ architecturally from a ResNet-50?
What is a message passing algorithm in a Graph Neural Network? Write the mathematical update rule.
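As a reference point for the last question, one common formulation of the generic message-passing scheme (the MESSAGE/AGGREGATE/UPDATE decomposition popularized by Gilmer et al.) is:

```latex
h_v^{(k+1)} \;=\; \mathrm{UPDATE}\!\left(h_v^{(k)},\;
  \bigoplus_{u \in \mathcal{N}(v)} \mathrm{MESSAGE}\!\left(h_v^{(k)}, h_u^{(k)}, e_{uv}\right)\right)
```

where $h_v^{(k)}$ is the feature vector of node $v$ at layer $k$, $\mathcal{N}(v)$ its neighbors, $e_{uv}$ optional edge features, and $\bigoplus$ a permutation-invariant aggregator (sum, mean, or max). A strong exam answer would also instantiate this for a concrete architecture such as GCN or GraphSAGE.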