Week 1: LLMs, Diffusion Models & Multimodal AI
Explore the frontier of generative AI: large language models, diffusion models, multimodal systems, RLHF, and real industry applications.
- Understand the full LLM training pipeline (pretraining → RLHF)
- Fine-tune a diffusion model for custom image generation
- Build a multimodal system (text + image)
- Evaluate generative models: FID, IS, BLEU, human eval
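Of the metrics above, BLEU is the easiest to demystify: it is clipped n-gram precision combined with a brevity penalty. The sketch below is a simplified single-reference version up to 2-grams, not the full corpus-level BLEU you would get from a library:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty, for a single reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * geo_mean

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
score = bleu(cand, ref)  # geometric mean of 5/6 and 3/5 ≈ 0.707
```

FID and IS, by contrast, operate on distributions of image features and need a pretrained vision network, so they are covered separately.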
This first lecture establishes the foundational framework for Advanced Topics: Generative AI. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course:
- LLM Training: Pretraining, SFT, RLHF
- Stable Diffusion Architecture
- Multimodal Models: CLIP, GPT-4V
- Evaluation of Generative Systems

Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering: LLM Training: Pretraining, SFT, RLHF and Stable Diffusion Architecture. These are the prerequisites for everything in Week 2. The concepts build on each other — do not skip the practice exercises.
DS601 Project 1: Fine-Tuned Generative Application
Fine-tune an open-source LLM (e.g., Mistral-7B or Llama 3) on a domain-specific dataset and build a complete application with RAG, evaluation pipeline, and deployment.
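At its core, the RAG retrieval step ranks stored chunks by embedding similarity to the query. A dependency-free sketch with toy 2-d "embeddings" (a real system would use a proper embedding model and a vector database):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, index, k=2):
    """Toy vector-store lookup: rank (doc, embedding) pairs by
    cosine similarity to the query embedding, return top-k docs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

index = [("doc_a", [1.0, 0.0]), ("doc_b", [0.0, 1.0]), ("doc_c", [0.9, 0.1])]
top = retrieve([1.0, 0.0], index, k=2)  # nearest two chunks to the query
```

The retrieved chunks are then prepended to the prompt before the fine-tuned LLM generates its answer.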
- Fine-tuned LLM with LoRA/PEFT
- RAG system with vector database
- Evaluation suite (BLEU, ROUGE, human eval)
- Production API endpoint with latency benchmarks
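The LoRA technique in the first deliverable reduces to a frozen weight plus a scaled low-rank update, h = Wx + (α/r)·B(Ax), where only A and B are trained. A toy pure-Python version (dimensions and values are illustrative; in the project you would use a library such as PEFT):

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    """LoRA forward pass: h = W x + (alpha / r) * B (A x).
    W (d_out x d_in) stays frozen; A (r x d_in) and B (d_out x r)
    are the small trainable matrices."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # rank-1 down-projection
B = [[0.5], [0.5]]             # rank-1 up-projection
h = lora_forward([1.0, 2.0], W, A, B, alpha=1.0, r=1)
```

Because A and B together hold far fewer parameters than W, fine-tuning touches only a small fraction of the model, which is the point of the exam question comparing LoRA with full fine-tuning.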
Sample Exam Questions
These questions represent the style and difficulty of what you'll see on the midterm and final. Start thinking about them now.
- Explain the three stages of LLM training: pretraining, SFT (supervised fine-tuning), and RLHF.
- What is the role of the KL divergence penalty in PPO-based RLHF training?
- Compare LoRA and full fine-tuning for adapting a 7B-parameter LLM. When would you choose each?
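For the KL-penalty question, a common formulation in PPO-based RLHF applies the penalty per token: r = r_RM − β·(log π(y|x) − log π_ref(y|x)), keeping the policy close to the reference (SFT) model. A minimal sketch (the β value is an illustrative assumption):

```python
def rlhf_reward(reward_model_score, logprob_policy, logprob_ref, beta=0.1):
    """KL-penalized reward used in PPO-based RLHF:
    r = r_RM - beta * (log pi(y|x) - log pi_ref(y|x)).
    The penalty grows as the policy drifts from the reference model,
    discouraging reward hacking."""
    kl = logprob_policy - logprob_ref
    return reward_model_score - beta * kl

# Policy assigns higher log-prob than the reference -> reward is docked.
r = rlhf_reward(1.0, -2.0, -2.5, beta=0.1)
```

Note the per-token difference of log-probs is an estimate of the KL divergence in expectation over the policy's samples, not the exact KL.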