🎓 University of Aliens — Course Portal
Data Science · DS601 · Week 1 of 14 · MSc S3 · ~50 min

Week 1: LLMs, Diffusion Models & Multimodal AI

Explore the frontier of generative AI: large language models, diffusion models, multimodal systems, RLHF, and real industry applications.

🎬 Lecture video: DS601 Lecture 1 (MSc S3) · MIT OpenCourseWare (CC BY-NC-SA)
🎯 Learning Objectives
  • Understand the full LLM training pipeline (pretraining → RLHF)
  • Fine-tune a diffusion model for custom image generation
  • Build a multimodal system (text + image)
  • Evaluate generative models: FID, IS, BLEU, human eval
Topics Covered This Lecture
LLM Training: Pretraining, SFT, RLHF
Stable Diffusion Architecture
Multimodal Models: CLIP, GPT-4V
Evaluation of Generative Systems
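As a first taste of the evaluation topic above, here is a minimal, from-scratch sketch of clipped n-gram precision, the core quantity behind BLEU. This is a toy illustration with made-up sentences; real evaluations should use an established implementation such as sacreBLEU, which adds tokenization and smoothing.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: candidate n-gram counts are capped
    by how often each n-gram appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of 1..max_n precisions times a
    brevity penalty. Smoothing is omitted for clarity."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_mean)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(cand, ref), 3))
```

Note how clipping prevents a degenerate candidate like "the the the the" from scoring well on unigram precision.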
📖 Lecture Overview

This first lecture establishes the foundational framework for Advanced Topics: Generative AI. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

Why this matters: This lecture sets up everything that follows, from LLM training through diffusion models and multimodal systems to evaluation and real industry applications. Make sure you understand the core concepts before proceeding to Week 2.

Key Concepts

The lecture introduces the four main pillars of this course:
  • LLM Training: Pretraining, SFT, RLHF
  • Stable Diffusion Architecture
  • Multimodal Models: CLIP, GPT-4V
  • Evaluation of Generative Systems
Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.

# Quick Start: verify your environment is ready for DS601
import sys
print(f"Python {sys.version}")

# Check key libraries are installed
try:
    import numpy, pandas, matplotlib
    print("✅ Core libraries ready")
except ImportError as e:
    print(f"❌ Missing: {e} — run: pip install numpy pandas matplotlib")

This Week's Focus

Focus on mastering: LLM Training: Pretraining, SFT, RLHF and Stable Diffusion Architecture. These are the prerequisites for everything in Week 2. The concepts build on each other — do not skip the practice exercises.
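To preview the diffusion side of this week's focus, below is a minimal sketch of the DDPM forward (noising) process that models like Stable Diffusion learn to invert. It uses a toy 1-D "image" and an illustrative linear beta schedule; real systems operate on latents with learned denoisers.

```python
import math
import random

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the running product of (1 - beta_s)."""
    alpha_bar = 1.0
    for beta in betas[:t]:
        alpha_bar *= (1.0 - beta)
    eps = [rng.gauss(0, 1) for _ in x0]
    x_t = [math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * e
           for x, e in zip(x0, eps)]
    return x_t, alpha_bar

# Toy linear schedule: beta grows from 1e-4 to 0.02 over 1000 steps.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]          # a toy "image"

for t in (0, 100, T):
    x_t, abar = forward_diffuse(x0, t, betas, rng)
    print(f"t={t:4d}  alpha_bar={abar:.5f}")
```

As t grows, alpha_bar shrinks toward zero, so x_t loses the signal and approaches pure Gaussian noise; the reverse (denoising) process is what the model is trained to perform.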

📋 Project 1 of 3 · 50% of Final Grade

DS601 Project 1: Fine-Tuned Generative Application

Fine-tune an open-source LLM (e.g., Mistral-7B or Llama 3) on a domain-specific dataset and build a complete application with RAG, evaluation pipeline, and deployment.

  • Fine-tuned LLM with LoRA/PEFT
  • RAG system with vector database
  • Evaluation suite (BLEU, ROUGE, human eval)
  • Production API endpoint with latency benchmarks
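To make the LoRA/PEFT deliverable concrete, here is a back-of-the-envelope trainable-parameter count for a hypothetical 7B-class model. The layer shapes and choice of adapted projections below are illustrative assumptions, not the exact Mistral-7B or Llama 3 configuration.

```python
def lora_params(d_in, d_out, r):
    """A rank-r LoRA adapter replaces the weight update dW (d_out x d_in)
    with the low-rank product B @ A, where B is d_out x r and A is r x d_in."""
    return d_out * r + r * d_in

# Illustrative 7B-class shapes: 32 layers, hidden size 4096,
# LoRA applied to the q and v projections (4096 x 4096 each).
n_layers, d, r = 32, 4096, 8
full_update = n_layers * 2 * d * d                  # full fine-tune of q,v
lora_update = n_layers * 2 * lora_params(d, d, r)   # LoRA adapters only

print(f"full : {full_update / 1e6:.1f}M trainable params")
print(f"LoRA : {lora_update / 1e6:.2f}M trainable params")
print(f"ratio: {full_update / lora_update:.0f}x fewer with LoRA")
```

This ~4M vs ~1B gap (under these assumed shapes) is why LoRA fits on a single GPU while full fine-tuning of a 7B model generally does not, which is also the heart of the LoRA-vs-full question in the sample exam.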
  • 3 Projects: 50%
  • Midterm Exam: 20%
  • Final Exam: 30%
📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual Short Answer

Explain the three stages of LLM training: pretraining, SFT (supervised fine-tuning), and RLHF.
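As a hint for the question above: pretraining and SFT both minimize the same next-token cross-entropy, and only the data differs (web-scale text vs curated instruction-response pairs); RLHF then changes the objective entirely. A toy sketch of that shared loss, with made-up model distributions:

```python
import math

def next_token_nll(probs_per_step, targets):
    """Average negative log-likelihood of the target token at each
    position. Pretraining and SFT share this objective; SFT often
    additionally masks the loss on the prompt tokens."""
    nll = 0.0
    for probs, t in zip(probs_per_step, targets):
        nll -= math.log(probs[t])
    return nll / len(targets)

# Made-up model distributions over a 4-token vocab for 3 positions.
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.25, 0.25, 0.25, 0.25],   # model is maximally unsure here
]
targets = [0, 1, 3]
print(round(next_token_nll(probs, targets), 3))
```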

Analysis Short Answer

What is the role of the KL divergence penalty in PPO-based RLHF training?
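The KL term in this question shows up directly in the reward PPO optimizes: the reward-model score minus beta times the summed log-ratio between the policy and a frozen reference model. A toy numeric sketch (the log-probs and beta below are made-up values):

```python
def kl_penalized_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """Per-sequence RLHF reward:
    r = r_RM - beta * sum_t (log pi(a_t) - log pi_ref(a_t)).
    The summed log-ratio is a per-sample estimate of KL(pi || pi_ref);
    the penalty keeps the policy from drifting far from the reference
    and reward-hacking the reward model."""
    kl_est = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return rm_score - beta * kl_est

# Made-up token log-probs for a 4-token response.
logp_ref = [-2.0, -1.5, -3.0, -0.5]

# A policy close to the reference keeps almost all of its RM score...
close = [-1.9, -1.6, -2.9, -0.6]
# ...while a drifted policy (far more confident than the reference)
# pays a large penalty even with the same RM score.
drifted = [-0.1, -0.1, -0.2, -0.1]

for name, lp in [("close", close), ("drifted", drifted)]:
    print(name, round(kl_penalized_reward(1.0, lp, logp_ref), 3))
```

Without the penalty (beta = 0), both policies would receive identical reward, and PPO would happily push the model into degenerate high-reward outputs.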

Applied Code / Proof

Compare LoRA and full fine-tuning for adapting a 7B parameter LLM. When would you choose each?