🎓 University of America — Course Portal
🤖 Artificial Intelligence · Week 1 of 14 · BSc Y3 · ⏱ ~50 min

Week 1: Self-Attention, Positional Encoding & the Transformer Block

Deep dive into transformer architecture: self-attention, BERT and GPT, RLHF fine-tuning, prompt engineering, and the engineering of large language models.

AI304 — Lecture 1 · BSc Y3
🎬 CC Licensed Lecture
📺 Lecture video: MIT OpenCourseWare (CC BY-NC-SA)
🎯 Learning Objectives
  • Implement multi-head self-attention from scratch
  • Understand BERT (encoder) vs GPT (decoder) architecture trade-offs
  • Apply RLHF and DPO for LLM alignment
  • Engineer effective prompts using few-shot and chain-of-thought techniques
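As a warm-up for the first objective, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is illustrative only, not the course's reference implementation: the multi-head version additionally splits the model dimension across heads with learned projections and concatenates the results.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scores: (seq, seq), scaled by sqrt(d_k) to keep softmax well-conditioned
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))       # 4 tokens, d_model = 8
out, w = attention(X, X, X)       # self-attention: Q = K = V = X
print(out.shape)                  # (4, 8): one output vector per token
print(w.sum(axis=-1))             # each row of weights sums to 1
```

In a real Transformer, Q, K, and V come from separate learned linear projections of X rather than X itself.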
Topics Covered This Lecture
Transformer Block: Attention, FFN, LayerNorm
BERT vs GPT: Encoder vs Decoder
Instruction Tuning, SFT & RLHF
Prompt Engineering & In-Context Learning
📖 Lecture Overview

This first lecture establishes the foundational framework for Transformer Architectures & LLMs. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

Why this matters: this lecture's topics — self-attention, the BERT/GPT architectural split, RLHF fine-tuning, and prompt engineering — underpin every large language model covered later in the course. This lecture sets up everything that follows; make sure you understand the core concepts before proceeding to Week 2.

Key Concepts

The lecture introduces the four main pillars of this course:
  • Transformer Block: Attention, FFN, LayerNorm
  • BERT vs GPT: Encoder vs Decoder
  • Instruction Tuning, SFT & RLHF
  • Prompt Engineering & In-Context Learning
Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.

# Quick Start: verify your environment is ready for AI304
import sys
print(f"Python {sys.version}")

# Check key libraries are installed
try:
    import numpy, pandas, matplotlib
    print("✅ Core libraries ready")
except ImportError as e:
    print(f"❌ Missing: {e} — run: pip install numpy pandas matplotlib")

This Week's Focus

Focus on mastering the first two pillars: the Transformer block (attention, FFN, LayerNorm) and the BERT-vs-GPT encoder/decoder distinction. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
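The week's title also covers positional encoding. A sketch of the sinusoidal scheme from the original Transformer paper follows; this particular variant is an assumption here, since the materials above do not specify which encoding the lecture uses.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]           # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]       # even dimensions
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                # even dims get sine
    pe[:, 1::2] = np.cos(angles)                # odd dims get cosine
    return pe

pe = sinusoidal_pe(16, 8)
print(pe.shape)      # (16, 8)
print(pe[0, :2])     # position 0: sin(0) = 0, cos(0) = 1
```

The encoding is added to the token embeddings so that attention, which is otherwise permutation-invariant, can distinguish token positions.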

📋 Project 1 of 3 50% of Final Grade

AI304 Project 1: Mini-GPT Language Model

Implement and train a small GPT-style language model from scratch on a domain corpus (e.g., Shakespeare or Python code), including BPE tokenization, causal attention, and sampling with temperature/top-k.

  • Full GPT implementation in PyTorch (<300 lines)
  • BPE tokenizer implementation
  • Training on custom corpus with perplexity tracking
  • Text generation with temperature/top-k sampling
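The sampling deliverable above can be sketched as follows, assuming NumPy; in the actual Mini-GPT, `logits` would be the model's output for the next token at each decoding step rather than a hand-written list.

```python
import numpy as np

def sample_top_k(logits, k=5, temperature=1.0, rng=None):
    # temperature < 1 sharpens the distribution; > 1 flattens it
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    # keep only the k largest logits; mask the rest to -inf
    top_k_idx = np.argpartition(logits, -k)[-k:]
    masked = np.full_like(logits, -np.inf)
    masked[top_k_idx] = logits[top_k_idx]
    probs = np.exp(masked - masked.max())       # softmax over survivors
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
tok = sample_top_k(logits, k=2, temperature=0.8)
print(tok)  # always 0 or 1: only the two largest logits survive the mask
```

Top-k and temperature are typically combined exactly like this: temperature rescales the logits first, then the top-k filter zeroes out the tail before sampling.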
Grading:
  • 3 Projects: 50%
  • Midterm Exam: 20%
  • Final Exam: 30%
📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual Short Answer

Explain the difference between an encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5) transformer.

Analysis Short Answer

What is RLHF? Describe its three stages: supervised fine-tuning (SFT), reward model training, and PPO optimization.

Applied Code / Proof

Why does causal language modeling require a triangular attention mask? Illustrate with an example.
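A worked illustration of the last question, as a NumPy sketch: a causal LM is trained to predict token i+1 from tokens 0..i, so attention at position i must not see positions after i. Masking future scores to -inf before the softmax zeroes their weights, which is exactly a lower-triangular attention pattern.

```python
import numpy as np

seq_len = 4
# lower-triangular mask: position i may attend only to positions <= i
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.random.default_rng(1).normal(size=(seq_len, seq_len))
scores = np.where(mask, scores, -np.inf)   # future positions get -inf

# softmax rows: exp(-inf) = 0, so future tokens receive zero weight
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.triu(weights, k=1))   # strictly upper-triangular entries: all zero
print(weights[0])              # position 0 can only attend to itself
```

Without the mask, each position could attend to the very token it is being trained to predict, and the model would learn a trivial copy instead of a language model.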