Week 1: Supervised Learning, Bias-Variance & Model Evaluation
Understand the core ML toolkit: decision trees, SVMs, regularization, cross-validation, and the full model evaluation pipeline.
- Implement linear and logistic regression from scratch
- Understand bias-variance tradeoff and how to diagnose it
- Apply k-fold cross-validation correctly
- Evaluate models with precision, recall, F1, and ROC-AUC
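To ground the first objective, here is a minimal from-scratch logistic regression trained with batch gradient descent on synthetic data. The function names (`sigmoid`, `fit_logistic`) and the learning-rate/iteration settings are illustrative assumptions, not values prescribed by the course.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=2000):
    """Fit weights w and bias b by gradient descent on mean log loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)        # predicted probabilities
        grad_w = X.T @ (p - y) / n    # gradient of mean log loss w.r.t. w
        grad_b = np.mean(p - y)       # gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny synthetic, nearly separable problem to sanity-check the fit
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit_logistic(X, y)
acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Writing the gradient update yourself once makes it much easier to reason about what `LogisticRegression` in scikit-learn is doing under the hood.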
This first lecture establishes the foundational framework for Machine Learning Fundamentals. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: Linear & Logistic Regression; Decision Trees & Random Forests; SVM & Kernel Methods; and the Model Evaluation Framework. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering: Linear & Logistic Regression and Decision Trees & Random Forests. These are the prerequisites for everything in Week 2. The concepts build on each other — do not skip the practice exercises.
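One practice exercise worth doing early is diagnosing bias and variance on a decision tree by comparing train and validation scores across depths. This sketch uses synthetic data from `make_classification`; the specific depths and dataset shape are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=600, n_features=15,
                           n_informative=5, random_state=0)

# High bias: low train AND validation scores (underfitting).
# High variance: large train-validation gap (overfitting).
for depth in (1, 3, 10, None):
    res = cross_validate(DecisionTreeClassifier(max_depth=depth, random_state=0),
                         X, y, cv=5, return_train_score=True)
    gap = res["train_score"].mean() - res["test_score"].mean()
    print(f"depth={depth}: train={res['train_score'].mean():.2f} "
          f"val={res['test_score'].mean():.2f} gap={gap:.2f}")
```

Shallow trees tend to show low scores on both splits (high bias), while an unbounded tree typically hits near-perfect training accuracy with a noticeably lower validation score (high variance).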
DS201 Project 1: Binary Classification Pipeline
Build a complete ML pipeline for a binary classification problem (fraud detection or churn prediction). Include feature engineering, model comparison, hyperparameter tuning, and final evaluation.
- Feature engineering notebook
- 3+ model comparison with cross-validation
- Hyperparameter tuning (GridSearchCV or RandomizedSearchCV)
- Final model card with metrics and limitations
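The tuning and evaluation deliverables could be sketched as below. Since the fraud/churn dataset is not specified here, an imbalanced synthetic dataset stands in for it; the parameter grid and scoring choice are assumptions you should adapt to your project.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

# Imbalanced synthetic stand-in for a fraud/churn dataset (assumption)
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Pipeline keeps scaling inside each CV fold, avoiding leakage
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Tune regularization strength C with 5-fold CV, scored by ROC-AUC
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="roc_auc")
grid.fit(X_tr, y_tr)

proba = grid.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba)
print("best C:", grid.best_params_["clf__C"])
print(f"held-out ROC-AUC: {auc:.3f}")
```

Putting the scaler inside the `Pipeline` matters: fitting it on the full dataset before cross-validation would leak test-fold statistics into training, a point worth noting in your model card's limitations section.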
Practice Questions
These questions represent the style and difficulty of what you'll see on the midterm and final. Start thinking about them now.
- Explain the bias-variance tradeoff and how regularization addresses it.
- When would you prefer precision over recall? Give a medical diagnosis example.
- Write scikit-learn code to implement a 5-fold cross-validated logistic regression pipeline.
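One possible sketch for the last question, extended to report the week's other evaluation metrics as well. The synthetic dataset and pipeline steps are assumptions; the point is the 5-fold cross-validation structure.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic binary classification data (assumption, for a runnable example)
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

# Scaling + logistic regression, refit fresh inside every fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold CV scored with the metrics covered this week
res = cross_validate(pipe, X, y, cv=5,
                     scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])
for metric in ("accuracy", "precision", "recall", "f1", "roc_auc"):
    scores = res[f"test_{metric}"]
    print(f"{metric}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting mean and spread across folds, rather than a single split's score, is exactly the habit the cross-validation objective above is asking you to build.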