Week 1: Feature Engineering, Training Pipelines & Reproducibility
The engineering perspective on ML: robust feature pipelines, automated model selection, reproducible training infrastructure, and production-readiness.
- Design reproducible ML training pipelines with experiment tracking
- Engineer features from raw data at production scale
- Implement automated model selection and hyperparameter optimization
- Package and version ML models for deployment
This first lecture establishes the foundational framework for Machine Learning Engineering. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: Experiment Tracking (MLflow & W&B), Feature Engineering at Scale, AutoML & Hyperparameter Optimization, and Model Packaging & Versioning. Each is explored in depth over the 14-week curriculum, with hands-on projects reinforcing the theory at every stage.
This Week's Focus
This week, focus on mastering Experiment Tracking (MLflow & W&B) and Feature Engineering at Scale. These are prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
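To make experiment tracking concrete before touching MLflow or W&B, here is a minimal, standard-library-only sketch of what a tracker records for each run: the hyperparameters, the random seed, a fingerprint of the training data, and the resulting metrics. The `RunRecord` class, the `fingerprint` and `train` helpers, and the toy metric are hypothetical illustrations, not the MLflow or W&B API. In those tools the same kinds of fields are captured through each library's own logging calls.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass, field


def fingerprint(rows):
    """Hypothetical helper: hash the training data so a run is tied to the exact data it saw."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the serialization independent of dict insertion order
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


@dataclass
class RunRecord:
    """Hypothetical record of one training run (params, seed, data hash, metrics)."""
    params: dict
    seed: int
    data_hash: str
    metrics: dict = field(default_factory=dict)

    def log_metric(self, name, value):
        self.metrics[name] = value

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


def train(rows, params, seed):
    """Toy 'training' step: a seeded random subsample stands in for a real model fit."""
    rng = random.Random(seed)  # local RNG: output depends only on the seed, not on global state
    run = RunRecord(params=params, seed=seed, data_hash=fingerprint(rows))
    sample = rng.sample(rows, k=params["sample_size"])
    # Toy metric: fraction of positive labels in the sample (placeholder for a real score)
    run.log_metric("positive_rate", sum(r["label"] for r in sample) / len(sample))
    return run
```

Running `train` twice with the same data, parameters, and seed yields byte-identical JSON records, which is the property a real tracker helps you verify across machines and weeks.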
AIE201 Project 1: Reproducible ML Training System
Build a fully reproducible ML training system for a classification task: feature store, experiment tracking (MLflow), hyperparameter optimization (Optuna), and model registry.
- Feature engineering pipeline with data validation
- MLflow experiment tracking integration
- Optuna hyperparameter optimization study
- Trained model registered in MLflow, with a performance report
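The first deliverable, a feature engineering pipeline with data validation, can be sketched in plain Python as a validate-then-transform step: rows that violate a simple schema are rejected before any features are computed, so bad data fails loudly instead of silently skewing training. The schema, column names (`age`, `income`, `signup_date`), derived features, and helper names are all hypothetical; a production pipeline would more likely use a dedicated validation library and a dataframe engine.

```python
import math
from datetime import date


class ValidationError(ValueError):
    """Raised when an input row violates the (hypothetical) schema."""


# Hypothetical schema: column -> (expected type, predicate the value must satisfy)
SCHEMA = {
    "age": (int, lambda v: 0 <= v <= 120),
    "income": (float, lambda v: math.isfinite(v) and v >= 0.0),
    "signup_date": (date, lambda v: True),
}


def validate(row):
    """Check presence, type, and range of every schema column; raise on the first violation."""
    for column, (expected_type, is_valid) in SCHEMA.items():
        if column not in row:
            raise ValidationError(f"missing column: {column}")
        value = row[column]
        if not isinstance(value, expected_type):
            raise ValidationError(
                f"{column}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        if not is_valid(value):
            raise ValidationError(f"{column}: value out of range: {value!r}")


def engineer_features(row, as_of):
    """Derive features from a validated row; `as_of` pins the reference date for reproducibility."""
    validate(row)
    tenure_days = (as_of - row["signup_date"]).days
    if tenure_days < 0:
        raise ValidationError("signup_date is after the as_of date")
    return {
        "age": row["age"],
        "log_income": math.log1p(row["income"]),  # compress the long right tail of income
        "tenure_days": tenure_days,
    }


def build_feature_table(rows, as_of):
    """Apply the pipeline to every row; collect rejects with reasons instead of dropping them."""
    features, rejects = [], []
    for row in rows:
        try:
            features.append(engineer_features(row, as_of))
        except ValidationError as err:
            rejects.append((row, str(err)))
    return features, rejects
```

Passing `as_of` explicitly, rather than calling `date.today()` inside the pipeline, keeps the derived `tenure_days` feature identical every time the same data is replayed.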
The following questions represent the style and difficulty of those you'll see on the midterm and final exams. Start thinking about them now.
What is experiment reproducibility in ML? List five sources of non-reproducibility and explain how to fix each.
Explain the difference between feature selection, feature extraction, and feature engineering.
How does Optuna's TPE sampler differ from random search for hyperparameter optimization?
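For the last question, a useful reference point is what random search actually does. The sketch below draws every trial independently from fixed distributions, so earlier results never influence later proposals. TPE, by contrast, uses the history of completed trials to bias where it samples next. The objective function, parameter names, and search ranges here are hypothetical stand-ins for a real validation loss.

```python
import math
import random


def objective(learning_rate, num_layers):
    """Hypothetical stand-in for a validation loss; minimized near lr=1e-2 and 3 layers."""
    return (math.log10(learning_rate) + 2) ** 2 + (num_layers - 3) ** 2


def random_search(n_trials, seed):
    """Plain random search: independent draws, keep the best result seen so far."""
    rng = random.Random(seed)  # seeded so the whole search is reproducible
    best_params, best_value = None, math.inf
    for _ in range(n_trials):
        # Each trial ignores all previous results; this is what TPE improves on.
        params = {
            "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform over [1e-5, 1e-1]
            "num_layers": rng.randint(1, 6),  # inclusive integer range
        }
        value = objective(**params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value
```

Because the draws are independent, adding trials can only help by luck; a model-based sampler such as TPE tries to spend later trials in regions that earlier trials suggest are promising.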