Foundations of AI Engineering
This lecture covers the AI engineering landscape, the roles involved, the complete model lifecycle, and the infrastructure that powers modern AI systems. By the end, you'll understand what AI engineers actually build — and how they differ from data scientists and researchers.
After this lecture, you will be able to:
- Distinguish AI engineering from data science and AI research — and explain where each fits in an organization.
- Describe the complete machine learning lifecycle, from business requirement to production monitoring.
- Identify the core infrastructure components that power modern AI systems.
- Explain what MLOps means and why it exists.
- Set up a basic Python ML engineering environment with Docker, Git, and a model registry.
If a data scientist asks "what can the data tell us?", an AI engineer asks "how do we build something that delivers that insight reliably, at scale, to millions of users?" AI engineering is the discipline of designing, building, and operating AI-powered systems in production.
The Three Roles in AI
AI Researcher
Creates new algorithms and architectures. Publishes papers. Works at the frontier of what's possible. Tools: PyTorch, JAX, LaTeX.
Data Scientist
Applies existing techniques to solve business problems. Builds models and analyses. Tools: scikit-learn, notebooks, dashboards.
AI Engineer
Productionizes AI systems. Builds APIs, pipelines, monitoring, and infra. Tools: Docker, Kubernetes, MLflow, FastAPI, cloud platforms.
Building a production ML system is far more than training a model. Researchers at Google famously noted (Sculley et al., "Hidden Technical Debt in Machine Learning Systems") that in real-world ML systems, the actual model code is a tiny fraction of the total codebase. The rest is infrastructure.
MLOps (Machine Learning Operations) is a set of practices that combine ML, DevOps, and data engineering to deploy and maintain ML models in production reliably and efficiently. It's modeled after DevOps in software engineering.
Core MLOps Practices
Version Control
Track code, data, models, and experiments. Nothing is deployed without a version. Tools: Git, DVC, MLflow.
CI/CD for ML
Automated testing and deployment. Every model change triggers tests before it reaches production.
Monitoring
Track model performance, data drift, and system health in real-time. Alert on degradation.
Retraining
Scheduled or triggered retraining pipelines. Fresh data → better model → automated re-deployment.
A/B Testing
Route traffic between model versions. Compare performance on live users before full rollout.
Reproducibility
Any experiment from 6 months ago should be perfectly reproducible. Track every parameter and artifact.
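The tracking and reproducibility practices above boil down to recording every parameter and artifact with a stable identifier. Here is a minimal, tool-agnostic sketch of that idea in plain Python (the `log_run` helper and its fields are illustrative, not the MLflow API — in practice you would use `mlflow.log_params` and `mlflow.log_metrics`):

```python
import hashlib
import json
import time

def log_run(params: dict, metrics: dict, store: list) -> str:
    """Record one experiment run with a deterministic run ID derived
    from its parameters, so an old result can be matched to the exact
    configuration that produced it."""
    # Hash only the parameters: two runs with identical params share
    # an ID, which is how you check whether an experiment from months
    # ago is still reproducible with the same configuration.
    run_id = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode()
    ).hexdigest()[:12]
    store.append({
        "run_id": run_id,
        "params": params,
        "metrics": metrics,
        "timestamp": time.time(),
    })
    return run_id

runs: list = []
rid1 = log_run({"lr": 0.01, "epochs": 10}, {"accuracy": 0.91}, runs)
rid2 = log_run({"lr": 0.01, "epochs": 10}, {"accuracy": 0.91}, runs)
assert rid1 == rid2  # same params -> same run ID
```

Real trackers like MLflow add artifact storage, a UI, and model registry integration on top of this same record-keeping core.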
Let's build a minimal but proper AI engineering environment. This is how production projects start.
Build Your First ML Pipeline
Set up a complete, production-style ML pipeline: data loading → feature engineering → model training → experiment tracking → model packaging. The pipeline must be reproducible from a single command.
Deliverables:
- GitHub repository with proper project structure (see Week 1 code example).
- A training pipeline script that runs end-to-end with a single command: `python src/train.py`
- MLflow experiment tracking with at least 5 logged runs (different hyperparameters).
- A Dockerfile that packages your model and serves it via a FastAPI endpoint.
- A `README.md` explaining setup, architecture, and how to run the pipeline.
- A brief write-up: "What would break if this system went to production for 1M users?"
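For the Dockerfile deliverable, a minimal starting point might look like the sketch below. The file layout (`src/serve.py` exposing a FastAPI `app`, a `model/` directory for the artifact) and the port are assumptions — adapt them to your own project structure:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code and the packaged model artifact
# (hypothetical layout: FastAPI app in src/serve.py, model in model/)
COPY src/ src/
COPY model/ model/

EXPOSE 8000

# Serve the FastAPI app with uvicorn
CMD ["uvicorn", "src.serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying `requirements.txt` and installing dependencies before copying source code is a standard Docker layer-caching trick: code changes no longer trigger a full reinstall.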
📅 Due: 3 weeks from today · Submit GitHub link via portal · Late: -5%/day
Sample questions for AIE101 midterm (20%) and final (30%).
Explain the difference between a data scientist and an AI engineer. Provide one example of a task each would own in a production ML project.
Draw the architecture of a production ML system for a recommendation engine. Label each component and describe its role. Include data flow arrows.
You are given a Python script that trains a model in a notebook. Your task: convert it into a production-grade ML pipeline with MLflow tracking, a FastAPI serving endpoint, and a Dockerfile. You have 90 minutes and access to your notes and the internet.