Week 1: Scalability, Trade-offs & Production Architecture Patterns
Design AI systems at scale: architectural trade-offs, scalability patterns, load balancing, system design case studies, and production architecture reviews.
- Apply CAP theorem trade-offs to AI system design decisions
- Design horizontally scalable AI serving architectures
- Evaluate build vs buy decisions for AI infrastructure
- Conduct architecture reviews and propose improvements
This first lecture establishes the foundational framework for AI System Architecture & Design. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: (1) CAP Theorem & Consistency Models, (2) Scalable AI System Patterns, (3) Architecture Decision Records (ADRs), and (4) Case Studies covering Recommendation, Vision, and LLM Systems. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering two topics: CAP Theorem & Consistency Models and Scalable AI System Patterns. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
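One way to make the consistency side of this trade-off concrete is the quorum rule used by many replicated stores: with N replicas, a read quorum of R and a write quorum of W are guaranteed to overlap when R + W > N. The sketch below is only an illustration under that assumption; the function names and the idea of applying it to a replicated feature store are hypothetical and not taken from the course materials.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum intersects every write quorum.

    Assumes a simple quorum-replicated store with n replicas, where a read
    must hear from r replicas and a write must be acknowledged by w replicas.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> tuple[int, int]:
    """Return (read_tolerance, write_tolerance): how many replicas can be
    unreachable while reads / writes can still assemble a quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return n - r, n - w


if __name__ == "__main__":
    # Hypothetical replicated feature store with 3 replicas.
    for r, w in [(2, 2), (1, 1), (1, 3)]:
        print(
            f"N=3 R={r} W={w}: "
            f"overlap={quorum_overlaps(3, r, w)}, "
            f"tolerated (read, write) failures={tolerated_failures(3, r, w)}"
        )
```

Small quorums keep the service responsive when replicas are unreachable but allow stale reads; larger quorums give overlapping reads and writes at the cost of tolerating fewer failures. This is the kind of trade-off the CAP discussion asks you to reason about.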
AIE401 Project 1: AI System Design Document
Write a complete system design document for a production-scale AI application (e.g., a personalized recommendation engine handling 100K queries per second, or QPS). Include architecture diagrams, trade-off analysis, and capacity planning.
- System design document (10-15 pages)
- Architecture diagrams (sequence, component, deployment)
- Capacity planning with cost estimates
- One ADR for each of 3 key architectural decisions
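For the capacity-planning deliverable, a back-of-the-envelope calculation is a reasonable starting point. The sketch below estimates replica count and monthly cost for the 100K QPS example; the per-replica throughput (500 QPS), the 60% target utilization, and the $1.00 hourly price are placeholder assumptions chosen for illustration, not measured or quoted figures. Replace them with numbers from your own load tests and pricing.

```python
import math


def replicas_needed(
    peak_qps: float,
    per_replica_qps: float,
    target_utilization: float = 0.6,
) -> int:
    """Replicas required to serve peak_qps while keeping each replica at or
    below target_utilization of its measured throughput (headroom for spikes)."""
    if peak_qps <= 0 or per_replica_qps <= 0:
        raise ValueError("QPS values must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_qps / (per_replica_qps * target_utilization))


def monthly_cost(
    replicas: int,
    hourly_cost_per_replica: float,
    hours_per_month: float = 730,
) -> float:
    """Rough monthly cost, assuming every replica runs all month."""
    return replicas * hourly_cost_per_replica * hours_per_month


if __name__ == "__main__":
    # All inputs below are illustrative assumptions.
    n = replicas_needed(peak_qps=100_000, per_replica_qps=500)
    print(f"replicas: {n}")
    print(f"monthly cost at $1.00/hour each: ${monthly_cost(n, 1.00):,.2f}")
```

A fuller plan would also account for autoscaling, multi-region redundancy, and storage and network costs, which this sketch deliberately leaves out.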
The following questions represent the style and difficulty of what you'll see on the midterm and final. Start thinking about them now.
- Explain the CAP theorem. Give an example of how it applies to a distributed ML serving system.
- Compare vertical and horizontal scaling for an ML inference service. When does each make sense?
- Design the high-level architecture for a recommendation system serving 100K requests per second.