🎓 University of America — Course Portal
📊 Data Science · Week 1 of 14 · MSc · S1 · ⏱ ~50 min

Week 1: Distributed Compute, Lambda Architecture & Cloud-Native Design

Design and architect large-scale data systems: distributed compute patterns, cloud-native data lakes, streaming systems, and performance engineering.

DS502 — Lecture 1 · MSc S1
🎬 CC Licensed Lecture
🎯 Learning Objectives
  • Design Lambda and Kappa architectures for real-time + batch systems
  • Implement data lake patterns with Delta Lake or Iceberg
  • Optimize Spark jobs for performance (partitioning, caching, skew)
  • Architect multi-region, fault-tolerant data systems
Topics Covered This Lecture
  • Lambda & Kappa Architectures
  • Delta Lake & Iceberg: ACID on Data Lakes
  • Spark Performance Tuning
  • Data Mesh & Federated Architecture
📖 Lecture Overview

This first lecture establishes the foundational framework for Big Data Systems & Architecture. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

Why this matters

The course teaches you to design and architect large-scale data systems, covering distributed compute patterns, cloud-native data lakes, streaming systems, and performance engineering. This lecture sets up everything that follows, so make sure you understand the core concepts before proceeding to Week 2.

Key Concepts

The lecture introduces the four main pillars of this course: (1) Lambda & Kappa Architectures; (2) Delta Lake & Iceberg: ACID on Data Lakes; (3) Spark Performance Tuning; and (4) Data Mesh & Federated Architecture. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.

```python
# Quick Start: verify your environment is ready for DS502
import sys

print(f"Python {sys.version}")

# Check that the key libraries are installed
try:
    import numpy, pandas, matplotlib
    print("✅ Core libraries ready")
except ImportError as e:
    # e names the first module that failed to import
    print(f"❌ Missing: {e}. Run: pip install numpy pandas matplotlib")
```

This Week's Focus

Focus on mastering two topics: Lambda & Kappa Architectures, and ACID on data lakes with Delta Lake & Iceberg. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
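
To make the Lambda idea concrete before the practice exercises, here is a minimal pure-Python sketch rather than a production design. It assumes a toy event format of (user, amount) pairs, and the names `batch_view`, `speed_view`, and `serve` are hypothetical stand-ins for the batch, speed, and serving layers. In a Kappa-style design the separate batch function goes away and the full event log is replayed through the same streaming function.

```python
from collections import Counter

# Toy event log of (user, amount) pairs. Hypothetical data for illustration only.
master_dataset = [("ana", 5), ("ben", 3), ("ana", 2)]  # covered by the last batch run
recent_events = [("ben", 4), ("cy", 1)]                 # arrived after the last batch run


def batch_view(events):
    """Batch layer: recompute totals from scratch over the full historical dataset."""
    totals = Counter()
    for user, amount in events:
        totals[user] += amount
    return totals


def speed_view(events, state=None):
    """Speed layer: fold new events into running state one event at a time."""
    state = Counter() if state is None else state
    for user, amount in events:
        state[user] += amount
    return state


def serve(batch, speed):
    """Serving layer: answer queries by merging the batch and real-time views."""
    merged = Counter(batch)
    merged.update(speed)  # adds the real-time totals on top of the batch totals
    return dict(merged)


# Lambda: two code paths, merged at query time.
lambda_result = serve(batch_view(master_dataset), speed_view(recent_events))

# Kappa variant: one code path, replaying the whole log through the streaming function.
kappa_result = dict(speed_view(master_dataset + recent_events))

print(lambda_result)
print(kappa_result)
```

Both paths produce the same totals here. The operational difference is that the Lambda version maintains two implementations of the same aggregation, while the Kappa version depends on being able to replay the full log.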

📋 Project 1 of 3 · Projects total 50% of Final Grade

DS502 Project 1: Real-Time Analytics Platform

Design and partially implement a real-time analytics platform that handles both streaming (Kafka → Spark Streaming) and batch (Spark SQL) processing, unified in a Delta Lake.

  • Architecture diagram with data flow
  • Kafka producer + Spark Streaming consumer
  • Delta Lake integration with schema evolution
  • Performance benchmark and optimization report
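
The schema-evolution deliverable can be reasoned about before touching Spark. The following is a simplified pure-Python sketch of additive schema merging, the general idea behind options such as Delta Lake's `mergeSchema` write option. It is not Delta Lake's actual implementation; the `merge_schema` helper and the column names are hypothetical. It assumes that new columns may be appended and that a type change on an existing column is rejected.

```python
def merge_schema(table_schema, batch_schema):
    """Return the table schema extended with any new columns from the incoming batch.

    Schemas are plain dicts of column name -> type name (a simplification).
    Raises ValueError if an existing column arrives with a different type.
    """
    merged = dict(table_schema)  # copy so the caller's schema is not mutated
    for column, col_type in batch_schema.items():
        if column not in merged:
            merged[column] = col_type  # additive change: accept the new column
        elif merged[column] != col_type:
            raise ValueError(
                f"Type conflict on '{column}': "
                f"table has {merged[column]}, batch has {col_type}"
            )
    return merged


table = {"event_id": "string", "ts": "timestamp"}

# A batch that introduces a new column is accepted and widens the schema.
evolved = merge_schema(table, {"event_id": "string", "country": "string"})
print(evolved)

# A batch that changes the type of an existing column is rejected.
try:
    merge_schema(evolved, {"ts": "long"})
except ValueError as err:
    print(f"Rejected: {err}")
```

For the project, the equivalent decision is which schema changes the streaming writer should accept automatically and which should fail the job so that a person can review them.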
  • 3 Projects: 50%
  • Midterm Exam: 20%
  • Final Exam: 30%
📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual · Short Answer

Compare Lambda and Kappa architectures. What are the operational trade-offs?

Analysis · Short Answer

What is ACID compliance in a data lake context? How does Delta Lake achieve it?

Applied · Code / Proof

Describe three common causes of Spark job slowness and their remedies.
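
As a starting point for this last question, data skew is one commonly cited cause of slow jobs, and key salting is one commonly used remedy. The sketch below is pure Python rather than Spark. It assumes a deliberately simple toy partitioner (byte sum modulo the partition count), and `partition_of`, `NUM_PARTITIONS`, and `SALT_BUCKETS` are hypothetical names chosen for illustration. It shows how appending a salt to a hot key spreads its rows across partitions.

```python
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count
SALT_BUCKETS = 4    # assumed number of salt values per key


def partition_of(key: str) -> int:
    """Toy deterministic partitioner: sum of the key's bytes modulo the partition count."""
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


# Skewed input: one hot key dominates the dataset.
keys = ["hot"] * 1000 + ["a", "b", "c"] * 10

# Without salting, every "hot" row lands in the same partition.
unsalted = Counter(partition_of(k) for k in keys)

# With salting, a rotating suffix splits each key into SALT_BUCKETS sub-keys.
# (Real jobs often use a random salt; a rotating one keeps this example deterministic.)
salted = Counter(
    partition_of(f"{k}_{i % SALT_BUCKETS}") for i, k in enumerate(keys)
)

print("largest partition without salt:", max(unsalted.values()))
print("largest partition with salt:   ", max(salted.values()))
```

Salting changes the grouping key, so a real aggregation needs a second step that strips the salt and combines the partial results per original key.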