🎓 University of America — Course Portal
Data Science › DS204 › Week 1
📊 Week 1 of 14 · BSc Y2 S2 · ⏱ ~50 min

Week 1: ETL Architecture, Apache Airflow & Data Lake Design

Build production data pipelines using Apache Airflow and Kafka, design data lake architectures, and implement pipeline monitoring.

DS204 — Lecture 1 · BSc Y2 S2
🎬 Creative Commons Licensed Lecture
🎯 Learning Objectives
  • Design ETL pipelines using Apache Airflow DAGs
  • Understand data lake vs data warehouse architectures
  • Implement streaming ingestion with Apache Kafka
  • Monitor pipeline health and handle failures gracefully
Topics Covered This Lecture
  • ETL vs ELT Patterns
  • Apache Airflow: DAGs & Operators
  • Data Lake Architecture
  • Kafka Streaming Basics
📖 Lecture Overview

This first lecture establishes the foundational framework for Data Engineering Pipelines. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

Why this matters: Everything in this course (building production pipelines with Airflow and Kafka, designing data lake architectures, and monitoring pipelines) builds on this week's foundations. Make sure you understand the core concepts before proceeding to Week 2.

Key Concepts

The lecture introduces the four main pillars of this course: ETL vs ELT Patterns, Apache Airflow: DAGs & Operators, Data Lake Architecture, Kafka Streaming Basics. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
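To make the "DAGs & Operators" idea concrete before Airflow itself is installed, here is a minimal pure-Python sketch of a task graph executed in dependency order using the standard library's `graphlib`. The task names (extract, validate, transform, load, notify) are illustrative stand-ins, not part of any required Airflow API:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A DAG maps each task to the set of tasks it depends on,
# mirroring Airflow's "upstream >> downstream" relationships.
dag = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load": {"transform"},
    "notify": {"load"},
}

def run(dag):
    """Execute tasks in an order that respects every dependency edge."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        print(f"running {task}")
    return order

run(dag)
```

Airflow's scheduler does essentially this, plus retries, scheduling intervals, and distributed execution, which is why tasks must form an acyclic graph.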

```python
# Quick Start: verify your environment is ready for DS204
import sys

print(f"Python {sys.version}")

# Check key libraries are installed
try:
    import numpy, pandas, matplotlib
    print("✅ Core libraries ready")
except ImportError as e:
    print(f"❌ Missing: {e} — run: pip install numpy pandas matplotlib")
```

This Week's Focus

Focus on mastering ETL vs ELT patterns and Airflow DAGs & operators. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
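The ETL/ELT distinction can be sketched in a few lines, assuming an in-memory SQLite database stands in for the warehouse: ETL transforms rows in the pipeline before loading, while ELT loads raw rows first and transforms with SQL inside the database. Table and column names here are made up for illustration:

```python
import sqlite3

raw = [("alice", "  NY "), ("bob", "ca")]  # messy source rows (hypothetical)

# --- ETL: transform in the pipeline, then load the clean result ---
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE users (name TEXT, state TEXT)")
clean = [(name, state.strip().upper()) for name, state in raw]  # transform first
etl_db.executemany("INSERT INTO users VALUES (?, ?)", clean)    # then load

# --- ELT: load raw data as-is, then transform inside the database ---
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE staging (name TEXT, state TEXT)")
elt_db.executemany("INSERT INTO staging VALUES (?, ?)", raw)    # load first
elt_db.execute(
    "CREATE TABLE users AS SELECT name, UPPER(TRIM(state)) AS state FROM staging"
)  # then transform with SQL

print(etl_db.execute("SELECT * FROM users").fetchall())
print(elt_db.execute("SELECT * FROM users").fetchall())
```

Both approaches end with identical clean tables; the design question is where the compute happens, in the pipeline workers (ETL) or in the target system (ELT).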

📋 Project 1 of 3 · 50% of Final Grade

DS204 Project 1: End-to-End Data Pipeline

Build a complete Airflow DAG that ingests data from a public API, transforms it, and loads it into a local PostgreSQL database. Include error handling and alerting.

  • Airflow DAG with 5+ tasks (ingest, validate, transform, load, notify)
  • Data quality checks at each stage
  • Failure handling with retries and alerts
  • Pipeline documentation with lineage diagram
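The failure-handling requirement above can be prototyped without Airflow. This hypothetical `retry_with_alert` helper retries a flaky task and fires a stand-in alert when retries are exhausted, loosely mirroring Airflow's per-task `retries` setting and `on_failure_callback` hook:

```python
import time

def retry_with_alert(task, retries=3, delay=0.0, alert=print):
    """Run task(); retry up to `retries` attempts, alerting on final failure."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                alert(f"ALERT: task failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay)  # back off before the next attempt

# Simulate an ingest task that fails twice (transient error), then succeeds.
calls = {"n": 0}
def flaky_ingest():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "loaded 42 rows"

print(retry_with_alert(flaky_ingest))  # succeeds on the third attempt
```

In the actual project you would let Airflow manage retries and swap `print` for a real alert channel (email, Slack webhook); the key design point is that transient failures retry silently while exhausted retries escalate.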
  • 3 Projects: 50%
  • Midterm Exam: 20%
  • Final Exam: 30%
📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual Short Answer

What is the difference between a data lake and a data warehouse? When would you use each?

Analysis Short Answer

Explain what an Airflow DAG is and describe the key components of an Airflow task.

Applied Code / Proof

What are the challenges of exactly-once delivery in a streaming pipeline? How does Kafka address them?