📊 Data Science · Week 1 of 14 · BSc · Y1 S2 · ⏱ ~50 min

Week 1: Pandas, Data Cleaning & Visual Storytelling

Master pandas for data cleaning, joining, and reshaping — then communicate your findings powerfully with Matplotlib, Seaborn, and Plotly.

University of Aliens

DS102 — Lecture 1 · BSc Y1 S2

🎬 CC Licensed Lecture

🎯 Learning Objectives

Clean messy datasets: missing values, duplicates, inconsistencies
Merge, reshape, and aggregate DataFrames with pandas
Choose the right visualization for different data types
Build interactive dashboards with Plotly
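As a first taste of the reshape and aggregate objective, here is a minimal sketch. The `wide` table, its column names, and the scores are made up for illustration; the calls used are `DataFrame.melt`, `DataFrame.pivot`, and `groupby`.

```python
import pandas as pd

# Hypothetical wide table: one column per quiz.
wide = pd.DataFrame({
    "student": ["Ana", "Ben"],
    "quiz1": [8, 6],
    "quiz2": [9, 7],
})

# Wide to long: one row per (student, quiz) pair.
long = wide.melt(id_vars="student", var_name="quiz", value_name="score")

# Long back to wide: one row per student, one column per quiz.
back = long.pivot(index="student", columns="quiz", values="score")

# Aggregate the long table: mean score per student.
mean_score = long.groupby("student")["score"].mean()

print(long)
print(back)
print(mean_score)
```

The long layout is usually the easier one to group, filter, and plot, which is why reshaping tends to come before aggregation.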

Topics Covered This Lecture

pandas: DataFrame Operations

Data Cleaning Techniques

Matplotlib & Seaborn

Interactive Plotly Dashboards

📖 Lecture Overview

This first lecture establishes the foundational framework for Data Wrangling & Visualization. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

        Why this matters
        This lecture sets up everything that follows, so make sure you understand the core concepts before proceeding to Week 2.
      

Key Concepts

The lecture introduces the four main pillars of this course: pandas DataFrame operations, data cleaning techniques, Matplotlib and Seaborn, and interactive Plotly dashboards. Each is explored in depth over the 14-week curriculum, with hands-on projects reinforcing the theory at every stage.

# Quick Start: verify your environment is ready for DS102
import sys
print(f"Python {sys.version}")

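# Check that the libraries used in this course are installed
try:
    import numpy, pandas, matplotlib, seaborn, plotly
    print("✅ Core libraries ready")
except ImportError as e:
    # e.name holds the name of the module that could not be imported
    print(f"❌ Missing: {e.name} (run: pip install numpy pandas matplotlib seaborn plotly)")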

This Week's Focus

Focus on mastering pandas DataFrame operations and data cleaning techniques. These are prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
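To make the cleaning focus concrete, the sketch below handles three kinds of mess: inconsistent text labels, numbers stored as strings, and duplicate rows. The `raw` DataFrame and its values are invented for illustration and are not from any course dataset.

```python
import pandas as pd

# Hypothetical messy data: inconsistent city labels, duplicate rows,
# and a price column stored as text with one unparseable entry.
raw = pd.DataFrame({
    "city": [" london", "London ", "PARIS", "Paris", "Paris"],
    "price": ["10.5", "10.5", "n/a", "7", "7"],
})

clean = raw.copy()

# Normalize text: trim whitespace and use one capitalization style.
clean["city"] = clean["city"].str.strip().str.title()

# Convert to numbers; values that cannot be parsed become NaN.
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")

# Drop rows that are exact duplicates after normalization.
clean = clean.drop_duplicates().reset_index(drop=True)

# Aggregate: mean price per city (NaN values are skipped by default).
summary = clean.groupby("city")["price"].mean()

print(clean)
print(summary)
```

The order matters here: normalizing the labels first is what lets `drop_duplicates` see " london" and "London " as the same row.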

📋 Project 1 of 3 · All 3 projects together: 50% of Final Grade

DS102 Project 1: Data Story Dashboard

Take a messy real-world dataset, clean it completely, and build an interactive Plotly dashboard that tells a compelling data story with at least 5 chart types.

Cleaning notebook with before/after comparisons
Interactive Plotly dashboard (HTML export)
5+ chart types with annotations
Written narrative: 3 key insights from the data

50% · 3 Projects
20% · Midterm Exam
30% · Final Exam

📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual Short Answer

Describe three strategies for handling missing values in pandas. When is each appropriate?
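As a starting point for thinking about this question, here is a sketch of three common strategies: dropping, filling with a summary statistic, and interpolating. The `df` table is invented for illustration, and other strategies (for example forward fill) are equally valid answers.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with two gaps.
df = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "temp": [20.0, np.nan, 24.0, np.nan, 28.0],
})

# Strategy 1: drop rows where the value is missing.
dropped = df.dropna(subset=["temp"])

# Strategy 2: fill gaps with a summary statistic (here the median).
filled = df.assign(temp=df["temp"].fillna(df["temp"].median()))

# Strategy 3: interpolate linearly between neighboring values.
interpolated = df.assign(temp=df["temp"].interpolate())

print(dropped)
print(filled)
print(interpolated)
```

As a rule of thumb, dropping tends to suit cases where few rows are affected, a statistic fill suits cases where every row must be kept and the column has no strong trend, and interpolation suits ordered data such as time series.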

Analysis Short Answer

What is the difference between `merge()` and `concat()` in pandas?
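A small sketch can help frame the contrast. The three tables below are hypothetical; the point is only that `concat` stacks frames, while `merge` matches rows on key values.

```python
import pandas as pd

# Hypothetical tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})
orders_jan = pd.DataFrame({"customer_id": [1, 2], "amount": [50, 20]})
orders_feb = pd.DataFrame({"customer_id": [1, 4], "amount": [30, 70]})

# concat stacks frames along an axis; rows are appended, not matched.
orders = pd.concat([orders_jan, orders_feb], ignore_index=True)

# merge joins rows whose key values match, similar to a SQL join.
# how="left" keeps every order, even if the customer is unknown.
joined = orders.merge(customers, on="customer_id", how="left")

print(orders)
print(joined)
```

Here the order from customer 4 has no match in `customers`, so its `name` is NaN after the left merge.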

Applied Code / Proof

Write pandas code to compute monthly total sales from a DataFrame with `date` and `amount` columns.
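One possible approach, shown as a sketch rather than a model answer, is to parse the dates and group by calendar month. The `sales` data below is made up; only the `date` and `amount` column names come from the question.

```python
import pandas as pd

# Hypothetical sales records with dates stored as text.
sales = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03",
             "2024-03-15", "2024-03-31"],
    "amount": [100.0, 50.0, 75.0, 20.0, 30.0],
})

# Parse the text dates into real datetime values.
sales["date"] = pd.to_datetime(sales["date"])

# Group by calendar month (a monthly Period) and sum the amounts.
monthly = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()

print(monthly)
```

Setting `date` as the index and calling `resample` is another common route; the monthly frequency alias it expects differs between pandas versions, which is why this sketch groups on `dt.to_period("M")` instead.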