Week 1: Pandas, Data Cleaning & Visual Storytelling
Master pandas for data cleaning, joining, and reshaping — then communicate your findings powerfully with Matplotlib, Seaborn, and Plotly.
- Clean messy datasets: missing values, duplicates, inconsistencies
- Merge, reshape, and aggregate DataFrames with pandas
- Choose the right visualization for different data types
- Build interactive dashboards with Plotly
This first lecture establishes the foundational framework for Data Wrangling & Visualization. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: pandas DataFrame operations, data cleaning techniques, Matplotlib and Seaborn, and interactive Plotly dashboards. Each is explored in depth over the 14-week curriculum, with hands-on projects reinforcing the theory at every stage.
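As a first taste of the DataFrame-operations pillar, the sketch below joins, stacks, aggregates, and reshapes two small tables. The tables and column names (`orders`, `customers`, `region`, and so on) are invented for illustration and are not part of the course materials.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "amount": [25.0, 40.0, 15.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "region": ["North", "South"],
})

# merge(): add columns from another table by matching key values (like a SQL join).
joined = orders.merge(customers, on="customer_id", how="left")

# concat(): stack tables that share the same columns (here, appending more rows).
more_orders = pd.DataFrame({
    "order_id": [4],
    "customer_id": [11],
    "amount": [30.0],
})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Aggregate: total order amount per region.
totals_by_region = joined.groupby("region", as_index=False)["amount"].sum()

# Reshape: one row per customer, one column per region.
wide = joined.pivot_table(
    index="customer_id", columns="region", values="amount", aggfunc="sum"
)

print(joined)
print(totals_by_region)
print(wide)
```

A left join is used here so that every order is kept even if its customer were missing from `customers`; whether that is the right choice depends on the dataset.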
This Week's Focus
Focus on mastering pandas DataFrame operations and data cleaning techniques. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
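To make the cleaning side of this week's focus concrete, here is a minimal sketch covering inconsistent text, duplicate rows, and missing values. The `raw` table and its columns are made up for this example, and the choices shown (dropping rows with no city, filling temperatures with the median) are assumptions that will not suit every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data, invented for illustration only.
raw = pd.DataFrame({
    "city": [" new york", "New York", "Boston", "Boston", None],
    "temp_c": [21.5, np.nan, 18.0, 18.0, 25.0],
})

clean = raw.copy()

# Inconsistencies: normalize whitespace and capitalization in text fields.
clean["city"] = clean["city"].str.strip().str.title()

# Duplicates: drop rows that are exact repeats after normalization.
clean = clean.drop_duplicates()

# Missing values, option 1: drop rows where a key field is missing.
clean = clean.dropna(subset=["city"])

# Missing values, option 2: keep a flag recording which values were missing.
clean["temp_missing"] = clean["temp_c"].isna()

# Missing values, option 3: fill numeric gaps with a summary statistic.
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median())

clean = clean.reset_index(drop=True)

# Simple before/after comparison.
print("rows:", len(raw), "->", len(clean))
print("missing cells:", int(raw.isna().sum().sum()), "->", int(clean.isna().sum().sum()))
```

Text is normalized before `drop_duplicates()` so that rows differing only in spacing or capitalization are compared on their cleaned form.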
DS102 Project 1: Data Story Dashboard
Take a messy real-world dataset, clean it completely, and build an interactive Plotly dashboard that tells a compelling data story with at least 5 chart types.
- Cleaning notebook with before/after comparisons
- Interactive Plotly dashboard (HTML export)
- 5+ chart types with annotations
- Written narrative: 3 key insights from the data
The questions below represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.
Describe three strategies for handling missing values in pandas. When is each appropriate?
What is the difference between `merge()` and `concat()` in pandas?
Write pandas code to compute monthly total sales from a DataFrame with `date` and `amount` columns.
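For the last question, one possible approach (not the only acceptable answer) is to convert `date` to a datetime type and group by calendar month. The sample rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with the two columns named in the question.
sales = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28", "2024-03-15"],
    "amount": [100.0, 50.0, 75.0, 25.0, 200.0],
})

# Dates often arrive as strings; convert them before doing any date arithmetic.
sales["date"] = pd.to_datetime(sales["date"])

# Group rows by calendar month (a monthly Period) and sum the amounts.
monthly = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()

print(monthly)
```

Grouping on `dt.to_period("M")` keeps months from different years separate. Setting `date` as the index and using `resample()` is a common alternative, though the monthly frequency alias it expects differs between pandas versions.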