Week 1: The Data Science Landscape & Lifecycle
Understand the data science process from raw data to actionable insights. Explore Python, Jupyter, and the full toolkit that powers modern data work.
- Describe the end-to-end data science lifecycle
- Set up a Python + Jupyter development environment
- Run your first exploratory analysis on a real dataset
- Identify the roles and responsibilities of a data scientist
This first lecture establishes the foundational framework for Introduction to Data Science. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: the data lifecycle, Python and Jupyter setup, exploratory data analysis (EDA) fundamentals, and the core tools (pandas, numpy, matplotlib). Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering the data lifecycle and the Python and Jupyter setup. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
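One way to confirm that the Python and Jupyter setup is working is to run a short check in a notebook cell. The sketch below is only an illustration: the function name check_environment is made up for this example, and the package list is simply the three tools named under Key Concepts.

```python
import importlib
import sys

# Packages named in this week's Key Concepts (adjust as needed).
REQUIRED = ["pandas", "numpy", "matplotlib"]


def check_environment(packages=REQUIRED):
    """Return {package: version string, or None if it cannot be imported}."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back to "unknown" if not.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If a package is reported as not installed, installing it into the same environment that runs the notebook kernel (for example with pip or conda) and rerunning the cell should resolve it.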
DS101 Project 1: Exploratory Profile of a Real Dataset
Download any public dataset (Kaggle, UCI, government data), build a complete exploratory profile — shape, missingness, distributions, correlations — and write a 2-page findings narrative.
- Jupyter notebook with full EDA (shape, dtypes, nulls, stats)
- 5+ visualizations (distributions, correlations, outliers)
- 2-page written narrative of findings
- Data dictionary and source citation
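As a starting point for the notebook deliverable, the profile items listed above (shape, dtypes, nulls, stats, correlations) can be gathered with a few pandas calls. This is a minimal sketch, not a required template: the profile helper and the toy DataFrame are invented for illustration, and your own dataset would normally be loaded with something like pd.read_csv.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect a basic exploratory profile of a DataFrame."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,                            # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),    # column -> dtype name
        "null_fraction": df.isna().mean().to_dict(),  # share of missing values
        "summary": numeric.describe(),                # count, mean, std, quartiles
        "correlations": numeric.corr(),               # pairwise Pearson correlation
    }


# Made-up toy data, used only so the example runs on its own.
toy = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29],
    "income": [32000.0, 58000.0, 61000.0, np.nan, 45000.0],
    "city": ["Lyon", "Oslo", None, "Lima", "Oslo"],
})

report = profile(toy)
print(report["shape"])
print(report["null_fraction"])
print(report["summary"])
```

For the visualization deliverable, df.hist() and df.plot.scatter(x=..., y=...) are reasonable first calls once matplotlib is installed. The written narrative should interpret these outputs rather than just reproduce them.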
The practice questions below represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.
What are the six phases of the CRISP-DM data science process?
A dataset has 40% missing values in a feature column. What are 3 strategies for handling this?
Write a Python one-liner to compute the correlation matrix of a pandas DataFrame `df`.
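To experiment with the last two questions, it may help to try a few options on a small example. The sketch below uses an invented DataFrame in which column y is 40% missing. It shows three common (not exhaustive) strategies plus a correlation call, and is not meant as a model answer. Note that in recent pandas versions df.corr() raises an error when non-numeric columns are present unless numeric_only=True is passed.

```python
import numpy as np
import pandas as pd

# Made-up data: 2 of the 5 values in "y" are missing (40%).
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, np.nan, 6.0, np.nan, 10.0],
    "label": ["a", "b", "a", "b", "a"],
})

# Strategy 1: drop the feature entirely.
dropped_column = df.drop(columns=["y"])

# Strategy 2: drop only the rows where the feature is missing.
dropped_rows = df.dropna(subset=["y"])

# Strategy 3: impute with the median and keep a flag recording
# which values were originally missing.
imputed = df.assign(
    y_missing=df["y"].isna(),
    y=df["y"].fillna(df["y"].median()),
)

# Correlation matrix of the numeric columns (pairwise-complete observations).
corr = df.corr(numeric_only=True)
print(corr)
```

Which strategy is appropriate depends on why the values are missing and how important the feature is, which is the part the exam question expects you to reason about.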