Week 1: EDA Workflow, Distribution Analysis & Pattern Discovery
Learn the systematic EDA process: data profiling, distribution analysis, correlation studies, outlier detection, and business insight generation.
- Profile any dataset systematically using a structured checklist
- Identify distribution shapes and their implications
- Detect and handle outliers appropriately
- Generate and communicate business-relevant insights
This first lecture establishes the foundational framework for Exploratory Data Analysis. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course: the Data Profiling Checklist, Distribution Analysis, Correlation & Multivariate EDA, and Outlier Detection Methods. Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
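As a first taste of the last two pillars, here is a minimal sketch of IQR-based outlier flagging and a correlation check, assuming pandas is available. The `orders` table, its column names, and the helper `iqr_outliers` are hypothetical and exist only for illustration; the multiplier `k = 1.5` is the conventional default, not a course requirement.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical toy data standing in for a real business dataset.
orders = pd.DataFrame({
    "units": [10, 12, 11, 13, 12, 11, 95],
    "revenue": [100, 118, 112, 131, 119, 108, 990],
})

# Flag suspicious rows; here only the 95-unit order falls outside the fences.
mask = iqr_outliers(orders["units"])
flagged = orders[mask]

# Pearson measures linear association and is sensitive to extreme values;
# Spearman works on ranks, so a single extreme row influences it less.
pearson = orders.corr(method="pearson")
spearman = orders.corr(method="spearman")

print(flagged)
print(pearson.round(3))
print(spearman.round(3))
```

Flagging a row is not the same as deleting it. Whether an outlier is a data-entry error or a genuine large order is a judgment that depends on the business context.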
This Week's Focus
Focus on mastering the Data Profiling Checklist and Distribution Analysis. These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
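To make this week's two topics concrete, the sketch below shows one possible starting point, assuming pandas and NumPy. The `profile` helper, the `sales` table, and the simulated `revenue` series are all hypothetical. The checklist items shown (types, missing values, cardinality, duplicates) are a common minimum, not the complete checklist used in the course.

```python
import numpy as np
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic profiling facts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().round(3),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })


# Hypothetical toy data standing in for a real business dataset.
sales = pd.DataFrame({
    "region": ["N", "S", "S", "E", None, "N"],
    "revenue": [120.0, 95.0, 110.0, 105.0, 100.0, 240.0],
})

summary = profile(sales)
n_duplicates = int(sales.duplicated().sum())  # fully repeated rows

# Distribution shape: simulate a right-skewed, strictly positive feature
# and compare sample skewness before and after a log transform.
rng = np.random.default_rng(0)
revenue = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=1000))

skew_raw = revenue.skew()          # clearly positive for this data
skew_log = np.log(revenue).skew()  # close to zero after the transform

print(summary)
print("duplicate rows:", n_duplicates)
print("skew raw:", round(skew_raw, 2), "skew log:", round(skew_log, 2))
```

`np.log` is safe here only because the simulated values are strictly positive. When a feature contains zeros, `np.log1p` is a common alternative.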
DS203 Project 1: Deep EDA on a Business Dataset
Perform a comprehensive EDA on a business dataset (sales, customer, or operations data). Deliver a structured EDA report with visualizations and actionable recommendations.
- EDA notebook with systematic profiling
- Univariate, bivariate, and multivariate analyses
- Automated profiling with ydata-profiling (formerly pandas-profiling)
- Executive summary: top 5 findings with chart evidence
The questions below represent the style and difficulty of what you'll see on the midterm and final. Start thinking about them now.
A feature has a skewness of 3.2. What does this indicate, and how would you transform it?
Describe the four steps in a systematic EDA workflow.
What is the IQR method for outlier detection? Write Python code to implement it.