📊 Data Science · Week 1 of 14 · BSc · Y1 S1 · ⏱ ~50 min

Week 1: The Data Science Landscape & Lifecycle

Understand the data science process from raw data to actionable insights. Explore Python, Jupyter, and the full toolkit that powers modern data work.

University of America

DS101 — Lecture 1 · BSc Y1 S1

🎬 CC Licensed Lecture

📺 FOSSASIA (CC BY)

🎯 Learning Objectives

Describe the end-to-end data science lifecycle
Set up a Python + Jupyter development environment
Run your first exploratory analysis on a real dataset
Identify the roles and responsibilities of a data scientist

Topics Covered This Lecture

Data Lifecycle

Python & Jupyter Setup

EDA Fundamentals

Tools: pandas, numpy, matplotlib

📖 Lecture Overview

This first lecture establishes the foundational framework for Introduction to Data Science. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.

Why this matters
This lecture sets up everything that follows. Make sure you understand the core concepts before proceeding to Week 2.

Key Concepts

The lecture introduces the four main pillars of this course: the data lifecycle, Python and Jupyter setup, the fundamentals of exploratory data analysis (EDA), and the core tools pandas, NumPy, and matplotlib. Each is explored in depth over the 14-week curriculum, with hands-on projects reinforcing the theory.

# Quick Start: verify your environment is ready for DS101
import sys
print(f"Python {sys.version}")

# Check key libraries are installed
try:
    import numpy, pandas, matplotlib
    print("✅ Core libraries ready")
except ImportError as e:
    # e.name is the module that failed to import
    print(f"❌ Missing: {e.name}. Run: pip install numpy pandas matplotlib")

This Week's Focus

Focus on mastering two topics this week: the data lifecycle, and Python and Jupyter setup. These are prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.

📋 Project 1 of 3 · The 3 projects together count for 50% of Final Grade

DS101 Project 1: Exploratory Profile of a Real Dataset

Download any public dataset (Kaggle, UCI, government data), build a complete exploratory profile — shape, missingness, distributions, correlations — and write a 2-page findings narrative.

Jupyter notebook with full EDA (shape, dtypes, nulls, stats)
5+ visualizations (distributions, correlations, outliers)
2-page written narrative of findings
Data dictionary and source citation
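
As a starting point, the profile steps named above (shape, dtypes, missingness, summary statistics, correlations) might be sketched as follows. This is a minimal illustration, not a required template: the tiny inline DataFrame, its column names (`age`, `income`, `city`), and the `profile` helper are all made up for the example. In your project you would load your chosen dataset instead, for example with `pd.read_csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real public dataset.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, np.nan, 29],
        "income": [32000.0, 54000.0, 61000.0, 48000.0, np.nan],
        "city": ["Austin", "Boston", None, "Denver", "Austin"],
    }
)


def profile(frame: pd.DataFrame) -> dict:
    """Collect the basic pieces of an exploratory profile in one dict."""
    return {
        "shape": frame.shape,  # (rows, columns)
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "null_counts": frame.isna().sum().to_dict(),
        "null_share": frame.isna().mean().round(2).to_dict(),
        "summary": frame.describe(),  # summary statistics for numeric columns
        # Pairwise correlations between numeric columns only.
        "correlations": frame.select_dtypes("number").corr(),
    }


report = profile(df)
print("Shape:", report["shape"])
print("Nulls per column:", report["null_counts"])
print(report["summary"])
print(report["correlations"])
```

Returning the pieces in a dict keeps each one available for the written narrative and for plotting later in the notebook. The visualizations themselves are left out of this sketch.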

50% · 3 Projects
20% · Midterm Exam
30% · Final Exam

📝 Sample Exam Questions

These represent the style and difficulty of questions you'll see on the midterm and final. Start thinking about them now.

Conceptual Short Answer

What are the six phases of the CRISP-DM data science process?

Analysis Short Answer

A feature column in a dataset has 40% missing values. What are three strategies for handling this?
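
To make the question concrete, the sketch below shows three commonly taught options on a made-up column: dropping the column, imputing the median, and imputing while keeping a missing-value indicator. The data, the column name `feature`, and the variable names are assumptions for illustration only. Which strategy is appropriate depends on why the values are missing, and a full answer should discuss that.

```python
import numpy as np
import pandas as pd

# Hypothetical column with 40% missing values (4 of 10 rows).
df = pd.DataFrame(
    {
        "id": range(10),
        "feature": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0, np.nan, 8.0, np.nan, 10.0],
    }
)
print("Share missing:", df["feature"].isna().mean())

# Strategy 1: drop the column entirely.
dropped = df.drop(columns=["feature"])

# Strategy 2: impute with the median of the observed values.
median = df["feature"].median()
imputed = df.assign(feature=df["feature"].fillna(median))

# Strategy 3: impute, but also record which rows were originally missing.
flagged = imputed.assign(feature_missing=df["feature"].isna())

print(imputed["feature"].tolist())
print(int(flagged["feature_missing"].sum()), "rows were imputed")
```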

Applied Code / Proof

Write a Python one-liner to compute the correlation matrix of a pandas DataFrame `df`.