Week 1: Image Processing, CNNs & Object Detection
Learn how machines see: convolutional neural networks, image classification, object detection, segmentation, and real-time vision with OpenCV.
- Implement convolution and pooling operations from scratch
- Train a ResNet-style network on an image dataset
- Implement YOLO-style object detection
- Apply image segmentation techniques
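As a possible starting point for the first objective, here is a minimal pure-Python sketch of convolution and max pooling. It makes several assumptions that are not part of the course material: a single-channel image stored as nested lists, no padding, and the cross-correlation form of the operation (no kernel flip), which is what deep learning frameworks typically call convolution. The function names `conv2d` and `max_pool2d` and the sample kernel are illustrative only.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation on a single-channel image."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Output size for no padding: floor((n - k) / stride) + 1
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = []
    for r in range(oh):
        row = []
        for c in range(ow):
            acc = 0.0
            # Slide the kernel over the window whose top-left corner is (r*stride, c*stride)
            for i in range(kh):
                for j in range(kw):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    h, w = len(fmap), len(fmap[0])
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    return [
        [
            max(fmap[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


# Toy 5x5 image and a 3x3 kernel that responds to horizontal intensity changes
image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 2],
    [1, 2, 1, 0, 1],
]
edge_kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d(image, edge_kernel)             # 3x3 feature map
pooled = max_pool2d(feature_map, size=2, stride=1)   # 2x2 after pooling
print(feature_map)
print(pooled)
```

A vectorized NumPy version would be much faster; the explicit loops are kept here only because they make the sliding-window arithmetic visible.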
This first lecture establishes the foundational framework for Computer Vision. By the end of this session, you will have the conceptual grounding and practical starting point needed for the rest of the course.
Key Concepts
The lecture introduces the four main pillars of this course:
- Convolution & Feature Maps
- Classic Architectures: VGG, ResNet, EfficientNet
- Object Detection: YOLO, Faster R-CNN
- Semantic Segmentation: U-Net
Each will be explored in depth over the 14-week curriculum, with hands-on projects reinforcing theory at every stage.
This Week's Focus
Focus on mastering two topics: Convolution & Feature Maps, and Classic Architectures (VGG, ResNet, EfficientNet). These are the prerequisites for everything in Week 2. The concepts build on each other, so do not skip the practice exercises.
DS303 Project 1: Real-Time Object Detector
Train an object detector on a custom dataset (or COCO subset) and build a real-time inference pipeline using OpenCV. Measure FPS, mAP, and inference latency.
- Object detection model (YOLOv5 or custom)
- Custom dataset collection and annotation
- Real-time inference script with OpenCV
- Benchmark: FPS, precision, recall, mAP@50
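The benchmark deliverable rests on a few small calculations, sketched below under stated assumptions. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, the evaluation covers a single class on a single image, and predictions are matched greedily to ground truth in descending score order at an IoU threshold of 0.5. This is a simplification: full mAP@50 additionally averages precision over recall levels and over classes, which is not shown. The names `iou`, `precision_recall`, and `measure_fps`, and the `infer` callable, are hypothetical placeholders for your own detector and pipeline.

```python
import time


def iou(box_a, box_b):
    """Intersection over union for two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions: list of (score, box); ground_truth: list of boxes."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    # Higher-confidence predictions get first pick of ground-truth boxes
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    fp = len(predictions) - tp
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


def measure_fps(infer, frames):
    """Time a hypothetical infer(frame) callable; returns (FPS, mean latency in ms)."""
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / len(frames)
    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return fps, latency_ms


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [
    (0.9, (1, 1, 10, 10)),     # overlaps the first ground-truth box
    (0.8, (50, 50, 60, 60)),   # overlaps nothing: false positive
    (0.6, (20, 20, 30, 35)),   # overlaps the second ground-truth box
]
precision, recall = precision_recall(predictions, ground_truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In the real pipeline, `infer` would wrap the detector's forward pass and `frames` would come from the OpenCV capture loop. Timing many frames and discarding the first few warm-up iterations generally gives steadier FPS numbers.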
The questions below represent the style and difficulty of what you'll see on the midterm and final. Start thinking about them now.
What is the receptive field of a neuron in a CNN, and how does it grow with depth?
Explain the skip connections in ResNet. Why do they solve the degradation problem?
What is the difference between object detection and semantic segmentation?
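As a starting point for the receptive field question, the sketch below tracks how the receptive field grows layer by layer. It assumes a plain sequential stack of convolution or pooling layers described only by kernel size and stride, with no dilation; the function name `receptive_field` and the example stack are illustrative, not taken from the course. The recurrence used is the standard one: each layer adds `(kernel - 1) * jump` input pixels, where `jump` is the product of all earlier strides.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output.

    Returns the receptive field size (in input pixels, along one axis)
    after each layer.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Three stacked 3x3 convolutions with stride 1
print(receptive_field([(3, 1), (3, 1), (3, 1)]))

# Two 3x3 convolutions, a 2x2 stride-2 pooling layer, then another 3x3 convolution
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)]))
```

Comparing the two stacks shows why downsampling matters: after a stride-2 layer, every later kernel covers twice as many input pixels per step.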