HarvardX: Data Science: Wrangling Course Syllabus
Full curriculum breakdown: modules, lessons, estimated time, and outcomes.
Overview: This course introduces data wrangling, a foundational skill in data science. You'll learn how to clean, transform, and combine real-world datasets into reliable, analysis-ready formats. The curriculum has five core modules followed by a final project and requires approximately 6–10 hours per week over 8–10 weeks. Hands-on exercises give you practical experience with tools and techniques that data analysts and scientists use routinely.
Module 1: Introduction to Data Wrangling
Estimated time: 8 hours
- Understand the role of data wrangling in data science workflows
- Identify common data quality issues in real-world datasets
- Learn the principles of tidy data
- Explore structured data formats and their importance
Module 2: Data Cleaning and Transformation
Estimated time: 14 hours
- Handle missing data using appropriate strategies
- Detect and manage outliers in datasets
- Standardize inconsistent values and formats
- Apply transformations to improve data usability
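Illustration: the syllabus does not name the tools used in this module, so the following is only a minimal sketch in plain Python of two of the ideas above, standardizing inconsistent values and filling in missing data. The records, the field names (`state`, `height_cm`), and the choice of median imputation are all assumptions made for the example; the appropriate strategy depends on the dataset.

```python
from statistics import median

# Hypothetical raw records: inconsistent state labels and one blank height.
raw = [
    {"state": " ny", "height_cm": "170"},
    {"state": "NY ", "height_cm": ""},
    {"state": "N.Y.", "height_cm": "182"},
    {"state": "ca", "height_cm": "176"},
]

def standardize_state(value):
    """Trim whitespace, drop periods, and upper-case a state code."""
    return value.strip().replace(".", "").upper()

def parse_height(value):
    """Return a float, or None when the field is blank."""
    value = value.strip()
    return float(value) if value else None

cleaned = [
    {
        "state": standardize_state(r["state"]),
        "height_cm": parse_height(r["height_cm"]),
    }
    for r in raw
]

# One possible strategy (an assumption here): fill missing heights
# with the median of the observed values.
observed = [r["height_cm"] for r in cleaned if r["height_cm"] is not None]
fill_value = median(observed)
for r in cleaned:
    if r["height_cm"] is None:
        r["height_cm"] = fill_value
```

Keeping the parsing step separate from the imputation step makes it easy to swap in a different strategy, such as dropping incomplete rows, without touching the standardization logic.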
Module 3: Data Manipulation and Reshaping
Estimated time: 14 hours
- Filter, sort, and summarize data efficiently
- Group data and compute aggregations
- Reshape datasets between wide and long formats
- Prepare data for visualization and analysis
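Illustration: as a rough sketch of reshaping from wide to long format and then aggregating by group, the plain-Python example below uses a hypothetical table with one column per year. The helper `to_long`, its parameter names, and the data are invented for this example and are not taken from the course materials.

```python
from collections import defaultdict

# Hypothetical wide table: one row per country, one column per year.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]

def to_long(rows, id_col, names_to, values_to):
    """Turn every non-id column into its own (name, value) row."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            long_rows.append(
                {id_col: row[id_col], names_to: col, values_to: value}
            )
    return long_rows

# Long format: one row per country-year observation.
tidy = to_long(wide, id_col="country", names_to="year", values_to="cases")

# Group by year and sum the cases within each group.
totals = defaultdict(int)
for row in tidy:
    totals[row["year"]] += row["cases"]
```

In the long format each row holds a single observation, which is why the grouping step reduces to a single loop over rows.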
Module 4: Working with Multiple Data Sources
Estimated time: 14 hours
- Combine datasets using joins and merges
- Understand relational data concepts
- Resolve key alignment and schema issues
- Prepare integrated datasets for downstream tasks
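Illustration: a left join keeps every row of the first table and attaches matching columns from the second, leaving gaps where no key matches. The sketch below shows the idea in plain Python under two assumptions: the key is unique in the right-hand table, and every right-hand row has the same columns. The tables, values, and the `left_join` helper are hypothetical.

```python
# Hypothetical tables sharing the key column "state" (values are made up).
population = [
    {"state": "NY", "population": 100},
    {"state": "CA", "population": 200},
    {"state": "TX", "population": 150},
]
results = [
    {"state": "NY", "votes": 29},
    {"state": "CA", "votes": 55},
]

def left_join(left, right, key):
    """Keep every left row; add right-hand columns where the key matches."""
    # Assumes the key is unique in `right`; duplicates would be overwritten.
    index = {row[key]: row for row in right}
    # Assumes all right-hand rows share the columns of the first row.
    right_cols = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = index.get(row[key])
        merged = dict(row)
        for col in right_cols:
            # None marks a key with no match on the right-hand side.
            merged[col] = match[col] if match is not None else None
        joined.append(merged)
    return joined

joined = left_join(population, results, key="state")
unmatched = [r["state"] for r in joined if r["votes"] is None]
```

Listing the unmatched keys after a join is a quick way to spot key alignment problems, such as a state that is spelled differently in the two sources.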
Module 5: Best Practices and Workflow Efficiency
Estimated time: 10 hours
- Document data cleaning steps reproducibly
- Use versioning and logging for transparency
- Optimize wrangling pipelines for performance
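Illustration: one way to make cleaning steps transparent is to write each step as a small named function and log the row count before and after it runs. The sketch below uses Python's standard `logging` module; the step functions, the `run_pipeline` helper, and the sample rows are assumptions for the example, not a workflow prescribed by the course.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("wrangle")

def drop_blank_names(rows):
    """Remove rows whose name is empty or whitespace only."""
    return [r for r in rows if r["name"].strip()]

def dedupe(rows):
    """Remove exact duplicate (name, score) rows, keeping the first."""
    seen, out = set(), []
    for r in rows:
        key = (r["name"], r["score"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def run_pipeline(rows, steps):
    """Apply each step in order, recording row counts before and after."""
    audit = []
    for step in steps:
        before = len(rows)
        rows = step(rows)
        audit.append((step.__name__, before, len(rows)))
        log.info("%s: %d -> %d rows", step.__name__, before, len(rows))
    return rows, audit

# Hypothetical input with one blank name and one duplicate row.
rows = [
    {"name": "Ann", "score": 1},
    {"name": " ", "score": 2},
    {"name": "Ann", "score": 1},
    {"name": "Bo", "score": 3},
]
clean, audit = run_pipeline(rows, [drop_blank_names, dedupe])
```

The returned `audit` list can be saved alongside the cleaned data so that a reader can see which step removed which rows.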
Module 6: Final Project
Estimated time: 20 hours
- Apply wrangling techniques to a messy real-world dataset
- Produce a clean, well-documented analysis-ready dataset
- Submit a short report explaining key cleaning and transformation decisions
Prerequisites
- Familiarity with basic programming concepts (e.g., variables, loops)
- Basic understanding of data tables and spreadsheet software
- Some prior exposure to data analysis or statistics is helpful
What You'll Be Able to Do After Completing the Course
- Identify and resolve common data quality issues
- Clean and standardize messy datasets systematically
- Transform and reshape data for analysis and visualization
- Combine multiple data sources using joins and merges
- Build reproducible, efficient data preparation workflows