HarvardX: Data Science: Wrangling Course Syllabus

Full curriculum breakdown: modules, lessons, estimated time, and outcomes.

Overview: This course introduces data wrangling, a foundational skill in data science. You'll learn how to clean, transform, and combine real-world datasets into reliable, analysis-ready formats. The curriculum has five core modules followed by a final project (Module 6) and takes roughly 6–10 hours per week over 8–10 weeks; the module estimates below total 80 hours. Through hands-on exercises, you'll gain practical experience with widely adopted tools and techniques used by data analysts and data scientists.

Module 1: Introduction to Data Wrangling

Estimated time: 8 hours

Understand the role of data wrangling in data science workflows
Identify common data quality issues in real-world datasets
Learn the principles of tidy data
Explore structured data formats and their importance

Module 2: Data Cleaning and Transformation

Estimated time: 14 hours

Handle missing data using appropriate strategies
Detect and manage outliers in datasets
Standardize inconsistent values and formats
Apply transformations to improve data usability
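
As a rough illustration of the first and third outcomes above, the sketch below standardizes inconsistent text labels and fills a missing numeric value with the median of the observed values. The syllabus does not name a language or a dataset, so Python's standard library, the sample records, and the helper name `clean` are all assumptions made for this example. Median imputation is only one of several possible strategies.

```python
from statistics import median

# Hypothetical raw records: inconsistent state labels and one missing height.
raw = [
    {"state": " new york", "height_cm": "170"},
    {"state": "New York ", "height_cm": ""},
    {"state": "NEW YORK", "height_cm": "180"},
]


def clean(records):
    """Standardize labels and impute missing heights with the observed median."""
    # Collect only the heights that were actually recorded.
    observed = [float(r["height_cm"]) for r in records if r["height_cm"].strip()]
    fill = median(observed)

    cleaned = []
    for r in records:
        value = r["height_cm"].strip()
        cleaned.append({
            # Trim whitespace and normalize capitalization.
            "state": r["state"].strip().title(),
            # Use the recorded value when present, otherwise the median.
            "height_cm": float(value) if value else fill,
            # Flag imputed values so the decision stays visible downstream.
            "height_imputed": not value,
        })
    return cleaned


cleaned = clean(raw)
```

Keeping an explicit `height_imputed` flag is a design choice: it records which values were filled in, so later analysis can include or exclude them deliberately.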

Module 3: Data Manipulation and Reshaping

Estimated time: 14 hours

Filter, sort, and summarize data efficiently
Group data and compute aggregations
Reshape datasets between wide and long formats
Prepare data for visualization and analysis
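
To make the reshaping and aggregation outcomes above more concrete, here is a minimal sketch that converts a wide table (one column per year) into long format (one row per observation) and then sums values by group. It uses only Python's standard library. The table contents and the helper names `to_long` and `group_sum` are hypothetical and are not taken from the course materials.

```python
from collections import defaultdict

# Hypothetical wide table: one row per country, one column per year.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_long(rows, id_col, var_name, value_name):
    """Turn every non-identifier column into its own (variable, value) row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append(
                    {id_col: row[id_col], var_name: key, value_name: value}
                )
    return long_rows


def group_sum(rows, by, value):
    """Sum the `value` field within each group defined by the `by` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[by]] += row[value]
    return dict(totals)


long_rows = to_long(wide, "country", "year", "cases")
totals_by_year = group_sum(long_rows, "year", "cases")
```

The long format is what makes the grouped sum straightforward: once the year is a value rather than a column name, any column can serve as the grouping key.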

Module 4: Working with Multiple Data Sources

Estimated time: 14 hours

Combine datasets using joins and merges
Understand relational data concepts
Resolve key alignment and schema issues
Prepare integrated datasets for downstream tasks
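
The join and key-alignment outcomes above can be sketched as a small left join written with Python's standard library. The two tables, their values, and the helper name `left_join` are made up for illustration. The sketch also assumes the key is unique in the right-hand table, which real data may not satisfy.

```python
def left_join(left, right, key):
    """Keep every row of `left`; attach matching columns from `right` or None."""
    # Index the right table by key (assumes each key appears at most once).
    lookup = {row[key]: row for row in right}
    # Collect the right table's non-key columns in a stable order.
    right_cols = sorted({col for row in right for col in row if col != key})

    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in right_cols:
            # Unmatched keys get None, making alignment gaps explicit.
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


# Hypothetical tables with an intentionally unmatched key.
population = [
    {"state": "Alabama", "population": 100},
    {"state": "Alaska", "population": 20},
]
votes = [{"state": "Alabama", "votes": 9}]

combined = left_join(population, votes, "state")
unmatched = [row["state"] for row in combined if row["votes"] is None]
```

Listing the unmatched keys after a join is a cheap check for schema or spelling mismatches before the integrated table is passed downstream.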

Module 5: Best Practices and Workflow Efficiency

Estimated time: 10 hours

Document data cleaning steps reproducibly
Use versioning and logging for transparency
Optimize wrangling pipelines for performance

Module 6: Final Project

Estimated time: 20 hours

Apply wrangling techniques to a messy real-world dataset
Produce a clean, well-documented analysis-ready dataset
Submit a short report explaining key cleaning and transformation decisions

Prerequisites

Familiarity with basic programming concepts (e.g., variables, loops)
Basic understanding of data tables and spreadsheet software
Some prior exposure to data analysis or statistics is helpful

What You'll Be Able to Do After

Identify and resolve common data quality issues
Clean and standardize messy datasets systematically
Transform and reshape data for analysis and visualization
Combine multiple data sources using joins and merges
Build reproducible, efficient data preparation workflows
