Building a Machine Learning Pipeline from Scratch Course Syllabus
Full curriculum breakdown: modules, lessons, estimated time, and outcomes.
Overview: This hands-on, project-driven course introduces building end-to-end machine learning pipelines from scratch. You'll learn to turn experimental code into production-grade systems using software engineering best practices, all within a browser-based interactive environment that requires no setup. Across approximately 4 hours of content, the course guides you through designing, structuring, testing, and extending ML pipelines. Each module pairs foundational concepts with immediate coding exercises to reinforce learning.
Module 1: Course Goals & Structure
Estimated time: 0.2 hours
- Intended audience and prerequisites
- Course goals and learning outcomes
- Structure and navigation
- Strengths of the interactive format
Module 2: Getting Started
Estimated time: 0.3 hours
- Why use ML pipelines over notebooks
- Defining ML training pipelines
- Understanding pipeline components
- Completing the Getting Started quiz
Module 3: Structuring the ML Pipeline
Estimated time: 0.5 hours
- System architecture for ML pipelines
- Directory layout and code organization
- Dependency management
- Project scaffolding
Module 4: Directed Acyclic Graphs (DAGs)
Estimated time: 0.3 hours
- DAG fundamentals in pipeline orchestration
- Topological sorting of tasks
- Implementing a DAG for workflow control
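To give a feel for what topological sorting means in this module, here is a minimal sketch using Kahn's algorithm. It is not the course's implementation. The function name `topological_order` and the stage names in `PIPELINE` are assumptions made up for illustration.

```python
from collections import deque


def topological_order(dependencies):
    """Return tasks so that every task comes after the tasks it depends on.

    `dependencies` maps a task name to an iterable of task names it needs.
    Raises ValueError if the graph has a cycle (that is, it is not a DAG).
    """
    # Normalise to sets and include tasks that appear only as a dependency.
    graph = {task: set(deps) for task, deps in dependencies.items()}
    for deps in list(graph.values()):
        for dep in deps:
            graph.setdefault(dep, set())

    # Count unmet dependencies and record which tasks each task unblocks.
    remaining = {task: len(deps) for task, deps in graph.items()}
    dependents = {task: [] for task in graph}
    for task, deps in graph.items():
        for dep in deps:
            dependents[dep].append(task)

    # Start from tasks with no dependencies; sorting keeps the result stable.
    ready = deque(sorted(task for task, count in remaining.items() if count == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for unblocked in sorted(dependents[task]):
            remaining[unblocked] -= 1
            if remaining[unblocked] == 0:
                ready.append(unblocked)

    if len(order) != len(graph):
        raise ValueError("cycle detected: the task graph is not a DAG")
    return order


# Hypothetical stages of a small training pipeline.
PIPELINE = {
    "load_data": [],
    "preprocess": ["load_data"],
    "train": ["preprocess"],
    "evaluate": ["train", "preprocess"],
    "report": ["evaluate"],
}

print(topological_order(PIPELINE))
```

Running the stages in the returned order guarantees that no stage starts before its inputs exist, and the cycle check rejects graphs that cannot be scheduled at all.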
Module 5: Building the ML Library
Estimated time: 0.8 hours
- Object-oriented programming for ML components
- Using OmegaConf for configuration management
- Designing abstract base classes for datasets, models, and reports
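As a rough illustration of the abstract-base-class idea, the sketch below defines three small interfaces with Python's standard `abc` module and one toy implementation of each. The class and method names (`Dataset.load`, `Model.fit`, `Report.render`, and so on) are assumptions, not the course's actual library, and configuration with OmegaConf is left out to keep the example standard-library only.

```python
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Interface for anything that can supply features and targets."""

    @abstractmethod
    def load(self):
        """Return a (features, targets) pair."""


class Model(ABC):
    """Interface for anything that can be fitted and then predict."""

    @abstractmethod
    def fit(self, features, targets):
        """Fit the model and return self."""

    @abstractmethod
    def predict(self, features):
        """Return one prediction per row of features."""


class Report(ABC):
    """Interface for turning predictions into a human-readable summary."""

    @abstractmethod
    def render(self, targets, predictions):
        """Return the report as a string."""


class ToyDataset(Dataset):
    def load(self):
        return [[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0]


class MeanModel(Model):
    """Baseline that always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None

    def fit(self, features, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]


class MaeReport(Report):
    def render(self, targets, predictions):
        mae = sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)
        return f"MAE: {mae:.2f}"


def run(dataset, model, report):
    """Wire the three components together using only their interfaces."""
    features, targets = dataset.load()
    predictions = model.fit(features, targets).predict(features)
    return report.render(targets, predictions)


print(run(ToyDataset(), MeanModel(), MaeReport()))
```

Because `run` depends only on the abstract interfaces, any dataset, model, or report that implements them can be swapped in without changing the function.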
Module 6: The Pipeline Core
Estimated time: 0.8 hours
- Command-line argument parsing with argparse
- Experiment tracking integration
- Logging and docstrings for maintainability
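The sketch below shows one way argparse, logging, and docstrings might fit together in a pipeline entry point. The flag names (`--config`, `--experiment-name`, `--log-level`), their defaults, and the logger name are assumptions for illustration. Experiment tracking is not shown, since the syllabus does not name a tracking tool.

```python
import argparse
import logging

logger = logging.getLogger("ml_pipeline")  # hypothetical logger name


def build_parser():
    """Build the command-line parser for the (illustrative) pipeline."""
    parser = argparse.ArgumentParser(description="Run a training pipeline.")
    parser.add_argument(
        "--config",
        default="config.yaml",
        help="Path to the pipeline configuration file.",
    )
    parser.add_argument(
        "--experiment-name",
        default="baseline",
        help="Label used to group the runs of one experiment.",
    )
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="Verbosity of the log output.",
    )
    return parser


def main(argv=None):
    """Parse arguments, configure logging, and return the parsed namespace.

    `argv` defaults to None, in which case argparse reads sys.argv[1:].
    Passing an explicit list makes the function easy to call from tests.
    """
    args = build_parser().parse_args(argv)
    logging.basicConfig(
        level=getattr(logging, args.log_level),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger.info(
        "Starting experiment %r with config %s", args.experiment_name, args.config
    )
    return args
```

Calling `main(["--experiment-name", "demo"])` parses an explicit argument list, which keeps the entry point testable without touching the real command line.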
Module 7: Extending the Pipeline
Estimated time: 0.5 hours
- Adding support for new datasets
- Extending to new model types
- Hands-on extension to a second dataset
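One common way to make a pipeline extensible is a registry that maps names to classes, so adding a dataset means adding one class rather than editing the pipeline core. The sketch below assumes this pattern; the syllabus does not say which mechanism the course uses, and `register_dataset`, `create_dataset`, `FirstDataset`, and `SecondDataset` are hypothetical names.

```python
DATASETS = {}  # maps a dataset name to the class that implements it


def register_dataset(name):
    """Class decorator that records a dataset class under `name`."""

    def decorator(cls):
        if name in DATASETS:
            raise ValueError(f"dataset {name!r} is already registered")
        DATASETS[name] = cls
        return cls

    return decorator


def create_dataset(name):
    """Instantiate the dataset registered under `name`."""
    try:
        return DATASETS[name]()
    except KeyError:
        available = ", ".join(sorted(DATASETS))
        raise ValueError(f"unknown dataset {name!r}; available: {available}") from None


@register_dataset("first")
class FirstDataset:
    def load(self):
        return [[0.0], [1.0]], [0, 1]


# Supporting a second dataset needs only a new registered class.
@register_dataset("second")
class SecondDataset:
    def load(self):
        return [[5.0], [6.0], [7.0]], [1, 0, 1]


features, targets = create_dataset("second").load()
print(len(features), len(targets))
```

The same pattern could be applied to model types, with the dataset or model name coming from configuration or a command-line flag.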
Module 8: Testing
Estimated time: 0.5 hours
- Unit testing principles
- Using pytest for function validation
- System testing of pipeline components
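For a sense of what a unit test looks like, here is a small sketch in the style pytest collects: functions whose names start with `test_` and that use plain `assert` statements. The function under test, `min_max_scale`, is a hypothetical preprocessing helper and not part of the course code.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0.

    A constant input has no range to scale by, so it maps to all zeros.
    """
    lowest, highest = min(values), max(values)
    if lowest == highest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def test_scales_to_unit_interval():
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_constant_input_maps_to_zero():
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]


def test_preserves_length_and_order():
    scaled = min_max_scale([10.0, 0.0, 5.0])
    assert len(scaled) == 3
    assert scaled[0] > scaled[2] > scaled[1]
```

Saved in a file such as `test_scaling.py` (an assumed name), these tests would be discovered and run by invoking `pytest` in the same directory.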
Prerequisites
- Familiarity with Python programming
- Basic understanding of machine learning concepts
- Experience with Jupyter notebooks (helpful but not required)
What You'll Be Able to Do After
- Design and structure production-ready ML pipelines
- Orchestrate workflows using Directed Acyclic Graphs (DAGs)
- Build reusable and modular ML components
- Implement logging, configuration, and command-line interfaces
- Write and run tests for ML pipeline functions