Data Engineering Foundations in Python Course Syllabus

Full curriculum breakdown — modules, lessons, estimated time, and outcomes.

This course introduces the foundations of data engineering with Python across eight modules totaling roughly eight hours of material. It opens with the data engineering role, team structures, and the data lifecycle, then moves through hands-on topics: ingesting data with pandas and PySpark, dimensional modeling and SQL in BigQuery, orchestrating workflows with Airflow, Dagster, and dbt, and enforcing data quality with schema validation and automated tests. A capstone project ties everything together in an end-to-end Formula 1 pipeline on Google Cloud Platform. Most modules take 30–90 minutes, so the full course fits comfortably into a week of part-time study.

Module 1: Getting Started

Estimated time: 0.5 hours

  • Introduction to data engineering roles and responsibilities
  • Understanding team structures in data organizations
  • Setting up Google Cloud Platform (GCP) environment
  • Reviewing the data engineering lifecycle stages

Module 2: Team Structures

Estimated time: 0.75 hours

  • Differences between embedded and centralized data teams
  • Role breakdown: Data Engineers, Analysts, and Data Scientists
  • Collaboration patterns across data functions
  • Strategic alignment of data teams with business goals

Module 3: Data Lifecycle & Cloud Architecture

Estimated time: 1.25 hours

  • End-to-end data engineering lifecycle
  • Data lakes vs data warehouses
  • Cloud architecture patterns: Lambda and Kappa
  • Key checkpoints in pipeline development

Module 4: Data Ingestion

Estimated time: 1.5 hours

  • Batch vs streaming ingestion methods
  • Change Data Capture (CDC) techniques
  • API-based and file system data ingestion
  • Building ingestion pipelines with pandas and PySpark
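
To preview the batch-vs-streaming distinction covered in this module, here is a minimal sketch using only the standard library: batch ingestion materializes every record at once, while streaming-style ingestion yields records one at a time. The in-memory CSV and field names are illustrative, not course data.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a source file or API response.
SOURCE = "lap,driver,time_s\n1,Hamilton,92.4\n2,Hamilton,91.8\n"

def ingest_batch(text):
    """Batch ingestion: load every record into memory at once."""
    return list(csv.DictReader(io.StringIO(text)))

def ingest_stream(text):
    """Streaming-style ingestion: yield one record at a time."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row

batch = ingest_batch(SOURCE)          # all rows, available immediately
first = next(ingest_stream(SOURCE))   # rows arrive incrementally
```

The course builds the real versions of these pipelines with pandas and PySpark, which add typed columns, partitioning, and distributed execution on top of this same read-then-process pattern.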

Module 5: Data Modeling & SQL

Estimated time: 1 hour

  • Dimensional modeling using Kimball methodology
  • Writing DDL and DML statements in SQL
  • SQL query lifecycle in BigQuery
  • Solving real-world SQL challenges
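
As a taste of the DDL and DML work in this module, the sketch below runs both statement types against an in-memory SQLite database (the course itself uses BigQuery; SQLite stands in here because it ships with Python). The `dim_driver` table name is a hypothetical Kimball-style dimension, not one from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a simple dimension table (one piece of a star schema).
conn.execute(
    "CREATE TABLE dim_driver (driver_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# DML: insert a row with a parameterized query, then read it back.
conn.execute("INSERT INTO dim_driver (name) VALUES (?)", ("Hamilton",))
rows = conn.execute("SELECT name FROM dim_driver").fetchall()
```

BigQuery's SQL dialect differs in types and DDL options, but the DDL/DML split and parameterized-query habit carry over directly.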

Module 6: Orchestration Tools

Estimated time: 1.5 hours

  • Directed Acyclic Graphs (DAGs) in Apache Airflow
  • Introduction to Dagster for workflow orchestration
  • Using dbt for transformation workflows
  • Building and managing end-to-end DAG pipelines
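
The core idea behind Airflow's DAGs is topological ordering: a task runs only after its upstream dependencies finish. A minimal standard-library sketch of that idea, with hypothetical task names rather than Airflow's own API:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to their upstream dependencies,
# mirroring how an Airflow DAG wires tasks together.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

# A valid execution order: every task appears after its dependencies.
order = list(TopologicalSorter(dag).static_order())
```

Airflow, Dagster, and dbt all layer scheduling, retries, and observability on top of exactly this dependency-resolution step.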

Module 7: Data Quality

Estimated time: 0.75 hours

  • Schema validation with Avro and Protobuf
  • Implementing data quality checks in pipelines
  • Testing and monitoring data integrity
  • Integrating dbt for automated testing
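
Before reaching for Avro, Protobuf, or dbt tests, it helps to see the shape of a quality check in plain Python: compare each record against an expected schema and collect violations instead of failing silently. The schema and field names below are illustrative assumptions.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"lap": int, "driver": str, "time_s": float}

def check_record(record, schema=SCHEMA):
    """Return a list of quality violations for one record (empty = clean)."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

clean = check_record({"lap": 1, "driver": "Hamilton", "time_s": 92.4})
dirty = check_record({"lap": "one", "driver": "Hamilton"})  # bad type + missing field
```

Avro and Protobuf formalize the schema half of this, while dbt tests run the check half automatically inside the pipeline.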

Module 8: Capstone & Epilogue

Estimated time: 0.5 hours

  • Building an end-to-end Formula 1 data pipeline
  • Integrating ingestion, transformation, and orchestration
  • Reviewing GCP billing and cost management

Prerequisites

  • Familiarity with Python programming
  • Basic understanding of SQL
  • Access to a Google Cloud Platform account

What You'll Be Able to Do After This Course

  • Design and implement data engineering pipelines on GCP
  • Use Python, PySpark, and SQL for data processing
  • Orchestrate workflows using Airflow and dbt
  • Apply data modeling and quality assurance practices
  • Build a production-grade data pipeline for portfolio use