Big Data Integration and Processing Course Syllabus
Full curriculum breakdown: modules, lessons, estimated time, and outcomes.
This course provides a beginner-friendly introduction to big data integration and processing, designed to equip learners with practical skills using industry-standard tools. Through hands-on exercises, you'll learn to retrieve, manipulate, and analyze data from both relational and NoSQL databases, use integration platforms like Splunk and Datameer, and process large datasets on Hadoop and Spark. The course spans approximately 10–12 hours of content, divided into six modules, and includes assignments, discussions, and a final project. You'll gain foundational experience applicable to real-world data engineering tasks, with lifetime access to materials and a certificate upon completion.
Module 1: Welcome
Estimated time: 1 hour
- Introduction to big data integration and processing concepts
- Setting up the learning environment using Docker
- Working with Jupyter notebooks for hands-on exercises (quick environment check below)
- Accessing course materials and navigating the platform
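Once the Docker container is running, a quick sanity check inside a Jupyter notebook confirms the environment is ready. This is an illustrative sketch; the exact packages available depend on the course image.

```python
# Quick environment check to run in a Jupyter notebook cell once the
# Docker-based setup is up. Package versions depend on the course image.
import sys
import pandas as pd

print("Python:", sys.version.split()[0])
print("pandas:", pd.__version__)
```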
Module 2: Retrieving Big Data (Part 1)
Estimated time: 1 hour
- Understanding relational databases in big data contexts
- Connecting to PostgreSQL databases
- Querying data using SQL in PostgreSQL
- Retrieving structured data for analysis (see the sketch below)
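To give a feel for this module's exercises, here is a minimal sketch of connecting to PostgreSQL and running an aggregate SQL query with the psycopg2 driver; the host, credentials, and the visits table are hypothetical stand-ins for the course database.

```python
# Minimal sketch: connect to PostgreSQL and run an aggregate query with
# psycopg2. Host, credentials, and the "visits" table are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="coursedb",
                        user="student", password="secret")
with conn.cursor() as cur:
    # Top 10 cities by number of visits (hypothetical schema).
    cur.execute("""
        SELECT city, COUNT(*) AS n
        FROM visits
        GROUP BY city
        ORDER BY n DESC
        LIMIT 10;
    """)
    for city, n in cur.fetchall():
        print(city, n)
conn.close()
```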
Module 3: Retrieving Big Data (Part 2)
Estimated time: 2 hours
- Introduction to NoSQL databases: MongoDB and Aerospike
- Querying and aggregating data in MongoDB
- Working with key-value data in Aerospike
- Data manipulation using Pandas data frames (sketched below)
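The sketch below shows the two NoSQL access patterns covered here, assuming local MongoDB and Aerospike instances and the pymongo and aerospike client libraries; the database, collection, namespace, and field names are placeholders.

```python
# Minimal sketch: a MongoDB aggregation loaded into a Pandas DataFrame,
# then a single Aerospike key-value write/read. All names are placeholders.
import aerospike
import pandas as pd
from pymongo import MongoClient

# MongoDB: count documents per language with an aggregation pipeline.
mongo = MongoClient("mongodb://localhost:27017")
pipeline = [
    {"$group": {"_id": "$lang", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
df = pd.DataFrame(list(mongo.coursedb.tweets.aggregate(pipeline)))
print(df.head())

# Aerospike: write and read one record by key.
client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "users", "user42")              # (namespace, set, user key)
client.put(key, {"name": "Ada", "visits": 3})  # bins are a plain dict
_, _, bins = client.get(key)                   # returns (key, meta, bins)
print(bins)
client.close()
```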
Module 4: Big Data Integration
Estimated time: 2 hours
- Introduction to data integration concepts
- Using Splunk for real-time data monitoring and analysis
- Applying Datameer for large-scale data integration
- Practical examples of integrating heterogeneous data sources (example below)
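Splunk and Datameer are driven largely through their web interfaces, but Splunk searches can also be scripted. Here is a minimal sketch using the splunk-sdk package (splunklib), assuming a local Splunk instance on its default management port; the credentials and search string are placeholders, and Datameer's UI-driven workflow is not shown.

```python
# Minimal sketch: run a one-shot Splunk search from Python via splunklib.
# Host, port, credentials, and the search string are placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")
stream = service.jobs.oneshot("search index=_internal | head 5")
for event in results.ResultsReader(stream):
    if isinstance(event, dict):  # skip diagnostic Message objects
        print(event)
```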
Module 5: Big Data Processing
Estimated time: 3 hours
- Introduction to Hadoop for distributed data processing
- Running processing tasks on Spark
- Understanding when to use Hadoop vs. Spark
- Hands-on exercises with big data processing workflows (word-count sketch below)
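As a taste of the Spark exercises, here is a minimal word-count sketch in PySpark; the input file name is a placeholder, and on a Hadoop cluster it would typically be an hdfs:// path instead.

```python
# Minimal sketch: the classic word count in PySpark. "words.txt" is a
# placeholder input; on a cluster it could be an hdfs:// URI.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
lines = spark.read.text("words.txt").rdd.map(lambda row: row[0])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
for word, n in counts.takeOrdered(10, key=lambda pair: -pair[1]):
    print(word, n)
spark.stop()
```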
Module 6: Final Project
Estimated time: 3 hours
- Integrate data from PostgreSQL, MongoDB, and Aerospike
- Process and aggregate data using Pandas and Spark (see the sketch after this list)
- Submit a comprehensive report with insights and methodology
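At its core, the integration step means joining records from different sources on a shared key. Here is a minimal Pandas sketch, with two hand-built frames standing in for data retrieved from PostgreSQL and MongoDB:

```python
# Minimal sketch: join records from two sources on a shared key with
# Pandas. The literal frames stand in for data pulled from PostgreSQL
# and MongoDB earlier in the project.
import pandas as pd

sql_df = pd.DataFrame({"user_id": [1, 2, 3], "city": ["Austin", "Oslo", "Pune"]})
mongo_df = pd.DataFrame({"user_id": [1, 2, 4], "tweets": [12, 7, 3]})

combined = sql_df.merge(mongo_df, on="user_id", how="inner")
print(combined)  # only user_ids 1 and 2 appear in both sources
```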
Prerequisites
- Basic understanding of databases and data structures
- Prior exposure to big data concepts (recommended)
- Ability to install and configure Docker and virtual machines
What You'll Be Able to Do After This Course
- Retrieve and query data from relational and NoSQL databases
- Manipulate and analyze large datasets using Pandas
- Apply data integration tools like Splunk and Datameer
- Execute big data processing tasks on Hadoop and Spark
- Understand data integration needs in large-scale analytics