Big Data Integration and Processing Course Syllabus
Full curriculum breakdown: modules, lessons, estimated time, and outcomes.
This course provides a beginner-friendly introduction to big data integration and processing, designed to equip learners with practical skills using industry-standard tools. Through hands-on exercises, you'll learn to retrieve, manipulate, and analyze data from both relational and NoSQL databases, use integration platforms like Splunk and Datameer, and process large datasets on Hadoop and Spark. The course spans approximately 10–12 hours of content, divided into six modules, and includes assignments, discussions, and a final project. You'll gain foundational experience applicable to real-world data engineering tasks, with lifetime access to materials and a certificate upon completion.
Module 1: Welcome
Estimated time: 1 hour
- Introduction to big data integration and processing concepts
- Setting up the learning environment using Docker
- Working with Jupyter notebooks for hands-on exercises (quick environment check below)
- Accessing course materials and navigating the platform
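Once the Docker container is running, a quick sanity check inside a Jupyter notebook confirms the environment is ready. This is an illustrative sketch; the exact packages available depend on the course image.

```python
# Quick environment check to run in a Jupyter notebook cell once the
# Docker-based setup is up. Package versions depend on the course image.
import sys
import pandas as pd

print("Python:", sys.version.split()[0])
print("pandas:", pd.__version__)
```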
Module 2: Retrieving Big Data (Part 1)
Estimated time: 1 hour
- Understanding relational databases in big data contexts
- Connecting to PostgreSQL databases
- Querying data using SQL in PostgreSQL
- Retrieving structured data for analysis (see the sketch below)
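To give a feel for this module's exercises, here is a minimal sketch of connecting to PostgreSQL and running an aggregate SQL query with the psycopg2 driver; the host, credentials, and the visits table are hypothetical stand-ins for the course database.

```python
# Minimal sketch: connect to PostgreSQL and run an aggregate query with
# psycopg2. Host, credentials, and the "visits" table are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="coursedb",
                        user="student", password="secret")
with conn.cursor() as cur:
    # Top 10 cities by number of visits (hypothetical schema).
    cur.execute("""
        SELECT city, COUNT(*) AS n
        FROM visits
        GROUP BY city
        ORDER BY n DESC
        LIMIT 10;
    """)
    for city, n in cur.fetchall():
        print(city, n)
conn.close()
```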
Module 3: Retrieving Big Data (Part 2)
Estimated time: 2 hours
- Introduction to NoSQL databases: MongoDB and Aerospike
- Querying and aggregating data in MongoDB
- Working with key-value data in Aerospike
- Data manipulation using Pandas data frames (sketched below)
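The sketch below shows the two NoSQL access patterns covered here, assuming local MongoDB and Aerospike instances and the pymongo and aerospike client libraries; the database, collection, namespace, and field names are placeholders.

```python
# Minimal sketch: a MongoDB aggregation loaded into a Pandas DataFrame,
# then a single Aerospike key-value write/read. All names are placeholders.
import aerospike
import pandas as pd
from pymongo import MongoClient

# MongoDB: count documents per language with an aggregation pipeline.
mongo = MongoClient("mongodb://localhost:27017")
pipeline = [
    {"$group": {"_id": "$lang", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
df = pd.DataFrame(list(mongo.coursedb.tweets.aggregate(pipeline)))
print(df.head())

# Aerospike: write and read one record by key.
client = aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
key = ("test", "users", "user42")              # (namespace, set, user key)
client.put(key, {"name": "Ada", "visits": 3})  # bins are a plain dict
_, _, bins = client.get(key)                   # returns (key, meta, bins)
print(bins)
client.close()
```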
Module 4: Big Data Integration
Estimated time: 2 hours
- Introduction to data integration concepts
- Using Splunk for real-time data monitoring and analysis
- Applying Datameer for large-scale data integration
- Practical examples of integrating heterogeneous data sources (example below)
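Splunk and Datameer are driven largely through their web interfaces, but Splunk searches can also be scripted. Here is a minimal sketch using the splunk-sdk package (splunklib), assuming a local Splunk instance on its default management port; the credentials and search string are placeholders, and Datameer's UI-driven workflow is not shown.

```python
# Minimal sketch: run a one-shot Splunk search from Python via splunklib.
# Host, port, credentials, and the search string are placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")
stream = service.jobs.oneshot("search index=_internal | head 5")
for event in results.ResultsReader(stream):
    if isinstance(event, dict):  # skip diagnostic Message objects
        print(event)
```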
Module 5: Big Data Processing
Estimated time: 3 hours
- Introduction to Hadoop for distributed data processing
- Running processing tasks on Spark
- Understanding when to use Hadoop vs. Spark
- Hands-on exercises with big data processing workflows (word-count sketch below)
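As a taste of the Spark exercises, here is a minimal word-count sketch in PySpark; the input file name is a placeholder, and on a Hadoop cluster it would typically be an hdfs:// path instead.

```python
# Minimal sketch: the classic word count in PySpark. "words.txt" is a
# placeholder input; on a cluster it could be an hdfs:// URI.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
lines = spark.read.text("words.txt").rdd.map(lambda row: row[0])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
for word, n in counts.takeOrdered(10, key=lambda pair: -pair[1]):
    print(word, n)
spark.stop()
```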
Module 6: Final Project
Estimated time: 3 hours
- Integrate data from PostgreSQL, MongoDB, and Aerospike
- Process and aggregate data using Pandas and Spark (see the sketch after this list)
- Submit a comprehensive report with insights and methodology
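At its core, the integration step means joining records from different sources on a shared key. Here is a minimal Pandas sketch, with two hand-built frames standing in for data retrieved from PostgreSQL and MongoDB:

```python
# Minimal sketch: join records from two sources on a shared key with
# Pandas. The literal frames stand in for data pulled from PostgreSQL
# and MongoDB earlier in the project.
import pandas as pd

sql_df = pd.DataFrame({"user_id": [1, 2, 3], "city": ["Austin", "Oslo", "Pune"]})
mongo_df = pd.DataFrame({"user_id": [1, 2, 4], "tweets": [12, 7, 3]})

combined = sql_df.merge(mongo_df, on="user_id", how="inner")
print(combined)  # only user_ids 1 and 2 appear in both sources
```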
Prerequisites
- Basic understanding of databases and data structures
- Prior exposure to big data concepts (recommended)
- Ability to install and configure Docker and virtual machines
What You'll Be Able to Do After This Course
- Retrieve and query data from relational and NoSQL databases
- Manipulate and analyze large datasets using Pandas
- Apply data integration tools like Splunk and Datameer
- Execute big data processing tasks on Hadoop and Spark
- Understand data integration needs in large-scale analytics