Algorithms for DNA Sequencing Course Syllabus
Full curriculum breakdown: modules, lessons, estimated time, and outcomes.
This course balances biological context with computational technique, working with real DNA sequencing data in Python. Over approximately 13 hours, learners progress from foundational to advanced topics in bioinformatics, combining theory with hands-on implementation. The course is structured into four core modules followed by a final project, so learners from both computer science and biology backgrounds can build interdisciplinary skills in genome analysis and algorithm application.
Module 1: DNA Sequencing, Strings, and Matching
Estimated time: 4 hours
- Overview of DNA sequencing technologies
- Genome representation as strings
- Understanding sequencing errors and quality scoring (FASTQ format)
- Implementation of naive exact string matching in Python
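As a rough illustration of the last two topics in this module, the sketch below shows naive exact matching and the decoding of a FASTQ quality character. It is not taken from the course materials: the function names are hypothetical, and the quality decoding assumes the common Phred+33 encoding.

```python
def naive_match(pattern, text):
    """Return every offset in text where pattern occurs exactly."""
    occurrences = []
    # Try each alignment of the pattern against the text, left to right.
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            occurrences.append(i)
    return occurrences


def phred33_to_quality(char):
    """Convert one FASTQ quality character to its integer score.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return ord(char) - 33


if __name__ == "__main__":
    print(naive_match("AG", "AGCTTAGATAGC"))
    print(phred33_to_quality("I"))
```

The naive matcher inspects every possible alignment, so its worst-case running time grows with the product of the pattern and text lengths. Module 2 covers algorithms that improve on this.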
Module 2: Preprocessing, Indexing, and Approximate Matching
Estimated time: 3 hours
- Application of the Boyer-Moore algorithm
- Building k-mer indices and hash tables for genome search
- Understanding approximate matches using the pigeonhole principle
- Introduction to Hamming distance and edit distance
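The topics above fit together in one small example: a k-mer index finds exact hits for pieces of a pattern, and the pigeonhole principle guarantees that a match with at most m mismatches contains at least one of m + 1 non-overlapping pieces unchanged. The following is a minimal sketch under those assumptions. The function names are hypothetical, and only mismatches (Hamming distance) are handled, not insertions or deletions.

```python
from collections import defaultdict


def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def build_kmer_index(text, k):
    """Map each length-k substring of text to the offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def approximate_match(pattern, text, max_mismatches):
    """Return offsets where pattern matches text with <= max_mismatches."""
    pieces = max_mismatches + 1
    piece_len = len(pattern) // pieces
    if piece_len == 0:
        raise ValueError("pattern too short for this many mismatches")
    index = build_kmer_index(text, piece_len)
    hits = set()
    for p in range(pieces):
        start = p * piece_len
        piece = pattern[start:start + piece_len]
        # Each exact hit of a piece implies one candidate pattern offset.
        for pos in index.get(piece, []):
            candidate = pos - start
            if candidate < 0 or candidate + len(pattern) > len(text):
                continue
            window = text[candidate:candidate + len(pattern)]
            # Verify the full pattern against the candidate window.
            if hamming_distance(pattern, window) <= max_mismatches:
                hits.add(candidate)
    return sorted(hits)


if __name__ == "__main__":
    print(approximate_match("ACGT", "ACGTACCTTCGT", 1))
```

Building the index once and reusing it across many patterns is what makes this approach attractive for genome search. The sketch rebuilds it on each call only to stay short.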
Module 3: Edit Distance, Assembly, and Overlaps
Estimated time: 3 hours
- Dynamic programming for edit distance calculation
- Local and global sequence alignment
- Principles of shotgun sequencing and read overlaps
- Construction and analysis of overlap graphs
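To make these topics concrete, here is a minimal sketch of edit distance by dynamic programming, a suffix/prefix overlap check, and a naive overlap graph built from all ordered read pairs. It assumes unit costs for substitutions, insertions, and deletions and exact (error-free) overlaps. The function names and the min_length parameter are illustrative choices, not the course's own code.

```python
from itertools import permutations


def edit_distance(x, y):
    """Edit distance between x and y with unit costs, via dynamic programming."""
    rows, cols = len(x) + 1, len(y) + 1
    dist = [[0] * cols for _ in range(rows)]
    # Turning a prefix into the empty string costs one edit per character.
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = dist[i - 1][j - 1] + (x[i - 1] != y[j - 1])
            delete = dist[i - 1][j] + 1
            insert = dist[i][j - 1] + 1
            dist[i][j] = min(substitute, delete, insert)
    return dist[-1][-1]


def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no exact overlap of at least min_length exists.
    """
    if len(a) < min_length or len(b) < min_length:
        return 0
    start = 0
    while True:
        # Look for the next place where b's first min_length characters occur in a.
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def overlap_graph(reads, min_length):
    """Map each ordered read pair (a, b) to its overlap length, if any."""
    edges = {}
    for a, b in permutations(reads, 2):
        length = overlap(a, b, min_length)
        if length > 0:
            edges[(a, b)] = length
    return edges


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(overlap_graph(["ACGGATC", "GATCAAGT", "TTCACGGA"], 3))
```

Comparing every ordered pair of reads is quadratic in the number of reads, so this version is only practical for small examples.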
Module 4: Algorithms for Assembly
Estimated time: 3 hours
- Shortest common superstring and greedy algorithms
- Introduction to de Bruijn graphs and their application in genome assembly
- Eulerian paths and practical genome assembly considerations
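The link between de Bruijn graphs and Eulerian paths can be sketched in a few lines: each k-mer becomes an edge from its left (k-1)-mer to its right (k-1)-mer, and a walk that uses every edge exactly once spells out a sequence containing all the k-mers. The example below is a simplified illustration with hypothetical function names. It assumes error-free k-mers taken from a single sequence and a graph that actually has an Eulerian path, which real read sets with errors and repeats generally do not guarantee.

```python
from collections import defaultdict


def de_bruijn_edges(sequence, k):
    """One edge per k-mer: left (k-1)-mer -> right (k-1)-mer."""
    return [
        (sequence[i:i + k - 1], sequence[i + 1:i + k])
        for i in range(len(sequence) - k + 1)
    ]


def eulerian_walk(edges):
    """Return the nodes of a walk that uses every edge once (Hierholzer's algorithm).

    Assumes the edge list describes a graph that has an Eulerian path.
    """
    if not edges:
        return []
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for source, target in edges:
        graph[source].append(target)
        balance[source] += 1
        balance[target] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            # Follow an unused edge out of the current node.
            stack.append(graph[node].pop())
        else:
            # Dead end: this node is final in the remaining walk.
            walk.append(stack.pop())
    walk.reverse()
    return walk


def spell(walk):
    """Turn a walk over (k-1)-mers back into a sequence."""
    if not walk:
        return ""
    return walk[0] + "".join(node[-1] for node in walk[1:])


if __name__ == "__main__":
    edges = de_bruijn_edges("ACGTTGCA", 3)
    print(spell(eulerian_walk(edges)))
```

When a genome contains repeats longer than k - 1, several Eulerian walks can exist and they may spell different sequences. This is one reason practical assemblers report contigs rather than a single reconstruction.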
Module 5: Final Project
Estimated time: 3 hours
- Apply string matching and indexing techniques to real sequencing data
- Implement alignment and edit distance algorithms
- Perform genome assembly using de Bruijn or overlap graphs
Prerequisites
- Basic familiarity with Python programming
- Introductory knowledge of algorithms and data structures
- Some exposure to biological concepts (helpful but not required)
What You'll Be Able to Do After the Course
- Understand the core principles of DNA sequencing and its computational challenges
- Implement and apply string matching and alignment algorithms to genomic data
- Calculate and interpret Hamming and edit distances for sequence comparison
- Build and use k-mer indices, suffix arrays, and overlap graphs for genome analysis
- Perform genome assembly using de Bruijn graphs and evaluate results