The average data science job posting lists 12–15 required skills. Most self-taught beginners spend 8–14 months learning the wrong 8 of them. This roadmap is built around what actually gets you hired: the skills that appear most in entry-level job descriptions, in the order that makes each next step easier to learn.
This is not a list of every data science topic that exists. It's a sequenced data science roadmap for someone who wants a job, not a PhD. Phases are ordered by dependency — you can't skip Phase 2 and tackle Phase 4, but you can start applying for jobs partway through Phase 3.
Phase 1: The Data Science Roadmap Starts With Foundations (Weeks 1–6)
Before touching machine learning, you need two things: enough statistics to not fool yourself with data, and enough Python to actually manipulate it. Most people try to learn both simultaneously from scratch. Do statistics first, even if only for two weeks, because it gives the Python exercises the context they need to make sense.
Statistics you actually need
Forget full-semester statistics courses for now. The concepts that show up daily in data work are:
- Distributions (normal, binomial, Poisson) — knowing which one fits your data
- Mean, median, variance — and when median is more honest than mean
- Correlation vs causation — still the most common analytical error in business
- Hypothesis testing and p-values — understanding what "statistically significant" actually means
- Confidence intervals — how certain are your estimates
You don't need calculus yet. You don't need linear algebra yet. Save those for when you hit gradient descent in Phase 3 and suddenly need them for a concrete reason.
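As a small illustration of two items from the list above (median versus mean, and confidence intervals), here is a minimal sketch using only the Python standard library. The salary figures are made up for the example, and the interval uses a normal approximation, which is an assumption that would be too rough for serious work on a sample this small.

```python
import math
import statistics

# Hypothetical salaries in thousands; the last value is a deliberate outlier.
salaries = [42, 45, 47, 50, 52, 55, 58, 60, 62, 400]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

# Rough 95% confidence interval for the mean using the normal approximation
# (z = 1.96). With only 10 points a t-interval would be more appropriate.
std_err = statistics.stdev(salaries) / math.sqrt(len(salaries))
ci_low = mean - 1.96 * std_err
ci_high = mean + 1.96 * std_err

print(f"mean={mean:.1f}, median={median:.1f}")
print(f"approx. 95% CI for the mean: ({ci_low:.1f}, {ci_high:.1f})")
```

One outlier pulls the mean to 87.1 while the median stays at 53.5, which is the sense in which the median can be the more honest summary. The wide interval is a reminder of how little ten skewed data points pin down.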
Python for data work
Learn Python with data in mind from day one. The standard stack is: Python basics → NumPy → Pandas → Matplotlib/Seaborn. Treat SQL as a co-equal skill from the start, because most data you'll ever work with lives in a database, not a CSV file. Basic queries are enough at this stage; Phase 4 returns to SQL in depth.
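To give a feel for what "Python with data in mind" looks like, here is a short sketch with Pandas. The `orders` table and its column names are invented for illustration. The comment shows the SQL query the aggregation corresponds to, since the two skills mirror each other closely.

```python
import pandas as pd

# Hypothetical order data; in practice this would come from a database or file.
orders = pd.DataFrame({
    "region": ["north", "south", "north", "west", "south", "north"],
    "units": [3, 5, 2, 7, 1, 4],
    "unit_price": [10.0, 12.5, 10.0, 8.0, 12.5, 10.0],
})

# Vectorized arithmetic (Pandas hands this to NumPy): no explicit Python loop.
orders["revenue"] = orders["units"] * orders["unit_price"]

# Roughly the Pandas equivalent of:
#   SELECT region, SUM(revenue), AVG(units) FROM orders GROUP BY region
summary = (
    orders.groupby("region")
    .agg(total_revenue=("revenue", "sum"), avg_units=("units", "mean"))
    .sort_values("total_revenue", ascending=False)
)
print(summary)
```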
The Python for Data Science, AI & Development course by IBM (Coursera, 9.8/10) is one of the few courses that covers Jupyter notebooks, Pandas, and NumPy in a single track without padding. It's structured for people who need working knowledge fast, not Python theory.
Phase 2: Data Wrangling and Exploration (Weeks 7–12)
This is the phase most roadmaps underweight, which is why most new data scientists are slow and error-prone in their first jobs. Real datasets are not clean. In practice, 60–80% of project time is data preparation — finding nulls, resolving schema conflicts, deciding how to handle outliers, joining tables that weren't designed to be joined.
Data cleaning
You need to be fast at: handling missing values without introducing bias, deduplication, type coercion, and spotting data entry errors. The mental skill here is skepticism — always ask "why does this column look this way" before trusting it.
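The cleaning steps named above can be sketched in a few lines of Pandas. The `raw` table is a made-up example, and filling with the median is only one possible choice; whether it introduces bias depends on why the values are missing, which is exactly the question to ask first.

```python
import pandas as pd

# Hypothetical raw customer export with typical problems.
raw = pd.DataFrame({
    "customer_id": ["001", "002", "002", "003", "004"],
    "signup_date": ["2023-01-05", "2023-02-10", "2023-02-10",
                    "not available", "2023-03-01"],
    "age": ["34", "29", "29", "-1", None],
})

# 1. Deduplication: drop rows that are exact repeats.
clean = raw.drop_duplicates().copy()

# 2. Type coercion: unparseable values become NaT/NaN instead of raising.
clean["signup_date"] = pd.to_datetime(
    clean["signup_date"], format="%Y-%m-%d", errors="coerce"
)
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# 3. Data entry errors: treat impossible values as missing.
clean.loc[clean["age"] < 0, "age"] = float("nan")

# 4. Missing values: record the missingness before imputing, so the
#    information that a value was absent is not silently lost.
clean["age_was_missing"] = clean["age"].isna()
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```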
Exploratory data analysis (EDA)
EDA is where you develop intuition for data. Before any modeling, you should be able to describe a dataset's shape, identify its outliers, understand relationships between variables, and know which features are likely to matter. This is also where visualization skills pay off — a histogram or scatter matrix often reveals in 10 seconds what a correlation matrix takes 10 minutes to interpret.
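A first pass at EDA might look like the sketch below. The listings data is invented, and the 1.5 × IQR rule is a common heuristic for flagging outliers, not something this roadmap prescribes. The rank-based correlation is computed by ranking both columns first, which gives Spearman's coefficient.

```python
import pandas as pd

# Hypothetical listings data, small enough to check by eye.
df = pd.DataFrame({
    "sqft": [650, 800, 820, 900, 1100, 1200, 1500, 5000],
    "price": [130, 158, 165, 180, 220, 236, 300, 310],  # in thousands
    "age_years": [40, 35, 12, 20, 8, 15, 3, 60],
})

# Shape and per-column summary statistics.
print(df.shape)
print(df.describe())

# Flag outliers in one column with the 1.5 * IQR heuristic.
q1 = df["sqft"].quantile(0.25)
q3 = df["sqft"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["sqft"] < q1 - 1.5 * iqr) | (df["sqft"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)

# Pearson measures linear association; correlating the ranks gives Spearman,
# which is less distorted by a single extreme value.
pearson = df["sqft"].corr(df["price"])
spearman = df["sqft"].rank().corr(df["price"].rank())
print(f"pearson={pearson:.2f}, spearman={spearman:.2f}")
```

Here the single 5000 sqft listing is flagged as an outlier, and it drags the Pearson correlation down even though price rises with size in every row.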
The Process Data from Dirty to Clean course (Coursera, 9.8/10) is the most practical treatment of this phase available — it works through real messy data scenarios rather than curated toy datasets. Pair it with the Prepare Data for Exploration course (Coursera, 9.8/10) to cover the upstream side: understanding data collection, schema design, and why downstream cleaning problems happen in the first place.
Phase 3: Machine Learning Core (Weeks 13–24)
Most people rush here. Most people who rush here plateau quickly because they can run sklearn code but can't explain what it's doing or debug it when it fails. If Phases 1 and 2 are solid, Phase 3 goes faster than you'd expect.
Supervised learning first
Start with linear regression and logistic regression. Not because they're the most powerful models, but because they're interpretable — you can reason through every prediction. Once you understand those, tree-based methods (random forests, gradient boosting) will make intuitive sense. Save neural networks for after you've shipped at least one supervised learning project.
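To make "you can reason through every prediction" concrete, the sketch below fits a linear regression on synthetic data with a known ground truth and then rebuilds one prediction by hand from the fitted coefficients. The feature meanings (ad spend, sales calls) and the true coefficients are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical features: ad spend (in $1000s) and number of sales calls.
X = rng.uniform(0, 10, size=(200, 2))
# Known ground truth plus noise, so the fit can be checked against it.
y = 5.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.5, size=200)

model = LinearRegression().fit(X, y)
print("intercept:", round(float(model.intercept_), 2))
print("coefficients:", model.coef_.round(2))

# Every prediction decomposes into intercept + sum(coefficient * feature).
x_new = np.array([[4.0, 1.0]])
manual = model.intercept_ + (model.coef_ * x_new[0]).sum()
predicted = model.predict(x_new)[0]
print(manual, predicted)
```

The fitted values should land close to the true 5, 3, and 2, and the hand-computed prediction matches `model.predict`. That transparency is what tree ensembles and neural networks give up.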
Model evaluation is more important than model selection
Beginners obsess over which algorithm to use. Practitioners obsess over whether they're measuring the right thing. Learn: train/test splits, cross-validation, bias-variance tradeoff, precision vs recall (and when each matters), and how to detect data leakage. These skills prevent the #1 failure mode: a model that looks great in development and fails in production.
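The evaluation habits listed above fit together in a fairly small scikit-learn workflow. This is a sketch on synthetic, imbalanced data rather than a recommended recipe. Putting the scaler inside the pipeline is one common way to avoid a simple form of data leakage, because it is then fit only on training folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 10% positives, standing in for a real problem.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Hold out a test set; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so during cross-validation it is re-fit
# on each training fold and never sees the held-out fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="f1")
print("CV F1 per fold:", cv_scores.round(3))

# Final check on the untouched test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With classes this imbalanced, accuracy alone would look good for a model that never predicts the positive class, which is why precision and recall are reported instead.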
Unsupervised learning
Clustering (k-means, DBSCAN) and dimensionality reduction (PCA, t-SNE) are used constantly for segmentation and visualization work. Learn these after supervised learning, not alongside it.
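A minimal sketch of the segmentation-plus-visualization pattern: cluster with k-means, then project to two dimensions with PCA for plotting. The data comes from scikit-learn's `make_blobs`, so the "right" number of clusters is known in advance, which is rarely true of real customer data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 300 points in 5 dimensions drawn around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=42)

# k-means is distance-based, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Reduce to 2 components so the clusters can be drawn on a scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("variance explained by 2 components:",
      round(float(pca.explained_variance_ratio_.sum()), 3))
```

From here, `X_2d[:, 0]` and `X_2d[:, 1]` colored by `labels` make a quick scatter plot for checking the segments by eye.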
The Analyze Data to Answer Questions course (Coursera, 9.8/10) bridges the gap between data cleaning and modeling well: it focuses on the analytical thinking layer that's often missing from pure ML courses. The Python Data Science course on edX (9.7/10) is worth adding for its depth on scientific Python libraries if you want more hands-on coverage.
Phase 4: Tools, Infrastructure, and SQL (Weeks 20–28, overlapping)
You can start this phase while finishing Phase 3. Data science in practice means working with databases, version control, cloud storage, and sometimes data pipelines. None of these are glamorous, but gaps here make you visibly junior in interviews and on the job.
SQL is non-negotiable
Most entry-level data science interviews include a SQL component. You need to be comfortable with SELECT/WHERE/GROUP BY/HAVING, JOINs (inner, left, right, full), subqueries and CTEs, and window functions (ROW_NUMBER, LAG, LEAD, and aggregates such as SUM(...) OVER (PARTITION BY ...)). Window functions in particular trip up a lot of candidates.
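Since window functions are the usual stumbling block, here is a small runnable sketch using Python's built-in `sqlite3` module. The `sales` table is invented, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month TEXT, revenue INTEGER);
INSERT INTO sales VALUES
  ('north', '2024-01', 100), ('north', '2024-02', 120), ('north', '2024-03', 90),
  ('south', '2024-01', 80),  ('south', '2024-02', 110);
""")

# Each window restarts per region (PARTITION BY) and follows month order.
query = """
SELECT
  region,
  month,
  revenue,
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY month) AS month_rank,
  LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS prev_revenue,
  SUM(revenue) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM sales
ORDER BY region, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Unlike GROUP BY, the query still returns one row per input row. `LAG` yields NULL (`None` in Python) for the first month of each region, and `SUM` with an `ORDER BY` inside the window becomes a running total instead of a single group total.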
Cloud and data tools
AWS, GCP, or Azure — pick one and get comfortable with basic data storage and compute. Most companies use cloud-hosted data warehouses (Snowflake, BigQuery, Redshift). Knowing how a data warehouse differs from a transactional database, and why it matters for query performance, distinguishes candidates who understand the stack from those who only know Python.
The Snowflake for Data Engineers course (Udemy, 9.8/10) is worth a look here — it's specific enough to be immediately applicable, covering architecture concepts like clustering keys and micro-partitions that come up in data engineering interviews even for data science roles.
Git and notebooks
Version control every project. Notebooks are for exploration; scripts are for production. Learn to export clean, parameterized scripts from notebook explorations before you finish Phase 3.
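One way to picture the notebook-to-script step: logic that lived in notebook cells moves into functions, and values that were hard-coded in cells become command-line parameters. The sketch below uses only the standard library, and the names `summarize`, `parse_args`, and `main` are hypothetical.

```python
import argparse
import csv
import statistics


def summarize(path: str, column: str) -> dict:
    """Return basic statistics for one numeric column of a CSV file."""
    with open(path, newline="") as f:
        # Skip empty cells rather than failing on them.
        values = [
            float(row[column]) for row in csv.DictReader(f) if row[column] != ""
        ]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def parse_args(argv=None):
    """Turn what used to be hard-coded notebook values into parameters."""
    parser = argparse.ArgumentParser(description="Summarize a numeric CSV column.")
    parser.add_argument("path", help="input CSV file")
    parser.add_argument("--column", required=True, help="numeric column name")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(summarize(args.path, args.column))
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard, and the file would be run as something like `python summarize.py data.csv --column price`. The file name and column here are placeholders.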
Top Courses for This Data Science Roadmap
These are the specific courses that map to phases above, selected for rating, pacing, and practical focus — not for brand name or price.
Introduction to Data Analytics Course
Rated 9.8/10 on Coursera. A solid Phase 1/2 entry point covering analytical thinking, working with data tools, and building the mental models you need before touching ML. Better structured than most "intro" courses.
Tools for Data Science Course
Rated 9.8/10 on Coursera. Covers the tool layer (Jupyter, RStudio, GitHub, Watson Studio) that most pure-Python courses skip. Worth doing early so you're not fumbling with environment setup when you want to focus on concepts.
Python for Data Science, AI & Development by IBM
Rated 9.8/10 on Coursera. IBM's track has better pacing than most Python courses for data use cases — it moves from syntax to Pandas and NumPy faster than generic Python courses and stays focused on data manipulation throughout.
Process Data from Dirty to Clean
Rated 9.8/10 on Coursera. This is the Phase 2 course most roadmaps don't include but should. Real data wrangling scenarios, not curated clean CSVs. The kind of work that makes up most of your first 6 months on the job.
Analyze Data to Answer Questions
Rated 9.8/10 on Coursera. Focuses on the analytical reasoning layer — how to frame a business question, translate it into data operations, and communicate findings. Bridges data prep and modeling in a way pure ML courses don't.
Snowflake for Data Engineers: Architecture & Performance
Rated 9.8/10 on Udemy. Optional but high-value if you're targeting data-heavy companies. Understanding how cloud warehouses work under the hood (query optimization, storage layout) shows up in senior conversations faster than most people expect.
FAQ
How long does it take to complete a data science roadmap?
For someone studying 10–15 hours per week with no prior background: roughly 12–18 months to be competitive for entry-level roles. People with a statistics or programming background often get there in 6–9 months. The variance is large because it depends heavily on whether you're building projects and applying for jobs while learning, or waiting until you feel "ready." Waiting too long is the more common mistake.
Do I need a degree to follow this roadmap?
No degree is required, but you do need something to replace it on a resume: a portfolio of 3–5 projects that solve real problems, ideally with data from domains that employers care about (business metrics, healthcare, finance, logistics). The degree question comes up less than it used to: employers in data science are more willing to evaluate work samples than employers in many other fields.
Should I learn R or Python?
Python. The job market is clearer on this than it was five years ago — Python appears in roughly 75–80% of data science job descriptions. R still dominates in academic statistics and some pharmaceutical/biotech work. If you're targeting those specific fields, learn R. Otherwise, Python is the more transferable skill, and Python's data ecosystem (Pandas, sklearn, PyTorch) is broader.
Is machine learning required for data science roles?
It depends on the role. Analyst-track data science positions (often called "data analyst" or "business intelligence") rarely require ML. Research and engineering-track positions expect it. Most "data scientist" job titles sit somewhere in the middle — they want someone who can build a basic model, but the day-to-day is usually more SQL, dashboards, and A/B test analysis than deep learning. Know which track you're targeting before you invest months in ML.
What projects should I include in a data science portfolio?
Projects that solve an actual problem with actual data, not tutorial reproductions. Good themes: predicting something measurable (churn, price, demand), segmenting customers using clustering, analyzing a public dataset to answer a specific question no one else has answered. Put the code on GitHub, write a clear README, and — most importantly — include a section on what you found, not just how you ran the model.
What's the difference between a data scientist and a data engineer?
Data engineers build the infrastructure that data scientists use: pipelines, data warehouses, ingestion systems. Data scientists use that infrastructure to analyze data and build models. The skills overlap at the SQL and cloud layer. In small companies, one person often does both. In larger companies, they're distinct roles with distinct interview processes. If you like building systems, engineering might suit you better. If you like analysis and modeling, science is the direction.
Bottom Line
The data science roadmap that works is the one you actually finish. Most people don't fail because the material is too hard — they fail because they followed a curriculum that wasn't sequenced for their goal, spent months on topics with low job-market ROI, or waited too long before applying to jobs.
Follow the phase order here: statistics fundamentals → Python + data manipulation → machine learning → tooling and SQL. Start building projects in Phase 2, not after Phase 4. Apply for jobs when you have 2–3 solid portfolio projects, even if the roadmap isn't complete. The interview feedback you get from real applications is better signal than any course for figuring out what to study next.
The courses listed above are rated among the highest on Coursera and Udemy specifically for this sequence. You don't need all of them: the IBM Python course plus the two data cleaning courses and one analytics course cover 80% of the practical foundation. Add the Snowflake course if you're targeting data-engineering-adjacent roles.