Machine Learning Cheat Sheet: Algorithms, Metrics & When to Use Each

Most people searching for a machine learning cheat sheet already know what supervised vs. unsupervised learning means. What they actually need is a single reference that answers the decision question: given this dataset and this problem, which algorithm do I reach for first, and how do I know if it's working? That's what this guide covers.

Machine Learning Cheat Sheet: Algorithm Selection

The single most common mistake beginners make is treating algorithm selection as a prestige contest—reaching for gradient boosting or neural nets before checking whether logistic regression would solve the problem in five minutes. Start simple. Complexity earns its place only when simpler models demonstrably fail.

Supervised Learning Algorithms

  • Linear / Logistic Regression — First choice for structured tabular data when you need an interpretable baseline. Logistic regression is particularly strong for binary classification when the classes are roughly linearly separable in feature space. Regularize with L1 (Lasso) to do feature selection, L2 (Ridge) to handle collinearity.
  • Decision Trees — Interpretable, handle mixed feature types, zero need for scaling. Overfit aggressively on their own; treat them as building blocks, not final models.
  • Random Forest — Bagging of decision trees, with a random subset of features considered at each split. Robust default for most tabular classification and regression tasks. Many implementations require missing values to be imputed first. Scales to moderate dataset sizes without much tuning.
  • Gradient Boosted Trees (XGBoost / LightGBM / CatBoost) — State of the art on structured/tabular data, and behind many winning Kaggle solutions on non-image, non-text data. Requires more careful hyperparameter tuning than random forest. LightGBM is typically the fastest on large datasets; CatBoost handles categoricals natively.
  • Support Vector Machines (SVM) — Strong in high-dimensional spaces (text classification, bioinformatics). Kernel trick allows non-linear decision boundaries. Slow on large datasets; replaced in practice by gradient boosting for most tabular tasks.
  • k-Nearest Neighbors (kNN) — No training phase, but prediction is expensive at scale. Useful for anomaly detection and as a sanity check. Sensitive to feature scaling—always normalize inputs.
  • Neural Networks — The standard choice for images, audio, and text, and increasingly for large-scale time series. Usually overkill for small tabular datasets. Require more data, compute, and tuning than tree-based methods.
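A minimal sketch of the "start simple" workflow, assuming only NumPy: compare a majority-class baseline with an L2-regularized logistic regression fitted by plain gradient descent. The synthetic data, learning rate, and penalty strength are illustrative assumptions; in practice scikit-learn's `LogisticRegression` handles the fitting for you.

```python
import numpy as np

def fit_logistic_l2(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y = 1)
        w -= lr * (X.T @ (p - y) / n + lam * w)  # log-loss gradient plus L2 penalty
        b -= lr * np.mean(p - y)                 # intercept is not penalized
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Hypothetical labels: roughly linearly separable along x0 + x1 = 0, with label noise
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500)) > 0).astype(int)
X_train, X_val, y_train, y_val = X[:400], X[400:], y[:400], y[400:]

# Baseline 1: always predict the training majority class
majority = int(y_train.mean() >= 0.5)
baseline_acc = float(np.mean(y_val == majority))

# Baseline 2: a regularized linear model
w, b = fit_logistic_l2(X_train, y_train)
logreg_acc = float(np.mean(((X_val @ w + b) > 0).astype(int) == y_val))

print(f"majority-class accuracy:      {baseline_acc:.2f}")
print(f"logistic regression accuracy: {logreg_acc:.2f}")
```

If the linear model barely beats the majority-class baseline, that is the signal to try a tree ensemble. If it wins comfortably, it may already be good enough.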

Unsupervised Learning Algorithms

  • k-Means — Fast, interpretable, works well when clusters are roughly spherical and similarly sized. Use the elbow method or silhouette score to pick k. Fails on elongated or irregularly shaped clusters.
  • DBSCAN — Finds clusters of arbitrary shape, identifies noise/outliers natively. Parameters (eps, min_samples) require domain knowledge to set well.
  • Hierarchical Clustering — Produces a dendrogram showing relationships at all granularities. Computationally expensive (at least O(n²) in memory and time for standard agglomerative methods); practical only for smaller datasets.
  • PCA (Principal Component Analysis) — Linear dimensionality reduction. Use for visualization, noise reduction, or as preprocessing before clustering. Explained variance ratio tells you how many components to keep.
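  • t-SNE / UMAP — Non-linear dimensionality reduction, primarily for visualization. Do not use t-SNE outputs as features for downstream models: t-SNE preserves local neighborhoods, but distances between clusters and cluster sizes in the 2D map are not meaningful. UMAP keeps somewhat more global structure and is sometimes used as preprocessing for clustering, but read its distances with the same caution.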
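The explained-variance rule for PCA can be sketched with NumPy's SVD on a hypothetical 3-D dataset that varies almost entirely along one direction. scikit-learn exposes the same quantity as `PCA(...).explained_variance_ratio_`. Standardize features first if they are measured in different units, since PCA is driven by raw variance.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to the variance per component
    return var / var.sum()

def components_needed(ratios, target=0.95):
    """Smallest k whose cumulative explained variance reaches `target`."""
    k = int(np.searchsorted(np.cumsum(ratios), target)) + 1
    return min(k, len(ratios))

rng = np.random.default_rng(0)
t = rng.normal(size=300)
# Hypothetical 3-D data that varies almost entirely along the direction (1, 2, 0)
X = np.column_stack([t, 2 * t, np.zeros_like(t)]) + rng.normal(scale=0.05, size=(300, 3))

ratios = explained_variance_ratio(X)
print(np.round(ratios, 4))
print(components_needed(ratios))
```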

Machine Learning Cheat Sheet: Evaluation Metrics

Picking the wrong metric is how models pass review and fail in production. Accuracy on a 99:1 imbalanced dataset is meaningless: a model that always predicts the majority class scores 99% while catching none of the positives. Here's a quick reference organized by problem type.
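To make the 99:1 case concrete, here is a pure-Python sketch with a hypothetical label vector and a "model" that always predicts the majority class:

```python
# 990 negatives and 10 positives: a 99:1 imbalance
y_true = [0] * 990 + [1] * 10
y_pred = [0] * len(y_true)  # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00 (every positive is missed)
```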

Classification Metrics

  • Accuracy — (TP + TN) / total. Misleading on imbalanced classes. Use only as a sanity check.
  • Precision — TP / (TP + FP). "Of all positive predictions, how many were actually positive?" Use when false positives are costly (spam filter flagging legitimate email).
  • Recall (Sensitivity) — TP / (TP + FN). "Of all actual positives, how many did we catch?" Use when false negatives are costly (cancer screening).
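  • F1 Score — Harmonic mean of precision and recall. A single-number summary that suits imbalanced datasets because it ignores true negatives. Use F-beta to weight the two differently: β > 1 favors recall, β < 1 favors precision.
  • ROC-AUC — Area under the receiver operating characteristic curve. Threshold-independent; measures how well a model ranks positives above negatives, which makes it good for comparing models. Use PR-AUC instead when the positive class is rare (a common rule of thumb is <5% of data).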
  • Log Loss — Penalizes confident wrong predictions heavily. Use when probability calibration matters (churn prediction, ad click-through).
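The formulas above can be checked with a small from-scratch sketch in pure Python on hypothetical labels and probabilities. ROC-AUC is computed through its ranking interpretation: the probability that a random positive outscores a random negative. For real work, `sklearn.metrics` provides `precision_score`, `recall_score`, `fbeta_score`, `roc_auc_score`, `average_precision_score` (a common PR-AUC summary), and `log_loss`.

```python
import math

def confusion_counts(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta

def roc_auc(y_true, scores):
    """P(random positive scores higher than random negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, probs, eps=1e-15):
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical labels and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05]
y_pred = [int(p >= 0.5) for p in probs]  # default 0.5 threshold

precision, recall, f1 = precision_recall_fbeta(y_true, y_pred)
_, _, f2 = precision_recall_fbeta(y_true, y_pred, beta=2.0)  # recall-weighted
print(precision, recall, f1, f2)
print(roc_auc(y_true, probs), log_loss(y_true, probs))
```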

Regression Metrics

  • MAE (Mean Absolute Error) — Intuitive, same unit as target. Less sensitive to outliers than RMSE. Use when large errors aren't disproportionately worse than small ones.
  • RMSE (Root Mean Squared Error) — Penalizes large errors more than MAE because errors are squared before averaging. Common in regression and forecasting benchmarks. Sensitive to outliers.
  • R² (Coefficient of Determination) — Proportion of the target's variance explained by the model. Useful for comparing models on the same dataset; not comparable across different targets. Can be negative on held-out data when the model does worse than predicting the mean.
  • MAPE (Mean Absolute Percentage Error) — Scale-independent. Blows up when actuals are near zero; sMAPE is a common alternative, though it is still unstable when both actual and forecast are near zero.
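A pure-Python sketch of these regression metrics on a hypothetical four-point example. Adding one badly missed point shows RMSE reacting to outliers far more than MAE. The sMAPE formula used here is one common variant among several.

```python
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
        # MAPE divides by the actual value: undefined if any actual is 0
        "MAPE%": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        # One common sMAPE variant; undefined if actual and forecast are both 0
        "sMAPE%": 100 * sum(2 * abs(e) / (abs(t) + abs(p))
                            for e, t, p in zip(errors, y_true, y_pred)) / n,
    }

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
base = regression_metrics(y_true, y_pred)

# Add one badly missed point: RMSE jumps far more than MAE
with_outlier = regression_metrics(y_true + [10.0], y_pred + [30.0])
for name in ("MAE", "RMSE"):
    print(name, round(base[name], 3), "->", round(with_outlier[name], 3))
```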

Clustering Metrics

  • Silhouette Score — Measures cohesion vs. separation. Range [-1, 1]; higher is better. Works without ground truth labels.
  • Davies-Bouldin Index — Lower is better. Also ground-truth-free. Less intuitive than silhouette.
  • Adjusted Rand Index (ARI) — Requires ground truth labels. Measures cluster assignment similarity; adjusted for chance.
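The silhouette score can be computed from its definition. For each point, a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster; the point's score is (b - a) / max(a, b). This pure-Python sketch uses hypothetical 2-D points and gives singleton clusters a score of 0, a common convention. `sklearn.metrics.silhouette_score`, `davies_bouldin_score`, and `adjusted_rand_score` are the library equivalents for this section.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters).
    Points in singleton clusters score 0."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # cohesion: mean distance within own cluster
        b = min(                   # separation: mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the two blobs
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # mixes the blobs together
print(round(good, 3), round(bad, 3))
```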

Bias, Variance & the Core Trade-offs

This is the conceptual core of every machine learning cheat sheet worth saving.

  • High bias (underfitting) — Model is too simple. Training error and validation error are both high. Fix: more complex model, more features, less regularization.
  • High variance (overfitting) — Model memorizes training data. Training error is low, validation error is high. Fix: more data, regularization, simpler model, dropout (for neural nets), pruning (for trees).
  • The bias-variance trade-off — For squared-error loss, expected test error = bias² + variance + irreducible noise. You can't eliminate irreducible noise. Tuning is the act of finding the model complexity that minimizes bias² + variance on your specific dataset.
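The decomposition can be checked numerically. This NumPy sketch assumes a hypothetical sine-shaped true function with Gaussian noise. It refits polynomials of several degrees on many freshly noised training sets and measures bias² and variance directly at fixed inputs. The "expected error" column is the squared-error decomposition evaluated at those same inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)           # fixed input locations
f_true = np.sin(2 * np.pi * x)      # hypothetical true function
sigma = 0.3                         # irreducible noise (standard deviation)

def bias_variance(degree, n_trials=500):
    """Monte Carlo estimate of bias^2 and variance for a polynomial fit of a given degree."""
    preds = np.empty((n_trials, x.size))
    for k in range(n_trials):
        y = f_true + rng.normal(scale=sigma, size=x.size)   # a fresh noisy training set
        preds[k] = np.polynomial.Polynomial.fit(x, y, degree)(x)
    bias_sq = np.mean((preds.mean(axis=0) - f_true) ** 2)   # systematic error
    variance = np.mean(preds.var(axis=0))                   # spread across training sets
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b2, var) in results.items():
    print(f"degree {d}: bias^2={b2:.3f}  variance={var:.3f}  "
          f"expected error~{b2 + var + sigma ** 2:.3f}")
```

Degree 1 should show high bias and low variance and degree 9 the reverse, with degree 3 typically landing in between with the lowest total. Exact figures depend on the random seed.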

Cross-Validation Rules

  • k-Fold (k=5 or 10) is the default for most problems.
  • Stratified k-Fold when classes are imbalanced.
  • Time-series: never shuffle. Use forward-chaining (walk-forward) splits.
  • Group k-Fold when samples from the same entity must stay in the same fold (e.g., patient records).
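A pure-Python sketch of two of these rules: forward-chaining splits for time series and group-aware folds for per-entity data. The patient ids are hypothetical. In scikit-learn, `TimeSeriesSplit`, `GroupKFold`, and `StratifiedKFold` cover these cases.

```python
def walk_forward_splits(n_samples, n_splits):
    """Expanding-window splits: every training index precedes every test index.
    Intended to mirror the default behavior of scikit-learn's TimeSeriesSplit."""
    test_size = n_samples // (n_splits + 1)
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))

def group_k_fold(groups, n_splits):
    """Assign whole groups to folds so no entity lands in both train and test.
    Round-robin over sorted group ids is a simplifying assumption; scikit-learn's
    GroupKFold balances fold sizes differently."""
    fold_of = {g: i % n_splits for i, g in enumerate(sorted(set(groups)))}
    for k in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == k]
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        yield train, test

for train, test in walk_forward_splits(10, n_splits=4):
    print("train", train, "test", test)

# Hypothetical patient ids: several records per patient
groups = ["p1", "p1", "p2", "p3", "p3", "p3", "p4"]
for train, test in group_k_fold(groups, n_splits=2):
    print("test patients:", sorted({groups[i] for i in test}))
```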

Feature Engineering Quick Reference

On tabular data, model architecture is often the last lever worth pulling. Feature engineering is usually where the real performance gains come from.

  • Scaling — StandardScaler (zero mean, unit variance) for distance-based algorithms and neural nets. MinMaxScaler for algorithms that require bounded inputs. Tree-based methods don't need scaling.
  • Encoding categoricals — One-hot for low cardinality (<15 categories). Target encoding (computed out-of-fold to avoid target leakage) or embeddings for high cardinality. Ordinal encoding when categories have a meaningful order.
  • Handling missing values — Mean/median imputation for numerical (add a "was_missing" indicator column). Mode imputation for categoricals. Consider model-based imputation (iterative imputer) for structured missingness.
  • Feature creation — Interaction terms, polynomial features, date decomposition (day of week, month, is_holiday), aggregations (rolling mean, lag features for time-series).
  • Feature selection — Correlation filter removes redundant columns. Recursive Feature Elimination (RFE) for smaller feature sets. Feature importance from tree models as a fast heuristic.
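A minimal pure-Python sketch of three of these rules: median imputation with a was_missing indicator, standard and min-max scaling, and one-hot encoding. All parameters are learned from the training column only, which keeps validation and test data from leaking into preprocessing. The column values and category names are hypothetical. scikit-learn's `SimpleImputer(add_indicator=True)`, `StandardScaler`, `MinMaxScaler`, and `OneHotEncoder(handle_unknown="ignore")` are the usual library equivalents.

```python
import statistics

def fit_preprocessor(train_values):
    """Learn imputation and scaling parameters from the TRAINING column only.
    Assumes the column has at least one observed value and is not constant."""
    median = statistics.median([v for v in train_values if v is not None])
    imputed = [median if v is None else v for v in train_values]
    return {"median": median,
            "mean": statistics.fmean(imputed),
            "std": statistics.pstdev(imputed),
            "min": min(imputed), "max": max(imputed)}

def transform(values, params):
    """Median-impute, add a was_missing flag, and scale two ways."""
    rows = []
    for v in values:
        missing = v is None
        x = params["median"] if missing else v
        rows.append({
            "standard": (x - params["mean"]) / params["std"],                 # zero mean, unit variance
            "minmax": (x - params["min"]) / (params["max"] - params["min"]),  # [0, 1] on training range
            "was_missing": int(missing),
        })
    return rows

def one_hot(values, categories):
    """One 0/1 column per known category; unseen categories map to all zeros."""
    return [[int(v == c) for c in categories] for v in values]

params = fit_preprocessor([1.0, None, 3.0, 5.0])   # hypothetical training column
rows = transform([None, 5.0, 7.0], params)          # 7.0 lies outside the training range
print(rows)
print(one_hot(["red", "blue", "green"], categories=["blue", "red"]))  # [[0, 1], [1, 0], [0, 0]]
```

The 7.0 value scales to 1.5 under min-max: values outside the training range fall outside [0, 1], which is expected behavior rather than a bug.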

Top Courses to Go Beyond the Cheat Sheet

A cheat sheet is a reference, not a curriculum. If you want to go from memorizing terms to actually building and deploying models, these courses are worth the time.

Structuring Machine Learning Projects

Andrew Ng's course cuts straight to the decisions practitioners actually get wrong: how to set up train/dev/test splits, diagnose bias vs. variance, and prioritize what to fix next. It's the fastest way to develop judgment about ML projects, not just technique. Rated 9.8 on Coursera.

Applied Machine Learning in Python

Covers scikit-learn end-to-end with a focus on practical implementation—feature pipelines, model selection, and evaluation. Strong choice if you already understand the theory and want hands-on Python fluency. Rated 9.7 on Coursera.

Machine Learning: Regression

Goes significantly deeper than most regression coverage—ridge and lasso regularization, polynomial features, gradient descent from scratch, and interpreting coefficients correctly. Essential groundwork before moving to ensemble methods. Rated 9.7 on Coursera.

Machine Learning: Classification

Covers logistic regression, decision trees, boosting, and precision-recall trade-offs with proper treatment of imbalanced data—exactly the topics this cheat sheet summarizes. Rated 9.7 on Coursera.

Cluster Analysis and Unsupervised Machine Learning in Python

One of the few courses that covers k-means, hierarchical clustering, GMMs, and density-based methods with practical Python implementations rather than just theory. Rated 9.7 on Udemy.

Production Machine Learning Systems

Addresses the gap most courses ignore: what happens after you train a model. Covers serving infrastructure, monitoring, data drift, and system design for ML pipelines. Rated 9.7 on Coursera.

FAQ

What should a machine learning cheat sheet include?

At minimum: an algorithm selection guide organized by problem type, a metrics reference for classification and regression, and the bias-variance framework. The most useful cheat sheets also include feature engineering rules of thumb and cross-validation guidelines—the decisions you'll repeat on every project.

Which machine learning algorithm should I start with?

For classification: logistic regression, then random forest. For regression: linear regression, then gradient boosted trees. Start simple, establish a baseline, then add complexity only if the simpler model's errors are systematic and meaningful. Most "we need a neural net" decisions turn out to be wrong on tabular data.

How do I choose between precision and recall?

Ask what the cost of each type of error is. If a false negative is more costly than a false positive (missed disease, missed fraud), optimize for recall. If a false positive is more costly (flagging a legitimate transaction as fraud and blocking it), optimize for precision. F1 is a reasonable default when you genuinely don't know.
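In practice the precision/recall choice usually becomes a threshold choice on a model's scores. This pure-Python sketch uses hypothetical scores: it sweeps a few thresholds, then picks the highest threshold that still meets a recall target. Choose thresholds on a validation set, never the test set. `sklearn.metrics.precision_recall_curve` computes the full sweep.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when everything scoring >= threshold is flagged positive."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, t in zip(flagged, y_true) if f and t == 1)
    fp = sum(1 for f, t in zip(flagged, y_true) if f and t == 0)
    fn = sum(1 for f, t in zip(flagged, y_true) if not f and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined if nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def threshold_for_recall(y_true, scores, min_recall):
    """Highest observed score that, used as a threshold, still meets `min_recall`."""
    for thr in sorted(set(scores), reverse=True):
        if precision_recall_at(y_true, scores, thr)[1] >= min_recall:
            return thr
    return min(scores)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05]   # hypothetical model scores

for thr in (0.25, 0.5, 0.75):
    p, r = precision_recall_at(y_true, scores, thr)
    print(f"threshold {thr}: precision={p:.2f} recall={r:.2f}")

# Recall-first use case (e.g. screening): catch every positive in this sample
print(threshold_for_recall(y_true, scores, min_recall=1.0))
```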

Do I need to scale features for every algorithm?

No. Tree-based algorithms (decision trees, random forests, gradient boosting) are invariant to feature scale—scaling does nothing for them. Scale for: linear models, SVMs, kNN, k-means, PCA, and neural networks. When in doubt, scaling doesn't hurt and often helps.
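A small pure-Python illustration of why kNN needs scaling, using hypothetical (age, income) points. Raw income in dollars swamps age in years, so the nearest neighbor changes once both features are standardized with statistics learned from the training points.

```python
import math

def nearest(query, points):
    """Index of the point closest to `query` by Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def fit_standardizer(points):
    """Per-feature mean and population standard deviation, learned from training points."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return lambda p: tuple((v - m) / s for v, m, s in zip(p, means, stds))

# Hypothetical training points: (age in years, income in dollars)
points = [(25, 50_000), (60, 51_000), (30, 80_000), (65, 90_000)]
query = (26, 51_500)

raw_nn = nearest(query, points)          # income's large units dominate the distance
scale = fit_standardizer(points)
scaled_nn = nearest(scale(query), [scale(p) for p in points])
print(points[raw_nn], points[scaled_nn])  # (60, 51000) (25, 50000)
```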

What's the difference between validation set and test set?

The validation set is used during development to tune hyperparameters and compare models. The test set is held out entirely until final evaluation—it's your estimate of real-world performance. If you tune on the test set even once, your performance estimate becomes optimistic and unreliable. Treat the test set like a contract: one look, at the end.
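A minimal sketch of a reproducible three-way split using only the standard library; the 60/20/20 proportions are an assumption, not a rule. For time series or grouped data, replace the shuffle with the strategies from the cross-validation rules earlier in this guide. Calling scikit-learn's `train_test_split` twice is a common alternative.

```python
import random

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve off disjoint test, validation, and training index sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)   # fixed seed so the split is reproducible
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(100)
print(len(train), len(val), len(test))  # 60 20 20
# Fit on `train`, tune and compare models on `val`, and score `test` exactly once at the end.
```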

Is it worth memorizing machine learning formulas?

For interviews, yes—knowing the gradient of cross-entropy loss or the formula for information gain signals depth. For day-to-day work, understanding what a metric measures and when it misleads you matters more than deriving it from scratch. Focus on intuition first, derivations second.
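Two of the formulas mentioned here, worked in pure Python. The first is information gain for a decision-tree split, in bits. The second is the gradient of binary cross-entropy with respect to the logit, which simplifies to sigmoid(z) - y. The spam/ham labels are hypothetical, and the finite-difference check is only a sanity test of the derivation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def bce(z, y):
    """Binary cross-entropy for a logit z, with p = sigmoid(z)."""
    p = 1 / (1 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_grad(z, y):
    """Analytic gradient dL/dz = sigmoid(z) - y."""
    return 1 / (1 + math.exp(-z)) - y

parent = ["spam"] * 5 + ["ham"] * 5
perfect = information_gain(parent, [["spam"] * 5, ["ham"] * 5])
useless = information_gain(parent, [["spam"] * 3 + ["ham"] * 3, ["spam"] * 2 + ["ham"] * 2])
print(perfect, useless)   # 1.0 0.0

# Check the analytic gradient against a central finite difference
z, y, h = 0.3, 1, 1e-5
numeric = (bce(z + h, y) - bce(z - h, y)) / (2 * h)
print(abs(numeric - bce_grad(z, y)) < 1e-6)   # True
```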

Bottom Line

Save this machine learning cheat sheet as a reference, not a shortcut. The goal isn't to memorize every formula—it's to build enough intuition that you can read a problem and immediately know which direction to explore. Algorithm selection, metric choice, and feature engineering are judgment calls, not lookup tables. The fastest way to develop that judgment is to build models on real messy datasets, make mistakes, and trace them back to a wrong decision in this list.

If you're building that skill from scratch, start with Structuring Machine Learning Projects for decision-making frameworks, then work through the regression and classification courses to build technical depth. Add the production systems course once you're deploying models—that's where most self-taught practitioners have gaps.
