Intermediate Machine Learning
by Alexis Cook · Kaggle
Our Verdict
Worth taking
Kaggle's free Intermediate Machine Learning is a focused, ~4-hour micro-course (an Introduction plus six hands-on lessons) by Alexis Cook. It teaches the practical scikit-learn and XGBoost workflow needed to make real tabular models work: handling missing values, encoding categorical variables, bundling preprocessing into pipelines, cross-validation, gradient boosting with XGBoost, and detecting data leakage. Based on the official syllabus, a GitHub mirror of the course notebooks, and independent reviews, our verdict is that it is one of the best free ways to close the gap between a beginner who can fit a model and a practitioner who can structure a defensible ML workflow on messy data. Its genuine strength is that every lesson pairs a short reading with an in-browser coded exercise on the Ames Housing dataset, so you actually run SimpleImputer, OneHotEncoder, Pipeline, cross_val_score, and XGBRegressor (with n_estimators, learning_rate, and early_stopping_rounds) yourself. The honest limitation, flagged by independent reviewers, is twofold: it treats algorithms as black boxes (no math or theory of how XGBoost works), and several later lessons lean on copying the provided code rather than writing it from scratch. Take it as a fast, free skills top-up, not as a course that will teach you why the methods work.
It is free, short (~4 hours), and delivers exactly the high-leverage tabular-ML skills (pipelines, cross-validation, XGBoost, leakage) that beginners typically lack, with hands-on coded exercises and a free completion certificate. The only real caveat is its deliberate lack of theory and some copy-paste-heavy lessons, which keeps it from being a standalone education but does not undermine its value as a focused practical add-on.
Best for: Learners who have finished Kaggle's Intro to Machine Learning (or equivalent) and can already train a basic scikit-learn model, and now want to handle real-world messy data and ship more accurate models. Ideal for aspiring data analysts/scientists preparing for Kaggle tabular competitions or job tasks, bootcamp students wanting a free practical supplement, and working developers who need the practical XGBoost + pipeline workflow quickly without theory.
Skip if: Complete beginners who have never trained a model (start with Kaggle Intro to Machine Learning first), and anyone who wants to understand the mathematics and internals of the algorithms (how gradient boosting actually works), since the course intentionally treats models as black boxes. Also not ideal for those focused on deep learning, neural networks, NLP, or computer vision, as the scope is strictly classical tabular ML in scikit-learn and XGBoost.
About This Course
Handle missing values, categorical variables, pipelines, cross-validation, XGBoost, and data leakage in ML workflows.
Curriculum
Introduction: sets up the course and the workflow, building on Intro to Machine Learning; the exercises use the Housing Prices competition dataset (Ames Housing).
Missing Values: three strategies for handling missing data (drop columns, impute with SimpleImputer using the mean, median, or most-frequent value, or impute and add an indicator column flagging where values were originally missing).
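A minimal sketch of the three missing-value strategies, assuming scikit-learn and pandas are installed. It uses a small synthetic DataFrame with Ames-style column names rather than the real competition data, plus a hypothetical `score_dataset` helper that fits a random forest and returns validation MAE. For the third strategy it uses SimpleImputer's `add_indicator=True` as a shortcut; the course notebook may build the indicator columns by hand instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for numeric housing features (NOT the real Ames data).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "LotArea": rng.normal(10_000, 2_000, 500),
    "LotFrontage": rng.normal(70, 15, 500),
    "GarageYrBlt": rng.integers(1950, 2010, 500).astype(float),
})
y = 20 * X["LotArea"] / 1000 + 1_000 * X["LotFrontage"] + rng.normal(0, 5_000, 500)
# Knock out roughly 20% of two columns to simulate missing values.
X.loc[rng.random(500) < 0.2, "LotFrontage"] = np.nan
X.loc[rng.random(500) < 0.2, "GarageYrBlt"] = np.nan

X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, random_state=0)


def score_dataset(X_tr, X_va):
    """Hypothetical helper: fit a random forest and return validation MAE."""
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_tr, y_train)
    return mean_absolute_error(y_valid, model.predict(X_va))


# Strategy 1: drop every column that has any missing value in the training data.
cols_with_missing = [c for c in X_train.columns if X_train[c].isnull().any()]
mae_drop = score_dataset(X_train.drop(columns=cols_with_missing),
                         X_valid.drop(columns=cols_with_missing))

# Strategy 2: fill gaps with the training-column mean (statistics fit on train only).
imputer = SimpleImputer(strategy="mean")
mae_impute = score_dataset(imputer.fit_transform(X_train), imputer.transform(X_valid))

# Strategy 3: impute, and append a 0/1 "was missing" column per column that had gaps.
imputer_ind = SimpleImputer(strategy="mean", add_indicator=True)
X_train_ind = imputer_ind.fit_transform(X_train)
X_valid_ind = imputer_ind.transform(X_valid)
mae_indicator = score_dataset(X_train_ind, X_valid_ind)

print("drop:", mae_drop, "impute:", mae_impute, "impute+indicator:", mae_indicator)
```

The imputers are fit on the training split only and then applied to the validation split, so validation rows never influence the fill values.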
Categorical Variables: three ways to use non-numeric data in models (dropping categorical columns, ordinal encoding for ranked categories, and one-hot encoding to avoid imposing a false order).
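The difference between the two encoders can be sketched on toy data (column names borrowed from Ames, values invented). Passing an explicit `categories` list to OrdinalEncoder is our choice here so the codes follow the real quality scale; with default settings the encoder orders categories by sorted value, which for these codes would not match their meaning.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy columns with Ames-style names (illustrative values only).
X_train = pd.DataFrame({
    "KitchenQual": ["TA", "Gd", "Ex", "Fa", "Gd"],                        # ranked
    "Neighborhood": ["NAmes", "OldTown", "NAmes", "Edwards", "OldTown"],  # unranked
})
X_valid = pd.DataFrame({
    "KitchenQual": ["Gd", "Po"],
    "Neighborhood": ["Edwards", "Somerst"],  # "Somerst" never appears in training
})

# Ordinal encoding: give the scale explicitly (Po < Fa < TA < Gd < Ex),
# so "Po" is encodable even though it is absent from the training rows.
ordinal = OrdinalEncoder(categories=[["Po", "Fa", "TA", "Gd", "Ex"]])
qual_train = ordinal.fit_transform(X_train[["KitchenQual"]])
qual_valid = ordinal.transform(X_valid[["KitchenQual"]])

# One-hot encoding: one 0/1 column per neighborhood seen in training, no implied order.
# handle_unknown="ignore" turns an unseen category into an all-zero row instead of an error.
onehot = OneHotEncoder(handle_unknown="ignore")
hood_train = onehot.fit_transform(X_train[["Neighborhood"]]).toarray()
hood_valid = onehot.transform(X_valid[["Neighborhood"]]).toarray()

print(qual_valid.ravel())     # [3. 0.]
print(onehot.categories_[0])  # learned columns: Edwards, NAmes, OldTown
print(hood_valid)             # Edwards -> [1, 0, 0]; unseen Somerst -> [0, 0, 0]
```

Ordinal codes suit a column like KitchenQual where "higher" genuinely means better; for Neighborhood, ordinal codes would invent an ordering, which is why one-hot is the safer default there.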
Pipelines: bundling preprocessing and modeling with ColumnTransformer and Pipeline for cleaner code, fewer bugs, and easier deployment; pipelines also make cross-validation cleaner.
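A sketch of the ColumnTransformer-plus-Pipeline pattern described above, assuming synthetic data and illustrative column lists (`numerical_cols`, `categorical_cols`). Numeric columns get median imputation; the categorical column is imputed with its most frequent value and then one-hot encoded, and the whole bundle is fit and applied with a single call each.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data with gaps in both a numeric and a categorical column.
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "LotFrontage": rng.normal(70, 15, n),
    "Neighborhood": rng.choice(["NAmes", "OldTown", "Edwards"], n),
})
y = 100 * X["GrLivArea"] + rng.normal(0, 10_000, n)
X.loc[rng.random(n) < 0.15, "LotFrontage"] = np.nan
X.loc[rng.random(n) < 0.10, "Neighborhood"] = np.nan

numerical_cols = ["GrLivArea", "LotFrontage"]
categorical_cols = ["Neighborhood"]

# Route each column group through its own preprocessing.
preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), numerical_cols),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Preprocessing and model bundled into one estimator.
my_pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, random_state=0)
my_pipeline.fit(X_train, y_train)    # fits imputers, encoder, and model on train only
preds = my_pipeline.predict(X_valid)  # reuses the fitted preprocessing on raw validation rows
print("MAE:", mean_absolute_error(y_valid, preds))
```

Because the raw DataFrame, NaNs included, goes straight into `fit` and `predict`, there is no separate preprocessing code path for validation or new data to drift out of sync.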
Cross-Validation: k-fold cross-validation with cross_val_score for a more robust performance estimate than a single train/validation split, and how it works together with pipelines.
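Combining a pipeline with cross_val_score might look like this, assuming synthetic data. scikit-learn scorers follow a "higher is better" convention, so MAE is requested as `neg_mean_absolute_error` and negated back.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic numeric data with some missing values.
rng = np.random.default_rng(2)
n = 400
X = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, n),
    "LotFrontage": rng.normal(70, 15, n),
})
y = 100 * X["GrLivArea"] + rng.normal(0, 10_000, n)
X.loc[rng.random(n) < 0.2, "LotFrontage"] = np.nan

pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer()),  # mean imputation, refit inside every fold
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# 5-fold CV: each row is used for validation exactly once.
scores = -1 * cross_val_score(pipeline, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", scores)
print("Average MAE:", scores.mean())
```

Because the imputer sits inside the pipeline, each fold refits it on that fold's training rows only, so the averaged score is not inflated by preprocessing that saw the held-out rows.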
XGBoost: gradient boosting with XGBRegressor from the xgboost library (outside scikit-learn, but included for its strong competition performance); tuning n_estimators, learning_rate, and early_stopping_rounds, evaluated with mean_absolute_error on the Ames Housing data.
Data Leakage: identifying and removing leakage, distinguishing target leakage (predictors containing information unavailable at prediction time) from train-test contamination (validation data influencing preprocessing), with case-study examples.
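Both kinds of leakage can be sketched with synthetic data. The hypothetical column `post_outcome_spend` is only nonzero when the outcome is positive, mimicking a predictor recorded after the fact; the imputer is kept inside the pipeline so each CV fold fits it on training rows only. Column names and data are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n = 500
X = pd.DataFrame({
    "income": rng.normal(50, 15, n),
    "age": rng.integers(18, 70, n).astype(float),
})
# The outcome depends only weakly on the legitimate features.
y = (X["income"] + rng.normal(0, 30, n) > 50).astype(int)
# Hypothetical leaky column: populated only AFTER the outcome is known,
# so it is zero exactly when y == 0 (target leakage).
X["post_outcome_spend"] = y * rng.uniform(100, 500, n)
X.loc[rng.random(n) < 0.1, "income"] = np.nan

# Imputer inside the pipeline: each fold fits it on its own training rows,
# which avoids train-test contamination.
pipe = make_pipeline(SimpleImputer(), RandomForestClassifier(n_estimators=100, random_state=0))

acc_leaky = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
acc_clean = cross_val_score(pipe, X.drop(columns=["post_outcome_spend"]), y,
                            cv=5, scoring="accuracy").mean()
print(f"with leaky column: {acc_leaky:.3f}  without: {acc_clean:.3f}")

# Contamination anti-pattern, for contrast (avoid): calling
# SimpleImputer().fit_transform(X) before splitting lets validation rows
# shape the imputation statistics.
```

A near-perfect score driven by a single feature is the usual warning sign; the fix is to drop columns that would not be available at prediction time.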
Prerequisites
- Completion of Kaggle's Intro to Machine Learning (or equivalent ability to train a basic scikit-learn model and use train/validation splits)
- Working Python knowledge (functions, loops, imports)
- Basic pandas familiarity (DataFrames, reading CSVs, selecting columns)
- No setup required — exercises run in Kaggle's in-browser notebooks
Instructor
Alexis Cook
Instructor · Kaggle
Pros & Cons
Pros
- Completely free, no paywall, and grants a Kaggle certificate of completion; all coding is done in Kaggle's in-browser notebooks with no local setup
- Tightly scoped to high-leverage, immediately useful tabular-ML skills — pipelines, cross-validation, XGBoost, and leakage prevention — that beginners commonly miss
- Hands-on by design: each lesson pairs a short reading with a coded exercise on the real Ames Housing dataset, so you actually run SimpleImputer, OneHotEncoder, Pipeline, cross_val_score, and XGBRegressor
- Very short time investment (~4 hours), making it an efficient supplement after an intro ML course or before a Kaggle competition
- Taught by Alexis Cook, a recognized Kaggle Learn instructor, with a clear, practical, step-by-step style noted positively by independent reviewers
Cons
- Intentionally treats algorithms as black boxes — it teaches how to use XGBoost but not how gradient boosting works, so you must look elsewhere for the underlying math and theory
- Several later lessons are copy-paste heavy: independent reviewers note most original coding happens in the Missing Values and Categorical Variables lessons, while later exercises largely reuse the provided code
- Narrow scope: strictly classical tabular ML in scikit-learn/XGBoost, with no coverage of deep learning, neural networks, NLP, or model deployment beyond pipelines
- Short length means limited practice volume — you will need separate competitions or projects to truly internalize the skills
Frequently Asked Questions
Is Intermediate Machine Learning free?
Yes. Intermediate Machine Learning is 100% free. There is no paid tier and no audit-vs-paid distinction: the full course, the in-browser notebooks, and a Kaggle certificate of completion are all free with a Kaggle account.
Who is Intermediate Machine Learning for?
Learners who have finished Kaggle's Intro to Machine Learning (or equivalent) and can already train a basic scikit-learn model, and now want to handle real-world messy data and ship more accurate models. Ideal for aspiring data analysts/scientists preparing for Kaggle tabular competitions or job tasks, bootcamp students wanting a free practical supplement, and working developers who need the practical XGBoost + pipeline workflow quickly without theory.
What will you learn in Intermediate Machine Learning?
You will learn to handle missing values three ways (dropping columns, imputing with SimpleImputer, and imputing plus adding a 'was-missing' indicator column); encode categorical (non-numeric) variables with ordinal and one-hot encoding, and know when each is appropriate; bundle preprocessing and modeling into scikit-learn Pipelines (with ColumnTransformer) for cleaner, less bug-prone, production-ready code; use k-fold cross-validation via cross_val_score to estimate model performance more reliably than a single validation split; tune XGBoost gradient boosting models with n_estimators, learning_rate, and early_stopping_rounds; and detect and prevent data leakage.
What are the prerequisites for Intermediate Machine Learning?
You should have completed Kaggle's Intro to Machine Learning (or be able to train a basic scikit-learn model and use train/validation splits), have working Python knowledge (functions, loops, imports), and be familiar with basic pandas (DataFrames, reading CSVs, selecting columns). No setup is required: exercises run in Kaggle's in-browser notebooks.
Is Intermediate Machine Learning worth it?
It is free, short (~4 hours), and delivers exactly the high-leverage tabular-ML skills (pipelines, cross-validation, XGBoost, leakage) that beginners typically lack, with hands-on coded exercises and a free completion certificate. The only real caveat is its deliberate lack of theory and some copy-paste-heavy lessons, which keeps it from being a standalone education but does not undermine its value as a focused practical add-on.
How we reviewed this course
This is an independent editorial assessment by Cursarium, based on Kaggle's published course materials and aggregated public learner feedback (last reviewed 2026-06). We have not independently completed the course. Links to providers are standard references, not paid placements.
Sources
- Official course page — Kaggle Learn: Intermediate Machine Learning
- GitHub mirror of course notebooks (lesson titles and per-lesson content) — gabboraron/Intermediate-Machine-Learning-Kaggle
- Raw XGBoost lesson notebook (XGBRegressor, n_estimators, learning_rate, early_stopping_rounds, MAE, Ames Housing) — drakearch/kaggle-courses
- Independent review noting strengths and the black-box / copy-paste weaknesses — Bjornstrom, 'Course review: Intermediate Machine Learning on Kaggle'
- Class Central course listing (free status, provider, student reviews)