Data Science: Machine Learning
by Rafael Irizarry · edX
Our Verdict
Worth it, with caveats
Data Science: Machine Learning (HarvardX PH125.8x, recently relisted on edX as 'Data Science: Building Machine Learning Models') is the eighth of nine courses in Harvard's Data Science Professional Certificate, taught by Professor Rafael Irizarry. Across six sections it teaches the core supervised-learning workflow in R, culminating in a hands-on MovieLens movie-recommendation system that ties together cross-validation, regularization, matrix factorization and PCA. It is a genuinely strong, rigorous course with an excellent instructor, but multiple independent reviewers and students flag it as the hardest installment of the certificate and note real friction: a steep difficulty jump, assessments that reference topics (PCA, clustering) only lightly covered in lecture, and occasional bugs or unclear edge cases in assignments. Take it for the depth and the recommendation-system project, but do not treat it as a beginner-friendly first ML course.
High-quality, rigorous R-based intro to applied ML with a real capstone-style project and a respected instructor, but it assumes comfort with R, linear algebra and basic calculus and is widely reported as the steepest course in the certificate, so it is a strong 'take' for the prepared and a 'skip' for true beginners.
Best for: Learners already comfortable with R (ideally after the earlier courses in the HarvardX Data Science series) who want a rigorous, math-grounded introduction to the supervised machine-learning workflow and a concrete portfolio project (a MovieLens movie-recommendation system). A good fit for self-learners and aspiring data analysts who want to understand the why behind algorithms, not just call library functions, and who are pursuing the full HarvardX Data Science Professional Certificate.
Skip if: Complete beginners with no programming or no linear-algebra/calculus background (multiple reviewers say the difficulty 'ramps up significantly' and is out of step with the certificate's beginner framing); Python-only practitioners who do not want to learn R; and people who want production MLOps, deep learning, or modern frameworks (the course omits SVMs, boosting and neural networks and is R/caret-centric).
About This Course
Part of Harvard's Data Science certificate covering cross-validation, kNN, random forests, and recommendation systems in R.
What You'll Learn
Curriculum
Core terminology and concepts: features, outcomes, prediction vs. inference, and the supervised-learning framing used throughout the course.
Building a first algorithm with training and test sets; the role of conditional probabilities; evaluation via accuracy, confusion matrix, sensitivity/specificity, prevalence, ROC and F1; introduction to the caret workflow.
Why linear (and logistic) regression is a useful but often insufficiently flexible baseline; smoothing noisy data (e.g. loess/bin smoothing); and using matrices/matrix algebra for machine learning in R.
Distance metrics; k-nearest neighbors; k-fold and bootstrap cross-validation to tune parameters and avoid overtraining; and discriminative vs. generative approaches (naive Bayes, LDA/QDA).
Multi-class classification, classification and regression trees, random forests, the curse of dimensionality and methods that adapt to higher dimensions, plus practical use of the caret package.
Capstone-style synthesis: applying the algorithms learned, regularization, principal component analysis (PCA) and matrix factorization to build a movie recommendation system on the MovieLens dataset.
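To make the evaluation vocabulary in the curriculum concrete, here is a minimal sketch of how accuracy, prevalence, sensitivity, specificity and F1 fall out of a binary confusion matrix. It is an illustration only: the course itself works in R with caret, while this sketch uses plain Python, and the function names and toy labels are hypothetical, not taken from the course materials.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, truly positive
        elif p == positive:
            fp += 1          # predicted positive, truly negative
        elif a == positive:
            fn += 1          # predicted negative, truly positive
        else:
            tn += 1          # predicted negative, truly negative
    return tp, fp, tn, fn


def evaluation_metrics(actual, predicted, positive=1):
    """Derive common evaluation metrics from the confusion-matrix counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    # Guard every ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "prevalence": (tp + fn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical toy labels: 3 positives among 10 cases.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = evaluation_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a low prevalence like this one, accuracy looks better than sensitivity, which is why the metrics are taught together rather than relying on accuracy alone.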
Prerequisites
- Working knowledge of R and RStudio (the course assumes prior HarvardX Data Science courses such as R Basics, Visualization, Probability and Inference & Modeling)
- Comfort with basic linear algebra (matrices) and introductory calculus, plus probability/statistics fundamentals
- Familiarity with tidyverse-style data wrangling; no prior machine-learning experience required, but the pace assumes mathematical maturity
Instructor
Rafael Irizarry
Instructor · edX
Pros & Cons
Pros
- Excellent instruction from Professor Rafael Irizarry (Harvard biostatistics): reviewers consistently note he explains clearly, succinctly and slowly enough that motivated learners can follow even difficult material
- Rigorous, math-grounded treatment that teaches the intuition and mechanics behind algorithms (cross-validation, regularization, generative models) rather than just API calls
- A concrete, portfolio-worthy capstone project: building a MovieLens movie-recommendation system that integrates regularization, PCA and matrix factorization
- Free to audit, self-paced, and backed by an active discussion board and a free companion textbook (Irizarry's 'Introduction to Data Science'), so the full content is accessible at no cost
- Hands-on R/caret practice that maps directly onto real data-analysis work and prepares learners for the certificate's capstone
Cons
- Steep, widely reported difficulty spike: independent reviewers and students call it the hardest course in the certificate and say it is poorly aligned with the program's 'no experience needed' beginner framing
- Assessment mismatch: several reviewers report graded assignments/exam questions that reference topics (e.g. PCA, clustering) given only light lecture coverage, plus occasional bugs or unclear edge cases in assignments
- Narrow algorithmic and tooling scope: it is R/caret-centric and omits widely used methods such as SVMs, boosting/gradient boosting and neural networks/deep learning
- R-only: not suitable for learners who want or need Python, which is the more common industry ML language
Frequently Asked Questions
Is Data Science: Machine Learning free?
Yes, if you audit. The full course content is free to audit; a verified certificate costs $149 (edX). The certificate is only needed if you want the credential or are completing the paid HarvardX Data Science Professional Certificate; the lectures, exercises and companion textbook are otherwise available at no cost. Audit access may be time-limited on self-paced runs. Verify the current price on edX, as promo discounts and certificate pricing change.
Who is Data Science: Machine Learning for?
Learners already comfortable with R (ideally after the earlier courses in the HarvardX Data Science series) who want a rigorous, math-grounded introduction to the supervised machine-learning workflow and a concrete portfolio project (a MovieLens movie-recommendation system). A good fit for self-learners and aspiring data analysts who want to understand the why behind algorithms, not just call library functions, and who are pursuing the full HarvardX Data Science Professional Certificate.
What will you learn in Data Science: Machine Learning?
The basics of machine learning (training vs. test sets, conditional probabilities, and how to evaluate algorithms with accuracy, confusion matrices, sensitivity/specificity and F1); how to perform cross-validation to estimate error and avoid overtraining; several popular algorithms, including k-nearest neighbors (kNN), linear/logistic regression for prediction, smoothing, and tree-based and generative methods (e.g. naive Bayes, QDA/LDA); and how to use the caret package to train, tune and compare models and handle higher-dimensional classification.
What are the prerequisites for Data Science: Machine Learning?
Working knowledge of R and RStudio (the course assumes prior HarvardX Data Science courses such as R Basics, Visualization, Probability and Inference & Modeling); comfort with basic linear algebra (matrices) and introductory calculus, plus probability/statistics fundamentals; and familiarity with tidyverse-style data wrangling. No prior machine-learning experience is required, but the pace assumes mathematical maturity.
Is Data Science: Machine Learning worth it?
High-quality, rigorous R-based intro to applied ML with a real capstone-style project and a respected instructor, but it assumes comfort with R, linear algebra and basic calculus and is widely reported as the steepest course in the certificate, so it is a strong 'take' for the prepared and a 'skip' for true beginners.
How we reviewed this course
This is an independent editorial assessment by Cursarium, based on edX's published course materials and aggregated public learner feedback (last reviewed 2026-06). We have not independently completed the course. Links to providers are standard references, not paid placements.
Sources
- Official Harvard course page (pll.harvard.edu) - Data Science: Machine Learning (what you'll learn, 8 weeks, 2-4 hrs/week, free to audit / $149 certificate)
- HarvardX PH125.8x course page on edX (static course outline) - six section titles and descriptions
- GitHub - gmineo/Harvard-Data-Science-Professional, PH125.8x section directory (verified six-section curriculum structure)
- The Data Student - independent HarvardX Data Science: Machine Learning review (difficulty, prerequisites, who should skip)
- Class Central course listing - Data Science: Building Machine Learning Models (Harvard) with aggregated student review themes (PCA/clustering in assignments, bugs, missing SVM/boosting)