Cursarium
Intermediate · Certificate · $199

Machine Learning

by John Paisley · edX

4.5
(2,800 reviews)
80K+ enrolled · 12 weeks · Updated 2024-09

Our Verdict

Worth it — with caveats

ColumbiaX Machine Learning (CSMM.102x), taught by Columbia University professor John Paisley on edX, is a rigorous, math-first introduction to classical machine learning that is widely regarded as one of the strongest theoretical ML MOOCs available, but it is decisively not a beginner course. Across roughly 12 weeks it derives the mathematics behind supervised and unsupervised methods, from maximum-likelihood and Bayesian estimation through regression, kernels, SVMs, trees, boosting, clustering, EM, matrix factorization, PCA, and hidden Markov models.

Independent aggregation by David Venturi (freeCodeCamp/Class Central) places it at a 4.8/5 weighted average, though over only 10 reviews, and it is consistently praised for Paisley's clear derivations while criticized for dense slide-driven lectures and a heavy probability/linear-algebra prerequisite load. Verified first-hand reviewers (e.g. Greg Hamel rates it 4.25/5) warn that even people who finished Andrew Ng's course had to cram, and that most who start will not finish.

Take it if you want graduate-level mathematical depth and can audit it for free; skip it if you want a hands-on, applied, code-first path or any coverage of deep learning, which it deliberately omits.

Excellent for the right audience and free to audit, but it is a graduate-level, math-heavy theory course with a steep prerequisite bar and no deep-learning coverage, so it only suits learners who specifically want mathematical depth and have strong calculus/linear-algebra/probability foundations.

Best for: Learners who want to understand the mathematics behind ML algorithms rather than just call library functions: CS/engineering/stats students, working engineers preparing for ML research or graduate study, and anyone comfortable with multivariate calculus, linear algebra through SVD, and probability who wants Ivy-League-level rigor. It is also a natural fit for those pursuing Columbia's Artificial Intelligence MicroMasters, of which this course is a part.

Skip if: Complete beginners, career-switchers wanting a fast applied path, or anyone weak in calculus/linear algebra/probability. Skip it if you want a code-first, project-heavy, scikit-learn/TensorFlow workflow, hand-holding and lots of worked examples, or any coverage of neural networks and deep learning, none of which this course provides.

About This Course

A ColumbiaX course covering supervised and unsupervised learning, including classification, regression, clustering, and dimensionality reduction.

What You'll Learn

Foundational estimation frameworks: maximum likelihood, maximum a posteriori (MAP), and Bayesian inference
Regression methods including least squares, ridge regression, the bias-variance tradeoff, and sparsity/regularization
Classification methods: the perceptron, logistic regression, Bayes classifiers with Gaussians, kernel methods, and support vector machines (SVMs)
Tree-based and ensemble methods: decision trees, random forests, bagging, and boosting
Clustering and neighbor-based methods: k-means and k-NN (the latter a supervised method), the Expectation-Maximization (EM) algorithm, and Gaussian mixture models
Latent and dimensionality methods: matrix factorization, latent factor models, and principal component analysis (PCA)
Sequential and state-space models: Markov chains, hidden Markov models, and continuous state-space models

Curriculum

Estimation foundations (early weeks)

Maximum likelihood, maximum a posteriori (MAP), and Bayesian estimation; core probability tools (expectations, variances, distributions) used throughout the course.
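
To give a sense of the level, here is a minimal sketch of how maximum likelihood and MAP differ when estimating a coin's probability of heads. It uses standard textbook formulas and is not taken from the course materials; the function names `mle_heads` and `map_heads` and the Beta(a, b) prior with a = b = 2 are assumptions chosen for this illustration.

```python
def mle_heads(heads: int, flips: int) -> float:
    """Maximum-likelihood estimate of P(heads) under a Bernoulli model."""
    if flips <= 0:
        raise ValueError("flips must be positive")
    return heads / flips


def map_heads(heads: int, flips: int, a: float = 2.0, b: float = 2.0) -> float:
    """MAP estimate of P(heads) with a Beta(a, b) prior (assumes a > 1, b > 1).

    The posterior is Beta(heads + a, flips - heads + b), and its mode
    is the MAP estimate.
    """
    if a <= 1 or b <= 1:
        raise ValueError("this closed form needs a > 1 and b > 1")
    return (heads + a - 1) / (flips + a + b - 2)


if __name__ == "__main__":
    print(mle_heads(7, 10))  # relies on the data alone
    print(map_heads(7, 10))  # pulled toward 0.5 by the prior
```

With 7 heads in 10 flips the maximum-likelihood estimate is 0.7, while the Beta(2, 2) prior pulls the MAP estimate to 8/12, about 0.667. The course derives this kind of result rather than just stating it, which is where the probability prerequisites come in.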

Regression

Linear regression, least squares, ridge regression, the bias-variance tradeoff, and sparsity. Week 3 programming project: linear and ridge regression with active learning.
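
As a rough illustration of what ridge regression adds to least squares, the sketch below handles the simplest case: one feature and no intercept. It is an assumption-laden toy, not the course's project code; the name `ridge_slope` and the penalty parameter `lam` are hypothetical. The general matrix form, w = (XᵀX + λI)⁻¹Xᵀy, reduces to this when X has a single column.

```python
def ridge_slope(xs, ys, lam=0.0):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept).

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    With lam = 0 this is ordinary least squares.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + lam == 0:
        raise ValueError("slope is undefined when all x are zero and lam is 0")
    return sxy / (sxx + lam)


if __name__ == "__main__":
    xs, ys = [1, 2, 3], [2, 4, 6]
    print(ridge_slope(xs, ys))            # least-squares slope
    print(ridge_slope(xs, ys, lam=14.0))  # shrunk toward zero
```

On the data y = 2x, the unpenalized slope is 2.0, and a penalty of 14 halves it to 1.0. That shrinkage is the bias side of the bias-variance tradeoff listed in this unit.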

Classification I

Perceptron and least-squares classification, logistic regression, the Laplace approximation, and K-class Bayes classifiers with Gaussian distributions (Week 6 programming project).
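
For readers unsure what the perceptron in this unit looks like in practice, here is a minimal sketch of the classic mistake-driven update rule. It is a generic textbook version under stated assumptions (labels coded as +1/-1, zero initialization, a cap of `max_epochs` passes), not the course's own implementation; `train_perceptron` and `predict` are hypothetical names.

```python
def train_perceptron(points, labels, max_epochs=100):
    """Learn weights w and bias b for labels in {+1, -1}."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                # Move the boundary toward classifying x correctly.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: converged
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


if __name__ == "__main__":
    pts = [(2, 2), (1, 3), (-1, -2), (-2, -1)]
    ys = [1, 1, -1, -1]
    w, b = train_perceptron(pts, ys)
    print(w, b, [predict(w, b, p) for p in pts])
```

The loop only terminates early on linearly separable data; otherwise it stops at the `max_epochs` cap, which is one motivation for the probabilistic classifiers covered next in the same unit.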

Classification II: kernels and SVMs

Kernel methods, support vector machines, and Gaussian-process-style ideas for nonlinear classification and regression.

Trees and ensembles

Decision trees, random forests, bagging, and boosting.

Clustering and mixtures

k-means and k-NN, the Expectation-Maximization (EM) algorithm, and mixtures of Gaussians. Week 9 programming project: K-means clustering and EM Gaussian Mixture Models.
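
The k-means half of this project can be sketched as the usual alternation between assigning points to the nearest centroid and recomputing centroids. The version below is a minimal illustration, not the graded assignment: it takes the starting centroids as an argument so the run is deterministic, keeps an empty cluster's old centroid, and uses only the standard library. The name `kmeans` and its parameters are assumptions.

```python
from math import dist


def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm from caller-supplied initial centroids.

    Returns (final_centroids, labels), where labels[i] is the index of
    the centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))

    for _ in range(max_iters):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)

        # Update step: move each centroid to the mean of its members.
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # empty cluster: keep the old centroid

        if updated == centroids:  # nothing moved, so we have converged
            break
        centroids = updated

    return centroids, [nearest(p) for p in points]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

EM for Gaussian mixtures, the other half of the project, replaces the hard assignment step with soft responsibilities; that extension is not shown here.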

Matrix factorization and PCA

Matrix factorization, latent factor models, and principal component analysis. Week 12 programming project: probabilistic matrix factorization.

Sequential models

Markov chains, hidden Markov models, and discrete/continuous state-space models.

Prerequisites

  • Multivariate calculus: gradients, partial derivatives, Lagrange multipliers, and quadratic forms
  • Linear algebra through Singular Value Decomposition (SVD): vectors, matrices, eigenvalues, null spaces, positive semi-definite matrices
  • Probability and statistics: expectations, variances, and discrete/continuous probability distributions (including Beta, Dirichlet, Poisson)
  • Programming in Python (with NumPy) or Octave to complete the graded programming projects

Instructor

John Paisley

Instructor · edX

Pros & Cons

Pros

  • Genuine mathematical depth from a strong instructor: Professor John Paisley derives the math behind each algorithm and is repeatedly described as 'brilliant, clear, and clever,' breaking down hard concepts well.
  • Broad, coherent classical-ML coverage in one course: estimation, regression, classification, kernels/SVMs, ensembles, clustering, EM, matrix factorization, PCA, and HMMs.
  • Free to audit on edX, with a verified-certificate option, and it counts toward Columbia University's Artificial Intelligence MicroMasters.
  • Concept-first design means much of the value is accessible even without heavy coding: reviewers note 'you don't really need to do any programming to get a lot out of the course.'
  • Four graded programming projects (Python/Octave) give hands-on implementation of regression, a Gaussian Bayes classifier, k-means/EM, and probabilistic matrix factorization.

Cons

  • Steep prerequisite bar: described as 'essentially a graduate-level course'; reviewers who had already finished Andrew Ng's Stanford ML still had to cram heavily, and 'most students who start the course will not complete it.'
  • Dense, slide-heavy lectures that often pack a lot of math onto a single slide and require frequent pausing and rewatching; delivery is 'unemotional and matter-of-fact,' with some reviewers noting a lack of diagrams/examples and ambiguous scalar-vs-vector-vs-matrix notation.
  • No coverage of neural networks or deep learning, so it is not a path to modern deep-learning practice on its own.
  • The 4.8/5 aggregate rating rests on a small sample (only ~10 aggregated reviews).
  • Quizzes are frequently called frustrating and occasionally tricky to the point of penalizing correct understanding.

Frequently Asked Questions

Is Machine Learning free?

The course is free to audit on edX. The $199 figure is the catalog price for the verified-certificate track, which is a paid upgrade; the live edX price can vary and is sometimes lower, and financial assistance is generally available on edX. The certificate also counts toward Columbia's AI MicroMasters. The live edX course price could not be confirmed directly at review time, so verify the current price on the edX course page before purchasing.

Who is Machine Learning for?

Learners who want to understand the mathematics behind ML algorithms rather than just call library functions: CS/engineering/stats students, working engineers preparing for ML research or graduate study, and anyone comfortable with multivariate calculus, linear algebra through SVD, and probability who wants Ivy-League-level rigor. It is also a natural fit for those pursuing Columbia's Artificial Intelligence MicroMasters, of which this course is a part.

What will you learn in Machine Learning?

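Foundational estimation frameworks: maximum likelihood, maximum a posteriori (MAP), and Bayesian inference; Regression methods including least squares, ridge regression, the bias-variance tradeoff, and sparsity/regularization; Classification methods: the perceptron, logistic regression, Bayes classifiers with Gaussians, kernel methods, and support vector machines (SVMs); Tree-based and ensemble methods: decision trees, random forests, bagging, and boosting; Clustering and neighbor-based methods: k-means and k-NN (the latter a supervised method), the Expectation-Maximization (EM) algorithm, and Gaussian mixture models; Latent and dimensionality methods: matrix factorization, latent factor models, and principal component analysis (PCA); Sequential and state-space models: Markov chains, hidden Markov models, and continuous state-space models.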

What are the prerequisites for Machine Learning?

Multivariate calculus: gradients, partial derivatives, Lagrange multipliers, and quadratic forms; Linear algebra through Singular Value Decomposition (SVD): vectors, matrices, eigenvalues, null spaces, positive semi-definite matrices; Probability and statistics: expectations, variances, and discrete/continuous probability distributions (including Beta, Dirichlet, Poisson); Programming in Python (with NumPy) or Octave to complete the graded programming projects.

Is Machine Learning worth it?

Excellent for the right audience and free to audit, but it is a graduate-level, math-heavy theory course with a steep prerequisite bar and no deep-learning coverage, so it only suits learners who specifically want mathematical depth and have strong calculus/linear-algebra/probability foundations.