Cursarium
Advanced · Certificate · $12.99

Artificial Intelligence: Reinforcement Learning in Python

by Lazy Programmer Inc. · Udemy

4.6 (8,500 reviews)
60K+ enrolled · 18 hours · Updated 2024-09

Our Verdict

Worth it — with caveats

Artificial Intelligence: Reinforcement Learning in Python by Lazy Programmer Inc. is the strongest budget-priced introduction to classical reinforcement learning, but it covers foundations only, despite the deep-RL marketing wrapper. Across 179 lectures in 22 sections (roughly 18-20 hours), you build every algorithm by hand in plain Python and NumPy: epsilon-greedy, UCB1, and Thompson Sampling bandits, Markov Decision Processes, dynamic programming, Monte Carlo, TD(0), SARSA, Q-learning, and linear function approximation, finishing with a Q-learning stock-trading project. One accuracy caveat: this course does not teach Deep Q-Networks from scratch, despite some catalog blurbs implying it; DQN with experience replay and target networks is covered in the instructor's separate 'Advanced AI: Deep Reinforcement Learning' courses. Public sentiment is consistently positive (Udemy lists 4.7/5 across roughly 10,700 ratings; the instructor's own deeplearningcourses.com page also shows 4.7/5), with learners praising the from-scratch, theory-plus-code approach. It is excellent if you want to genuinely understand tabular RL before touching neural nets, and a poor fit if you came expecting hands-on deep RL.

This is a high-quality, rigorously from-scratch grounding in classical/tabular reinforcement learning at a very low price, but it is only worth taking if you (a) actually want the fundamentals rather than deep RL, and (b) meet the real math/Python prerequisites (calculus, probability, NumPy, linear regression). The catalog's 'deep Q-networks from scratch' framing overstates the scope, so buy it for what it actually is.

Best for: Python programmers and aspiring ML/data-science practitioners who want to understand reinforcement learning at a technical, build-it-yourself level before jumping to deep RL libraries. It suits people who learn well from paired theory-then-code lectures, who are comfortable with calculus, probability, OOP, NumPy and linear/logistic regression, and who value implementing bandits, MDPs, dynamic programming, Monte Carlo and Q-learning by hand.

Skip if: you are a complete programming beginner, you want hands-on Deep Q-Networks, policy gradients, or actor-critic (those are in the instructor's separate Advanced/Deep RL courses), you dislike heavy theory or want a fast plug-and-play library tour, or you prefer the canonical Sutton & Barto academic treatment with formal exercises and rigorous proofs.

About This Course

Implement multi-armed bandits, dynamic programming, Monte Carlo, TD learning, and deep Q-networks from scratch in Python.

What You'll Learn

The multi-armed bandit problem and explore-exploit dilemma via epsilon-greedy, optimistic initial values, UCB1, and Bayesian/Thompson Sampling (including Gaussian rewards)
Markov Decision Processes: the Markov property, value functions, the Bellman equation, and optimal policies
Dynamic programming methods: iterative policy evaluation, policy iteration, and value iteration in Gridworld
Monte Carlo prediction and control (with and without exploring starts)
Temporal Difference learning: TD(0) prediction, SARSA, and Q-learning implemented from scratch
Approximation methods using linear models and feature engineering to plug a differentiable model into RL (e.g. CartPole), plus using OpenAI Gym with no code changes
A capstone project applying Q-learning to build a stock-trading bot

Curriculum

Return of the Multi-Armed Bandit

Explore-exploit dilemma, epsilon-greedy theory and code, optimistic initial values, UCB1, Bayesian/Thompson Sampling (incl. Gaussian rewards), nonstationary bandits, online learning. ~26 lectures.
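
To give a sense of what "epsilon-greedy theory and code" involves, here is a minimal sketch of an epsilon-greedy agent on a Bernoulli bandit. It is not the course's code: the function name, the arm probabilities, and the default parameters are illustrative assumptions, and it uses only the standard library rather than NumPy to stay self-contained.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy (hypothetical helper).

    Returns the per-arm reward estimates and the per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running sample mean of each arm's reward
    counts = [0] * n_arms       # number of times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm at random
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean: new = old + (x - old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    # Win probabilities below are made up for illustration.
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.75])
    print("estimates:", estimates)
    print("pull counts:", counts)
```

With a fixed epsilon the agent never stops exploring, which is the weakness that optimistic initial values, UCB1, and Thompson Sampling each address in a different way.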

High-Level Overview of Reinforcement Learning

What RL is, unusual RL strategies, and the bridge from bandits to full reinforcement learning.

Build an Intelligent Tic-Tac-Toe Agent (VIP)

Components of an RL system, the value function, and a complete first RL agent: state representation, environment, agent, and main loop.

Markov Decision Processes (MDPs)

Gridworld, the Markov property, future rewards, value functions, the Bellman equation, and optimal policy/value functions. ~14 lectures.
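
For reference, the Bellman equations this section builds toward can be written as follows. The notation is the common textbook one (policy $\pi$, discount factor $\gamma$, transition model $p$) and is an assumption here; the course's own symbols may differ.

```latex
% Bellman expectation equation for a fixed policy \pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
             \left[ r + \gamma V^{\pi}(s') \right]

% Bellman optimality equation
V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)
           \left[ r + \gamma V^{*}(s') \right]
```

The first equation is what iterative policy evaluation solves; the second is the fixed point that value iteration converges to.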

Dynamic Programming

Iterative policy evaluation, policy improvement, policy iteration, and value iteration, coded in Gridworld and Windy Gridworld. ~14 lectures.
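
As an illustration of value iteration, the sketch below runs it on a one-dimensional corridor rather than the course's Gridworld. The environment, its +1 reward for entering the rightmost state, and all names are assumptions made to keep the example short.

```python
def value_iteration(n_states=5, gamma=0.9, tol=1e-8):
    """Value iteration on a deterministic corridor (illustrative toy setup).

    The rightmost state is terminal; entering it pays a reward of +1.
    """
    terminal = n_states - 1
    actions = (-1, +1)  # move left or move right

    def step(s, a):
        # Walls clamp the agent in place at either end of the corridor.
        s2 = min(max(s + a, 0), terminal)
        reward = 1.0 if s2 == terminal else 0.0
        return s2, reward

    def backup(s, a, V):
        # One-step lookahead: immediate reward plus discounted next value.
        s2, r = step(s, a)
        return r + gamma * V[s2]

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(terminal):  # the terminal state's value stays 0
            best = max(backup(s, a, V) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(actions, key=lambda a: backup(s, a, V)) for s in range(terminal)]
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    print("values:", V)
    print("policy:", policy)
```

In this toy setup the values of the non-terminal states converge to approximately 0.729, 0.81, 0.9, and 1.0 from left to right, and the greedy policy moves right everywhere. Policy iteration reaches the same result by alternating full policy evaluation with greedy improvement.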

Monte Carlo

Monte Carlo policy evaluation and control, including control without exploring starts, with code.
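
Monte Carlo policy evaluation averages the returns observed after visiting each state. A minimal first-visit sketch follows; the episode format (a list of (state, reward) pairs) and the function name are assumptions for illustration, not the course's interface.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=0.9):
    """Estimate V(s) as the mean first-visit return (hypothetical helper).

    Each episode is a list of (state, reward) pairs, where `reward` is the
    reward received on leaving `state`. Episodes must be complete.
    """
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backwards so G accumulates the discounted return from each step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Later overwrites come from earlier time steps, so the value
            # left at the end belongs to the first visit of the state.
            first_return[state] = G
        for state, g in first_return.items():
            returns[state].append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


if __name__ == "__main__":
    # Two made-up episodes over states "A" and "B".
    episodes = [
        [("A", 0.0), ("B", 1.0)],
        [("A", 0.0), ("A", 0.0), ("B", 1.0)],
    ]
    print(first_visit_mc(episodes, gamma=0.5))
```

Unlike dynamic programming, this needs no model of the environment, only sampled episodes, which is why it requires episodes that terminate.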

Temporal Difference Learning

TD(0) prediction, SARSA, and Q-learning, each derived and implemented in code.
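
The three methods in this section differ mainly in the target they bootstrap from. The sketch below puts the standard tabular updates side by side; the dictionary-based tables and function signatures are assumptions chosen for readability, not the course's code.

```python
def td0_update(V, s, r, s2, alpha=0.1, gamma=0.9, done=False):
    """TD(0) prediction: move V[s] toward r + gamma * V[s2]."""
    target = r + (0.0 if done else gamma * V[s2])
    V[s] += alpha * (target - V[s])


def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9, done=False):
    """SARSA (on-policy): bootstrap from the action a2 actually taken next."""
    target = r + (0.0 if done else gamma * Q[s2][a2])
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9, done=False):
    """Q-learning (off-policy): bootstrap from the best action in s2."""
    target = r + (0.0 if done else gamma * max(Q[s2].values()))
    Q[s][a] += alpha * (target - Q[s][a])


if __name__ == "__main__":
    # Made-up two-state table with actions "L" and "R".
    Q = {0: {"L": 0.0, "R": 0.0}, 1: {"L": 1.0, "R": 2.0}}
    q_learning_update(Q, 0, "R", 1.0, 1, alpha=0.5, gamma=0.5)
    sarsa_update(Q, 0, "L", 1.0, 1, "L", alpha=0.5, gamma=0.5)
    print(Q[0])
```

In the usage example both updates see the same transition, but Q-learning uses the larger next-state value (2.0) while SARSA uses the value of the action it was told was taken (1.0), so the two entries end up different.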

Approximation Methods

Linear models and feature engineering for prediction and control, CartPole, and using OpenAI Gym; the gateway to plugging in neural networks.
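
With function approximation, the value table is replaced by a parameterized model. A minimal sketch of semi-gradient TD(0) with a linear model is shown below; the feature vectors and names are illustrative assumptions, and the course's CartPole features are not reproduced here.

```python
def predict(w, x):
    """Linear value estimate: the dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def semi_gradient_td0(w, x, r, x2, alpha=0.1, gamma=0.9, done=False):
    """Return updated weights after one semi-gradient TD(0) step.

    x and x2 are feature vectors for the current and next state. The target
    is treated as a constant, hence "semi-gradient".
    """
    target = r + (0.0 if done else gamma * predict(w, x2))
    error = target - predict(w, x)
    # For a linear model the gradient of predict(w, x) w.r.t. w is x itself.
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


if __name__ == "__main__":
    # Made-up two-dimensional features for illustration.
    w = [0.0, 0.0]
    w = semi_gradient_td0(w, [1.0, 2.0], 1.0, [0.0, 1.0], alpha=0.5, gamma=0.5)
    print(w)
```

Because the update only needs a prediction and its gradient, any differentiable model can take the place of the linear one, which is the sense in which this section is a gateway to neural networks.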

Stock Trading Project with Reinforcement Learning

End-to-end project: data and environment, modeling Q for Q-learning, and a multi-part coded trading bot.

Legacy sections + Appendix/FAQ

Older versions of the bandit, MDP, DP, Monte Carlo, TD and approximation sections are retained, plus environment setup, Python-for-beginners help, and learning-strategy lectures.

Prerequisites

  • Solid Python (conditionals, loops, data structures, object-oriented programming)
  • NumPy proficiency
  • Calculus and probability theory
  • Linear regression and gradient descent (logistic regression helpful)
  • No prior reinforcement learning or TensorFlow/PyTorch knowledge required

Instructor

Lazy Programmer Inc.

Instructor · Udemy

Pros & Cons

Pros

  • Genuinely from-scratch: every algorithm (bandits, DP, Monte Carlo, SARSA, Q-learning) is implemented in plain Python/NumPy, building deep intuition rather than library button-pushing
  • Each topic pairs clear theory (Bellman equation, MDPs, explore-exploit) with immediate coded implementation and beginner exercise prompts
  • Excellent value: frequently around $12-15 on sale with lifetime access, a 30-day money-back guarantee, and a certificate of completion
  • Practical capstone (Q-learning stock-trading bot) plus OpenAI Gym integration give a tangible payoff
  • Strong, consistent learner sentiment: 4.7/5 on Udemy across roughly 10,700 ratings, and 4.7/5 on the instructor's own deeplearningcourses.com page

Cons

  • Scope is over-sold: it covers tabular RL and linear function approximation, NOT Deep Q-Networks from scratch, despite catalog/marketing language implying deep RL
  • Steep real prerequisites (calculus, probability, NumPy, linear regression) make the 'all levels' label misleading for true beginners
  • Contains large duplicated 'Legacy' sections re-teaching the same topics, which can feel cluttered and padded
  • Less academically rigorous than Sutton & Barto or university courses; light on formal exercises, proofs, and modern deep-RL methods (policy gradients, actor-critic)

Frequently Asked Questions

Is Artificial Intelligence: Reinforcement Learning in Python free?

No. Artificial Intelligence: Reinforcement Learning in Python is a paid Udemy course, typically around $12.99 on sale (the list price is much higher; Udemy discounts heavily and prices vary by region). The price includes lifetime access, a certificate of completion, and a 30-day money-back guarantee. The same course is also sold on the instructor's deeplearningcourses.com. There is no free full audit, though a free preview of selected lectures is available.

Who is Artificial Intelligence: Reinforcement Learning in Python for?

Python programmers and aspiring ML/data-science practitioners who want to understand reinforcement learning at a technical, build-it-yourself level before jumping to deep RL libraries. It suits people who learn well from paired theory-then-code lectures, who are comfortable with calculus, probability, OOP, NumPy and linear/logistic regression, and who value implementing bandits, MDPs, dynamic programming, Monte Carlo and Q-learning by hand.

What will you learn in Artificial Intelligence: Reinforcement Learning in Python?

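The multi-armed bandit problem and explore-exploit dilemma via epsilon-greedy, optimistic initial values, UCB1, and Bayesian/Thompson Sampling (including Gaussian rewards); Markov Decision Processes: the Markov property, value functions, the Bellman equation, and optimal policies; Dynamic programming methods: iterative policy evaluation, policy iteration, and value iteration in Gridworld; Monte Carlo prediction and control (with and without exploring starts); Temporal Difference learning: TD(0) prediction, SARSA, and Q-learning implemented from scratch; Approximation methods using linear models and feature engineering (e.g. CartPole), plus OpenAI Gym; A capstone project applying Q-learning to build a stock-trading bot.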

What are the prerequisites for Artificial Intelligence: Reinforcement Learning in Python?

Solid Python (conditionals, loops, data structures, object-oriented programming); NumPy proficiency; Calculus and probability theory; Linear regression and gradient descent (logistic regression helpful); No prior reinforcement learning or TensorFlow/PyTorch knowledge required.

Is Artificial Intelligence: Reinforcement Learning in Python worth it?

Yes, with caveats. It is a high-quality, rigorously from-scratch grounding in classical/tabular reinforcement learning at a very low price, but it is only worth taking if you (a) actually want the fundamentals rather than deep RL, and (b) meet the real math/Python prerequisites (calculus, probability, NumPy, linear regression). The catalog's 'deep Q-networks from scratch' framing overstates the scope, so buy it for what it actually is.