Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).
- Prof. Scott Moura (UC Berkeley) <smoura [at] berkeley.edu>
- Co-Instructor Saehong Park (UC Berkeley) <sspark [at] berkeley.edu>
- TA Xinyi Zhou (TBSI) <zxyyx48 [at] 163.com>
China Time | California Time |
---|---|
July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th) |
July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th) |
all at 08:30-10:05 China Time | all at 5:30pm-7:05pm PT |
Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Recordings |
---|---|---|---|---|---|
1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording PW: 1e*OV@Re | LEC1a Slides | Recording Link PW: 9L%JePa= |
 | 1b. Introduction - History of RL | Scott Moura | Zoom Recording PW: 1k.E69^o | LEC1a Slides | |
 | 1c. Optimal Control Intro | Scott Moura | Zoom Recording PW: 2B&=2@*@ | | |
2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording PW: 3F*1rg%? | LEC2a Notes | Recording Link PW: 8Q?#51=J |
 | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording PW: 5Y#4=58& | LEC2b Notes | |
3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording PW: 9N@%H4&@ | LEC3a Notes | Recording Link PW: 1A@@0G63 |
 | 3b. Policy Iteration Algorithm | Scott Moura | Zoom Recording PW: 6y+!+6#9 | LEC3b Notes | |
 | 3c. Case Study: LQR | Scott Moura | Zoom Recording PW: 6D@YkC&= | LEC3c Notes | |
4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording PW: 6v&78$We | LEC4a Notes | Recording Link PW: 4t=#ye7T |
 | 4b. Case Study: LQR | Scott Moura | Zoom Recording PW: 1O^fh.8+ | LEC4b Notes | |
 | 4c. Online RL with ADP | Scott Moura | Zoom Recording PW: 0q=.4378 | LEC4c Notes | |
5 | 5a. Actor-Critic Method | Scott Moura | | | |
 | 5b. Case Study: Offshore Wind | Scott Moura | | | |
6 | 6a. Q-Learning | Saehong Park | | | |
 | 6b. Q-Learning / Policy Gradient | Saehong Park | | | |
7 | 7a. Policy Gradient / Actor-Critic | Saehong Park | | | |
 | 7b. Actor-Critic | Saehong Park | | | |
8 | 8a. RL for Energy Systems | Saehong Park | | | |
 | 8b. Case Study: Battery Fast-charging | Saehong Park | | | |
- Optimal Control
- Dynamic Programming
  - Principle of Optimality & Value Functions
  - Case Study: Linear Quadratic Regulator (LQR)
- Policy Evaluation & Policy Improvement
  - Policy Iteration Algorithm & Variants
  - Case Study: LQR
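For a taste of how dynamic programming plays out in the LQR case study, here is a minimal Python sketch of the backward Riccati recursion. The double-integrator-style system matrices below are assumed for illustration, not taken from the lecture notes.

```python
import numpy as np

# Finite-horizon LQR solved by backward dynamic programming (Riccati
# recursion). System matrices are an assumed toy example.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # discretized double integrator
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                             # state cost weight
R = np.array([[0.1]])                     # input cost weight
N = 50                                    # horizon length

P = Q.copy()                              # terminal value function x'Px
gains = []
for _ in range(N):
    # K = (R + B'PB)^{-1} B'PA, the optimal feedback gain u = -K x
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Riccati backward update of the quadratic value function
    P = Q + A.T @ P @ A - A.T @ P @ B @ K
    gains.append(K)

print(gains[-1])   # gain is near-stationary after many backward steps
```

For long horizons the gain converges to the stationary (infinite-horizon) LQR gain, which stabilizes the closed loop.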
- Approximate Dynamic Programming (ADP)
  - Temporal Difference (TD) Error
  - Value Function Approximation
  - Case Study: LQR
  - Online RL with ADP
  - Actor-Critic Method
  - Case Study: Offshore Wind
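The TD error and value-function approximation topics above can be previewed with a short sketch: TD(0) with linear function approximation on a 5-state random walk (a standard illustrative environment, assumed here rather than taken from the course notes; one-hot features make it equivalent to tabular TD).

```python
import numpy as np

rng = np.random.default_rng(0)

# TD(0) policy evaluation with linear value-function approximation.
n_states = 5                      # states 0..4; terminate off either end
gamma, alpha = 1.0, 0.05
w = np.zeros(n_states)            # weights: V(s) = w @ phi(s)

def phi(s):
    f = np.zeros(n_states)        # one-hot feature vector
    f[s] = 1.0
    return f

for episode in range(10000):
    s = 2                         # start in the middle
    while True:
        s_next = s + rng.choice([-1, 1])
        if s_next == n_states:    # exit right: reward +1, terminal
            target = 1.0
        elif s_next == -1:        # exit left: reward 0, terminal
            target = 0.0
        else:
            target = gamma * (w @ phi(s_next))
        td_error = target - w @ phi(s)       # the TD error delta
        w += alpha * td_error * phi(s)       # semi-gradient TD(0) update
        if s_next in (-1, n_states):
            break
        s = s_next

print(w)   # approaches the true values [1/6, 2/6, 3/6, 4/6, 5/6]
```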
- Q-Learning
  - Q-learning algorithm
  - Advanced Q-learning algorithm, i.e., DQN
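As a preview of the Q-learning unit, here is a minimal tabular Q-learning sketch on a small deterministic chain MDP (an assumed toy environment, not from the course materials).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular Q-learning with epsilon-greedy exploration. States 0..4;
# action 0 moves left, action 1 moves right; reaching state 4 gives
# reward 1 and ends the episode.
n_states, n_actions = 5, 2
gamma, alpha, eps = 0.9, 0.5, 0.2
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    done = s2 == n_states - 1
    return s2, (1.0 if done else 0.0), done

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # off-policy target: max over next-state actions
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.argmax(Q[:4], axis=1))   # greedy policy: move right everywhere
```

DQN (covered in lecture) replaces the table `Q` with a neural network plus a replay buffer and target network, but the update target is the same.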
- Policy Gradient
  - Vanilla policy gradient (REINFORCE)
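The vanilla policy gradient (REINFORCE) idea can be sketched in its simplest setting: a two-armed bandit with a softmax policy (an assumed minimal example with one-step episodes and no state).

```python
import numpy as np

rng = np.random.default_rng(0)

# REINFORCE on a two-armed bandit: ascend E[G] via the score function.
theta = np.zeros(2)               # one preference per arm
arm_means = np.array([0.2, 0.8])  # assumed true mean rewards
alpha = 0.1

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

for t in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = rng.normal(arm_means[a], 0.1)     # sampled return G
    # grad log pi(a) for a softmax policy: one_hot(a) - p
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi      # REINFORCE update
print(softmax(theta))   # probability mass concentrates on the better arm
```

In the episodic case covered in lecture, `r` becomes the return of a whole trajectory, usually with a baseline subtracted to reduce variance.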
- Actor-Critic using Policy Gradient
  - Advanced Actor-Critic algorithm, i.e., DDPG
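Combining the two previous ideas, a one-step actor-critic can be sketched on a small chain MDP (again an assumed toy problem): a tabular softmax actor updated with the TD error from a tabular critic.

```python
import numpy as np

rng = np.random.default_rng(0)

# One-step actor-critic: the critic's TD error serves as a one-sample
# advantage estimate for the policy-gradient actor update.
n_states, n_actions = 5, 2        # states 0..4; reward 1 at state 4
gamma, alpha_v, alpha_pi = 0.95, 0.1, 0.1
V = np.zeros(n_states)                    # critic
theta = np.zeros((n_states, n_actions))   # actor preferences per state

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

for episode in range(3000):
    s = 0
    for _ in range(50):                   # cap episode length
        p = softmax(theta[s])
        a = rng.choice(n_actions, p=p)
        s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        done = s2 == n_states - 1
        r = 1.0 if done else 0.0
        delta = r + (0.0 if done else gamma * V[s2]) - V[s]   # TD error
        V[s] += alpha_v * delta                     # critic update
        grad = -p                                   # grad log pi(a|s)
        grad[a] += 1.0
        theta[s] += alpha_pi * delta * grad         # actor update
        if done:
            break
        s = s2

print([int(np.argmax(theta[s])) for s in range(4)])   # learned: move right
```

DDPG (covered in lecture) extends this pattern to continuous actions with a deterministic actor, a Q-function critic, replay, and target networks.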
- RL for Energy Systems
  - Case Study: Battery Fast-charging
- 2020 Lecture Notes [Updated 2020-7-08]
- 2019 Lecture Notes