Berkeley DeepRL Course Homework
- With the given hyperparameters, I was able to get the results below (a sketch of the evaluation procedure follows the table).
Env | Expert reward, mean (std) | Behavior Cloning reward, mean (std) | DAgger reward, mean (std) |
---|---|---|---|
Ant-v1 | 4802.707680(83.771094) | 907.181835(1.765161) | 515.062574(2.715588) |
HalfCheetah-v1 | 4126.918521(75.359644) | 4138.678126(68.150322) | 4112.790801(61.154570) |
Hopper-v1 | 3777.821053(3.777258) | 3776.561768(3.774195) | 3783.089913(4.757560) |
Humanoid-v1 | 10429.852380(51.341089) | 367.905249(19.686467) | 313.611396(11.924769) |
Reacher-v1 | -3.894341(1.580284) | -13.215903(3.970900) | -13.954921(4.214502) |
Walker2d-v1 | 5523.786277(50.682188) | 4305.271517(1845.930949) | 5516.414445(51.565740) |
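
The mean (std) values above are presumably the mean and standard deviation of total return over a number of evaluation rollouts. A minimal sketch of such an evaluation, assuming the classic gym API (where `reset()` returns just the observation) and an `act_fn(obs)` callable; the names are illustrative, not the homework's actual code:

```python
import numpy as np

def evaluate_policy(env, act_fn, n_rollouts=20, max_steps=1000):
    """Roll out a policy and report mean and std of the total return."""
    returns = []
    for _ in range(n_rollouts):
        obs, total = env.reset(), 0.0
        for _ in range(max_steps):
            obs, reward, done, _ = env.step(act_fn(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return np.mean(returns), np.std(returns)
```
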
Env | Behavior Cloning | DAgger |
---|---|---|
HalfCheetah-v1 | | |
Hopper-v1 | | |
Walker2d-v1 | | |
- With the fixed hyperparameters, HalfCheetah, Hopper, and Walker2d were trainable; the other environments were not.
- In all three successful cases, DAgger matches or beats behavior cloning; the gap is clearest on Walker2d, where DAgger reaches near-expert reward with much lower variance (a sketch of the DAgger loop follows).
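
A minimal sketch of the DAgger loop behind these numbers, assuming an `expert_policy(obs)` callable (such as the provided expert) and a small Keras MLP trained with an MSE behavior-cloning loss; the function names and hyperparameters are illustrative, not the homework's actual code:

```python
import numpy as np
import tensorflow as tf

def collect_rollouts(env, act_fn, n_rollouts, max_steps=1000):
    """Return lists of visited observations and the actions taken at them."""
    obs_list, act_list = [], []
    for _ in range(n_rollouts):
        obs = env.reset()
        for _ in range(max_steps):
            act = act_fn(obs)
            obs_list.append(obs)
            act_list.append(act)
            obs, _, done, _ = env.step(act)
            if done:
                break
    return obs_list, act_list

def dagger(env, expert_policy, n_iters=10, rollouts_per_iter=20, epochs=5):
    """DAgger: roll out the learner, relabel the visited states with the
    expert, aggregate the dataset, and retrain by behavior cloning."""
    obs_dim = env.observation_space.shape[0]
    act_dim = env.action_space.shape[0]

    policy = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="tanh", input_shape=(obs_dim,)),
        tf.keras.layers.Dense(64, activation="tanh"),
        tf.keras.layers.Dense(act_dim),
    ])
    policy.compile(optimizer="adam", loss="mse")

    def learner_act(obs):
        return policy(obs[None].astype(np.float32)).numpy()[0]

    # Iteration 0: expert rollouts only, i.e. plain behavior cloning.
    data_obs, data_act = collect_rollouts(env, expert_policy, rollouts_per_iter)

    for _ in range(n_iters):
        policy.fit(np.asarray(data_obs), np.asarray(data_act),
                   epochs=epochs, verbose=0)
        # Roll out the *learner*, but label its states with expert actions.
        new_obs, _ = collect_rollouts(env, learner_act, rollouts_per_iter)
        data_obs += new_obs
        data_act += [expert_policy(o) for o in new_obs]
    return policy
```
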
- I was able to train a policy with the policy gradient method, using the given default hyperparameters and a linear value function approximator.
Trained results:
- Changing the value function approximator from a linear model to a neural network does not provide any benefit in training.
- CartPole
- Pendulum
- At the beginning, it fails to predict the value (negative explained variance, i.e. worse than predicting a constant); it could do better if we added some "annealing" steps (see the explained-variance sketch after this list).
- Or, it might require a more sophisticated hyperparameter search.
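
A minimal sketch of the linear value function fit and the explained-variance diagnostic mentioned above, assuming batches of observations and Monte Carlo returns have already been collected; the names and the usage comments are illustrative, not the homework's actual code:

```python
import numpy as np

def fit_linear_baseline(observations, returns):
    """Least-squares fit of V(s) ~= w . [s, 1]: the linear value
    function approximator referred to above."""
    X = np.concatenate([observations, np.ones((len(observations), 1))], axis=1)
    w, *_ = np.linalg.lstsq(X, returns, rcond=None)
    def predict(obs):
        Xq = np.concatenate([obs, np.ones((len(obs), 1))], axis=1)
        return Xq @ w
    return predict

def explained_variance(predictions, returns):
    """1 - Var[returns - predictions] / Var[returns]; a negative value means
    the baseline is worse than predicting a constant, the failure mode seen
    in the first iterations on Pendulum."""
    var_ret = np.var(returns)
    return float("nan") if var_ret == 0 else 1.0 - np.var(returns - predictions) / var_ret

# Inside one policy-gradient iteration (illustrative usage):
#   baseline = fit_linear_baseline(obs_batch, return_batch)
#   advantages = return_batch - baseline(obs_batch)
#   ev = explained_variance(baseline(obs_batch), return_batch)
```
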
- HW 1; Imitation Learning, DAgger
- HW 2
- HW 3
- HW 4; Simple Policy Gradient