This is my own DQN implementation, built with OpenAI gym and keras.
The project has been officially abandoned. However, I have built another, more powerful project that contains both DQN and episodic control; please have a look at Model-Free-Episodic-Control.
is the main script
stores DQN agent
stores training parameters
contains image preprocessing functions
shows that merging only two frames (odd and even) is not enough to eliminate flickering.
is a script that shows the difference between the action sets of OpenAI gym and the original DQN paper; it is unrelated to the main functionality.
is a script that demonstrates the indexing problem in the network.
##Exploration and Discoveries
###Building dqn network
I build two models, one for Q and one for Q hat (the target network). Q hat is set to be untrainable. In addition, I apply disconnected_grad to Q hat like has done, although I think that is unnecessary.
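One way to see why Q hat needs no gradient: it only supplies the regression targets, which are treated as constants while Q is trained. Below is a minimal NumPy sketch of the target computation from the Nature DQN paper. The function name `td_targets`, the argument names, and the `gamma` value are illustrative assumptions, not taken from this repository.

```python
import numpy as np

def td_targets(rewards, q_hat_next, terminal, gamma=0.99):
    """Compute y = r + gamma * max_a Q_hat(s', a), with y = r on terminal steps.

    rewards:    shape (batch_size,)
    q_hat_next: shape (batch_size, num_actions), outputs of the frozen Q hat
    terminal:   shape (batch_size,), 1.0 where the episode ended, else 0.0
    """
    max_next = q_hat_next.max(axis=1)
    return rewards + gamma * max_next * (1.0 - terminal)

rewards = np.array([1.0, 0.0])
q_hat_next = np.array([[0.5, 2.0], [3.0, 1.0]])
terminal = np.array([0.0, 1.0])
targets = td_targets(rewards, q_hat_next, terminal, gamma=0.9)
```

Because the targets are plain numbers by the time they reach the loss, no gradient flows into Q hat regardless of whether it is additionally marked as untrainable.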
The indexing problem in the network is interesting. Given Q_S, a matrix of shape batch_size * num_action, and A, a matrix of shape batch_size * 1, we want Q_S[i, A[i]] for every row i.
In numpy we can do:
```python
batch_size = Q_S.shape[0]
# Select Q_S[i, A[i]] for every row i
Q_S[range(batch_size), A.reshape(batch_size)]
```
But in theano, Q_S.shape[0] is a symbolic variable, so range(Q_S.shape[0]) will raise an error.
There are two solutions:
- use theano.scan
- use a mask to do indexing like I did
Details are in
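The mask idea can be sketched in NumPy as follows. This is only an illustration of the technique, not the Theano code used in this project; the helper name `select_q_values` is hypothetical. A one-hot mask is built by comparing the action indices against a row of column indices, and the selected value is recovered by multiplying and summing along the action axis, so no integer indexing with a Python range is needed.

```python
import numpy as np

def select_q_values(q_s, actions):
    """Return q_s[i, actions[i]] for each row i using a one-hot mask."""
    num_actions = q_s.shape[1]
    # mask[i, j] is 1 where j == actions[i] and 0 elsewhere
    columns = np.arange(num_actions).reshape(1, -1)
    mask = (columns == actions.reshape(-1, 1)).astype(q_s.dtype)
    # Exactly one entry per row survives the multiplication
    return (q_s * mask).sum(axis=1)

q_s = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
a = np.array([[2], [0]])
selected = select_q_values(q_s, a)
```

The same comparison, multiplication, and sum operations all have symbolic counterparts, which is why this form avoids the range problem.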
described, I do not think adding two frames is enough to solve the flickering problem.
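For reference, the Nature DQN paper merges frames by taking the pixel-wise maximum over two consecutive frames. The sketch below shows that operation in NumPy. The function `merge_frames` is a hypothetical helper, and accepting an arbitrary number of frames is an assumption made here so that more than two can be tried; it is not necessarily how this project combines frames.

```python
import numpy as np

def merge_frames(frames):
    """Pixel-wise maximum over a sequence of equally shaped frames."""
    return np.max(np.stack(frames, axis=0), axis=0)

# A sprite drawn only on the odd frame and another only on the even frame
odd = np.array([[0, 255], [0, 0]], dtype=np.uint8)
even = np.array([[0, 0], [255, 0]], dtype=np.uint8)
merged = merge_frames([odd, even])
```

An object that is missing from both of the merged frames will still be missing from the result, which is consistent with the observation that two frames may not be enough.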
###Frame Skipping
After chatting with jietang and Greg Brockman, I found out that OpenAI gym already implements frame skipping in the _step() function in
Another finding is action difference described in
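To make the frame skipping idea concrete, here is a generic sketch: the same action is repeated for several emulator steps and the rewards are summed. This is not the gym source code. `CountingEnv` is a hypothetical stand-in environment and `skip_step` is an illustrative helper; the early exit on `done` is a design choice of this sketch.

```python
class CountingEnv:
    """Hypothetical environment: reward 1.0 per step, done after 10 steps."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, {}

def skip_step(env, action, num_frames):
    """Repeat `action` for up to `num_frames` steps and sum the rewards."""
    total_reward = 0.0
    obs, done, info = None, False, {}
    for _ in range(num_frames):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break  # stop early once the episode has ended
    return obs, total_reward, done, info

env = CountingEnv()
obs, reward, done, _ = skip_step(env, action=0, num_frames=4)
```

If the environment already skips frames internally, wrapping it like this again would skip frames twice, so only one of the two layers should do it.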
##Notes
Sometimes we need to know the number of lives remaining in atari games, so I sent a pull request to OpenAI gym: openai/gym#163
##Incomplete Functions:
- experience replay
- Prioritized Experience Replay
- double DQN
- dueling DQN
Playing Atari with Deep Reinforcement Learning, V. Mnih et al., NIPS Workshop, 2013.
Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.
Some repositories gave me a lot of inspiration. They are in