Deep-Q-Learning

Implementation of Deep Q-Learning and Double Deep Q-Learning algorithms for the Highway-env Gym environment.
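
The two algorithms differ only in how the bootstrap target is computed. The sketch below illustrates that difference for a single transition in plain Python. It is not taken from this repository: the function names, the argument layout, and the default `gamma` are assumptions made for illustration.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the next action."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Index of the action the online network rates highest in the next state.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the plain max operator. In a batched implementation the same logic would typically be written with tensor operations rather than Python lists.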

A convolutional network approximates the Q-function. Frame stacking is supported, with an LSTM layer to capture temporal information across frames. In both algorithms, the model is trained using a replay buffer and a target network to stabilize learning.
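
As a rough illustration of the replay buffer idea, here is a minimal sketch using only the standard library. The class name `ReplayBuffer`, its methods, and the transition layout are hypothetical and may not match the code in this repository.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A training loop would typically push one transition per environment step, sample a mini-batch once the buffer holds enough data, and copy the online network's weights into the target network every fixed number of steps.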