Initial commit GPU support

cliff-bohm · Apr 20, 2018 · d99f92d
1 parent 59533d4
commit d99f92d

Showing 40 changed files with 4,554 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@ env/
 *.p
 mujoco/mjpro131
 *.h5
+gym_tensorflow.so
diff --git a/gpu_implementation/README.md b/gpu_implementation/README.md
@@ -0,0 +1,53 @@
+## AI Labs - GPU Neuroevolution
+This folder contains preliminary work done to implement GPU-based deep neuroevolution.
+For problems like Atari, where policy evaluation takes a considerable amount of time, it is advantageous to use GPUs to evaluate the neural networks. This code shows how to run Atari simulations in parallel so that neural networks are evaluated in batches on the GPU while the CPU and GPU operate at the same time.
+
+The code in this folder is at the prototype stage and still requires many changes to improve performance, maintainability, and testing. We welcome pull requests to this repo and plan to improve it in the future. Although it can run on CPU only, it is then slower than our original implementation due to overhead. Once this implementation has matured, we plan to distribute it as a package for easy installation. We included an implementation of the HardMaze, but the GA-NS implementation will be added later on.
+
+## Installation
+
+Clone the repo:
+
+```
+git clone https://github.com/uber-common/deep-neuroevolution.git
+```
+
+Create a Python 3 virtual environment:
+
+```
+python3 -m venv env
+. env/bin/activate
+```
+
+Install tensorflow or tensorflow-gpu (version > 1.2):
+```
+pip install tensorflow-gpu
+```
+Follow instructions under ./gym_tensorflow/README on how to compile the optimized interfaces.
+
+To train GA on Atari just run:
+```
+python ga.py ga_atari_config.json
+```
+Random search (a special case of GA in which 0 individuals become parents):
+```
+python ga.py rs_atari_config.json
+```
+
+Evolution Strategies:
+```
+python es.py es_atari_config.json
+```
+
+Visualizing policies is possible if you install gym with `pip install gym` and run:
+```
+python -m neuroevolution.display
+```
+We currently have one example policy but more will be added in the future.
+
+## Breakdown
+
+* gym_tensorflow - Folder containing TensorFlow custom ops for Reinforcement Learning (Atari, Hard Maze).
+  * Moving away from Python-based environments yields significant speedups in a multithreaded setting.
+* neuroevolution - Folder containing source code to evaluate many policies simultaneously.
+  * concurrent_worker.py - Improved implementation where each thread can evaluate a dynamically sized batch of policies at a time. Needs custom TensorFlow ops.
diff --git a/gpu_implementation/configurations/es_atari_config.json b/gpu_implementation/configurations/es_atari_config.json
@@ -0,0 +1,18 @@
+{
+    "game": "frostbite",
+    "model": "ModelVirtualBN",
+    "num_validation_episodes": 30,
+    "num_test_episodes": 200,
+    "population_size": 5000,
+    "timesteps": 250e6,
+    "episode_cutoff_mode": 5000,
+    "return_proc_mode": "centered_rank",
+    "l2coeff": 0.005,
+    "mutation_power": 0.02,
+    "optimizer": {
+        "args": {
+            "stepsize": 0.01
+        },
+        "type": "adam"
+    }
+}
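Note on `"return_proc_mode": "centered_rank"` in the ES configuration above: `es.py` is not part of this excerpt, so the following is only a minimal sketch of what a centered-rank transformation commonly looks like in evolution strategies code. It assumes the key selects the usual scheme in which raw episode returns are replaced by their ranks, rescaled to the range [-0.5, 0.5]. The function name `centered_ranks` is hypothetical and ties are broken by position.

```python
def centered_ranks(returns):
    """Map raw returns to ranks rescaled to [-0.5, 0.5].

    Hypothetical helper for illustration; not taken from es.py.
    """
    n = len(returns)
    if n < 2:
        # A single return (or none) carries no ranking information.
        return [0.0] * n
    # Indices of the returns, ordered from lowest to highest return.
    order = sorted(range(n), key=lambda i: returns[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    # The lowest return maps to -0.5 and the highest to +0.5.
    return [r / (n - 1) - 0.5 for r in ranks]


if __name__ == "__main__":
    print(centered_ranks([10.0, 30.0, 20.0]))
```

Because only the ordering of the returns matters after this transformation, a single very large score cannot dominate the update, which is the usual motivation for rank-based shaping.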
diff --git a/gpu_implementation/configurations/ga_atari_config.json b/gpu_implementation/configurations/ga_atari_config.json
@@ -0,0 +1,12 @@
+{
+    "game": "frostbite",
+    "model": "Model",
+    "num_validation_episodes": 30,
+    "num_test_episodes": 200,
+    "population_size": 1000,
+    "episode_cutoff_mode": 5000,
+    "timesteps": 1.5e9,
+    "validation_threshold": 10,
+    "mutation_power": 0.002,
+    "selection_threshold": 20
+}
diff --git a/gpu_implementation/configurations/rs_atari_config.json b/gpu_implementation/configurations/rs_atari_config.json
@@ -0,0 +1,12 @@
+{
+    "game": "frostbite",
+    "model": "Model",
+    "num_validation_episodes": 30,
+    "num_test_episodes": 200,
+    "population_size": 1000,
+    "episode_cutoff_mode": 5000,
+    "timesteps": 1.5e9,
+    "validation_threshold": 10,
+    "mutation_power": 0.002,
+    "selection_threshold": 0
+}
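Note on the GA and random-search configurations above: the two files differ only in `selection_threshold` (20 versus 0), which matches the README's description of random search as a GA in which 0 individuals become parents. The sketch below checks that difference and illustrates it with a simple top-k (truncation) selection. The helper names `config_diff` and `select_parents` are hypothetical, and the selection rule is an assumption about how `ga.py` might use the key, since `ga.py` is not shown in this excerpt.

```python
import json

# Values copied from ga_atari_config.json in this commit.
GA_CONFIG = json.loads("""
{
    "game": "frostbite",
    "model": "Model",
    "num_validation_episodes": 30,
    "num_test_episodes": 200,
    "population_size": 1000,
    "episode_cutoff_mode": 5000,
    "timesteps": 1.5e9,
    "validation_threshold": 10,
    "mutation_power": 0.002,
    "selection_threshold": 20
}
""")

# rs_atari_config.json is identical except for selection_threshold.
RS_CONFIG = dict(GA_CONFIG, selection_threshold=0)


def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def select_parents(scores, selection_threshold):
    """Hypothetical truncation selection: indices of the top-k scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:selection_threshold]


if __name__ == "__main__":
    print(config_diff(GA_CONFIG, RS_CONFIG))
    print(select_parents([3.0, 9.0, 1.0, 7.0], 2))
    print(select_parents([3.0, 9.0, 1.0, 7.0], 0))
```

With a threshold of 0 the parent list is empty, so under this assumption no individual is carried forward or mutated and each generation would have to be sampled afresh.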