Are there any baselines/benchmarks for continuous action versions of environments? #128

Chulabhaya opened this issue Jan 6, 2025 · 6 comments

@Chulabhaya (Contributor)

Hi all! I noticed that some of the environments like SMAX and MPE offer continuous action versions of the environments. I put together a continuous action MAPPO based on the existing discrete action MAPPO, but I have no way to tell whether it's achieving expected performance or not. Do you guys happen to have any internal plots of curves showing learning performance on continuous action versions of SMAX and MPE? Thanks in advance!

@amacrutherford (Collaborator)

Hey! How does your performance compare to the discrete implementations? I ran these a while ago so I don't have the plots to hand.

@Chulabhaya (Contributor, Author)

I've only tried two environments so far: on MPE Simple Spread it gets pretty close, but on SMAX 2s3z it doesn't learn much at all. However, I realized I had a bug in my implementation: I forgot to ensure that actions sampled from the Gaussian fall within the [0, 1] action range of MPE and SMAX. How did you handle that in your original testing? Clipping the sampled actions to min 0, max 1, using a squashed tanh, etc.?
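
For concreteness, here's a minimal sketch of the two options I'm weighing, assuming a diagonal Gaussian whose `mean` and `log_std` come from the actor network (the helper names are mine, not from the codebase):

```python
import jax
import jax.numpy as jnp

def gaussian_log_prob(raw, mean, log_std):
    # Diagonal-Gaussian log-density, summed over action dims.
    std = jnp.exp(log_std)
    return jnp.sum(
        -0.5 * ((raw - mean) / std) ** 2 - log_std - 0.5 * jnp.log(2 * jnp.pi),
        axis=-1,
    )

def sample_clipped(key, mean, log_std):
    # Option 1: sample, then clip into [0, 1]. The log-prob is taken on the
    # unclipped sample, so the gradient is slightly biased at the boundary,
    # but this is the simplest fix.
    raw = mean + jnp.exp(log_std) * jax.random.normal(key, mean.shape)
    return jnp.clip(raw, 0.0, 1.0), gaussian_log_prob(raw, mean, log_std)

def sample_tanh_squashed(key, mean, log_std):
    # Option 2: squash with tanh and rescale [-1, 1] -> [0, 1], applying the
    # change-of-variables correction (SAC-style squashed Gaussian).
    raw = mean + jnp.exp(log_std) * jax.random.normal(key, mean.shape)
    squashed = jnp.tanh(raw)
    action = 0.5 * (squashed + 1.0)  # now in [0, 1]
    # |d action / d raw| = 0.5 * (1 - tanh(raw)^2)
    log_prob = gaussian_log_prob(raw, mean, log_std) - jnp.sum(
        jnp.log(0.5 * (1.0 - squashed**2) + 1e-6), axis=-1
    )
    return action, log_prob
```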

@Chulabhaya (Contributor, Author)

So with clipping implemented for actions, I'm seeing a ~20-25% win rate on SMAX 2s3z after 10 million timesteps. This is obviously significantly worse than the discrete version, which completely solves that environment within ~2-3 million timesteps. However, the continuous action space does also make the problem harder; do these results seem on par, or does something seem off?

Here's my implementation of continuous MAPPO: https://pastebin.com/VP0yKY9W
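
The main change from the discrete version is the actor's output head: instead of a Categorical over discrete actions, the network emits the parameters of a diagonal Gaussian. Schematically it looks like this (class and parameter names are illustrative, not the exact ones in the pastebin):

```python
import distrax
import flax.linen as nn
import jax.numpy as jnp

class ContinuousActorHead(nn.Module):
    # Illustrative Gaussian head standing in for the discrete Categorical
    # head; action_dim is the env's continuous action dimension.
    action_dim: int

    @nn.compact
    def __call__(self, embedding):
        mean = nn.Dense(
            self.action_dim, kernel_init=nn.initializers.orthogonal(0.01)
        )(embedding)
        # State-independent log-std, as in most PPO implementations.
        log_std = self.param("log_std", nn.initializers.zeros, (self.action_dim,))
        return distrax.MultivariateNormalDiag(
            loc=mean, scale_diag=jnp.exp(log_std)
        )
```

Samples from this distribution then get clipped into [0, 1] as discussed above.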

I'm using the following params to get to that ~20-25%; these were initially copied from the discrete MAPPO SMAX config:

"LR": 0.0002
"NUM_ENVS": 128
"NUM_STEPS": 128 
"TOTAL_TIMESTEPS": 1e7
"FC_DIM_SIZE": 128
"GRU_HIDDEN_DIM": 128
"UPDATE_EPOCHS": 4
"NUM_MINIBATCHES": 4
"GAMMA": 0.99
"GAE_LAMBDA": 0.95
"CLIP_EPS": 0.2
"SCALE_CLIP_EPS": False
"ENT_COEF": 0.0
"VF_COEF": 0.5
"MAX_GRAD_NORM": 0.25
"ACTIVATION": "relu"
"OBS_WITH_AGENT_ID": True
"ENV_NAME": "HeuristicEnemySMAX"
"MAP_NAME": "2s3z"
"SEED": 0
"ENV_KWARGS": 
  "see_enemy_actions": True
  "walls_cause_death": True
  "attack_mode": "closest"
  "action_type": "continuous"
"ANNEAL_LR": False

@amacrutherford (Collaborator)

So you can actually look here for a continuous action implementation of IPPO: https://github.com/amacrutherford/sampling-for-learnability/blob/main/sfl/train/jaxnav_sfl.py

MPE should be pretty much the same between the two implementations. Those SMAX results seem on par with what we found :)

@Chulabhaya (Contributor, Author)

Awesome, sounds great! And thank you for the IPPO implementation; I will definitely take a look at that. Would you be interested in a PR adding continuous action MAPPO for review?

amacrutherford self-assigned this on Jan 20, 2025
@amacrutherford (Collaborator)

Yes please, that would be fab! Please include a training curve in the PR contrasting the performance of the continuous action implementation with the discrete one for MPE :)
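
Something along these lines would do; the file names and the logged metric are placeholders for whatever your training script actually saves:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder arrays: one mean-return value per PPO update, saved however
# your training script logs metrics.
discrete = np.load("mpe_discrete_returns.npy")
continuous = np.load("mpe_continuous_returns.npy")

steps_per_update = 128 * 128  # NUM_ENVS * NUM_STEPS from the config above
plt.plot(np.arange(len(discrete)) * steps_per_update, discrete,
         label="discrete MAPPO")
plt.plot(np.arange(len(continuous)) * steps_per_update, continuous,
         label="continuous MAPPO")
plt.xlabel("environment steps")
plt.ylabel("mean episodic return")
plt.title("MPE Simple Spread: discrete vs. continuous MAPPO")
plt.legend()
plt.savefig("mpe_discrete_vs_continuous.png", dpi=150)
```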
