Reproduce paper results

The commands below launch training scripts locally in background, on a single seed. Alternatively, you may set the flag exps.noslurm=false to run them as slurm jobs. In the latter case, you may specify flags such as host.time=300 (5 hours) for custom sbatch parameters. To run multiple seeds, you may add the flag sweep.config=[tenseeds] or sweep.seed=[42,43,44,45,46]. Refer to exps-launcher for more information on our experiments-launcher syntax.

NO-DR

CartPole: python launch_exps.py script=train_udr config=[adaptive+,classicCartpoleEasy,short] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
SwingUpCartPole: python launch_exps.py script=train_udr config=[adaptive+,cartpoleEasy,short] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
Hopper: python launch_exps.py script=train_udr config=[adaptive+,hopper,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
Walker2d: python launch_exps.py script=train_udr config=[adaptive+,walker2d_v2,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
HalfCheetah: python launch_exps.py script=train_udr config=[adaptive+,halfcheetah,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
Swimmer: python launch_exps.py script=train_udr config=[adaptive+,swimmer_v2,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_NoDR" dr_percentage=0.0
PandaPush: python launch_exps.py script=train_udr config=[adaptive+,pandaPush_v11,medium,easyDR0,qacc04] seed=42 exps.noslurm=true wandb="offline"

Fixed-DR

CartPole: python launch_exps.py script=train_udr config=[adaptive+,classicCartpoleEasy,short] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
SwingUpCartPole: python launch_exps.py script=train_udr config=[adaptive+,cartpoleEasy,short] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
Hopper: python launch_exps.py script=train_udr config=[adaptive+,hopper,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
Walker2d: python launch_exps.py script=train_udr config=[adaptive+,walker2d_v2,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
HalfCheetah: python launch_exps.py script=train_udr config=[adaptive+,halfcheetah,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
Swimmer: python launch_exps.py script=train_udr config=[adaptive+,swimmer_v2,longx] seed=42 wandb=offline exps.noslurm=true exps.group_suffix="_FixedDR"
PandaPush: python launch_exps.py script=train_udr config=[adaptive+,pandaPush_v11,medium,space1,qacc04] seed=42 exps.noslurm=true wandb="offline"

DORAEMON

Inclined Plane (toy problem): python launch_exps.py script=train_doraemon config=[adaptive+,plane_v2,short,succRate50] sweep.kl_ub=[0.1,0.05,0.01,0.005,0.001] seed=42 wandb="offline" exps.noslurm=true
CartPole: python launch_exps.py script=train_doraemon config=[adaptive+,classicCartpoleEasy,short,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.001
SwingUpCartPole: python launch_exps.py script=train_doraemon config=[adaptive+,cartpoleEasy,short,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.1
Hopper: python launch_exps.py script=train_doraemon config=[adaptive+,hopper,longx,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.005
Walker2d: python launch_exps.py script=train_doraemon config=[adaptive+,walker2d_v2,longx,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.01
HalfCheetah: python launch_exps.py script=train_doraemon config=[adaptive+,halfcheetah,longx,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.005
Swimmer: python launch_exps.py script=train_doraemon config=[adaptive+,swimmer_v2,longx,succRate50] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.001
PandaPush: python launch_exps.py script=train_doraemon config=[adaptive+,pandaPush_v11,medium,succRate50,easyStart,qacc04] seed=42 exps.noslurm=true wandb="offline" kl_ub=0.1

AutoDR

CartPole: python launch_exps.py script=train_autodr config=[adaptive+,classicCartpoleEasy,short,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0166
SwingUpCartPole: python launch_exps.py script=train_autodr config=[adaptive+,cartpoleEasy,short,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0498
Hopper: python launch_exps.py script=train_autodr config=[adaptive+,hopper,longx,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0083
Walker2d: python launch_exps.py script=train_autodr config=[adaptive+,walker2d_v2,longx,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0166
HalfCheetah: python launch_exps.py script=train_autodr config=[adaptive+,halfcheetah,longx,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0083
Swimmer: python launch_exps.py script=train_autodr config=[adaptive+,swimmer_v2,longx,original] exps.noslurm=true seed=42 wandb="offline" delta=0.0083
PandaPush: python launch_exps.py script=train_autodr config=[adaptive+,pandaPush_v11,medium,original,easyStart,qacc04] seed=42 exps.noslurm=true wandb="offline" delta=0.0498

LSDR

CartPole: python launch_exps.py script=train_lsdr config=[adaptive+,classicCartpoleEasy,short,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=5
SwingUpCartPole: python launch_exps.py script=train_lsdr config=[adaptive+,cartpoleEasy,short,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=1
Hopper: python launch_exps.py script=train_lsdr config=[adaptive+,hopper,longx,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=5
Walker2d: python launch_exps.py script=train_lsdr config=[adaptive+,walker2d_v2,longx,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=10
HalfCheetah: python launch_exps.py script=train_lsdr config=[adaptive+,halfcheetah,longx,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=0.1
Swimmer: python launch_exps.py script=train_lsdr config=[adaptive+,swimmer_v2,longx,slowPace] exps.noslurm=true seed=42 wandb=offline alpha=0.1
PandaPush: python launch_exps.py script=train_lsdr config=[adaptive+,pandaPush_v11,medium,slowPace,easyStart,qacc04,baselinePandaReturn] seed=42 exps.noslurm=true wandb=offline alpha=0.0001 obj_fun_lr=0.001

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reproduce_paper_results.md

reproduce_paper_results.md

Reproduce paper results

NO-DR

Fixed-DR

DORAEMON

AutoDR

LSDR

Files

reproduce_paper_results.md

Latest commit

History

reproduce_paper_results.md

File metadata and controls

Reproduce paper results

NO-DR

Fixed-DR

DORAEMON

AutoDR

LSDR