1. Environment (env)
RoboHive tasks are formulated as MDP environments, which are exposed via the OpenAI Gym API. In addition to being feature complete as gym environments, RoboHive environments support features that make the RoboHive ecosystem unique for simulation as well as real-world robotics. Below is a summary of RoboHive features:
- Gym API
- Large diversity of tasks
- Physically realistic task and scene simulation.
- Visually expressive tasks
- Support for common robotic hardware
- Simulation/hardware agnostic environment/task definition
- Sim2real support
- Vectorized/batched envs to support MBRL
- Support for visual observations
- Support for foundation models for visual latent representations.
- Dense and sparse rewards
- Success criteria for evaluations
Next, we outline the basic functionalities of RoboHive envs:
- Environment Registration: RoboHive environments are pure gym environments. They follow the native gym registration API and can be registered like any gym environment.
```python
from gym.envs.registration import register

# Hand Manipulation Suite: Open door
from mj_envs.envs.hand_manipulation_suite.door_v1 import DoorEnvV1

register(
    id='DemoDoor-v1',
    entry_point='mj_envs.envs.hand_manipulation_suite:DoorEnvV1',
    max_episode_steps=100,
    kwargs={
        'model_path': 'mj_envs/envs/hand_manipulation_suite/assets/DAPG_door.xml',
    }
)
```
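Once registered, the environment can be instantiated like any other gym environment, and constructor arguments can be passed through `gym.make`. Variants can be registered the same way by calling `register()` again with a new `id` and modified `kwargs`. A minimal sketch, assuming the `DemoDoor-v1` registration above:

```python
import gym

# Instantiate the registered environment
env = gym.make('DemoDoor-v1')

# Keyword arguments passed to gym.make are forwarded to the environment constructor,
# overriding the defaults supplied at registration time (e.g. an alternate model_path).
env_alt = gym.make('DemoDoor-v1',
                   model_path='mj_envs/envs/hand_manipulation_suite/assets/DAPG_door.xml')
```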
- Passing arguments to the environment
- Registering variants of an environment
- Forwarding an environment
- Stepping an environment ahead in time (see the rollout sketch after this list)
- Env details
  - Action space - pos, delta pos
  - Observations + batched
  - Rewards + batched
  - Env_infos
  - Done
  - Env close
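A minimal rollout sketch covering reset, stepping, the env_infos dictionary, the done flag, and closing the env (assuming the `DemoDoor-v1` environment registered above and the classic 4-tuple gym step API):

```python
import gym

env = gym.make('DemoDoor-v1')

# Reset returns the initial observation vector
obs = env.reset()

done = False
while not done:
    # Sample a random action from the action space (pos / delta-pos commands)
    action = env.action_space.sample()

    # Advance the environment by one timestep
    obs, reward, done, env_info = env.step(action)

# Release simulator/renderer resources
env.close()
```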
Next, we outline the fundamental design principles behind RoboHive envs.
- Simulation/hardware agnostic - robot class
- Partial observability – sim / sim_obsd
- Batched rewards, observations
- Env interpretability (obs / rwd dict)
For human interpretability, both observations (`obs_dict`) as well as rewards (`rwd_dict`) are maintained as exhaustive dictionaries that keep track of the individual terms. Note that these dictionaries are over-complete representations of any/all features that can be of interest. In order to construct the final observation vector, `env.obs_keys` is used. Similarly, `env.weighted_reward_keys` is used to construct the final reward returned by the environment.
- Dict vs vector observations
```python
# Get the observation vector at the current timestep t, i.e. obs_vector(t)
# obs_vector is recovered via env.obs_keys as obs_vector = env.obs_dict[env.obs_keys]
obs_vec_t = env.get_obs()

# Advance the env from t -> t+dt and return new information
obs_vector_tdt, rwd_tdt, done_tdt, info_tdt = env.step(action_t)

# Access the entire dictionaries
print(info_tdt['obs_dict'].keys(), info_tdt['rwd_dict'].keys())
```
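The final scalar reward can be inspected in the same way. A minimal sketch building on the snippet above, under the assumption that the returned reward is the weighted sum of the `rwd_dict` terms selected by `env.weighted_reward_keys`:

```python
# Inspect how each reward term contributes to the returned scalar reward:
# rwd ≈ sum(weight * rwd_dict[key] for key, weight in env.weighted_reward_keys.items())
for key, weight in env.weighted_reward_keys.items():
    print(f"{key}: weight={weight}, value={info_tdt['rwd_dict'][key]}")
```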
- Visual observations
- Visual representation features - R3M, RRL, VIP
- Onscreen
- Offscreen

NOTE: For information on how to add visual keys, see here.
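As a loose illustration of how a visual variant of a task might be registered: the sketch below is an assumption, not the documented API; the `visual_keys` kwarg, the `view_1` camera name, and the key-string format are hypothetical placeholders, and whether a given task accepts such a kwarg depends on the environment (see the linked page above for the supported format).

```python
from gym.envs.registration import register

# Hypothetical visual variant of the door task; camera name and key format are placeholders
register(
    id='DemoDoorVisual-v1',
    entry_point='mj_envs.envs.hand_manipulation_suite:DoorEnvV1',
    max_episode_steps=100,
    kwargs={
        'model_path': 'mj_envs/envs/hand_manipulation_suite/assets/DAPG_door.xml',
        # visual observations rendered offscreen from the named camera (assumed kwarg)
        'visual_keys': ['rgb:view_1:224x224:2d'],
    }
)
```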