Environment design
This page provides a detailed explanation of the environment design in the RL-ADN framework. The environment simulates a power network, and the agent's task is to manage this network by controlling the batteries attached to various nodes.
The `PowerNetEnv` class is the core environment class in the RL-ADN framework. It inherits from `gym.Env` and is designed to interface with reinforcement learning algorithms. Below are the key components and methods of the `PowerNetEnv` class:
- Attributes:
  - `voltage_limits`: Limits for the voltage in the network.
  - `algorithm`: Algorithm choice for power flow calculations (`Laurent` or `PandaPower`).
  - `battery_list`: List of nodes where batteries are attached.
  - `year`, `month`, `day`: Current date in the simulation.
  - `train`: Boolean indicating whether the environment is in training mode.
  - `state_pattern`: Pattern for the state representation.
  - `network_info`: Information about the network topology and parameters.
  - `action_space`: Action space of the environment.
  - `data_manager`: Instance of `GeneralPowerDataManager` for handling time-series data.
  - `episode_length`: Length of an episode in terms of time steps.
  - `state_space`: State space of the environment.
- Methods:
  - `__init__(self, env_config)`: Initializes the environment with the given configuration.
  - `reset(self)`: Resets the environment to its initial state and returns the initial observation.
  - `_reset_date(self)`: Resets the date for a new episode.
  - `_reset_time(self)`: Resets the time for a new episode.
  - `_reset_batteries(self)`: Resets the state of all batteries in the environment.
  - `_build_state(self)`: Builds and normalizes the current state of the environment.
  - `_get_obs(self)`: Gets the current observation from the environment based on the chosen algorithm.
  - `_apply_battery_actions(self, action)`: Applies the given actions to the batteries and updates the network state.
  - `step(self, action)`: Advances the environment by one time step based on the given action.
  - `_calculate_reward(self, current_obs, vm_pu_after_control_bat, saved_power)`: Calculates the reward based on the current state and action.
  - `render(self, current_obs, next_obs, reward, finish)`: Renders the current state of the environment.
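To make the control flow concrete, here is a minimal, hypothetical skeleton showing how these methods can compose inside a `gym.Env` subclass. The helper bodies are placeholders and `current_step` is an assumed internal counter; this is a sketch of the structure, not the actual RL-ADN implementation.

```python
import gym
import numpy as np

class PowerNetEnvSketch(gym.Env):
    """Hypothetical skeleton mirroring the method layout listed above;
    the real RL-ADN bodies are more involved."""

    def __init__(self, env_config):
        self.voltage_limits = env_config["voltage_limits"]
        self.battery_list = env_config["battery_list"]
        self.episode_length = env_config.get("episode_length", 96)
        self.current_step = 0  # assumed internal counter, not a listed attribute

    # Placeholder helpers; the real methods reset dates, times, and batteries
    # and build the state from the data manager.
    def _reset_date(self): pass
    def _reset_time(self): self.current_step = 0
    def _reset_batteries(self): pass
    def _build_state(self): return np.zeros(5)
    def _get_obs(self): return np.ones(len(self.battery_list))
    def _apply_battery_actions(self, action): return 0.0
    def _calculate_reward(self, obs, vm_pu, saved_power): return 0.0

    def reset(self):
        # A plausible composition of the private helpers from the list above.
        self._reset_date()
        self._reset_time()
        self._reset_batteries()
        return self._build_state()

    def step(self, action):
        saved_power = self._apply_battery_actions(action)  # updates the grid
        next_obs = self._build_state()
        vm_pu = self._get_obs()  # bus voltages from the power-flow backend
        reward = self._calculate_reward(next_obs, vm_pu, saved_power)
        self.current_step += 1
        done = self.current_step >= self.episode_length
        return next_obs, reward, done, {}
```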
- Environment Configuration: The environment is initialized with a configuration dictionary (`env_config`) containing parameters such as voltage limits, algorithm choice, battery list, date, and data paths (see the configuration sketch after this list).
- Network Setup: Based on the configuration, the environment sets up the network using either `GridTensor` (for the Laurent algorithm) or `PandaPower` (for the PandaPower algorithm).
- Reset: At the start of each episode, the environment resets the date, time, and battery states. It also builds the initial state of the environment by fetching data from the `GeneralPowerDataManager`.
- Step: For each time step within an episode (a rollout sketch follows this list):
  - The agent selects an action based on the current state.
  - The environment applies the action to the batteries and updates the network state accordingly.
  - The power flow is recalculated, and the new state is built.
  - The reward for the action is calculated based on the updated state.
  - The environment checks if the episode has ended and returns the next state, reward, and episode termination status.
- State Construction: The state includes variables such as net load power, state of charge (SOC) of the batteries, energy price, current time step, and voltage at battery nodes.
- Normalization: The state is normalized to ensure that all state variables are within a consistent range, facilitating stable learning by the agent (a normalization sketch follows this list).
- Reward Components: The reward is calculated based on the saved energy, adjusted by the current energy price, and penalized for any voltage deviations beyond the acceptable range (a reward sketch also follows this list).
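As a usage sketch, the snippet below builds an `env_config`, instantiates the environment, and runs a standard gym rollout loop. The configuration keys and the import path are illustrative assumptions inferred from the attribute list above; check the framework's examples for the exact names.

```python
import numpy as np
from rl_adn.environments import PowerNetEnv  # assumed import path

# Hypothetical configuration; key names are inferred from the attributes above.
env_config = {
    "voltage_limits": (0.95, 1.05),   # per-unit lower/upper bounds
    "algorithm": "Laurent",           # or "PandaPower"
    "battery_list": [11, 15, 26],     # nodes with batteries attached
    "year": 2020, "month": 1, "day": 1,
    "train": True,
    "state_pattern": "default",
    "network_info": "path/to/network.csv",
    "time_series_data_path": "path/to/data.csv",
}

env = PowerNetEnv(env_config)

# Standard gym rollout loop: reset, then step until the episode ends.
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # replace with a trained policy
    next_obs, reward, done, info = env.step(action)
    obs = next_obs
```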
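The normalization step can be illustrated with plain min-max scaling; this is a generic sketch with assumed bounds, not the framework's exact routine.

```python
import numpy as np

def normalize_state(raw_state, lower, upper):
    """Min-max normalization: maps each state variable into [0, 1].

    raw_state, lower, upper are arrays of the same shape; lower/upper are
    per-variable bounds (e.g. SOC in [0, 1], the voltage limits, price range).
    """
    raw_state = np.asarray(raw_state, dtype=float)
    return (raw_state - lower) / (upper - lower)

# Example with assumed bounds: [net load kW, SOC, price, time step, voltage p.u.]
raw = np.array([350.0, 0.6, 42.0, 48, 1.01])
low = np.array([0.0, 0.0, 0.0, 0, 0.95])
high = np.array([500.0, 1.0, 100.0, 96, 1.05])
print(normalize_state(raw, low, high))
```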
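Similarly, the reward components described above can be sketched as follows; the penalty weight and the exact shape of the voltage term are assumptions for illustration.

```python
import numpy as np

def calculate_reward(saved_power, energy_price, vm_pu,
                     v_min=0.95, v_max=1.05, penalty_weight=100.0):
    """Illustrative reward: energy saving valued at the current price,
    minus a penalty for voltage deviations outside the acceptable band."""
    saving_term = saved_power * energy_price
    # Per-bus violation beyond the acceptable voltage band, in per unit.
    violation = np.maximum(vm_pu - v_max, 0.0) + np.maximum(v_min - vm_pu, 0.0)
    return saving_term - penalty_weight * violation.sum()

# Example: 2 kWh saved at 0.3 currency/kWh, one bus slightly over the limit.
print(calculate_reward(2.0, 0.3, np.array([1.00, 1.06, 0.98])))
```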
The `PowerNetEnv` class is the backbone of the RL-ADN environment, managing the simulation of the power network, handling interactions with the DRL agent, and ensuring the integrity and efficiency of data handling. By understanding the class structure and data flow, users can effectively utilize and customize the environment for their specific research needs.
For detailed examples and further customization options, refer to the full documentation and example notebooks provided with the framework.