Interaction_Loop_Layer
The Interaction Loop Layer in the RL-ADN framework governs how the agent interacts with the environment during each time step of an episode. This loop is crucial for training reinforcement learning agents, as it determines the sequence of actions, observations, and rewards that the agent experiences.
For each time step ( t ) in an episode, the following sequence of operations occurs:
- State Acquisition:
  - The agent obtains the current state ( s_t ).
- Action Determination:
  - Based on the current state ( s_t ), the agent determines an action ( a_t ) to be executed in the environment.
- Action Execution:
  - Once ( a_t ) is received, the environment executes the `step` function to perform the power flow and update the status of the ESSs and the distribution network. This step accounts for the consequences of the action at the current time step ( t ).
- Reward Calculation:
  - Based on the resulting observations from the action execution, the reward ( r_t ) is calculated using the designed reward calculation block.
- Next State Sampling:
  - The Data Manager in the environment samples external time-series data for the next time step ( t+1 ), including demand, renewable energy generation, and price. This sampling emulates the stochastic fluctuations of the environment.
  - These external variables are combined with the updated internal observations to form the resulting transition of the environment, as illustrated in the sketch after this list.
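To make the loop concrete, the sketch below shows how these steps might be arranged inside an environment's `step` function. It is a minimal illustration only: the names `PowerNetEnvSketch`, `DataManagerSketch`, `build_state`, and `cal_reward`, the single-ESS state-of-charge update, and the stand-in power flow result are assumptions made for demonstration, not the actual RL-ADN implementation.

```python
import numpy as np


class DataManagerSketch:
    """Stand-in for the Data Manager: samples exogenous time-series data."""

    def __init__(self, episode_length: int = 96):
        self.episode_length = episode_length

    def sample(self, t: int) -> dict:
        # In the framework this would return measured demand, renewable
        # generation, and price for time step t; random values stand in here.
        return {"demand": np.random.rand(),
                "renewable": np.random.rand(),
                "price": np.random.rand()}


class PowerNetEnvSketch:
    """Illustrative environment showing the structure of one time step."""

    def __init__(self):
        self.data_manager = DataManagerSketch()
        self.t = 0
        self.soc = 0.5  # state of charge of one illustrative ESS

    def reset(self) -> np.ndarray:
        self.t, self.soc = 0, 0.5
        return self.build_state({"soc": self.soc, "voltage": 1.0},
                                self.data_manager.sample(self.t))

    def step(self, action: float):
        # 1) Action execution: update the ESS and run the power flow to obtain
        #    the network's response at time step t (a constant voltage stands
        #    in for a real power flow solution here).
        self.soc = float(np.clip(self.soc + 0.1 * action, 0.0, 1.0))
        network_obs = {"soc": self.soc, "voltage": 1.0}

        # 2) Reward calculation ("cal-reward" block): compute r_t from the
        #    resulting observations.
        reward = self.cal_reward(network_obs, action)

        # 3) Next-state sampling: the Data Manager provides demand, renewable
        #    generation, and price for t+1, emulating stochastic fluctuations.
        self.t += 1
        exogenous = self.data_manager.sample(self.t)

        # 4) State construction ("build-state" block): combine external
        #    variables with updated internal observations into s_{t+1}.
        next_state = self.build_state(network_obs, exogenous)

        done = self.t >= self.data_manager.episode_length
        return next_state, reward, done, {}

    def build_state(self, network_obs: dict, exogenous: dict) -> np.ndarray:
        # Illustrative default state: [SoC, demand, renewable, price].
        return np.array([network_obs["soc"], exogenous["demand"],
                         exogenous["renewable"], exogenous["price"]])

    def cal_reward(self, network_obs: dict, action: float) -> float:
        # Illustrative default reward: negative proxy for the cost of the
        # energy exchanged by the ESS.
        return -0.1 * abs(float(action))
```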
Users can freely design the `build-state` block and the `cal-reward` block to explore how different states and reward structures influence the performance of algorithms on various tasks:
- State Construction (`build-state` block): Users can customize the state representation to investigate its impact on algorithm performance. The framework provides a default state pattern, but users are encouraged to modify it as needed for their specific tasks.
- Reward Calculation (`cal-reward` block): Similarly, the reward calculation can be tailored to different optimization tasks. The framework includes a default reward calculation method, but users can adapt it to suit their objectives. A sketch of how both blocks could be customized follows this list.
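As a hedged illustration of such customization, the snippet below subclasses the illustrative environment from the earlier sketch and overrides its `build_state` and `cal_reward` methods. The chosen state features and the voltage-penalty reward are arbitrary examples, not the framework's defaults.

```python
import numpy as np

# Reuses the illustrative PowerNetEnvSketch class defined in the sketch above.


class CustomEnv(PowerNetEnvSketch):
    def build_state(self, network_obs: dict, exogenous: dict) -> np.ndarray:
        # Custom state: drop the renewable feature and add the bus voltage.
        return np.array([network_obs["soc"], network_obs["voltage"],
                         exogenous["demand"], exogenous["price"]])

    def cal_reward(self, network_obs: dict, action: float) -> float:
        # Custom reward: energy-cost proxy plus a penalty for voltage
        # deviations beyond a +/- 0.05 p.u. band.
        cost = -0.1 * abs(float(action))
        voltage_penalty = -10.0 * max(0.0, abs(network_obs["voltage"] - 1.0) - 0.05)
        return cost + voltage_penalty
```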
For the convenience of users, the RL-ADN framework provides default implementations for both the state pattern and reward calculation. These defaults offer a starting point for new users and ensure that the framework can be used out-of-the-box for common tasks.
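The usage sketch below runs the interaction loop end-to-end with the illustrative environment defined earlier, using a random policy in place of a trained agent. It only demonstrates the sequence of reset, action, step, and reward described above; it is not the framework's own training loop.

```python
import numpy as np

env = PowerNetEnvSketch()          # illustrative env with default blocks
state = env.reset()
done, episode_return = False, 0.0

while not done:
    action = float(np.random.uniform(-1.0, 1.0))        # s_t -> a_t (random agent)
    next_state, reward, done, info = env.step(action)   # execute a_t, receive r_t
    episode_return += reward
    state = next_state                                   # move to s_{t+1}

print(f"Episode return: {episode_return:.3f}")
```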
By following this interaction loop, the RL-ADN framework ensures a structured and efficient process for training reinforcement learning agents. The flexibility in customizing the state and reward calculation blocks allows for a wide range of experiments and optimizations, facilitating in-depth research and development.