Interaction Loop Layer

The Interaction Loop Layer in the RL-ADN framework governs how the agent interacts with the environment during each time step of an episode. This loop is crucial for training reinforcement learning agents, as it determines the sequence of actions, observations, and rewards that the agent experiences.

Interaction Process

For each time step t in an episode, the following sequence of operations occurs (a minimal code sketch of this loop follows the list):

  1. State Acquisition:

    • The agent obtains the current state s_t.
  2. Action Determination:

    • Based on the current state s_t, the agent determines an action a_t to be executed in the environment.
  3. Action Execution:

    • Once a_t is received, the environment executes the step function, which runs the power flow calculation and updates the status of the ESSs and the distribution network. This step accounts for the consequences of the action at the current time step t.
  4. Reward Calculation:

    • Based on the observations resulting from the action execution, the reward r_t is calculated by the reward calculation (cal-reward) block.
  5. Next State Sampling:

    • The Data Manager in the environment samples external time-series data for the next time step t+1, including demand, renewable energy generation, and price. This sampling emulates the stochastic fluctuations of the environment.
    • These external variables are combined with the updated internal observations to form the next state s_{t+1}, completing the transition.
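
The five steps above map onto a standard gym-style training loop. The sketch below is a minimal, self-contained illustration of that loop; the class, method, and attribute names (DummyADNEnv, build_state, cal_reward, and so on) are stand-ins chosen for this example, not RL-ADN's actual API.

```python
import numpy as np

# Minimal sketch of the per-step interaction loop described above.
# DummyADNEnv stands in for an RL-ADN-style environment; all names here are
# illustrative assumptions, not the framework's exact interface.

class DummyADNEnv:
    def __init__(self, episode_length=96, num_ess=3):
        self.episode_length = episode_length   # e.g. 96 x 15-minute steps per day
        self.num_ess = num_ess                 # number of energy storage systems
        self.t = 0
        self.soc = np.full(num_ess, 0.5)       # ESS state of charge

    def _sample_exogenous(self, t):
        # stand-in for the Data Manager: sample demand, renewable generation
        # and price for time step t
        rng = np.random.default_rng(t)
        return rng.uniform(0.0, 1.0, size=3)

    def build_state(self):
        # build-state block: internal observations + sampled external data
        return np.concatenate(([self.t / self.episode_length],
                               self.soc,
                               self._sample_exogenous(self.t)))

    def cal_reward(self, action):
        # cal-reward block: placeholder cost signal (negative ESS throughput)
        return -float(np.abs(action).sum())

    def reset(self):
        self.t = 0
        self.soc = np.full(self.num_ess, 0.5)
        return self.build_state()              # 1. state acquisition for t = 0

    def step(self, action):
        # 3. action execution: in RL-ADN the power flow is solved here and the
        #    ESS / network status is updated; this toy env only updates the SOC
        self.soc = np.clip(self.soc + 0.1 * np.asarray(action), 0.0, 1.0)
        reward = self.cal_reward(action)       # 4. reward calculation
        self.t += 1
        done = self.t >= self.episode_length
        next_state = self.build_state()        # 5. next-state sampling for t+1
        return next_state, reward, done, {}

env = DummyADNEnv()
state = env.reset()
done = False
while not done:
    action = np.random.uniform(-1.0, 1.0, env.num_ess)   # 2. action determination (random policy)
    state, reward, done, info = env.step(action)
```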

Customization Options

Users can freely design the build-state block and the cal-reward block to explore how different states and reward structures influence algorithm performance on various tasks; a sketch of both customizations follows the list below.

  • State Construction (build-state Block):

    • Users can customize the state representation to investigate its impact on algorithm performance. The framework provides a default state pattern, but users are encouraged to modify it as needed for their specific tasks.
  • Reward Calculation (cal-reward Block):

    • Similarly, the reward calculation can be tailored to different optimization tasks. The framework includes a default reward calculation method, but users can adapt it to suit their objectives.
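
As a hedged illustration of both customization points, the sketch below subclasses the DummyADNEnv from the previous sketch and overrides its build_state and cal_reward hooks with an alternative state vector and reward. The hook names and the specific state/reward choices are assumptions made for demonstration; in RL-ADN the corresponding blocks are defined in the environment class.

```python
import numpy as np

# Builds on the DummyADNEnv sketch above; names remain illustrative.

class CustomADNEnv(DummyADNEnv):
    def build_state(self):
        # alternative state: drop the time index, add SOC headroom per ESS
        exogenous = self._sample_exogenous(self.t)
        headroom = 1.0 - self.soc
        return np.concatenate((self.soc, headroom, exogenous))

    def cal_reward(self, action):
        # alternative reward: penalize ESS throughput plus deviation of the
        # SOC from a 0.5 target
        throughput_cost = float(np.abs(action).sum())
        soc_deviation = float(np.abs(self.soc - 0.5).sum())
        return -(throughput_cost + 0.1 * soc_deviation)

env = CustomADNEnv()
state = env.reset()
state, reward, done, info = env.step(np.zeros(env.num_ess))
```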

Default Implementation

For the convenience of users, the RL-ADN framework provides default implementations for both the state pattern and the reward calculation. These defaults offer a starting point for new users and ensure that the framework can be used out of the box for common tasks.
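
As a rough picture of what such a default reward could look like for the common cost-minimization task, the helper below combines the energy cost at the grid connection with a penalty on voltage-limit violations. The formula and parameter names are illustrative assumptions; the actual default in RL-ADN may be defined differently.

```python
# Illustrative (not the framework's exact) reward for a cost-minimization task:
# negative energy cost plus a penalty for bus voltages outside [v_min, v_max].
def default_reward(price, p_grid, dt_hours, voltages,
                   v_min=0.95, v_max=1.05, penalty=50.0):
    energy_cost = price * p_grid * dt_hours                     # cost of power exchanged with the grid
    violation = sum(max(v_min - v, 0.0) + max(v - v_max, 0.0)   # total per-unit voltage violation
                    for v in voltages)
    return -energy_cost - penalty * violation

# Example: buying 0.2 MW for 15 minutes at 40 EUR/MWh with one bus slightly under-voltage
r = default_reward(price=40.0, p_grid=0.2, dt_hours=0.25, voltages=[1.00, 0.94, 1.02])
```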


By following this interaction loop, the RL-ADN framework ensures a structured and efficient process for training reinforcement learning agents. The flexibility in customizing the state and reward calculation blocks allows for a wide range of experiments and optimizations, facilitating in-depth research and development.