
Data_Augmentation_Model

Hou Shengren edited this page Aug 5, 2024 · 1 revision

Data Augmentation Model

The RL-ADN framework uses Gaussian mixture models (GMMs) and Copula functions for data augmentation. A GMM is a probabilistic model that assumes the data are drawn from a weighted mixture of several Gaussian distributions, each with its own mean and covariance. This allows it to capture the complex, multi-modal nature of time-series data in distribution networks, which often show intricate patterns because of fluctuating load demand and renewable energy generation. Complementing the GMM, Copula functions capture the temporal correlation between time steps within a given period, independently of the marginal distribution at each time step. Together, the two models produce augmented time-series data that preserve both the distribution of individual values and their correlation over time. The framework provides three augmentation methods: GMM, t-Copula, and Gaussian Copula.
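To make the difference between the two Copula options concrete, the following sketch draws uniform samples for two correlated time steps from a Gaussian Copula and from a t-Copula using NumPy and SciPy. It is an illustration under assumed parameters (correlation 0.7, 3 degrees of freedom), not RL-ADN's implementation, and the helper names are hypothetical. Both copulas yield U(0, 1) marginals, so any marginal model (such as a fitted GMM) can be applied afterwards. The t-Copula also assigns more probability to both time steps being extreme at the same time.

```python
import numpy as np
from scipy import stats


def gaussian_copula_uniforms(corr, n, rng):
    """Hypothetical helper: n vectors with U(0,1) marginals and Gaussian dependence."""
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    return stats.norm.cdf(z)


def t_copula_uniforms(corr, dof, n, rng):
    """Hypothetical helper: same idea with a multivariate Student-t (heavier joint tails)."""
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    # Dividing each row by one shared sqrt(chi2/dof) draw turns it into a multivariate t.
    w = rng.chisquare(dof, size=(n, 1)) / dof
    return stats.t.cdf(z / np.sqrt(w), dof)


def joint_tail_share(u, q=0.99):
    """Share of draws where both time steps are simultaneously above quantile q."""
    return np.mean((u[:, 0] > q) & (u[:, 1] > q))


rng = np.random.default_rng(0)
corr = np.array([[1.0, 0.7], [0.7, 1.0]])  # assumed correlation between two time steps
u_gauss = gaussian_copula_uniforms(corr, 100_000, rng)
u_t = t_copula_uniforms(corr, dof=3, n=100_000, rng=rng)
print(joint_tail_share(u_gauss), joint_tail_share(u_t))
```

In a run like this, the t-Copula typically produces a noticeably larger joint-tail share than the Gaussian Copula with the same correlation matrix. That makes it a candidate when peaks at neighboring time steps tend to coincide.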

Combining GMM and Copula functions (GMC) lets RL-ADN build robust and reliable environments for training reinforcement learning (RL) agents. Because the synthetic data reflect the complexity and uncertainty inherent in power distribution networks, this approach improves both the quality of the training data and the effectiveness of the resulting policies.

Data Augmentation Workflow

The augmentation process consists of the following steps:

  • The ActivePowerDataManager class, a subclass of GeneralPowerDataManager, preprocesses the input data, fills missing values by interpolation, and reshapes the data into a format suitable for augmentation.
  • A Gaussian Mixture Model (GMM) is fitted to the marginal distribution of historical active power data for each node and time step, capturing the underlying distribution of power consumption.
  • The Bayesian Information Criterion (BIC) is employed to select the optimal number of components for each GMM, ensuring that the model complexity is balanced against the goodness of fit.
  • A Copula-based approach is then applied, which models the dependency structure between different nodes and time steps, allowing for the generation of synthetic data points that maintain the correlation observed in historical data.
  • The augment_data method uses the fitted GMMs and the Copula to generate new data samples, which are then transformed from the probabilistic (Copula) space back to the original power scale through the fitted GMM marginals.
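The workflow above can be sketched end to end as follows. This minimal illustration assumes scikit-learn's `GaussianMixture` for the marginals, a Gaussian Copula for the dependence between time steps, and input shaped as (days, time steps). The function names and the `max_components=4` BIC search range are hypothetical and are not part of RL-ADN's API. The GMM inverse CDF has no closed form, so the sketch approximates it by interpolating on a dense grid.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def fit_gmm_bic(x, max_components=4, seed=0):
    """Fit 1-D GMMs with 1..max_components components and keep the lowest-BIC one."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    best, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(x)
        bic = gmm.bic(x)
        if bic < best_bic:
            best, best_bic = gmm, bic
    return best


def gmm_cdf(gmm, x):
    """Mixture CDF: sum_k w_k * Phi((x - mu_k) / sigma_k)."""
    mu = gmm.means_.ravel()
    sd = np.sqrt(gmm.covariances_.ravel())
    x = np.asarray(x, dtype=float)[..., None]
    return (gmm.weights_ * norm.cdf((x - mu) / sd)).sum(axis=-1)


def gmm_ppf(gmm, u, n_grid=4000):
    """Approximate inverse CDF by interpolating u on a dense grid of the mixture CDF."""
    mu = gmm.means_.ravel()
    sd = np.sqrt(gmm.covariances_.ravel())
    grid = np.linspace((mu - 8 * sd).min(), (mu + 8 * sd).max(), n_grid)
    return np.interp(u, gmm_cdf(gmm, grid), grid)


def gaussian_copula_augment(data, n_samples, max_components=4, seed=0):
    """data: (n_days, n_steps) array -> (n_samples, n_steps) synthetic days."""
    rng = np.random.default_rng(seed)
    eps = 1e-6
    n_steps = data.shape[1]
    # 1) One BIC-selected GMM per time step (the marginals).
    margins = [fit_gmm_bic(data[:, t], max_components, seed) for t in range(n_steps)]
    # 2) Map history to (0, 1) via each marginal CDF, then to standard-normal scores.
    u = np.column_stack([gmm_cdf(g, data[:, t]) for t, g in enumerate(margins)])
    z = norm.ppf(np.clip(u, eps, 1 - eps))
    # 3) Gaussian Copula: the correlation of the normal scores across time steps.
    corr = np.corrcoef(z, rowvar=False)
    # 4) Sample correlated normals, map to uniforms, then back to the power scale.
    z_new = rng.multivariate_normal(np.zeros(n_steps), corr, size=n_samples)
    u_new = norm.cdf(z_new)
    return np.column_stack([gmm_ppf(g, u_new[:, t]) for t, g in enumerate(margins)])


# Toy history: 300 days x 4 time steps, with two load regimes shared within each day.
rng = np.random.default_rng(1)
base = np.where(rng.random(300) < 0.5, 1.0, 3.0)
hist = base[:, None] + 0.2 * rng.standard_normal((300, 4))
synthetic = gaussian_copula_augment(hist, n_samples=500)
print(synthetic.shape)  # (500, 4)
```

A t-Copula variant would replace the multivariate normal draw with a multivariate Student-t draw and use the Student-t CDF and its inverse in place of the normal ones. The degrees of freedom would then also have to be chosen or estimated.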

The TimeSeriesDataAugmentor module retrieves the preprocessed data from the data manager and applies its augmentation algorithms to produce an augmented dataset. The output is a synthetic yet realistic dataset that reflects the variability and unpredictability of power systems. The enriched dataset exposes RL agents to a wider range of scenarios during training, which helps them learn more adaptable and robust decision-making policies.
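The following sketch illustrates the kind of preprocessing the data manager performs before handing data to the augmentor. It fills gaps by linear interpolation and reshapes a load series into one row per day. It assumes pandas, a 15-minute resolution (96 steps per day), and a hypothetical `to_daily_matrix` helper. It does not reproduce the actual ActivePowerDataManager code.

```python
import numpy as np
import pandas as pd


def to_daily_matrix(series, steps_per_day=96):
    """Hypothetical helper: interpolate gaps, then reshape to (n_days, steps_per_day)."""
    filled = series.interpolate(method="linear", limit_direction="both")
    n_days = len(filled) // steps_per_day
    # Drop an incomplete trailing day so the reshape is exact.
    return filled.to_numpy()[: n_days * steps_per_day].reshape(n_days, steps_per_day)


# Three full days plus 10 extra points at an assumed 15-minute resolution.
idx = pd.date_range("2024-01-01", periods=3 * 96 + 10, freq="15min")
load = pd.Series(np.linspace(1.0, 2.0, len(idx)), index=idx)
load.iloc[[5, 100, 200]] = np.nan  # simulate missing measurements
matrix = to_daily_matrix(load)
print(matrix.shape)  # (3, 96)
```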

Once augmentation is complete, the synthetic data is saved to a CSV file so that it can be fed directly into the training pipeline. This automated procedure strengthens RL-ADN's ability to train effective and resilient RL agents for operating energy storage systems (ESSs) in distribution networks.
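The following minimal sketch of the export step assumes pandas. The synthetic (days, time steps) array is flattened back into a timestamped series and written with `DataFrame.to_csv`. The column name `active_power`, the start date, the 15-minute frequency, and the `save_augmented` helper are illustrative assumptions, not the file format RL-ADN actually writes.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_augmented(days, path, start="2024-01-01", freq="15min"):
    """Hypothetical helper: flatten (n_days, steps_per_day) into one timestamped column and write CSV."""
    index = pd.date_range(start, periods=days.size, freq=freq, name="date_time")
    df = pd.DataFrame({"active_power": days.ravel()}, index=index)
    df.to_csv(path)
    return df


# Stand-in for augmented output: 2 synthetic days at 96 steps per day.
synthetic = np.random.default_rng(0).uniform(0.5, 3.0, size=(2, 96))
path = os.path.join(tempfile.mkdtemp(), "augmented_active_power.csv")
saved = save_augmented(synthetic, path)
reloaded = pd.read_csv(path, index_col="date_time", parse_dates=True)
print(reloaded.shape)  # (192, 1)
```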