AI-powered spatiotemporal imputation and prediction of chlorophyll-a concentration in coastal oceans
This repository contains the code for STIMP, an AI framework for imputing and predicting chlorophyll-a (Chl_a) concentration across broad spatiotemporal scales in coastal oceans. STIMP's results can be used to diagnose and analyze the ecosystem health of coastal oceans from remote sensing measurements.
![](https://private-user-images.githubusercontent.com/150044070/403327448-47b87208-e49a-45e0-9c93-8d792546bcac.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5NTk0NjQsIm5iZiI6MTczODk1OTE2NCwicGF0aCI6Ii8xNTAwNDQwNzAvNDAzMzI3NDQ4LTQ3Yjg3MjA4LWU0OWEtNDVlMC05YzkzLThkNzkyNTQ2YmNhYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwN1QyMDEyNDRaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01OTZmODQwN2FmOTM1ZTQ3OTgxMjE0NzQyZmQ1NTY4OGUyMTVlYTRkZjY2YzcxNTc3Y2E4NzJkNTA0ZjlkZmEwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.xVL8nyZ4S-hSCjksvs-L7poC_C7MuTjKY_Guyidsebo)
We provide source code for reproducing the experiments of the paper "AI-powered spatiotemporal imputation and prediction of chlorophyll-a concentration in coastal oceans".
git clone https://github.com/YangLabHKUST/STIMP.git
cd /path/to/STIMP
conda create -n stimp python=3.9
conda activate stimp
pip install -r requirements.txt
All data used in this work are publicly available online. The chlorophyll-a observations are 8-day averaged Level 3 mapped products from the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Aqua satellite, with a spatial resolution of 4 km: https://search.earthdata.nasa.gov/search?q=10.5067/AQUA/MODIS/L3M/CHL/2022 To select these files, use .8D..4km.nc as the filter.
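The same selection can be checked locally against a list of downloaded file names. The sketch below is a small shell helper and is not part of the repository. It assumes that the wanted products contain `.8D.` in their names and end in `.4km.nc`; the function names `is_8day_4km` and `filter_8day_4km` are introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Succeed if a file name looks like an 8-day, 4 km Level 3 mapped product.
# Assumption: such names contain ".8D." and end with ".4km.nc".
is_8day_4km() {
  case "$1" in
    *.8D.*.4km.nc) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the matching names among the arguments, one per line.
filter_8day_4km() {
  local f
  for f in "$@"; do
    if is_8day_4km "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `filter_8day_4km *.nc` lists the matching files in the current directory.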
The datasets are also available on Zenodo at https://doi.org/10.5281/zenodo.14724760. After downloading data.zip, move it into the repository and unzip it:
mv data.zip /path/to/STIMP/
unzip data.zip
Prepare the dataset from the raw data
We generate four datasets (Pearl River Estuary, the Northern of Mexico, Chesapeake Bay and Yangtze River Estuary) following this tutorial. The generated datasets are also included in data.zip.
Taking the Pearl River Estuary as an example, we construct 9 datasets with different missing rates and train STIMP on each of them:
for i in {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}
do
python imputation/train_stimp.py --missing_ratio $i --area PRE
done
The baselines can be trained in the same way:
for i in {0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9}
do
python imputation/train_cf.py --missing_ratio $i --area PRE
python imputation/train_csdi.py --missing_ratio $i --area PRE
python imputation/train_dineof.py --missing_ratio $i --area PRE
python imputation/train_imputeformer.py --missing_ratio $i --area PRE
python imputation/train_inpainter.py --missing_ratio $i --area PRE
python imputation/train_lin_itp.py --missing_ratio $i --area PRE
python imputation/train_mae.py --missing_ratio $i --area PRE
python imputation/train_mean.py --missing_ratio $i --area PRE
python imputation/train_trmf.py --missing_ratio $i --area PRE
done
Some visualization results are provided in Imputation in Pearl River Estuary.
For other coastal ocean areas, STIMP and baselines are trained by replacing PRE with MEXICO, Chesapeake or Yangtze.
- Imputation in the Northern of MEXICO
- Imputation in Chesapeake Bay
- Imputation in Yangtze River Estuary
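To cover every area and missing ratio in one pass, the STIMP training commands can be generated with a nested loop. This is a minimal sketch rather than a script shipped with the repository: it only prints the commands, using the area codes and ratios listed above, and the name `stimp_commands` is introduced here for illustration.

```shell
#!/usr/bin/env bash
# Area codes and missing ratios as listed in this README.
AREAS="PRE MEXICO Chesapeake Yangtze"
RATIOS="0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9"

# Print one STIMP training command per (area, missing ratio) pair.
# Nothing is executed here; the function only writes the command lines.
stimp_commands() {
  local area ratio
  for area in $AREAS; do
    for ratio in $RATIOS; do
      printf 'python imputation/train_stimp.py --missing_ratio %s --area %s\n' \
        "$ratio" "$area"
    done
  done
}

# Dry run: list all commands for inspection.
stimp_commands
```

After inspecting the list, running `stimp_commands | bash` from the repository root would execute the commands one after another.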
Observations of Chl_a in the Pearl River Estuary are first imputed with STIMP:
python dataset/generate_data_with_stimp.py --area PRE
We sample 10 different imputed Chl_a distributions and train the prediction model on each, selected with --index:
for i in {0..9}
do
python prediction/train.py --index $i --area PRE
done
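The two steps above (imputing the observations, then training on each of the 10 samples) can be collected into one helper that takes the area code as an argument. This is a hedged sketch: the README only shows these commands for PRE, so passing MEXICO, Chesapeake or Yangtze to them is an assumption, and `prediction_commands` is a name introduced here. The helper prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print the imputation command followed by the 10 prediction training commands
# for one area. The area defaults to PRE, the only value shown in this README
# for these scripts; other area codes are an assumption.
prediction_commands() {
  local area
  local i
  area="${1:-PRE}"
  printf 'python dataset/generate_data_with_stimp.py --area %s\n' "$area"
  for i in 0 1 2 3 4 5 6 7 8 9; do
    printf 'python prediction/train.py --index %s --area %s\n' "$i" "$area"
  done
}

# Dry run for the Pearl River Estuary.
prediction_commands PRE
```

As with any dry run, the printed lines can be piped to `bash` from the repository root once they look right.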
Baselines are trained on the original observations:
python prediction/train_without_spatial_imputation.py --method "CrossFormer" --area PRE
python prediction/train_without_spatial_imputation.py --method "iTransformer" --area PRE
python prediction/train_without_spatial_imputation.py --method "TSMixer" --area PRE
python prediction/train_without_imputation.py --method "MTGNN" --area PRE
python prediction/train_as_image_without_imputation.py --method "PredRNN" --area PRE
python prediction/train_xgboost_without_imputation.py --area PRE
We also train the baselines on the imputed Chl_a distributions (see the supplementary material):
for i in {0..9}
do
python prediction/train_without_spatial.py --method "CrossFormer" --area PRE --index $i
python prediction/train_without_spatial.py --method "iTransformer" --area PRE --index $i
python prediction/train_without_spatial.py --method "TSMixer" --area PRE --index $i
python prediction/train.py --method "MTGNN" --area PRE --index $i
python prediction/train_as_image.py --method "PredRNN" --area PRE --index $i
python prediction/train_xgboost.py --area PRE --index $i
done
We provide the source code for overall prediction performance in each coastal ocean area:
- Overall prediction performance in Pearl River Estuary
- Overall prediction performance in the Northern of MEXICO
- Overall prediction performance in Chesapeake Bay
- Overall prediction performance in Yangtze River Estuary
Some case studies are included in the following tutorials: