PotSim: A Large-Scale Simulated Dataset for Benchmarking AI Techniques on Potato Crop

This repository contains the official implementation associated with this paper. The corresponding dataset is publicly available here.

Description:

PotSim is a large-scale simulated agricultural dataset specifically designed for AI-driven research on potato cultivation. This dataset is grounded in real-world crop management scenarios and extrapolated to approximately 4.9 million hypothetical crop management scenarios. It encompasses diverse factors including varying planting dates, fertilizer application rates and timings, irrigation strategies, and 24 years of weather data. The resulting dataset comprises over 675 million daily simulation records, offering an extensive and realistic framework for agricultural AI research.

Features:

The repository contains three main files: example.ipynb, plots.ipynb, and run.py. To reproduce the train/test results presented in the paper, use run.py, which runs from a command-line interface or terminal. To work through the dataset step by step, use example.ipynb, a Jupyter notebook template that acts as a starting point for further exploration. To visualize and plot results, use plots.ipynb, a Jupyter notebook template that contains a few example plots and can be edited to suit your requirements.

Directory Structure:

Directory Name	Description
`data`	Contains all datasets required for experiments and analyses.
`data/potsim_yearly`	Default location for yearly dataset files utilized in the study.
`models`	Houses all model architecture definitions and related scripts.
`outputs`	Default directory for saving model checkpoints, logs, and results generated during training.
`saves`	Stores pre-trained model states and checkpoints from experiments referenced in the paper.
`testing`	Includes scripts and functions for evaluating model performance and generating metrics.
`training`	Contains training routines, configuration files, and code for model optimization.
`utils`	Utility functions for data preprocessing, splitting, and model configuration management.
`utils/potsimloader`	Specialized utilities for efficient data loading and processing workflows.

Requirements:

To install the requirements:

conda env create -f environment.yml
conda activate potsim_env

Depending on the CUDA version on your system, install PyTorch v2.5.1 from the official PyTorch source at https://pytorch.org:

# Example for CUDA 12.4
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

To compute metrics on the GPU and display model parameters clearly, also install:

pip install torchmetrics==1.7.1 torchinfo==1.8

If conda is not set up on your system, visit https://www.anaconda.com/download to install Miniconda for your operating system, then install the requirements as described above.
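
Once the packages are installed, a short check like the one below can confirm whether the CUDA build of PyTorch was picked up. This is a minimal sketch and not part of the repository; `pick_device` is a hypothetical helper name. The string it returns matches the values accepted by the `-d` flag of run.py.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also runs before installation
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"PyTorch will use: {pick_device()}")
```

If this prints `cpu` on a machine with a GPU, the installed wheel probably does not match the local CUDA version.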

Usage: `run.py`

The script supports two main commands: train and test.

Make sure your datasets are in .parquet format and accessible to the script in the `data` folder.
For more details on the available target variables and models, check the code or pass the --help flag:

python run.py --help
python run.py train --help
python run.py test --help
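
Before training, it can help to confirm that the .parquet files are where the script expects them. The sketch below is not the repository's loader (that lives under `utils/potsimloader`); `find_parquet_files` and `load_parquet` are hypothetical helpers, and the loading step assumes pandas with a parquet engine such as pyarrow is available in the environment.

```python
from pathlib import Path


def find_parquet_files(data_dir: str = "data/potsim_yearly") -> list[Path]:
    """Return all .parquet files directly under data_dir, sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return sorted(root.glob("*.parquet"))


def load_parquet(path: Path):
    """Load one file into a DataFrame (assumes pandas and a parquet engine)."""
    import pandas as pd  # lazy import: only needed when a file is actually read

    return pd.read_parquet(path)


if __name__ == "__main__":
    for parquet_file in find_parquet_files():
        print(parquet_file.name)
```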

1. Train a Model

python run.py train -tgt <target> -m <model> [options]

Arguments:

Argument	Type	Required	Default	Description
`-tgt`, `--target`	str	Yes		Target variable to predict. Choices: see `python run.py train --help`
`-m`, `--model`	str	Yes		Model type to use. Choices: see `python run.py train --help`
`-tdata`, `--train_dataset`	str	No	`train_split`	Training dataset split
`-vdata`, `--val_dataset`	str	No	`val_split`	Validation dataset split
`-bs`, `--batch_size`	int	No	`256`	Batch size
`-lr`, `--learning_rate`	float	No	`0.005`	Learning rate
`-ep`, `--epochs`	int	No	`100`	Maximum number of epochs
`-sl`, `--seq_len`	int	No	`15`	Sequence length (for sequence models)
`-d`, `--device`	str	No	`None`	Device: `cpu` or `cuda`
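
To illustrate what a sequence length of 15 means for sequence models, the sketch below cuts a daily series into overlapping windows of `seq_len` consecutive days. How run.py actually builds its sequences is not documented here, so the sliding-window scheme and the helper name `sliding_windows` are assumptions used for illustration only.

```python
def sliding_windows(series: list, seq_len: int = 15) -> list:
    """Split a daily series into overlapping windows of seq_len consecutive days."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # A series shorter than seq_len yields no windows at all.
    return [series[i : i + seq_len] for i in range(len(series) - seq_len + 1)]


if __name__ == "__main__":
    days = [float(d) for d in range(20)]  # 20 days of made-up values
    windows = sliding_windows(days, seq_len=15)
    print(len(windows))    # 6 windows: days 0-14, 1-15, ..., 5-19
    print(windows[0][:3])  # [0.0, 1.0, 2.0]
```

Under this scheme a larger `seq_len` gives the model more context per sample but fewer samples per season.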

Example:

python run.py train -tgt="NTotL1" -m="lstm" -tdata="train_split" -vdata="val_split" -bs=256 -lr=0.001 -ep=10 -sl=15 -d="cuda"
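
When several training runs are needed, the command can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the repository: `build_train_command` only uses the flags and defaults listed in the table above, and it assumes run.py accepts space-separated flag values as well as the `-flag=value` form shown in the example.

```python
import shlex
from typing import Optional


def build_train_command(target: str, model: str, lr: float = 0.005,
                        epochs: int = 100, batch_size: int = 256,
                        seq_len: int = 15,
                        device: Optional[str] = None) -> list:
    """Assemble the argv list for `python run.py train` from the documented flags."""
    cmd = ["python", "run.py", "train", "-tgt", target, "-m", model,
           "-bs", str(batch_size), "-lr", str(lr), "-ep", str(epochs),
           "-sl", str(seq_len)]
    if device is not None:  # omit -d so the script falls back to its own default
        cmd += ["-d", device]
    return cmd


if __name__ == "__main__":
    # Dry run: print one command per learning rate instead of executing it.
    for lr in (0.005, 0.001):
        print(shlex.join(build_train_command("NTotL1", "lstm", lr=lr, epochs=10)))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.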

2. Test a Model

python run.py test -tgt <target> -m <model> -data <dataset> [options]

Arguments:

Argument	Type	Required	Default	Description
`-tgt`, `--target`	str	Yes		Target variable to predict. Choices: see `python run.py test --help`
`-m`, `--model`	str	Yes		Model type to use. Choices: see `python run.py test --help`
`-data`, `--dataset`	str	Yes		Dataset to run test on
`-mdir`, `--model_dir`	str	No	`saves`	Directory where trained models are saved (`outputs` or `saves`)
`-bs`, `--batch_size`	int	No	`256`	Batch size
`-sl`, `--seq_len`	int	No	`15`	Sequence length (for sequence models)
`-d`, `--device`	str	No	`None`	Device: `cpu` or `cuda`

Example:

python run.py test -tgt="NTotL1" -m="lstm" -data="test_split" -mdir="saves" -bs=256 -sl=15 -d="cuda"
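
To evaluate one model across every target variable, the test command can be generated in a loop. The sketch below is a hypothetical helper, not repository code. It takes the six target names from the Results table and assumes that a saved model exists for each of them in the chosen model directory.

```python
import shlex

# Target names as they appear in the Results table.
TARGETS = ["NLeach", "NPlantUp", "NTotL1", "NTotL2", "SWatL1", "SWatL2"]


def build_test_commands(model: str = "lstm", dataset: str = "test_split",
                        model_dir: str = "saves") -> list:
    """Return one `python run.py test` argv list per target variable."""
    return [
        ["python", "run.py", "test", "-tgt", target, "-m", model,
         "-data", dataset, "-mdir", model_dir]
        for target in TARGETS
    ]


if __name__ == "__main__":
    for cmd in build_test_commands():
        print(shlex.join(cmd))  # dry run; nothing is executed
```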

Results:

R² Metrics for Different Models

Target	CNN1D	Transformer	LSTM	LinearRegression	MLP	TCN
`NLeach`	0.432	-0.02	0.343	0.002	0.014	0.265
`NPlantUp`	0.803	0.733	0.794	0.322	0.753	0.791
`NTotL1`	0.843	0.764	0.831	0.481	0.779	0.823
`NTotL2`	0.861	0.799	0.849	0.489	0.792	0.843
`SWatL1`	0.973	0.949	0.972	0.620	0.841	0.950
`SWatL2`	0.944	0.783	0.928	0.700	0.816	0.914
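
For readers interpreting the table, the sketch below shows the standard definition of the coefficient of determination, R² = 1 - SS_res / SS_tot, and why a value can be negative, as with the Transformer on `NLeach`. It is a plain-Python illustration only and not the code used to produce the numbers above.

```python
def r2_score(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # variance around the mean
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    print(r2_score(y, y))                     # 1.0: perfect prediction
    print(r2_score(y, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than predicting the mean
    print(r2_score(y, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than predicting the mean
```

A score near zero therefore means the model explains almost none of the variance in that target, and a negative score means it does worse than a constant mean prediction.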
