OSSL models

Welcome to the Open Soil Spectral Library (OSSL)!

This repository contains all model calibration development for the Soil Spectroscopy for Global Good project, based on the Open Soil Spectral Library (OSSL) database.

We used the mlr3 framework to fit the machine learning models, specifically with the Cubist algorithm.

The README in the R-mlr folder explains the steps used to calibrate the models, while this page provides an overview of the modeling and results.

In summary, we provide five model types, depending on the availability of samples in the database. None of them uses geocovariates (na code): site information and environmental layers are not used, only the spectral data.

The model types combine two subsets, i.e., the KSSL soil spectral library alone (kssl code) or the full OSSL database (ossl code), with three spectral types: VisNIR (visnir code), NIR from the Neospectra instrument (nir.neospectra code), and MIR (mir code).

After running some internal evaluations, we recommend using the KSSL models only when i) no model built on the full OSSL database is available for the given combination of soil property and spectral region of interest; ii) the spectra to be predicted come from the same instrument manufacturer/model as the spectrometers used to build the KSSL VisNIR and MIR libraries; iii) the KSSL library is representative of the new spectra, based both on spectral similarity and on the range of the soil properties of interest in the target samples. Otherwise, use the models with the ossl code.

spectra_type	subset	geo	model_name
mir	kssl	na	mir_cubist_kssl_na_v1.2
mir	ossl	na	mir_cubist_ossl_na_v1.2
nir.neospectra	ossl	na	nir.neospectra_cubist_ossl_na_v1.2
visnir	kssl	na	visnir_cubist_kssl_na_v1.2
visnir	ossl	na	visnir_cubist_ossl_na_v1.2
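The model names in the table follow a regular pattern: spectral type, algorithm, subset, geocovariate code, and version joined by underscores. As a minimal sketch (in Python, not the repository's R code; the function name `model_name` and the list `AVAILABLE` are hypothetical helpers introduced here for illustration), the names can be composed as follows:

```python
def model_name(spectra_type, subset, geo="na", algorithm="cubist", version="v1.2"):
    """Compose a name as spectra_type_algorithm_subset_geo_version."""
    return "_".join([spectra_type, algorithm, subset, geo, version])


# The five (spectra_type, subset) combinations listed in the table above.
AVAILABLE = [
    ("mir", "kssl"),
    ("mir", "ossl"),
    ("nir.neospectra", "ossl"),
    ("visnir", "kssl"),
    ("visnir", "ossl"),
]

names = [model_name(spectra_type, subset) for spectra_type, subset in AVAILABLE]
for name in names:
    print(name)
```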

The machine learning algorithm Cubist (code name cubist) uses a decision-tree splitting method but fits a linear regression model at each terminal leaf. It also uses a boosting mechanism (sequential trees adjusted by weights), so an ensemble can be grown by tuning the number of committees. We did not apply the nearest-neighbor correction of the final predictions because this feature is not available in the mlr3 framework.

Hyperparameters were optimized with internal (inner) resampling using 5-fold cross-validation on a smaller subset of the data to speed up the operation. A grid search over the hyperparameter space tested up to 5 configurations and selected the one with the lowest RMSE. The final model was then fitted with the best hyperparameters on the full training data.
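The tuning loop described above can be illustrated with a small, standard-library Python sketch. It is not the repository's mlr3 code: a toy nearest-neighbor regressor stands in for Cubist, the number of neighbors stands in for the tuned hyperparameter, and all function names and the synthetic data are assumptions made for illustration.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Toy stand-in model: mean target of the n_neighbors nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(xs, ys, n_neighbors, k=5):
    """Cross-validated RMSE of one hyperparameter configuration."""
    squared_errors = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in fold:
            pred = knn_predict(train_x, train_y, xs[i], n_neighbors)
            squared_errors.append((pred - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def grid_search(xs, ys, grid, k=5):
    """Score every configuration in the grid and return the one with the lowest RMSE."""
    scores = {config: cv_rmse(xs, ys, config, k) for config in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data, only to make the sketch runnable.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

# Five candidate configurations, evaluated with 5-fold cross-validation.
best_k, scores = grid_search(xs, ys, grid=[1, 3, 5, 9, 25])
print("best configuration:", best_k)
```

Every configuration is scored on the same folds so that the comparison is fair; the selected configuration would then be refitted on the full training data.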

As predictors, we used the first 120 principal components (PCs) of the compressed spectra, a threshold that balances spectral representation against compression magnitude. The remaining higher-order components were used for the trustworthiness test: if a sample is underrepresented in the feature space of the calibration set because of unique features captured by those components, a flag is raised and reported in the results.
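One possible way to turn the remaining components into a flag is sketched below in Python. This is an assumption-heavy illustration, not the method used in this repository: the residual score (sum of squared scores beyond the retained components), the empirical 99% limit, and all function names are hypothetical choices, and the toy example retains 2 components instead of 120.

```python
import math


def residual_score(pc_scores, n_retained):
    """Sum of squared scores on the components beyond the first n_retained."""
    return sum(s * s for s in pc_scores[n_retained:])


def empirical_quantile(values, q):
    """Smallest observed value with at least a fraction q of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def is_underrepresented(pc_scores, calibration_scores, n_retained, q=0.99):
    """Flag a sample whose residual score exceeds the calibration limit (assumed rule)."""
    calibration_residuals = [residual_score(c, n_retained) for c in calibration_scores]
    limit = empirical_quantile(calibration_residuals, q)
    return residual_score(pc_scores, n_retained) > limit


# Toy PC scores with 4 components, of which the first 2 are "retained".
calibration = [
    [1.0, 0.5, 0.10, 0.00],
    [0.8, -0.2, -0.10, 0.10],
    [1.2, 0.1, 0.00, -0.10],
    [0.9, 0.3, 0.05, 0.05],
]
typical_sample = [1.0, 0.2, 0.05, -0.05]
unusual_sample = [1.0, 0.2, 0.90, -0.70]

print(is_underrepresented(typical_sample, calibration, n_retained=2))
print(is_underrepresented(unusual_sample, calibration, n_retained=2))
```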

Soil properties with available models can be found in fitted_modeling_combinations_v1.2.csv.

Final evaluation was performed with external (outer) 10-fold cross-validation of the tuned models using root mean square error (rmse), mean error (bias), R squared (rsq), Lin’s concordance correlation coefficient (ccc), and the ratio of performance to the interquartile range (rpiq).
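For reference, the five metrics can be computed as in the following Python sketch. The conventions are assumptions where the text does not fix them: bias is taken as the mean of predicted minus observed, rsq as one minus the ratio of residual to total sum of squares, and the interquartile range uses the inclusive quantile method; the repository's own code may use different conventions.

```python
import math
import statistics


def evaluate(observed, predicted):
    """Return rmse, bias, rsq, ccc and rpiq for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    rmse = math.sqrt(sse / n)
    bias = sum(errors) / n  # assumed sign convention: predicted minus observed

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sst = sum((o - mean_o) ** 2 for o in observed)
    var_o = sst / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n

    rsq = 1 - sse / sst  # assumed definition: coefficient of determination
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse
    return {"rmse": rmse, "bias": bias, "rsq": rsq, "ccc": ccc, "rpiq": rpiq}


# Toy example: every prediction overshoots the observation by 0.5.
metrics = evaluate([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```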

The cross-validated predictions were used to estimate the unbiased error (absolute residual), which was then used to calibrate uncertainty models via conformal prediction intervals. The error model is calibrated using the same fine-tuned structure as the respective response model. Conformity scores are estimated for a defined confidence level, which was set to 68% to approximate one standard deviation.
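A minimal Python sketch of how an error model and conformity scores can yield a prediction interval is given below. It assumes a normalized conformal scheme, in which each absolute residual is divided by the error predicted for that sample and the interval half-width is the score quantile times the predicted error of the new sample. The exact scheme used in this repository may differ, and all names and numbers are hypothetical.

```python
import math


def conformal_quantile(scores, confidence=0.68):
    """Conformity-score quantile with the usual (n + 1) finite-sample correction."""
    ordered = sorted(scores)
    n = len(ordered)
    rank = min(n, math.ceil((n + 1) * confidence))
    return ordered[rank - 1]


def prediction_interval(prediction, predicted_error, q):
    """Interval centered on the prediction, scaled by the error model output."""
    half_width = q * predicted_error
    return prediction - half_width, prediction + half_width


# Toy cross-validated results: observed values, response-model predictions,
# and the absolute error predicted by a (hypothetical) error model.
observed = [10, 12, 9, 15, 11, 13, 8, 14, 10]
predicted = [11, 11, 10, 13, 11, 14, 9, 13, 12]
predicted_error = [1, 2, 1, 2, 1, 1, 2, 1, 2]

scores = [abs(o - p) / e for o, p, e in zip(observed, predicted, predicted_error)]
q = conformal_quantile(scores, confidence=0.68)

# Interval for a new sample with prediction 12.0 and predicted error 1.5.
lower, upper = prediction_interval(12.0, 1.5, q)
print(q, lower, upper)
```

Scaling by the predicted error lets the interval widen for samples the error model considers harder to predict, instead of using one fixed width for all samples.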

Some soil properties were natural-log transformed with an offset of 1 (hence the use of the log1p function) to improve the prediction performance for highly skewed distributions. They were back-transformed only at the end, after all modeling steps, including performance estimation and the definition of the uncertainty intervals.
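The transformation and its inverse can be illustrated with Python's standard `math.log1p` and `math.expm1` functions. The helper names and the numeric values below are hypothetical; the sketch only shows that predictions and interval bounds produced on the log scale are mapped back with the inverse function at the very end.

```python
import math


def to_model_scale(value):
    """Natural log with offset 1: log(1 + value)."""
    return math.log1p(value)


def to_original_scale(value):
    """Inverse of log1p: exp(value) - 1."""
    return math.expm1(value)


def back_transform(prediction_log, lower_log, upper_log):
    """Map a prediction and its interval bounds back to the original units."""
    return tuple(to_original_scale(v) for v in (prediction_log, lower_log, upper_log))


# A prediction of 1.2 on the log1p scale with a symmetric interval of +/- 0.3.
prediction, lower, upper = back_transform(1.2, 0.9, 1.5)
print(prediction, lower, upper)
```

Because the inverse is nonlinear, an interval that is symmetric on the log scale becomes asymmetric around the prediction in the original units.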

The final fitted models, along with their performance metrics, can be found in fitted_models_performance_v1.2.csv.

Validation plots are available in the out/plots folder.

OSSL Engine

The OSSL Engine is a web platform where users can upload spectra from the VisNIR (400-2500 nm), NIR (1350-2550 nm), or MIR (600-4000 cm^-1) ranges and get predictions back with uncertainty estimates and the representativeness flag. Please check it out!

Please check the example datasets to format your spectra to the minimum required level. You can provide either csv files, or asd and opus (.0) files directly for VisNIR and MIR scans, respectively.

We recommend using the OSSL model type when getting predictions. The KSSL model type is recommended only when the spectra to be predicted come from the same instrument manufacturer/model as the KSSL library and the target samples fall within the library's range of the soil properties of interest.

Using the OSSL models on your computer

Please follow the instructions provided in the ossl-nix GitHub repository for a fully reproducible local execution of the OSSL models.

Other resources

The OSSL is a public and growing database that is compiled by the Soil Spectroscopy for Global Good initiative.

A peer-reviewed and open-access publication is available for additional reference.

You can also visit the other open-access repositories in our GitHub organization.

