1) smiles_to_graphs.py
This file contains code to convert a list of SMILES strings (read from a text file) into the data matrix.
2) lasso_fits.py
This file contains the LASSO implementation used in the paper.
3) bootstrap_error_estimation.py
This file can be used to generate bootstrap samples of the mean absolute error (MAE) of the model.
4) helper_files.py
This file contains the plotting helper functions used in the paper.
5) final_model_lasso_EE.json, final_model_lasso_EH.json and final_model_lasso_EG.json
Each JSON file contains the details of one of the three models (EE, EH and EG) in the following pseudocode format:
{
'coefficients': model_obj.model_.coef_.tolist(),
'intercept': model_obj.model_.intercept_,
'alpha': model_obj.model_.alpha_,
'cv_alphas': model_obj.model_.alphas_.tolist(),
'cv_mse_path': model_obj.model_.mse_path_.tolist(),
'X_scale_mean': model_obj.X_scaler.mean_.tolist(),
'y_scale_mean': float(model_obj.y_scaler.mean_),
'X_scaler_std': model_obj.X_scaler.scale_.tolist(),
'y_scaler_std': float(model_obj.y_scaler.scale_) }
6) train_mols.txt and test_mols.txt
IDs of the training and test molecules. The nomenclature follows the indexing used in QM9.
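
The JSON model files in item 5 store everything needed to reproduce a prediction without the original model object. The sketch below shows one way this might be done. It assumes the LASSO model was fitted on standardized features and a standardized target (which the stored scaler statistics suggest), and that the standard deviation of the target is stored under a key called 'y_scaler_std'. That key name, the function names load_model and predict, and the toy values are illustrative assumptions, so check the keys actually present in the JSON files before relying on it.

```python
import json


def load_model(path):
    """Read one of the final_model_lasso_*.json files into a dict."""
    with open(path) as fh:
        return json.load(fh)


def predict(model, x, y_std_key="y_scaler_std"):
    """Predict the target for a single feature vector x (list of floats).

    y_std_key is an ASSUMED key name for the target standard deviation;
    pass the key that is actually used in the JSON file.
    """
    # Standardize the features with the stored mean and standard deviation.
    z = [
        (xi - mean) / std
        for xi, mean, std in zip(x, model["X_scale_mean"], model["X_scaler_std"])
    ]
    # Apply the linear model in standardized space.
    y_scaled = sum(c * zi for c, zi in zip(model["coefficients"], z))
    y_scaled += model["intercept"]
    # Undo the target scaling to return to the original units.
    return y_scaled * model[y_std_key] + model["y_scale_mean"]


# Toy model with made-up values, only to show the arithmetic.
toy_model = {
    "coefficients": [2.0, -1.0],
    "intercept": 0.5,
    "X_scale_mean": [1.0, 2.0],
    "X_scaler_std": [2.0, 4.0],
    "y_scale_mean": 10.0,
    "y_scaler_std": 3.0,
}
toy_prediction = predict(toy_model, [3.0, 6.0])
```

For a real model, load_model("final_model_lasso_EE.json") would replace the toy dictionary, and x would be one row of the data matrix produced by smiles_to_graphs.py.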