thermochemical-data-fusion

List of files


1) smiles_to_graphs.py
  This file contains code to convert a list of SMILES strings (read from a text file) into the data matrix.
2) lasso_fits.py
  This file contains the LASSO implementation used in the paper.
3) bootstrap_error_estimation.py
  This file can be used to generate a bootstrap sample of mean absolute error (MAE) for the model.
4) helper_files.py
  This file contains the plotting helper functions used in the paper.
5) final_model_lasso_EE.json, final_model_lasso_EH.json and final_model_lasso_EG.json
  Each JSON file contains the details of one of the three models (EE, EH and EG) in the following pseudocode format:
  {
   'coefficients': model_obj.model_.coef_.tolist(),
   'intercept': model_obj.model_.intercept_,
   'alpha': model_obj.model_.alpha_,
   'cv_alphas': model_obj.model_.alphas_.tolist(),
   'cv_mse_path': model_obj.model_.mse_path_.tolist(),
   'X_scale_mean': model_obj.X_scaler.mean_.tolist(),
   'y_scale_mean': float(model_obj.y_scaler.mean_),
   'X_scaler_std': model_obj.X_scaler.scale_.tolist(),
   'y_scaler_std': float(model_obj.y_scaler.scale_) }
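
As a rough illustration of how the stored fields could be combined, the sketch below applies a model dictionary of this shape to one raw feature vector. It rests on assumptions not confirmed by this README: that the coefficients were fitted on standardized features and a standardized target, and that the key holding the target standard deviation is named `y_scaler_std` (check this against the actual file). The `predict` function and the toy numbers are hypothetical and are not taken from the released models.

```python
import json


def predict(model, x):
    """Predict one target value from a raw (unscaled) feature vector.

    Assumes the model was fitted on standardized X and standardized y,
    so the input is scaled first and the output is un-scaled afterwards.
    """
    coefs = model["coefficients"]
    if len(x) != len(coefs):
        raise ValueError("feature vector length does not match coefficients")
    # Standardize each feature with the stored mean and standard deviation.
    z = [
        (xi - mean) / std
        for xi, mean, std in zip(x, model["X_scale_mean"], model["X_scaler_std"])
    ]
    # Linear model in standardized space.
    y_scaled = model["intercept"] + sum(c * zi for c, zi in zip(coefs, z))
    # Map back to the original target units.
    # "y_scaler_std" is an assumed key name.
    return y_scaled * model["y_scaler_std"] + model["y_scale_mean"]


# Toy stand-in for a model file; with a real file one would instead use:
#   with open("final_model_lasso_EE.json") as fh:
#       model = json.load(fh)
toy_json = json.dumps({
    "coefficients": [0.5, -0.25],
    "intercept": 0.0,
    "X_scale_mean": [1.0, 2.0],
    "X_scaler_std": [2.0, 4.0],
    "y_scale_mean": 10.0,
    "y_scaler_std": 2.0,
})
model = json.loads(toy_json)

prediction = predict(model, [3.0, 6.0])
print(prediction)
```

The feature vector passed to `predict` would have to be built in the same column order as the data matrix used for training.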

6) train_mols.txt and test_mols.txt
  IDs of the training and test molecules. The nomenclature follows the indexing used in QM9.
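
To make the idea behind the bootstrap error estimate (item 3) concrete, here is a minimal sketch that resamples per-molecule absolute errors with replacement and collects the MAE of each resample. It is not the code from `bootstrap_error_estimation.py`: the function names, the choice to resample the errors of a fixed model rather than refit, the percentile interval, and the toy values are all assumptions made for illustration.

```python
import random


def bootstrap_mae(y_true, y_pred, n_resamples=1000, seed=0):
    """Return a list of MAE values, one per bootstrap resample."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    rng = random.Random(seed)  # seeded for reproducibility
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_errors)
    maes = []
    for _ in range(n_resamples):
        # Draw n errors with replacement and average them.
        resample = rng.choices(abs_errors, k=n)
        maes.append(sum(resample) / n)
    return maes


def percentile_interval(values, lower=2.5, upper=97.5):
    """Approximate percentile interval using the nearest ordered value."""
    ordered = sorted(values)

    def pick(q):
        return ordered[round(q / 100 * (len(ordered) - 1))]

    return pick(lower), pick(upper)


# Toy reference and predicted values (hypothetical, not from the paper).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.8, 3.3, 3.9, 5.4]

maes = bootstrap_mae(y_true, y_pred, n_resamples=2000, seed=42)
low, high = percentile_interval(maes)
print(f"mean bootstrap MAE: {sum(maes) / len(maes):.3f}")
print(f"approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

The spread of the returned MAE values gives a sense of how sensitive the reported error is to the particular set of test molecules.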