
Bug transformer fitting #385

Merged 31 commits on Jan 15, 2025
Changes from 1 commit
- d44ee76 One transformer set for each fold and for train_valid dataset (stewarthe6, Dec 9, 2024)
- 10675c6 Returned all_training_datasets (stewarthe6, Dec 9, 2024)
- 71d0859 creating transformers now expects a DeepChem dataset instead of a mod… (stewarthe6, Dec 9, 2024)
- a2520a4 Added fold as a parameter to more functions, 'final' is the name of t… (stewarthe6, Dec 10, 2024)
- 18987d3 No more transformers in perf_data, No more transformers in EpochManag… (stewarthe6, Dec 10, 2024)
- 4689f9e Switched back to saving one .pkl for all transformers. The pkl is sti… (stewarthe6, Dec 11, 2024)
- cd14bd9 Fixed issue where _create_*_transformers sometimes would not return a… (stewarthe6, Dec 11, 2024)
- 00a74af Missing model_dataset argument (stewarthe6, Dec 11, 2024)
- 9cb3f4d Removed double nested list (stewarthe6, Dec 11, 2024)
- 72172a2 Updated transformer path in test (stewarthe6, Dec 11, 2024)
- 9863aef Removed fold argument and added backwards transformer functionality (stewarthe6, Dec 11, 2024)
- a40dffa Removed a few 'final' arguments that are no longer used (stewarthe6, Dec 11, 2024)
- 6332c60 Removed unused imports (stewarthe6, Dec 11, 2024)
- 398cf06 specified fold for embedding features (stewarthe6, Dec 11, 2024)
- 218f70d More tests for perf_data (stewarthe6, Dec 12, 2024)
- 78f7e35 Test to make sure transformers are fit correctly on training data only (stewarthe6, Dec 12, 2024)
- 804a62b Added check to make sure that every requested id in the subset has a… (stewarthe6, Dec 12, 2024)
- 23bd84f call get_untransformed_responses instead (stewarthe6, Dec 12, 2024)
- 61c18fd Cache the untransformed response dict (stewarthe6, Dec 13, 2024)
- 9c8abc3 Should not have to pass a 'final' argument (stewarthe6, Dec 13, 2024)
- a7ed96a Weights and y should be the same shape (stewarthe6, Dec 13, 2024)
- b0940ad dataset transformation moved into generate_predictions() (paulsonak, Dec 13, 2024)
- a7eb892 zero out transformed values that are larger than 1e30 (paulsonak, Dec 13, 2024)
- d1b25d8 get_untransformed_responses returns an array, not a dictionary (stewarthe6, Dec 16, 2024)
- 74c1953 sped up and updated the test (stewarthe6, Dec 16, 2024)
- 58c454d Updated transformer test to correctly test the standard deviation and… (stewarthe6, Dec 16, 2024)
- d17bfc4 update large values to be capped at 1e30 (paulsonak, Dec 18, 2024)
- d5b3b55 Added a test for kfold cross validation transformers (stewarthe6, Dec 19, 2024)
- 069fa28 Merge branch 'bug_transformer_fitting' of github.com:ATOMScience-org/… (stewarthe6, Dec 19, 2024)
- 3fb2f45 Test for y transformers (stewarthe6, Dec 19, 2024)
- d309761 Removed unused 'fold' parameter. Added documentation for this PR (stewarthe6, Jan 7, 2025)
Removed a few 'final' arguments that are no longer used
stewarthe6 committed Dec 11, 2024
commit a40dffaade71cecb9a7b770843deaaa6582587fc
atomsci/ddm/pipeline/perf_plots.py (4 changes: 2 additions & 2 deletions)

```diff
@@ -138,7 +138,7 @@ def plot_pred_vs_actual(model, epoch_label='best', threshold=None, error_bars=Fa
     subs_pred_df=pred_df[pred_df.subset==subset]

     perf_data = wrapper.get_perf_data(subset, epoch_label)
-    pred_results = perf_data.get_prediction_results('final')
+    pred_results = perf_data.get_prediction_results()
     # pred_df=pfm.predict_from_pipe(model)
     std=len([x for x in pred_df.columns if 'std' in x]) > 0
     if perf_data.num_tasks > 1:
@@ -164,7 +164,7 @@ def plot_pred_vs_actual(model, epoch_label='best', threshold=None, error_bars=Fa
     # % binding / inhibition data, with one row per subset.
     for s, subset in enumerate(subsets):
         perf_data = wrapper.get_perf_data(subset, epoch_label)
-        pred_results = perf_data.get_prediction_results('final')
+        pred_results = perf_data.get_prediction_results()
         y_actual = perf_data.get_real_values('final')
         ids, y_pred, y_std = perf_data.get_pred_values('final')
         r2 = pred_results['r2_score']
```
atomsci/ddm/test/integrative/hybrid/test_hybrid.py (2 changes: 1 addition & 1 deletion)

```diff
@@ -52,7 +52,7 @@ def test():

     print("Check the model performance on validation data")
     pred_data = pl.model_wrapper.get_perf_data(subset="valid", epoch_label="best")
-    pred_results = pred_data.get_prediction_results('final')
+    pred_results = pred_data.get_prediction_results()
     print(pred_results)

     pred_score = pred_results['r2_score']
```
atomsci/ddm/utils/hyperparam_search_wrapper.py (2 changes: 1 addition & 1 deletion)

```diff
@@ -1564,7 +1564,7 @@ def lossfn(p):
     for subset in subsets:
         if not model_failed:
             perf_data = pl.model_wrapper.get_perf_data(subset=subset, epoch_label="best")
-            sub_pred_results = perf_data.get_prediction_results('final')
+            sub_pred_results = perf_data.get_prediction_results()
         else:
             if tparam.prediction_type == "regression":
                 sub_pred_results = {"r2_score": 0, "rms_score": 100}
```
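Across all three files, this commit makes the same change: call sites stop passing the `'final'` fold label to `get_prediction_results()`. A minimal sketch of the before/after call pattern, using a hypothetical `PerfDataStub` (the returned scores are placeholder values, not real model output):

```python
# Hypothetical stand-in for the pipeline's perf_data object; it mirrors
# only the call pattern changed in this commit, not the real logic.
class PerfDataStub:
    def get_prediction_results(self):
        # Before this commit, call sites passed a fold label:
        #     perf_data.get_prediction_results('final')
        # After this commit, the argument is gone:
        #     perf_data.get_prediction_results()
        return {"r2_score": 0.87, "rms_score": 0.42}  # placeholder scores

perf_data = PerfDataStub()
pred_results = perf_data.get_prediction_results()  # no 'final' argument
print(pred_results["r2_score"])
```

Note that in the perf_plots.py diff, `get_real_values('final')` and `get_pred_values('final')` still take the fold label; only `get_prediction_results()` drops it in this commit.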