HyperparamLog.txt
Stage 1: lr selection. Going with 8e-5 from now on; 5e-5 through 9e-5 perform almost identically.
------------------------------------------------------------------------
5e-5 40 -> 4 to 9 (6)
1e-5 40 -> 4 to 9 (6)
3.3*12 -> 48hrs
adamw
5e-5 fold 4 done but not saved locally
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_40_5e-05_seed_0_fold_4_2024-07-21-18-53-00
1e-5 fold 4 done but not saved locally
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_40_1e-05_seed_0_fold_4_2024-07-21-22-33-46
9e-5 40 -> 8 to 9 (2)
9e-6 40 -> 5 to 9 (5)
8e-5 40 -> 4 to 9 (6)
3.3*13 -> 48hrs
adamw
40_8e-05 -> fold 4 done but not locally.
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_40_8e-05_seed_0_fold_4_2024-07-22-05-59-21
40 9e-5 -> fold 8 done but not locally.
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_40_9e-05_seed_0_fold_8_2024-07-21-18-32-42
40_9e-06_seed_0_fold_5 -> fold 5 done but not locally.
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_40_9e-06_seed_0_fold_5_2024-07-22-00-12-30
Smaller is better.
The search essentially reduces to 2 learning-rate groups, since the losses are nearly identical within each group:
1e-5 -> 1e-5, 9e-6
5e-5 -> 5e-5, 9e-5, 8e-5
==============================================================================================================================================
Stage 2: Schedule-free optimizer with the above lrs + gradient clipping.
------------------------------------------------------------------------
8e-5
(gigapath) (base) [vsharm44@vscode1 break-pathology]$ sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 4
Submitted batch job 3740456
(gigapath) (base) [vsharm44@vscode1 break-pathology]$ sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 5 --end_fold 9
Submitted batch job 3740457
5e-6
(gigapath) (base) [vsharm44@vscode1 break-pathology]$ sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 5 --end_fold 9 --lr 0.000005
Submitted batch job 3740462
(gigapath) (base) [vsharm44@vscode1 break-pathology]$ sbatch finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 4 --lr 0.000005
Submitted batch job 3740476
* Schedule-free optimizer -> works cleanly.
* Try gradient clipping (on unscaled gradients) with different norm thresholds.
Currently at 1.0; experiment from 0.1 to 5.
The blue run on wandb is the gradient-clipped one.
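The clipping being swept here is global-norm gradient clipping (what torch.nn.utils.clip_grad_norm_ does). A minimal pure-Python sketch of the operation, with the threshold values taken from the sweep above:

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale a flat list of gradient values so their global L2 norm does not
    exceed max_norm (mirrors torch.nn.utils.clip_grad_norm_'s behavior)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + eps)
        grads = [g * scale for g in grads]
    return grads, total_norm

# Thresholds under consideration: 0.1 to 5 (currently 1.0).
clipped, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0 -> rescaled to ~1.0
```

Clipping the unscaled gradients (as noted above) means the norm is measured after undoing any loss-scaler multiplication, so the threshold has a consistent meaning across precisions.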
3702680: 3hrs, quadrortx, Quadro RTX 6000, gpu2002.oscar.ccv.brown.edu
--------
epoch 20 lr 0.00008
## Job Started : Mon Jul 22 01:45:54 AM EDT 2024
'adamw_schedulefree'
fold 0 done on wandb but not locally.
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_20_8e-05_seed_0_fold_0_2024-07-22-01-46-16
3702159: 1.5 hrs, a5500, NVIDIA RTX A5500, gpu2250.oscar.ccv.brown.edu
--------
20 8e-05
2024-07-22-00-49-48, ## Job Started : Mon Jul 22 12:49:22 AM EDT 2024
'adamw_schedulefree'
fold 0 done on wandb but not locally.
View run at https://wandb.ai/serrelab/prov-gigapath/runs/TCGA_eval_pretrained_mutation_5_gene_pat_strat_20_8e-05_seed_0_fold_0_2024-07-22-00-49-48
==============================================================================================================================================
Stage 3: Increased epochs + top-10 checkpoints kept
---------------------------------------------------------------------------------------------------------------------------------
Schedule-free optimizer with the above lr + gradient clipping.
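"Top 10 checkpoints" here means retaining only the 10 best checkpoints by validation score. A minimal sketch of that bookkeeping (the actual training script's logic may differ; class and variable names are hypothetical):

```python
import heapq

class TopKCheckpoints:
    """Track the k best (score, path) checkpoints. A min-heap keeps the
    worst retained checkpoint at the root, so it is the first evicted."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of (score, path)

    def update(self, score, path):
        """Register a new checkpoint; return the path that should be
        deleted from disk, or None if nothing needs deleting."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, path))
            return None
        if score > self.heap[0][0]:
            _, evicted = heapq.heapreplace(self.heap, (score, path))
            return evicted
        return path  # new checkpoint did not make the top k

    def best(self):
        return max(self.heap)
```

For example, with k=3, pushing scores 0.5, 0.7, 0.6 keeps all three; a fourth checkpoint at 0.8 evicts the 0.5 one, while a fifth at 0.4 is discarded immediately.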
sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 1
sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 2 --end_fold 3
sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 4 --end_fold 5
sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 6 --end_fold 7
sbatch -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 8 --end_fold 9
3949003
3949004
3949005
3949006
3949007
sbatch -J ft_longnet_seed40 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 1 --seed 40
sbatch -J ft_longnet_seed40 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 2 --end_fold 3 --seed 40
sbatch -J ft_longnet_seed40 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 4 --end_fold 5 --seed 40
sbatch -J ft_longnet_seed40 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 6 --end_fold 7 --seed 40
sbatch -J ft_longnet_seed40 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 8 --end_fold 9 --seed 40
3949019 ft_longnet_seed40 gpu carney-gcondo 4 1-02:00:00 1-01:57:55 gpu2107
3949020 ft_longnet_seed40 gpu carney-gcondo 4 1-02:00:00 1-01:57:55 gpu2107
3949021 ft_longnet_seed40 gpu carney-gcondo 4 1-02:00:00 1-01:57:55 gpu2111
3949022 ft_longnet_seed40 gpu carney-gcondo 4 1-02:00:00 1-01:57:55 gpu2112
3949023 ft_longnet_seed40 gpu carney-gcondo 4 1-02:00:00 1-01:57:55 gpu2113
sbatch -J ft_longnet_seed80 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 1 --seed 80
sbatch -J ft_longnet_seed80 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 2 --end_fold 3 --seed 80
sbatch -J ft_longnet_seed80 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 4 --end_fold 5 --seed 80
sbatch -J ft_longnet_seed80 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 6 --end_fold 7 --seed 80
sbatch -J ft_longnet_seed80 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 8 --end_fold 9 --seed 80
3949024 ft_longnet_seed80 gpu carney-gcondo 4 1-02:00:00 1-01:58:24 gpu2113
3949025 ft_longnet_seed80 gpu carney-gcondo 4 1-02:00:00 1-01:58:24 gpu2116
3949026 ft_longnet_seed80 gpu carney-gcondo 4 1-02:00:00 1-01:58:24 gpu2501
3949027 ft_longnet_seed80 gpu carney-gcondo 4 1-02:00:00 1-01:58:24 gpu2501
3949028 ft_longnet_seed80 gpu carney-gcondo 4 1-02:00:00 1-01:58:24 gpu2505
sbatch -J ft_longnet_seed120 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 1 --seed 120
sbatch -J ft_longnet_seed120 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 2 --end_fold 3 --seed 120
sbatch -J ft_longnet_seed120 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 4 --end_fold 5 --seed 120
sbatch -J ft_longnet_seed120 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 6 --end_fold 7 --seed 120
sbatch -J ft_longnet_seed120 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 8 --end_fold 9 --seed 120
3949032 ft_longnet_seed120 gpu carney-gcondo 4 1-02:00:00 1-01:59:04 gpu2604
3949033 ft_longnet_seed120 gpu carney-gcondo 4 1-02:00:00 1-01:59:04 gpu2604
3949034 ft_longnet_seed120 gpu carney-gcondo 4 1-02:00:00 1-01:59:04 gpu2506
3949035 ft_longnet_seed120 gpu carney-gcondo 4 1-02:00:00 1-01:59:04 gpu2506
3949036 ft_longnet_seed120 gpu carney-gcondo 4 1-02:00:00 1-01:59:04 gpu2606
sbatch -J ft_longnet_seed181 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 0 --end_fold 1 --seed 181
sbatch -J ft_longnet_seed181 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 2 --end_fold 3 --seed 181
sbatch -J ft_longnet_seed181 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 4 --end_fold 5 --seed 181
sbatch -J ft_longnet_seed181 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 6 --end_fold 7 --seed 181
sbatch -J ft_longnet_seed181 -A carney-tserre-condo finetune_provpath.sh --data TCGA --start_fold 8 --end_fold 9 --seed 181
3949039 ft_longnet_seed181 gpu carney-gcondo 4 1-02:00:00 1-01:59:53 gpu2607
3949040 ft_longnet_seed181 gpu carney-gcondo 4 1-02:00:00 1-01:59:53 gpu2607
3949041 ft_longnet_seed181 gpu carney-gcondo 4 1-02:00:00 1-01:59:53 gpu2607
3949042 ft_longnet_seed181 gpu carney-gcondo 4 1-02:00:00 1-01:59:53 gpu2607
3949043 ft_longnet_seed181 gpu carney-gcondo 4 1-02:00:00 1-01:59:53 gpu2607
==============================================================================================================================================
Stage 4:
Epochs 70 + top-10 checkpoints kept. Raised the epoch count since scores were still improving and losses were still trending downward.
Slide encoder layer: Trying all 13 layers.
The train set has all slides per patient; val/test keep only the single largest slide (by number of tiles) per patient.
Commit = d0bc08b9590710f15f527a433d39745f5e4ef70f
----------------------------------------------------------------------------------------------------------------------------------------------
slide encoder layers: 0 to 12; for multiple layers use e.g. 0-5 (the 0th and 5th layers, not a range)
dict_keys(['layer_0_embed', 'layer_1_embed', 'layer_2_embed', 'layer_3_embed', 'layer_4_embed', 'layer_5_embed', 'layer_6_embed', 'layer_7_embed', 'layer_8_embed', 'layer_9_embed', 'layer_10_embed', 'layer_11_embed', 'layer_12_embed', 'last_layer_embed'])
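Per the note above, a dash-separated spec like 0-5 selects individual layers (the 0th and 5th), not a range. A hypothetical helper mapping such a spec onto the embedding-dict keys listed above (the real --featlayer parsing in the script may differ):

```python
def featlayer_keys(spec):
    """Map a featlayer spec onto slide-encoder embedding dict keys.
    '5'   -> ['layer_5_embed']
    '0-5' -> ['layer_0_embed', 'layer_5_embed']
    Dash-separated indices select individual layers, not a range."""
    return [f"layer_{int(i)}_embed" for i in str(spec).split("-")]
```

The selected embeddings would then be looked up in the dict above and concatenated (or averaged) before the classification head.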
# with seed 0
sbatch -J ft_l0_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 0 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l1_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 1 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l2_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 2 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l3_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 3 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l5_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 5 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l6_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 6 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l8_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 8 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l4_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 4 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l7_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 7 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l9_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 9 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l10_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 10 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l11_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 11 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l12_s0 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 12 --epoch 70 --start_fold 0 --end_fold 1
# with another seed 40
sbatch -J ft_l0_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 0 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l1_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 1 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l2_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 2 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l3_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 3 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l4_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 4 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l5_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 5 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l7_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 7 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l6_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 6 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l8_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 8 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l9_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 9 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l10_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 10 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l11_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 11 --epoch 70 --start_fold 0 --end_fold 1
sbatch -J ft_l12_s40 -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 40 --featlayer 12 --epoch 70 --start_fold 0 --end_fold 1
[vsharm44@vscode1 break-pathology]$ myjobinfo | grep ft
4131018 ft_l0_s0 2024-08-09T19:24:26 COMPLETED 14:43:01 60G
4131019 ft_l1_s0 2024-08-09T19:24:26 COMPLETED 15:12:25 60G
4131020 ft_l2_s0 2024-08-09T19:24:26 COMPLETED 16:05:46 60G
4131021 ft_l3_s0 2024-08-09T19:24:26 COMPLETED 16:26:10 60G
4131022 ft_l4_s0 2024-08-09T19:24:26 RUNNING 20:01:36 60G
4131023 ft_l5_s0 2024-08-09T19:24:26 COMPLETED 09:01:34 60G
4131024 ft_l6_s0 2024-08-09T19:24:26 COMPLETED 09:25:40 60G
4131025 ft_l7_s0 2024-08-09T19:24:26 COMPLETED 19:53:43 60G
4131026 ft_l8_s0 2024-08-09T19:24:26 COMPLETED 10:18:59 60G
4131028 ft_l10_s0 2024-08-09T19:24:26 RUNNING 20:01:36 60G
4131029 ft_l11_s0 2024-08-09T19:24:26 RUNNING 20:01:36 60G
4131031 ft_l0_s40 2024-08-09T19:24:37 COMPLETED 14:12:50 60G
4131032 ft_l1_s40 2024-08-09T19:24:37 COMPLETED 14:34:23 60G
4131033 ft_l2_s40 2024-08-09T19:24:37 COMPLETED 15:46:35 60G
4131034 ft_l3_s40 2024-08-09T19:24:37 COMPLETED 16:51:04 60G
4131035 ft_l4_s40 2024-08-09T19:24:37 COMPLETED 17:41:57 60G
4131038 ft_l7_s40 2024-08-09T19:24:37 RUNNING 20:01:25 60G
4131040 ft_l9_s40 2024-08-09T19:24:37 RUNNING 20:01:25 60G
4131041 ft_l10_s40 2024-08-09T19:24:37 RUNNING 20:01:25 60G
4131042 ft_l11_s40 2024-08-09T19:24:37 RUNNING 20:01:25 60G
4131043 ft_l12_s40 2024-08-09T19:24:37 RUNNING 20:01:25 60G
4131318 ft_l9_s0 2024-08-09T19:38:36 RUNNING 19:47:26 60G
4131320 ft_l12_s0 2024-08-09T19:40:35 RUNNING 19:45:27 60G
4131322 ft_l5_s40 2024-08-09T19:40:47 COMPLETED 18:06:24 60G
4131328 ft_l6_s40 2024-08-09T19:40:57 RUNNING 19:45:05 60G
4131329 ft_l8_s40 2024-08-09T19:41:04 RUNNING 19:44:58 60G
The patient stratification logic doesn't work: the model starts overfitting (val loss increases after ~25 epochs while train loss keeps decreasing), and none of the runs reaches the test AUROC reported in the paper.
(2 seeds x 2 folds x 13 layers = 52 runs.)
==============================================================================================================================================
Stage 5:
Epochs 50 with top-5 checkpointing. Reduced from 70 to 50 because the model was overfitting beyond that point.
Slide encoder layers: 1 to 12. Layer 0 has shown no learning, but that may have been due to the flawed (overfitting) experiment design.
The train/val/test sets now all keep only the single largest slide (by number of tiles) per patient.
Commits (major commits marking the end of TCGA-LUAD):
548d921671dbe47a76eefc22c25d8d66927a15d2
5704ce83ca69ed95f88e4b513d77adde6a7f42c3
seeds 0, 40, 140, 200, 250, 300, 350, 400, 450, 500
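The one-largest-slide-per-patient filter can be sketched as follows (field names here are hypothetical; the actual dataloader code may differ):

```python
def largest_slide_per_patient(slides):
    """slides: list of dicts with 'patient', 'slide_id', 'num_tiles'.
    Keep, for each patient, only the slide with the most tiles."""
    best = {}
    for s in slides:
        cur = best.get(s["patient"])
        if cur is None or s["num_tiles"] > cur["num_tiles"]:
            best[s["patient"]] = s
    return list(best.values())
```

Applying this to train as well as val/test (unlike Stage 4, where train kept all slides) keeps the per-patient sampling consistent across splits.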
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sbatch -J ft_l0_s0_f0_strat -A carney-tserre-condo --mem=60G -t 30:00:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 0 --epoch 70 --start_fold 0 --end_fold 1 --pat_strat 1 --test_strat 1 --val_strat 1 --train_strat
sbatch -J test -A carney-tserre-condo -t 00:10:00 finetune_provpath.sh --data TCGA --seed 0 --featlayer 1 --start_fold 0 --end_fold 0 --pat_strat 1 --test_strat 1 --val_strat 1 --train_strat 1 --epoch_dryrun 1 --batch_dryrun 1