
Fully randomized grids; Lazy-loading model dictionary #69

Merged: 31 commits, Oct 27, 2024
- d1b0678: update parameter options for completely random rotation angle of ense… (chenyangkang, Oct 24, 2024)
- 8826ea1: pre-commit syntax correction (chenyangkang, Oct 24, 2024)
- 93ed961: add more pytest on the completely_random_rotation; #59 (chenyangkang, Oct 24, 2024)
- 9556f6e: update badge (chenyangkang, Oct 24, 2024)
- c26e686: add lazyloading model dictionary choices; Save ensembles of models to… (chenyangkang, Oct 25, 2024)
- 5f878da: fix tests (chenyangkang, Oct 25, 2024)
- ec8cc91: fix test (chenyangkang, Oct 25, 2024)
- 99f0bb4: add test for lazyloading (chenyangkang, Oct 26, 2024)
- 26f3271: fix (chenyangkang, Oct 26, 2024)
- da34854: lazy_loading/saving dir name no longer controlled by random_state para… (chenyangkang, Oct 26, 2024)
- 62cdaae: fix (chenyangkang, Oct 26, 2024)
- cf079e6: update (chenyangkang, Oct 26, 2024)
- 55e7dfe: fix (chenyangkang, Oct 26, 2024)
- 3ebe936: change njobs to n_jobs, following the sklearn convention (chenyangkang, Oct 26, 2024)
- 690ac08: add test for Hurdle_for_AdaSTEM (chenyangkang, Oct 26, 2024)
- d573274: update tests to cover more (chenyangkang, Oct 26, 2024)
- a7740f1: fix tests (chenyangkang, Oct 26, 2024)
- 9cf0b70: fix tests; fix AdaSTEM score method (chenyangkang, Oct 27, 2024)
- 045f552: lazy loading pytests (chenyangkang, Oct 27, 2024)
- 07bd202: fix (chenyangkang, Oct 27, 2024)
- 577637d: fix n_jobs (chenyangkang, Oct 27, 2024)
- d0c4b0e: fix (chenyangkang, Oct 27, 2024)
- d38ff40: fix (chenyangkang, Oct 27, 2024)
- 8fdba80: fix (chenyangkang, Oct 27, 2024)
- be81716: update tests (chenyangkang, Oct 27, 2024)
- 522d384: update doc1 (chenyangkang, Oct 27, 2024)
- 7f0c050: update (chenyangkang, Oct 27, 2024)
- 853ca39: fix (chenyangkang, Oct 27, 2024)
- 4f466eb: fix (chenyangkang, Oct 27, 2024)
- df392d1: update version (chenyangkang, Oct 27, 2024)
- 2f6dc8f: update lazyloading docs (chenyangkang, Oct 27, 2024)
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fix n_jobs
chenyangkang committed Oct 27, 2024
commit 577637df95a0afc3c300e9c59b548e01905b5fdd
5 changes: 3 additions & 2 deletions docs/A_brief_introduction/A_brief_introduction.md
@@ -37,7 +37,7 @@ In the first case, the classifier and regressor "talk" to each other in each sep
User can define the size of the stixels (spatial temporal grids) in terms of space and time. Larger stixel promotes generalizability but loses precision in fine resolution; Smaller stixel may have better predictability in the exact area but reduced ability of extrapolation for points outside the stixel. See section [Optimizing stixel size](https://chenyangkang.github.io/stemflow/Examples/07.Optimizing_stixel_size.html) for discussion about selecting gridding parameters and [Tips for spatiotemporal indexing](https://chenyangkang.github.io/stemflow/Tips/Tips_for_spatiotemporal_indexing.html).

## A simple demo
- In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`njobs=1`).
+ In the demo, we first split the training data using temporal sliding windows with a size of 50 day of year (DOY) and step of 20 DOY (`temporal_start = 1`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval=50`). For each temporal slice, a spatial gridding is applied, where we force the stixel to be split into smaller 1/4 pieces if the edge is larger than 25 units (measured in longitude and latitude, `grid_len_upper_threshold=25`), and stop splitting to prevent the edge length being chunked below 5 units (`grid_len_lower_threshold=5`) or containing less than 50 checklists (`points_lower_threshold=50`). Model fitting is run using 1 core (`n_jobs=1`).

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 (`min_ensemble_required = 7`) ensembles usable are predicted (otherwise, set as `np.nan`).
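The temporal sliding-window scheme described in this passage can be sketched in plain Python. This is an illustration of the window arithmetic only, using the parameter names from the text, not stemflow's internal implementation:

```python
# Illustrative sketch of the temporal sliding windows described above:
# windows of width `temporal_bin_interval` DOY, starting every
# `temporal_step` DOY between `temporal_start` and `temporal_end`.
# (Parameter names follow the docs; this is not stemflow's own code.)

def temporal_windows(temporal_start=1, temporal_end=366,
                     temporal_step=20, temporal_bin_interval=50):
    """Return (start, end) DOY pairs for each temporal slice."""
    windows = []
    start = temporal_start
    while start < temporal_end:
        windows.append((start, min(start + temporal_bin_interval, temporal_end)))
        start += temporal_step
    return windows

print(temporal_windows()[:3])  # → [(1, 51), (21, 71), (41, 91)]
```

Each stixel-level model is then trained on the points falling inside one such window after spatial gridding.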

@@ -68,7 +68,8 @@ model = AdaSTEMRegressor(
Spatio2='latitude', # spatial coordinates shown in the dataframe
Temporal1='DOY',
use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor
- njobs=1
+ n_jobs=1,
+ random_state=42
)
```

19 changes: 19 additions & 0 deletions docs/Examples/01.AdaSTEM_demo.ipynb
@@ -2985,6 +2985,25 @@
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tarfile \n",
" \n",
"# open file \n",
"file = tarfile.open('test.tar.gz') \n",
"print(file.getnames()) \n",
" \n",
"# extract files \n",
"file.extractall('./Destination_FolderName') \n",
" \n",
"# close file \n",
"file.close() "
]
},
{
"cell_type": "markdown",
"metadata": {},
19 changes: 19 additions & 0 deletions docs/Examples/08.Lazy_loading.ipynb
@@ -0,0 +1,19 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
4 changes: 2 additions & 2 deletions docs/Tips/Tips_for_different_tasks.md
@@ -37,7 +37,7 @@ model = AdaSTEMClassifier(
Spatio2='proj_lat',
Temporal1='DOY',
use_temporal_to_train=True,
- njobs=1
+ n_jobs=1
)
```

@@ -103,7 +103,7 @@ model = AdaSTEMRegressor(
Spatio2='proj_lat',
Temporal1='DOY',
use_temporal_to_train=True,
- njobs=1
+ n_jobs=1
)
```
Correspondingly, you would use a set of metrics for the regression problem:
10 changes: 5 additions & 5 deletions docs/Tips/Tips_for_spatiotemporal_indexing.md
@@ -57,7 +57,7 @@ model = AdaSTEMClassifier(
Spatio2='proj_lat',
Temporal1='Week',
use_temporal_to_train=True, # In each stixel, whether 'Week' should be a predictor
- njobs=1
+ n_jobs=1
)
```

@@ -100,7 +100,7 @@ model = AdaSTEMClassifier(
Spatio2='proj_lat',
Temporal1='Week',
use_temporal_to_train=True,
- njobs=1
+ n_jobs=1
)
```

@@ -132,7 +132,7 @@ model = AdaSTEMClassifier(
Spatio2='proj_lat',
Temporal1='Week',
use_temporal_to_train=True,
- njobs=1
+ n_jobs=1
)
```

@@ -161,7 +161,7 @@ model = STEMClassifier(
Spatio2='proj_lat',
Temporal1='Week',
use_temporal_to_train=True,
- njobs=1
+ n_jobs=1
)
```

@@ -194,7 +194,7 @@ model = SphereAdaSTEMRegressor(
points_lower_threshold=50, # Only stixels with more than 50 samples are trained
Temporal1='DOY',
use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor
- njobs=1
+ n_jobs=1
)
```

3 changes: 2 additions & 1 deletion docs/index.md
@@ -157,7 +157,8 @@ model = AdaSTEMRegressor(
Spatio2='latitude', # spatial coordinates shown in the dataframe
Temporal1='DOY',
use_temporal_to_train=True, # In each stixel, whether 'DOY' should be a predictor
- njobs=1
+ n_jobs=1,
+ random_state=42
)
```

14 changes: 7 additions & 7 deletions stemflow/model/Hurdle.py
@@ -217,7 +217,7 @@ def fit(self, X_train: Union[pd.core.frame.DataFrame, np.ndarray], y_train: Sequ
def predict(
self,
X_test: Union[pd.core.frame.DataFrame, np.ndarray],
- njobs: int = 1,
+ n_jobs: int = 1,
verbosity: int = 1,
return_by_separate_ensembles: bool = False,
) -> np.ndarray:
@@ -226,7 +226,7 @@ def predict(
Args:
X_test:
Test variables
- njobs:
+ n_jobs:
Multi-processing in prediction.
verbosity:
Whether to show progress bar. 0 for No, and Yes other wise.
@@ -238,17 +238,17 @@ def predict(
"""
if verbosity == 0:
cls_res = self.classifier.predict(
- X_test, njobs=njobs, verbosity=0, return_by_separate_ensembles=return_by_separate_ensembles
+ X_test, n_jobs=n_jobs, verbosity=0, return_by_separate_ensembles=return_by_separate_ensembles
)
reg_res = self.regressor.predict(
- X_test, njobs=njobs, verbosity=0, return_by_separate_ensembles=return_by_separate_ensembles
+ X_test, n_jobs=n_jobs, verbosity=0, return_by_separate_ensembles=return_by_separate_ensembles
)
else:
cls_res = self.classifier.predict(
- X_test, njobs=njobs, verbosity=1, return_by_separate_ensembles=return_by_separate_ensembles
+ X_test, n_jobs=n_jobs, verbosity=1, return_by_separate_ensembles=return_by_separate_ensembles
)
reg_res = self.regressor.predict(
- X_test, njobs=njobs, verbosity=1, return_by_separate_ensembles=return_by_separate_ensembles
+ X_test, n_jobs=n_jobs, verbosity=1, return_by_separate_ensembles=return_by_separate_ensembles
)
# reg_res = np.where(reg_res>=0, reg_res, 0) ### we constrain the reg value to be positive
res = np.where(cls_res < 0.5, 0, cls_res)
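This hunk combines the two stages of the hurdle model: classifier outputs below 0.5 are zeroed before the regressor's value is applied. A hedged NumPy sketch of that kind of combination, using the `cls_res`/`reg_res` names from the diff (stemflow's exact combination may differ — the line above thresholds the classifier output first):

```python
import numpy as np

# Illustrative hurdle combination: where the classifier predicts absence
# (score below 0.5) the final prediction is 0; elsewhere the regressor's
# abundance prediction is used. Not stemflow's exact method.
cls_res = np.array([0.1, 0.7, 0.9, 0.3])  # hypothetical occurrence scores
reg_res = np.array([5.0, 2.0, 8.0, 1.0])  # hypothetical abundance values

res = np.where(cls_res < 0.5, 0.0, reg_res)
print(res)  # → [0. 2. 8. 0.]
```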
@@ -267,7 +267,7 @@ def predict_proba(
Args:
X_test:
Testing variables
njobs:
n_jobs:
Multi-processing in prediction.
verbosity:
Whether to show progress bar. 0 for No, and Yes other wise.