Improve handling for splitter issue #134 #135

Merged
merged 1 commit into from
Nov 21, 2023
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,5 +1,6 @@
penv/
env/
venv/
.vscode/settings.json
coco_instances_val2017.json
mypythonlib.egg-info/
@@ -19,4 +20,4 @@ road_sign_data.yaml
BCCD_Dataset/
model_training/
__pycache__
samples
samples
28 changes: 27 additions & 1 deletion pylabel/splitter.py
@@ -22,8 +22,18 @@ def GroupShuffleSplit(
):
"""
This function uses the GroupShuffleSplit command from sklearn. It can split into 3 groups (train,
test, and val) by applying the command twice.
test, and val) by applying the command twice. If you want to split into only 2 groups (train and test),
then set val_pct to 0.
"""

# Check inputs and raise errors if needed
assert 0 < float(train_pct) < 1, "train_pct must be between 0 and 1"
assert 0 < float(test_pct) < 1, "test_pct must be between 0 and 1"
# check that the sum of train_pct, test_pct, and val_pct is equal to 1
assert (
round(train_pct + test_pct + val_pct, 1) == 1
), "Sum of train_pct, test_pct, and val_pct must equal 1."

df_main = self.dataset.df
gss = sklearnGroupShuffleSplit(
n_splits=1, train_size=train_pct, random_state=random_state
@@ -69,6 +79,22 @@ def StratifiedGroupShuffleSplit(
train, test, or val. When a split dataset is exported the annotations will be split into
separate groups so that they can be used in model training, testing, and validation.
"""

# Check inputs and raise errors if needed
assert (
0 <= float(train_pct) <= 1
), "train_pct must be greater than or equal to 0 and less than or equal to 1"
assert (
0 <= float(test_pct) <= 1
), "test_pct must be greater than or equal to 0 and less than or equal to 1"
assert (
0 <= float(val_pct) <= 1
), "val_pct must be greater than or equal to 0 and less than or equal to 1"
# check that the sum of train_pct, test_pct, and val_pct is equal to 1
assert (
round(train_pct + test_pct + val_pct, 1) == 1
), "Sum of train_pct, test_pct, and val_pct must equal 1."

df_main = self.dataset.df
df_main = df_main.reindex(
np.random.permutation(df_main.index)
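Both splitters in this diff validate their inputs with `assert` and compare `round(train_pct + test_pct + val_pct, 1)` to 1. Rounding to one decimal place accepts sums anywhere from roughly 0.95 to 1.05, and `assert` statements are skipped when Python runs with `-O`. In the `GroupShuffleSplit` hunk as shown, `val_pct` also has no range check of its own, so a negative value can pass as long as the sum rounds to 1. The sketch below shows one possible stricter check. `validate_split_pcts` is a hypothetical helper that is not part of pylabel, and the tolerance is an assumption rather than anything the PR specifies.

```python
import math


def validate_split_pcts(train_pct, test_pct, val_pct=0.0):
    """Hypothetical helper (not in pylabel): validate split percentages.

    Raises ValueError instead of using assert, so the checks still run
    when Python is started with -O.
    """
    pcts = {
        "train_pct": float(train_pct),
        "test_pct": float(test_pct),
        "val_pct": float(val_pct),
    }

    # Each percentage must lie in the closed interval [0, 1].
    for name, value in pcts.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Compare the sum to 1 with a small tolerance (an assumed value) that
    # absorbs floating-point error, e.g. 0.7 + 0.2 + 0.1 != 1.0 exactly.
    total = sum(pcts.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(
            f"Sum of train_pct, test_pct, and val_pct must equal 1, got {total}"
        )

    return pcts["train_pct"], pcts["test_pct"], pcts["val_pct"]


# A two-way split: val_pct is 0, as the updated docstring suggests.
validate_split_pcts(0.8, 0.2, 0)

# The one-decimal rounding check in the diff would accept this sum of 0.96;
# the helper above rejects it.
try:
    validate_split_pcts(0.5, 0.3, 0.16)
except ValueError as err:
    print(err)
```

Whether the looser rounding is intentional (for example, to tolerate inputs like 0.33, 0.33, 0.33) is not stated in the PR, so treat the tolerance here as a design choice to adjust.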