[Doc] Add Imbalance label guide and reorg (#1176)

*Issue #, if available:*

*Description of changes:* This PR adds one new document on handling imbalanced labels in classification and regression. It also reorganizes `Advanced Topics` slightly by removing the advanced-usage doc and splitting its contents into two separate docs. The PR renames `Advanced Topics` to `Practical & Advanced Guides` to reflect the complexity under this category, and adds the new docs to the index page's `Practical and Advanced Guides` section.

Preview link: http://james4graphstorm.readthedocs.io/en/james_adv_imbalance/#

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-55-95.us-west-2.compute.internal>
Co-authored-by: Theodore Vasiloudis <theodoros.vasiloudis@gmail.com>
Co-authored-by: xiang song(charlie.song) <classicxsong@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-244.us-west-2.compute.internal>

1 parent 03af1e4, commit 6758c8d
Showing 6 changed files with 293 additions and 66 deletions.

.. _imbalanced_labels:

Deal with Imbalanced Labels in Classification/Regression
=========================================================

In some cases, the number of labels of different classes can be imbalanced, i.e., some classes have
either too many or too few data points. For example, most fraud detection tasks have only a small
number of fraudulent activities (positive labels) compared to a huge number of legitimate activities
(negative labels). Even regression tasks can encounter label imbalance when a few dominant values
occur far more often than others. If not handled properly, imbalanced labels can significantly
degrade classification/regression model performance. For example, when models are trained on
overwhelmingly negative labels, they may learn to classify all unseen samples as negative. GraphStorm
provides several ways to tackle the class imbalance problem.

For classification tasks, users can configure two arguments in the command line interfaces (CLIs):
``imbalance_class_weights`` and ``class_loss_func``.

The ``imbalance_class_weights`` argument allows users to assign a scale weight to each class, forcing
models to focus more on the classes with higher weights. For example, if there are 10 positive labels
versus 90 negative labels, you can set ``imbalance_class_weights`` to ``0.1,0.9``, meaning class 0
(usually negative labels) has weight ``0.1``, and class 1 (usually positive labels) has weight ``0.9``.
This places more importance on correctly classifying positive samples and less on negative ones. Below
is an example of how to set ``imbalance_class_weights`` in a YAML configuration file.

.. code-block:: yaml

    imbalance_class_weights: 0.1,0.9
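
Conceptually, these class weights play the same role as the ``weight`` argument of a standard
weighted cross-entropy loss. The PyTorch sketch below, with made-up logits, illustrates that effect;
it is not GraphStorm's internal code:

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # Hypothetical logits for four samples over two classes
    # (class 0: negative, class 1: positive).
    logits = torch.tensor([[2.0, 0.5], [1.5, 0.2], [0.3, 1.2], [2.2, 0.1]])
    labels = torch.tensor([0, 0, 1, 0])

    # Unweighted cross-entropy treats all classes equally.
    plain = F.cross_entropy(logits, labels)

    # Weights mirroring `imbalance_class_weights: 0.1,0.9`: errors on the
    # minority (positive) class contribute far more to the loss.
    weighted = F.cross_entropy(logits, labels, weight=torch.tensor([0.1, 0.9]))

    print(plain.item(), weighted.item())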

You can also set ``focal`` as the value of the ``class_loss_func`` configuration, which will use the
`focal loss function <https://arxiv.org/abs/1708.02002>`_ in binary classification tasks. The focal loss
function is designed for imbalanced classes. Its formula is :math:`loss(p_t) = -\alpha_t(1-p_t)^{\gamma}\log(p_t)`,
where :math:`p_t=p` if :math:`y=1`, and :math:`p_t = 1-p` otherwise. Here :math:`p` is the predicted
probability in a binary classification. This function has two hyperparameters, :math:`\alpha` and
:math:`\gamma`, corresponding to the ``alpha`` and ``gamma`` configurations in GraphStorm. Larger values
of ``gamma`` put more weight on updating the model for hard cases, which helps detect more positive
samples when the positive-to-negative ratio is small. There is no clear guideline for values of
``alpha``. You can use its default value (``0.25``) first, and then search for optimal values. Below is
an example of how to set the focal loss function in a YAML configuration file.

.. code-block:: yaml

    class_loss_func: focal
    gamma: 10.0
    alpha: 0.5
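
To make the formula concrete, below is a minimal NumPy sketch of the binary focal loss, an
illustration of the math above rather than GraphStorm's implementation:

.. code-block:: python

    import numpy as np

    def focal_loss(p, y, alpha=0.25, gamma=2.0):
        """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
        # p_t is the predicted probability of the true class.
        p_t = np.where(y == 1, p, 1.0 - p)
        # alpha weights the positive class; (1 - alpha) the negative class.
        alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
        return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

    # An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.6).
    print(focal_loss(np.array([0.9, 0.6]), np.array([1, 1])))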

Apart from focal loss and class weights, you can also output the classification results as probabilities
of the positive and negative classes by setting the ``return_proba`` configuration to ``true``. By
default, GraphStorm outputs classification results using the argmax values, e.g., either 0s or 1s in
binary tasks, which is equivalent to using ``0.5`` as the threshold to separate positive from negative
samples. With probabilities as outputs, you can apply different thresholds to achieve the desired
outcomes. For example, if you need higher recall to catch more suspicious positive samples, a smaller
threshold, e.g., ``0.25``, will classify more samples as positive. You may also use methods like the
`ROC curve` or `Precision-Recall curve` to determine the optimal threshold. Below is an example of how
to set ``return_proba`` in a YAML configuration file.

.. code-block:: yaml

    return_proba: true
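
As a sketch of how custom thresholding works on the saved probabilities (the array values here are
hypothetical):

.. code-block:: python

    import numpy as np

    # Hypothetical positive-class probabilities for five samples.
    probs = np.array([0.15, 0.30, 0.55, 0.70, 0.95])

    # The default argmax output is equivalent to a 0.5 threshold.
    default_preds = (probs >= 0.5).astype(int)   # [0, 0, 1, 1, 1]

    # A lower threshold favors recall: more samples are flagged as positive.
    recall_preds = (probs >= 0.25).astype(int)   # [0, 1, 1, 1, 1]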

For regression tasks where some dominant values, e.g., 0s, appear in the labels, GraphStorm provides the
`shrinkage loss function <https://openaccess.thecvf.com/content_ECCV_2018/html/Xiankai_Lu_Deep_Regression_Tracking_ECCV_2018_paper.html>`_,
which can be enabled by setting ``shrinkage`` as the value of the ``regression_loss_func`` configuration.
Its formula is :math:`loss = l^2/(1 + \exp \left( \alpha \cdot (\gamma - l)\right))`, where :math:`l` is
the absolute difference between predictions and labels. The shrinkage loss function also has the
:math:`\alpha` and :math:`\gamma` hyperparameters. You can use the same ``alpha`` and ``gamma``
configurations as for the focal loss function to modify their values. The shrinkage loss penalizes the
importance of easy samples (when :math:`l < 0.5`) and keeps the loss of hard samples unchanged. Below is
an example of how to set the shrinkage loss function in a YAML configuration file.

.. code-block:: yaml

    regression_loss_func: shrinkage
    gamma: 0.2
    alpha: 5
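
Likewise, below is a minimal NumPy sketch of the shrinkage loss formula, for illustration only:

.. code-block:: python

    import numpy as np

    def shrinkage_loss(pred, label, alpha=5.0, gamma=0.2):
        """Shrinkage loss: l**2 / (1 + exp(alpha * (gamma - l))), l = |pred - label|."""
        l = np.abs(pred - label)
        # The sigmoid-like factor shrinks the squared loss for easy samples
        # (small l) while leaving hard samples (large l) almost unchanged.
        return l ** 2 / (1.0 + np.exp(alpha * (gamma - l)))

    # Easy sample (l = 0.1) is shrunk heavily; hard sample (l = 1.0) stays near l**2.
    print(shrinkage_loss(np.array([0.1, 1.0]), np.array([0.0, 0.0])))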

.. _multi_target_ntypes:

Multiple Target Node Types Training
===================================

When training on a heterogeneous graph, we often need to train a model by minimizing the objective
function on more than one node type. GraphStorm provides support for achieving this goal. The recommended
method is to leverage GraphStorm's multi-task learning capability, i.e., using multiple node tasks, each
trained on one target node type.

A more detailed guide to using multi-task learning can be found in
:ref:`Multi-task Learning in GraphStorm<multi_task_learning>`. This guide provides two examples of how
to conduct classification training on two target node types with the `MovieLens 100k <https://www.kaggle.com/datasets/prajitdatta/movielens-100k-dataset>`_
data, where both the **movie** ("item" in the original data) and **user** node types have classification
labels associated with them.

Using multi-task learning for multiple target node types training (Recommended)
--------------------------------------------------------------------------------

Preparing the training data
............................

During the graph construction step, you can define two classification tasks on the two node types, as
shown in the JSON example below.

.. code-block:: json

    {
        "version": "gconstruct-v0.1",
        "nodes": [
            {
                "node_type": "movie",
                ......
                ],
                "labels": [
                    {
                        "label_col": "label_movie",
                        "task_type": "classification",
                        "split_pct": [0.8, 0.1, 0.1],
                        "mask_field_names": ["train_mask_movie",
                                             "val_mask_movie",
                                             "test_mask_movie"]
                    },
                ]
            },
            {
                "node_type": "user",
                ......
                ],
                "labels": [
                    {
                        "label_col": "label_user",
                        "task_type": "classification",
                        "split_pct": [0.2, 0.2, 0.6],
                        "mask_field_names": ["train_mask_user",
                                             "val_mask_user",
                                             "test_mask_user"]
                    },
                ]
            },
        ],
        ......
    }

The above configuration defines two classification tasks, one for the **movie** nodes and one for the
**user** nodes. Each node type has its own ``label_col`` and train/validation/test mask fields. You can
then follow the instructions in :ref:`Run graph construction<run-graph-construction>` to use the
GraphStorm construction tool to create the partitioned graph data.
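
For reference, the construction tool is typically invoked as shown below; the config path, output
directory, graph name, and partition count are illustrative placeholders, and the guide linked above
is the authoritative reference for the available options.

.. code-block:: bash

    python -m graphstorm.gconstruct.construct_graph \
              --conf-file <PATH_TO_JSON_CONFIG> \
              --output-dir <PATH_TO_GRAPH_DATA> \
              --num-parts 1 \
              --graph-name <GRAPH_NAME>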

Define multiple tasks for model training
.........................................

Now, you can specify two training tasks by providing the ``multi_task_learning`` configurations in
the training configuration YAML file, as in the example below.

.. code-block:: yaml

    ---
    version: 1.0
    gsf:
      basic:
        ...
      multi_task_learning:
        - node_classification:
            target_ntype: "movie"
            label_field: "label_movie"
            mask_fields:
              - "train_mask_movie"
              - "val_mask_movie"
              - "test_mask_movie"
            num_classes: 10
            task_weight: 0.5
        - node_classification:
            target_ntype: "user"
            label_field: "label_user"
            mask_fields:
              - "train_mask_user"
              - "val_mask_user"
              - "test_mask_user"
            task_weight: 1.0
      ...

The above configuration defines one classification task for the **movie** node type and another one
for the **user** node type. The two node classification tasks use their own label fields, i.e.,
``label_movie`` and ``label_user``, and their own train/validation/test mask fields. The configuration
also prioritizes classification on **user** nodes (``task_weight = 1.0``) over classification on
**movie** nodes (``task_weight = 0.5``).

Run multi-task model training
..............................

You can use the ``graphstorm.run.gs_multi_task_learning`` command to run multi-task learning,
as in the following example.

.. code-block:: bash

    python -m graphstorm.run.gs_multi_task_learning \
              --workspace <PATH_TO_WORKSPACE> \
              --num-trainers 1 \
              --num-servers 1 \
              --part-config <PATH_TO_GRAPH_DATA> \
              --cf <PATH_TO_CONFIG>

Run multi-task model inference
...............................

For inference, you can use the same command ``graphstorm.run.gs_multi_task_learning`` with an
additional argument ``--inference``, as follows:

.. code-block:: bash

    python -m graphstorm.run.gs_multi_task_learning \
              --inference \
              --workspace <PATH_TO_WORKSPACE> \
              --num-trainers 1 \
              --num-servers 1 \
              --part-config <PATH_TO_GRAPH_DATA> \
              --cf <PATH_TO_CONFIG> \
              --save-prediction-path <PATH_TO_OUTPUT>

The prediction results of each prediction task will be saved into different sub-directories under
``<PATH_TO_OUTPUT>``. The sub-directories are named with the prefix ``<task_type>_<node/edge_type>_<label_name>``.

Using multi-target node type training (Not Recommended)
--------------------------------------------------------

You can also use GraphStorm's multi-target node types configuration, but this method is less
flexible than the multi-task learning method.

- Train on multiple node types: You only need to set ``target_ntype`` in the model config
  YAML file to minimize the objective function defined on multiple target node types. For example,
  by setting ``target_ntype`` as follows, we can jointly optimize the objective function defined
  on the "movie" and "user" node types.

  .. code-block:: yaml

      target_ntype:
        - movie
        - user

- During evaluation, you need to choose a single node type. For example, by setting
  ``eval_target_ntype: movie``, GraphStorm will only perform evaluation on the "movie" node type, as
  shown in the snippet after this item. GraphStorm only supports evaluating on a single node type.
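
  In the model config YAML file, this looks like:

  .. code-block:: yaml

      eval_target_ntype: movie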

- Per target node type decoder: You may also want to use a different decoder for each node type,
  where the output dimension of each decoder may be different. You can achieve this by setting
  ``num_classes`` in the model config YAML file. For example, by setting ``num_classes`` as follows,
  GraphStorm will create a decoder with an output dimension of 3 for the movie node type, and a
  decoder with an output dimension of 7 for the user node type.

  .. code-block:: yaml

      num_classes:
        movie: 3
        user: 7

- Reweighting the loss function: You may also want to use a customized loss function reweighting
  for each node type, which can be achieved by setting ``multilabel``, ``multilabel_weights``, and
  ``imbalance_class_weights``. Examples are illustrated below. The current implementation does
  not support different ``multilabel`` settings across node types.

  .. code-block:: yaml

      multilabel:
        movie: true
        user: true
      multilabel_weights:
        movie: 0.1,0.2,0.3
        user: 0.1,0.2,0.3,0.4,0.5,0.0

      multilabel:
        movie: false
        user: false
      imbalance_class_weights:
        movie: 0.1,0.2,0.3
        user: 0.1,0.2,0.3,0.4,0.5,0.0