[Doc] Add Imbalance label guide and reorg #1176

Merged: 33 commits, Feb 20, 2025

Changes from 20 commits

Commits
7ea7a41
init adv imbalance doc
Feb 14, 2025
6b4e513
1st version
Feb 15, 2025
7c0c549
1st version
Feb 15, 2025
a99abde
reorg advanced topic
Feb 17, 2025
6bac3c6
enhance index page
Feb 17, 2025
f7decc2
add examples in imbalance
Feb 17, 2025
f13130f
refine
Feb 18, 2025
84bbd86
break lines
Feb 18, 2025
14916ab
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
ba3f2af
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
bc095eb
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
cf1a394
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 18, 2025
bc383f5
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
c1728eb
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
04bb099
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
42ff53a
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
bfacd07
Update docs/source/index.rst
zhjwy9343 Feb 18, 2025
f324aaf
Update docs/source/index.rst
zhjwy9343 Feb 18, 2025
a155edb
Update docs/source/advanced/imbalanced-labels.rst
zhjwy9343 Feb 18, 2025
1324d6a
change contents
Feb 18, 2025
ae449eb
rewrite multiple target node
Feb 19, 2025
354b282
rewrite multiple target node
Feb 19, 2025
e566761
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
9687713
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
c7c2de8
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
f90e323
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
0c0ef4b
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
3e185a4
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
7254206
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
e0408d5
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
91ef484
Update docs/source/advanced/multi-target-ntypes.rst
zhjwy9343 Feb 19, 2025
49c5e68
update multi-target
Feb 19, 2025
b30450b
add default values to alpha and gamma
Feb 19, 2025
73 changes: 73 additions & 0 deletions docs/source/advanced/imbalanced-labels.rst
@@ -0,0 +1,73 @@
.. _imbalanced_labels:

Deal with Imbalanced Labels in Classification/Regression
=========================================================

In some cases, class labels are imbalanced, i.e., some classes have far more data points than others.
For example, most fraud detection tasks have only a small number of fraudulent activities (positive
labels) versus a huge number of legitimate activities (negative labels). Even regression tasks can
suffer from label imbalance when a few dominant values make up most of the labels. If not handled
properly, imbalanced labels can severely degrade classification/regression model performance. For
example, a model fit on overwhelmingly negative labels may learn to classify all unseen samples as
negative. GraphStorm provides several ways to tackle the class imbalance problem.

For classification tasks, users can configure two arguments in the command line interfaces (CLIs):
``imbalance_class_weights`` and ``class_loss_func``.

The ``imbalance_class_weights`` argument lets users assign a scale weight to each class, forcing
models to focus more on the classes with higher weights. For example, if there are 10 positive labels
versus 90 negative labels, you can set ``imbalance_class_weights`` to ``0.1, 0.9``, meaning class 0
(usually negative labels) has weight ``0.1`` and class 1 (usually positive labels) has weight ``0.9``.
This places more importance on correctly classifying positive samples and less on negative ones.
Below is an example of how to set ``imbalance_class_weights`` in a YAML configuration file.

.. code-block:: yaml

    imbalance_class_weights: 0.1,0.9
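
To see what these class weights do during training, below is a minimal PyTorch sketch of a weighted
cross-entropy loss. The tensor names and shapes are illustrative assumptions, not GraphStorm
internals.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # Weights mirroring ``imbalance_class_weights: 0.1,0.9`` (illustrative).
    class_weights = torch.tensor([0.1, 0.9])

    logits = torch.randn(8, 2)          # model outputs: 8 samples, 2 classes
    labels = torch.randint(0, 2, (8,))  # ground-truth class indices

    # Errors on class 1 (positive) now contribute 9x more to the loss
    # than errors on class 0 (negative).
    loss = F.cross_entropy(logits, labels, weight=class_weights)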

You can also set ``focal`` as the value of the ``class_loss_func`` configuration, which applies the
`focal loss function <https://arxiv.org/abs/1708.02002>`_ in binary classification tasks. The focal
loss function is designed for imbalanced classes. Its formula is
:math:`loss(p_t) = -\alpha_t(1-p_t)^{\gamma}\log(p_t)`, where :math:`p_t=p` if :math:`y=1`,
otherwise :math:`p_t = 1-p`. Here :math:`p` is the predicted probability of the positive class in
binary classification. This function has two hyperparameters, :math:`\alpha` and :math:`\gamma`,
corresponding to the ``alpha`` and ``gamma`` configurations in GraphStorm. Larger values of ``gamma``
put more of the training focus on hard cases, which helps detect more positive samples when the
positive-to-negative ratio is small. There is no clear guideline for values of ``alpha``. You can use
its default value (``0.25``) first, and then search for optimal values. Below is an example of how to
set the `focal loss function` in a YAML configuration file.

.. code-block:: yaml

    class_loss_func: focal

    gamma: 10.0
    alpha: 0.5
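
For intuition, here is a minimal PyTorch sketch of the focal loss formula above; it illustrates the
math only and is not GraphStorm's internal implementation.

.. code-block:: python

    import torch

    def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
        # p: predicted positive-class probabilities; y: binary (0/1) labels.
        # p_t = p if y == 1, otherwise 1 - p, as in the formula above.
        p_t = torch.where(y == 1, p, 1 - p).clamp(min=1e-7)
        # Common convention: weight positives by alpha, negatives by 1 - alpha.
        alpha_t = torch.where(y == 1, torch.full_like(p, alpha),
                              torch.full_like(p, 1 - alpha))
        return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()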

Apart from focal loss and class weights, you can also output classification results as probabilities
of the positive and negative classes by setting the ``return_proba`` configuration to ``true``. By
default, GraphStorm outputs classification results as argmax values, e.g., either 0s or 1s in binary
tasks, which is equivalent to using ``0.5`` as the threshold for separating positive from negative
samples. With probabilities as outputs, you can apply different thresholds to achieve the desired
trade-off. For example, if you need higher recall to catch more suspicious positive samples, a smaller
threshold, e.g., ``0.25``, will classify more samples as positive. You can also use methods like the
`ROC curve` or `Precision-Recall curve` to determine the optimal threshold. Below is an example of how
to set ``return_proba`` in a YAML configuration file.

.. code-block:: yaml

    return_proba: true
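
As a sketch of how a threshold could then be chosen from the saved probabilities (assuming
scikit-learn is available; the arrays below are illustrative, not GraphStorm outputs):

.. code-block:: python

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    proba = np.array([0.9, 0.6, 0.3, 0.2, 0.05])  # positive-class probabilities
    labels = np.array([1, 1, 1, 0, 0])            # ground-truth labels

    # A threshold lower than the default 0.5 recalls more positives.
    preds = (proba >= 0.25).astype(int)

    # Or pick the threshold that maximizes F1 on a validation set.
    precision, recall, thresholds = precision_recall_curve(labels, proba)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    best_threshold = thresholds[np.argmax(f1[:-1])]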

For regression tasks whose labels contain dominant values, e.g., 0s, GraphStorm provides the
`shrinkage loss function <https://openaccess.thecvf.com/content_ECCV_2018/html/Xiankai_Lu_Deep_Regression_Tracking_ECCV_2018_paper.html>`_,
which can be enabled by setting ``shrinkage`` as the value of the ``regression_loss_func``
configuration. Its formula is :math:`loss = l^2/(1 + \exp \left( \alpha \cdot (\gamma - l)\right))`,
where :math:`l` is the absolute difference between predictions and labels. The shrinkage loss function
also has the :math:`\alpha` and :math:`\gamma` hyperparameters, and you can use the same ``alpha`` and
``gamma`` configurations as for the focal loss function to modify their values. The shrinkage loss
penalizes the importance of easy samples (where :math:`l < 0.5`) and keeps the loss of hard samples
unchanged. Below is an example of how to set the `shrinkage loss function` in a YAML configuration
file.

.. code-block:: yaml

    regression_loss_func: shrinkage

    gamma: 0.2
    alpha: 5
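
A minimal PyTorch sketch of the shrinkage loss formula above (for illustration only, not GraphStorm's
internal code):

.. code-block:: python

    import torch

    def shrinkage_loss(pred, target, alpha=5.0, gamma=0.2):
        # l is the absolute difference between predictions and labels.
        l = (pred - target).abs()
        # loss = l^2 / (1 + exp(alpha * (gamma - l)))
        return (l ** 2 / (1 + torch.exp(alpha * (gamma - l)))).mean()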
@@ -1,12 +1,9 @@
-.. _advanced_usages:
+.. _multi_target_ntypes:

-GraphStorm Advanced Usages
-===========================

Multiple Target Node Types Training
--------------------------------------
+===================================

-When training on a hetergenious graph, we often need to train a model by minimizing the objective function on more than one node type. GraphStorm provides supports to achieve this goal.
+When training on a heterogeneous graph, we often need to train a model by minimizing the objective function on more than one node type. GraphStorm provides support to achieve this goal.

- Train on multiple node types: Users only need to edit ``target_ntype`` in the model config YAML file to minimize the objective function defined on multiple target node types. For example, by setting ``target_ntype`` as follows, we can jointly optimize the objective function defined on "movie" and "user" node types.

12 changes: 6 additions & 6 deletions docs/source/cli/model-training-inference/configuration-run.rst
@@ -397,14 +397,14 @@ General Configurations
- For link prediction tasks, the default value is ``mrr``.
- **gamma**: Set the value of the hyperparameter denoted by the symbol gamma. Gamma is used in the following cases: i/ focal loss for binary classification, ii/ DistMult score function for link prediction, iii/ TransE score function for link prediction, iv/ RotatE score function for link prediction, v/ shrinkage loss for regression.

-  - Yaml: ``gamma: 10.0``
-  - Argument: ``--gamma 10.0``
-  - Default value: None
+  - Yaml: ``gamma: 2.0``
+  - Argument: ``--gamma 2.0``
+  - Default value: ``2``
- **alpha**: Set the value of the hyperparameter denoted by the symbol alpha. Alpha is used in the following cases: i/ focal loss for binary classification and ii/ shrinkage loss for regression.

-  - Yaml: ``alpha: 10.0``
-  - Argument: ``--alpha 10.0``
-  - Default value: None
+  - Yaml: ``alpha: 0.25``
+  - Argument: ``--alpha 0.25``
+  - Default value: ``0.25``

Classification and Regression Task
```````````````````````````````````
17 changes: 10 additions & 7 deletions docs/source/index.rst
@@ -35,7 +35,7 @@ Welcome to the GraphStorm Documentation and Tutorials

.. toctree::
   :maxdepth: 2
-   :caption: Advanced Topics
+   :caption: Practical & Advanced Guides
   :hidden:
   :glob:

@@ -44,11 +44,12 @@ Welcome to the GraphStorm Documentation and Tutorials
   advanced/link-prediction
   advanced/advanced-wholegraph
   advanced/multi-task-learning
-   advanced/advanced-usages
   advanced/using-graphbolt
+   advanced/multi-target-ntypes
+   advanced/imbalanced-labels
   advanced/gsprocessing-emr-ec2

-GraphStorm is a graph machine learning (GML) framework designed for enterprise use cases. It simplifies the development, training and deployment of GML models on industry-scale graphs (measured in billons of nodes and edges) by providing scalable training and inference pipelines of GML models. GraphStorm comes with a collection of built-in GML models, allowing users to train a GML model with a single command, eliminating the need to write any code. Moreover, GraphStorm provides a wide range of configurations to customiz model implementations and training pipelines, enhancing model performance. In addition, GraphStorm offers a programming interface that enables users to train custom GML models in a distributed manner. Users can bring their own model implementations and leverage the GraphStorm training pipeline for scalability.
+GraphStorm is a graph machine learning (GML) framework designed for enterprise use cases. It simplifies the development, training and deployment of GML models on industry-scale graphs (measured in billions of nodes and edges) by providing scalable training and inference pipelines of GML models. GraphStorm comes with a collection of built-in GML models, allowing users to train a GML model with a single command, eliminating the need to write any code. Moreover, GraphStorm provides a wide range of configurations to customize model implementations and training pipelines, enhancing model performance. In addition, GraphStorm offers a programming interface that enables users to train custom GML models in a distributed manner. Users can bring their own model implementations and leverage the GraphStorm training pipeline for scalability.

Getting Started
----------------
@@ -83,16 +84,18 @@ The released GraphStorm APIs list the major components that can help users to de

To help users use these APIs, GraphStorm also released a set of Jupyter notebooks at :ref:`GraphStorm API Programming Example Notebooks<programming-examples>`. By running these notebooks, users can explore some APIs, learn how to use APIs to reproduce CLIs pipelines, and then customize GraphStorm components for specific requirements.

-Users can find the comprehensive descriptions of these GraphStorm APIs in the :ref:`API Reference<api-reference>` documentations. For unrelease APIs, we encourage users to read their source code. If users want to have more APIs formally released, please raise issues at the `GraphStorm GitHub Repository <https://github.com/awslabs/graphstorm/issues>`_.
+Users can find the comprehensive descriptions of these GraphStorm APIs in the :ref:`API Reference<api-reference>` documentation. For unreleased APIs, we encourage users to read their source code. If users want to have more APIs formally released, please raise issues at the `GraphStorm GitHub Repository <https://github.com/awslabs/graphstorm/issues>`_.

-Advanced Topics
-----------------
+Practical and Advanced Guides
+------------------------------

- For users who want to use their own GML models in GraphStorm, follow the :ref:`Use Your Own GNN Models<use-own-models>` tutorial to learn the programming interfaces and the steps of how to modify users' own models.
- For users who want to leverage language models on nodes with text features, follow the :ref:`Use Language Model in GraphStorm<language_models>` tutorial to learn how to leverage BERT models to use text as node features in GraphStorm.
- There are various ways to use GraphStorm to both speed up the training process and boost model performance for link prediction tasks. Users can find these usages in the :ref:`Link Prediction Learning in GraphStorm<link_prediction_usage>` page.
- The GraphStorm team has been working with NVIDIA to integrate NVIDIA's WholeGraph library into GraphStorm to speed up feature copy. Users can follow the :ref:`Use WholeGraph in GraphStorm<advanced_wholegraph>` tutorial for more details.
-- In v0.3, GraphStorm releases an experimental feature to support multi-task learning on the same graph, allowing users to define multiple training targets on different nodes and edges within a single training loop. Users can check the :ref:`Multi-task Learning in GraphStorm<multi_task_learning>` tutorial to know more details.
+- Since v0.3, GraphStorm supports multi-task learning on the same graph, allowing users to define multiple training targets on different nodes and edges within a single training loop. Users can check the :ref:`Multi-task Learning in GraphStorm<multi_task_learning>` tutorial for more details.
+- Since v0.4, GraphStorm supports GraphBolt stochastic training. GraphBolt is a new data loading module for DGL that enables faster and more efficient graph sampling, potentially leading to significant efficiency benefits. For details on using GraphBolt in GraphStorm, follow the :ref:`Using GraphBolt to speed up training and inference<using-graphbolt-ref>` guide.
+- For frequently asked questions, there are several guides. The :ref:`Multiple Target Node Types Training<multi_target_ntypes>` document explains how to train with multiple target node types. The :ref:`Deal with Imbalanced Labels in Classification/Regression<imbalanced_labels>` guide lists several built-in features that can help tackle the challenge of imbalanced labels. If users want to use their own AWS EMR clusters for graph processing, the :ref:`Running distributed graph processing on customized EMR-on-EC2 clusters<gsprocessing_emr_ec2_customized_clusters>` guide provides more details.

Contribution
-------------