Merge branch 'main' into pull-2.6-binaries
svekars authored Jan 24, 2025
2 parents 88cae7b + 2a30921 commit e263354
Showing 10 changed files with 207 additions and 12 deletions.
6 changes: 6 additions & 0 deletions .lycheeignore
@@ -12,3 +12,9 @@ https://pytorch.org/tutorials/beginner/colab/n

# Ignore local host link from intermediate_source/tensorboard_tutorial.rst
http://localhost:6006

# Ignore local host link from recipes_source/deployment_with_flask.rst
http://localhost:5000/predict

# Ignore local host link from advanced_source/cpp_frontend.rst
https://www.uber.com/blog/deep-neuroevolution/
4 changes: 2 additions & 2 deletions advanced_source/cpp_frontend.rst
@@ -57,7 +57,7 @@ the right tool for the job. Examples for such environments include:
Multiprocessing is an alternative, but not as scalable and has significant
shortcomings. C++ has no such constraints and threads are easy to use and
create. Models requiring heavy parallelization, like those used in `Deep
Neuroevolution <https://eng.uber.com/deep-neuroevolution/>`_, can benefit from
Neuroevolution <https://www.uber.com/blog/deep-neuroevolution/>`_, can benefit from
this.
- **Existing C++ Codebases**: You may be the owner of an existing C++
application doing anything from serving web pages in a backend server to
@@ -662,7 +662,7 @@ Defining the DCGAN Modules
We now have the necessary background and introduction to define the modules for
the machine learning task we want to solve in this post. To recap: our task is
to generate images of digits from the `MNIST dataset
<http://yann.lecun.com/exdb/mnist/>`_. We want to use a `generative adversarial
<https://huggingface.co/datasets/ylecun/mnist>`_. We want to use a `generative adversarial
network (GAN)
<https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`_ to solve
this task. In particular, we'll use a `DCGAN architecture
6 changes: 5 additions & 1 deletion en-wordlist.txt
@@ -81,6 +81,8 @@ FX
FX's
FairSeq
Fastpath
FakeTensor
FakeTensors
FFN
FloydHub
FloydHub's
@@ -368,6 +370,8 @@ downsample
downsamples
dropdown
dtensor
dtype
dtypes
duration
elementwise
embeddings
@@ -615,6 +619,7 @@ triton
uint
UX
umap
unbacked
uncomment
uncommented
underflowing
@@ -651,7 +656,6 @@ RecSys
TorchRec
sharding
TBE
dtype
EBC
sharder
hyperoptimized
2 changes: 1 addition & 1 deletion intermediate_source/FSDP_tutorial.rst
@@ -11,7 +11,7 @@ It also comes with considerable engineering complexity to handle the training of
`PyTorch FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`__, released in PyTorch 1.11, makes this easier.

In this tutorial, we show how to use `FSDP APIs <https://pytorch.org/docs/stable/fsdp.html>`__ for simple MNIST models that can be extended to other larger models such as `HuggingFace BERT models <https://huggingface.co/blog/zero-deepspeed-fairscale>`__,
`GPT 3 models up to 1T parameters <https://pytorch.medium.com/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ . The sample DDP MNIST code has been borrowed from `here <https://github.com/yqhu/mnist_examples>`__.
`GPT 3 models up to 1T parameters <https://pytorch.medium.com/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__. The sample DDP MNIST code is courtesy of `Patrick Hu <https://github.com/yqhu/>`_.


How FSDP works
2 changes: 1 addition & 1 deletion intermediate_source/ddp_series_minGPT.rst
@@ -6,7 +6,7 @@ training <ddp_series_multinode.html>`__ \|\| **minGPT Training**
Training “real-world” models with DDP
=====================================

Authors: `Suraj Subramanian <https://github.com/suraj813>`__
Authors: `Suraj Subramanian <https://github.com/subramen>`__

.. grid:: 2

2 changes: 1 addition & 1 deletion intermediate_source/ddp_series_multinode.rst
@@ -6,7 +6,7 @@ training** \|\| `minGPT Training <ddp_series_minGPT.html>`__
Multinode Training
==================

Authors: `Suraj Subramanian <https://github.com/suraj813>`__
Authors: `Suraj Subramanian <https://github.com/subramen>`__

.. grid:: 2

6 changes: 3 additions & 3 deletions intermediate_source/dynamic_quantization_bert_tutorial.rst
@@ -138,7 +138,7 @@ the following helper functions: one for converting the text examples
into the feature vectors; the other one for measuring the F1 score of
the predicted result.

The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/main/src/transformers/data/datasets/glue.py>`_ function converts the texts into input features:

- Tokenize the input sequences;
- Insert [CLS] in the beginning;
@@ -147,7 +147,7 @@ The `glue_convert_examples_to_features <https://github.com/huggingface/transform
- Generate token type ids to indicate whether a token belongs to the
first sequence or the second sequence.

The `glue_compute_metrics <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function has the compute metrics with
The `glue_compute_metrics <https://github.com/huggingface/transformers/blob/main/src/transformers/data/metrics/__init__.py#L60>`_ function computes metrics with
the `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_, which
can be interpreted as a weighted average of the precision and recall,
where an F1 score reaches its best value at 1 and worst score at 0. The
@@ -273,7 +273,7 @@ We load the tokenizer and fine-tuned BERT sequence classifier model
2.3 Define the tokenize and evaluation function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/main/examples/legacy/pytorch-lightning/run_glue.py>`_.

.. code:: python
185 changes: 185 additions & 0 deletions intermediate_source/torch_export_tutorial.py
@@ -629,6 +629,191 @@ def forward(self, x, y):
"bool_val": None,
}

######################################################################
# Data-dependent errors
# ---------------------
#
# While trying to export models, you may have encountered errors like "Could not guard on data-dependent expression", or "Could not extract specialized integer from data-dependent expression".
# These errors exist because ``torch.export()`` compiles programs using FakeTensors, which symbolically represent their real tensor counterparts. While these have equivalent symbolic properties
# (e.g. sizes, strides, dtypes), they diverge in that FakeTensors do not contain any data values. While this avoids unnecessary memory usage and expensive computation, it does mean that export may be
# unable to compile parts of user code out of the box when compilation relies on data values. In short, if the compiler requires a concrete, data-dependent value in order to proceed, it will error out,
# complaining that the value is not available.
#
# Data-dependent values appear in many places, and common sources are calls like ``item()``, ``tolist()``, or ``torch.unbind()`` that extract scalar values from tensors.
# How are these values represented in the exported program? In the `Constraints/Dynamic Shapes <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html#constraints-dynamic-shapes>`_
# section, we talked about allocating symbols to represent dynamic input dimensions.
# The same happens here: we allocate symbols for every data-dependent value that appears in the program. The important distinction is that these are "unbacked" symbols,
# in contrast to the "backed" symbols allocated for input dimensions. The `"backed/unbacked" <https://pytorch.org/docs/main/export.programming_model.html#basics-of-symbolic-shapes>`_
# nomenclature refers to the presence or absence of a "hint" for the symbol: a concrete value backing the symbol that can inform the compiler how to proceed.
#
# In the input shape symbol case (backed symbols), these hints are simply the sample input shapes provided, which explains why control-flow branching is determined by the sample input properties.
# For data-dependent values, the symbols are taken from FakeTensor "data" during tracing, and so the compiler doesn't know the actual values (hints) that these symbols would take on.
#
# Let's see how these show up in exported programs:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        b = y.tolist()
        return b + [a]

inps = (
    torch.tensor(1),
    torch.tensor([2, 3]),
)
ep = export(Foo(), inps)
print(ep)

######################################################################
# The result is that 3 unbacked symbols (notice they're prefixed with "u", instead of the usual "s" for input shape/backed symbols) are allocated and returned:
# 1 for the ``item()`` call, and 1 for each of the elements of ``y`` from the ``tolist()`` call.
# Note from the range constraints field that these take on ranges of ``[-int_oo, int_oo]``, not the default ``[0, int_oo]`` range allocated to input shape symbols,
# since we have no information on what these values are - they don't represent sizes, so don't necessarily have positive values.

######################################################################
# Guards, torch._check()
# ^^^^^^^^^^^^^^^^^^^^^^
#
# But the case above is easy to export, because the concrete values of these symbols aren't used in any compiler decision-making; all that's relevant is that the return values are unbacked symbols.
# The data-dependent errors highlighted in this section are cases like the following, where `data-dependent guards <https://pytorch.org/docs/main/export.programming_model.html#control-flow-static-vs-dynamic>`_ are encountered:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        if a // 2 >= 5:
            return y + 2
        else:
            return y * 5

######################################################################
# Here we actually need the "hint", i.e. the concrete value of ``a``, for the compiler to decide whether to trace ``return y + 2`` or ``return y * 5`` as the output.
# Because we trace with FakeTensors, we don't know what ``a // 2 >= 5`` actually evaluates to, and export errors out with "Could not guard on data-dependent expression ``u0 // 2 >= 5 (unhinted)``".
#
# So how do we export this toy model? Unlike ``torch.compile()``, export requires full graph compilation, and we can't just graph break on this. Here are some basic options:
#
# 1. Manual specialization: we could intervene by selecting the branch to trace, either by rewriting the code to contain only the specialized branch, or by using ``torch.compiler.is_compiling()`` to guard what's traced at compile-time.
# 2. ``torch.cond()``: we could rewrite the control-flow code to use ``torch.cond()`` so we don't specialize on a branch (see the sketch below).
#
# While these options are valid, they have their pitfalls. Option 1 sometimes requires drastic, invasive rewrites of the model code to specialize, and ``torch.cond()`` is not a comprehensive system for handling data-dependent errors.
# As we will see, there are data-dependent errors that do not involve control-flow.
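#
# As an illustration of option 2, here is a minimal sketch (an added example, assuming the documented
# ``torch.cond(pred, true_fn, false_fn, operands)`` calling convention) of how the model above could be
# rewritten so that both branches stay in the graph and neither is specialized away:

class FooCond(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()

        def add_two(y):
            return y + 2

        def times_five(y):
            return y * 5

        # The data-dependent predicate is carried into the exported graph
        # instead of being resolved at trace time.
        return torch.cond(a // 2 >= 5, add_two, times_five, (y,))

######################################################################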
#
# The generally recommended approach is to start with ``torch._check()`` calls. While these look like plain assert statements, they are in fact a mechanism for informing the compiler of properties of symbols.
# While a ``torch._check()`` call does act as an assertion at runtime, when traced at compile-time the checked expression is sent to the symbolic shapes subsystem for reasoning, and any properties that follow
# from the expression being true are stored on the relevant symbols (provided the subsystem is smart enough to infer them). So even though unbacked symbols don't have hints, if we can communicate properties
# that are generally true for these symbols via ``torch._check()`` calls, we can potentially bypass data-dependent guards without rewriting the offending model code.
#
# For example, in the model above, inserting ``torch._check(a >= 10)`` would tell the compiler that ``y + 2`` can always be returned, and ``torch._check(a == 4)`` tells it to return ``y * 5``.
# See what happens when we re-export this model.

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        torch._check(a >= 10)
        torch._check(a <= 60)
        if a // 2 >= 5:
            return y + 2
        else:
            return y * 5

inps = (
    torch.tensor(32),
    torch.randn(4),
)
ep = export(Foo(), inps)
print(ep)

######################################################################
# Export succeeds, and note from the range constraints field that ``u0`` takes on a range of ``[10, 60]``.
#
# So what information do ``torch._check()`` calls actually communicate? This varies as the symbolic shapes subsystem gets smarter, but at a fundamental level, the following are generally understood:
#
# 1. Equality with non-data-dependent expressions: ``torch._check()`` calls that communicate equalities like ``u0 == s0 + 4`` or ``u0 == 5``.
# 2. Range refinement: calls that provide lower or upper bounds for symbols, like the above.
# 3. Some basic reasoning around more complicated expressions: inserting ``torch._check(a < 4)`` will typically tell the compiler that ``a >= 4`` is false. Checks on complex expressions like ``torch._check(a ** 2 - 3 * a <= 10)`` will typically get you past identical guards, as sketched below.
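#
# For instance, here is a minimal sketch of point 3 (an added illustration; it assumes the subsystem
# can match the recorded expression against the identical guard that follows):

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        # Recording this compound expression as true lets the identical
        # data-dependent guard below evaluate statically at trace time.
        torch._check(a ** 2 - 3 * a <= 10)
        if a ** 2 - 3 * a <= 10:
            return y + 2
        else:
            return y * 5

inps = (
    torch.tensor(5),
    torch.randn(4),
)
ep = export(Foo(), inps)
print(ep)

######################################################################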
#
# As mentioned previously, ``torch._check()`` calls have applicability outside of data-dependent control flow. For example, here's a model where ``torch._check()`` insertion
# succeeds while manual specialization and ``torch.cond()`` do not:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        return y[a]

inps = (
    torch.tensor(32),
    torch.randn(60),
)
export(Foo(), inps)

######################################################################
# Here is a scenario where ``torch._check()`` insertion is required simply to prevent an operation from failing. The export call will fail with
# "Could not guard on data-dependent expression ``-u0 > 60``", implying that the compiler doesn't know if this is a valid indexing operation -
# if the value of ``x`` is out-of-bounds for ``y`` or not. Here, manual specialization is too restrictive, and ``torch.cond()`` has no place.
# Instead, informing the compiler of ``u0``'s range is sufficient:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        torch._check(a >= 0)
        torch._check(a <= y.shape[0])
        return y[a]

inps = (
    torch.tensor(32),
    torch.randn(60),
)
ep = export(Foo(), inps)
print(ep)

######################################################################
# Specialized values
# ^^^^^^^^^^^^^^^^^^
#
# Another category of data-dependent error happens when the program attempts to extract a concrete data-dependent integer/float value
# while tracing. This looks something like "Could not extract specialized integer from data-dependent expression", and is analogous to
# the previous class of errors: where data-dependent guard errors arise from evaluating concrete boolean values, these errors arise
# when attempting to evaluate concrete integer/float values.
#
# This error typically occurs when there is an explicit or implicit ``int()`` cast on a data-dependent expression. For example, this list comprehension
# has a ``range()`` call that implicitly performs an ``int()`` cast on the data-dependent value ``a``:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        # ``_`` as the loop variable avoids shadowing the input tensor ``y``.
        b = torch.cat([y for _ in range(a)], dim=0)
        return b + int(a)

inps = (
    torch.tensor(32),
    torch.randn(60),
)
export(Foo(), inps, strict=False)

######################################################################
# For these errors, some basic options you have are:
#
# 1. Avoid unnecessary ``int()`` casts; in this case, the ``int(a)`` in the return statement.
# 2. Use ``torch._check()`` calls; unfortunately, all you may be able to do in this case is specialize (with ``torch._check(a == 60)``), as sketched after the rewrite below.
# 3. Rewrite the offending code at a higher level. For example, the list comprehension is semantically a ``repeat()`` op, which doesn't involve an ``int()`` cast. The following rewrite avoids data-dependent errors:

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        b = y.unsqueeze(0).repeat(a, 1)
        return b + a

inps = (
    torch.tensor(32),
    torch.randn(60),
)
ep = export(Foo(), inps, strict=False)
print(ep)
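
######################################################################
# And here is a minimal sketch of option 2 (an added illustration): specializing with
# ``torch._check(a == 60)`` may let the implicit and explicit ``int()`` casts extract a
# concrete value, at the cost of generality - the exported program is then only valid
# for inputs where ``x`` equals 60.

class Foo(torch.nn.Module):
    def forward(self, x, y):
        a = x.item()
        # Specialize: the compiler now treats ``a`` as equal to 60.
        torch._check(a == 60)
        b = torch.cat([y for _ in range(a)], dim=0)
        return b + int(a)

inps = (
    torch.tensor(60),
    torch.randn(60),
)
ep = export(Foo(), inps, strict=False)
print(ep)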

######################################################################
# Data-dependent errors can be much more involved, and there are many more options in your toolkit to deal with them: ``torch._check_is_size()``, ``guard_size_oblivious()``, or real-tensor tracing, as starters.
# For more in-depth guides, please refer to the `Export Programming Model <https://pytorch.org/docs/main/export.programming_model.html>`_,
# or `Dealing with GuardOnDataDependentSymNode errors <https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs>`_.
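#
# As a small taste, here is a minimal ``torch._check_is_size()`` sketch (an added illustration,
# not from the guides above): marking an unbacked value as size-like tells the compiler it is
# non-negative, so it can safely be used as a tensor dimension.

class Foo(torch.nn.Module):
    def forward(self, x):
        a = x.item()
        # Without this, constructing a tensor of size ``a`` may fail with a
        # data-dependent guard error, since the compiler can't prove ``a >= 0``.
        torch._check_is_size(a)
        return torch.zeros(a)

ep = export(Foo(), (torch.tensor(32),), strict=False)
print(ep)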

######################################################################
# Custom Ops
# ----------
4 changes: 2 additions & 2 deletions intermediate_source/torchserve_with_ipex.rst
@@ -379,8 +379,8 @@ For interested readers, please check out the following documents:

- `CPU specific optimizations <https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#cpu-specific-optimizations>`_
- `Maximize Performance of Intel® Software Optimization for PyTorch* on CPU <https://www.intel.com/content/www/us/en/developer/articles/technical/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html>`_
- `Performance Tuning Guide <https://intel.github.io/intel-extension-for-pytorch/tutorials/performance_tuning/tuning_guide.html>`_
- `Launch Script Usage Guide <https://intel.github.io/intel-extension-for-pytorch/tutorials/performance_tuning/launch_script.html>`_
- `Performance Tuning Guide <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/tuning_guide.html>`_
- `Launch Script Usage Guide <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html>`_
- `Top-down Microarchitecture Analysis Method <https://www.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html>`_
- `Configuring oneDNN for Benchmarking <https://oneapi-src.github.io/oneDNN/dev_guide_performance_settings.html#benchmarking-settings>`_
- `Intel® VTune™ Profiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html#gs.tcbgpa>`_
2 changes: 1 addition & 1 deletion prototype_source/fx_graph_mode_ptq_static.rst
@@ -253,7 +253,7 @@ of the observers for activation and weight. ``QConfigMapping`` contains mapping
Utility functions related to ``qconfig`` can be found in the `qconfig <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/qconfig.py>`_ file
while those for ``QConfigMapping`` can be found in the `qconfig_mapping <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/qconfig_mapping.py>`
while those for ``QConfigMapping`` can be found in the `qconfig_mapping <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/fx/qconfig_mapping_utils.py>`_

.. code:: python
