From 84ac70ae260c25ab0088be17adc7f0d3eee2349d Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Wed, 22 Jan 2025 14:19:16 -0500 Subject: [PATCH 01/10] Update .lycheeignore (#3247) * Update .lycheeignore Adding a link to ignore * Update .lycheeignore --- .lycheeignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.lycheeignore b/.lycheeignore index 994f287121..ed70fffc2a 100644 --- a/.lycheeignore +++ b/.lycheeignore @@ -12,3 +12,6 @@ https://pytorch.org/tutorials/beginner/colab/n # Ignore local host link from intermediate_source/tensorboard_tutorial.rst http://localhost:6006 + +# Ignore local host link from recipes_source/deployment_with_flask.rst +http://localhost:5000/predict From 85d0fa9df7d0c88a580142cbf59e95c554263068 Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 01:02:32 -0500 Subject: [PATCH 02/10] Update ddp_series_multinode.rst (#3248) Edit author link --- intermediate_source/ddp_series_multinode.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate_source/ddp_series_multinode.rst b/intermediate_source/ddp_series_multinode.rst index 5717589bda..8746eb19bb 100644 --- a/intermediate_source/ddp_series_multinode.rst +++ b/intermediate_source/ddp_series_multinode.rst @@ -6,7 +6,7 @@ training** \|\| `minGPT Training `__ Multinode Training ================== -Authors: `Suraj Subramanian `__ +Authors: `Suraj Subramanian `__ .. 
grid:: 2 From 3c565ca8123bcba84f530820ef968db5bff14556 Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 02:18:05 -0500 Subject: [PATCH 03/10] Update ddp_series_minGPT.rst (#3246) --- intermediate_source/ddp_series_minGPT.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate_source/ddp_series_minGPT.rst b/intermediate_source/ddp_series_minGPT.rst index 259db3623c..743568ae18 100644 --- a/intermediate_source/ddp_series_minGPT.rst +++ b/intermediate_source/ddp_series_minGPT.rst @@ -6,7 +6,7 @@ training `__ \|\| **minGPT Training** Training “real-world” models with DDP ===================================== -Authors: `Suraj Subramanian `__ +Authors: `Suraj Subramanian `__ .. grid:: 2 From 15547173ea4825faeb18c99a1f4eae2aecf31b65 Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 10:57:19 -0500 Subject: [PATCH 04/10] Update dynamic_quantization_bert_tutorial.rst (#3239) Update links in dynamic quantization bert tutorial --- intermediate_source/dynamic_quantization_bert_tutorial.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/intermediate_source/dynamic_quantization_bert_tutorial.rst b/intermediate_source/dynamic_quantization_bert_tutorial.rst index e515f53a1d..786ef11f3b 100644 --- a/intermediate_source/dynamic_quantization_bert_tutorial.rst +++ b/intermediate_source/dynamic_quantization_bert_tutorial.rst @@ -138,7 +138,7 @@ the following helper functions: one for converting the text examples into the feature vectors; the other one for measuring the F1 score of the predicted result.
-The `glue_convert_examples_to_features `_ function converts the texts into input features: +The `glue_convert_examples_to_features `_ function converts the texts into input features: - Tokenize the input sequences; - Insert [CLS] in the beginning; @@ -147,7 +147,7 @@ The `glue_convert_examples_to_features `_ function has the compute metrics with +The `glue_compute_metrics `_ function has the compute metrics with the `F1 score `_, which can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The @@ -273,7 +273,7 @@ We load the tokenizer and fine-tuned BERT sequence classifier model 2.3 Define the tokenize and evaluation function ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -We reuse the tokenize and evaluation function from `HuggingFace `_. +We reuse the tokenize and evaluation function from `HuggingFace `_. .. code:: python From 3b1257d2644161e0ec1972a2b0b5298c2306b135 Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 13:35:14 -0500 Subject: [PATCH 05/10] Update .lycheeignore (#3253) Adding https://www.uber.com/blog/deep-neuroevolution/ to ignore, link valid but not passing --- .lycheeignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.lycheeignore b/.lycheeignore index ed70fffc2a..3d86ae872d 100644 --- a/.lycheeignore +++ b/.lycheeignore @@ -15,3 +15,6 @@ http://localhost:6006 # Ignore local host link from recipes_source/deployment_with_flask.rst http://localhost:5000/predict + +# Ignore local host link from advanced_source/cpp_frontend.rst +https://www.uber.com/blog/deep-neuroevolution/ From 903f7af9ca7594fe53b0dc1ba6e7be4f1ab9f9ce Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 14:58:28 -0500 Subject: [PATCH 06/10] Update cpp_frontend.rst (#3245) Update broken links --- advanced_source/cpp_frontend.rst | 4 ++-- 1 file changed, 2 
insertions(+), 2 deletions(-) diff --git a/advanced_source/cpp_frontend.rst b/advanced_source/cpp_frontend.rst index de22fbf05a..d31be00c63 100644 --- a/advanced_source/cpp_frontend.rst +++ b/advanced_source/cpp_frontend.rst @@ -57,7 +57,7 @@ the right tool for the job. Examples for such environments include: Multiprocessing is an alternative, but not as scalable and has significant shortcomings. C++ has no such constraints and threads are easy to use and create. Models requiring heavy parallelization, like those used in `Deep - Neuroevolution `_, can benefit from + Neuroevolution `_, can benefit from this. - **Existing C++ Codebases**: You may be the owner of an existing C++ application doing anything from serving web pages in a backend server to @@ -662,7 +662,7 @@ Defining the DCGAN Modules We now have the necessary background and introduction to define the modules for the machine learning task we want to solve in this post. To recap: our task is to generate images of digits from the `MNIST dataset -`_. We want to use a `generative adversarial +`_. We want to use a `generative adversarial network (GAN) `_ to solve this task. 
In particular, we'll use a `DCGAN architecture From 5a5edfc611c22efcc3dc5192212d0df3a12aa30e Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 16:23:02 -0500 Subject: [PATCH 07/10] Update torchserve_with_ipex.rst (#3249) Update to the tuning guide and launch script links Co-authored-by: Mark Saroufim --- intermediate_source/torchserve_with_ipex.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/intermediate_source/torchserve_with_ipex.rst b/intermediate_source/torchserve_with_ipex.rst index 1a11b4180f..23d91f50cb 100644 --- a/intermediate_source/torchserve_with_ipex.rst +++ b/intermediate_source/torchserve_with_ipex.rst @@ -379,8 +379,8 @@ For interested readers, please check out the following documents: - `CPU specific optimizations `_ - `Maximize Performance of Intel® Software Optimization for PyTorch* on CPU `_ -- `Performance Tuning Guide `_ -- `Launch Script Usage Guide `_ +- `Performance Tuning Guide `_ +- `Launch Script Usage Guide `_ - `Top-down Microarchitecture Analysis Method `_ - `Configuring oneDNN for Benchmarking `_ - `Intel® VTune™ Profiler `_ From 37e0b1ee04b199b375aa474302c5198ce7dec27f Mon Sep 17 00:00:00 2001 From: Pian Pawakapan Date: Thu, 23 Jan 2025 15:28:12 -0800 Subject: [PATCH 08/10] [BE][export] add data-dependent section to export tutorial (#3244) --- en-wordlist.txt | 6 +- intermediate_source/torch_export_tutorial.py | 185 +++++++++++++++++++ 2 files changed, 190 insertions(+), 1 deletion(-) diff --git a/en-wordlist.txt b/en-wordlist.txt index 7c2ed6c398..b56df45df0 100644 --- a/en-wordlist.txt +++ b/en-wordlist.txt @@ -81,6 +81,8 @@ FX FX's FairSeq Fastpath +FakeTensor +FakeTensors FFN FloydHub FloydHub's @@ -368,6 +370,8 @@ downsample downsamples dropdown dtensor +dtype +dtypes duration elementwise embeddings @@ -615,6 +619,7 @@ triton uint UX umap +unbacked uncomment uncommented underflowing @@ -651,7 +656,6 @@ RecSys TorchRec sharding TBE 
-dtype EBC sharder hyperoptimized diff --git a/intermediate_source/torch_export_tutorial.py b/intermediate_source/torch_export_tutorial.py index 9acacf5362..c992eefa9f 100644 --- a/intermediate_source/torch_export_tutorial.py +++ b/intermediate_source/torch_export_tutorial.py @@ -629,6 +629,191 @@ def forward(self, x, y): "bool_val": None, } +###################################################################### +# Data-dependent errors +# --------------------- +# +# While trying to export models, you may have encountered errors like "Could not guard on data-dependent expression" or "Could not extract specialized integer from data-dependent expression". +# These errors exist because ``torch.export()`` compiles programs using FakeTensors, which symbolically represent their real tensor counterparts. While these have equivalent symbolic properties +# (e.g. sizes, strides, dtypes), they diverge in that FakeTensors do not contain any data values. While this avoids unnecessary memory usage and expensive computation, it does mean that export may be +# unable, out of the box, to compile parts of user code where compilation relies on data values. In short, if the compiler requires a concrete, data-dependent value in order to proceed, it will error out, +# complaining that the value is not available. +# +# Data-dependent values appear in many places, and common sources are calls like ``item()``, ``tolist()``, or ``torch.unbind()`` that extract scalar values from tensors. +# How are these values represented in the exported program? In the `Constraints/Dynamic Shapes `_ +# section, we talked about allocating symbols to represent dynamic input dimensions. +# The same happens here: we allocate symbols for every data-dependent value that appears in the program. The important distinction is that these are "unbacked" symbols, +# in contrast to the "backed" symbols allocated for input dimensions.
The `"backed/unbacked" `_ +# nomenclature refers to the presence/absence of a "hint" for the symbol: a concrete value backing the symbol, that can inform the compiler on how to proceed. +# +# In the input shape symbol case (backed symbols), these hints are simply the sample input shapes provided, which explains why control-flow branching is determined by the sample input properties. +# For data-dependent values, the symbols are taken from FakeTensor "data" during tracing, and so the compiler doesn't know the actual values (hints) that these symbols would take on. +# +# Let's see how these show up in exported programs: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + b = y.tolist() + return b + [a] + +inps = ( + torch.tensor(1), + torch.tensor([2, 3]), +) +ep = export(Foo(), inps) +print(ep) + +###################################################################### +# The result is that 3 unbacked symbols (notice they're prefixed with "u", instead of the usual "s" for input shape/backed symbols) are allocated and returned: +# 1 for the ``item()`` call, and 1 for each of the elements of ``y`` with the ``tolist()`` call. +# Note from the range constraints field that these take on ranges of ``[-int_oo, int_oo]``, not the default ``[0, int_oo]`` range allocated to input shape symbols, +# since we have no information on what these values are - they don't represent sizes, so don't necessarily have positive values. + +###################################################################### +# Guards, torch._check() +# ^^^^^^^^^^^^^^^^^^^^^^ +# +# But the case above is easy to export, because the concrete values of these symbols aren't used in any compiler decision-making; all that's relevant is that the return values are unbacked symbols. 
+# The data-dependent errors highlighted in this section are cases like the following, where `data-dependent guards `_ are encountered: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + if a // 2 >= 5: + return y + 2 + else: + return y * 5 + +###################################################################### +# Here we actually need the "hint", or the concrete value of ``a`` for the compiler to decide whether to trace ``return y + 2`` or ``return y * 5`` as the output. +# Because we trace with FakeTensors, we don't know what ``a // 2 >= 5`` actually evaluates to, and export errors out with "Could not guard on data-dependent expression ``u0 // 2 >= 5 (unhinted)``". +# +# So how do we export this toy model? Unlike ``torch.compile()``, export requires full graph compilation, and we can't just graph break on this. Here are some basic options: +# +# 1. Manual specialization: we could intervene by selecting the branch to trace, either by removing the control-flow code to contain only the specialized branch, or using ``torch.compiler.is_compiling()`` to guard what's traced at compile-time. +# 2. ``torch.cond()``: we could rewrite the control-flow code to use ``torch.cond()`` so we don't specialize on a branch. +# +# While these options are valid, they have their pitfalls. Option 1 sometimes requires drastic, invasive rewrites of the model code to specialize, and ``torch.cond()`` is not a comprehensive system for handling data-dependent errors. +# As we will see, there are data-dependent errors that do not involve control-flow. +# +# The generally recommended approach is to start with ``torch._check()`` calls. While these give the impression of purely being assert statements, they are in fact a system of informing the compiler on properties of symbols. 
+# While a ``torch._check()`` call does act as an assertion at runtime, when traced at compile-time, the checked expression is sent to the symbolic shapes subsystem for reasoning, and any properties that follow from the expression being true +# are stored as symbol properties (provided the subsystem is smart enough to infer them). So even if unbacked symbols don't have hints, if we're able to communicate properties that are generally true for these symbols via +# ``torch._check()`` calls, we can potentially bypass data-dependent guards without rewriting the offending model code. +# +# For example, in the model above, inserting ``torch._check(a >= 10)`` would tell the compiler that ``y + 2`` can always be returned, and ``torch._check(a == 4)`` tells it to return ``y * 5``. +# See what happens when we re-export this model. + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + torch._check(a >= 10) + torch._check(a <= 60) + if a // 2 >= 5: + return y + 2 + else: + return y * 5 + +inps = ( + torch.tensor(32), + torch.randn(4), +) +ep = export(Foo(), inps) +print(ep) + +###################################################################### +# Export succeeds, and note from the range constraints field that ``u0`` takes on a range of ``[10, 60]``. +# +# So what information do ``torch._check()`` calls actually communicate? This varies as the symbolic shapes subsystem gets smarter, but at a fundamental level, these are generally true: +# +# 1. Equality with non-data-dependent expressions: ``torch._check()`` calls that communicate equalities like ``u0 == s0 + 4`` or ``u0 == 5``. +# 2. Range refinement: calls that provide lower or upper bounds for symbols, like the above. +# 3. Some basic reasoning around more complicated expressions: inserting ``torch._check(a < 4)`` will typically tell the compiler that ``a >= 4`` is false. Checks on complex expressions like ``torch._check(a ** 2 - 3 * a <= 10)`` will typically get you past identical guards.
+# +# As mentioned previously, ``torch._check()`` calls have applicability outside of data-dependent control flow. For example, here's a model where ``torch._check()`` insertion +# prevails while manual specialization & ``torch.cond()`` do not: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + return y[a] + +inps = ( + torch.tensor(32), + torch.randn(60), +) +export(Foo(), inps) + +###################################################################### +# Here is a scenario where ``torch._check()`` insertion is required simply to prevent an operation from failing. The export call will fail with +# "Could not guard on data-dependent expression ``-u0 > 60``", implying that the compiler doesn't know if this is a valid indexing operation - +# if the value of ``x`` is out-of-bounds for ``y`` or not. Here, manual specialization is too prohibitive, and ``torch.cond()`` has no place. +# Instead, informing the compiler of ``u0``'s range is sufficient: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + torch._check(a >= 0) + torch._check(a < y.shape[0]) + return y[a] + +inps = ( + torch.tensor(32), + torch.randn(60), +) +ep = export(Foo(), inps) +print(ep) + +###################################################################### +# Specialized values +# ^^^^^^^^^^^^^^^^^^ +# +# Another category of data-dependent error happens when the program attempts to extract a concrete data-dependent integer/float value +# while tracing. This looks something like "Could not extract specialized integer from data-dependent expression", and is analogous to +# the previous class of errors - these occur when attempting to evaluate concrete integer/float values, while data-dependent guard errors arise +# when evaluating concrete boolean values. +# +# This error typically occurs when there is an explicit or implicit ``int()`` cast on a data-dependent expression.
For example, this list comprehension +# has a ``range()`` call that implicitly does an ``int()`` cast on the size of the list: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + b = torch.cat([y for _ in range(a)], dim=0) + return b + int(a) + +inps = ( + torch.tensor(32), + torch.randn(60), +) +export(Foo(), inps, strict=False) + +###################################################################### +# For these errors, some basic options you have are: +# +# 1. Avoid unnecessary ``int()`` cast calls, in this case, the ``int(a)`` in the return statement. +# 2. Use ``torch._check()`` calls; unfortunately all you may be able to do in this case is specialize (with ``torch._check(a == 60)``). +# 3. Rewrite the offending code at a higher level. For example, the list comprehension is semantically a ``repeat()`` op, which doesn't involve an ``int()`` cast. The following rewrite avoids data-dependent errors: + +class Foo(torch.nn.Module): + def forward(self, x, y): + a = x.item() + b = y.unsqueeze(0).repeat(a, 1) + return b + a + +inps = ( + torch.tensor(32), + torch.randn(60), +) +ep = export(Foo(), inps, strict=False) +print(ep) + +###################################################################### +# Data-dependent errors can be much more involved, and there are many more options in your toolkit to deal with them: ``torch._check_is_size()``, ``guard_size_oblivious()``, or real-tensor tracing, as starters. +# For more in-depth guides, please refer to the `Export Programming Model `_, +# or `Dealing with GuardOnDataDependentSymNode errors `_.
+ ###################################################################### # Custom Ops # ---------- From 4b1d9fb029c11fe51ed31f5ee72c9e12bbfa153d Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Thu, 23 Jan 2025 20:30:39 -0500 Subject: [PATCH 09/10] Update FSDP_tutorial.rst (#3252) Link no longer exists so giving credit to creator instead --- intermediate_source/FSDP_tutorial.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate_source/FSDP_tutorial.rst b/intermediate_source/FSDP_tutorial.rst index ce10488986..8e5217c64a 100644 --- a/intermediate_source/FSDP_tutorial.rst +++ b/intermediate_source/FSDP_tutorial.rst @@ -11,7 +11,7 @@ It also comes with considerable engineering complexity to handle the training of `PyTorch FSDP `__, released in PyTorch 1.11 makes this easier. In this tutorial, we show how to use `FSDP APIs `__, for simple MNIST models that can be extended to other larger models such as `HuggingFace BERT models `__, -`GPT 3 models up to 1T parameters `__ . The sample DDP MNIST code has been borrowed from `here `__. +`GPT 3 models up to 1T parameters `__ . The sample DDP MNIST code is courtesy of `Patrick Hu `_. How FSDP works From 2a30921062c976c57e7f71db02524de75e898872 Mon Sep 17 00:00:00 2001 From: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com> Date: Fri, 24 Jan 2025 10:08:05 -0500 Subject: [PATCH 10/10] Update fx_graph_mode_ptq_static.rst (#3255) Update to utility link --- prototype_source/fx_graph_mode_ptq_static.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/prototype_source/fx_graph_mode_ptq_static.rst b/prototype_source/fx_graph_mode_ptq_static.rst index 0c4f8065e3..da16d04dbc 100644 --- a/prototype_source/fx_graph_mode_ptq_static.rst +++ b/prototype_source/fx_graph_mode_ptq_static.rst @@ -253,7 +253,7 @@ of the observers for activation and weight.
``QConfigMapping`` contains mapping Utility functions related to ``qconfig`` can be found in the `qconfig `_ file -while those for ``QConfigMapping`` can be found in the `qconfig_mapping `_ +while those for ``QConfigMapping`` can be found in the `qconfig_mapping `_ .. code:: python