Commit

add FAQ page
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
cyanguwa committed Oct 4, 2024
1 parent 77b7ab6 commit 828c4f3
Showing 2 changed files with 48 additions and 0 deletions.
47 changes: 47 additions & 0 deletions docs/faq.rst
@@ -0,0 +1,47 @@
..
    Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    See LICENSE for license information.

Frequently Asked Questions (FAQ)
================================

FP8 checkpoint compatibility
----------------------------

Transformer Engine added support for FP8 attention in version 1.6. When checkpointing, it stores the FP8 metadata, including the scaling factors and amax history, under a `._extra_state` key. As FP8 attention support has expanded from one backend to multiple backends, the location of the `._extra_state` key has also shifted. Taking the `MultiheadAttention` module as an example, the table below shows the checkpoint structure in different Transformer Engine versions.
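
As a quick illustration before the table, a minimal sketch of listing the FP8 metadata keys in a saved checkpoint (the file name and the use of a plain `torch.save` state dict are assumptions for illustration, not part of this commit):

.. code-block:: python

    import torch

    # Hypothetical checkpoint produced with torch.save(model.state_dict(), "model.pt")
    state_dict = torch.load("model.pt", map_location="cpu")

    # FP8 metadata keys end with "._extra_state"; their prefixes reveal which
    # Transformer Engine version wrote the checkpoint (see the table below).
    fp8_keys = [k for k in state_dict if k.endswith("._extra_state")]
    print(fp8_keys)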

.. list-table::
   :widths: 15 25 50
   :header-rows: 1

   * - Version
     - FP8 metadata
     - Checkpoint compatibility (checkpoint version: loading behavior)
   * - <= 1.5
     - None
     -
       - <= 1.5: no FP8 metadata loaded (as expected)
       - > 1.5: "unexpected key" error
   * - 1.6, 1.7
     - `core_attention.fused_attention._extra_state`
     -
       - <= 1.5: initialize FP8 metadata to default, i.e. 1s for scaling factors and 0s for amaxes
       - 1.6, 1.7: load FP8 metadata from checkpoint
       - >= 1.8: "unexpected key" error
   * - >= 1.8, <= 1.11
     - `core_attention._extra_state`
     -
       - <= 1.5: initialize FP8 metadata to default, i.e. 1s for scaling factors and 0s for amaxes
       - 1.6, 1.7: this checkpoint save/load version pair relies on users to map the 1.6/1.7 key to the 1.8-1.11 key; otherwise, initialize FP8 metadata to default, i.e. 1s for scaling factors and 0s for amaxes. The mapping in this example can be done by:

         .. code-block:: python

            state_dict["core_attention._extra_state"] = \
                state_dict["core_attention.fused_attention._extra_state"]
            del state_dict["core_attention.fused_attention._extra_state"]

       - >= 1.8: load FP8 metadata from checkpoint
   * - >= 1.12
     - `core_attention._extra_state`
     -
       - <= 1.5: initialize FP8 metadata to default, i.e. 1s for scaling factors and 0s for amaxes
       - >= 1.6: load FP8 metadata from checkpoint
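
A more general, hedged sketch of the remapping above (the module paths, model object, and checkpoint name are assumptions for illustration, not part of Transformer Engine's API): when loading a 1.6/1.7 checkpoint into Transformer Engine >= 1.8, every key ending in `core_attention.fused_attention._extra_state` can be renamed before calling `load_state_dict`.

.. code-block:: python

    import torch

    # Hypothetical checkpoint written with Transformer Engine 1.6/1.7
    state_dict = torch.load("model_te_1.6.pt", map_location="cpu")

    # Rename "<prefix>.core_attention.fused_attention._extra_state" (1.6/1.7 layout)
    # to "<prefix>.core_attention._extra_state" (>= 1.8 layout).
    old_suffix = "core_attention.fused_attention._extra_state"
    new_suffix = "core_attention._extra_state"
    for key in list(state_dict.keys()):
        if key.endswith(old_suffix):
            state_dict[key[: -len(old_suffix)] + new_suffix] = state_dict.pop(key)

    # model.load_state_dict(state_dict)  # then load into the >= 1.8 model
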
1 change: 1 addition & 0 deletions docs/index.rst
@@ -30,6 +30,7 @@ Transformer Engine documentation

   installation
   examples/quickstart.ipynb
   faq

.. toctree::
   :hidden:
