[PyTorch] Documentation for op fuser API #2447
Conversation
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Greptile Overview

Greptile Summary: Added comprehensive documentation for the operation fuser API, including a detailed usage guide with code examples, diagrams illustrating operation-fusion patterns, and improved docstring formatting across Python modules.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Docs as Documentation System
    participant API as API Docs Generator
    participant Code as Python Modules
    Dev->>Docs: Create op_fuser.rst guide
    Note over Docs: Basic usage examples<br/>Quantization guide<br/>Branching operations<br/>Implementation details
    Dev->>Docs: Add example diagrams
    Note over Docs: layernorm_mlp.png<br/>fp8_layernorm_linear.png<br/>residual_layernorm_mlp.png
    Dev->>Code: Update docstrings
    Note over Code: Fix backtick formatting<br/>Fix hyperlink spacing<br/>Standardize None/True/False
    Code->>Code: activation.py
    Code->>Code: basic_linear.py
    Code->>Code: linear.py
    Code->>Code: op.py
    Code->>Code: other ops modules
    Dev->>API: Add op fuser classes
    Note over API: Sequential<br/>FusibleOperation<br/>Linear<br/>All operation classes
    API->>Docs: Generate API reference
    Dev->>Docs: Update index.rst
    Docs->>Docs: Include op_fuser guide in TOC
    Note over Dev,Docs: Complete documentation<br/>for op fuser API
```
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Review suggestion from @greptile-apps Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Additional Comments (1)
- transformer_engine/pytorch/ops/basic/activation.py, line 387 (link): syntax: Extra space before period.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
19 files reviewed, 1 comment
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
/te-ci core pytorch
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
No files reviewed, no comments
At the most basic level, the operation fuser API involves two classes in the ``transformer_engine.pytorch.ops`` submodule:

- ``FusibleOperation``: An abstract base class for tensor operations. Examples include ``Linear``, ``LayerNorm``, and ``AllReduce``. It is a subclass of ``torch.nn.Module``, so it can hold trainable parameters and can be called to perform the operation's forward pass.
- ``Sequential``: A container of modules in sequential order. It has an interface very similar to ``torch.nn.Sequential``. If it contains any ``FusibleOperation`` s, then it may attempt to fuse them in the forward and backward passes.

Thus, using the operation fuser simply involves constructing ``FusibleOperation`` s and passing them into a ``Sequential``, as in the sketch below.
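A minimal sketch of that pattern, assuming the op names shown in this PR (``LayerNorm``, ``Linear``, ``GELU``) and plausible constructor arguments; the exact signatures may differ from the final API:

```python
import torch
import transformer_engine.pytorch as te

# Build a small MLP out of FusibleOperations. Wrapping them in
# te.ops.Sequential lets the fuser try to fuse adjacent ops
# (e.g. LayerNorm + Linear) in the forward and backward passes.
model = te.ops.Sequential(
    te.ops.LayerNorm(1024),
    te.ops.Linear(1024, 4096),
    te.ops.GELU(),
    te.ops.Linear(4096, 1024),
)

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
y = model(x)        # forward pass, possibly with fused kernels
y.sum().backward()  # backward pass may also use fused kernels
```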
Who is the intended audience of this documentation? On one hand it seems to be the user (since you show examples of how things could be written); on the other hand, you also include implementation details.
This is an expert technique. Quantizer configurations can be quite complicated, so the ``Quantize`` operation's quantizers may be suboptimal.
Not sure what that means - any examples?
For MXFP8, it's not safe for the quantize op to produce an MXFP8Tensor with swizzled scales. There's no way to know if it will be consumed by a GEMM or by something else.
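For illustration, a hedged sketch of the expert technique in question: placing an explicit ``Quantize`` op ahead of a linear so the linear can consume an already-quantized input. The constructor arguments here are assumptions, not the documented API:

```python
import torch
import transformer_engine.pytorch as te

# An explicit Quantize op makes downstream ops see a quantized tensor.
# Whether its quantizer is what the following GEMM actually wants
# (e.g. MXFP8 with or without swizzled scales) is exactly the subtlety
# raised above, since the Quantize op cannot know its consumer.
model = te.ops.Sequential(
    te.ops.Quantize(),
    te.ops.Linear(1024, 1024),
)

with te.fp8_autocast():
    y = model(torch.randn(32, 1024, device="cuda"))
```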
the block has been split into two sections, each with one branching operation.

Implementation details
Yeah, I think this file should be split into 2 (maybe 3) separate sections - one primarily user facing with the sections describing how to use sequential, maybe second one showing how to define your own fusion with a user-provided kernel, and then the third one showing those internal implementation details.
- **The op fuser is not interchangeable with the monolithic TE modules**: Modules like ``Linear``, ``LayerNormLinear``, and ``TransformerLayer`` support a wide range of features and advanced workflows, which makes them challenging to decompose into simple operations that work with the fuser. They are also carefully hand-tuned to achieve maximum performance.
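To make the distinction concrete, a hedged comparison of a monolithic module with a rough fuser-based decomposition; it assumes ``te.LayerNormLinear`` and the op constructors shown, and the two paths are not feature-equivalent:

```python
import torch
import transformer_engine.pytorch as te

# Monolithic, hand-tuned module:
mono = te.LayerNormLinear(1024, 4096)

# A roughly equivalent op-fuser decomposition. It is not a drop-in
# replacement: the monolithic module supports features and tuning
# that the decomposed form may not.
fused = te.ops.Sequential(
    te.ops.LayerNorm(1024),
    te.ops.Linear(1024, 4096),
)

x = torch.randn(32, 1024, device="cuda")
y_mono = mono(x)
y_fused = fused(x)
```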
We would like to get to the point where the sequential is the default, right? So while right now this is true, it may not be in the future.
2 files reviewed, 2 comments
```diff
- `GLU Variants Improve Transformer<https://arxiv.org/abs/2002.05202>`__
- and `Gaussian Error Linear Units (GELUs)<https://arxiv.org/abs/1606.08415>`__.
+ `GLU Variants Improve Transformer <https://arxiv.org/abs/2002.05202>`__
+ and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__ .
```
Extra space before period
```diff
- and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__ .
+ and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__.
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
.. warning::
    The input tensor is chunked along the last dimension to get
    gates/pre-activations which is differnt from GPT OSS
Typo: "differnt" should be "different"
```diff
- gates/pre-activations which is differnt from GPT OSS
+ gates/pre-activations which is different from GPT OSS
```
Description
This PR adds a basic usage guide for the op fuser and includes it in the autogenerated API docs.
It is ready as-is, but if reviews take a while I may expand it with a guide on creating custom fused ops.
Type of change
Changes
Checklist: