@timmoon10
Collaborator

Description

This PR adds a basic usage guide for the op fuser and includes it in the autogenerated API docs.

It is ready as-is, but if reviews take a while I may expand it with a guide on creating custom fused ops.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add basic usage guide for op fuser
  • Include TE ops in autogenerated API docs
  • Debug TE ops docstrings

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 added the documentation Improvements or additions to documentation label Dec 3, 2025
@greptile-apps
Contributor

greptile-apps bot commented Dec 3, 2025

Greptile Overview

Greptile Summary

Added comprehensive documentation for the operation fuser API including a detailed usage guide with code examples, diagrams illustrating operation fusion patterns, and improved docstring formatting across Python modules.

  • Created new docs/examples/op_fuser/op_fuser.rst with usage examples for basic operations, quantization, and branching operations
  • Added three PNG diagrams showing LayerNorm+MLP, FP8 LayerNorm+Linear, and residual LayerNorm+MLP fusion patterns
  • Included operation fuser classes in autogenerated API docs (docs/api/pytorch.rst)
  • Fixed docstring formatting across multiple Python modules to use proper reStructuredText syntax (double backticks for code, proper hyperlink spacing)
  • Corrected .gitignore pattern for .DS_Store files

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it only adds documentation and fixes formatting
  • Score reflects that this is purely a documentation PR with no functional code changes. All previously identified logic issues with Linear argument ordering have been fixed. Only minor typos remain (spacing and spelling), which are non-critical.
  • Pay attention to transformer_engine/pytorch/ops/basic/activation.py to fix the two minor syntax issues (extra space and typo)

Important Files Changed

Filename Overview
docs/examples/op_fuser/op_fuser.rst New comprehensive documentation for op fuser API with usage examples and implementation details. Previously identified Linear argument issues have been corrected.
docs/api/pytorch.rst Added autogenerated API documentation entries for operation fuser classes and operations.
transformer_engine/pytorch/ops/basic/activation.py Fixed docstring formatting for hyperlinks, improved ClampedSwiGLU warning formatting. Minor typo and spacing issues found.
transformer_engine/pytorch/ops/basic/basic_linear.py Improved docstring formatting by fixing backtick usage for code references and standardizing None/True/False formatting.
transformer_engine/pytorch/ops/linear.py Updated docstrings to use proper reStructuredText formatting with double backticks for code references.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Docs as Documentation System
    participant API as API Docs Generator
    participant Code as Python Modules

    Dev->>Docs: Create op_fuser.rst guide
    Note over Docs: Basic usage examples<br/>Quantization guide<br/>Branching operations<br/>Implementation details
    
    Dev->>Docs: Add example diagrams
    Note over Docs: layernorm_mlp.png<br/>fp8_layernorm_linear.png<br/>residual_layernorm_mlp.png
    
    Dev->>Code: Update docstrings
    Note over Code: Fix backtick formatting<br/>Fix hyperlink spacing<br/>Standardize None/True/False
    
    Code->>Code: activation.py
    Code->>Code: basic_linear.py
    Code->>Code: linear.py
    Code->>Code: op.py
    Code->>Code: other ops modules
    
    Dev->>API: Add op fuser classes
    Note over API: Sequential<br/>FusibleOperation<br/>Linear<br/>All operation classes
    
    API->>Docs: Generate API reference
    
    Dev->>Docs: Update index.rst
    Docs->>Docs: Include op_fuser guide in TOC
    
    Note over Dev,Docs: Complete documentation<br/>for op fuser API

greptile-apps[bot]

This comment was marked as resolved.

timmoon10 and others added 2 commits December 2, 2025 22:03
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Review suggestion from @greptile-apps

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
@timmoon10

This comment was marked as outdated.

greptile-apps[bot]

This comment was marked as resolved.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
greptile-apps[bot]

This comment was marked as outdated.

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. transformer_engine/pytorch/ops/basic/activation.py, line 387 (link)

    syntax: Extra space before period.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

19 files reviewed, 1 comment

@pggPL pggPL self-requested a review December 17, 2025 12:47
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
@timmoon10
Collaborator Author

/te-ci core pytorch

greptile-apps[bot]

This comment was marked as resolved.

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
greptile-apps[bot]

This comment was marked as outdated.

Contributor

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

Comment on lines +40 to +54
At the most basic level, the operation fuser API involves two classes
in the ``transformer_engine.pytorch.ops`` submodule:

- ``FusibleOperation``: An abstract base class for tensor operations.
  Examples include ``Linear``, ``LayerNorm``, and ``AllReduce``. It is
  a subclass of ``torch.nn.Module``, so it can hold trainable
  parameters and can be called to perform the operation's forward
  pass.
- ``Sequential``: A container of modules in sequential order. It has a
  very similar interface as ``torch.nn.Sequential``. If it contains
  any ``FusibleOperation`` s, then it may attempt to fuse them in the
  forward and backward passes.

Thus, using the operation fuser simply involves constructing
``FusibleOperation`` s and passing them into a ``Sequential``.
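
The pattern in the quoted passage (construct operations, pass them to a sequential container, let the container fuse neighbours where it can) can be illustrated with a toy sketch. It is pure Python and does not use Transformer Engine: ``Scale``, ``AddBias``, ``FusedScaleAddBias``, and this ``Sequential`` are hypothetical stand-ins, and the single hard-coded fusion rule is an assumption made for illustration, not a description of how the real fuser chooses fusions.

```python
class Scale:
    """Hypothetical basic op: multiply every element by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return [v * self.factor for v in x]


class AddBias:
    """Hypothetical basic op: add a constant to every element."""

    def __init__(self, bias):
        self.bias = bias

    def __call__(self, x):
        return [v + self.bias for v in x]


class FusedScaleAddBias:
    """Hypothetical fused op: one pass over the data instead of two."""

    def __init__(self, scale, add_bias):
        self.factor = scale.factor
        self.bias = add_bias.bias

    def __call__(self, x):
        return [v * self.factor + self.bias for v in x]


class Sequential:
    """Toy container that fuses adjacent Scale + AddBias pairs."""

    def __init__(self, *ops):
        self.ops = list(ops)
        self._fused = None  # built lazily on the first call

    def _fuse(self):
        fused, i = [], 0
        while i < len(self.ops):
            op = self.ops[i]
            nxt = self.ops[i + 1] if i + 1 < len(self.ops) else None
            if isinstance(op, Scale) and isinstance(nxt, AddBias):
                fused.append(FusedScaleAddBias(op, nxt))
                i += 2  # both ops were consumed by the fused op
            else:
                fused.append(op)
                i += 1
        return fused

    def __call__(self, x):
        if self._fused is None:
            self._fused = self._fuse()
        for op in self._fused:
            x = op(x)
        return x


model = Sequential(Scale(2.0), AddBias(1.0), Scale(3.0))
result = model([1.0, 2.0])
print(result)  # [9.0, 15.0]
```

The caller only sees the three ops it constructed. The container decides on the first call that the first two can run as one, which is the user-facing idea the passage describes.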
Member

Who is the intended audience of this documentation? On one hand it seems it is the user (since you show examples of how things could be written), on the other you also include the details of the implementation.

Comment on lines +151 to +153
This is an expert technique. Quantizer configurations can be quite
complicated, so the ``Quantize`` operation's quantizers may be
suboptimal.
Member

Not sure what that means - any examples?

Collaborator Author

@timmoon10 timmoon10 Jan 15, 2026

For MXFP8, it's not safe for the quantize op to produce an MXFP8Tensor with swizzled scales. There's no way to know whether it will be consumed by a GEMM or by something else.

the block has been split into two sections, each with one branching
operation.

Implementation details
Member

Yeah, I think this file should be split into 2 (maybe 3) separate sections: one primarily user-facing, with the sections describing how to use sequential; maybe a second one showing how to define your own fusion with a user-provided kernel; and then a third one showing those internal implementation details.

Comment on lines +246 to +251
- **The op fuser is not interchangeable with the monolithic TE
  modules**: Modules like ``Linear``, ``LayerNormLinear``, and
  ``TransformerLayer`` support a wide range of features and advanced
  workflows, which makes them challenging to decompose into simple
  operations that work with the fuser. They are also carefully
  hand-tuned to achieve maximum performance.
Member

We would like to get to the point where the sequential is the default, right? So while right now this is true, it may not be in the future.

@timmoon10 timmoon10 removed the 2.12.0 label Jan 25, 2026
Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

`GLU Variants Improve Transformer<https://arxiv.org/abs/2002.05202>`__
and `Gaussian Error Linear Units (GELUs)<https://arxiv.org/abs/1606.08415>`__.
`GLU Variants Improve Transformer <https://arxiv.org/abs/2002.05202>`__
and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__ .
Contributor

Extra space before period

Suggested change
and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__ .
and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
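
Both hyperlink issues in this thread (link text running straight into ``<url>``, and a stray space between the closing ``__`` and the period) are mechanical enough to check with a small script. The following is a minimal sketch, assuming one link per pattern match and single-line links. ``lint_rst_links`` and the two regular expressions are hypothetical helpers written for illustration and are not part of this PR or of any docs tooling.

```python
import re

# Link text runs straight into "<url>" with no whitespace before "<".
MISSING_SPACE = re.compile(r"`[^`<]*[^\s`<]<[^`>]+>`__?")
# Whitespace between the closing "`_" / "`__" and sentence punctuation.
SPACE_BEFORE_PUNCT = re.compile(r">`__?[ \t]+[.,;:]")


def lint_rst_links(text):
    """Return (line number, message) pairs for suspicious RST links."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if MISSING_SPACE.search(line):
            problems.append((lineno, "missing space before '<'"))
        if SPACE_BEFORE_PUNCT.search(line):
            problems.append((lineno, "space before punctuation"))
    return problems


# The three states of the docstring line discussed in this thread.
original = "and `Gaussian Error Linear Units (GELUs)<https://arxiv.org/abs/1606.08415>`__."
first_fix = "and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__ ."
suggested = "and `Gaussian Error Linear Units (GELUs) <https://arxiv.org/abs/1606.08415>`__."

for label, line in [("original", original), ("first fix", first_fix), ("suggested", suggested)]:
    print(label, lint_rst_links(line))
```

Only the suggested form passes both checks; the original trips the first rule and the first fix trips the second.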

.. warning::
   The input tensor is chunked along the last dimension to get
   gates/pre-activations which is differnt from GPT OSS
Contributor

Typo: "differnt" should be "different"

Suggested change
gates/pre-activations which is differnt from GPT OSS
gates/pre-activations which is different from GPT OSS

Labels

documentation Improvements or additions to documentation

3 participants