This repository was archived by the owner on Feb 24, 2026. It is now read-only.

[TL] Append Macro Test Case for GEMM and Dequant GEMM #190

Merged
LeiWang1999 merged 63 commits into microsoft:main from LeiWang1999:tl-layout
Sep 26, 2024

Conversation

LeiWang1999 commented Sep 26, 2024

Introduce test cases for TL.

TODO Items

  • Pure TL GEMM with Swizzling
  • TL GEMM with Ladder Transformation
  • TL GEMM with Block Reduction
  • TL GEMM with Dequant

LeiWang1999 (author) commented

The syntax:

# Perform Matrix Multiplication
ptx_macro_generator.MMA(ptx_macro_generator, A_local, B_local, C_local)

is a bit ugly. Maybe we can find a way to avoid passing ptx_macro_generator twice.
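To illustrate the duplication in plain Python: a callable stored on an object without instance binding needs the instance passed explicitly, while a regular method receives it implicitly. This is only a sketch of the general pattern. `MacroGeneratorSketch`, `_mma_impl`, and the toy dot-product "MMA" are hypothetical names, not the PR's actual classes or macros, and the PR does not state why the original call needed the extra argument.

```python
class MacroGeneratorSketch:
    """Hypothetical stand-in for ptx_macro_generator (not the real class)."""

    def _mma_impl(self, a, b, c):
        # Toy "MMA": accumulate the dot product of a and b into c[0].
        c[0] += sum(x * y for x, y in zip(a, b))
        return c

    # Wrapped as a staticmethod, so Python does not bind the instance on
    # attribute access and the caller must pass it explicitly.
    MMA = staticmethod(_mma_impl)

    # A regular method is bound on attribute access, so `self` is implicit.
    def mma(self, a, b, c):
        return self._mma_impl(a, b, c)


gen = MacroGeneratorSketch()

# Unbound style: the instance appears twice in the call.
c_unbound = gen.MMA(gen, [1, 2], [3, 4], [0])

# Bound style: the instance appears once.
c_bound = gen.mma([1, 2], [3, 4], [0])
```

Both calls compute the same result; only the call site differs.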

LeiWang1999 (author) commented

Optimized to:

for ki in T.serial(0, (block_K // micro_size_k)):

    # Load A into fragment
    mma_emitter.ldmatrix_a(
        A_local,
        A_shared,
        ki,
        thread_bindings=thread_bindings,
    )

    # Load B into fragment
    mma_emitter.ldmatrix_b(
        B_local,
        B_shared,
        ki,
        thread_bindings=thread_bindings,
    )

    # Perform Matrix Multiplication
    mma_emitter.mma(A_local, B_local, C_local)

# Perform STMatrix
mma_emitter.stmatrix(
    C_local,
    C_shared,
    thread_bindings=thread_bindings,
)
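As a rough reference for what the loop above computes, the following pure-Python sketch walks the K dimension in steps of micro_size_k, accumulates each slice into a local accumulator, and writes the result out once at the end. It is not TL code and says nothing about thread bindings, fragments, or memory layouts. The function name `tiled_gemm_block` and the row-major K x N layout of B are assumptions made for illustration.

```python
def tiled_gemm_block(A_shared, B_shared, block_K, micro_size_k):
    """Reference for one block: A is M x block_K, B is block_K x N (assumed)."""
    M, N = len(A_shared), len(B_shared[0])
    C_local = [[0] * N for _ in range(M)]

    for ki in range(block_K // micro_size_k):
        k0 = ki * micro_size_k

        # Stand-in for ldmatrix_a: take the ki-th K-slice of A.
        A_local = [row[k0:k0 + micro_size_k] for row in A_shared]

        # Stand-in for ldmatrix_b: take the ki-th K-slice of B.
        B_local = B_shared[k0:k0 + micro_size_k]

        # Stand-in for mma: accumulate A_local @ B_local into C_local.
        for i in range(M):
            for j in range(N):
                for k in range(micro_size_k):
                    C_local[i][j] += A_local[i][k] * B_local[k][j]

    # Stand-in for stmatrix: copy the accumulator out once, after the loop.
    C_shared = [row[:] for row in C_local]
    return C_shared
```

Note that the store happens outside the K loop, matching the placement of the stmatrix call in the snippet above.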

@LeiWang1999 LeiWang1999 merged commit 150815b into microsoft:main Sep 26, 2024
