This repository was archived by the owner on Feb 24, 2026. It is now read-only.

[Fix][Dev] Typo fix for our workflow and enhance lop3 decode to support scaling #125

Merged
LeiWang1999 merged 134 commits into microsoft:main from LeiWang1999:dev
Aug 5, 2024

Conversation

@LeiWang1999
Contributor

This pull request primarily focuses on enhancing the GPU intrinsic functions and updating the workflow configuration. The key changes include adding new decoding functions with scaling and offset capabilities, modifying the workflow configuration, and updating submodule references.

Enhancements to GPU Intrinsic Functions:

  • Added new decoding functions with scaling and offset capabilities in bitblas/gpu/intrin/lop3.py: decode_i4_to_f16_scale_offset, decode_i4_to_f16_scale_zeros_original_offset, decode_i4_to_f16_scale_zeros_rescale_offset, and decode_i2_to_f16_scale_zeros_original_offset.
  • Introduced a get_func_arguments helper function to streamline the arguments passed to external functions.
  • Updated the fast_decode_impl function to use the new helper and added offset factors for buffers.
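
The kernels themselves are not shown on this page, so the following is only a NumPy reference sketch of the arithmetic that the decoding functions named above appear to perform. It rests on several assumptions: that the "original" zeros mode means (w - zeros) * scale and the "rescale" mode means w * scale - zeros (inferred from the function names), that nibbles are packed low-first, and that one scale and zero point are shared per group. The names unpack_int4 and decode_i4_to_f16_reference are hypothetical, and the sketch models neither the LOP3 bit trick nor the offset argument.

```python
import numpy as np


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Unpack two unsigned 4-bit values per byte (assumed: low nibble first)."""
    packed = packed.astype(np.uint8)
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    # Interleave so element 2*i is the low nibble of byte i.
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)


def decode_i4_to_f16_reference(packed, scale, zeros=None,
                               zeros_mode="original", group_size=None):
    """Reference dequantization of packed int4 weights to float16.

    Hypothetical helper: the zeros_mode formulas are assumptions,
    not taken from bitblas/gpu/intrin/lop3.py.
    """
    w = unpack_int4(packed).astype(np.float16)
    group_size = group_size or w.shape[-1]
    # Broadcast one scale per group of `group_size` decoded elements.
    s = np.repeat(np.asarray(scale, dtype=np.float16), group_size, axis=-1)
    if zeros is None:
        return w * s
    z = np.repeat(np.asarray(zeros, dtype=np.float16), group_size, axis=-1)
    if zeros_mode == "original":
        return (w - z) * s      # assumed meaning of "original"
    if zeros_mode == "rescale":
        return w * s - z        # assumed meaning of "rescale"
    raise ValueError(f"unknown zeros_mode: {zeros_mode!r}")


packed = np.array([0x21, 0x43], dtype=np.uint8)   # nibbles 1, 2, 3, 4
scale = np.array([0.5, 2.0])
zeros = np.array([1.0, 2.0])
original = decode_i4_to_f16_reference(packed, scale, zeros, "original", group_size=2)
rescaled = decode_i4_to_f16_reference(packed, scale, zeros, "rescale", group_size=2)
```

A reference like this can serve as a correctness oracle when testing a fast GPU decode path, provided the packing layout is adjusted to match the real kernel.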

Workflow Configuration:

  • Replaced depends-on with needs in .github/workflows/benchmark.yml. needs is the key GitHub Actions uses to declare dependencies between jobs, so this corrects the workflow typo named in the title.
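
As an illustration of the key involved, a minimal workflow fragment might look like the following. The job names and steps are hypothetical and are not taken from benchmark.yml; only the use of needs reflects the change described above.

```yaml
# Hypothetical jobs; only the `needs` key mirrors the described fix.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"
  benchmark:
    needs: build   # run only after the `build` job succeeds
    runs-on: ubuntu-latest
    steps:
      - run: echo "benchmark"
```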

Submodule Update:

  • Updated the submodule reference for 3rdparty/tvm to a new commit.

@LeiWang1999 LeiWang1999 merged commit fa0f7b1 into microsoft:main Aug 5, 2024