This repository was archived by the owner on Feb 24, 2026. It is now read-only.

[Dev] Append Efficient CUDA test for low precision batch decoding #80

Merged
LeiWang1999 merged 21 commits into microsoft:main from LeiWang1999:dev
Jul 8, 2024

@LeiWang1999 (Contributor)

This pull request includes code improvements and new functionality. The most significant changes are a new notice for IST-DASLab/marlin in THIRDPARTYNOTICES.txt, a refactoring of the MatMulNTDequantizeEmitter class in matmul_dequantize_impl.py, and new test directories in the CMakeLists.txt file under testing/cpp.

Addition of third party notices:

Refactoring of existing code:

Addition of new test directories:

LeiWang1999 changed the title from "[Dev] Append Efficient CUDA implementation for low precision batch decoding" to "[Dev] Append Efficient CUDA test for low precision batch decoding" on Jul 7, 2024
LeiWang1999 merged commit c6f3ca5 into microsoft:main on Jul 8, 2024