Fix #1638: Faster returning of results #1647

Merged
mdboom merged 2 commits into NVIDIA:main from mdboom:fast-result on Feb 18, 2026

Conversation

@mdboom mdboom commented Feb 18, 2026

See #1638 for details as to why this works.

On my machine, with the #659 benchmark, I see a reduction from 3.02 µs to 2.54 µs per iteration (roughly 16%).

copy-pr-bot bot commented Feb 18, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@mdboom mdboom self-assigned this Feb 18, 2026

mdboom commented Feb 18, 2026

/ok to test

@cpcloud cpcloud left a comment

Seems legit!

@mdboom mdboom added the cuda.bindings and performance labels Feb 18, 2026
@mdboom mdboom merged commit 7a73c60 into NVIDIA:main Feb 18, 2026
93 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang commented Feb 18, 2026

@mdboom sorry for noticing this late, but I am confused by the diff -- it seems like this PR just fixes a recently-introduced (I think) regression? We fixed this enum issue a long time ago with a fast dict lookup (#546), but that fix seems to be gone (and replaced by a presumably even-faster memoization in this PR).

leofang commented Feb 19, 2026

> it seems like this PR just fixes a recently-introduced (I think) regression?

Ah, I see. The "regression" was introduced in the fast enum refactoring (#1581). So with this PR, it means that regardless of how fast the enum implementation (builtin or custom) is, the memoization is always needed... 🙂
