-
Notifications
You must be signed in to change notification settings - Fork 815
feat: Add SVE kernels for TopKV #1256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| ARM_COMPUTE_ERROR_ON(c >= C); | ||
|
|
||
| uint32_t t = 0; | ||
| std::memcpy(&t, tgt_it.ptr(), sizeof(uint32_t)); // target sample idx for this class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memcpy seems to be a complicated way to read data from this pointer. why not reinterpret cast & de-reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in next patch
filelist.json
Outdated
| "src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp" | ||
| ] | ||
| } | ||
| "files": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the indentation change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in next patch
| while (idx < N) | ||
| { | ||
| const uint32_t remaining = N - idx; | ||
| const uint32_t bw = block_width<Scalar>(remaining); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function gets the SVE vector length for each element in the tensor and compares it with remaining. I think we should get it once before the execute_window_loop and reuse. The svcnt in the wrapper namespace can help with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in next patch
| template <> | ||
| inline uint32_t count_gt_block<int32_t>(const int32_t *ptr, int32_t thr, uint32_t remaining) | ||
| { | ||
| const uint32_t bw = block_width<int32_t>(remaining); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to do this here again. The parameter remaining is already the min(vector_len, remaining-in-the-caller). We're taking another min on top of this result, i.e. if I'm not mistaken, we're doing smth like
min(min(x,y)) = min(x,y)
Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
1deb127 to
7944860
Compare
Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799