[SYSTEMDS-3253] Add combined rewrite and lop instruction for union #2286
chihsinh wants to merge 8 commits into apache:main
This patch replaces the current union implementation with a single internal LOP. Currently, union is executed as two consecutive operations, rbind() followed by unique(). The rewrite maps this pattern to an internal LOP instruction that uses a HashSet to compute the unique entries and returns them as a matrix. This improves efficiency because it avoids the separate unique() call. The order of the input entries is preserved in the output.
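The idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the kernel from this patch: the class `UnionSketch` and the method `union` are hypothetical names, the inputs are plain `double[]` columns rather than SystemDS matrix blocks, and a `LinkedHashSet` is used here as one simple way to get the first-seen ordering that the description mentions.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not the SystemDS instruction itself.
final class UnionSketch {
    // Order-preserving union of two single-column inputs.
    static double[] union(double[] a, double[] b) {
        // LinkedHashSet (an assumption of this sketch) drops duplicates
        // and iterates in first-seen insertion order.
        Set<Double> seen = new LinkedHashSet<>();
        for (double v : a) seen.add(v);
        for (double v : b) seen.add(v);

        // Copy the unique values back into a dense array.
        double[] out = new double[seen.size()];
        int i = 0;
        for (double v : seen) out[i++] = v;
        return out;
    }

    public static void main(String[] args) {
        double[] x = {3, 1, 3, 2};
        double[] y = {2, 5, 1};
        System.out.println(Arrays.toString(union(x, y))); // [3.0, 1.0, 2.0, 5.0]
    }
}
```

Compared with rbind() followed by unique(), this makes one pass over each input and never materializes the concatenated intermediate. Note that boxed `Double` keys treat `0.0` and `-0.0` as distinct values and treat `NaN` as equal to itself.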
Thanks for the patch @chihsinh. Could you please fix the missing license headers, revert the replacement of the detailed imports with a wildcard import, and benchmark this implementation, which uses a hash map over lists of doubles, against one that uses a hash map of sliced-out matrix blocks?
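For readers unfamiliar with the first of the two variants being compared, a rough sketch of a row-wise union keyed by lists of doubles might look as follows. Everything here is an assumption made for illustration: `RowUnionSketch`, `unionRows`, and `addRows` are hypothetical names, rows are plain `double[]` arrays, and the alternative keyed by sliced-out matrix blocks is not shown because it depends on SystemDS internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "list of doubles as hash key" variant.
final class RowUnionSketch {
    // Order-preserving union of the rows of two multi-column inputs.
    static double[][] unionRows(double[][] a, double[][] b) {
        // List<Double> has value-based equals/hashCode, so equal rows collide.
        Set<List<Double>> seen = new LinkedHashSet<>();
        addRows(seen, a);
        addRows(seen, b);

        // Unbox the unique rows back into a dense 2D array.
        double[][] out = new double[seen.size()][];
        int r = 0;
        for (List<Double> row : seen) {
            double[] vals = new double[row.size()];
            for (int c = 0; c < vals.length; c++) vals[c] = row.get(c);
            out[r++] = vals;
        }
        return out;
    }

    // Box each row into a list and add it to the set of seen rows.
    private static void addRows(Set<List<Double>> seen, double[][] m) {
        for (double[] row : m) {
            List<Double> key = new ArrayList<>(row.length);
            for (double v : row) key.add(v);
            seen.add(key);
        }
    }
}
```

The boxing of every cell into a `Double` is the likely cost of this variant, which is presumably why a comparison against block-based keys is worth measuring.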
Btw, the rewrite test seems to fail because
Codecov Report
Attention: Patch coverage is
@@             Coverage Diff              @@
##               main    #2286      +/-   ##
============================================
- Coverage     72.96%   72.94%   -0.02%
- Complexity    46097    46109      +12
============================================
  Files          1479     1480       +1
  Lines        172654   172757     +103
  Branches      33796    33818      +22
============================================
+ Hits         125970   126025      +55
- Misses        37192    37241      +49
+ Partials       9492     9491       -1
… after benchmarking
LGTM - thanks for the patch @chihsinh. The code already looked pretty good. During the merge, I only removed the stdout printing from the instruction and deduplicated the core kernels for single-column and multi-column union a bit.
This patch replaces the current union implementation with a single internal LOP. Currently, union is executed as two consecutive operations, rbind() followed by unique(). The rewrite maps this pattern to an internal LOP instruction that uses a HashSet to compute the unique entries and returns them as a matrix. This improves efficiency because it avoids the separate unique() call. The order of the input entries is preserved in the output. Closes apache#2286.