GH-25025: [C++] Move non core compute kernels into separate shared library#46261
GH-25025: [C++] Move non core compute kernels into separate shared library#46261raulcd merged 53 commits intoapache:mainfrom
Conversation
|
@github-actions crossbow submit -g cpp |
This comment was marked as outdated.
This comment was marked as outdated.
e8fc17f to
f21df32
Compare
|
It seems I am down to one job failure to the $ g++ -shared -s -static-libgcc -o arrow.dll tmp.def RTasks.o altrep.o array.o array_to_vector.o arraydata.o arrowExports.o bridge.o buffer.o chunkedarray.o compression.o compute-exec.o compute.o config.o csv.o dataset.o datatype.o expression.o extension-impl.o feather.o field.o filesystem.o io.o json.o memorypool.o message.o parquet.o r_to_arrow.o recordbatch.o recordbatchreader.o recordbatchwriter.o safe-call-into-r-impl.o scalar.o schema.o symbols.o table.o threadpool.o type_infer.o -L../windows/arrow-20.0.0.9000/lib-14.2.0/x64 -L../windows/arrow-20.0.0.9000/lib/x64-ucrt -larrow_dataset -larrow_acero -lparquet -larrow_compute -larrow -larrow_bundled_dependencies -lutf8proc -lsnappy -lz -lzstd -llz4 -lbz2 -lbrotlienc -lbrotlidec -lbrotlicommon -lole32 -lbcrypt -lpsapi -lcrypto -lcrypt32 -lre2 -luserenv -lversion -lws2_32 -lbcrypt -lwininet -lwinhttp -lsecur32 -lshlwapi -lncrypt -lcurl -lnormaliz -lssh2 -lgdi32 -lssl -lcrypto -lcrypt32 -lwldap32 -lz -lws2_32 -lnghttp2 -ldbghelp -LC:/rtools45/x86_64-w64-mingw32
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_dataset.a(partition.cc.obj):(.text+0x7d9a): undefined reference to `__imp__ZN5arrow7compute7Grouper4MakeERKSt6vectorINS_10TypeHolderESaIS3_EEPNS0_11ExecContextE'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_dataset.a(partition.cc.obj):(.text+0x7f16): undefined reference to `__imp__ZN5arrow7compute7Grouper13MakeGroupingsERKNS_12NumericArrayINS_10UInt32TypeEEEjPNS0_11ExecContextE'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x1f2e): undefined reference to `__imp__ZN5arrow4util15TempVectorStack5allocEjPPhPi'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x1fb5): undefined reference to `__imp__ZN5arrow7compute9Hashing329HashBatchERKNS0_9ExecBatchEPjRSt6vectorINS0_14KeyColumnArrayESaIS7_EExPNS_4util15TempVectorStackExx'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x20b5): undefined reference to `__imp__ZN5arrow4util15TempVectorStack7releaseEij'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x2256): undefined reference to `__imp__ZN5arrow4util15TempVectorStack7releaseEij'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x7c05): undefined reference to `__imp__ZN5arrow4util15TempVectorStack4InitEPNS_10MemoryPoolEx'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text+0x7c6f): undefined reference to `__imp__ZN5arrow4util15TempVectorStackD1Ev'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text$_ZN5arrow5acero12HashJoinNodeD1Ev[_ZN5arrow5acero12HashJoinNodeD1Ev]+0x2f): undefined reference to `__imp__ZN5arrow4util15TempVectorStackD1Ev'
C:\rtools45\x86_64-w64-mingw32.static.posix\bin/ld.exe: ../windows/arrow-20.0.0.9000/lib/x64-ucrt/libarrow_acero.a(hash_join_node.cc.obj):(.text$_ZN5arrow5acero26BloomFilterPushdownContext17FilterSingleBatchEyPNS_7compute9ExecBatchE[_ZN5arrow5acero26BloomFilterPushdownContext17FilterSingleBatchEyPNS_7compute9ExecBatchE]+0x64f): undefined reference to `__imp__ZN5arrow7compute9Hashing329HashBatchERKNS0_9ExecBatchEPjRSt6vectorINS0_14KeyColumnArrayESaIS7_EExPNS_4util15TempVectorStackExx'Still other improvements to be done but I wanted to have a clean CI PR to move to the cleaning and improvement stage. |
|
Acero and Dataset static libraries weren't built with Could you add the following if(ARROW_BUILD_STATIC AND WIN32)
target_compile_definitions(arrow_compute_static PUBLIC ARROW_COMPUTE_STATIC)
endif()like we did for Flight? arrow/cpp/src/arrow/flight/CMakeLists.txt Lines 222 to 224 in baf97fd |
|
@github-actions crossbow submit -g nightly-tests |
This comment was marked as outdated.
This comment was marked as outdated.
|
@github-actions crossbow submit -g nightly-tests |
This comment was marked as outdated.
This comment was marked as outdated.
* Revert manual edit of arrowExports.cpp * Re-do R initialization change
|
@github-actions crossbow submit -g cpp |
|
Revision: 9001ec2 Submitted crossbow builds: ursacomputing/crossbow @ actions-74be4cb1e9 |
pitrou
left a comment
There was a problem hiding this comment.
LGTM in general, I posted two small suggestions.
Co-authored-by: Rossi Sun <zanmato1984@gmail.com>
zanmato1984
left a comment
There was a problem hiding this comment.
+1.
Excellent work @raulcd !
…f ARROW_COMPUTE is enabled
rok
left a comment
There was a problem hiding this comment.
I was only able to do a quick pass. Looks great, love the ARROW_COMPUTE_EXPORT macro.
jonkeane
left a comment
There was a problem hiding this comment.
Thank you for all of this, it's been a long journey!
|
Thanks everyone for taking the time to review. If no more concerns are raised, it has already been approved by 4 committers, I plan to merge in a couple days. |
|
Thanks everyone, I am merging! |
|
After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 6a5db61. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
Rationale for this change
Arrow is quite a heavy dependency and some users don't need all the tools we provide bundled with libarrow. Moving Arrow Compute to its own shared library allows users installations that better suit their needs having smaller memory footprint if necessary.
It might also help some users adding new kernels into an existing Arrow without recompiling it.
What changes are included in this PR?
ArrowComputeshared library (libarrow_compute.so).FunctionRegistryarrow::compute::Initialize()Are these changes tested?
Yes on all CI jobs.
Are there any user-facing changes?
Yes. The Arrow compute functions will be provided as a different library. Any user using Arrow Compute from C++ directly will require a call to
arrow::compute::Initialize()in order for the functions and kernels to be registeredThis PR includes breaking changes to public APIs.