Skip to content

GH-29847: [C++] Build with Azure SDK for C++#36835

Merged
kou merged 93 commits intoapache:mainfrom
Tom-Newton:tomnewton/build_azure_sdk
Aug 30, 2023
Merged

GH-29847: [C++] Build with Azure SDK for C++#36835
kou merged 93 commits intoapache:mainfrom
Tom-Newton:tomnewton/build_azure_sdk

Conversation

@Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Jul 24, 2023

Rationale for this change

We want to use the Azure SDK for C++ to read/write to Azure blob storage. Obviously this is pretty important for building an AzureFileSystem.

What changes are included in this PR?

Builds the the relevant parts of the azure SDK as a cmake external project. Adds a couple of simple tests that just assert that the Azure SDK is working and a couple of lines in AzureFileSystem to initialise the blob storage client to ensure the build is working correctly in all environments.

I started with the build setup from #12914 but I did make few changes.

  1. Although its atypical for this project we chose to switch from cmake's ExternalProject to FetchContent. FetchContent is recomended by the Azure docs https://github.com/Azure/azure-sdk-for-cpp#cmake-project--fetch-content. It also solves a few problems including: automatically linking system curl and ssl instead of bootstrapping vcpkg and installing curl and ssl from there.
  2. Only build one version of the Azure SDK for C++ because it contains all the components. Previously we were unnecessarily building 5 different versions of the whole thing on top of each other. This created race conditions for which version each component came from.
  3. We are using azure-core_1.10.2 which is a very recent version. There are a couple of important reasons for this 1. an important managed identity fix, 2. fixed support for curl versions < 7.71.0.

There will be follow up PRs to enable Azure in the manylinux builds. We need to update vcpkg first so we can get a version of the Azure SDK which contains an important managed identity fix.

Are these changes tested?

Yes. There is a simple test that just runs the Azure client against azurite. Additionally just initialising the client in AzureFileSystem goes a long way towards ensuring the build is working.

Are there any user-facing changes?

No

@github-actions
Copy link

⚠️ GitHub issue #29847 has been automatically assigned in GitHub to PR creator.

@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 4e3e7a5 to d0f5b65 Compare July 24, 2023 07:43
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from d0f5b65 to 8915352 Compare August 1, 2023 22:36
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 8915352 to 9d14edd Compare August 7, 2023 08:08
kou added a commit that referenced this pull request Aug 7, 2023
…C++ filesystem (#36988)

### Rationale for this change
We need to write tests for #18014. azurite is like a fake Azure blob storage so it can be used to write integration tests

### What changes are included in this PR?
Extract the `azurite` related changes from #12914 to create a smaller PR that's easier to review. I have made very minimal changes compared to that PR. 

Currently `azurite` is configured for all the environments where `ARROW_AZURE` was enabled by #35701. I assume its deliberate that its not enabled yet for windows, alpine, conda, debian or fedora builds. 

### Are these changes tested?
Its tested by there aren't really any good tests in this PR. I used this `azurite` config in #36835 to make an integration test that uses the Azure C++ SDK. On its own we can't really write tests for this `azurite` setup PR. 

### Are there any user-facing changes?
No

* Closes: #36886

Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 9d14edd to 6ec7910 Compare August 8, 2023 07:39
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 12, 2023
@Tom-Newton Tom-Newton changed the title WIP GH-29847: [C++] Build with azure C++ sdk GH-29847: [C++] Build with azure C++ sdk Aug 12, 2023
@Tom-Newton Tom-Newton force-pushed the tomnewton/build_azure_sdk branch from 8aa551a to 2560caf Compare August 13, 2023 12:06
@Tom-Newton Tom-Newton marked this pull request as ready for review August 13, 2023 12:07
@kou kou changed the title GH-29847: [C++] Build with azure C++ sdk GH-29847: [C++] Build with Azure SDK for C++ Aug 14, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Aug 14, 2023
@pitrou
Copy link
Member

pitrou commented Aug 24, 2023

I'm trying to build with gcc 12.3.0 and I get the following error:

In file included from /build/build-test/_deps/azure_sdk-src/sdk/identity/azure-identity/src/environment_credential.cpp:6:
/build/build-test/_deps/azure_sdk-src/sdk/identity/azure-identity/inc/azure/identity/client_certificate_credential.hpp:68:9: error: 'Azure::Identity::ClientCertificateCredential' declared with greater visibility than the type of its field 'Azure::Identity::ClientCertificateCredential::m_pkey' [-Werror=attributes]
   68 |   class ClientCertificateCredential final : public Core::Credentials::TokenCredential {
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

It seems like we're passing -Wall -Werror when building bundled dependencies? It makes us heavily dependent on maintenance policies of third-party projects.

@pitrou
Copy link
Member

pitrou commented Aug 24, 2023

We normally use the EP_CXX_FLAGS cmake variable when compiling bundled dependencies, but it seems that isn't forwarded by the FetchContent-based directives?

@assignUser
Copy link
Member

It seems like we're passing -Wall -Werror when building bundled dependencies? It makes us heavily dependent on maintenance policies of third-party projects.

I agree dependencies should not be build with -Werror etc. This is an issue with the use of the flag variables directly. As we can now use cmake 3.16 we should move to target based properties vs global flags but that is of course a major refactor...

@Tom-Newton the use of external project within arrow is 'historic' as we just recently increased our minimum cmake version enough to make use of fc. Eventually it would be great to move everything to fc so adding new deps with fc instead of ep is in my eyes encouraged!

@assignUser
Copy link
Member

We normally use the EP_CXX_FLAGS cmake variable when compiling bundled dependencies, but it seems that isn't forwarded by the FetchContent-based directives?

Contrary to external project things added via fetchcontent are configured with the parent project and inherit variables and flags as if using add_subdirectory. So it likely is not necessary but I haven't looked at our flags script recently so 🤷

Tom-Newton and others added 2 commits August 24, 2023 22:11
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 25, 2023
@Tom-Newton
Copy link
Contributor Author

Tom-Newton commented Aug 25, 2023

I agree dependencies should not be build with -Werror etc. This is an issue with the use of the flag variables directly. As we can now use cmake 3.16 we should move to target based properties vs global flags but that is of course a major refactor...

So what do we think is the best option today? Is disabling -Werror at a global level an acceptable solution? Being able to disable -Werror would also be helpful for supporting Ubuntu 20 #36835 (comment)

@kou
Copy link
Member

kou commented Aug 26, 2023

I'll provide a patch for -Werror later. Please wait for a few days...

@kou
Copy link
Member

kou commented Aug 26, 2023

This is an ad-hoc patch but this will work. We need a real improvement later.

diff --git a/cpp/cmake_modules/ThirdpartyToolchain.cmake b/cpp/cmake_modules/ThirdpartyToolchain.cmake
index 1dfaf71b4..9ff91f978 100644
--- a/cpp/cmake_modules/ThirdpartyToolchain.cmake
+++ b/cpp/cmake_modules/ThirdpartyToolchain.cmake
@@ -5082,6 +5082,13 @@ function(build_azure_sdk)
   set(CMAKE_EXPORT_NO_PACKAGE_REGISTRY TRUE)
   set(DISABLE_AZURE_CORE_OPENTELEMETRY TRUE)
   set(ENV{AZURE_SDK_DISABLE_AUTO_VCPKG} TRUE)
+  if(MSVC)
+    string(REPLACE "/WX" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "/WX" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  else()
+    string(REPLACE "-Werror" "" CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG}")
+    string(REPLACE "-Werror" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
+  endif()
   fetchcontent_makeavailable(azure_sdk)
   set(AZURE_SDK_VENDORED
       TRUE

@github-actions github-actions bot removed the awaiting changes Awaiting changes label Aug 26, 2023
@kou
Copy link
Member

kou commented Aug 28, 2023

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: e3fb84c

Submitted crossbow builds: ursacomputing/crossbow @ actions-ea66ccd89b

Task Status
test-alpine-linux-cpp Github Actions
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-cuda-cpp Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-minimal-with-formats Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions
test-ubuntu-22.04-cpp-20 Github Actions
test-ubuntu-22.04-cpp-no-threading Github Actions

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I'll merge this tomorrow if nobody objects it.

@Tom-Newton
Copy link
Contributor Author

Thanks for your help on this @kou. I feel kind of bad about how much work I created for you during review, with my lack of C++ experience. Hopefully the native Azure support is worth it 🙂.

@felipecrv
Copy link
Contributor

Thank you for this PR @Tom-Newton!

@kou
Copy link
Member

kou commented Aug 31, 2023

Don't worry. :-)
I'm happy that we have more contributors like you!

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 3cb4481.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Add azure-sdk-for-cpp to ThirdpartyToolchain

6 participants