Migrate Avro reader to arrow-avro and remove internal conversion code #17861
getChan wants to merge 62 commits into apache:main from
Conversation
❤️ amazing! Thank you @getChan
Hi @getChan -- I am preparing to make an arrow release -- have you hit any blockers while integrating the new arrow-avro crate into DataFusion?
No, not yet. Thanks for the release.
Thanks for jumping on this @getChan; let me know if I can help!
# Conflicts: # Cargo.lock # Cargo.toml # datafusion/common/Cargo.toml
FYI I merged the arrow 57 upgrade to DataFusion -- so if you rebase this PR against main you'll have access to the new arrow-avro crate
# Conflicts: # Cargo.lock
Hello. @alamb.
This is great news @getChan -- thank you! Very exciting. What API change are you referring to? If you mean API changes for any users of DataFusion, then I think the best thing to do is the same as for any breaking public API change: it should be documented in the upgrading.md guide:
Status update:
Here is a PR that updates DataFusion to use arrow 58:
443830c to b1351b6
I am just waiting for someone to approve #19728 and we'll be good to go
# Conflicts: # Cargo.lock # Cargo.toml # datafusion/core/src/datasource/file_format/avro.rs
Now ready for review. Are these breaking changes acceptable for the |
Which issue does this PR close?
- arrow-avro for performance and improved type support #14097
Rationale for this change
DataFusion previously maintained custom Avro-to-Arrow conversion logic.
This PR migrates Avro reading to arrow-avro to align behavior with upstream Arrow and remove the duplicated implementation.
What changes are included in this PR?
- arrow-avro (ReaderBuilder)
- arrow-avro and removed prior apache-avro dependency usage in affected paths
- arrow-avro projection support
Are these changes tested?
Yes.
- datafusion/datasource-avro (including projection and timestamp logical types)
- datafusion/sqllogictest/test_files/avro.slt
Are there any user-facing changes?
Yes.
- DataFusionError::AvroError is removed.
- From<apache_avro::Error> for DataFusionError is removed.
- datafusion::apache_avro to datafusion::arrow_avro.
- datafusion crate avro feature no longer enables datafusion-common/avro
- datafusion-proto crate avro feature no longer enables datafusion-common/avro
- arrow-avro semantics, including:
  - string values being read as Arrow Binary in this path
  - timestamp-* logical types read as UTC timezone-aware timestamps (Timestamp(..., Some("+00:00")))
  - local-timestamp-* remaining timezone-naive (Timestamp(..., None))
Upgrade notes are documented in:
- docs/source/library-user-guide/upgrading/53.0.0.md
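The type-mapping changes listed above can be summarized in a small, std-only sketch. It does not call arrow-avro or DataFusion. SketchType, Unit, and sketch_mapping are hypothetical names made up for illustration, and only the millis and micros suffixes are modeled. The string-to-Binary case is described in the PR as applying "in this path", so treat that arm as a simplification rather than a general rule.

```rust
/// Time units modeled in this sketch (hypothetical, not the real Arrow enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unit {
    Millisecond,
    Microsecond,
}

/// Hypothetical stand-in for an Arrow data type.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SketchType {
    Binary,
    Timestamp(Unit, Option<String>),
}

/// Maps an Avro type or logical-type name to the type the PR description
/// says is produced. Returns `None` for names the description does not cover.
fn sketch_mapping(avro_name: &str) -> Option<SketchType> {
    // `local-timestamp-*` stays timezone-naive; `timestamp-*` becomes UTC-aware.
    let (suffix, tz) = if let Some(s) = avro_name.strip_prefix("local-timestamp-") {
        (s, None)
    } else if let Some(s) = avro_name.strip_prefix("timestamp-") {
        (s, Some("+00:00".to_string()))
    } else if avro_name == "string" {
        // Simplification: the PR scopes this to a specific read path.
        return Some(SketchType::Binary);
    } else {
        return None;
    };
    let unit = match suffix {
        "millis" => Unit::Millisecond,
        "micros" => Unit::Microsecond,
        _ => return None,
    };
    Some(SketchType::Timestamp(unit, tz))
}

fn main() {
    for name in ["string", "timestamp-millis", "local-timestamp-micros"] {
        println!("{} -> {:?}", name, sketch_mapping(name));
    }
}
```

Code that previously assumed timezone-naive timestamps or Utf8 strings from the Avro reader may need a cast after upgrading; the upgrade guide named above is the authoritative reference.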