
Optimize delta binary decoder in the case where bitwidth=0 #9477

Draft
etseidl wants to merge 2 commits into apache:main from etseidl:delta_binary_bit_zero

@etseidl (Contributor) commented Feb 25, 2026

Which issue does this PR close?

Closes #9476.

Rationale for this change

Explore whether we can achieve the speedups seen in arrow-cpp (apache/arrow#49296).

What changes are included in this PR?

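Adds special cases to the DELTA_BINARY_PACKED decoder for miniblocks whose bit width is 0. In that case every packed value is zero, so every delta in the miniblock equals the block's min_delta: the miniblock is either a constant run (min_delta == 0) or an arithmetic progression. The optimization uses this to compute each value directly instead of relying on the previously decoded value.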
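A minimal sketch of the idea, assuming `i64` values, a known block `min_delta`, and the last value decoded before the miniblock. The function names are hypothetical and this is not the PR's actual code. The Parquet spec says delta decoding overflow wraps in two's complement, so the sketch uses `wrapping_*` operations throughout.

```rust
// Hypothetical helpers for illustration only; not the arrow-rs API.

/// General path: each value depends on the one before it, since
/// value[i] = value[i - 1] + min_delta + packed[i].
fn decode_miniblock_generic(last_value: i64, min_delta: i64, packed: &[u64], out: &mut [i64]) {
    let mut prev = last_value;
    for (o, &p) in out.iter_mut().zip(packed) {
        prev = prev.wrapping_add(min_delta).wrapping_add(p as i64);
        *o = prev;
    }
}

/// bit_width == 0 fast path: every packed value is zero, so
/// value[i] = last_value + (i + 1) * min_delta, with no loop-carried dependency.
fn decode_miniblock_bw0(last_value: i64, min_delta: i64, out: &mut [i64]) {
    if min_delta == 0 {
        // Constant run: every value repeats the last one.
        out.fill(last_value);
    } else {
        // Arithmetic progression: each element is computed independently.
        for (i, o) in out.iter_mut().enumerate() {
            *o = last_value.wrapping_add(min_delta.wrapping_mul(i as i64 + 1));
        }
    }
}

fn main() {
    let (last_value, min_delta) = (100_i64, 3_i64);
    let mut generic = [0_i64; 8];
    let mut fast = [0_i64; 8];
    decode_miniblock_generic(last_value, min_delta, &[0; 8], &mut generic);
    decode_miniblock_bw0(last_value, min_delta, &mut fast);
    assert_eq!(generic, fast);
    println!("{:?}", fast); // [103, 106, 109, 112, 115, 118, 121, 124]
}
```

In the sketch, the constant case reduces to a `fill`, and the progression case computes each output from `i` alone rather than from the previous output.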

Are these changes tested?

Yes, tests have been added, as well as new benchmarks.

Are there any user-facing changes?

No

@github-actions bot added the parquet label (Changes to the parquet crate) on Feb 25, 2026
@etseidl (Contributor, Author) commented Feb 25, 2026

I'm not seeing the huge improvement reported for arrow-cpp, but this is still a nice speedup (1.18x to 1.30x on the new benchmarks), and it doesn't seem to hurt the cases where the optimization can't be used.

New benchmarks on my workstation (x86, i7-12700K), comparing main (no_opt) to this branch (opt):

group                                                                                no_opt                                 opt
-----                                                                                ------                                 ---
arrow_array_reader/INT32/Decimal128Array/binary packed increasing value              1.18     45.9±0.62µs        ? ?/sec    1.00     39.0±0.40µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed single value                  1.21     45.7±0.22µs        ? ?/sec    1.00     37.9±0.77µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed increasing value              1.18     48.2±0.50µs        ? ?/sec    1.00     40.7±0.65µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed single value                  1.19     48.1±0.19µs        ? ?/sec    1.00     40.5±0.24µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed increasing value                         1.24     38.0±1.08µs        ? ?/sec    1.00     30.7±0.18µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed single value                             1.26     37.1±0.23µs        ? ?/sec    1.00     29.4±0.12µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed increasing value                         1.23     32.6±0.46µs        ? ?/sec    1.00     26.6±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed single value                             1.28     32.5±0.19µs        ? ?/sec    1.00     25.3±0.38µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed increasing value                         1.21     35.2±0.15µs        ? ?/sec    1.00     29.0±0.12µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed single value                             1.21     35.2±0.36µs        ? ?/sec    1.00     29.1±0.21µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed increasing value                          1.20     37.5±0.25µs        ? ?/sec    1.00     31.3±0.26µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed single value                              1.27     37.9±0.59µs        ? ?/sec    1.00     30.0±0.18µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed increasing value                        1.22     37.4±0.53µs        ? ?/sec    1.00     30.6±0.10µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed single value                            1.25     37.0±0.16µs        ? ?/sec    1.00     29.6±0.19µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed increasing value                        1.24     33.6±0.18µs        ? ?/sec    1.00     27.0±0.14µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed single value                            1.30     33.5±0.18µs        ? ?/sec    1.00     25.9±0.16µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed increasing value                        1.22     35.3±0.13µs        ? ?/sec    1.00     28.9±0.20µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed single value                            1.21     35.3±0.14µs        ? ?/sec    1.00     29.3±0.39µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed increasing value                         1.19     37.4±0.25µs        ? ?/sec    1.00     31.4±0.17µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed single value                             1.23     37.4±0.19µs        ? ?/sec    1.00     30.3±0.45µs        ? ?/sec

And the rest of the binary packed benchmarks:

group                                                                                no_opt                                 opt
-----                                                                                ------                                 ---
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs     1.01     50.9±0.29µs        ? ?/sec    1.00     50.3±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs    1.01     59.7±0.78µs        ? ?/sec    1.00     59.4±0.82µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs      1.00     51.8±0.22µs        ? ?/sec    1.00     51.7±0.21µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs          1.01     73.9±0.50µs        ? ?/sec    1.00     73.4±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs         1.02    102.4±2.40µs        ? ?/sec    1.00    100.8±1.01µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs           1.00     75.7±0.35µs        ? ?/sec    1.00     75.8±0.36µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs     1.02     53.4±0.57µs        ? ?/sec    1.00     52.4±0.24µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs    1.00     61.2±0.26µs        ? ?/sec    1.00     61.5±0.39µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs      1.00     53.7±0.28µs        ? ?/sec    1.00     53.8±0.32µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs          1.00     79.1±0.86µs        ? ?/sec    1.00     79.0±0.50µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs         1.05    110.0±2.47µs        ? ?/sec    1.00    105.0±1.37µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs           1.01     81.4±0.33µs        ? ?/sec    1.00     80.2±0.23µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                1.00     37.8±0.51µs        ? ?/sec    1.00     37.7±0.68µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs               1.00     48.9±0.21µs        ? ?/sec    1.03     50.2±0.53µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                 1.00     38.7±0.39µs        ? ?/sec    1.00     38.8±0.60µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                     1.01     51.4±0.52µs        ? ?/sec    1.00     50.9±0.72µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                    1.00     81.5±0.94µs        ? ?/sec    1.00     81.3±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                      1.00     53.6±0.22µs        ? ?/sec    1.00     53.6±0.40µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                1.01     38.3±0.28µs        ? ?/sec    1.00     37.8±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs               1.00     47.4±0.33µs        ? ?/sec    1.01     47.8±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                 1.00     39.1±0.18µs        ? ?/sec    1.03     40.3±1.11µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                     1.01     47.4±0.18µs        ? ?/sec    1.00     46.9±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                    1.01     78.3±1.17µs        ? ?/sec    1.00     77.7±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                      1.00     52.2±0.35µs        ? ?/sec    1.00     52.3±0.66µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                1.00     40.2±0.15µs        ? ?/sec    1.01     40.6±0.14µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs               1.00     49.7±0.26µs        ? ?/sec    1.00     49.6±0.23µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                 1.00     41.5±0.13µs        ? ?/sec    1.02     42.3±0.16µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                     1.00     54.8±0.25µs        ? ?/sec    1.02     56.0±0.22µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                    1.01     81.5±0.48µs        ? ?/sec    1.00     80.5±0.70µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                      1.00     56.9±0.31µs        ? ?/sec    1.01     57.4±0.22µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                 1.01     40.2±0.29µs        ? ?/sec    1.00     40.0±0.26µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                1.01     50.7±0.69µs        ? ?/sec    1.00     50.4±0.50µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                  1.00     41.5±0.19µs        ? ?/sec    1.00     41.4±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                      1.00     54.1±0.19µs        ? ?/sec    1.00     54.0±0.29µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                     1.00     84.3±0.32µs        ? ?/sec    1.00     84.1±1.11µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                       1.00     57.2±0.69µs        ? ?/sec    1.00     57.3±0.42µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs               1.01     44.7±0.30µs        ? ?/sec    1.00     44.4±0.51µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs              1.01     53.2±0.38µs        ? ?/sec    1.00     52.8±0.60µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                1.00     45.8±0.20µs        ? ?/sec    1.00     45.8±0.61µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                    1.01     59.9±0.27µs        ? ?/sec    1.00     59.5±0.28µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                   1.00     87.4±1.04µs        ? ?/sec    1.00     87.9±0.63µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                     1.01     62.7±0.56µs        ? ?/sec    1.00     62.2±0.51µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs               1.01     38.8±0.67µs        ? ?/sec    1.00     38.5±0.69µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs              1.00     48.2±0.20µs        ? ?/sec    1.00     48.1±0.23µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                1.00     40.1±0.44µs        ? ?/sec    1.00     40.2±0.42µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                    1.01     50.0±0.46µs        ? ?/sec    1.00     49.5±0.49µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                   1.00     77.7±0.34µs        ? ?/sec    1.01     78.1±0.24µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                     1.01     52.6±0.34µs        ? ?/sec    1.00     52.1±0.45µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs               1.00     40.3±0.21µs        ? ?/sec    1.01     40.6±0.15µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs              1.00     49.8±0.67µs        ? ?/sec    1.00     49.7±0.46µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                1.00     41.4±0.15µs        ? ?/sec    1.02     42.1±0.40µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                    1.00     55.5±0.31µs        ? ?/sec    1.01     56.1±0.53µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                   1.02     81.8±0.62µs        ? ?/sec    1.00     80.5±0.41µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                     1.00     57.5±1.01µs        ? ?/sec    1.00     57.3±0.67µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                1.01     42.7±0.52µs        ? ?/sec    1.00     42.4±0.23µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs               1.00     52.3±0.40µs        ? ?/sec    1.00     52.0±0.72µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                 1.00     43.9±0.19µs        ? ?/sec    1.01     44.1±0.20µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                     1.03     57.7±0.89µs        ? ?/sec    1.00     56.1±0.26µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                    1.00     85.9±0.54µs        ? ?/sec    1.00     85.7±0.35µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                      1.00     59.9±0.19µs        ? ?/sec    1.01     60.6±0.69µs        ? ?/sec

@etseidl (Contributor, Author) commented Feb 25, 2026

It is currently unknown what impact this optimization will have on the other delta encodings. Since DELTA_LENGTH_BYTE_ARRAY and DELTA_BYTE_ARRAY store their lengths with DELTA_BINARY_PACKED, there could be a good speedup for constant-length strings with DELTA_LENGTH_BYTE_ARRAY (think UUIDs or hashes), as well as for long runs of the same prefix, or long runs of strings with no shared prefix, with DELTA_BYTE_ARRAY.
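The DELTA_LENGTH_BYTE_ARRAY case can be checked with a small sketch. That encoding stores string lengths with DELTA_BINARY_PACKED, so fixed-length strings give length deltas that all equal `min_delta`, and the required bit width drops to 0. `miniblock_bit_width` is a hypothetical helper that treats the input as a single miniblock whose `min_delta` is computed over just those values; it is not an arrow-rs API.

```rust
/// Hypothetical helper: bit width needed to pack `values` as one
/// DELTA_BINARY_PACKED miniblock. Deltas are taken between consecutive values,
/// `min_delta` is subtracted, and the largest remaining delta sets the width.
fn miniblock_bit_width(values: &[i64]) -> u32 {
    let deltas: Vec<i64> = values.windows(2).map(|w| w[1].wrapping_sub(w[0])).collect();
    let Some(&min_delta) = deltas.iter().min() else {
        return 0; // fewer than two values: nothing to pack
    };
    let max_adjusted = deltas
        .iter()
        .map(|&d| d.wrapping_sub(min_delta) as u64)
        .max()
        .unwrap_or(0);
    64 - max_adjusted.leading_zeros()
}

fn main() {
    // UUIDs in canonical text form are always 36 bytes long.
    let uuids = [
        "123e4567-e89b-12d3-a456-426614174000",
        "9f1c2b3a-0d4e-4f5a-8b6c-7d8e9fa0b1c2",
        "00000000-0000-0000-0000-000000000000",
    ];
    let lengths: Vec<i64> = uuids.iter().map(|s| s.len() as i64).collect();
    println!("{}", miniblock_bit_width(&lengths)); // 0

    // Variable-length strings (lengths 1, 7, 5) need a non-zero width.
    let words = ["a", "parquet", "delta"];
    let lengths: Vec<i64> = words.iter().map(|s| s.len() as i64).collect();
    println!("{}", miniblock_bit_width(&lengths)); // 4
}
```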

Copilot AI (Contributor) left a comment


Pull request overview

This PR optimizes DELTA_BINARY_PACKED decoding for the common case where a miniblock has bit_width == 0, aiming to match speedups observed in Arrow C++ and address issue #9476.

Changes:

  • Add a fast-path in the delta binary packed decoder for bit_width == 0 miniblocks (including constant-value and arithmetic-progression cases).
  • Add larger regression tests covering constant, increasing, and mixed patterns for Int32/Int64.
  • Add new benchmark cases targeting constant and monotonically increasing delta-encoded pages.
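The "mixed" pattern is worth a closer look because, per the Parquet spec, `min_delta` is stored once per block while bit widths are stored per miniblock, so one block can contain both fast-path and general miniblocks. A hypothetical sketch, assuming 128 deltas per block split into 4 miniblocks of 32 (a writer's choice, not fixed by the format), that reports each miniblock's bit width:

```rust
// Assumed block layout for illustration: 4 miniblocks of 32 deltas each.
const MINIBLOCKS: usize = 4;
const MINIBLOCK_SIZE: usize = 32;

/// Hypothetical helper: per-miniblock bit widths for one block of deltas.
/// `min_delta` is computed over the whole block, so a miniblock only gets
/// width 0 when all of its deltas equal the block-wide minimum.
fn block_bit_widths(deltas: &[i64; MINIBLOCKS * MINIBLOCK_SIZE]) -> [u32; MINIBLOCKS] {
    let min_delta = *deltas.iter().min().unwrap();
    let mut widths = [0_u32; MINIBLOCKS];
    for (w, chunk) in widths.iter_mut().zip(deltas.chunks(MINIBLOCK_SIZE)) {
        let max_adjusted = chunk
            .iter()
            .map(|&d| d.wrapping_sub(min_delta) as u64)
            .max()
            .unwrap();
        *w = 64 - max_adjusted.leading_zeros();
    }
    widths
}

fn main() {
    // Constant values (every delta 0) and increasing by 1 (every delta 1).
    println!("{:?}", block_bit_widths(&[0_i64; 128])); // [0, 0, 0, 0]
    println!("{:?}", block_bit_widths(&[1_i64; 128])); // [0, 0, 0, 0]

    // Mixed: constant first half, noisy second half (deltas 0..=4).
    let mut mixed = [0_i64; 128];
    for (i, d) in mixed[64..].iter_mut().enumerate() {
        *d = (i % 5) as i64;
    }
    println!("{:?}", block_bit_widths(&mixed)); // [0, 0, 3, 3]

    // Same noisy tail, but the first half increases by 1. The block-wide
    // min_delta is still 0, so those miniblocks need 1 bit instead of 0.
    let mut shifted = mixed;
    shifted[..64].fill(1);
    println!("{:?}", block_bit_widths(&shifted)); // [1, 1, 3, 3]
}
```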

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Files changed:
  • parquet/src/encodings/decoding.rs: Adds a bit_width == 0 decoding fast path and new large tests for delta decoding correctness.
  • parquet/benches/arrow_reader.rs: Adds a page generator and benchmark cases for delta binary packed constant/increasing patterns.

Development: successfully merging this pull request may close the issue "Speedup DELTA_BINARY_PACKED decoding when bitwidth is 0".
