Search before asking
- I searched in the issues and found nothing similar.
Description:
Apache Arrow currently calculates rounded allocation sizes based on the assumption that Netty's PooledByteBufAllocatorL uses a 16MB chunk size. If the requested size is below that chunk size, Arrow rounds it up to the next power of two:

```java
public long getRoundedSize(long requestSize) {
  return requestSize < chunkSize ? CommonUtil.nextPowerOfTwo(requestSize) : requestSize;
}
```

However, Netty's default maximum chunk size has been reduced from 16MB to 4MB in recent versions (see Netty PR #12108). This mismatch leads to memory inefficiency whenever requested sizes fall between 4MB and 16MB.
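For illustration, the chunk size Netty actually uses by default can be derived from its public allocator defaults (chunk size = pageSize << maxOrder). A minimal sketch, assuming a recent Netty 4.1.x dependency on the classpath; the class name is hypothetical:

```java
import io.netty.buffer.PooledByteBufAllocator;

// Hypothetical illustration class, not part of Arrow or Fluss.
public class NettyChunkSizeCheck {
    public static void main(String[] args) {
        int pageSize = PooledByteBufAllocator.defaultPageSize(); // typically 8192
        int maxOrder = PooledByteBufAllocator.defaultMaxOrder(); // 11 before Netty 4.1.75, 9 after
        long chunkSize = (long) pageSize << maxOrder;            // 16MB before, 4MB after
        System.out.printf("Netty default chunk size: %d bytes (%d MB)%n",
                chunkSize, chunkSize / (1024 * 1024));
    }
}
```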
Problem Details
For example, if the Fluss batch size is 4.1MB, Arrow's getRoundedSize rounds the allocation up to 8MB, wasting nearly 50% of the allocated memory.
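The over-allocation is easy to reproduce with Arrow's own rounding utility. A minimal sketch, assuming arrow-memory-core on the classpath; the hard-coded 16MB chunkSize mirrors Arrow's current assumption, and the class name is hypothetical:

```java
import org.apache.arrow.memory.util.CommonUtil;

// Hypothetical illustration class, not part of Arrow or Fluss.
public class RoundingWasteDemo {
    public static void main(String[] args) {
        long chunkSize = 16L * 1024 * 1024;            // Arrow's assumed 16MB Netty chunk size
        long requestSize = (long) (4.1 * 1024 * 1024); // e.g., a 4.1MB Fluss batch
        long allocated = requestSize < chunkSize
                ? CommonUtil.nextPowerOfTwo(requestSize) // rounds 4.1MB up to 8MB
                : requestSize;
        double wastedPct = 100.0 * (allocated - requestSize) / allocated;
        // Prints: requested=4299161, allocated=8388608, wasted=48.7%
        System.out.printf("requested=%d, allocated=%d, wasted=%.1f%%%n",
                requestSize, allocated, wastedPct);
    }
}
```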
Impact on Flink:
Flink's default off-heap memory is only 128MB, and with multiple slots it is divided further. That memory is shared not only by Fluss (batch reading, decompression, and Netty network requests) but also by the framework itself and other connectors.
In such constrained environments, Arrow's over-allocation exacerbates memory pressure and reduces throughput.
Willingness to contribute
- I'm willing to submit a PR!