Clear VectorSchemaRoot to release buffers as soon as possible after a batch read has finished. #2726

@loserwang1024

Search before asking

  • I searched in the issues and found nothing similar.

Description

When reading a batch of records using LogRecordBatch.ReadContext#getVectorSchemaRoot, the current implementation retains memory buffers even after the batch is processed. According to the Javadoc:

"DO NOT close the vector schema root because it is shared across multiple batches."

While this is correct (as new buffer loads replace old ones), it leads to a temporary duplication of buffers during batch transitions.

For example:

  1. Old buffer (from previous batch) is still referenced by the VectorSchemaRoot.
  2. New buffer (for current batch) is loaded into the same VectorSchemaRoot.
  3. The old buffer is not released until loadFieldBuffers.

Between steps 2 and 3, the old buffer and the new buffer exist at the same time, so the old buffer's memory cannot be reused.

Proposed Solution:

The rule "DO NOT close the vector schema root because it is shared across multiple batches" still applies. However, explicitly clear the VectorSchemaRoot at the end of each batch so that the old buffers are released immediately.

vectorSchemaRoot.clear(); // Releases buffers but retains schema structure
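
To make the expected effect concrete, here is a minimal sketch of the buffer lifetime in plain Java. It is a toy model only: ToyAllocator, ToyRoot, and peakBytes are hypothetical names invented for illustration, and none of this is the Arrow or Fluss API. It assumes, as described above, that the shared root drops its old buffer only when the next batch is loaded.

```java
public class BufferOverlapSketch {

    /** Toy allocator (hypothetical): tracks live bytes and the highest value seen. */
    static final class ToyAllocator {
        private long live;
        private long peak;

        void allocate(long bytes) {
            live += bytes;
            peak = Math.max(peak, live);
        }

        void release(long bytes) {
            live -= bytes;
        }

        long peak() {
            return peak;
        }
    }

    /** Toy stand-in for a shared root (hypothetical): holds at most one batch buffer. */
    static final class ToyRoot {
        private final ToyAllocator allocator;
        private long heldBytes; // 0 means no buffer is currently held

        ToyRoot(ToyAllocator allocator) {
            this.allocator = allocator;
        }

        /** Models the described behaviour: the old buffer is dropped only on the next load. */
        void load(long newBufferBytes) {
            allocator.release(heldBytes);
            heldBytes = newBufferBytes;
        }

        /** Releases the held buffer; the root itself stays usable for the next batch. */
        void clear() {
            allocator.release(heldBytes);
            heldBytes = 0;
        }
    }

    /** Reads all batches through one shared root and returns the peak live bytes. */
    static long peakBytes(long[] batchSizes, boolean clearAfterEachBatch) {
        ToyAllocator allocator = new ToyAllocator();
        ToyRoot root = new ToyRoot(allocator);
        for (long size : batchSizes) {
            allocator.allocate(size); // the new batch buffer is read into memory
            root.load(size);          // the root swaps the old buffer for the new one
            // ... the batch would be processed here ...
            if (clearAfterEachBatch) {
                root.clear();         // proposed change: release right after the batch
            }
        }
        root.clear();                 // final cleanup once all batches are read
        return allocator.peak();
    }

    public static void main(String[] args) {
        long[] sizes = {100, 100, 100};
        System.out.println("peak without clear: " + peakBytes(sizes, false));
        System.out.println("peak with clear:    " + peakBytes(sizes, true));
    }
}
```

With three equal 100-byte batches, this model reports a peak of 200 bytes without the per-batch clear and 100 bytes with it. The real savings depend on how the allocator and the reader actually behave, which this sketch does not measure.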

Willingness to contribute

  • I'm willing to submit a PR!
