
Reduce allocation pressure of Parquet reads #151


Description

@mchav

When reading Parquet we build immutable intermediate data structures and accumulate them into the final result. This is extremely expensive and slow. Since we know the number of rows and the types we are reading ahead of time, we can optimize the read path by:

  • preallocating mutable buffers
  • representing missingness with a validity bitmap rather than boxing values in Maybe
  • avoiding linked lists as an intermediate structure wherever possible

These should be fairly simple changes (a rough sketch follows below). They will also help us clean up the Parquet implementation as we go.
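Below is a minimal sketch, assuming the column length is known up front, of what the decode loop could look like: values land in a preallocated unboxed mutable buffer, and missingness is tracked in a separate validity buffer instead of boxing each cell in a `Maybe`. The names (`Int64Column`, `fillInt64Column`, `decodeCell`) are hypothetical and not part of the current implementation; a `VU.Vector Bool` stands in for a packed bitmap here for brevity.

```haskell
module Main where

import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- Hypothetical decoded column: raw values plus a validity mask.
-- A slot whose validity bit is False holds an arbitrary placeholder.
data Int64Column = Int64Column
  { colValues   :: !(VU.Vector Int64)
  , colValidity :: !(VU.Vector Bool)
  } deriving Show

-- Fill a column of known row count from a per-row decoder.
-- Both buffers are allocated once up front; no intermediate lists
-- or Maybe-wrapped values are accumulated.
fillInt64Column :: Int -> (Int -> Maybe Int64) -> Int64Column
fillInt64Column nRows decodeCell = runST $ do
  values   <- VUM.new nRows               -- preallocated value buffer
  validity <- VUM.replicate nRows False   -- validity mask, all null to start
  forM_ [0 .. nRows - 1] $ \i ->
    case decodeCell i of
      Just v  -> VUM.write values i v >> VUM.write validity i True
      Nothing -> VUM.write values i 0     -- placeholder; the mask marks it null
  Int64Column <$> VU.unsafeFreeze values <*> VU.unsafeFreeze validity

main :: IO ()
main = print (fillInt64Column 5 (\i -> if even i then Just (fromIntegral i) else Nothing))
```

In a real reader the `decodeCell` callback would be replaced by the Parquet page decoder, but the shape is the same: a single pass over the rows writing into buffers whose sizes are known from the file metadata.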

Original discussion: #147
Similar issue: #133
