Open
Description
When reading Parquet, we build immutable intermediate data structures and accumulate them into the final result. This is extremely expensive and slow. Since we know the number of rows and the column types ahead of time, we can reduce read time by:
- preallocating mutable buffers
- representing missingness with a validity bitmap rather than boxing values in Maybe
- avoiding linked lists as an intermediate structure wherever possible
These should be fairly simple changes. They will also help us clean up the Parquet implementation as we go.
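A rough sketch of what the three points could look like together, for a flat optional integer column. Everything here is illustrative rather than the library's actual API: `NullableInts`, `decodeColumn`, `isValid`, and `lookupValue` are hypothetical names, the input is assumed to be already-decoded definition levels (1 = present, 0 = null) plus the dense non-null values, and the bitmap uses least-significant-bit-first numbering as in Arrow. It uses the `array` package only to stay self-contained; the same shape should carry over to unboxed mutable vectors.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Array.Unsafe (unsafeFreeze)
import Data.Bits (setBit, shiftR, testBit, (.&.))
import Data.Word (Word8)

-- | Hypothetical column type: unboxed values plus a validity bitmap
-- (one bit per row, least-significant bit first within each byte).
data NullableInts = NullableInts
  { niValues   :: UArray Int Int
  , niValidity :: UArray Int Word8
  , niLength   :: Int
  }

-- | Fill preallocated buffers in one pass. Assumes a flat optional
-- column, so a definition level of 1 means "present" and 0 means "null".
decodeColumn :: Int -> UArray Int Word8 -> UArray Int Int -> NullableInts
decodeColumn n defLevels dense = runST $ do
  -- The row count is known up front, so both buffers are allocated once.
  vals  <- newArray (0, n - 1) 0 :: ST s (STUArray s Int Int)
  valid <- newArray (0, ((n + 7) `shiftR` 3) - 1) 0 :: ST s (STUArray s Int Word8)
  let go i j  -- i: row index, j: cursor into the dense (non-null) values
        | i >= n = pure ()
        | defLevels ! i == 1 = do
            writeArray vals i (dense ! j)
            let byte = i `shiftR` 3
            old <- readArray valid byte
            writeArray valid byte (old `setBit` (i .&. 7))
            go (i + 1) (j + 1)
        | otherwise = go (i + 1) j  -- null row: bit stays 0, slot stays 0
  go 0 0
  -- Safe here because the mutable arrays are never touched after freezing.
  v <- unsafeFreeze vals
  b <- unsafeFreeze valid
  pure (NullableInts v b n)

-- | Check the validity bit for row i.
isValid :: NullableInts -> Int -> Bool
isValid col i = testBit (niValidity col ! (i `shiftR` 3)) (i .&. 7)

-- | Maybe appears only at the API boundary, not in storage.
lookupValue :: NullableInts -> Int -> Maybe Int
lookupValue col i
  | i < 0 || i >= niLength col = Nothing
  | isValid col i = Just (niValues col ! i)
  | otherwise = Nothing

-- | Five rows, of which rows 1 and 4 are null.
example :: NullableInts
example =
  decodeColumn 5
    (listArray (0, 4) [1, 0, 1, 1, 0])
    (listArray (0, 2) [10, 20, 30])
```

Here `map (lookupValue example) [0 .. 4]` yields the three present values at rows 0, 2, and 3 and `Nothing` at rows 1 and 4, without any per-row `Maybe` allocation or intermediate list during the read itself.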