Skip to content

Improve unparsing to handle aggregations with grouping set#36

Merged
sgrebnov merged 3 commits intospiceai-42from
sgrebnov/improve-unparsing-grouping-set
Oct 1, 2024
Merged

Improve unparsing to handle aggregations with grouping set#36
sgrebnov merged 3 commits intospiceai-42from
sgrebnov/improve-unparsing-grouping-set

Conversation

@sgrebnov
Copy link

@sgrebnov sgrebnov commented Oct 1, 2024

Which issue does this PR close?

PR improves Aggregation unparsing logic to handle plans with grouping sets. When Aggregation is created (below) grouping_set_to_exprlist is used to retrieve grouping set expressions and populate schema (qualified_fields)

https://github.com/apache/datafusion/blob/f54712d8c2545fc0dd60cdc693704992e9d82eef/datafusion/expr/src/logical_plan/plan.rs#L2953C9-L2977C6

        let group_expr = enumerate_grouping_sets(group_expr)?;

        let is_grouping_set = matches!(group_expr.as_slice(), [Expr::GroupingSet(_)]);

        let grouping_expr: Vec<&Expr> = grouping_set_to_exprlist(group_expr.as_slice())?;

        let mut qualified_fields = exprlist_to_fields(grouping_expr, &input)?;
..
        qualified_fields.extend(exprlist_to_fields(aggr_expr.as_slice(), &input)?);

        let schema = DFSchema::new_with_metadata(
            qualified_fields,
            input.schema().metadata().clone(),
        )?;

        Self::try_new_with_schema(input, group_expr, aggr_expr, Arc::new(schema))
    }

PR adds the same step/logic to correctly retrieve all aggregation expressions when unproject columns. W/o this logic current unproject logic fails as schema does not match agg.group_exp + agg.aggr_expr (must be grouping_set_to_exprlist(agg.group_exp) + agg.aggr_expr) : can't get and then unwrap unprojected expression by index: https://github.com/apache/datafusion/blob/f54712d8c2545fc0dd60cdc693704992e9d82eef/datafusion/sql/src/unparser/utils.rs#L92

Concern

grouping_set_to_exprlist is used directly in find_agg_exp so invoked for each column to unproject (in case of grouping set). This could be optimized to do this only once. Decided to keep this logic this way as pre-calculating and passing such expressions as argument or changing find_agg_exp signature will make code less clean and grouping_set_to_exprlist does NOT create expressions copies (operates by Vec<&Expr>)

@sgrebnov sgrebnov changed the title Sgrebnov/improve unparsing grouping set Improve unparsing to handle aggregations with grouping set Oct 1, 2024
@sgrebnov sgrebnov self-assigned this Oct 1, 2024
@sgrebnov sgrebnov merged commit b15debb into spiceai-42 Oct 1, 2024
@sgrebnov sgrebnov deleted the sgrebnov/improve-unparsing-grouping-set branch October 1, 2024 23:45
@sgrebnov
Copy link
Author

sgrebnov commented Oct 1, 2024

Created PR to datafusion combining this and previous improvement. Merged this PR to unblock TPC-DS queries, will add required changes based on review/feedback if any:
apache#12705

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants