Status: Open
Labels: bug, regression
Description
Describe the bug
When executing a hash join with multiple join keys where one column is dictionary-encoded with fewer unique values than rows, DataFusion panics with:
InvalidArgumentError("Incorrect array length for StructArray field \"c1\", expected N got M")
To Reproduce
-- Small table with dictionary-encoded region (2 rows, 1 unique value)
CREATE TABLE small AS
SELECT id, arrow_cast(region, 'Dictionary(Int32, Utf8)') as region
FROM (VALUES (1, 'west'), (2, 'west')) AS t(id, region);
CREATE TABLE large AS
SELECT id, region, value
FROM (VALUES (1, 'west', 100), (2, 'west', 200), (3, 'east', 300)) AS t(id, region, value);
-- Multi-column join triggers panic
SELECT s.id, s.region, l.value
FROM small s
JOIN large l ON s.id = l.id AND s.region = l.region;
Expected behavior
Query returns 2 rows:
+----+--------+-------+
| id | region | value |
+----+--------+-------+
| 1  | west   | 100   |
| 2  | west   | 200   |
+----+--------+-------+
Actual behavior
Panic:
thread 'main' panicked at arrow-array/src/array/struct_array.rs:91:46:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Incorrect array length for StructArray field \"c1\", expected 3 got 2")
Root cause
In flatten_dictionary_array introduced by #18393:
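fn flatten_dictionary_array(array: &ArrayRef) -> ArrayRef {
    downcast_dictionary_array! {
        array => {
            flatten_dictionary_array(array.values())
        }
        _ => Arc::clone(array)
    }
}
The function calls array.values(), which returns the dictionary's array of unique values rather than one value per row. The flattened column therefore has as many elements as the dictionary has distinct values, which is fewer than the row count whenever values repeat.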
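To make the length difference concrete, here is a minimal stdlib-only sketch. `DictColumn`, `num_rows`, and `materialize` are hypothetical names for illustration and are not the arrow-rs `DictionaryArray` API; the model also ignores nulls. It only shows that the values buffer and the logical column have different lengths when values repeat.

```rust
/// Simplified stand-in for a dictionary-encoded string column.
/// Hypothetical type for illustration, NOT arrow-rs `DictionaryArray`.
struct DictColumn {
    /// One key per row, indexing into `values`.
    keys: Vec<usize>,
    /// The distinct values referenced by the keys.
    values: Vec<String>,
}

impl DictColumn {
    /// Number of logical rows in the column.
    fn num_rows(&self) -> usize {
        self.keys.len()
    }

    /// Analogous to what the flattening code reads: only the distinct values.
    fn values(&self) -> &[String] {
        &self.values
    }

    /// Expands the column to one value per row by looking up each key.
    fn materialize(&self) -> Vec<String> {
        self.keys.iter().map(|&k| self.values[k].clone()).collect()
    }
}

fn main() {
    // Two rows that both point at the single distinct value "west".
    let region = DictColumn {
        keys: vec![0, 0],
        values: vec!["west".to_string()],
    };
    println!(
        "rows = {}, values().len() = {}, materialize().len() = {}",
        region.num_rows(),
        region.values().len(),
        region.materialize().len()
    );
}
```

In this model only `materialize()` yields a column whose length matches the row count. A fix along those lines (expanding the dictionary through its keys instead of returning the values buffer) is an assumption here, not a description of the actual patch.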
When building a StructArray for multi-column join keys, StructArray::try_new_with_length() detects the length mismatch:
if a.len() != len {
    return Err(ArrowError::InvalidArgumentError(format!(
        "Incorrect array length for StructArray field {:?}, expected {} got {}",
        f.name(), len, a.len()
    )));
}
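The check above can be imitated without Arrow to see how the reported message arises. In this sketch `check_field_lengths` is a hypothetical helper that takes field names with their lengths; it is not part of arrow-rs, and the lengths 3 and 2 are simply the ones from the panic message.

```rust
/// Hypothetical stand-in for the per-field length validation quoted above.
/// Each field is given as (name, length); `len` is the expected row count.
fn check_field_lengths(len: usize, fields: &[(&str, usize)]) -> Result<(), String> {
    for (name, field_len) in fields {
        if *field_len != len {
            return Err(format!(
                "Incorrect array length for StructArray field {:?}, expected {} got {}",
                name, len, field_len
            ));
        }
    }
    Ok(())
}

fn main() {
    // c0 has one element per row; c1 was flattened to its dictionary values only.
    let mismatched = check_field_lengths(3, &[("c0", 3), ("c1", 2)]);
    println!("{:?}", mismatched);

    // If c1 carried one value per row, the check would pass.
    let consistent = check_field_lengths(3, &[("c0", 3), ("c1", 3)]);
    println!("{:?}", consistent);
}
```

The first call reproduces the shape of the error in the panic: the second join key is shorter than the other key columns, so the struct of join keys cannot be built.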