-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.
To Reproduce
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
In the following query, sometimes only v=1
is evaluated, while in others only v=10000000
is evaluated:
select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10000000;
We can see that the DynamicFilterPhysicalExpr
can change:
+---+---+---+
| k | k | v |
+---+---+---+
| 1 | 1 | 1 |
+---+---+---+
predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
+----------+----------+----------+
| k | k | v |
+----------+----------+----------+
| 10000000 | 10000000 | 10000000 |
+----------+----------+----------+
predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]
If we use a smaller table, both v=1
and v=10000000
will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed? Here is a bigger example to show how it changes:
copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';
select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10;
And a few runs:
output_rows=1944
predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]
output_rows=1990
predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]
output_rows=1978
predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]
Expected behavior
Return the correct results.
Additional context
Disabling datafusion.optimizer.enable_dynamic_filter_pushdown
returns the correct results.
adriangbjonathanc-n
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working