Skip to content

Dynamic Filter Pushdown causes JOIN to return incorrect results #17188

@nuno-faria

Description

@nuno-faria

Describe the bug

The Dynamic Filter Pushdown optimization (#16445) is causing joins to return incorrect results when the predicate has multiple conditions.

To Reproduce

copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';
create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

In the following query, sometimes only v=1 is evaluated, while in others only v=10000000 is evaluated:

select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10000000;

We can see that the DynamicFilterPhysicalExpr can change:

+---+---+---+
| k | k | v |
+---+---+---+
| 1 | 1 | 1 |
+---+---+---+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]
+----------+----------+----------+
| k        | k        | v        |
+----------+----------+----------+
| 10000000 | 10000000 | 10000000 |
+----------+----------+----------+

predicate=DynamicFilterPhysicalExpr [ k@0 >= 10000000 AND k@0 <= 10000000 ]

If we use a smaller table, both v=1 and v=10000000 will be considered, so maybe the pushdown is not waiting until the filtered side is fully completed? Here is a bigger example to show how it changes:

copy (select i as k from generate_series(1, 10000000) as t(i)) to 't1.parquet';
copy (select i as k, i % 10000 as v from generate_series(1, 10000000) as t(i)) to 't2.parquet';

create external table t1 stored as parquet location 't1.parquet';
create external table t2 stored as parquet location 't2.parquet';

select *
from t1
join t2 on t1.k = t2.k
where v = 1 or v = 10;

And a few runs:

output_rows=1944
predicate=DynamicFilterPhysicalExpr [ k@0 >= 200010 AND k@0 <= 9910001 ]

output_rows=1990
predicate=DynamicFilterPhysicalExpr [ k@0 >= 30010 AND k@0 <= 9970010 ]

output_rows=1978
predicate=DynamicFilterPhysicalExpr [ k@0 >= 60010 AND k@0 <= 9920001 ]

Expected behavior

Return the correct results.

Additional context

Disabling datafusion.optimizer.enable_dynamic_filter_pushdown returns the correct results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions