Add relational expression simplification with predicate inference #396

knassre-bodo · 2025-07-19T01:20:15Z

Adds a simplification pass to the relational optimizer that simplifies relational expressions using predicate inference and constant folding. The pass is built as a relational visitor that modifies the relational nodes in place by invoking a relational shuttle to transform their expressions. Both propagate information via PredicateSet objects, which record whether an expression is known to satisfy certain criteria (positive, non-negative, non-null).
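To illustrate the idea, here is a minimal sketch of predicate-driven simplification. It is not the PR's implementation: the `Predicates`, `Literal`, `Column`, and `Call` classes and the `simplify` function are hypothetical stand-ins, and only two rules are shown (ABS of a value known to be non-negative, and DEFAULT_TO with a first argument that is known to be non-null or is a literal None).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicates:
    """Facts known to hold for every value an expression can produce."""
    not_null: bool = False
    not_negative: bool = False


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Column:
    name: str
    predicates: Predicates = Predicates()


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


def simplify(expr):
    """Return (simplified expression, predicates inferred for it)."""
    if isinstance(expr, Literal):
        return expr, Predicates(not_null=expr.value is not None)
    if isinstance(expr, Column):
        return expr, expr.predicates
    # Simplify the arguments first so their predicates are available.
    results = [simplify(arg) for arg in expr.args]
    args = tuple(e for e, _ in results)
    preds = [p for _, p in results]
    if expr.op == "ABS" and preds[0].not_negative:
        # ABS(x) is x when x can never be negative.
        return args[0], preds[0]
    if expr.op == "DEFAULT_TO" and len(args) == 2:
        if preds[0].not_null:
            # The fallback can never be used.
            return args[0], preds[0]
        if args[0] == Literal(None):
            # The fallback is always used.
            return args[1], preds[1]
    # Unknown pattern: keep the call and assume nothing about its result.
    return Call(expr.op, args), Predicates()


# Hypothetical column standing in for something like COUNT(customers).
count = Column("n_customers", Predicates(not_null=True, not_negative=True))
nested = Call("ABS", (Call("DEFAULT_TO", (count, Literal(0))),))
print(simplify(nested)[0])  # both wrappers drop away, leaving the column
print(simplify(Call("ABS", (Column("balance"),)))[0])  # nothing known, ABS stays
```

Each rewrite here is only valid if the predicates inferred for the arguments are correct, which is why operators like ABS and DEFAULT_TO are useful probes in tests.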

Also updates the testing infrastructure in several ways:

PyDoughPandasTest can now take in a string instead of a function (evaluated via pydough.from_string).
The args argument to PyDoughPandasTest is replaced with kwargs, passed in as keyword arguments to functions or as an environment to pydough.from_string.

knassre-bodo · 2025-07-24T05:58:47Z

pydough/conversion/relational_converter.py

@@ -1531,6 +1540,9 @@ def convert_ast_to_relational(
    raw_result: RelationalRoot = postprocess_root(node, columns, hybrid, output)

    # Invoke the optimization procedures on the result to clean up the tree.
-    optimized_result: RelationalRoot = optimize_relational_tree(raw_result, configs)
+    additional_shuttles: list[RelationalExpressionShuttle] = []


This list will be used down the line when additional configurations are set that allow more specific/advanced transformations to happen during simplification.

john-sanchez31 · 2025-07-28T20:31:37Z

tests/test_pipeline_defog_custom.py

+                " s03 = DEFAULT_TO(None, 0) > 0,"  # -> False
+                " s04 = DEFAULT_TO(None, 0) <= 0,"  # -> True
+                " s05 = DEFAULT_TO(None, 0) < 0,"  # -> False
+                " s06 = DEFAULT_TO(None, 0) == None,"  # -> None


For tests s06 to s11, shouldn't we expect True/False values? Why are they None?

x == NULL is always NULL; the same goes for other operators. You can test it in SQL. Some dialects have "null safe" variants (for example, in Snowflake you can do EQUAL_NULL(1, NULL), which is False, or EQUAL_NULL(NULL, NULL), which is True).
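A quick way to check this locally is with Python's built-in sqlite3 module. This is only an illustration using SQLite; the query below is not taken from the PR's tests. SQLite's `IS` operator plays the role of a null-safe comparison here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    "SELECT 0 = NULL, 0 <= NULL, NULL = NULL, 0 IS NULL, NULL IS NULL"
).fetchone()
conn.close()

# Ordinary comparisons against NULL yield NULL (None in Python),
# while the null-safe IS operator yields 0 or 1.
print(row)  # (None, None, None, 0, 1)
```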

john-sanchez31 · 2025-07-28T20:37:55Z

tests/test_pipeline_defog_custom.py

+                " s03 = MONOTONIC(1, 4, 3),"  # -> False
+                " s04 = MONOTONIC(1, 2, 1),"  # -> False
+                " s05 = MONOTONIC(1, 0, 1),"  # -> False
+                " s06 = MONOTONIC(1, LENGTH('foo'), COUNT(customers)),"  # -> 3 <= COUNT(customers)


Are the comments on s06 and s08 accurate, taking into account that below we are expecting 1 and '0' respectively?

The comments indicate what they get simplified to, not what they evaluate to. The simplified expression still needs to get evaluated. In this case, s06 simplifies to MONOTONIC(1, 3, COUNT(customers)), which is equivalent to (1 <= 3) & (3 <= COUNT(customers)), which simplifies to True & (3 <= COUNT(customers)), which just becomes 3 <= COUNT(customers).

Also, there is no True/False in SQLite; it just returns 0/1.
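The folding steps described above can be sketched as follows. This is a simplified illustration, not the PR's code: `simplify_monotonic` is a hypothetical helper in which integer arguments stand for constants and string arguments stand for expressions that cannot be folded.

```python
def simplify_monotonic(args):
    """Fold MONOTONIC(a, b, c, ...), i.e. (a <= b) & (b <= c) & ...

    Returns False if any constant comparison fails, True if every
    comparison folds to true, and otherwise the comparisons that
    remain after dropping the ones known to be true.
    """
    remaining = []
    for left, right in zip(args, args[1:]):
        if isinstance(left, int) and isinstance(right, int):
            if not left <= right:
                # False & anything is False.
                return False
            # True & x is x, so a true comparison is simply dropped.
            continue
        remaining.append(f"{left} <= {right}")
    if not remaining:
        return True
    if len(remaining) == 1:
        return remaining[0]
    return " & ".join(f"({cond})" for cond in remaining)


# LENGTH('foo') is assumed to have been folded to 3 already.
print(simplify_monotonic([1, len("foo"), "COUNT(customers)"]))  # 3 <= COUNT(customers)
print(simplify_monotonic([1, 4, 3]))  # False
```

The remaining comparison is still evaluated by the database, which is why the expected test output is a value such as 1 rather than the simplified expression.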

hadia206

Overall code seems to do what the PR description is saying.
Please see my comments below, I'm mainly looking for some more tests.

hadia206 · 2025-07-30T21:51:42Z

pydough/conversion/relational_simplification.py

+        if len(predicates) == 0:
+            return result
+        else:
+            result |= predicates[0]


Shouldn't this be result = predicates[0]?
My understanding is that result |= predicates[0] will be a no-op since result is the default PredicateSet(), and | will be a union

Because I don't want the intersect operator to actually mutate any of the input sets (since everything is mutable). If I do result = predicates[0] and then in the next section do result &= pred, this will mutate both result and predicates[0], which is a problem if predicates[0] is being used elsewhere.
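The aliasing concern can be reproduced with Python's built-in `set`, used here as a stand-in for PredicateSet on the assumption (taken from the comment above) that its `|=` and `&=` operators update the left-hand object in place. Both function names are hypothetical.

```python
def intersect_all(predicate_sets):
    """Intersect the inputs without mutating any of them."""
    result = set()
    if len(predicate_sets) == 0:
        return result
    # Copies the contents of the first input into a fresh set.
    result |= predicate_sets[0]
    for preds in predicate_sets[1:]:
        result &= preds
    return result


def intersect_all_aliased(predicate_sets):
    """Same logic, but `result` is just another name for the first input."""
    if len(predicate_sets) == 0:
        return set()
    result = predicate_sets[0]
    for preds in predicate_sets[1:]:
        # In-place intersection also changes predicate_sets[0].
        result &= preds
    return result


a = {"not_null", "not_negative"}
b = {"not_null"}
safe = intersect_all([a, b])
a_after_safe = set(a)  # snapshot: a is unchanged at this point

aliased = intersect_all_aliased([a, b])
print(a_after_safe)  # still holds both predicates
print(a)             # now only holds "not_null"
```

Starting from an empty set and applying `|=` once acts as a copy, so the later `&=` calls only ever touch the fresh object.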

hadia206 · 2025-07-30T21:56:26Z

pydough/conversion/relational_simplification.py

+        """
+        result: PredicateSet = PredicateSet()
+        if len(predicates) == 0:
+            return result


Does that mean both union and intersection with an empty set will have the same result value?

This is saying if we call it on an empty list of sets, the output assumption is that we know nothing.

hadia206 · 2025-07-30T22:00:07Z

pydough/conversion/relational_simplification.py

+        output_predicates: PredicateSet = PredicateSet()
+        if literal_expression.value is not None:
+            output_predicates.not_null = True
+            if isinstance(literal_expression.value, (int, float)):


what about booleans? 0/1

hadia206 · 2025-07-30T22:00:29Z

pydough/conversion/relational_simplification.py

+                if literal_expression.value >= 0:
+                    output_predicates.not_negative = True
+                    if literal_expression.value > 0:
+                        output_predicates.positive = True


Should we have a default or fallback for other types? (string, boolean, dates, ...)

We start with the default predicate set (we know nothing to be true) and if the value happens to be in these cases then we add predicates to it. The default is we don't enter these cases so we just return the empty predicate set.
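As a rough sketch of that behavior, the snippet below mirrors the branches shown in the diff using a hypothetical `LiteralPredicates` dataclass and `infer_literal_predicates` function (the real code works on PredicateSet and a literal expression node). One detail relevant to the boolean question: in Python, `bool` is a subclass of `int`, so True and False pass the `isinstance(value, (int, float))` check.

```python
from dataclasses import dataclass


@dataclass
class LiteralPredicates:
    # Everything defaults to False: nothing is known to be true.
    not_null: bool = False
    not_negative: bool = False
    positive: bool = False


def infer_literal_predicates(value):
    """Infer predicates for a literal value, starting from 'know nothing'."""
    out = LiteralPredicates()
    if value is not None:
        out.not_null = True
        if isinstance(value, (int, float)):
            if value >= 0:
                out.not_negative = True
                if value > 0:
                    out.positive = True
    return out


print(infer_literal_predicates(None))   # no predicates set
print(infer_literal_predicates("foo"))  # only not_null
print(infer_literal_predicates(0))      # not_null and not_negative
print(infer_literal_predicates(True))   # treated like the integer 1
```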

hadia206 · 2025-07-30T22:50:34Z

tests/test_pipeline_defog_custom.py

+                "simplification_3",
+            ),
+            id="simplification_3",
+        ),


Other tests to include

Tests for ADD, MUL, DIV simplification outside ABS and DEFAULT_TO

Test multiplication by negative or zero explicitly to confirm positive/not-negative predicates.

Tests for COUNT(*), COUNT(non-null column), and their positive/non-null predicates outside ABS and DEFAULT_TO

Logical operators with mixed literal and non-literal inputs

XOR (BXR) and LIKE operators

Tests for Window call operators with and without frames

Tests for aggregate functions

Tests where expressions are nested multiple levels to verify recursive simplification works correctly. Example: ABS(ADD(COUNT(customers), DEFAULT_TO(None, 5))).

Empty inputs, zero-length strings in string functions.

DEFAULT_TO with more than two arguments.

Boolean operators with None or unexpected values.

Tests for ADD, MUL, DIV simplification outside ABS and DEFAULT_TO

Sure thing

Tests for COUNT(*), COUNT(non-null column), and their positive/non-null predicates outside ABS and DEFAULT_TO

There isn't really a way to test their positive/non-null predicates w/o ABS / DEFAULT_TO (or similarly sensitive operators like boolean operators). The point of those two is that they only get simplified correctly if the predicates to their arguments are correct.

Logical operators with mixed literal and non-literal inputs

We have some of these already

Tests where expressions are nested multiple levels to verify recursive simplification works correctly

We have some of those already: MONOTONIC(1, LENGTH('foo'), COUNT(customers)), ABS(DEFAULT_TO(AVG(ABS(DEFAULT_TO(LENGTH(customers.name), 0))), 0))

I think I now got most of the ones you suggested in some form.

Co-authored-by: Hadia Ahmed <[email protected]>

hadia206

Thanks Kian

john-sanchez31

Looks fine

Extension of #396 to further the transformation of LEFT joins into INNER joins when there are filters on top of the join if the filters would be false if the RHS columns were all-null (relies on simplification to deduce if this is the case). Also creates a new `RelationalShuttle` base class so relational nodes can use the shuttle design pattern, and refactors filter pushdown to use this pattern. Also refactors several other utilities to use shuttles that should have been doing so already.

knassre-bodo added 30 commits July 9, 2025 12:38

WIP

18ba964

WIP improvements on projection pullup

684b5d7

Fixing filter/join cases

cc004ec

Bugfixes, testing for correctness [RUN CI]

de7d4e6

Merge branch 'main' into kian/projection_pullup

e6dea43

Finished dealing with JOIN pull-up

3827276

Fixed pullup bugs

1fff8ea

Pullup with LIMIT [RUN CI]

07136d2

Merge branch 'main' into kian/projection_pullup

602f292

Adding extra round of bubbling

d7ec696

Compressing limit into root

d88cdb5

Restoring filter modifications

665a9dd

Started aggregation project pullup

d1fe25b

Added SUM(1)->COUNT() optimization

0892581

Cleanup merge projects

a69d764

Added some adjacent aggregaiton merging

13d9844

Added min/min, max/max, anything/anything cases

c497127

Adding more aggregation simplification and comments

f74fc5c

Added more aggregation simplification + tests

d260428

Adjusting parameters of optimization

856e1a9

Pulled out common logic from filter/join/limit and added comments

c4298cb

Added remaining comments

5e9f09d

[RUN CI]

c45b4f2

Resolving conflicts [RUN CI]

6a7bc49

Added PageRank tests and fixed bugs found along the way

416fbad

Started adding comments

b121c31

Started adding comments

05f7147

Added comments

2697b10

Fixing c4 test and refactoring the PageRank impl to be simpler & more…

94726ff

… performant with fewer window functions [RUN CI]

Merge branch 'main' into kian/projection_pullup

e21cf57

knassre-bodo added 6 commits July 24, 2025 00:57

Merge branch 'main' into kian/simplify

8f0fbd3

Adding docstrings

f150cd5

Revisions

8d0fc6b

Stack cleanup

22a94ab

Adding additional shuttle framework

a971676

[RUN CI]

9c6caa2

knassre-bodo marked this pull request as ready for review July 24, 2025 05:45

knassre-bodo requested review from hadia206, john-sanchez31 and juankx-bodo July 24, 2025 05:54

knassre-bodo commented Jul 24, 2025

knassre-bodo mentioned this pull request Jul 24, 2025

Fixing bug with cross of two partition nodes with correlates #398

Merged

Merge branch 'main' into kian/simplify

604a7a6

john-sanchez31 reviewed Jul 28, 2025

hadia206 reviewed Jul 30, 2025

knassre-bodo and others added 5 commits July 31, 2025 00:36

Added more simplfication patterns to tests

b63c5d4

Revisions

02c24bd

Apply suggestions from code review

2c65773

Co-authored-by: Hadia Ahmed <[email protected]>

edit

cc12363

Merge remote-tracking branch 'origin/kian/simplify' into kian/simplify

5f19dcd

knassre-bodo requested review from hadia206 and john-sanchez31 July 31, 2025 17:07

hadia206 approved these changes Jul 31, 2025

john-sanchez31 approved these changes Jul 31, 2025

juankx-bodo approved these changes Aug 1, 2025

knassre-bodo added 2 commits August 1, 2025 10:45

[RUN CI]

6ec13f1

Fixing SQL test [RUN CI]

9344c9a

knassre-bodo merged commit d7612bc into main Aug 1, 2025
7 checks passed

knassre-bodo deleted the kian/simplify branch August 1, 2025 15:06
