Skip to content

fix: preserve the order of elements of the list extract from defaults.ini #660

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 8, 2024

Conversation

tromai
Copy link
Member

@tromai tromai commented Mar 5, 2024

This PR is created to addressed to issue mentioned here #641 (comment).

This issue is addressed by removing the parameter duplicate_ok in the get_list method

def get_list(
self,
section: str,
item: str,
delimiter: str | None = "\n",
fallback: list[str] | None = None,
duplicated_ok: bool = False,
strip: bool = True,
) -> list[str]:
. This essentially will make the get_list method not modifying the order of elements in the return list of strings.

As a result, the unit tests in tests/config/test_defaults.py are also re-implemented to reflect this new behavior, hence a major modification within that file.

In additions, I have also rename the parameter item to option according to the documentation of ConfigParser - https://docs.python.org/3/library/configparser.html#module-configparser

@tromai tromai added the bug Something isn't working label Mar 5, 2024
@tromai tromai self-assigned this Mar 5, 2024
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 5, 2024
@tromai tromai force-pushed the tromai/improve-get-list-defaults-ini branch from 36667fd to 4823341 Compare March 6, 2024 00:12
@tromai tromai marked this pull request as ready for review March 6, 2024 00:22
@tromai tromai requested a review from behnazh-w as a code owner March 6, 2024 00:22
@behnazh-w behnazh-w requested a review from benmss March 6, 2024 00:24
@nathanwn
Copy link
Member

nathanwn commented Mar 6, 2024

I have a question: why did we have this de-duplicating logic in the first place, and if we remove it does it break any of the current behaviors (i.e. those that may not be verified in any unit tests)?

@behnazh-w
Copy link
Member

I have a question: why did we have this de-duplicating logic in the first place, and if we remove it does it break any of the current behaviors (i.e. those that may not be verified in any unit tests)?

That's useful when the user makes a mistake in the configuration file while we expect to have unique values. We shouldn't remove this feature.

@tromai
Copy link
Member Author

tromai commented Mar 6, 2024

That's useful when the user makes a mistake in the configuration file while we expect to have unique values. We shouldn't remove this feature.

Hmm interesting, I wonder what would be the examples in Macaron where the caller of defaults.get_list is expecting a unique values @behnazh-w? I was not aware of places that removing the de-duplication would break the existing logic. Therefore, resolving those places must be done in this PR.
My implementation was following the principle that defaults.get_list will only return the same value as defined in defaults.ini, hence the de-duplication was removed from defaults.get_list.
If we were to preserve the order of elements in the returned list and de-duplicate the list inside defaults.get_list, it would be quite complicated. This is because defaults.get_list doesn't know what is the context of the values in the list. Therefore, it cannot de-duplicate the values without breaking the original order (as defined in defaults.ini). For example:

[section]
option = 
    foo
    bar
    foo

defaults.get_list won't know whether it should return ["bar", "foo"] or ["foo", "bar"] because essentially both are the same and without any further context, we wouldn't know which of them should be the order the user is expecting 🤔.
Therefore, my suggestion is that we remove the de-duplication from defaults.get_list and instead we handle the de-duplication separated at places where we call defaults.get_list and expect a list with unique elements. And to do that, I think I need to go through the places where defaults.get_list is called again to see at which location, a list with unique elements from defaults.get_list is expected.

@behnazh-w
Copy link
Member

@tromai

Therefore, my suggestion is that we remove the de-duplication from defaults.get_list and instead we handle the de-duplication separated at places where we call defaults.get_list and expect a list with unique elements.

I'm not sure I agree. Checking for duplicates as a way to catch mistakes in the configuration can be the default and be done in one place. There is no point to repeat that outside the function call. I would argue that if a config needs to allow duplicates, that's a specific case and should be turned on explicitly.

In general, I prefer to be as explicit as possible. So, I suggest to add a new argument to preserve the order, like preserve_order, which can be turned off by default. If both duplicated_ok and preserve_order are turned on, we can remove the duplicate item that comes last and explain that in the docstring.

@nathanwn
Copy link
Member

nathanwn commented Mar 6, 2024

My suggestion: I think that we should always keep the order (which is a sensible default behavior while there is no good reason to not keep the order), while optionally allowing for duplication.

The logic would be different from the original one. I am thinking something similar to this (which is probably similar to what Behnaz is suggesting):

# content is the list of values read from the config and split apart with the delimiter
# Therefore, content retains the initial order.

if duplication_allowed:
    return content
else:
    values = []
    for item in content:
        if item in values:
            continue
        values.append(item)
    return values

A dedicated set, for example already = set(), can be used to check for duplication instead of checking on values. However, we need to use the values list to store and return the values anyway, and asymptotic time complexity is not an issue here because configurations cannot be very large.

@tromai
Copy link
Member Author

tromai commented Mar 6, 2024

I think I agree with most of what @nathanwn suggested.
The issue of ordering arises because I used a set to de-duplicate the list of strings. At the beginning, I don't have any use case for not preserving the original order of elements, therefore after we fix the order, I don't know if preserve_order would still need to be there.
I agree with @behnazh-w that we could now put the de-duplication inside defaults.get_list and always remove the duplicate item that comes last. I was thinking of a similar solution as proposed by @nathanwn .

Copy link
Member

@nathanwn nathanwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix.

@tromai tromai requested a review from benmss March 6, 2024 13:10
@tromai tromai requested a review from behnazh-w March 8, 2024 00:10
@tromai tromai merged commit 44dbf0a into staging Mar 8, 2024
@tromai tromai deleted the tromai/improve-get-list-defaults-ini branch March 8, 2024 01:20
benmss pushed a commit that referenced this pull request Mar 14, 2024
art1f1c3R pushed a commit that referenced this pull request Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants