Support multiple initial programs #126

Draft: 0x0f0f0f wants to merge 5 commits into main

@0x0f0f0f 0x0f0f0f commented Jul 7, 2025

Closes #55

nargs="?",
help="Path to the initial program file",
default=None,
)

Author

@0x0f0f0f 0x0f0f0f Jul 7, 2025

We should discuss the positioning of arguments

The MR currently gives the openevolve-run.py CLI this usage:

usage: openevolve-run.py [-h] [--config CONFIG] [--output OUTPUT] [--iterations ITERATIONS] [--target-score TARGET_SCORE]
                         [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--checkpoint CHECKPOINT] [--api-base API_BASE] [--primary-model PRIMARY_MODEL]
                         [--secondary-model SECONDARY_MODEL] [--initial-programs-dir INITIAL_PROGRAMS_DIR]
                         evaluation_file [initial_program]

Before, initial_program was the first positional argument. I've swapped them, and also added [--initial-programs-dir INITIAL_PROGRAMS_DIR].

It would probably be best if, instead of passing a directory, the first argument stayed the evaluator and we then required one or more initial programs as positional arguments, so users can do something like python /path/to/openevolve-run.py evaluator.py initial_program1.py initial_program2.py other_programs_dir/*.py
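A minimal sketch of what that argument layout could look like with argparse. The name `evaluation_file` comes from the usage text above; the name `initial_programs` and the help strings are assumptions for illustration, not necessarily what the PR uses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: evaluator first, then one or more initial programs."""
    parser = argparse.ArgumentParser(prog="openevolve-run.py")
    parser.add_argument("evaluation_file", help="Path to the evaluation file")
    # nargs="+" requires at least one path and collects all of them into a list,
    # so paths produced by shell globbing (e.g. other_programs_dir/*.py) need
    # no special handling on the Python side.
    parser.add_argument(
        "initial_programs",  # assumed name, not confirmed by the PR
        nargs="+",
        help="Paths to one or more initial program files",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["evaluator.py", "initial_program1.py", "initial_program2.py"]
    )
    print(args.evaluation_file)
    print(args.initial_programs)
```

With this layout the shell expands any glob before the script starts, so the parser only ever sees a flat list of file paths.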

Author

Updated in the latest version.

Owner

How about we keep the existing ordered args for backwards compatibility? In addition, we can add an --initial-programs argument that takes either a directory or a list of paths.

Author

@0x0f0f0f 0x0f0f0f Jul 8, 2025

This is what I attempted in the previous commit, and the argument-parsing logic did not look very good. With the current state, we can definitely do python openevolve-run.py evaluator.py prog1.py prog2.py ./a/b/c/*.py and all other bash goodies. Keeping the positional arguments as initial_program.py evaluator.py and then adding --initial-programs dir/*.py causes some issues: one always has to provide one initial program and cannot omit the first positional argument, and the behavior/UX of passing both the initial_program argument and --initial-programs together is not really clear.

I would go for the breaking change if possible! (It also keeps things very simple on the OpenEvolve side: no directory listing, etc.)

Owner

Do you have an example where multiple initial programs would be required? If the idea is to evolve an entire codebase at once, then just having a folder path instead of a file path as the first argument should be sufficient. We can then use the config.yaml to control which file types or other things are in scope of evolution vs. out of scope, similar to what was suggested in #111.

Author

In my use case, I just want to pick a single output program from many starting programs.

The issue with having the folder path as the first argument (which is what I attempted at the beginning of this branch) is that it requires the openevolve CLI to perform a lot of extra logic: directory listing, extension parsing, etc.

Having a + vararg (nargs="+") allowed me to remove all of this extra logic in the latest commits. If the user wants many initial programs that are initialized per island, then selecting these programs from a directory can be done via normal bash globbing: openevolve-run evaluator.py file1 file2 dir1/foo/{a.py,b.py} dir2/**.py.

If we go for something like openevolve-run.py initial_program.py evaluator.py --initial-programs=dir/ then we run into problems:

  1. All the extra directory listing logic is required in openevolve (ignore READMEs, check file extensions, etc.).
  2. Users then have to move their initial programs to a directory with no other files.
  3. Being the first positional argument, initial_program.py is not optional, so initial_program.py and --initial-programs=dir/ would not be interchangeable.

So I think this breaking change is worth it, to not reinvent the wheel :)
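For comparison, the directory-based variant would need something like the helper below inside OpenEvolve. This is only a sketch of the kind of logic being avoided: `list_programs_in_dir` and its `extensions` parameter are hypothetical names and do not appear in this PR.

```python
from pathlib import Path


def list_programs_in_dir(directory: str, extensions=(".py",)) -> list[str]:
    """Hypothetical helper that a directory-based flag would require."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(directory)
    # Keep only regular files with an allowed extension. This is where READMEs
    # and other unrelated files in the directory have to be filtered out.
    return sorted(
        str(p) for p in root.iterdir() if p.is_file() and p.suffix in extensions
    )
```

With positional file arguments, the same selection is done by the shell, and rules such as which extensions count as programs stay out of the CLI.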


For evolving an entire codebase, which is a different goal from mine, we could have an openevolve-run.py --codebase mode later. The logic would be that each starting program needs an output program, so it's quite different.

Owner

If it is just about assigning different initial programs to different islands, we can do it via the config.yaml. We can keep one initial_program.py as we do now, and in the config.yaml, alongside the island configs, we can add paths to other initial programs that are used to initialize the islands. We will need to ensure that the config has sufficient islands, so it may be better to define them in the same place, in the config itself.

Author

"Will need to ensure that the config has sufficient islands"

This is done in this PR. After the config is initialized, if the number of islands is less than the number of programs, it throws. A very simple check.
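A minimal sketch of such a check. The function name, the error type, and the one-program-per-island assignment are assumptions for illustration; the PR's actual code may differ.

```python
def assign_programs_to_islands(programs: list[str], num_islands: int) -> dict[int, str]:
    """Hypothetical helper: seed island i with program i, or fail early."""
    # Mirrors the described check: fewer islands than programs is an error.
    if num_islands < len(programs):
        raise ValueError(
            f"{len(programs)} initial programs given, "
            f"but only {num_islands} islands are configured"
        )
    # Assumed policy: the i-th program seeds the i-th island; any remaining
    # islands are left unseeded in this sketch.
    return dict(enumerate(programs))
```

For example, two programs with three islands would map island 0 and island 1 to the two paths, while two programs with a single island would raise.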

Author

It's done when loading the programs

Author

I also thought about that, but it might introduce additional logic. I don't really like the idea, because having inputs defined in a config feels like violating the UNIX philosophy:

              | Config           
              v                  
        +----------------+       
Input(s)|                | Output
 ------>|   Program      +------>
        |                |       
        +----------------+       

@jvm123 jvm123 mentioned this pull request Jul 7, 2025

Successfully merging this pull request may close these issues.

[feature] Starting from multiple primitive programs
2 participants