Commit aabd5f1

Merge pull request #215 from codelion/feat-add-genselect
Add GenSelect plugin for solution selection
2 parents: 26bb5c0 + 7980b46

20 files changed: +1672 -431 lines

.github/workflows/test.yml

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
name: Run Tests

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.12']

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}

    - name: Cache pip packages
      uses: actions/cache@v3
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
        restore-keys: |
          ${{ runner.os }}-pip-

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install -r tests/requirements.txt

    - name: Run unit tests
      run: |
        # Run quick CI tests
        python tests/test_ci_quick.py

        # Run plugin tests with pytest if available
        python -m pytest tests/test_plugins.py -v --tb=short || python tests/test_plugins.py

        # Run approach tests
        python tests/test_approaches.py

  integration-test:
    runs-on: ubuntu-latest
    needs: test
    if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
    # Only run integration tests on PRs from the same repository (not forks)
    # This ensures secrets are available

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.12'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt

    - name: Run integration test with OpenAI
      if: env.OPENAI_API_KEY != ''
      run: |
        # Start OptILLM server
        python optillm.py &
        SERVER_PID=$!

        # Wait for server
        sleep 5

        # Run simple integration test
        python tests/test.py --approaches none --single-test "Simple Math Problem" --base-url http://localhost:8000/v1 --model gpt-4o-mini || true

        # Stop server
        kill $SERVER_PID || true
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      continue-on-error: true

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -170,3 +170,4 @@ cython_debug/
 scripts/results/
 results/
+test_results.json

README.md

Lines changed: 42 additions & 0 deletions
@@ -377,6 +377,7 @@ Check this log file for connection issues, tool execution errors, and other diag
 | Read URLs | `readurls` | Reads all URLs found in the request, fetches the content at the URL and adds it to the context |
 | Execute Code | `executecode` | Enables use of code interpreter to execute python code in requests and LLM generated responses |
 | JSON | `json` | Enables structured outputs using the outlines library, supports pydantic types and JSON schema |
+| GenSelect | `genselect` | Generative Solution Selection - generates multiple candidates and selects the best based on quality criteria |

 ## Available parameters
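The row above documents the new plugin. As a minimal sketch of how it might be invoked once a local proxy is running — the `optillm_approach` request field and the `http://localhost:8000/v1` endpoint appear elsewhere in this commit, while the model name, prompt, and placeholder API key are assumptions:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at a locally running OptILLM proxy
# (started with `python optillm.py`, serving on port 8000 per the workflow above).
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
)

# GenSelect generates several candidate solutions and returns the one it judges best.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any model the proxy can reach
    messages=[{"role": "user", "content": "Show that the sum of two odd integers is even."}],
    extra_body={"optillm_approach": "genselect"},
)
print(response.choices[0].message.content)
```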

@@ -564,6 +565,46 @@ called patchflows. We saw huge performance gains across all the supported patchf
 ![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)

## Testing

OptILLM includes a comprehensive test suite to ensure reliability and compatibility.

### Running Tests

The main test suite can be run from the project root:
```bash
# Test all approaches with default test cases
python tests/test.py

# Test specific approaches
python tests/test.py --approaches moa bon mcts

# Run a single test
python tests/test.py --single-test "Simple Math Problem"
```

### Unit and Integration Tests

Additional tests are available in the `tests/` directory:
```bash
# Run all tests (requires pytest)
./tests/run_tests.sh

# Run specific test modules
pytest tests/test_plugins.py -v
pytest tests/test_api_compatibility.py -v
```

### CI/CD

All tests are automatically run on pull requests via GitHub Actions. The workflow tests:
- Multiple Python versions (3.10, 3.11, 3.12)
- Unit tests for plugins and core functionality
- API compatibility tests
- Integration tests with various approaches

See `tests/README.md` for more details on the test structure and how to write new tests.
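As a rough sketch of what a new test could look like, the following integration-style check mirrors the CI gate on `OPENAI_API_KEY` and the local endpoint used in the workflow above; the test name and structure are illustrative assumptions, not code from `tests/`:

```python
import os

import pytest
import requests

# Skip when no API key is configured, mirroring the conditional integration job in CI.
pytestmark = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"), reason="requires OPENAI_API_KEY"
)


def test_proxy_returns_at_least_one_choice():
    """Send one chat completion through a local OptILLM proxy and check the response shape."""
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # proxy started with `python optillm.py`
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    assert resp.json()["choices"], "expected at least one choice in the response"
```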
## References
- [Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques](https://arxiv.org/abs/2506.08060)
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
@@ -587,6 +628,7 @@ called patchflows. We saw huge performance gains across all the supported patchf
 - [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](optillm/rto.py)
 - [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](optillm/moa.py)
 - [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](optillm/rto.py)
+- [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891) - [Implementation](optillm/plugins/genselect_plugin.py)

 ## Citation
optillm.py

Lines changed: 4 additions & 13 deletions
@@ -302,9 +302,9 @@ def execute_single_approach(approach, system_prompt, initial_query, client, mode
     if hasattr(request, 'json'):
         data = request.get_json()
         messages = data.get('messages', [])
-        # Copy all parameters except 'stream', 'model', 'n' and 'messages'
+        # Copy all parameters except 'stream', 'model' and 'messages'
         kwargs = {k: v for k, v in data.items()
-                  if k not in ['model', 'messages', 'stream', 'n', 'optillm_approach']}
+                  if k not in ['model', 'messages', 'stream', 'optillm_approach']}
         response = none_approach(original_messages=messages, client=client, model=model, **kwargs)
         # For none approach, we return the response and a token count of 0
         # since the full token count is already in the response
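To make the effect of the change concrete, here is a small self-contained illustration of the filtering (the request body is an invented example): after this commit, `n` is no longer stripped, so it is forwarded to the underlying client call.

```python
# Example request body; only the keys matter, the values are placeholders.
data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "hi"}],
    "stream": False,
    "n": 2,
    "temperature": 0.7,
    "optillm_approach": "none",
}

# Same comprehension as in the diff above: strip routing/control keys, keep the rest.
kwargs = {k: v for k, v in data.items()
          if k not in ['model', 'messages', 'stream', 'optillm_approach']}

print(kwargs)  # {'n': 2, 'temperature': 0.7} -- 'n' now passes through
```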
@@ -641,17 +641,8 @@ def proxy():
     contains_none = any(approach == 'none' for approach in approaches)

     if operation == 'SINGLE' and approaches[0] == 'none':
-        # For none approach with n>1, make n separate calls
-        if n > 1:
-            responses = []
-            completion_tokens = 0
-            for _ in range(n):
-                result, tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)
-                responses.append(result)
-                completion_tokens += tokens
-            result = responses
-        else:
-            result, completion_tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)
+        # Pass through the request including the n parameter
+        result, completion_tokens = execute_single_approach(approaches[0], system_prompt, initial_query, client, model, request_config)

         logger.debug(f'Direct proxy response: {result}')
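From the client's point of view, the simplification means an `n > 1` request on the direct (`none`) path is forwarded upstream as one call instead of being expanded into `n` proxy-side calls. A hedged sketch, reusing the endpoint and model from the test workflow; the prompt and placeholder key are assumptions:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local OptILLM proxy
    api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name two primes greater than 100."}],
    n=2,  # forwarded to the upstream provider in a single request after this change
    extra_body={"optillm_approach": "none"},  # direct proxy path, no optimization technique
)

for i, choice in enumerate(response.choices):
    print(i, choice.message.content)
```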
