Skip to content

Releases: OpenAdaptAI/openadapt-ml

v0.16.1

12 Jun 21:54

Choose a tag to compare

v0.16.1 (2026-06-12)

Bug Fixes

  • Repair 8 latent import/kwarg bugs; add import-integrity and packaging guards (#64, 04e9e67)

Follow-up to OpenAdaptAI/OpenAdapt#999: green CI shipped a fully broken CLI because no test exercised lazy imports, internal call seams, or the built wheel. This adds three guards and fixes everything they caught on current main:

New tests (dependency-free, <3s total): - tests/test_import_integrity.py::test_no_phantom_imports — AST-walks every from openadapt_ml.x import y (including inside function bodies) and verifies y exists in x - tests/test_import_integrity.py::test_no_phantom_kwargs — verifies keyword args passed to internal functions exist in their signatures - tests/test_packaging.py — builds the wheel and asserts bundled configs, core modules, and version match

Latent bugs the new tests caught on main, fixed here: - cloud/local.py: cmd_serve called regenerate_local_dashboard with a keep_polling kwarg that no longer exists (dashboard regeneration failed on every serve, downgraded to a warning) - scripts/compare.py + experiments/demo_prompt/run_experiment.py: capture_to_episode(goal=) — same kwarg bug as #999 bug 3, two more call sites - ingest/init.py: re-exported capture_to_session and load_captures_as_sessions, which don't exist; the except ImportError guard meant capture_to_episode was silently never exported either - evals/grounding.py: TYPE_CHECKING import from missing module openadapt_ml.data.types (Episode lives in openadapt_ml.schema) - cloud/azure_inference.py: imported QwenVLAdapter from missing module openadapt_ml.adapters.qwen (lives in models.qwen_vl) and constructed it with a non-existent model_name kwarg (needs from_pretrained); imported generate_comparison which was refactored away — restored as a thin wrapper over
generate_comparison_data/_html in compare.py

Also: - release.yml: file/append a GitHub issue when the release workflow fails. Releases failed silently Mar-Jun 2026 while PyPI went stale, which is what forced #999's reporter onto git installs - pyproject: add build to the dev extra for the packaging tests

Co-authored-by: Claude Fable 5 noreply@anthropic.com


Detailed Changes: v0.16.0...v0.16.1

v0.16.0

12 Jun 15:22

Choose a tag to compare

v0.16.0 (2026-06-12)

Bug Fixes

  • Align GRPO prompt format with SFT training format (#61, 04e6e9f)

The GRPO rollout prompt was missing the "Thought:" line and action history that the SFT training uses. Models fine-tuned via SFT output "Thought: ...\nAction: CLICK(...)" but the GRPO prompt didn't prompt for this format, causing verbose free-form output that couldn't be parsed → reward 0.0 → zero gradients.

Changes:

  • Add "Thought:" and "Action:" prompt lines matching SFT format
  • Add action_history parameter for step context
  • Parser extracts action from "Action: ..." line before regex matching
  • Parser handles JSON format {"action_type": "click", "coordinate": [x,y]}
  • Debug logging of raw VLM output for zero-reward diagnosis

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • Include image placeholder in chat template for VLM GRPO (#59, 6617e02)

Qwen2.5-VL requires <|image_pad|> tokens in the input. These are inserted by apply_chat_template only when messages include {"type": "image"} content blocks.

Fixed both agent_fn and _compute_rollout_loss.

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • Increase max_new_tokens to 2048 and make configurable via GRPOConfig (#62, fecf461)
  • fix: align GRPO prompt format with SFT training format

The GRPO rollout prompt was missing the "Thought:" line and action history that the SFT training uses. Models fine-tuned via SFT output "Thought: ...\nAction: CLICK(...)" but the GRPO prompt didn't prompt for this format, causing verbose free-form output that couldn't be parsed → reward 0.0 → zero gradients.

Changes: - Add "Thought:" and "Action:" prompt lines matching SFT format - Add action_history parameter for step context - Parser extracts action from "Action: ..." line before regex matching - Parser handles JSON format {"action_type": "click", "coordinate": [x,y]} - Debug logging of raw VLM output for zero-reward diagnosis

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • fix: increase max_new_tokens to 2048 and make configurable

The default of 100 tokens truncated reasoning models mid-thought, producing unparseable output → DONE → reward 0.0 → zero gradients. Caused 4 failed training runs (~20 GPU-hours wasted).

  • Add max_new_tokens to GRPOConfig (default 2048) - Use config value instead of hardcoded 100 - Add truncation warning when generation hits the limit

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • Repair broken internal imports, CPU training, and config packaging (OpenAdapt#999) (#63, 9b8f3cd)
  • fix: repair broken internal imports, CPU training, and config packaging

Fixes the openadapt-ml side of OpenAdaptAI/OpenAdapt#999:

  • Add missing update_current_symlink_to_latest() to training/trainer.py; cmd_serve imported it but it never existed, so openadapt serve failed with a misleading "openadapt-ml not installed" error (bug 2) - scripts/train.py: capture_to_episode(goal=) -> instruction=, matching the actual signature (bug 3) - scripts/demo_policy.py: import generate_synthetic_episodes instead of the non-existent generate_synthetic_sessions (bug 4) - Ship configs/ inside the wheel (hatch force-include) and add resolve_config_path() so installed packages find bundled configs instead of failing on repo-relative paths (bug 5) - training/trl_trainer.py: pass use_cpu/bf16/fp16 to SFTConfig based on CUDA availability so CPU-only training no longer raises ValueError (bug 7)

Verified: py_compile on all changed files, isolated functional test of

the symlink helper, and wheel build confirming configs are included.

Co-Authored-By: Claude Fable 5 noreply@anthropic.com

  • fix: respect MPS in CPU fallback, prefer run-like dirs for symlink
  • use_cpu now stays False on Apple Silicon (MPS) so the previous accelerated path isn't regressed; only true CPU-only setups get use_cpu=True - update_current_symlink_to_latest prefers directories containing training_log.json or dashboard.html so a stray top-level "checkpoints" directory from the old flat layout can't win
  • style: ruff format grpo modules (pre-existing violations blocking CI)

These two files fail 'ruff format --check' on main as well; formatting them here so the test matrix can actually run.


Co-authored-by: Claude Fable 5 noreply@anthropic.com

Features

  • Add --task-dir support for milestone-based rewards in standalone GRPO trainer (#60, 7d095da)
  • fix: include image placeholder in chat template for VLM GRPO

Qwen2.5-VL requires <|image_pad|> tokens in the input. These are inserted by apply_chat_template only when messages include {"type": "image"} content blocks.

Fixed both agent_fn and _compute_rollout_loss.

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • feat: add --task-dir support for milestone-based rewards in standalone GRPO trainer
  • GRPOConfig: add task_dir field - reward.py: evaluate_milestones_screenshot() for client-side reward - trainer.py: load TaskConfigs, auto-populate task_ids, override rewards - rollout_collector.py: pass task_configs to env - No WAA evaluate endpoint needed — rewards computed via VLM judge

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com


Detailed Changes: v0.15.1...v0.16.0

v0.15.1

21 Mar 20:00

Choose a tag to compare

v0.15.1 (2026-03-21)

Bug Fixes

  • Use keyword args for Qwen VL processor call (#58, d196dce)
  • fix: replace AutoModelForVision2Seq with AutoModelForImageTextToText for transformers 5.x

AutoModelForVision2Seq was removed in transformers 5.x (shipped on AWS DL AMI). Use AutoModelForImageTextToText as the primary import with a fallback to AutoModelForVision2Seq for older transformers versions.

Files updated: - openadapt_ml/training/grpo/trainer.py - openadapt_ml/cloud/modal_cloud.py - docs/grpo_trl_rewrite_draft.py (comment only)

Note: openadapt_ml/training/trl_trainer.py already had the correct

try/except pattern and was not modified.

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • fix: use keyword args for Qwen VL processor to avoid positional conflict

Qwen2_5_VLProcessor.call() expects text= and images= as keyword args. Passing text as positional arg conflicts with images kwarg: TypeError: got multiple values for argument 'images'


Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

Chores

Documentation

  • Add first scored trace (Notepad Hello World, score 0.5) (ba44eaa)

6 steps, 91s, GPT-5.4-mini planner+grounder, lightweight mode. VLM judge passed milestone 2 (Hello World typed, confidence 1.00). Milestone 1 (process check) timed out during /execute_windows eval.

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com


Detailed Changes: v0.15.0...v0.15.1

v0.15.0

19 Mar 23:36

Choose a tag to compare

v0.15.0 (2026-03-19)

Bug Fixes

  • Make heavy ML dependencies optional for lightweight installs (#57, aa954ba)
  • fix: make heavy ML dependencies optional for lightweight installs

Move torch, torchvision, bitsandbytes, peft, and transformers from required dependencies to [project.optional-dependencies.training]. Wrap all top-level imports of these packages in try/except ImportError so the package can be imported without them installed.

This unblocks lightweight consumers (e.g. Wright worker installing openadapt-evals) that don't need local model training/inference. Users who need training can install with: pip install openadapt-ml[training]

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • style: fix ruff formatting in qwen_vl.py

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • Use ResetConfig for RLEnvironment.reset() in validation script (#56, 942a2f3)

The validation script called env.reset(task_id=...) but the actual API is env.reset(config=ResetConfig(task_id=...)). This caused Phase 2 to fail with TypeError.

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

Chores

Documentation

  • Add LoRA-per-task design document (#54, d6d63b6)

Literature-backed design for task-specific LoRA adapters with runtime routing. Covers architecture, training pipeline, data collection (including correction flywheel as training data source), update economics, and validation plan. Positioned as one experiment track within the broader OpenAdapt experimentation framework.

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

Features

  • Add GRPO validation infrastructure and LoRA checkpoint support (#55, 1b8ae78)
  • feat: add evaluate_url, lora_checkpoint, validation script, and CLI for GRPO training
  • Add evaluate_url field to GRPOConfig for separate evaluate endpoint - Add lora_checkpoint field to resume GRPO from existing SFT LoRA adapter - Pass evaluate_url through rollout collector to WAALiveConfig - Load existing LoRA via PeftModel.from_pretrained() when lora_checkpoint set - Update verl_backend.py error message with actionable instructions - Add 5-phase validation script (connectivity → rollout → inference → train → multi-step) - Add CLI entry point (scripts/run_grpo.py) for running GRPO without writing Python

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

  • style: fix ruff formatting in config and validation script

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com


Detailed Changes: v0.14.1...v0.15.0

v0.14.1

04 Mar 04:33

Choose a tag to compare

v0.14.1 (2026-03-04)

Bug Fixes

  • Lower PyTorch minimum to 2.8.0 for vLLM compatibility (#53, c0bc069)

vLLM 0.11.0 pins torch==2.8.0. The GPU E2E validation (openadapt-evals PR #87) confirmed the full ML stack works with PyTorch 2.8.0+cu128. The previous >=2.9.1 constraint prevented installing openadapt-ml alongside vLLM in the same environment.

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.14.0...v0.14.1

v0.14.0

04 Mar 02:22

Choose a tag to compare

v0.14.0 (2026-03-04)

Features

  • Add dual training backend support (standalone + verl-agent) (#51, 4419b21)
  • feat: add dual training backend support (standalone + verl-agent)

Add backend field to GRPOConfig ("standalone" or "verl") to support switching between training backends:

  • standalone: existing trainer.py (single-GPU, episode-level rewards)
  • verl: verl-agent/VAGEN integration (multi-GPU, GiGPO per-step credit)

New verl_backend.py provides build_vagen_config() to map GRPOConfig to VAGEN-compatible config, and train_with_verl() as the integration point (placeholder until full end-to-end is wired up).

No existing function signatures or behavior modified.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • style: format verl_backend.py with ruff

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.13.0...v0.14.0

v0.13.0

03 Mar 23:29

Choose a tag to compare

v0.13.0 (2026-03-03)

Features


Detailed Changes: v0.12.0...v0.13.0

v0.12.0

03 Mar 02:06

Choose a tag to compare

v0.12.0 (2026-03-03)

Features

  • Add GRPO training module with minimal TRL bridge (#34, 339e5d3)
  • docs: add experimental roadmap and evidence context to vision
  • Add 2x2 experimental matrix (retrieval × fine-tuning) to Core Thesis - Add evidence context to benchmark table: note it's an internal synthetic benchmark (~3 UI elements) that validates the pipeline, not real-world performance. Link to openadapt-evals for ongoing WAA/OSWorld evaluation.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • fix: use 46.7% consistently in 2x2 matrix

Was showing 33-47% range which conflated preliminary (n=3) and full (n=45) results. The validated number is 46.7%.

  • feat: add GRPO training module for online RL

Add openadapt_ml/training/grpo/ package with: - GRPOConfig for training hyperparameters - GRPORolloutCollector connecting to openadapt-evals RLEnvironment - GRPOTrainer implementing custom GRPO loop for multimodal VLMs - Binary reward function and group-relative advantage computation - Chain-of-thought warm-up pipeline for SFT pre-training - 20 unit tests passing without GPU

  • fix: address review findings in GRPO module
  • Replace copy.deepcopy(model) with LoRA state dict snapshot (prevents OOM) - Mark _compute_rollout_loss as scaffold with dummy forward pass for grad flow - Fix collect_rollout call to match RLEnvironment API (task_id in signature) - Add model.eval()/model.train() toggling around rollout/training phases - Remove unused gradient_accumulation_steps config field - Use actual screen_size from RLEnvironment instead of hardcoded 1920x1200 - Clamp CLICK coordinates to [0.0, 1.0] to prevent invalid pixel values - Validate task_ids non-empty at start of train() - Export CoT warmup functions from package init - Add BenchmarkAction fallback when openadapt-evals not installed - Add 9 new tests: action parser (8) + empty task_ids validation (1) - All 29 tests passing
  • feat: implement GRPO loss computation and fix cot_warmup dependency

Implement the core _compute_rollout_loss method that was previously a NotImplementedError scaffold. The implementation:

  • Reconstructs VLM prompts from rollout observations - Formats actions back to DSL text via new _format_action_as_text helper - Computes log-probabilities of action tokens under current policy - Computes reference policy log-probs via PEFT disable_adapter() with fallback to manual LoRA weight swapping - Returns GRPO loss: -advantage * log_prob + kl_coef * KL penalty

Also adds get_api_adapter() factory function to api_adapter.py, fixing the broken import in cot_warmup.py's generate_cot_annotations().

Additional review fixes from prior session: - Initialize _is_unsloth and _ref_lora_state in init - Remove dead else branch for task_id selection - Fix total_loss device placement - LoRA-only fallback save in checkpoint - TYPE regex accepts single quotes - Coordinate clamping in _parse_vlm_output_to_action

40 tests passing (10 new: 8 format_action + 1 roundtrip + 1 api_adapter).

  • refactor: deduplicate GRPO prompts via shared _build_agent_messages

Extract prompt construction into _build_agent_messages() which imports SYSTEM_PROMPT from next_action.py (the SFT training prompt). This ensures the GRPO agent uses the same prompt distribution the model was warm-started on, and guarantees _make_agent_fn and _compute_rollout_loss use identical prompts (critical for correct log-prob computation).

  • fix(grpo): address critical review findings in GRPO loss computation
  • C-01: Store raw model output on action._grpo_raw_text for accurate loss - C-02: Separate tokenization of prompt/action with concatenation to fix BPE boundary alignment - I-01: Prefer LoRA weight swapping over disable_adapter() for reference policy (captures initial LoRA state after SFT warm-start) - I-03: Per-step gradient accumulation via immediate backward() to prevent OOM from building computation graph over all rollout steps - I-04: Fix unescape order in TYPE parser (backslash before quotes) - M-03: Pass model_name through get_api_adapter to ApiVLMAdapter - M-07: Case-insensitive CLICK/TYPE regex in _parse_vlm_output_to_action - L-01: Extract DEFAULT_SCREEN_SIZE constant, replace all hardcoded values
  • fix(grpo): fix instruction propagation, screen size, weight swap safety
  • CR-01: Task instruction was never populated during GRPO rollouts. WAALiveAdapter._get_observation() does not populate raw_observation, so the agent prompt said "Goal: " with nothing after it. Fix: store instruction on Rollout dataclass (populated from env._current_task in collector), use it in both agent_fn and _compute_rollout_loss. - IM-01: Change DEFAULT_SCREEN_SIZE from 1920x1200 to 1920x1080 for consistency with baselines module and standard VM configurations. Add screen_size field to GRPOConfig so it is configurable. - IM-02: Add try/finally around LoRA weight swap in _compute_ref_log_probs. Without this, an exception during the reference forward pass permanently corrupts the model state.
  • fix(grpo): remove unused torch import in _setup_model

The import torch at line 121 was flagged by ruff (F401) as unused. The surrounding code only calls .detach().clone() on tensor objects, which does not require the torch module directly.

  • style(grpo): apply ruff formatting to GRPO module files

Run ruff format on cot_warmup.py, rollout_collector.py, and trainer.py to satisfy the CI ruff formatter check.

  • refactor(grpo): replace custom trainer with minimal TRL bridge

Replace 809-line custom GRPO trainer with ~280 lines that: - Use standard HuggingFace AutoModelForVision2Seq + AutoProcessor + PEFT LoraConfig instead of Unsloth monkey-patching - Implement standalone GRPO loss in ~15 lines of PyTorch (clipped surrogate) instead of custom policy gradient + KL penalty - Use beta=0.0 (no KL penalty, no reference model) per DAPO/Open- Reasoner-Zero literature, eliminating weight-swap complexity - Keep per-step backward to avoid OOM on long trajectories - Use standard model.save_pretrained() for checkpointing - Document WHY standalone GRPO math vs TRL GRPOTrainer (VLM multi-turn image pixel_values not stored in token IDs) and WHEN to switch

Preserves all public API: GRPOTrainer, _parse_vlm_output_to_action, _format_action_as_text, _build_agent_messages, DEFAULT_SCREEN_SIZE. All 50 tests pass (44 existing + 6 new for grpo_loss and trainer internals).

  • feat(grpo): add E2E tests with artifact generation and architecture docs
  • tests/test_grpo_e2e.py: 5 E2E tests (training loop, rollout collection, loss convergence, weight diff, mathematical properties) using tiny mock VLM. Produces 65+ artifacts (JSON traces, PNGs, checkpoints, summaries). - scripts/grpo_e2e_report.py: CLI report generator for test artifacts (text + optional HTML output). - docs/grpo_e2e_test_design.md: design rationale for E2E test approach - docs/grpo_architecture_analysis.md: analysis of custom vs TRL-based GRPO - docs/grpo_trl_rewrite_draft.py: TRL v0.29.0 integration research - docs/strategic_analysis_evals_ml_synergy.md: business/economics analysis
  • fix(grpo): address self-review findings (BUG-01, CLEAN-01 through -05)
  • Rename grpo_loss to policy_gradient_loss with honest docstring: single-epoch on-policy means ratio=1.0, clipping never fires, this is REINFORCE with group-relative advantages. Keep grpo_loss as backwards-compatible alias. - Add public aliases: parse_vlm_output_to_action, format_action_as_text (drop underscore prefix for public API) - Export policy_gradient_loss and public functions from init.py - Remove unused config fields: kl_coef (was 0.01 but never used with beta=0), max_seq_length (never referenced) - Fix model_name default: Qwen/Qwen2.5-VL-7B-Instruct (not unsloth variant) - Fix trivial test assertion: grad_norm > 0 (was >= 0, always true) - Update loss tests to verify gradient direction, not just loss sign - Add test_public_api_exports for new public names

56 tests pass (51 unit + 5 E2E).


Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.2...v0.12.0

v0.11.2

25 Feb 21:12

Choose a tag to compare

v0.11.2 (2026-02-25)

Bug Fixes

  • docs: Require conventional commit format for PR titles (#32, 303f54f)

PR titles become squash merge commit messages. Without the fix:/feat: prefix, python-semantic-release skips the release. Document this requirement prominently in CLAUDE.md.

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com

Documentation

  • docs: add mandatory branch/PR rule to CLAUDE.md

Adds explicit instruction that all changes must go through feature branches and pull requests. enforce_admins has been enabled on GitHub to prevent admin bypass of branch protection.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

  • fix(modal): remove unused os import

Fixes ruff F401 lint error on modal_cloud.py.


Co-authored-by: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.1...v0.11.2

v0.11.1

24 Feb 20:18

Choose a tag to compare

v0.11.1 (2026-02-24)

Bug Fixes

  • modal: Fix inference container image and multi-modal message handling (88e4c09)

  • Pin transformers==4.57.3 (matches local, has Qwen3-VL support)

  • Add torchvision dependency (required by AutoVideoProcessor)

  • Add fallback: AutoModelForVision2Seq -> Qwen2_5_VLForConditionalGeneration

  • Add fallback: AutoProcessor -> Qwen2_5_VLProcessor

  • Reconstruct multi-modal messages with {"type": "image"} placeholders
    for proper vision token generation in apply_chat_template

  • Rename container_idle_timeout -> scaledown_window (Modal API update)

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com


Detailed Changes: v0.11.0...v0.11.1