Pull requests: vercel-labs/agent-eval

[judge] Agentic LLM judge: judge codebases and transcripts
#161 opened Jun 26, 2026 by gaojude Collaborator
[agents] Plugin model: per-agent definition + in-sandbox runner
#160 opened Jun 26, 2026 by gaojude Collaborator
fix: reuse cached results for single-experiment runs
#153 opened Jun 11, 2026 by benjamincanac
7 tasks done
feat: add max turns run budget
#135 opened May 15, 2026 by EfeDurmaz16
fix(results): discover nested eval result dirs
#133 opened May 15, 2026 by EfeDurmaz16
feat: add experiment trend analyzer
#132 opened May 14, 2026 by EfeDurmaz16
fix: discover nested eval result directories
#131 opened May 14, 2026 by EfeDurmaz16
feat: improve names of Docker sandbox containers
#128 opened May 8, 2026 by jonahsnider
fix: clean up Docker sandbox containers on exit
#127 opened May 8, 2026 by jonahsnider
Skip npm install when package.json is absent
#125 opened May 6, 2026 by allenzhou101 Contributor
[Kimi] Add Kimi CLI agent harness
#117 opened Apr 20, 2026 by gaojude Collaborator Draft
2 of 4 tasks
Skip missing validation scripts
#92 opened Mar 17, 2026 by gaojude Collaborator
[wip] add bub agent support
#91 opened Mar 7, 2026 by CorrectRoadH Draft
Add timings for phases
#88 opened Feb 25, 2026 by jeffsee55
Add ability to choose which eval --smoke runs
#84 opened Feb 20, 2026 by jeffsee55