fix: eliminate catastrophic regex backtracking in postprocess script #5491
Merged
CODEX_PROXY_ENV_KEY_REGEX hung the postprocess script for minutes when it failed to match (the common case once the openai-proxy block exists but the env_key line is already stripped).

Two ambiguities drove exponential backtracking: \s also matches newlines, and [ \t]+ is variable-length, so ^[ \t]+.* could split a single line many ways.

Anchor each repeated line to exactly one leading space/tab (^[ \t].*) so every physical line matches a single way, making the match linear. Add a cheap content.includes guard to skip the regex entirely on already-processed files.

Script runtime drops from minutes to ~1.4s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
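The "many ways to split a line" point can be made concrete with a small counting sketch. This is only an illustration, not code from the PR: `countSplits`, the sample line, and the line count of 10 are all assumptions. It counts how many indent/body splits one line allows under `[ \t]+.*` versus `[ \t].*`, then multiplies across lines to get a rough upper bound on the paths a backtracking engine may explore before giving up on a non-match.

```typescript
// Illustrative only: countSplits is a hypothetical helper, not part of the PR.
// For a line with k leading spaces/tabs:
//   [ \t]+.*  can end the indent after 1..k characters  -> k ways
//   [ \t].*   must end the indent after exactly 1       -> 1 way
function countSplits(line: string, exactlyOne: boolean): number {
  // Number of leading space/tab characters on the line.
  const indent = line.length - line.replace(/^[ \t]+/, "").length;
  if (indent === 0) return 0; // both patterns need at least one space/tab
  return exactlyOne ? 1 : indent;
}

// Ten made-up lines, each indented by four spaces.
const lines: string[] = Array.from({ length: 10 }, () => '    key = "value"');

// Multiply per-line choices: a rough upper bound on distinct match paths.
const ways = (exactlyOne: boolean): number =>
  lines.reduce((acc, l) => acc * countSplits(l, exactlyOne), 1);

const ambiguousPaths = ways(false); // 4 choices per line, 10 lines
const anchoredPaths = ways(true); // 1 choice per line

console.log({ ambiguousPaths, anchoredPaths });
```

Under these assumptions the ambiguous form allows 4^10 = 1,048,576 combinations while the anchored form allows exactly one, which is why the non-matching case stops blowing up.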
✅ Coverage Check Passed
📁 Per-file Coverage Changes (1 file)
Pull request overview
This PR speeds up the CI smoke-workflow postprocessing script by preventing catastrophic regex backtracking when stripping a legacy env_key line from Codex/OpenAI proxy provider blocks, and by skipping the regex entirely when the target line is absent.
Changes:
- Reworks `CODEX_PROXY_ENV_KEY_REGEX` to avoid ambiguous repeated subpatterns that caused exponential backtracking on non-matches.
- Adds a cheap `content.includes('env_key = "OPENAI_API_KEY"')` guard before running the regex.
Show a summary per file
| File | Description |
|---|---|
| scripts/ci/postprocess-smoke-workflows.ts | Makes env_key stripping linear-time and adds an includes() short-circuit to avoid unnecessary regex work. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 1
Comment on lines 841 to +842
      const CODEX_PROXY_ENV_KEY_REGEX =
    -   /(^\s+\[model_providers\.openai-proxy\]\n(?:^\s+.*\n)*?)^\s+env_key = "OPENAI_API_KEY"\n/m;
    +   /(^[ \t]+\[model_providers\.openai-proxy\]\n(?:^[ \t].*\n)*?)^[ \t]+env_key = "OPENAI_API_KEY"\n/m;
Problem
`scripts/ci/postprocess-smoke-workflows.ts` hung for minutes on every run. Root cause: `CODEX_PROXY_ENV_KEY_REGEX` triggered catastrophic backtracking whenever it failed to match, which is the common case once the `[model_providers.openai-proxy]` block exists but the `env_key` line has already been stripped (the normal state after the first post-process run).

Two ambiguities drove the exponential backtracking:
- `\s` also matches newlines, so `^\s+.*\n` could consume line breaks two different ways.
- `[ \t]+` is variable-length, so `^[ \t]+.*` could split a single line's indent and body many ways.

Fix
- Anchor each repeated line to exactly one leading space/tab (`^[ \t].*`) so every physical line matches a single way → linear matching.
- Add a cheap `content.includes('env_key = "OPENAI_API_KEY"')` short-circuit guard so the regex is skipped entirely on already-processed files.

Result
Script runtime drops from minutes → ~1.4s.
`tsc --noEmit` clean; output (the env_key strip + xpia transforms) is unchanged on the codex lock files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
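For readers who want to see the guard and the anchored regex together, here is a minimal sketch. The regex is the one shown in the diff. Everything else is an assumption made for illustration: the helper name `stripProxyEnvKey`, the `"$1"` replacement, and the sample TOML lines are not taken from the actual script.

```typescript
// Regex as shown in the PR diff: each repeated line is anchored to exactly
// one leading space/tab, so every physical line can match in only one way.
const CODEX_PROXY_ENV_KEY_REGEX =
  /(^[ \t]+\[model_providers\.openai-proxy\]\n(?:^[ \t].*\n)*?)^[ \t]+env_key = "OPENAI_API_KEY"\n/m;

// Hypothetical helper: the name, signature, and "$1" replacement are
// assumptions, not the script's real implementation.
function stripProxyEnvKey(content: string): string {
  // Cheap short-circuit: skip the regex when the target line is absent,
  // which is the normal state for already-processed files.
  if (!content.includes('env_key = "OPENAI_API_KEY"')) {
    return content;
  }
  // Keep the captured header and intermediate lines, drop the env_key line.
  return content.replace(CODEX_PROXY_ENV_KEY_REGEX, "$1");
}

// Made-up sample block; the name and base_url values are placeholders.
const before: string = [
  "  [model_providers.openai-proxy]",
  '  name = "OpenAI Proxy"',
  '  env_key = "OPENAI_API_KEY"',
  '  base_url = "http://localhost"',
  "",
].join("\n");

const after: string = stripProxyEnvKey(before);
const secondPass: string = stripProxyEnvKey(after); // guard returns early

console.log(after);
```

Running the helper a second time leaves the content untouched because the `includes` check returns before the regex is evaluated, which mirrors the already-processed case the PR describes.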