[audit-workflows] 🔍 Agentic Workflow Audit — 2026-06-25 — ⚠️ DEGRADED: prod-main 81.0% (distributed 0-tok agent-startup fails) #41549

2026-06-25T22:10:46Z

github-actions[bot]
Bot Jun 25, 2026

🔍 Agentic Workflow Audit — 2026-06-25

Window: 2026-06-24 21:42Z → 2026-06-25 21:18Z (~23.5h, full coverage, 5 stitched batches)
Verdict: ⚠️ DEGRADED — 85.3% overall / 81.0% prod-main (worst prod-main since ~06-18). Unlike the 06-20 Skillet day, today's drop is distributed, not one workflow.

Metric	Value	vs 06-24
Total runs	382	↑ (342)
Success / Fail	326 / 56	—
Overall rate	85.3%	↓ 7.4pt (92.7%)
Prod-main rate	81.0%	↓ 11.4pt (92.4%)
Prod-main ex-2-intentional	81.8%	↓
AI credits	~24,173	—
Action minutes	~3,972	—
GitHub API calls	~4,245	—
Missing tools / data / MCP failures	0 / 0 / 0	✅ clean
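
The rates and point deltas in the table follow directly from the run counts. A minimal sketch of that arithmetic, assuming rates are rounded to one decimal place; the helper names are illustrative and not part of the audit tooling:

```python
def success_rate(successes: int, total: int) -> float:
    """Success rate as a percentage, rounded to one decimal place."""
    return round(100.0 * successes / total, 1)


def point_delta(today: float, previous: float) -> float:
    """Change between two rates, in percentage points."""
    return round(today - previous, 1)


overall = success_rate(326, 382)      # success and total runs from the table
print(overall)                        # 85.3
print(point_delta(overall, 92.7))     # -7.4  (overall vs 06-24)
print(point_delta(81.0, 92.4))        # -11.4 (prod-main vs 06-24)
```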

Engines: copilot 227 · claude 64 · pi 43 · codex 23 · antigravity 6 · gemini 6.

📊 Trend Charts

Today's bar is the tallest of the window (382 runs) but carries a visibly larger red failure band, dropping the overall success line to 85.3%. That is the first dip below 90% since the 06-20 Skillet incident, and the prod-main rate (81.0%) is the lowest in a week. The preceding three days (94.5 / 92.5 / 92.7%) had established a healthy baseline, so this is a genuine break in trend rather than a continuation.

Token data still ends at 06-20: the token_usage artifact has been empty fleet-wide (TokenUsage=0) for ~6 consecutive days, so the 7-day moving average flatlines after the 06-12 spike (126M, the Daily Code Metrics outlier day). Cost visibility now relies solely on AI-credits (AIC), not token counts. This observability gap should be fixed, because without token counts we cannot track per-run token efficiency.

🔴 Headline: distributed 0-tok agent-startup failures

32 of 40 prod-main failures (80%) are 0-tok / 0-turn agent-job failures: the agent job fails within minutes while pre_activation, activation, detection, and safe_outputs all stay green. This signature spans 4 engines plus 2 runs with no engine recorded (null): copilot (15), pi (7), claude (6), codex (2), null (2), across ~16 distinct workflows.
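
The signature described above can be expressed as a simple predicate over run records. The sketch below is illustrative only: the job names come from this report, but the `Run` record layout and both helper functions are hypothetical and do not reflect how the audit agent actually stores or classifies runs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Job names come from the report; the Run record layout is hypothetical.
SUPPORT_JOBS = ("pre_activation", "activation", "detection", "safe_outputs")


@dataclass
class Run:
    workflow: str
    engine: Optional[str]   # None when no engine was recorded
    tokens: int
    turns: int
    jobs: Dict[str, str]    # job name -> "success" or "failure"


def is_zero_tok_startup_failure(run: Run) -> bool:
    """True when the agent job failed with no tokens and no turns
    while every supporting job stayed green."""
    return (
        run.tokens == 0
        and run.turns == 0
        and run.jobs.get("agent") == "failure"
        and all(run.jobs.get(job) == "success" for job in SUPPORT_JOBS)
    )


def tally_by_engine(runs: List[Run]) -> Counter:
    """Count matching failures per engine."""
    return Counter(r.engine for r in runs if is_zero_tok_startup_failure(r))


# Sanity check against the per-engine breakdown reported for this window.
reported = {"copilot": 15, "pi": 7, "claude": 6, "codex": 2, None: 2}
matched = sum(reported.values())
print(matched, 100 * matched / 40)   # 32 80.0
```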

This is NOT a single incident or regression:

Spread evenly across the entire 24h window (1–5 per hour, no spike) — argues against a transient platform outage.
Spans three gh-aw versions (0.80.2, 1.0.65, 2.1.191) — argues against a single-version regression.

It reads as an aggregation of many independent agent-startup failures that happen to share a generic "agent died before output" signature, plus three genuinely new per-workflow hotspots below. The key question is what 06-26 shows: a return to the ~92% baseline would mark this as a noisy day, whereas the same workflow set continuing to fail would call for a systemic agent-startup investigation.
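
One way to make the 06-26 decision mechanical is sketched below. Only the ~92% baseline comes from this report; the `tolerance` and `overlap_threshold` values, the "inconclusive" outcome, and the function itself are assumptions chosen for illustration and would need tuning against real history.

```python
from typing import Set


def classify_followup(
    prod_main_rate: float,
    failing_today: Set[str],
    failing_previous: Set[str],
    baseline: float = 92.0,           # "~92%" baseline from the report
    tolerance: float = 2.0,           # assumption: how close counts as "reverted"
    overlap_threshold: float = 0.5,   # assumption: share of repeat offenders
) -> str:
    """Classify the follow-up day as transient, systemic, or inconclusive."""
    # Rate is back near baseline: treat the previous day as noise.
    if prod_main_rate >= baseline - tolerance:
        return "transient"
    # Rate is still low: check whether the same workflows keep failing.
    if failing_previous:
        overlap = len(failing_today & failing_previous) / len(failing_previous)
        if overlap >= overlap_threshold:
            return "systemic"
    return "inconclusive"
```

Under these assumed thresholds, a follow-up day at 82% where three of the four previously failing workflows fail again would be classified as "systemic".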

🆕 New hotspots this window

Workflow	Engine / Model	Fail rate	Signature
Auto-Triage Issues	pi / `copilot/gpt-5.4` (experimental)	5/6	0-tok/0-turn agent:failure, 2–6min
Code Scanning Fixer	copilot	3/4	0-tok/0-turn agent:failure
Typist - Go Type Analysis	claude	2/2	0-tok/0-turn agent:failure

Auto-Triage Issues is the top single-workflow offender. It runs on the experimental copilot/gpt-5.4 model, and observability already flags model_downgrade_available (read-only triage doesn't need a frontier model). The likely cause is a pi-agent-core driver or model-availability issue on gpt-5.4.
Code Scanning Fixer and Typist both started failing with the 0-tok signature this window; watch for recurrence.

💸 Failures that actually ran work (aic > 0)

Workflow	Engine	Duration	AIC	Class
Daily Safe Output Integrator	copilot	25.4m	85	copilot-sdk LONG-RUN (chronic)
Daily SPDD Spec Planner	copilot	24.4m	100	copilot-sdk LONG-RUN (chronic)
GitHub MCP Structural Analysis	claude	14.3m	222	single-occurrence
Code Simplifier	copilot	12.3m	145	single-occurrence
PR Sous Chef ×2	copilot	10–11m	48/77	safe-output / sdk-startup
Lockfile Statistics Analysis Agent	claude	6.8m	70	single-occurrence

The copilot-sdk LONG-RUN family (agent runs 24–25min then job fails) remains the dominant "ran-work" prod-main failure — day 23+ chronic, issue recurrence count 19.

🔁 Chronic offender status

Avenger — DID NOT RUN AGAIN. Pattern is now toggling: 06-18→21 chronic-fail → 06-22/23 absent → 06-24 resumed 2×-fail → 06-25 absent again. The err-config / no-structured-logs root cause is still unfixed whenever it does fire; it appears to be manually paused/unpaused rather than repaired.
Daily Issues Report Generator (chroot-node) — 1/1 fail, 5 consecutive days (06-21→25). High-ROI single-workflow chronic; node still not on PATH inside the AWF chroot.
copilot-sdk LONG-RUN — persists (SPDD + Safe Output Integrator, ~50min combined compute burned for no output).

🧪 Non-main (PR/dev) noise — 16 fails, expected

Concentrated on copilot/fix-firewall-configuration and sibling dev branches: Smoke Copilot AOAI (apikey/Entra) probes, Smoke Antigravity, Changeset Generator, Design Decision Gate ×3, PR Description Updater ×2. These are by-design smoke/gate probes on active dev branches (firewall + SDK-version work in flight) — normal dev churn, not fleet health signals.

✅ What's healthy

0 missing-tools, 0 missing-data, 0 MCP failures fleet-wide — the tooling/data plane is clean; this is purely an agent-execution problem.
No new shared incident with cross-run lineage.
PR Sous Chef failed only 2 of 23 runs (8.7%) despite being a known safe-output hotspot.

🎯 Recommended actions

HIGH — Pin Auto-Triage Issues off pi/gpt-5.4 to a stable model (gpt-4.1-mini / claude-haiku-4-5) until the experimental driver/model path is verified. (5/6 fail, read-only triage doesn't need a frontier model.)
HIGH — Re-audit 06-26 to classify the distributed 0-tok elevation as transient (reverts to ~92%) vs systemic (same wf set persists). If systemic, escalate to an agent-startup/launch-path investigation.
MEDIUM — Fix the token-usage artifact (TokenUsage=0 fleet-wide for 6 days) so per-run token efficiency is observable again.
MEDIUM — Resolve chroot-node for Daily Issues Report Generator (5-day chronic, single-workflow, clear fix: node on PATH inside chroot).
LOW — Decide Avenger's fate: either fix err-config root cause or formally disable; the on/off toggling produces recurring noise.
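
For the chroot-node item, a preflight check run inside the chroot before the agent starts would surface the missing binary as an explicit error instead of a failed run. The sketch below uses Python's standard `shutil.which`; the function names and the idea of a preflight step are assumptions, since the report does not describe how the AWF chroot is provisioned.

```python
import shutil
from typing import List, Optional, Sequence


def find_tool(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of `name`, or None if it is not found.

    With search_path=None, shutil.which falls back to the PATH
    environment variable of the current process.
    """
    return shutil.which(name, path=search_path)


def missing_tools(
    required: Sequence[str] = ("node",),
    search_path: Optional[str] = None,
) -> List[str]:
    """Return the required tools that cannot be found on the search path."""
    return [tool for tool in required if find_tool(tool, search_path) is None]


# Hypothetical usage inside the chroot, before the agent job starts:
#   missing = missing_tools()
#   if missing:
#       raise SystemExit("missing on PATH: " + ", ".join(missing))
```

The check only reflects the PATH of the process that runs it, so it has to execute inside the same chroot environment as the agent to be meaningful.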

References:

§28194188254 — Auto-Triage Issues (pi/gpt-5.4 0-tok)
§28194579476 — Daily Safe Output Integrator (copilot-sdk longrun 25.4m)
§28181048821 — Daily Issues Report Generator (chroot-node day 5)

Generated by 🔍 Agentic Workflow Audit Agent · 376.4 AIC · ⌖ 45.8 AIC · ⊞ 7.2K · ◷

expires on Jun 26, 2026, 2:10 PM UTC-08:00

2026-06-26T22:59:29Z

github-actions[bot]
Bot Jun 26, 2026
Author

This discussion was automatically closed because it expired on 2026-06-26T22:10:46.430Z.

Closed by Workflow

0 replies
