[audit-workflows] π Agentic Workflow Audit β 2026-06-25 β β οΈ DEGRADED: prod-main 81.0% (distributed 0-tok agent-startup fails) #41549
Closed
Replies: 1 comment
-
|
This discussion was automatically closed because it expired on 2026-06-26T22:10:46.430Z.
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
π Agentic Workflow Audit β 2026-06-25
Window: 2026-06-24 21:42Z β 2026-06-25 21:18Z (~23.5h, full coverage, 5 stitched batches)β οΈ DEGRADED β 85.3% overall / 81.0% prod-main (worst prod-main since ~06-18). Unlike the 06-20 Skillet day, today's drop is distributed, not one workflow.
Verdict:
Engines: copilot 227 Β· claude 64 Β· pi 43 Β· codex 23 Β· antigravity 6 Β· gemini 6.
π Trend Charts
Today's bar is the tallest of the window (382 runs) but carries a visibly larger red failure band, dropping the success line to 85.3% β the first dip below 90% since the 06-20 Skillet incident, and the lowest prod-main rate in a week. The preceding three days (94.5 / 92.5 / 92.7%) had established a healthy on-baseline, so this is a genuine break in trend rather than continuation.
Token data still ends at 06-20 β the
token_usageartifact has been empty fleet-wide (TokenUsage=0) for ~6 consecutive days, so the 7-day moving average flatlines after the 06-12 spike (126M, the Daily Code Metrics outlier day). Cost visibility now relies solely on AI-credits (AIC), not token counts. This observability gap should be fixed β we cannot track per-run token efficiency.π΄ Headline: distributed 0-tok agent-startup failures
32 of 40 prod-main failures (80%) are 0-tok / 0-turn agent-job failures β the agent job fails within minutes while
pre_activation,activation,detection, andsafe_outputsall stay green. This signature spans 4 engines: copilot (15), pi (7), claude (6), codex (2), null (2), across ~16 distinct workflows.This is NOT a single incident or regression:
It reads as an aggregation of many independent agent-startup failures that happen to share a generic "agent died before output" signature, plus three genuinely-new per-workflow hotspots below. The key question is 06-26: revert to ~92% baseline β noisy day; same workflow set keeps failing β systemic agent-startup investigation needed.
π New hotspots this window
copilot/gpt-5.4(experimental)copilot/gpt-5.4model; observability already flagsmodel_downgrade_available(read-only triage doesn't need a frontier model). Likely a pi-agent-core driver or model-availability issue on gpt-5.4.πΈ Failures that actually ran work (aic > 0)
The copilot-sdk LONG-RUN family (agent runs 24β25min then job fails) remains the dominant "ran-work" prod-main failure β day 23+ chronic, issue recurrence count 19.
π Chronic offender status
err-config / no-structured-logsroot cause is still unfixed whenever it does fire; it appears to be manually paused/unpaused rather than repaired.π§ͺ Non-main (PR/dev) noise β 16 fails, expected
Concentrated on
copilot/fix-firewall-configurationand sibling dev branches: Smoke Copilot AOAI (apikey/Entra) probes, Smoke Antigravity, Changeset Generator, Design Decision Gate Γ3, PR Description Updater Γ2. These are by-design smoke/gate probes on active dev branches (firewall + SDK-version work in flight) β normal dev churn, not fleet health signals.β What's healthy
π― Recommended actions
pi/gpt-5.4to a stable model (gpt-4.1-mini / claude-haiku-4-5) until the experimental driver/model path is verified. (5/6 fail, read-only triage doesn't need a frontier model.)TokenUsage=0fleet-wide for 6 days) so per-run token efficiency is observable again.err-configroot cause or formally disable; the on/off toggling produces recurring noise.References:
Beta Was this translation helpful? Give feedback.
All reactions