docs: runner doctor update — A13, B5, B6 + portable agent A12 sync #5590

Merged
lpcox merged 5 commits into main from copilot/update-runner-doctor-a13-b5-b6 on Jun 27, 2026

Copilot AI commented Jun 26, 2026

Weekly runner doctor scan (2026-06-19 → 2026-06-26) identified 3 new failure modes and a sync gap in the portable agent file.

New failure modes

  • A13: chroot: failed to run command '/bin/sh' on a glibc/Debian DinD daemon. The staging dirs (/tmp/gh-aw/{usr,bin,...}) are empty because stageBaseSystem() is not yet implemented. The "musl/Alpine" entrypoint warning is a red herring. Unresolved; the workaround is baking binaries into the DinD daemon image.
  • B5: EAI_AGAIN <awmg-cli-proxy> deadlock in --network-isolation + --topology-attach. connectTopologyContainers() runs after startContainers(), but the cli-proxy health gate blocks on the topology peer that hasn't been attached yet. Deterministic, not flaky. Fixed in AWF.
  • B6: EACCES on upload-artifact after rootless (sudo: false) AWF runs. The squid/cli-proxy/agent sidecars write files as non-runner UIDs, and chmod -R a+rX fails silently, with the failure logged only at debug level. Fixed in AWF.
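
The B5 ordering problem can be illustrated with a toy model. This is only a sketch: the Stack class, its methods, and the peer name are hypothetical stand-ins, and the reordering shown is one possible remedy, not necessarily the fix that landed in AWF.

```python
"""Toy model of the B5 startup ordering issue. All names are hypothetical."""


class Stack:
    def __init__(self):
        self.attached_peers = set()
        self.log = []

    def attach_topology(self, peer):
        # Stand-in for connectTopologyContainers(): makes the peer resolvable.
        self.attached_peers.add(peer)
        self.log.append(f"attach {peer}")

    def health_gate(self, peer, max_polls=3):
        # Stand-in for the cli-proxy health gate: polls until the peer resolves.
        for _ in range(max_polls):
            if peer in self.attached_peers:
                self.log.append("healthy")
                return True
            self.log.append("EAI_AGAIN")  # name lookup fails, retry
        return False


def start_then_attach(peer):
    """Ordering described in B5: the gate runs before the peer is attached."""
    stack = Stack()
    if not stack.health_gate(peer):
        return stack.log  # startup aborts; the attach step is never reached
    stack.attach_topology(peer)
    return stack.log


def attach_then_start(peer):
    """One possible reordering (an assumption): attach first, then gate."""
    stack = Stack()
    stack.attach_topology(peer)
    stack.health_gate(peer)
    return stack.log
```

In the first ordering no number of retries helps, because the step that would satisfy the gate sits behind the gate itself. That matches the "deterministic, not flaky" description.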

Files changed

  • shared/self-hosted-failure-modes.md — A13 row (Category A), B5/B6 rows (Category B), 3 new error-string lookup entries, A13 added to known-unresolved list
  • self-hosted-runner-doctor.md — §3 hint lines for A13/B5/B6; §4 expanded with A13 unresolved entry
  • .github/agents/self-hosted-runner-doctor.md (portable, self-contained) — synced missing A12 (mkdirat read-only fs, fixed in v0.27.10), added A13/B5/B6, added 4 missing error-string rows, updated §3 hints and §4 unresolved to match
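
An error-string lookup of the kind mentioned above might look roughly like the following. The three patterns are taken from the failure-mode descriptions in this PR; the table layout, the function name, and the plain substring matching are assumptions, and the real catalog entries are likely more specific than a bare EACCES.

```python
# Hypothetical quick-lookup table: (substring to search for, failure-mode ID).
# Patterns come from the PR description; everything else is illustrative.
LOOKUP = [
    ("chroot: failed to run command '/bin/sh'", "A13"),
    ("EAI_AGAIN", "B5"),
    ("EACCES", "B6"),
]


def match_failure_modes(log_text):
    """Return the IDs whose pattern occurs in log_text, in table order."""
    return [mode for pattern, mode in LOOKUP if pattern in log_text]
```

A log excerpt containing `getaddrinfo EAI_AGAIN awmg-cli-proxy` would map to B5 under this scheme, while a log with none of the patterns yields an empty list.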

Copilot AI changed the title from "[WIP] Update runner doctor with new failure modes A13, B5, B6" to "docs: runner doctor update — A13, B5, B6 + portable agent A12 sync" on Jun 26, 2026
Copilot finished work on behalf of lpcox June 26, 2026 22:14
Copilot AI requested a review from lpcox June 26, 2026 22:14
lpcox marked this pull request as ready for review June 27, 2026 15:39
Copilot AI review requested due to automatic review settings June 27, 2026 15:39

Copilot AI left a comment

Pull request overview

Updates the self-hosted runner “doctor” documentation set to incorporate newly observed AWF failure modes from the latest weekly scan and to keep the portable agent copy aligned with the shared workflow guidance.

Changes:

  • Added new failure modes A13 (ARC/DinD split-fs empty base-userland), B5 (cli-proxy + topology attach startup deadlock), and B6 (rootless runs leaving unreadable artifacts) to the shared failure-mode catalog and runner-doctor docs.
  • Expanded error-string quick-lookup mappings and the “known unresolved” list to include A13.
  • Synced the portable .github/agents/self-hosted-runner-doctor.md catalog content to include A12 and the newly added A13/B5/B6 entries.

Summary per file:

| File | Description |
| --- | --- |
| .github/workflows/shared/self-hosted-failure-modes.md | Adds A13/B5/B6 failure-mode rows, new error-string lookups, and marks A13 as a known unresolved item. |
| .github/workflows/self-hosted-runner-doctor.md | Updates triage hints and the known-unresolved section to include A13/B5/B6. |
| .github/agents/self-hosted-runner-doctor.md | Syncs the portable runner-doctor catalog (adds A12 + A13/B5/B6 + missing lookup rows) to align with the shared sources. |

Review details

  • Files reviewed: 3/3 changed files
  • Comments generated: 3
  • Review effort level: Low

Comment thread .github/agents/self-hosted-runner-doctor.md
Comment thread .github/workflows/shared/self-hosted-failure-modes.md Outdated
Comment thread .github/agents/self-hosted-runner-doctor.md Outdated
lpcox and others added 2 commits June 27, 2026 08:48
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@github-actions

✅ Copilot review passed with no inline comments.

@copilot Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 27, 2026

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 27, 2026

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 27, 2026

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 27, 2026

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 27, 2026

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

🚀 Security Guard has started processing this pull request

@github-actions

github-actions Bot commented Jun 27, 2026

Smoke Gemini completed. All facets verified. 💎

Smoke test completed. Overall status: FAIL due to connectivity issues. Label 'smoke-gemini' was not added.

@github-actions

github-actions Bot commented Jun 27, 2026

Contribution Check failed. Please review the logs for details.

@github-actions

github-actions Bot commented Jun 27, 2026

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 27, 2026

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 27, 2026

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 27, 2026

Smoke Claude passed

@github-actions

github-actions Bot commented Jun 27, 2026

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 27, 2026

🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅

@github-actions

🔬 Smoke Test Results

PR: docs: runner doctor update — A13, B5, B6 + portable agent A12 sync
Author: @Copilot | Assignees: @lpcox @Copilot

| Test | Result |
| --- | --- |
| GitHub MCP connectivity | ✅ |
| GitHub.com HTTP | ⚠️ (pre-step data not expanded) |
| File write/read | ⚠️ (pre-step data not expanded) |

Overall: PARTIAL PASS — MCP ✅, pre-step outputs unavailable (template vars not expanded in workflow)

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Smoke Test: Copilot BYOK (Direct) Mode ✅ PASS

Tests:

  • ✅ GitHub MCP connectivity
  • ✅ File write/read verification
  • ✅ BYOK inference path (agent → api-proxy → api.githubcopilot.com)

Mode: Direct BYOK (COPILOT_PROVIDER_API_KEY via api-proxy sidecar)

Assignees: @Copilot @lpcox

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Smoke Test: Claude Engine Validation

| Check | Result |
| --- | --- |
| API Status | ✅ PASS |
| gh Check | ✅ PASS |
| File Status | ✅ PASS |

Overall result: PASS

Generated by Smoke Claude for issue #5590 · 61.7 AIC · ⊞ 3.3K

@github-actions

🔬 Smoke Test: PAT Auth — PR #5590

| Test | Result |
| --- | --- |
| GitHub MCP (list_pull_requests) | ✅ |
| GitHub.com HTTP | ✅ 200 |
| File write/read | ⚠️ pre-step vars not substituted |

Overall: PASS (2/2 verifiable tests passed; file test skipped — template vars unresolved)

Auth mode: PAT (COPILOT_GITHUB_TOKEN) | PR by @Copilot | Assignees: @lpcox, @Copilot

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Merged PRs:

Results:

  • ✅ GitHub PR query
  • ✅ Playwright title
  • ✅ File write/read
  • ✅ Discussion comment
  • ✅ Build

Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

🔭 Smoke Test: API Proxy OpenTelemetry Tracing

| # | Scenario | Result |
| --- | --- | --- |
| 1 | Module Loading | ✅ |
| 2 | Test Suite | ✅ |
| 3 | Env Var Forwarding | ✅ |
| 4 | Token Tracker Integration | ✅ |
| 5 | OTEL Diagnostics | ✅ |

Details:

  • Scenario 1 — Module Loading: otel.js loads successfully. Exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled, plus internal helpers (_provider, _parseEndpoints, _ProxyAwareOtlpExporter, _FileSpanExporter, _FanOutSpanExporter, _parseOtlpHeaders, _buildResourceSpans).

  • Scenario 2 — Test Suite: 59 tests passed, 0 failed across 2 suites (otel.test.js + otel-fanout.test.js). Covers span creation, token attributes (GenAI semantic conventions), parent context propagation, serialization, and all three exporter types.

  • Scenario 3 — Env Var Forwarding: src/services/api-proxy-env-config.ts (lines 116–123) uses pickEnvVars() to forward GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID to the api-proxy container; OTEL_SERVICE_NAME is always set (defaults to awf-api-proxy).

  • Scenario 4 — Token Tracker Integration: token-tracker-http.js line 324 invokes onUsage(normalized, model) — the hook point where OTEL setTokenAttributes is called to record gen_ai.usage.* span attributes.

  • Scenario 5 — OTEL Diagnostics: No live run in this workflow; fallback behavior is graceful — when no OTLP endpoint is configured, spans are written to /var/log/api-proxy/otel.jsonl (no errors, no export attempts).

All scenarios pass. ✅
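
The forwarding described in Scenario 3 can be sketched as follows. This is an illustration, not the code in src/services/api-proxy-env-config.ts: the Python function names are hypothetical, and it assumes that only variables actually set on the host are forwarded and that a host-provided OTEL_SERVICE_NAME overrides the default.

```python
# Variable names as listed in the Scenario 3 report.
FORWARDED = (
    "GH_AW_OTLP_ENDPOINTS",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_HEADERS",
    "GITHUB_AW_OTEL_TRACE_ID",
    "GITHUB_AW_OTEL_PARENT_SPAN_ID",
)


def pick_env_vars(env, names):
    """Copy only the named variables that are present in env."""
    return {name: env[name] for name in names if name in env}


def api_proxy_env(env):
    """Build the environment passed to the api-proxy container (sketch)."""
    forwarded = pick_env_vars(env, FORWARDED)
    # OTEL_SERVICE_NAME is always set; the default is awf-api-proxy.
    forwarded["OTEL_SERVICE_NAME"] = env.get("OTEL_SERVICE_NAME", "awf-api-proxy")
    return forwarded
```

An allowlist like this keeps unrelated host variables out of the sidecar, while the service name is present even when no OTLP endpoint is configured.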

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

🔍 Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
| --- | --- | --- | --- |
| Python | Python 3.12.13 | Python 3.12.3 | ❌ |
| Node.js | v24.17.0 | v22.23.0 | ❌ |
| Go | go1.22.12 | go1.22.12 | ✅ |

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
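
A comparison of this kind can be sketched as below. The smoke test's actual matching rule is not shown in this thread; exact string equality is assumed here because it is consistent with the reported result, and the function names are hypothetical.

```python
# Version strings as reported in the comparison above.
HOST = {"Python": "Python 3.12.13", "Node.js": "v24.17.0", "Go": "go1.22.12"}
CHROOT = {"Python": "Python 3.12.3", "Node.js": "v22.23.0", "Go": "go1.22.12"}


def compare_runtimes(host, chroot):
    """Map each runtime to True when host and chroot report the same string."""
    return {name: host[name] == chroot.get(name) for name in host}


def all_match(host, chroot):
    """Overall verdict: passes only if every runtime matches exactly."""
    return all(compare_runtimes(host, chroot).values())
```

Under exact matching, a patch-level difference such as 3.12.13 versus 3.12.3 counts as a mismatch. A looser rule that compares only major and minor versions would pass Python but still fail Node.js.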

Tested by Smoke Chroot

@github-actions

@lpcox @Copilot

Smoke test results:

  • GitHub MCP connectivity: ✅
  • GitHub.com connectivity: ✅
  • File write/read test: ✅
  • Direct BYOK inference: ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra

Overall: PASS

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
| --- | --- | --- | --- | --- |
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | passed | ✅ PASS |
| Go | env | | passed | ✅ PASS |
| Go | uuid | | passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | passed | ✅ PASS |
| Node.js | execa | | passed | ✅ PASS |
| Node.js | p-limit | | passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5590 · 50.1 AIC · ⊞ 7.8K

@github-actions

@Copilot @lpcox

  1. GitHub MCP: ✅
  2. GitHub.com Connectivity: ✅
  3. Agent File I/O: ✅
  4. BYOK Inference Path: ✅

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)

Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Smoke Test: Services Connectivity

| Check | Result |
| --- | --- |
| Redis PING (host.docker.internal:6379) | ❌ Timeout (no response) |
| PostgreSQL pg_isready (host.docker.internal:5432) | ❌ No response |
| PostgreSQL SELECT 1 | ❌ Not attempted (pg_isready failed) |

Overall: FAIL

host.docker.internal resolves to 172.17.0.1 but TCP connections to both ports 6379 and 5432 timeout. Neither service is reachable from within the AWF agent container (172.30.0.20).
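
The distinction drawn here, a name that resolves but a port that does not accept connections, can be reproduced with a plain TCP probe. The helper below is a generic sketch using only the Python standard library; it is not the check the smoke test runs, and the function name is hypothetical.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    DNS failures, refused connections, and timeouts all surface as OSError
    subclasses, so they are reported uniformly as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_reachable("host.docker.internal", 6379)` and `tcp_reachable("host.docker.internal", 5432)` from inside the agent container would separate a network-level block from a service-level failure such as a Redis or PostgreSQL process that is up but rejecting commands.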

🔌 Service connectivity validated by Smoke Services

@github-actions

Smoke Test: Gemini Engine Validation

PR Titles:

  1. fix(test): sync doc-maintainer test with max-turns 15 + prompt rewrite
  2. perf(contribution-check): cut token/tool overhead per ⚡ Copilot Token Optimization 2026-06-26 — Contribution Check #5558

Test Results:

  • GitHub MCP Testing: ✅
  • GitHub.com Connectivity: ❌
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

lpcox merged commit 7bbe639 into main Jun 27, 2026
79 of 80 checks passed
lpcox deleted the copilot/update-runner-doctor-a13-b5-b6 branch June 27, 2026 16:18
3 participants