Add forecast compliance fixtures and wire AWF config source drift pre-check by Copilot · Pull Request #42107 · github/gh-aw

Copilot · 2026-06-28T16:28:27Z

Daily SPDD review identified three missing forecast compliance fixtures (blocking T-FC-022, T-FC-035, T-ET-006) and a gap in AWF config-source drift coverage in schema-consistency-checker. This PR adds the missing fixture artifacts, marks them as available in spec docs, and extends the workflow pre-agent analysis to fetch and compare canonical AWF config sources.

Forecast compliance fixture coverage
- Added:
  - specs/forecast-compliance-fixtures/run_summary_zero_et.json
  - specs/forecast-compliance-fixtures/run_summary_failed.json
  - specs/forecast-compliance-fixtures/run_summary_high_et.json
- Fixture values are aligned to required scenarios:
  - zero ET
  - failed conclusion
  - high ET boundary (≥ 1,000,000)
Spec fixture registry update
- Updated specs/forecast-compliance-fixtures/README.md from recommended to available fixtures for:
  - T-FC-022
  - T-FC-035
  - T-ET-006
Test-level fixture binding
- Added TestMonteCarloFixtureVariantsAreAvailable in pkg/cli/forecast_montecarlo_test.go to ensure these fixtures are loaded and validated by expected fields.
AWF config source drift check wiring
- Extended .github/workflows/schema-consistency-checker.md pre-agent steps to:
  - fetch docs/awf-config.schema.json and docs/awf-config-spec.md from github/gh-aw-firewall
  - emit AWF drift pre-check artifacts under /tmp/gh-aw/agent/
- Recompiled lockfile: .github/workflows/schema-consistency-checker.lock.yml

func TestMonteCarloFixtureVariantsAreAvailable(t *testing.T) {
	t.Run("zero ET fixture", func(t *testing.T) {
		fixture := loadFixture(t, "run_summary_zero_et.json")
		usage := fixture["token_usage_summary"].(map[string]any)
		assert.Zero(t, usage["total_effective_tokens"])
	})
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Adds missing forecast compliance fixture artifacts and wires a new AWF config-source drift pre-check into the schema-consistency-checker workflow so the agent has canonical AWF config sources available during analysis.

Changes:

Added three new forecast compliance fixture JSONs covering zero ET, failed conclusion, and high-ET boundary scenarios.
Updated the fixture registry doc and added a Go test that loads/validates the new fixture variants.
Extended schema-consistency-checker pre-agent steps (and regenerated the compiled lockfile) to fetch canonical AWF config sources from github/gh-aw-firewall.

Show a summary per file

File	Description
`specs/forecast-compliance-fixtures/run_summary_zero_et.json`	New fixture representing a successful run with `total_effective_tokens: 0`.
`specs/forecast-compliance-fixtures/run_summary_failed.json`	New fixture representing a failed run (`conclusion: "failure"`).
`specs/forecast-compliance-fixtures/run_summary_high_et.json`	New fixture representing a run with very high ET (≥ 1,000,000).
`specs/forecast-compliance-fixtures/README.md`	Marks additional fixtures as “Available” and lists the new artifacts + test IDs.
`pkg/cli/forecast_montecarlo_test.go`	Adds a test that loads the three new fixtures and asserts key expected fields/values.
`.github/workflows/schema-consistency-checker.md`	Adds AWF config source drift pre-check downloads + artifact emission in pre-agent steps.
`.github/workflows/schema-consistency-checker.lock.yml`	Recompiled workflow output reflecting the pre-agent step changes.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 7/7 changed files
Comments generated: 1
Review effort level: Low

+        gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \
+          --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json
+        gh api /repos/github/gh-aw-firewall/contents/docs/awf-config-spec.md \
+          --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config-spec.md


github-actions · 2026-06-28T17:07:08Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #42107 does not have the 'implementation' label and has only 27 new lines of code in business logic directories (threshold: 100).

github-actions · 2026-06-28T17:07:11Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-28T17:07:16Z

✅ Test Quality Sentinel completed test quality analysis.

Caution

agentic threat detected
Threat detection flagged this output in warn mode. Manual review is REQUIRED before any follow-up automation.

Details

Potential security threats were detected in the agent output.

Review the workflow run logs for details.

github-actions · 2026-06-28T17:07:35Z

⚠️ PR Code Quality Reviewer failed during code quality review.

github-actions

Review: Add forecast compliance fixtures and wire AWF config source drift pre-check

Summary: Good structural additions — the fixture registry, TestMonteCarloFixtureVariantsAreAvailable, and the AWF drift pre-check are all valuable. Three issues need attention before merge.

Blocking issues (3):

run_summary_zero_et.json — total_aic not zeroed (line 41)
total_aic is 0.0054 (copy-pasted from run_summary_minimal.json) but total_effective_tokens is 0. The formula across all fixtures is total_aic = total_effective_tokens / 1_000_000, so this must be 0.0. Future predicate tests on the AIC–ET invariant will fail.
schema-consistency-checker.md — missing pipefail (line 141)
The gh api ... | base64 -d > file pipelines run under set -e alone. If gh api fails, base64 -d succeeds on empty input and writes an empty JSON file; the step passes silently while awf-config-drift.json will report top_level_property_count: 0. Change set -e to set -eo pipefail in the pre-agent step.
run_summary_failed.json — failed_steps: 0 with conclusion: "failure" (line 19)
Semantically inconsistent — a failed conclusion with zero failed steps will mislead future tests checking both fields together. Either set failed_steps: 1 or document the intended semantic (e.g., infrastructure-level failure).

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 62.4 AIC · ⌖ 6.8 AIC · ⊞ 4.9K

github-actions · 2026-06-28T17:14:20Z

+    "total_response_bytes": 24680,
+    "cache_efficiency": 0.1778,
+    "total_effective_tokens": 0,
+    "total_aic": 0.0054,


Bug: total_aic is inconsistent with total_effective_tokens: 0.

The formula in other fixtures is total_aic = total_effective_tokens / 1_000_000 (e.g. run_summary_minimal.json: 5400 ET → 0.0054; run_summary_high_et.json: 1000001 ET → 1.000001). When total_effective_tokens is 0, total_aic must also be 0.0, not 0.0054.

This is a copy-paste artefact from run_summary_minimal.json. Any future predicate test validating the AIC–ET relationship will fail against this fixture.

Suggested change

"total_aic": 0.0054,

"total_aic": 0.0,

@copilot please address this.

github-actions · 2026-06-28T17:14:20Z

+      echo "=== AWF config source drift pre-check (gh-aw-firewall) ==="
+      if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then
+        gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \
+          --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json


Reliability: missing pipefail causes silent failure on gh api errors.

The script starts with set -e (line 38), but this only exits if the last command in a pipeline fails. In:

gh api .../awf-config.schema.json --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json

If gh api fails (missing token permission, 404, rate limit), base64 -d reads empty input, succeeds, and writes an empty file. The step appears to pass while /tmp/gh-aw/agent/awf-config.schema.json is empty. The subsequent jq -r '.properties | keys[]' call will then silently produce no output or fail, and awf-config-drift.json will show top_level_property_count: 0 with no indication of the upstream failure.

Add set -o pipefail (or change the shebang) at the top of the pre-agent step script, or guard the gh api calls individually:

set -eo pipefail

@copilot please address this.

github-actions · 2026-06-28T17:14:20Z

+  },
+  "metrics": {
+    "total_steps": 5,
+    "failed_steps": 0,


Data inconsistency: failed_steps: 0 in a fixture with conclusion: "failure".

The run object has "conclusion": "failure" (line 12) but metrics.failed_steps is 0. Any future compliance test that checks both the run conclusion and the step-failure count will receive contradictory signals from this fixture.

If the intent is to represent a workflow-level failure not caused by an individual step failure (e.g. timeout, infrastructure error), add a comment in the fixture or README clarifying this semantic. Otherwise, set failed_steps to a non-zero value consistent with a real failed run (e.g. 1).

Suggested change

"failed_steps": 0,

"failed_steps": 1,

@copilot please address this.

github-actions

Skills-Based Review 🧠

Applied /tdd, /zoom-out, and /grill-with-docs — requesting changes on two fixture data correctness issues and improvements to the AWF drift pre-check.

📋 Key Themes & Highlights

Key Themes

Fixture data inconsistencies (correctness): Two of the three new fixture files contain internal contradictions that would cause consumers to work with misleading test data:
- run_summary_zero_et.json: total_aic: 0.0054 with total_effective_tokens: 0 — AIC cannot be non-zero when no tokens were consumed. Copy-paste artefact.
- run_summary_failed.json: failed_steps: 0 with conclusion: "failure" — a failure conclusion requires at least one failed step.
Test coverage gap: TestMonteCarloFixtureVariantsAreAvailable only checks run.conclusion for the failed fixture, so the failed_steps inconsistency goes undetected.
AWF drift pre-check brittleness: gh api calls for the upstream gh-aw-firewall content have no fallback — a transient network error or permission gap will abort the entire pre-agent step.
Drift output completeness: awf-config-drift.json only records metadata counts, not comparable content, so downstream agents cannot detect actual schema drift from it.

Positive Highlights

✅ Clean fixture structure and consistent use of 0.0.0-test version for test artifacts
✅ Good use of require.True guards before type assertions — prevents panics on bad fixture shape
✅ The GH_TOKEN guard for the AWF drift section is a sensible no-op fallback for environments without the token
✅ Using run_id values with sequential offsets (12345679, 12345680, 12345681) makes fixtures easy to distinguish in logs

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 77.3 AIC · ⌖ 7.93 AIC · ⊞ 6.6K

github-actions · 2026-06-28T17:15:52Z

+    "total_response_bytes": 24680,
+    "cache_efficiency": 0.1778,
+    "total_effective_tokens": 0,
+    "total_aic": 0.0054,


[/tdd] Fixture data inconsistency: total_aic is 0.0054 but total_effective_tokens is 0 — AIC should be 0 when no effective tokens were consumed.

Given the formula aic = effective_tokens × 0.000001 (confirmed in run_summary_failed.json where 5400 ET → 0.0054 AIC), a zero-ET fixture must have total_aic: 0. This looks like a copy-paste artefact from the non-zero fixtures.

💡 Suggested fix

"total_effective_tokens": 0, "total_aic": 0,

@copilot please address this.

github-actions · 2026-06-28T17:15:52Z

+  },
+  "metrics": {
+    "total_steps": 5,
+    "failed_steps": 0,


[/tdd] Fixture semantics mismatch: failed_steps: 0 contradicts conclusion: "failure". A run that concluded with failure should have at least one failed step — otherwise the fixture misrepresents a real failure scenario and tests depending on it may pass for the wrong reasons.

💡 Suggested fix

"metrics": { "total_steps": 5, "failed_steps": 1, "tool_calls": 12 },

@copilot please address this.

github-actions · 2026-06-28T17:15:52Z

+		fixture := loadFixture(t, "run_summary_failed.json")
+		run, ok := fixture["run"].(map[string]any)
+		require.True(t, ok)
+		assert.Equal(t, "failure", run["conclusion"])


[/tdd] Test only validates conclusion — the fixture inconsistency (failed_steps: 0 with conclusion: "failure") is not caught because failed_steps is never asserted. A spec-compliant test should verify the internal consistency of the fixture data it relies on.

💡 Suggested additional assertion

run, ok := fixture["run"].(map[string]any) require.True(t, ok) assert.Equal(t, "failure", run["conclusion"]) metrics, ok := fixture["metrics"].(map[string]any) require.True(t, ok) failedSteps, ok := metrics["failed_steps"].(float64) require.True(t, ok) assert.Greater(t, failedSteps, 0.0, "failed run should have at least one failed step")

@copilot please address this.

github-actions · 2026-06-28T17:15:52Z

+
+      echo "=== AWF config source drift pre-check (gh-aw-firewall) ==="
+      if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then
+        gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \


[/zoom-out] The gh api calls for the AWF config files have no error resilience — under set -e a permissions error or transient network failure will abort the entire pre-agent step, blocking the checker run. This is a hard dependency on gh-aw-firewall read access.

Consider wrapping the block (or individual calls) so failures degrade gracefully instead of failing the whole step:

💡 Suggested approach

if gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json 2>/dev/null; then # ... continue with fetch + analysis else echo "⚠️ Unable to fetch awf-config.schema.json from gh-aw-firewall — skipping drift pre-check" fi

Or add a top-level || true guard on the entire conditional block so a fetch failure is non-fatal.

@copilot please address this.

github-actions · 2026-06-28T17:15:52Z

+            canonical_schema: $schema_path,
+            top_level_property_count: ($top_level_count | tonumber),
+            ghaw_reference_sample_count: ($refs_sample_count | tonumber)
+          }' > /tmp/gh-aw/agent/awf-config-drift.json


[/grill-with-docs] The output of this step (awf-config-drift.json) records only metadata (counts and file paths) — it does not contain any content that would allow the downstream agent to detect actual schema drift. The name "drift pre-check" implies comparison, but no diff or checksum of local vs. upstream content is produced.

If the intent is to give the agent material to reason about drift, consider including content hashes or the key lists extracted from both local and upstream schemas so real divergence is detectable.

💡 Suggested enhancement

Add the actual top-level keys from both the upstream schema and the local schema to the output:

LOCAL_KEYS=cache check-for-updates checkout concurrency container description emoji engine env environment experiments features github-app if import-schema imports inlined-imports jobs labels lsp max-ai-credits max-daily-ai-credits max-runs max-tool-denials max-turn-cache-misses max-turns mcp-scripts mcp-servers metadata models name network observability on permissions post-steps pre-agent-steps pre-steps private redirect resources run-name runs-on runs-on-slim runtimes safe-outputs sandbox secret-masking secrets services source steps strict timeout-minutes tools tracker-id user-rate-limit jq -n --arg local_keys "$LOCAL_KEYS" --arg top_level_count "..." '{ source_repo: "github/gh-aw-firewall", upstream_top_level_keys: ($upstream_keys | split("\n") | map(select(. != ""))), local_top_level_keys: ($local_keys | split("\n") | map(select(. != ""))) }' > /tmp/gh-aw/agent/awf-config-drift.json

This gives the agent a concrete list to diff rather than just counts.

@copilot please address this.

github-actions · 2026-06-28T17:16:29Z

Caution

agentic threat detected
Threat detection flagged this output in warn mode. Manual review is REQUIRED before any follow-up automation.

Details

Potential security threats were detected in the agent output.

Review the workflow run logs for details.

test connection

🧪 Test quality analysis by Test Quality Sentinel · 66.2 AIC · ⌖ 24.4 AIC · ⊞ 8.2K · ◷

github-actions

❌ Test Quality Sentinel: 50/100. 100% of new tests are classified as low-value implementation tests, exceeding the 30% threshold. All three subtests in TestMonteCarloFixtureVariantsAreAvailable verify fixture JSON structure only — no production function is exercised. Please review the flagged tests in the PR comment above and improve their behavioral coverage. Additionally, 7 testify assertion calls are missing required descriptive message arguments.

github-actions · 2026-06-28T17:28:03Z

Hey @pelikhan 👋 — this PR from the daily SPDD review does solid work unblocking T-FC-022, T-FC-035, and T-ET-006, and the test coverage for the fixture variants is a nice touch. One thing worth a look before merging:

Split the two concerns — The PR bundles two independent workstreams into a single change: (1) forecast compliance fixtures + Monte Carlo test binding (specs/forecast-compliance-fixtures/, pkg/cli/forecast_montecarlo_test.go) and (2) AWF config source drift check wiring (schema-consistency-checker.md + lockfile recompile). These are unrelated enough that a failure or rollback in the drift-check wiring shouldn't drag the fixture work with it. Splitting keeps review scope tight and makes bisecting easier.

If you'd like to have the agent handle the split, here's a ready-to-use prompt:

Split PR #42107 into two focused PRs:

1. **Forecast compliance fixtures** — include only:
   - specs/forecast-compliance-fixtures/run_summary_zero_et.json
   - specs/forecast-compliance-fixtures/run_summary_failed.json
   - specs/forecast-compliance-fixtures/run_summary_high_et.json
   - specs/forecast-compliance-fixtures/README.md (fixture registry update)
   - pkg/cli/forecast_montecarlo_test.go (TestMonteCarloFixtureVariantsAreAvailable)

2. **AWF config source drift pre-check wiring** — include only:
   - .github/workflows/schema-consistency-checker.md (pre-agent step additions)
   - .github/workflows/schema-consistency-checker.lock.yml (recompiled lockfile)

Base both PRs off main. Preserve the original PR body sections verbatim, split to match each PR's scope.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

patchdiff.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "patchdiff.githubusercontent.com"

See Network Configuration for more information.

Generated by ✅ Contribution Check · 293.1 AIC · ⌖ 20.4 AIC · ⊞ 6K · ◷

Done

pelikhan · 2026-06-28T17:34:21Z

@copilot run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Initial plan

b9a3987

Copilot AI assigned Copilot and pelikhan Jun 28, 2026

Copilot started work on behalf of pelikhan June 28, 2026 16:29 View session

Apply remaining changes

1f14c95

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Update daily spec review findings for 2026-06-28~~ Add forecast compliance fixtures and wire AWF config source drift pre-check Jun 28, 2026

Copilot finished work on behalf of pelikhan June 28, 2026 16:48

Copilot AI requested a review from pelikhan June 28, 2026 16:48

pelikhan marked this pull request as ready for review June 28, 2026 16:53

Copilot AI review requested due to automatic review settings June 28, 2026 16:53

Copilot started reviewing on behalf of pelikhan June 28, 2026 16:54 View session

Copilot AI reviewed Jun 28, 2026

View reviewed changes

pelikhan approved these changes Jun 28, 2026

View reviewed changes

github-actions Bot mentioned this pull request Jun 28, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot requested changes Jun 28, 2026

View reviewed changes

github-actions Bot previously requested changes Jun 28, 2026

View reviewed changes

github-actions Bot added the needs-review label Jun 28, 2026

This comment has been minimized.

Sign in to view

Copilot started work on behalf of pelikhan June 28, 2026 17:35 View session

Handle AWF canonical source fetch degradation

d8a32d3

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan June 28, 2026 17:49

Copilot AI requested a review from pelikhan June 28, 2026 17:49

pelikhan merged commit f0cf4df into main Jun 28, 2026

pelikhan deleted the copilot/spdd-daily-spec-review-2026-06-28 branch June 28, 2026 18:26

Uh oh!

Conversation

Copilot AI commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Review details

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Review: Add forecast compliance fixtures and wire AWF config source drift pre-check

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 28, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

This comment has been minimized.

github-actions Bot commented Jun 28, 2026

Uh oh!

pelikhan commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading