Skip to content

Add forecast compliance fixtures and wire AWF config source drift pre-check#42107

Merged
pelikhan merged 3 commits into
mainfrom
copilot/spdd-daily-spec-review-2026-06-28
Jun 28, 2026
Merged

Add forecast compliance fixtures and wire AWF config source drift pre-check#42107
pelikhan merged 3 commits into
mainfrom
copilot/spdd-daily-spec-review-2026-06-28

Conversation

Copilot AI commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Daily SPDD review identified three missing forecast compliance fixtures (blocking T-FC-022, T-FC-035, T-ET-006) and a gap in AWF config-source drift coverage in schema-consistency-checker. This PR adds the missing fixture artifacts, marks them as available in spec docs, and extends the workflow pre-agent analysis to fetch and compare canonical AWF config sources.

  • Forecast compliance fixture coverage

    • Added:
      • specs/forecast-compliance-fixtures/run_summary_zero_et.json
      • specs/forecast-compliance-fixtures/run_summary_failed.json
      • specs/forecast-compliance-fixtures/run_summary_high_et.json
    • Fixture values are aligned to required scenarios:
      • zero ET
      • failed conclusion
      • high ET boundary (≥ 1,000,000)
  • Spec fixture registry update

    • Updated specs/forecast-compliance-fixtures/README.md from recommended to available fixtures for:
      • T-FC-022
      • T-FC-035
      • T-ET-006
  • Test-level fixture binding

    • Added TestMonteCarloFixtureVariantsAreAvailable in pkg/cli/forecast_montecarlo_test.go to ensure these fixtures are loaded and validated by expected fields.
  • AWF config source drift check wiring

    • Extended .github/workflows/schema-consistency-checker.md pre-agent steps to:
      • fetch docs/awf-config.schema.json and docs/awf-config-spec.md from github/gh-aw-firewall
      • emit AWF drift pre-check artifacts under /tmp/gh-aw/agent/
    • Recompiled lockfile: .github/workflows/schema-consistency-checker.lock.yml
func TestMonteCarloFixtureVariantsAreAvailable(t *testing.T) {
	t.Run("zero ET fixture", func(t *testing.T) {
		fixture := loadFixture(t, "run_summary_zero_et.json")
		usage := fixture["token_usage_summary"].(map[string]any)
		assert.Zero(t, usage["total_effective_tokens"])
	})
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Update daily spec review findings for 2026-06-28 Add forecast compliance fixtures and wire AWF config source drift pre-check Jun 28, 2026
Copilot AI requested a review from pelikhan June 28, 2026 16:48
@pelikhan pelikhan marked this pull request as ready for review June 28, 2026 16:53
Copilot AI review requested due to automatic review settings June 28, 2026 16:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds missing forecast compliance fixture artifacts and wires a new AWF config-source drift pre-check into the schema-consistency-checker workflow so the agent has canonical AWF config sources available during analysis.

Changes:

  • Added three new forecast compliance fixture JSONs covering zero ET, failed conclusion, and high-ET boundary scenarios.
  • Updated the fixture registry doc and added a Go test that loads/validates the new fixture variants.
  • Extended schema-consistency-checker pre-agent steps (and regenerated the compiled lockfile) to fetch canonical AWF config sources from github/gh-aw-firewall.
Show a summary per file
File Description
specs/forecast-compliance-fixtures/run_summary_zero_et.json New fixture representing a successful run with total_effective_tokens: 0.
specs/forecast-compliance-fixtures/run_summary_failed.json New fixture representing a failed run (conclusion: "failure").
specs/forecast-compliance-fixtures/run_summary_high_et.json New fixture representing a run with very high ET (≥ 1,000,000).
specs/forecast-compliance-fixtures/README.md Marks additional fixtures as “Available” and lists the new artifacts + test IDs.
pkg/cli/forecast_montecarlo_test.go Adds a test that loads the three new fixtures and asserts key expected fields/values.
.github/workflows/schema-consistency-checker.md Adds AWF config source drift pre-check downloads + artifact emission in pre-agent steps.
.github/workflows/schema-consistency-checker.lock.yml Recompiled workflow output reflecting the pre-agent step changes.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 7/7 changed files
  • Comments generated: 1
  • Review effort level: Low

Comment on lines +140 to +143
gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \
--jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json
gh api /repos/github/gh-aw-firewall/contents/docs/awf-config-spec.md \
--jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config-spec.md
@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #42107 does not have the 'implementation' label and has only 27 new lines of code in business logic directories (threshold: 100).

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

Caution

agentic threat detected
Threat detection flagged this output in warn mode. Manual review is REQUIRED before any follow-up automation.

Details

Potential security threats were detected in the agent output.

Review the workflow run logs for details.

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

⚠️ PR Code Quality Reviewer failed during code quality review.

@github-actions github-actions Bot mentioned this pull request Jun 28, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Add forecast compliance fixtures and wire AWF config source drift pre-check

Summary: Good structural additions — the fixture registry, TestMonteCarloFixtureVariantsAreAvailable, and the AWF drift pre-check are all valuable. Three issues need attention before merge.

Blocking issues (3):

  1. run_summary_zero_et.jsontotal_aic not zeroed (line 41)
    total_aic is 0.0054 (copy-pasted from run_summary_minimal.json) but total_effective_tokens is 0. The formula across all fixtures is total_aic = total_effective_tokens / 1_000_000, so this must be 0.0. Future predicate tests on the AIC–ET invariant will fail.

  2. schema-consistency-checker.md — missing pipefail (line 141)
    The gh api ... | base64 -d > file pipelines run under set -e alone. If gh api fails, base64 -d succeeds on empty input and writes an empty JSON file; the step passes silently while awf-config-drift.json will report top_level_property_count: 0. Change set -e to set -eo pipefail in the pre-agent step.

  3. run_summary_failed.jsonfailed_steps: 0 with conclusion: "failure" (line 19)
    Semantically inconsistent — a failed conclusion with zero failed steps will mislead future tests checking both fields together. Either set failed_steps: 1 or document the intended semantic (e.g., infrastructure-level failure).

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 62.4 AIC · ⌖ 6.8 AIC · ⊞ 4.9K

"total_response_bytes": 24680,
"cache_efficiency": 0.1778,
"total_effective_tokens": 0,
"total_aic": 0.0054,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: total_aic is inconsistent with total_effective_tokens: 0.

The formula in other fixtures is total_aic = total_effective_tokens / 1_000_000 (e.g. run_summary_minimal.json: 5400 ET → 0.0054; run_summary_high_et.json: 1000001 ET → 1.000001). When total_effective_tokens is 0, total_aic must also be 0.0, not 0.0054.

This is a copy-paste artefact from run_summary_minimal.json. Any future predicate test validating the AIC–ET relationship will fail against this fixture.

Suggested change
"total_aic": 0.0054,
"total_aic": 0.0,

@copilot please address this.

echo "=== AWF config source drift pre-check (gh-aw-firewall) ==="
if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then
gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \
--jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reliability: missing pipefail causes silent failure on gh api errors.

The script starts with set -e (line 38), but this only exits if the last command in a pipeline fails. In:

gh api .../awf-config.schema.json --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json

If gh api fails (missing token permission, 404, rate limit), base64 -d reads empty input, succeeds, and writes an empty file. The step appears to pass while /tmp/gh-aw/agent/awf-config.schema.json is empty. The subsequent jq -r '.properties | keys[]' call will then silently produce no output or fail, and awf-config-drift.json will show top_level_property_count: 0 with no indication of the upstream failure.

Add set -o pipefail (or change the shebang) at the top of the pre-agent step script, or guard the gh api calls individually:

set -eo pipefail

@copilot please address this.

},
"metrics": {
"total_steps": 5,
"failed_steps": 0,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data inconsistency: failed_steps: 0 in a fixture with conclusion: "failure".

The run object has "conclusion": "failure" (line 12) but metrics.failed_steps is 0. Any future compliance test that checks both the run conclusion and the step-failure count will receive contradictory signals from this fixture.

If the intent is to represent a workflow-level failure not caused by an individual step failure (e.g. timeout, infrastructure error), add a comment in the fixture or README clarifying this semantic. Otherwise, set failed_steps to a non-zero value consistent with a real failed run (e.g. 1).

Suggested change
"failed_steps": 0,
"failed_steps": 1,

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd, /zoom-out, and /grill-with-docs — requesting changes on two fixture data correctness issues and improvements to the AWF drift pre-check.

📋 Key Themes & Highlights

Key Themes

  • Fixture data inconsistencies (correctness): Two of the three new fixture files contain internal contradictions that would cause consumers to work with misleading test data:
    • run_summary_zero_et.json: total_aic: 0.0054 with total_effective_tokens: 0 — AIC cannot be non-zero when no tokens were consumed. Copy-paste artefact.
    • run_summary_failed.json: failed_steps: 0 with conclusion: "failure" — a failure conclusion requires at least one failed step.
  • Test coverage gap: TestMonteCarloFixtureVariantsAreAvailable only checks run.conclusion for the failed fixture, so the failed_steps inconsistency goes undetected.
  • AWF drift pre-check brittleness: gh api calls for the upstream gh-aw-firewall content have no fallback — a transient network error or permission gap will abort the entire pre-agent step.
  • Drift output completeness: awf-config-drift.json only records metadata counts, not comparable content, so downstream agents cannot detect actual schema drift from it.

Positive Highlights

  • ✅ Clean fixture structure and consistent use of 0.0.0-test version for test artifacts
  • ✅ Good use of require.True guards before type assertions — prevents panics on bad fixture shape
  • ✅ The GH_TOKEN guard for the AWF drift section is a sensible no-op fallback for environments without the token
  • ✅ Using run_id values with sequential offsets (12345679, 12345680, 12345681) makes fixtures easy to distinguish in logs

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 77.3 AIC · ⌖ 7.93 AIC · ⊞ 6.6K

"total_response_bytes": 24680,
"cache_efficiency": 0.1778,
"total_effective_tokens": 0,
"total_aic": 0.0054,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Fixture data inconsistency: total_aic is 0.0054 but total_effective_tokens is 0 — AIC should be 0 when no effective tokens were consumed.

Given the formula aic = effective_tokens × 0.000001 (confirmed in run_summary_failed.json where 5400 ET → 0.0054 AIC), a zero-ET fixture must have total_aic: 0. This looks like a copy-paste artefact from the non-zero fixtures.

💡 Suggested fix
"total_effective_tokens": 0,
"total_aic": 0,

@copilot please address this.

},
"metrics": {
"total_steps": 5,
"failed_steps": 0,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Fixture semantics mismatch: failed_steps: 0 contradicts conclusion: "failure". A run that concluded with failure should have at least one failed step — otherwise the fixture misrepresents a real failure scenario and tests depending on it may pass for the wrong reasons.

💡 Suggested fix
"metrics": {
  "total_steps": 5,
  "failed_steps": 1,
  "tool_calls": 12
},

@copilot please address this.

fixture := loadFixture(t, "run_summary_failed.json")
run, ok := fixture["run"].(map[string]any)
require.True(t, ok)
assert.Equal(t, "failure", run["conclusion"])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Test only validates conclusion — the fixture inconsistency (failed_steps: 0 with conclusion: "failure") is not caught because failed_steps is never asserted. A spec-compliant test should verify the internal consistency of the fixture data it relies on.

💡 Suggested additional assertion
run, ok := fixture["run"].(map[string]any)
require.True(t, ok)
assert.Equal(t, "failure", run["conclusion"])

metrics, ok := fixture["metrics"].(map[string]any)
require.True(t, ok)
failedSteps, ok := metrics["failed_steps"].(float64)
require.True(t, ok)
assert.Greater(t, failedSteps, 0.0, "failed run should have at least one failed step")

@copilot please address this.


echo "=== AWF config source drift pre-check (gh-aw-firewall) ==="
if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then
gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] The gh api calls for the AWF config files have no error resilience — under set -e a permissions error or transient network failure will abort the entire pre-agent step, blocking the checker run. This is a hard dependency on gh-aw-firewall read access.

Consider wrapping the block (or individual calls) so failures degrade gracefully instead of failing the whole step:

💡 Suggested approach
if gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json     --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json 2>/dev/null; then
  # ... continue with fetch + analysis
else
  echo "⚠️ Unable to fetch awf-config.schema.json from gh-aw-firewall — skipping drift pre-check"
fi

Or add a top-level || true guard on the entire conditional block so a fetch failure is non-fatal.

@copilot please address this.

canonical_schema: $schema_path,
top_level_property_count: ($top_level_count | tonumber),
ghaw_reference_sample_count: ($refs_sample_count | tonumber)
}' > /tmp/gh-aw/agent/awf-config-drift.json

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] The output of this step (awf-config-drift.json) records only metadata (counts and file paths) — it does not contain any content that would allow the downstream agent to detect actual schema drift. The name "drift pre-check" implies comparison, but no diff or checksum of local vs. upstream content is produced.

If the intent is to give the agent material to reason about drift, consider including content hashes or the key lists extracted from both local and upstream schemas so real divergence is detectable.

💡 Suggested enhancement

Add the actual top-level keys from both the upstream schema and the local schema to the output:

LOCAL_KEYS=cache
check-for-updates
checkout
concurrency
container
description
emoji
engine
env
environment
experiments
features
github-app
if
import-schema
imports
inlined-imports
jobs
labels
lsp
max-ai-credits
max-daily-ai-credits
max-runs
max-tool-denials
max-turn-cache-misses
max-turns
mcp-scripts
mcp-servers
metadata
models
name
network
observability
on
permissions
post-steps
pre-agent-steps
pre-steps
private
redirect
resources
run-name
runs-on
runs-on-slim
runtimes
safe-outputs
sandbox
secret-masking
secrets
services
source
steps
strict
timeout-minutes
tools
tracker-id
user-rate-limit
jq -n   --arg local_keys "$LOCAL_KEYS"   --arg top_level_count "..."   '{
    source_repo: "github/gh-aw-firewall",
    upstream_top_level_keys: ($upstream_keys | split("\n") | map(select(. != ""))),
    local_top_level_keys: ($local_keys | split("\n") | map(select(. != "")))
  }' > /tmp/gh-aw/agent/awf-config-drift.json

This gives the agent a concrete list to diff rather than just counts.

@copilot please address this.

@github-actions

Copy link
Copy Markdown
Contributor

Caution

agentic threat detected
Threat detection flagged this output in warn mode. Manual review is REQUIRED before any follow-up automation.

Details

Potential security threats were detected in the agent output.

Review the workflow run logs for details.

test connection

🧪 Test quality analysis by Test Quality Sentinel · 66.2 AIC · ⌖ 24.4 AIC · ⊞ 8.2K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Test Quality Sentinel: 50/100. 100% of new tests are classified as low-value implementation tests, exceeding the 30% threshold. All three subtests in TestMonteCarloFixtureVariantsAreAvailable verify fixture JSON structure only — no production function is exercised. Please review the flagged tests in the PR comment above and improve their behavioral coverage. Additionally, 7 testify assertion calls are missing required descriptive message arguments.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

Copy link
Copy Markdown
Contributor

Hey @pelikhan 👋 — this PR from the daily SPDD review does solid work unblocking T-FC-022, T-FC-035, and T-ET-006, and the test coverage for the fixture variants is a nice touch. One thing worth a look before merging:

  • Split the two concerns — The PR bundles two independent workstreams into a single change: (1) forecast compliance fixtures + Monte Carlo test binding (specs/forecast-compliance-fixtures/, pkg/cli/forecast_montecarlo_test.go) and (2) AWF config source drift check wiring (schema-consistency-checker.md + lockfile recompile). These are unrelated enough that a failure or rollback in the drift-check wiring shouldn't drag the fixture work with it. Splitting keeps review scope tight and makes bisecting easier.

If you'd like to have the agent handle the split, here's a ready-to-use prompt:

Split PR #42107 into two focused PRs:

1. **Forecast compliance fixtures** — include only:
   - specs/forecast-compliance-fixtures/run_summary_zero_et.json
   - specs/forecast-compliance-fixtures/run_summary_failed.json
   - specs/forecast-compliance-fixtures/run_summary_high_et.json
   - specs/forecast-compliance-fixtures/README.md (fixture registry update)
   - pkg/cli/forecast_montecarlo_test.go (TestMonteCarloFixtureVariantsAreAvailable)

2. **AWF config source drift pre-check wiring** — include only:
   - .github/workflows/schema-consistency-checker.md (pre-agent step additions)
   - .github/workflows/schema-consistency-checker.lock.yml (recompiled lockfile)

Base both PRs off main. Preserve the original PR body sections verbatim, split to match each PR's scope.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • patchdiff.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "patchdiff.githubusercontent.com"

See Network Configuration for more information.

Generated by ✅ Contribution Check · 293.1 AIC · ⌖ 20.4 AIC · ⊞ 6K ·

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan June 28, 2026 17:49
@pelikhan pelikhan merged commit f0cf4df into main Jun 28, 2026
@pelikhan pelikhan deleted the copilot/spdd-daily-spec-review-2026-06-28 branch June 28, 2026 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants