Add forecast compliance fixtures and wire AWF config source drift pre-check#42107
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds missing forecast compliance fixture artifacts and wires a new AWF config-source drift pre-check into the schema-consistency-checker workflow so the agent has canonical AWF config sources available during analysis.
Changes:
- Added three new forecast compliance fixture JSONs covering zero ET, failed conclusion, and high-ET boundary scenarios.
- Updated the fixture registry doc and added a Go test that loads/validates the new fixture variants.
- Extended
schema-consistency-checkerpre-agent steps (and regenerated the compiled lockfile) to fetch canonical AWF config sources fromgithub/gh-aw-firewall.
Show a summary per file
| File | Description |
|---|---|
specs/forecast-compliance-fixtures/run_summary_zero_et.json |
New fixture representing a successful run with total_effective_tokens: 0. |
specs/forecast-compliance-fixtures/run_summary_failed.json |
New fixture representing a failed run (conclusion: "failure"). |
specs/forecast-compliance-fixtures/run_summary_high_et.json |
New fixture representing a run with very high ET (≥ 1,000,000). |
specs/forecast-compliance-fixtures/README.md |
Marks additional fixtures as “Available” and lists the new artifacts + test IDs. |
pkg/cli/forecast_montecarlo_test.go |
Adds a test that loads the three new fixtures and asserts key expected fields/values. |
.github/workflows/schema-consistency-checker.md |
Adds AWF config source drift pre-check downloads + artifact emission in pre-agent steps. |
.github/workflows/schema-consistency-checker.lock.yml |
Recompiled workflow output reflecting the pre-agent step changes. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 7/7 changed files
- Comments generated: 1
- Review effort level: Low
| gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \ | ||
| --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json | ||
| gh api /repos/github/gh-aw-firewall/contents/docs/awf-config-spec.md \ | ||
| --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config-spec.md |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #42107 does not have the 'implementation' label and has only 27 new lines of code in business logic directories (threshold: 100). |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Test Quality Sentinel completed test quality analysis. Caution agentic threat detected DetailsPotential security threats were detected in the agent output. Review the workflow run logs for details. |
|
|
There was a problem hiding this comment.
Review: Add forecast compliance fixtures and wire AWF config source drift pre-check
Summary: Good structural additions — the fixture registry, TestMonteCarloFixtureVariantsAreAvailable, and the AWF drift pre-check are all valuable. Three issues need attention before merge.
Blocking issues (3):
-
run_summary_zero_et.json—total_aicnot zeroed (line 41)
total_aicis0.0054(copy-pasted fromrun_summary_minimal.json) buttotal_effective_tokensis0. The formula across all fixtures istotal_aic = total_effective_tokens / 1_000_000, so this must be0.0. Future predicate tests on the AIC–ET invariant will fail. -
schema-consistency-checker.md— missingpipefail(line 141)
Thegh api ... | base64 -d > filepipelines run underset -ealone. Ifgh apifails,base64 -dsucceeds on empty input and writes an empty JSON file; the step passes silently whileawf-config-drift.jsonwill reporttop_level_property_count: 0. Changeset -etoset -eo pipefailin the pre-agent step. -
run_summary_failed.json—failed_steps: 0withconclusion: "failure"(line 19)
Semantically inconsistent — a failed conclusion with zero failed steps will mislead future tests checking both fields together. Either setfailed_steps: 1or document the intended semantic (e.g., infrastructure-level failure).
🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 62.4 AIC · ⌖ 6.8 AIC · ⊞ 4.9K
| "total_response_bytes": 24680, | ||
| "cache_efficiency": 0.1778, | ||
| "total_effective_tokens": 0, | ||
| "total_aic": 0.0054, |
There was a problem hiding this comment.
Bug: total_aic is inconsistent with total_effective_tokens: 0.
The formula in other fixtures is total_aic = total_effective_tokens / 1_000_000 (e.g. run_summary_minimal.json: 5400 ET → 0.0054; run_summary_high_et.json: 1000001 ET → 1.000001). When total_effective_tokens is 0, total_aic must also be 0.0, not 0.0054.
This is a copy-paste artefact from run_summary_minimal.json. Any future predicate test validating the AIC–ET relationship will fail against this fixture.
| "total_aic": 0.0054, | |
| "total_aic": 0.0, |
@copilot please address this.
| echo "=== AWF config source drift pre-check (gh-aw-firewall) ===" | ||
| if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then | ||
| gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \ | ||
| --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json |
There was a problem hiding this comment.
Reliability: missing pipefail causes silent failure on gh api errors.
The script starts with set -e (line 38), but this only exits if the last command in a pipeline fails. In:
gh api .../awf-config.schema.json --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.jsonIf gh api fails (missing token permission, 404, rate limit), base64 -d reads empty input, succeeds, and writes an empty file. The step appears to pass while /tmp/gh-aw/agent/awf-config.schema.json is empty. The subsequent jq -r '.properties | keys[]' call will then silently produce no output or fail, and awf-config-drift.json will show top_level_property_count: 0 with no indication of the upstream failure.
Add set -o pipefail (or change the shebang) at the top of the pre-agent step script, or guard the gh api calls individually:
set -eo pipefail@copilot please address this.
| }, | ||
| "metrics": { | ||
| "total_steps": 5, | ||
| "failed_steps": 0, |
There was a problem hiding this comment.
Data inconsistency: failed_steps: 0 in a fixture with conclusion: "failure".
The run object has "conclusion": "failure" (line 12) but metrics.failed_steps is 0. Any future compliance test that checks both the run conclusion and the step-failure count will receive contradictory signals from this fixture.
If the intent is to represent a workflow-level failure not caused by an individual step failure (e.g. timeout, infrastructure error), add a comment in the fixture or README clarifying this semantic. Otherwise, set failed_steps to a non-zero value consistent with a real failed run (e.g. 1).
| "failed_steps": 0, | |
| "failed_steps": 1, |
@copilot please address this.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd, /zoom-out, and /grill-with-docs — requesting changes on two fixture data correctness issues and improvements to the AWF drift pre-check.
📋 Key Themes & Highlights
Key Themes
- Fixture data inconsistencies (correctness): Two of the three new fixture files contain internal contradictions that would cause consumers to work with misleading test data:
run_summary_zero_et.json:total_aic: 0.0054withtotal_effective_tokens: 0— AIC cannot be non-zero when no tokens were consumed. Copy-paste artefact.run_summary_failed.json:failed_steps: 0withconclusion: "failure"— a failure conclusion requires at least one failed step.
- Test coverage gap:
TestMonteCarloFixtureVariantsAreAvailableonly checksrun.conclusionfor the failed fixture, so thefailed_stepsinconsistency goes undetected. - AWF drift pre-check brittleness:
gh apicalls for the upstreamgh-aw-firewallcontent have no fallback — a transient network error or permission gap will abort the entire pre-agent step. - Drift output completeness:
awf-config-drift.jsononly records metadata counts, not comparable content, so downstream agents cannot detect actual schema drift from it.
Positive Highlights
- ✅ Clean fixture structure and consistent use of
0.0.0-testversion for test artifacts - ✅ Good use of
require.Trueguards before type assertions — prevents panics on bad fixture shape - ✅ The GH_TOKEN guard for the AWF drift section is a sensible no-op fallback for environments without the token
- ✅ Using
run_idvalues with sequential offsets (12345679, 12345680, 12345681) makes fixtures easy to distinguish in logs
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 77.3 AIC · ⌖ 7.93 AIC · ⊞ 6.6K
| "total_response_bytes": 24680, | ||
| "cache_efficiency": 0.1778, | ||
| "total_effective_tokens": 0, | ||
| "total_aic": 0.0054, |
There was a problem hiding this comment.
[/tdd] Fixture data inconsistency: total_aic is 0.0054 but total_effective_tokens is 0 — AIC should be 0 when no effective tokens were consumed.
Given the formula aic = effective_tokens × 0.000001 (confirmed in run_summary_failed.json where 5400 ET → 0.0054 AIC), a zero-ET fixture must have total_aic: 0. This looks like a copy-paste artefact from the non-zero fixtures.
💡 Suggested fix
"total_effective_tokens": 0,
"total_aic": 0,@copilot please address this.
| }, | ||
| "metrics": { | ||
| "total_steps": 5, | ||
| "failed_steps": 0, |
There was a problem hiding this comment.
[/tdd] Fixture semantics mismatch: failed_steps: 0 contradicts conclusion: "failure". A run that concluded with failure should have at least one failed step — otherwise the fixture misrepresents a real failure scenario and tests depending on it may pass for the wrong reasons.
💡 Suggested fix
"metrics": {
"total_steps": 5,
"failed_steps": 1,
"tool_calls": 12
},@copilot please address this.
| fixture := loadFixture(t, "run_summary_failed.json") | ||
| run, ok := fixture["run"].(map[string]any) | ||
| require.True(t, ok) | ||
| assert.Equal(t, "failure", run["conclusion"]) |
There was a problem hiding this comment.
[/tdd] Test only validates conclusion — the fixture inconsistency (failed_steps: 0 with conclusion: "failure") is not caught because failed_steps is never asserted. A spec-compliant test should verify the internal consistency of the fixture data it relies on.
💡 Suggested additional assertion
run, ok := fixture["run"].(map[string]any)
require.True(t, ok)
assert.Equal(t, "failure", run["conclusion"])
metrics, ok := fixture["metrics"].(map[string]any)
require.True(t, ok)
failedSteps, ok := metrics["failed_steps"].(float64)
require.True(t, ok)
assert.Greater(t, failedSteps, 0.0, "failed run should have at least one failed step")@copilot please address this.
|
|
||
| echo "=== AWF config source drift pre-check (gh-aw-firewall) ===" | ||
| if [ -n "${GH_TOKEN:-${GITHUB_TOKEN:-}}" ]; then | ||
| gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json \ |
There was a problem hiding this comment.
[/zoom-out] The gh api calls for the AWF config files have no error resilience — under set -e a permissions error or transient network failure will abort the entire pre-agent step, blocking the checker run. This is a hard dependency on gh-aw-firewall read access.
Consider wrapping the block (or individual calls) so failures degrade gracefully instead of failing the whole step:
💡 Suggested approach
if gh api /repos/github/gh-aw-firewall/contents/docs/awf-config.schema.json --jq '.content' | base64 -d > /tmp/gh-aw/agent/awf-config.schema.json 2>/dev/null; then
# ... continue with fetch + analysis
else
echo "⚠️ Unable to fetch awf-config.schema.json from gh-aw-firewall — skipping drift pre-check"
fiOr add a top-level || true guard on the entire conditional block so a fetch failure is non-fatal.
@copilot please address this.
| canonical_schema: $schema_path, | ||
| top_level_property_count: ($top_level_count | tonumber), | ||
| ghaw_reference_sample_count: ($refs_sample_count | tonumber) | ||
| }' > /tmp/gh-aw/agent/awf-config-drift.json |
There was a problem hiding this comment.
[/grill-with-docs] The output of this step (awf-config-drift.json) records only metadata (counts and file paths) — it does not contain any content that would allow the downstream agent to detect actual schema drift. The name "drift pre-check" implies comparison, but no diff or checksum of local vs. upstream content is produced.
If the intent is to give the agent material to reason about drift, consider including content hashes or the key lists extracted from both local and upstream schemas so real divergence is detectable.
💡 Suggested enhancement
Add the actual top-level keys from both the upstream schema and the local schema to the output:
LOCAL_KEYS=cache
check-for-updates
checkout
concurrency
container
description
emoji
engine
env
environment
experiments
features
github-app
if
import-schema
imports
inlined-imports
jobs
labels
lsp
max-ai-credits
max-daily-ai-credits
max-runs
max-tool-denials
max-turn-cache-misses
max-turns
mcp-scripts
mcp-servers
metadata
models
name
network
observability
on
permissions
post-steps
pre-agent-steps
pre-steps
private
redirect
resources
run-name
runs-on
runs-on-slim
runtimes
safe-outputs
sandbox
secret-masking
secrets
services
source
steps
strict
timeout-minutes
tools
tracker-id
user-rate-limit
jq -n --arg local_keys "$LOCAL_KEYS" --arg top_level_count "..." '{
source_repo: "github/gh-aw-firewall",
upstream_top_level_keys: ($upstream_keys | split("\n") | map(select(. != ""))),
local_top_level_keys: ($local_keys | split("\n") | map(select(. != "")))
}' > /tmp/gh-aw/agent/awf-config-drift.jsonThis gives the agent a concrete list to diff rather than just counts.
@copilot please address this.
|
Caution agentic threat detected DetailsPotential security threats were detected in the agent output. Review the workflow run logs for details. test connection
|
There was a problem hiding this comment.
❌ Test Quality Sentinel: 50/100. 100% of new tests are classified as low-value implementation tests, exceeding the 30% threshold. All three subtests in TestMonteCarloFixtureVariantsAreAvailable verify fixture JSON structure only — no production function is exercised. Please review the flagged tests in the PR comment above and improve their behavioral coverage. Additionally, 7 testify assertion calls are missing required descriptive message arguments.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Hey
If you'd like to have the agent handle the split, here's a ready-to-use prompt: Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "patchdiff.githubusercontent.com"See Network Configuration for more information.
|
|
@copilot run pr-finisher skill |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Daily SPDD review identified three missing forecast compliance fixtures (blocking T-FC-022, T-FC-035, T-ET-006) and a gap in AWF config-source drift coverage in
schema-consistency-checker. This PR adds the missing fixture artifacts, marks them as available in spec docs, and extends the workflow pre-agent analysis to fetch and compare canonical AWF config sources.Forecast compliance fixture coverage
specs/forecast-compliance-fixtures/run_summary_zero_et.jsonspecs/forecast-compliance-fixtures/run_summary_failed.jsonspecs/forecast-compliance-fixtures/run_summary_high_et.jsonSpec fixture registry update
specs/forecast-compliance-fixtures/README.mdfrom recommended to available fixtures for:Test-level fixture binding
TestMonteCarloFixtureVariantsAreAvailableinpkg/cli/forecast_montecarlo_test.goto ensure these fixtures are loaded and validated by expected fields.AWF config source drift check wiring
.github/workflows/schema-consistency-checker.mdpre-agent steps to:docs/awf-config.schema.jsonanddocs/awf-config-spec.mdfromgithub/gh-aw-firewall/tmp/gh-aw/agent/.github/workflows/schema-consistency-checker.lock.yml