Skip to content

Strengthen anomaly detector edge coverage and remove order-dependent test behavior#41916

Merged
pelikhan merged 5 commits into
mainfrom
copilot/testify-expert-improve-test-quality
Jun 27, 2026
Merged

Strengthen anomaly detector edge coverage and remove order-dependent test behavior#41916
pelikhan merged 5 commits into
mainfrom
copilot/testify-expert-improve-test-quality

Conversation

Copilot AI commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

pkg/agentdrain/anomaly_test.go was missing key boundary coverage around rareThreshold and had an order-dependent TestAnalyzeEvent structure that could hide failures under subtest shuffling. This update closes those gaps and pins the documented Analyze invariants.

  • Rare-cluster boundary coverage

    • Added TestAnomalyDetector_Analyze rows for:
      • cluster.Size == rareThreshold
      • cluster.Size == rareThreshold + 1
    • This explicitly protects the inclusive predicate cluster.Size <= rareThreshold.
  • Zero-threshold edge behavior

    • Added rareThreshold=0 cases for cluster.Size=0 and cluster.Size=1.
    • Confirms behavior is correct at the lower bound, not just constructor acceptance.
  • Deterministic AnalyzeEvent progression

    • Refactored TestAnalyzeEvent from shared-state subtests into a single sequential flow:
      1. first occurrence (new template)
      2. second identical occurrence (existing template)
      3. distinct event (new template)
    • Removes implicit t.Run ordering dependency.
  • Invariant pinning for Analyze flags and score

    • Added TestAnalyze_FlagMutualExclusivity to assert:
      • IsNewTemplate and LowSimilarity are never both true
      • AnomalyScore remains within [0,1]
      • nil-cluster behavior is explicit (RareCluster == false)
report := d.Analyze(tt.result, tt.isNew, tt.cluster)
assert.False(t, report.IsNewTemplate && report.LowSimilarity)
assert.GreaterOrEqual(t, report.AnomalyScore, 0.0)
assert.LessOrEqual(t, report.AnomalyScore, 1.0)

pr-sous-chef: requested branch refresh via update_branch.

Generated by 👨‍🍳 PR Sous Chef · 83 AIC · ⌖ 0.986 AIC · ⊞ 17.1K ·

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Improve test quality for anomaly_test.go Strengthen anomaly detector edge coverage and remove order-dependent test behavior Jun 27, 2026
Copilot AI requested a review from pelikhan June 27, 2026 18:49
@pelikhan pelikhan marked this pull request as ready for review June 27, 2026 18:50
Copilot AI review requested due to automatic review settings June 27, 2026 18:50
@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens test coverage for the pkg/agentdrain anomaly detection logic by adding explicit boundary/edge cases for rareThreshold and making TestAnalyzeEvent deterministic by removing order-dependent subtests.

Changes:

  • Added table-driven Analyze test cases covering cluster.Size == rareThreshold, cluster.Size == rareThreshold+1, and rareThreshold=0 boundaries.
  • Refactored TestAnalyzeEvent into a single sequential progression to avoid relying on t.Run ordering.
  • Added an invariant test (TestAnalyze_FlagMutualExclusivity) to pin mutual exclusivity of flags and ensure AnomalyScore stays within [0,1], including explicit nil-cluster behavior.
Show a summary per file
File Description
pkg/agentdrain/anomaly_test.go Adds boundary/edge coverage for Analyze, removes order-dependent subtests in AnalyzeEvent, and pins key Analyze invariants.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 1/1 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd — good foundational coverage additions, with a few gaps worth addressing.

📋 Key Themes & Highlights

Key Themes

  • MatchResult assertions missing: resultFirst/Second/Distinct are nil-checked but their fields (ClusterID, Similarity) are never verified — a regression in match data would be invisible.
  • TestAnalyze_FlagMutualExclusivity is incomplete as a spec: no wantIsNew field, IsNewTemplate is only tested indirectly, and score assertions are too loose ([0,1] rather than exact values).
  • t.Run removal tradeoff: the flat sequential structure makes the state dependency explicit (good) but loses per-step test filtering and failure attribution. The stated reason ("subtest shuffling") only applies with go test -shuffle=on, which is non-default.

Positive Highlights

  • ✅ Excellent boundary commentary in each new table-driven case (e.g. // 2 <= 2 is true, // score = 0.3/2.0 = 0.15) — makes the predicate immediately verifiable.
  • ✅ Correct and complete rareThreshold boundary coverage: size == threshold, size == threshold+1, threshold=0 with both size=0 and size=1.
  • TestAnalyze_FlagMutualExclusivity is a valuable invariant test that didn't exist before; the nil-cluster case is especially useful.
  • TestBuildReason coverage of the "all flags set" case is well-commented re: mutual exclusivity.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 104.1 AIC · ⌖ 7.51 AIC · ⊞ 6.6K

// Step 1: first occurrence trains the template and is flagged as new.
resultFirst, reportFirst, errFirst := m.AnalyzeEvent(evtPlan)
require.NoError(t, errFirst, "AnalyzeEvent should not fail for first event")
require.NotNil(t, resultFirst, "AnalyzeEvent should return a non-nil result")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] resultFirst (and resultSecond, resultDistinct) are only checked for nil — the actual MatchResult content (ClusterID, Similarity, template tokens) is never asserted, leaving those fields unspecified.

💡 Suggested enhancement

Pinning result fields alongside report fields fully documents the expected state after each step:

require.NotNil(t, resultFirst)
assert.Equal(t, int64(1), resultFirst.ClusterID, "should match cluster 1")
assert.InDelta(t, 1.0, resultFirst.Similarity, 1e-9, "first event matches its own new template exactly")

This catches regressions where Analyze returns the correct report but AnalyzeEvent silently returns mismatched match data.

@copilot please address this.

assert.InDelta(t, 0.65, report.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event")
assert.Equal(t, "new log template discovered; rare cluster (few observations)", report.Reason, "Reason mismatch for first event")
})
// Step 1: first occurrence trains the template and is flagged as new.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Removing t.Run loses the ability to run individual steps with go test -run TestAnalyzeEvent/step_name and weakens failure attribution in test output — a failure at Step 2 now silently skips Step 3 without naming the step.

💡 Tradeoffs and alternatives

The PR description frames this as preventing "subtest shuffling," but go test -shuffle=on is not the default — the real problem was shared miner state across subtests.

If the sequential coupling is intentional, consider a clarifying comment:

// NOTE: Steps share miner state intentionally — do not extract into parallel subtests
// or run with -shuffle=on.

Alternatively, keep t.Run but document the dependency:

t.Run("first occurrence is flagged as new template", func(t *testing.T) {
    // miner state starts empty
    result, report, err := m.AnalyzeEvent(evtPlan)
    ...
})
// Step 2 relies on Step 1 having trained the template
t.Run("second identical occurrence is not new", func(t *testing.T) {
    ...
})

This preserves per-step filtering (-run TestAnalyzeEvent/first) and makes the ordering contract visible.

@copilot please address this.

result *MatchResult
cluster *Cluster
wantLow bool
wantRare bool

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The test struct omits a wantIsNew bool field — IsNewTemplate is only verified indirectly via the mutual-exclusivity assert on line 460, not directly per test case.

💡 Suggested enhancement

Add wantIsNew bool to the struct and assert it explicitly in the loop, making the table fully self-documenting:

tests := []struct {
    name      string
    isNew     bool
    result    *MatchResult
    cluster   *Cluster
    wantIsNew bool  // add this
    wantLow   bool
    wantRare  bool
}{
    {
        name:      "new template remains exclusive from low similarity",
        isNew:     true,
        ...
        wantIsNew: true,
        wantLow:   false,
        wantRare:  true,
    },
    ...
}
// in loop:
assert.Equal(t, tt.wantIsNew, report.IsNewTemplate, "IsNewTemplate mismatch")

This catches regressions in the isNewIsNewTemplate pass-through independently of the mutual-exclusivity check.

@copilot please address this.

assert.Equal(t, tt.wantLow, report.LowSimilarity, "LowSimilarity mismatch")
assert.Equal(t, tt.wantRare, report.RareCluster, "RareCluster mismatch")
assert.False(t, report.IsNewTemplate && report.LowSimilarity, "IsNewTemplate and LowSimilarity must be mutually exclusive")
assert.GreaterOrEqual(t, report.AnomalyScore, 0.0, "AnomalyScore must stay within [0,1]")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The score assertions only check bounds [0, 1] — a silent change to the scoring weights would pass these tests without detection.

💡 Suggested enhancement

Add wantScore float64 per test case and use assert.InDelta (consistent with TestAnomalyDetector_Analyze):

// "new template remains exclusive from low similarity":
// isNew=true + rare (size 1 <= rareThreshold 2) → score = (1.0+0.3)/2.0 = 0.65
wantScore: 0.65,

// "existing template can have low similarity":
// lowSim + rare → score = (0.7+0.3)/2.0 = 0.50
wantScore: 0.50,

// "nil cluster, high similarity":
// no flags → score = 0.0
wantScore: 0.0,

Then in the loop:

assert.InDelta(t, tt.wantScore, report.AnomalyScore, 1e-9, "AnomalyScore mismatch")

@copilot please address this.

{
// rareThreshold=0 boundary: size=0 satisfies 0 <= 0.
// score = (rare-cluster weight 0.3) / maxScore 2.0 = 0.15
name: "zero rare threshold marks zero-sized cluster as rare",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The Size: 0 cluster case is a useful defensive test, but it leaves readers wondering whether a zero-sized cluster is a reachable production state or purely hypothetical.

💡 Suggestion

A brief comment clarifies the intent, e.g.:

// rareThreshold=0 boundary: size=0 satisfies 0 <= 0.
// Note: a cluster with Size=0 is not a normal production state (clusters grow when events are
// assigned to them), but the predicate is evaluated regardless — this test pins the boundary
// arithmetic, not production reachability.
name: "zero rare threshold marks zero-sized cluster as rare",

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: test brittleness in the refactored TestAnalyzeEvent flow

The boundary-coverage additions to TestAnomalyDetector_Analyze and the new TestAnalyze_FlagMutualExclusivity invariant test are solid additions. The TestAnalyzeEvent refactoring itself is directionally correct, but introduces two maintainability gaps documented in the inline comments.

Summary of findings

Medium — hardcoded score values couple to scoring-weight constants (line 393)
All three InDelta assertions bake in the current formula constants (1.0/0.3/2.0) without documenting the math. The table-driven tests above do document it inline — inconsistent standard.

Medium — step 2 score silently depends on DefaultConfig().RareClusterThreshold == 2 (line 402)
After the second AnalyzeEvent, the cluster has Size=2. Asserting AnomalyScore == 0.15 requires 2 <= rareThreshold, but rareThreshold comes opaquely from DefaultConfig(). Changing that default breaks the test with a confusing numeric mismatch rather than an obvious config-change signal.

Neither finding is a behavioral regression — documentation/maintainability concerns only.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • patchdiff.githubusercontent.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "patchdiff.githubusercontent.com"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer · 99.1 AIC · ⌖ 7.25 AIC · ⊞ 5.2K

require.NotNil(t, resultFirst, "AnalyzeEvent should return a non-nil result")
require.NotNil(t, reportFirst, "AnalyzeEvent should return a non-nil report")
assert.True(t, reportFirst.IsNewTemplate, "IsNewTemplate mismatch for first event")
assert.InDelta(t, 0.65, reportFirst.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded scores white-box couple to scoring weights and DefaultConfig() internals — all three steps are fragile.

All three InDelta assertions (0.65, 0.15, 0.65) encode the production formula (weight_IsNew=1.0, weight_Rare=0.3) / maxScore=2.0 directly. If anyone retunes those weights, every assertion here fails with an opaque numeric mismatch that offers no diagnostic signal.

The table-driven tests above this function document their expected scores inline (e.g. // score = (1.0 + 0.3) / 2.0 = 0.65). This sequential flow has none of that context.

💡 Suggested fix

Add inline math comments tying each expected value to the formula:

// score = (weight_IsNewTemplate=1.0 + weight_RareCluster=0.3) / maxScore=2.0
assert.InDelta(t, 0.65, reportFirst.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event")

When weights change, the reader immediately knows which constant to update rather than having to reverse-engineer 0.65.

require.NotNil(t, resultSecond, "AnalyzeEvent should return a non-nil result")
require.NotNil(t, reportSecond, "AnalyzeEvent should return a non-nil report")
assert.False(t, reportSecond.IsNewTemplate, "IsNewTemplate mismatch for second identical event")
assert.InDelta(t, 0.15, reportSecond.AnomalyScore, 1e-9, "AnomalyScore mismatch for second identical event")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 2's score (0.15) silently depends on DefaultConfig().RareClusterThreshold == 2 — an undocumented load-bearing assumption.

After the first AnalyzeEvent(evtPlan) call the cluster has Size=1; after the second it becomes Size=2. The assertion AnomalyScore == 0.15 (= RareCluster weight 0.3 / maxScore 2.0) requires cluster.Size(2) <= rareThreshold(2) to be true. That's currently satisfied because DefaultConfig().RareClusterThreshold = 2, but the test has zero documentation of this dependency.

If someone changes RareClusterThreshold to 1 (a perfectly plausible 'tighten rare detection' config change), the condition becomes 2 <= 1 = false, RareCluster flips to false, and the test fails with expected 0.15 got 0.0 — a confusing numeric mismatch with no obvious explanation.

💡 Suggested fix

Verify the config value you're relying on, and document why:

// DefaultConfig().RareClusterThreshold == 2, so Size=2 (after second insert) is still rare.
// score = weight_RareCluster=0.3 / maxScore=2.0 = 0.15
assert.InDelta(t, 0.15, reportSecond.AnomalyScore, 1e-9, "AnomalyScore mismatch for second identical event")

Or, if the intent is to test flag transitions rather than exact scoring, drop the numeric assertion and assert only RareCluster == true and IsNewTemplate == false.

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 7 test scenario(s) across 1 file: 7 design, 0 implementation, 0 guideline violation(s). TestAnalyzeEvent was refactored (order-dependency fix) with no new scenarios.

📊 Metrics & Test Classification (7 tests analyzed)
Metric Value
New/modified tests analyzed 7
✅ Design tests (behavioral contracts) 7 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 7 (100%)
Duplicate test clusters 0
Test inflation detected No
🚨 Coding-guideline violations 0 (Go mock libraries / missing build tags / no assertion messages)
Test File Classification Issues Detected
TestAnomalyDetector_Analyze — "cluster size exactly at rare threshold is rare" pkg/agentdrain/anomaly_test.go:~70 ✅ Design
TestAnomalyDetector_Analyze — "cluster size just above rare threshold is not rare" pkg/agentdrain/anomaly_test.go:~83 ✅ Design
TestAnomalyDetector_Analyze — "zero rare threshold marks zero-sized cluster as rare" pkg/agentdrain/anomaly_test.go:~96 ✅ Design
TestAnomalyDetector_Analyze — "zero rare threshold does not mark size one cluster as rare" pkg/agentdrain/anomaly_test.go:~109 ✅ Design
TestAnalyze_FlagMutualExclusivity — "new template remains exclusive from low similarity" pkg/agentdrain/anomaly_test.go:~428 ✅ Design
TestAnalyze_FlagMutualExclusivity — "existing template can have low similarity" pkg/agentdrain/anomaly_test.go:~436 ✅ Design
TestAnalyze_FlagMutualExclusivity — "existing template with high similarity and nil cluster" pkg/agentdrain/anomaly_test.go:~444 ✅ Design

Go: 1 (*_test.go); JavaScript: 0. Other languages detected but not scored.

Notes on TestAnalyzeEvent refactor: The 3 t.Run sub-tests were replaced with sequential inline assertions to eliminate order-dependent test behavior (sub-tests could previously be run in isolation without shared state, silently producing misleading pass results). No new scenarios were added; this is counted as a quality improvement.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 7 new scenarios are behavioral contracts: 4 cover precise boundary conditions on the rareThreshold ≤ comparison (exact, off-by-one, and zero-value edges), and 3 enforce the IsNewTemplate/LowSimilarity mutual-exclusivity invariant including a nil-cluster edge case. Every assertion carries a descriptive message; no mock libraries are used; //go:build !integration is present.

🧪 Test quality analysis by Test Quality Sentinel · 115.3 AIC · ⌖ 23.2 AIC · ⊞ 8.4K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 7 new scenarios enforce behavioral contracts with full edge case coverage.

@gh-aw-bot

Copy link
Copy Markdown
Collaborator

@copilot please run the pr-finisher skill, update this branch from main (it is behind by 1 commit), and rerun the checks. If any unresolved review feedback remains, please address it in the same pass.

Generated by 👨‍🍳 PR Sous Chef · 83 AIC · ⌖ 0.986 AIC · ⊞ 17.1K ·

Copilot AI and others added 2 commits June 27, 2026 19:26
…testify-expert-improve-test-quality

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants