Strengthen anomaly detector edge coverage and remove order-dependent test behavior#41916
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Pull request overview
This PR strengthens test coverage for the pkg/agentdrain anomaly detection logic by adding explicit boundary/edge cases for rareThreshold and making TestAnalyzeEvent deterministic by removing order-dependent subtests.
Changes:
- Added table-driven
Analyzetest cases coveringcluster.Size == rareThreshold,cluster.Size == rareThreshold+1, andrareThreshold=0boundaries. - Refactored
TestAnalyzeEventinto a single sequential progression to avoid relying ont.Runordering. - Added an invariant test (
TestAnalyze_FlagMutualExclusivity) to pin mutual exclusivity of flags and ensureAnomalyScorestays within[0,1], including explicit nil-cluster behavior.
Show a summary per file
| File | Description |
|---|---|
| pkg/agentdrain/anomaly_test.go | Adds boundary/edge coverage for Analyze, removes order-dependent subtests in AnalyzeEvent, and pins key Analyze invariants. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 0
- Review effort level: Low
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd — good foundational coverage additions, with a few gaps worth addressing.
📋 Key Themes & Highlights
Key Themes
- MatchResult assertions missing:
resultFirst/Second/Distinctare nil-checked but their fields (ClusterID, Similarity) are never verified — a regression in match data would be invisible. TestAnalyze_FlagMutualExclusivityis incomplete as a spec: nowantIsNewfield,IsNewTemplateis only tested indirectly, and score assertions are too loose ([0,1]rather than exact values).t.Runremoval tradeoff: the flat sequential structure makes the state dependency explicit (good) but loses per-step test filtering and failure attribution. The stated reason ("subtest shuffling") only applies withgo test -shuffle=on, which is non-default.
Positive Highlights
- ✅ Excellent boundary commentary in each new table-driven case (e.g.
// 2 <= 2 is true,// score = 0.3/2.0 = 0.15) — makes the predicate immediately verifiable. - ✅ Correct and complete
rareThresholdboundary coverage:size == threshold,size == threshold+1,threshold=0with bothsize=0andsize=1. - ✅
TestAnalyze_FlagMutualExclusivityis a valuable invariant test that didn't exist before; the nil-cluster case is especially useful. - ✅
TestBuildReasoncoverage of the "all flags set" case is well-commented re: mutual exclusivity.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 104.1 AIC · ⌖ 7.51 AIC · ⊞ 6.6K
| // Step 1: first occurrence trains the template and is flagged as new. | ||
| resultFirst, reportFirst, errFirst := m.AnalyzeEvent(evtPlan) | ||
| require.NoError(t, errFirst, "AnalyzeEvent should not fail for first event") | ||
| require.NotNil(t, resultFirst, "AnalyzeEvent should return a non-nil result") |
There was a problem hiding this comment.
[/tdd] resultFirst (and resultSecond, resultDistinct) are only checked for nil — the actual MatchResult content (ClusterID, Similarity, template tokens) is never asserted, leaving those fields unspecified.
💡 Suggested enhancement
Pinning result fields alongside report fields fully documents the expected state after each step:
require.NotNil(t, resultFirst)
assert.Equal(t, int64(1), resultFirst.ClusterID, "should match cluster 1")
assert.InDelta(t, 1.0, resultFirst.Similarity, 1e-9, "first event matches its own new template exactly")This catches regressions where Analyze returns the correct report but AnalyzeEvent silently returns mismatched match data.
@copilot please address this.
| assert.InDelta(t, 0.65, report.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event") | ||
| assert.Equal(t, "new log template discovered; rare cluster (few observations)", report.Reason, "Reason mismatch for first event") | ||
| }) | ||
| // Step 1: first occurrence trains the template and is flagged as new. |
There was a problem hiding this comment.
[/tdd] Removing t.Run loses the ability to run individual steps with go test -run TestAnalyzeEvent/step_name and weakens failure attribution in test output — a failure at Step 2 now silently skips Step 3 without naming the step.
💡 Tradeoffs and alternatives
The PR description frames this as preventing "subtest shuffling," but go test -shuffle=on is not the default — the real problem was shared miner state across subtests.
If the sequential coupling is intentional, consider a clarifying comment:
// NOTE: Steps share miner state intentionally — do not extract into parallel subtests
// or run with -shuffle=on.Alternatively, keep t.Run but document the dependency:
t.Run("first occurrence is flagged as new template", func(t *testing.T) {
// miner state starts empty
result, report, err := m.AnalyzeEvent(evtPlan)
...
})
// Step 2 relies on Step 1 having trained the template
t.Run("second identical occurrence is not new", func(t *testing.T) {
...
})This preserves per-step filtering (-run TestAnalyzeEvent/first) and makes the ordering contract visible.
@copilot please address this.
| result *MatchResult | ||
| cluster *Cluster | ||
| wantLow bool | ||
| wantRare bool |
There was a problem hiding this comment.
[/tdd] The test struct omits a wantIsNew bool field — IsNewTemplate is only verified indirectly via the mutual-exclusivity assert on line 460, not directly per test case.
💡 Suggested enhancement
Add wantIsNew bool to the struct and assert it explicitly in the loop, making the table fully self-documenting:
tests := []struct {
name string
isNew bool
result *MatchResult
cluster *Cluster
wantIsNew bool // add this
wantLow bool
wantRare bool
}{
{
name: "new template remains exclusive from low similarity",
isNew: true,
...
wantIsNew: true,
wantLow: false,
wantRare: true,
},
...
}
// in loop:
assert.Equal(t, tt.wantIsNew, report.IsNewTemplate, "IsNewTemplate mismatch")This catches regressions in the isNew → IsNewTemplate pass-through independently of the mutual-exclusivity check.
@copilot please address this.
| assert.Equal(t, tt.wantLow, report.LowSimilarity, "LowSimilarity mismatch") | ||
| assert.Equal(t, tt.wantRare, report.RareCluster, "RareCluster mismatch") | ||
| assert.False(t, report.IsNewTemplate && report.LowSimilarity, "IsNewTemplate and LowSimilarity must be mutually exclusive") | ||
| assert.GreaterOrEqual(t, report.AnomalyScore, 0.0, "AnomalyScore must stay within [0,1]") |
There was a problem hiding this comment.
[/tdd] The score assertions only check bounds [0, 1] — a silent change to the scoring weights would pass these tests without detection.
💡 Suggested enhancement
Add wantScore float64 per test case and use assert.InDelta (consistent with TestAnomalyDetector_Analyze):
// "new template remains exclusive from low similarity":
// isNew=true + rare (size 1 <= rareThreshold 2) → score = (1.0+0.3)/2.0 = 0.65
wantScore: 0.65,
// "existing template can have low similarity":
// lowSim + rare → score = (0.7+0.3)/2.0 = 0.50
wantScore: 0.50,
// "nil cluster, high similarity":
// no flags → score = 0.0
wantScore: 0.0,Then in the loop:
assert.InDelta(t, tt.wantScore, report.AnomalyScore, 1e-9, "AnomalyScore mismatch")@copilot please address this.
| { | ||
| // rareThreshold=0 boundary: size=0 satisfies 0 <= 0. | ||
| // score = (rare-cluster weight 0.3) / maxScore 2.0 = 0.15 | ||
| name: "zero rare threshold marks zero-sized cluster as rare", |
There was a problem hiding this comment.
[/tdd] The Size: 0 cluster case is a useful defensive test, but it leaves readers wondering whether a zero-sized cluster is a reachable production state or purely hypothetical.
💡 Suggestion
A brief comment clarifies the intent, e.g.:
// rareThreshold=0 boundary: size=0 satisfies 0 <= 0.
// Note: a cluster with Size=0 is not a normal production state (clusters grow when events are
// assigned to them), but the predicate is evaluated regardless — this test pins the boundary
// arithmetic, not production reachability.
name: "zero rare threshold marks zero-sized cluster as rare",@copilot please address this.
There was a problem hiding this comment.
Non-blocking: test brittleness in the refactored TestAnalyzeEvent flow
The boundary-coverage additions to TestAnomalyDetector_Analyze and the new TestAnalyze_FlagMutualExclusivity invariant test are solid additions. The TestAnalyzeEvent refactoring itself is directionally correct, but introduces two maintainability gaps documented in the inline comments.
Summary of findings
Medium — hardcoded score values couple to scoring-weight constants (line 393)
All three InDelta assertions bake in the current formula constants (1.0/0.3/2.0) without documenting the math. The table-driven tests above do document it inline — inconsistent standard.
Medium — step 2 score silently depends on DefaultConfig().RareClusterThreshold == 2 (line 402)
After the second AnalyzeEvent, the cluster has Size=2. Asserting AnomalyScore == 0.15 requires 2 <= rareThreshold, but rareThreshold comes opaquely from DefaultConfig(). Changing that default breaks the test with a confusing numeric mismatch rather than an obvious config-change signal.
Neither finding is a behavioral regression — documentation/maintainability concerns only.
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
patchdiff.githubusercontent.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "patchdiff.githubusercontent.com"See Network Configuration for more information.
🔎 Code quality review by PR Code Quality Reviewer · 99.1 AIC · ⌖ 7.25 AIC · ⊞ 5.2K
| require.NotNil(t, resultFirst, "AnalyzeEvent should return a non-nil result") | ||
| require.NotNil(t, reportFirst, "AnalyzeEvent should return a non-nil report") | ||
| assert.True(t, reportFirst.IsNewTemplate, "IsNewTemplate mismatch for first event") | ||
| assert.InDelta(t, 0.65, reportFirst.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event") |
There was a problem hiding this comment.
Hardcoded scores white-box couple to scoring weights and DefaultConfig() internals — all three steps are fragile.
All three InDelta assertions (0.65, 0.15, 0.65) encode the production formula (weight_IsNew=1.0, weight_Rare=0.3) / maxScore=2.0 directly. If anyone retunes those weights, every assertion here fails with an opaque numeric mismatch that offers no diagnostic signal.
The table-driven tests above this function document their expected scores inline (e.g. // score = (1.0 + 0.3) / 2.0 = 0.65). This sequential flow has none of that context.
💡 Suggested fix
Add inline math comments tying each expected value to the formula:
// score = (weight_IsNewTemplate=1.0 + weight_RareCluster=0.3) / maxScore=2.0
assert.InDelta(t, 0.65, reportFirst.AnomalyScore, 1e-9, "AnomalyScore mismatch for first event")When weights change, the reader immediately knows which constant to update rather than having to reverse-engineer 0.65.
| require.NotNil(t, resultSecond, "AnalyzeEvent should return a non-nil result") | ||
| require.NotNil(t, reportSecond, "AnalyzeEvent should return a non-nil report") | ||
| assert.False(t, reportSecond.IsNewTemplate, "IsNewTemplate mismatch for second identical event") | ||
| assert.InDelta(t, 0.15, reportSecond.AnomalyScore, 1e-9, "AnomalyScore mismatch for second identical event") |
There was a problem hiding this comment.
Step 2's score (0.15) silently depends on DefaultConfig().RareClusterThreshold == 2 — an undocumented load-bearing assumption.
After the first AnalyzeEvent(evtPlan) call the cluster has Size=1; after the second it becomes Size=2. The assertion AnomalyScore == 0.15 (= RareCluster weight 0.3 / maxScore 2.0) requires cluster.Size(2) <= rareThreshold(2) to be true. That's currently satisfied because DefaultConfig().RareClusterThreshold = 2, but the test has zero documentation of this dependency.
If someone changes RareClusterThreshold to 1 (a perfectly plausible 'tighten rare detection' config change), the condition becomes 2 <= 1 = false, RareCluster flips to false, and the test fails with expected 0.15 got 0.0 — a confusing numeric mismatch with no obvious explanation.
💡 Suggested fix
Verify the config value you're relying on, and document why:
// DefaultConfig().RareClusterThreshold == 2, so Size=2 (after second insert) is still rare.
// score = weight_RareCluster=0.3 / maxScore=2.0 = 0.15
assert.InDelta(t, 0.15, reportSecond.AnomalyScore, 1e-9, "AnomalyScore mismatch for second identical event")Or, if the intent is to test flag transitions rather than exact scoring, drop the numeric assertion and assert only RareCluster == true and IsNewTemplate == false.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 100/100 — Excellent
📊 Metrics & Test Classification (7 tests analyzed)
Go: 1 ( Notes on Verdict
|
|
@copilot please run the
|
…testify-expert-improve-test-quality Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
pkg/agentdrain/anomaly_test.gowas missing key boundary coverage aroundrareThresholdand had an order-dependentTestAnalyzeEventstructure that could hide failures under subtest shuffling. This update closes those gaps and pins the documentedAnalyzeinvariants.Rare-cluster boundary coverage
TestAnomalyDetector_Analyzerows for:cluster.Size == rareThresholdcluster.Size == rareThreshold + 1cluster.Size <= rareThreshold.Zero-threshold edge behavior
rareThreshold=0cases forcluster.Size=0andcluster.Size=1.Deterministic
AnalyzeEventprogressionTestAnalyzeEventfrom shared-state subtests into a single sequential flow:t.Runordering dependency.Invariant pinning for
Analyzeflags and scoreTestAnalyze_FlagMutualExclusivityto assert:IsNewTemplateandLowSimilarityare never both trueAnomalyScoreremains within[0,1]RareCluster == false)pr-sous-chef: requested branch refresh via update_branch.