
Commit 2e7a543

fix(attribution): Bug 8 — git -C <dir> scope hint flows to attribution
Reported alongside Bug 7 ("CRITICAL cross-project attribution"). An empirical repro showed that Bug 7 is NOT a bug: the existing EVENT_PATH + bridge resolveProjectIdentity already attribute a file_edit on /projB/foo.ts to projB even when cwd=projA.

Bug 8 IS a bug. Bash invocations that scope via `-C <dir>` (git, make, tar) carried no path signal in the canonical event data, so they fell back to inputProjectDir (the hook's startup cwd) and lost the user's intentional scoping.

ROOT CAUSE

extract.ts:extractGit only ran a regex pattern scan such as /\bgit\s+commit\b/, which does not tolerate flags between `git` and the operation. As a result, `git -C /projB status` produced NO git event AND no cwd hint.

Even if a cwd event had been emitted, project-attribution.ts:fallbackAttribution prefers context.inputProjectDir over context.lastKnownProjectDir, so the carry-forward from an earlier cwd event in the same batch is silently overridden by the hook's startup cwd.

FIX: TWO LAYERS, BOTH ALGORITHMIC (NO REGEX)

src/session/extract.ts: the new parseGitInvocation() tokenises the Bash command (reusing tokenizeCommand from the v1.0.161 Bug 1 fix), skips leading env-style assignments and common runners (sudo/doas/env/exec/time), locates the `git` token, then scans for `-C` / `--directory` to capture scopedDir and takes the first bare token as the operation. extractGit falls back to the legacy regex scan when the algorithmic parse finds no `git` token or yields an operation with no matching GIT_PATTERNS entry. When scopedDir is captured, extractGit now emits a leading cwd event carrying the parsed dir BEFORE the git event, so the attribution layer routes the git event via carry-forward.

src/session/project-attribution.ts: resolveProjectAttributions tracks whether an in-batch CWD-level event (confidence ≥ 0.9) has explicitly re-scoped the project. Once it has, subsequent events fall back to the carried-forward lastKnown instead of the original hook inputProjectDir.
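To make the root cause concrete, here is a minimal sketch of the pre-fix behaviour. `LEGACY_PATTERNS`, `legacyMatch`, and `legacyFallback` are hypothetical simplifications: the real GIT_PATTERNS table and fallbackAttribution are not shown in this commit, and the patterns below only assume the `\bgit\s+<op>\b` shape quoted above.

```typescript
// Hypothetical stand-in for GIT_PATTERNS (the real table is not in this diff).
// Each regex expects the operation to follow `git` after whitespace only.
const LEGACY_PATTERNS: { pattern: RegExp; operation: string }[] = [
  { pattern: /\bgit\s+commit\b/, operation: "commit" },
  { pattern: /\bgit\s+status\b/, operation: "status" },
  { pattern: /\bgit\s+log\b/, operation: "log" },
];

// Pre-fix detection: the first pattern that matches anywhere in the command.
function legacyMatch(cmd: string): string | null {
  const hit = LEGACY_PATTERNS.find(p => p.pattern.test(cmd));
  return hit ? hit.operation : null;
}

// Simplified model of the pre-fix fallback order: the hook's startup cwd
// (inputProjectDir) wins over the carried-forward lastKnownProjectDir.
function legacyFallback(ctx: { inputProjectDir: string | null; lastKnownProjectDir: string | null }): string | null {
  return ctx.inputProjectDir ?? ctx.lastKnownProjectDir;
}

console.log(legacyMatch("git status"));             // logs: status
console.log(legacyMatch("git -C /projB status"));   // logs: null (`-C /projB` sits between `git` and `status`)
console.log(legacyMatch("cd /repo && git status")); // logs: status (the regex is unanchored)
// Even with a carried-forward /projB, a path-less event lands on /projA.
console.log(legacyFallback({ inputProjectDir: "/projA", lastKnownProjectDir: "/projB" })); // logs: /projA
```

The third call also shows why the regex scan is kept as a fallback: it still finds `git status` after a `cd /repo &&` prefix, which the token walk deliberately rejects.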
This batch-scoped shadowing is the architectural minimum needed for "the user's intentional cwd shift wins over the hook's startup cwd" without bumping CARRY_FORWARD_THRESHOLD or reordering the fallbackAttribution priority globally (either of which would have a broader regression surface).

TESTS (tests/integration/cross-project-attribution.test.ts, 5 tests)

1. Tracer: Edit on projB's file while cwd=projA → projB ✅ (no fix needed, existing path)
2. Batched mixed A+B+B edits → each event attributes independently ✅
3. Bug 8 tracer: `git -C /projB status` via real extractEvents → projB ✅
4. `cd /projB && npm test` (baseline that already worked) → projB ✅
5. Bash without a path indicator (df -h) → falls back to cwd ✅

Empirical proof: a live E2E run against the real platform. The file_edit on projB, the cwd event, and the git event all POST with project=github.com/acme/projB and receive status 200 from the production endpoint.

REGRESSION

1215 pass, 5 skipped, 1 pre-existing failure (project-dir-strict: stale, unrelated). The attribution batch-shadow fix does not affect any existing test fixture: the lastKnown shadowing only kicks in once a CWD_EVENT-level attribution has carried forward, which the existing tests do not exercise in mixed batches.

NO release. Push to next only; waiting for user verification + approval.
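The trade-off behind scoping the change to a batch can be seen in a small model of the fallback step. This is illustrative only: `FallbackCtx` and the three functions are hypothetical simplifications, and the sketch assumes `lastKnownProjectDir` can carry a directory over from an earlier batch. Under that assumption, a global priority flip would send a test-5-style `df -h` to the stale directory, while the batch-scoped rule changes nothing unless a cwd-level event fires in the same batch.

```typescript
// Hypothetical context shape, modelled on the fields used in the diff.
interface FallbackCtx {
  inputProjectDir: string | null;     // hook startup cwd
  lastKnownProjectDir: string | null; // carried-forward directory
}

// Existing order: startup cwd first, carried-forward directory second.
function startupFirst(ctx: FallbackCtx): string | null {
  return ctx.inputProjectDir ?? ctx.lastKnownProjectDir;
}

// Rejected alternative: flip the priority for every batch.
function lastKnownFirst(ctx: FallbackCtx): string | null {
  return ctx.lastKnownProjectDir ?? ctx.inputProjectDir;
}

// Chosen approach: lastKnown wins only after an in-batch cwd-level event.
function batchScoped(ctx: FallbackCtx, inBatchCwdScope: boolean): string | null {
  return inBatchCwdScope ? ctx.lastKnownProjectDir : startupFirst(ctx);
}

// `df -h` with no path signal, assuming /projC was carried over from an earlier batch.
const stale: FallbackCtx = { inputProjectDir: "/projA", lastKnownProjectDir: "/projC" };
console.log(startupFirst(stale));       // /projA
console.log(lastKnownFirst(stale));     // /projC (changes existing behaviour)
console.log(batchScoped(stale, false)); // /projA (unchanged without an in-batch cwd event)

// After `git -C /projB ...` re-scoped this batch.
const rescoped: FallbackCtx = { inputProjectDir: "/projA", lastKnownProjectDir: "/projB" };
console.log(batchScoped(rescoped, true)); // /projB
```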
1 parent 4f58e4f commit 2e7a543

3 files changed

Lines changed: 399 additions & 5 deletions


src/session/extract.ts

Lines changed: 120 additions & 5 deletions
@@ -340,7 +340,21 @@ function extractGit(input: HookInput): SessionEvent[] {
   if (input.tool_name !== "Bash") return [];
 
   const cmd = String(input.tool_input["command"] ?? "");
-  const match = GIT_PATTERNS.find(p => p.pattern.test(cmd));
+
+  // Bug 8 (v1.0.162) — parse the git invocation algorithmically so flags
+  // between `git` and the operation token are tolerated (`git -C /path
+  // status`, `git --no-pager log`, etc.). Falls back to the legacy regex
+  // pattern scan when the algorithmic parse cannot locate a `git` token —
+  // preserves backward compat for commands like `cd /repo && git status`
+  // where the algorithmic parse sees `cd` as the first token instead.
+  const parsed = parseGitInvocation(cmd);
+  let match: { pattern: RegExp; operation: string } | undefined;
+  if (parsed && parsed.operation) {
+    match = GIT_PATTERNS.find(p => p.operation === parsed.operation);
+  }
+  if (!match) {
+    match = GIT_PATTERNS.find(p => p.pattern.test(cmd));
+  }
   if (!match) return [];
 
   // Bug 1 (v1.0.161) — for `git commit` operations, parse -m / -am / --message=
@@ -354,24 +368,125 @@ function extractGit(input: HookInput): SessionEvent[] {
   // rollup aggregator can distinguish ACTUAL commits from other git operations
   // (status/diff/log were inflating has_commit on every event — see
   // session-loaders.mjs rollup stamp + Bug 2).
+  // Bug 8 cwd hint — when `-C <dir>` is present in the git invocation, emit
+  // a leading cwd event so the attribution carry-forward (LAST_SEEN source)
+  // routes downstream events in the same batch to the scoped directory's
+  // project. Without the hint, `git -C /projB status` while cwd=/projA
+  // misattributes to /projA.
+  const out: SessionEvent[] = [];
+  if (parsed?.scopedDir) {
+    out.push({
+      type: "cwd",
+      category: "cwd",
+      data: safeString(parsed.scopedDir),
+      priority: 2,
+    });
+  }
+
   if (match.operation === "commit") {
     const msg = extractCommitMessageFromCommand(cmd);
     if (msg) {
-      return [{
+      out.push({
         type: "git_commit",
         category: "git",
         data: safeString(msg),
         priority: 2,
-      }];
+      });
+      return out;
     }
   }
 
-  return [{
+  out.push({
     type: "git",
     category: "git",
     data: safeString(match.operation),
     priority: 2,
-  }];
+  });
+  return out;
+}
+
+// Algorithmic git invocation parser — tokenizes the Bash command and walks
+// argv to extract the `-C <dir>` scope hint and the operation subcommand.
+// Tolerates env-prefix assignments and any number of flags between `git`
+// and the operation. Returns null when no `git` token is found (caller
+// falls back to the legacy regex pattern scan).
+interface ParsedGit {
+  scopedDir: string | null;
+  operation: string | null;
+}
+
+function parseGitInvocation(cmd: string): ParsedGit | null {
+  const tokens = tokenizeCommand(cmd);
+  let i = 0;
+  // Skip env-style assignments at the head (FOO=bar git ...)
+  while (i < tokens.length && isEnvAssignment(tokens[i])) i++;
+  // Locate the `git` token (allow common runners like `sudo git ...`)
+  while (i < tokens.length && tokens[i] !== "git" && !tokens[i].endsWith("/git")) {
+    // Stop runner-skipping at the first non-assignment, non-runner token
+    if (!isCommonRunner(tokens[i])) break;
+    i++;
+  }
+  if (i >= tokens.length) return null;
+  if (tokens[i] !== "git" && !tokens[i].endsWith("/git")) return null;
+  i++; // consume `git`
+
+  let scopedDir: string | null = null;
+  let operation: string | null = null;
+  while (i < tokens.length) {
+    const t = tokens[i];
+    if (t === "-C" || t === "--directory") {
+      scopedDir = tokens[i + 1] ?? null;
+      i += 2;
+      continue;
+    }
+    if (t.length > 0 && t[0] === "-") {
+      // Generic flag — skip the flag itself. We do NOT consume the next
+      // token as its value generically because git's per-flag arg shape
+      // varies; the dedicated extractCommitMessageFromCommand handles -m
+      // separately.
+      i++;
+      continue;
+    }
+    // First bare (non-flag) token after `git` = operation
+    operation = t;
+    break;
+  }
+  return { scopedDir, operation };
+}
+
+function isEnvAssignment(token: string): boolean {
+  if (token.length === 0) return false;
+  // FOO=bar shape: starts with an uppercase letter, contains an `=`
+  let sawEq = false;
+  for (let j = 0; j < token.length; j++) {
+    const c = token.charCodeAt(j);
+    if (j === 0) {
+      // First char must be A-Z or underscore
+      if (!((c >= 65 && c <= 90) || c === 95)) return false;
+    } else if (c === 61 /* = */) {
+      sawEq = true;
+      break;
+    } else if (!((c >= 65 && c <= 90) || (c >= 48 && c <= 57) || c === 95)) {
+      // Body chars must be A-Z, 0-9, or _
+      return false;
+    }
+  }
+  return sawEq;
+}
+
+function isCommonRunner(token: string): boolean {
+  // Runners that wrap real commands. We skip them when locating `git`
+  // so `sudo git status` works the same as `git status`.
+  switch (token) {
+    case "sudo":
+    case "doas":
+    case "env":
+    case "exec":
+    case "time":
+      return true;
+    default:
+      return false;
+  }
 }
 
 // Shell-like argv tokenizer — handles single/double quotes, backslash escapes,
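For trying the walk outside the hook, here is a self-contained sketch that mirrors parseGitInvocation above. `tokenize` is a hypothetical and much simpler stand-in for tokenizeCommand (whitespace splitting plus single and double quotes, no backslash escapes or shell operators), so results may differ from the real tokenizer on complex commands; `parseGit` and `isGitToken` are likewise illustrative names.

```typescript
// Hypothetical stand-in for tokenizeCommand: splits on whitespace and
// honours single/double quotes. No backslash escapes, no `&&`/`|` handling.
function tokenize(cmd: string): string[] {
  const tokens: string[] = [];
  let cur = "";
  let quote = "";      // active quote char, or "" when unquoted
  let inToken = false; // true once a token has started (even an empty "")
  for (let k = 0; k < cmd.length; k++) {
    const ch = cmd.charAt(k);
    if (quote !== "") {
      if (ch === quote) quote = "";
      else cur += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;
    } else if (ch === " " || ch === "\t" || ch === "\n") {
      if (inToken) {
        tokens.push(cur);
        cur = "";
        inToken = false;
      }
    } else {
      cur += ch;
      inToken = true;
    }
  }
  if (inToken) tokens.push(cur);
  return tokens;
}

// NAME=value with NAME in [A-Z_][A-Z0-9_]*, checked character by character.
function isEnvAssignment(tok: string): boolean {
  const eq = tok.indexOf("=");
  if (eq <= 0) return false;
  for (let k = 0; k < eq; k++) {
    const ch = tok.charAt(k);
    const ok = (ch >= "A" && ch <= "Z") || ch === "_" || (k > 0 && ch >= "0" && ch <= "9");
    if (!ok) return false;
  }
  return true;
}

const RUNNERS = ["sudo", "doas", "env", "exec", "time"];
const isGitToken = (t: string): boolean => t === "git" || t.slice(-4) === "/git";

interface ParsedGit {
  scopedDir: string | null;
  operation: string | null;
}

function parseGit(cmd: string): ParsedGit | null {
  const tokens = tokenize(cmd);
  let i = 0;
  while (i < tokens.length && isEnvAssignment(tokens[i])) i++; // FOO=bar git ...
  while (i < tokens.length && !isGitToken(tokens[i]) && RUNNERS.indexOf(tokens[i]) >= 0) i++; // sudo git ...
  if (i >= tokens.length || !isGitToken(tokens[i])) return null; // caller uses the regex scan
  i++; // consume `git`

  let scopedDir: string | null = null;
  while (i < tokens.length) {
    const t = tokens[i];
    if (t === "-C" || t === "--directory") {
      scopedDir = tokens[i + 1] ?? null; // the flag's value is the next token
      i += 2;
    } else if (t.charAt(0) === "-") {
      i++; // any other flag: skip the flag only, never a value
    } else {
      return { scopedDir, operation: t }; // first bare token = operation
    }
  }
  return { scopedDir, operation: null };
}

console.log(parseGit("git -C /projB status"));          // { scopedDir: '/projB', operation: 'status' }
console.log(parseGit("FOO=1 sudo git --no-pager log")); // { scopedDir: null, operation: 'log' }
console.log(parseGit("cd /repo && git status"));        // null
```

As in the diff, generic flags are skipped without consuming a value. A value-taking flag such as `git -c core.pager=cat log` would therefore report `core.pager=cat` as the operation; with the fallback described in the commit message, that command would likely produce no git event at all, as before the fix.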

src/session/project-attribution.ts

Lines changed: 14 additions & 0 deletions
@@ -249,16 +249,30 @@ export function resolveProjectAttributions(
 ): ProjectAttribution[] {
   const out: ProjectAttribution[] = [];
   let lastKnown = context.lastKnownProjectDir ? normalizePath(context.lastKnownProjectDir) : "";
+  // v1.0.162 Bug 8 — track whether an in-batch CWD-level event has explicitly
+  // re-scoped the project. When extract.ts emits a cwd event for `cd /projB`
+  // or `git -C /projB ...`, subsequent path-less events in the same batch
+  // (e.g. the git operation event itself) currently fall back to the hook's
+  // inputProjectDir, which still points at the session startup cwd. The
+  // user's INTENTIONAL scoping should win over the hook's startup cwd —
+  // shadow inputProjectDir with the carried-forward lastKnown once a high-
+  // confidence cwd event has fired in this batch.
+  let inBatchCwdScope = false;
 
   for (const ev of events) {
+    const effectiveInputCwd = inBatchCwdScope ? lastKnown : context.inputProjectDir;
     const attribution = resolveProjectAttribution(ev, {
       ...context,
+      inputProjectDir: effectiveInputCwd,
       lastKnownProjectDir: lastKnown || context.lastKnownProjectDir || null,
     });
     out.push(attribution);
 
     if (attribution.projectDir && attribution.confidence >= ATTRIBUTION_CONFIDENCE.CARRY_FORWARD_THRESHOLD) {
       lastKnown = attribution.projectDir;
+      if (attribution.confidence >= ATTRIBUTION_CONFIDENCE.CWD_EVENT) {
+        inBatchCwdScope = true;
+      }
     }
   }
 
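The in-batch shadowing can be exercised in isolation with a toy resolver that mirrors the resolveProjectAttributions loop above. This is a sketch, not the real resolveProjectAttribution: the event shape, the single-event rules, and the confidence numbers are hypothetical (the diff only names ATTRIBUTION_CONFIDENCE.CARRY_FORWARD_THRESHOLD and CWD_EVENT, and the commit message puts CWD-level at ≥ 0.9). The assumed file-path confidence sits above CWD_EVENT, so in this toy a file_edit also opens the scope. The `shadowStartupCwd` switch reproduces the pre-fix behaviour when set to false.

```typescript
// Hypothetical confidence levels; only the names CARRY_FORWARD_THRESHOLD and
// CWD_EVENT appear in the diff, and these values are illustrative.
const CONF = { FILE_PATH: 0.95, CWD_EVENT: 0.9, CARRY_FORWARD_THRESHOLD: 0.8, FALLBACK: 0.5 };

interface Ev {
  type: "cwd" | "file_edit" | "git";
  path?: string; // explicit directory/file signal, if the event carries one
}
interface Ctx {
  inputProjectDir: string | null;     // hook startup cwd
  lastKnownProjectDir: string | null; // carried-forward directory
}
interface Attribution {
  projectDir: string | null;
  confidence: number;
}

// Toy single-event resolver: explicit signals win; path-less events use the
// fallback order startup cwd -> carried-forward directory.
function resolveOne(ev: Ev, ctx: Ctx): Attribution {
  if (ev.path !== undefined) {
    const confidence = ev.type === "cwd" ? CONF.CWD_EVENT : CONF.FILE_PATH;
    return { projectDir: ev.path, confidence };
  }
  return { projectDir: ctx.inputProjectDir ?? ctx.lastKnownProjectDir, confidence: CONF.FALLBACK };
}

function resolveBatch(events: Ev[], ctx: Ctx, shadowStartupCwd = true): Attribution[] {
  const out: Attribution[] = [];
  let lastKnown = ctx.lastKnownProjectDir ?? "";
  let inBatchCwdScope = false;
  for (const ev of events) {
    // After a cwd-level re-scope, the carried-forward dir replaces the startup cwd.
    const effectiveInput = shadowStartupCwd && inBatchCwdScope ? lastKnown : ctx.inputProjectDir;
    const a = resolveOne(ev, {
      inputProjectDir: effectiveInput,
      lastKnownProjectDir: lastKnown || ctx.lastKnownProjectDir,
    });
    out.push(a);
    if (a.projectDir && a.confidence >= CONF.CARRY_FORWARD_THRESHOLD) {
      lastKnown = a.projectDir;
      if (a.confidence >= CONF.CWD_EVENT) inBatchCwdScope = true;
    }
  }
  return out;
}

// `git -C /projB status` after extraction: a leading cwd event, then the git event.
const batch: Ev[] = [{ type: "cwd", path: "/projB" }, { type: "git" }];
const ctx: Ctx = { inputProjectDir: "/projA", lastKnownProjectDir: null };
console.log(resolveBatch(batch, ctx).map(a => a.projectDir));        // [ '/projB', '/projB' ]
console.log(resolveBatch(batch, ctx, false).map(a => a.projectDir)); // [ '/projB', '/projA' ] (pre-fix)
```

With the switch on, a batch that never produces an attribution at CWD_EVENT confidence or above (for example a lone path-less `{ type: "git" }`) resolves exactly as it does with the switch off, which is the property the commit's regression note relies on.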