perf: optimize context extraction pipeline (~2x speedup) #1920
Eliminate redundant CST traversals in code context extraction by caching dependency data, skipping unnecessary transforms, and removing MetadataWrapper.
Claude finished @KRRT7's task in 3m 27s

PR Review Summary

Prek Checks
Clean. Mypy errors in

Code Review
Size: SMALL (44 additions / 19 deletions across 3 files)
No bugs, security issues, or breaking API changes found. The three optimizations are all correct:
1. The
   One minor note:
2. The check correctly short-circuits
   Minor:
3. Pre-compute
   Consistent with the pattern already used in the first loop (lines 273–274).

Duplicate Detection
No duplicates detected. The new helper functions (

Test Coverage
Skipped for SMALL PRs. PR author reports 226 context extraction + static analysis tests pass.

Open codeflash-ai[bot] PRs
Three open bot PRs (#1890, #1891, #1895) all target PR #1887's branch, not

Last updated: 2026-03-27T21:00 UTC
cst.Attribute branch was dead code since __future__ imports always use a plain Name node.
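The pre-check for aliased __future__ imports that this PR introduces can be pictured with the standard library's ast module. This is only a sketch under assumptions: the PR works on libcst nodes, whereas ast exposes the module of a from-import as a plain string, and transform_if_needed is a hypothetical wrapper, not a function from the codebase.

```python
import ast
from typing import Callable


def has_aliased_future_imports(tree: ast.Module) -> bool:
    """Return True if a top-level `from __future__ import x as y` exists."""
    # __future__ imports must sit at the top of a module, so scanning the
    # top-level statements is enough; no full-tree walk is needed.
    for stmt in tree.body:
        if (
            isinstance(stmt, ast.ImportFrom)
            and stmt.level == 0
            and stmt.module == "__future__"
            and any(alias.asname is not None for alias in stmt.names)
        ):
            return True
    return False


def transform_if_needed(
    source: str, transform: Callable[[ast.Module], ast.Module]
) -> ast.Module:
    """Run `transform` only when the cheap check says it has work to do."""
    tree = ast.parse(source)
    if not has_aliased_future_imports(tree):
        return tree  # common case: skip the expensive transform entirely
    return transform(tree)
```

The check costs one pass over the module's top-level statements, while the transform it guards would otherwise visit every node.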
Summary
- Pre-compute base_defs once per file in the second loop of extract_all_contexts_from_files, passing them to remove_unused_definitions_by_function_names to skip redundant CST traversals (1.83s → 0.01s)
- Skip FutureAliasedImportTransformer via a fast _has_aliased_future_imports check when no aliased __future__ imports exist (140ms → 0.02ms/call)
- Replace MetadataWrapper + ParentNodeProvider in DependencyCollector with lightweight visit_Attribute/leave_Attribute id-tracking, eliminating expensive metadata computation (5.7x per-call speedup)

Details
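The base_defs change follows a compute-once, pass-it-in pattern. A minimal sketch of that pattern, assuming hypothetical names throughout (collect_defs, prune, prune_many, and the CALLS counter are illustrative and do not appear in the PR):

```python
from typing import Dict, Iterable, List, Optional, Set

CALLS: Dict[str, int] = {"collect_defs": 0}  # counts expensive passes


def collect_defs(source: str) -> Set[str]:
    """Stand-in for an expensive whole-file traversal (hypothetical)."""
    CALLS["collect_defs"] += 1
    return {
        line[len("def "):].split("(")[0]
        for line in source.splitlines()
        if line.startswith("def ")
    }


def prune(
    source: str, wanted: Iterable[str], defs: Optional[Set[str]] = None
) -> Set[str]:
    """Return the wanted definitions, reusing `defs` when the caller has it."""
    if defs is None:
        defs = collect_defs(source)  # fallback: recompute for this call only
    return defs & set(wanted)


def prune_many(source: str, requests: List[List[str]]) -> List[Set[str]]:
    """Serve several requests against one file with a single traversal."""
    defs = collect_defs(source)  # computed once per file
    return [prune(source, wanted, defs) for wanted in requests]
```

Keeping the precomputed argument optional means existing callers that do not pass it still work, at the old cost.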
- extract_all_contexts_from_files runs two loops over helper files. The first loop already pre-computes base_defs and passes defs_with_usages to avoid re-traversal, but the second loop (HoH-only files) was calling remove_unused_definitions_by_function_names without it, forcing 5 redundant MetadataWrapper + DependencyCollector traversals.
- DependencyCollector required MetadataWrapper solely for ParentNodeProvider in one place: checking if a Name is the .attr part of an Attribute inside class bodies. An id-based set populated by visit_Attribute/leave_Attribute replaces this with zero metadata overhead.
- FutureAliasedImportTransformer traversed the full CST on every call even when no aliased __future__ imports existed (the common case). A fast O(imports) check short-circuits the traversal.

Benchmark
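The measurements in this section come from cProfile. As a rough illustration of that workflow, the sketch below profiles a single call and returns the ten most expensive entries by cumulative time. The workload function is a hypothetical stand-in for the real benchmark, and profile_call is an illustrative helper, not part of the PR.

```python
import cProfile
import io
import pstats
from typing import Callable


def workload() -> int:
    """Hypothetical stand-in for the benchmarked extraction call."""
    return sum(i * i for i in range(50_000))


def profile_call(func: Callable[[], object], top: int = 10) -> str:
    """Profile one call to `func` and return a cumulative-time report."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()


report = profile_call(workload)
```

Sorting by cumulative time surfaces the functions whose whole call tree is expensive, which is how a redundant traversal tends to show up.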
Profiled via cProfile on test_benchmark_extract (Python 3.13.7, macOS):
- remove_unused_definitions_by_function_names
- collect_top_level_defs_with_dependencies
- gather_source_imports
Remaining cost is dominated by libcst transform_module (43%) and Jedi inference (22%), both external library internals.

Test plan
- DependencyCollector produces identical results across 6 project files (352 definitions)
- uv run prek: clean
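The DependencyCollector change checked above swaps parent-node metadata for id tracking. The toy sketch below shows the idea on a hand-rolled tree: the Name and Attribute classes only imitate the shape of libcst's nodes, and NameCollector and collect_names are hypothetical names, so this is an illustration of the technique rather than the PR's code.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(eq=False)
class Name:
    value: str


@dataclass(eq=False)
class Attribute:
    value: Union["Attribute", Name]  # left-hand side, e.g. `self`
    attr: Name                       # right-hand side, e.g. `helper`


Node = Union[Attribute, Name]


class NameCollector:
    """Collects referenced names, skipping the `.attr` half of attributes."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self._attr_ids: Set[int] = set()

    def walk(self, node: Node) -> None:
        if isinstance(node, Attribute):
            self.visit_Attribute(node)
            self.walk(node.value)
            self.walk(node.attr)
            self.leave_Attribute(node)
        else:
            self.visit_Name(node)

    def visit_Attribute(self, node: Attribute) -> None:
        # Remember which Name object is the `.attr` child of this node.
        self._attr_ids.add(id(node.attr))

    def leave_Attribute(self, node: Attribute) -> None:
        # Drop the id once the subtree is done so the set stays small.
        self._attr_ids.discard(id(node.attr))

    def visit_Name(self, node: Name) -> None:
        # No parent lookup needed: membership in the id set answers
        # "am I the .attr part of an Attribute?".
        if id(node) not in self._attr_ids:
            self.names.append(node.value)


def collect_names(node: Node) -> List[str]:
    collector = NameCollector()
    collector.walk(node)
    return collector.names
```

For the expression self.cache.get, built as Attribute(Attribute(Name("self"), Name("cache")), Name("get")), only self is collected, because cache and get are both recorded as .attr children before they are visited.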