wrapped functions default export support #1441
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review Summary
Prek Checks
Status: Fixed (pending push)
Issues found and fixed:
All prek checks now pass.
Mypy: 121 errors found across changed files, but these are pre-existing type issues (e.g.,
Code Review
No critical issues found. This PR does two things:
Minor note:
Test Coverage
Note: 8 pre-existing test failures in
Last updated: 2026-02-10T21:00Z
if args_node:
    for child in args_node.children:
        if child.type == "identifier":
            identifiers.append(self.get_node_text(child, source_bytes))
⚡️Codeflash found 24% (0.24x) speedup for TreeSitterAnalyzer._extract_call_expression_identifiers in codeflash/languages/javascript/treesitter.py
⏱️ Runtime : 195 microseconds → 157 microseconds (best of 165 runs)
📝 Explanation and details
The optimized code achieves a 24% speedup (from 195μs to 157μs) by eliminating function call overhead in a hot loop.
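The headline percentage can be reproduced from the two reported runtimes. The sketch below assumes that "speedup" means old runtime divided by new runtime, minus one; that definition is an assumption here, not something the report states. It also prints the fraction of wall-clock time saved, which is a different and smaller number.

```python
# Reported best-of-165 runtimes in microseconds, taken from the report above.
original_us = 195.0
optimized_us = 157.0

# Assumed definition of speedup: how much faster the new code is relative to itself.
speedup = original_us / optimized_us - 1.0

# Alternative measure: the share of the original runtime that was removed.
time_saved = 1.0 - optimized_us / original_us

print(f"speedup:    {speedup:.1%}")     # prints "speedup:    24.2%"
print(f"time saved: {time_saved:.1%}")  # prints "time saved: 19.5%"
```

Because the inputs are rounded to whole microseconds, the result can differ by about a percentage point from a figure computed on the unrounded timings.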
Key Optimization:
The critical change is inlining the text extraction operation for identifier nodes. Instead of calling self.get_node_text(child, source_bytes) for each identifier, the optimized version directly performs:
source_bytes[child.start_byte : child.end_byte].decode("utf8")
Why This Improves Runtime:
The line profiler reveals that in the original code, the identifiers.append(self.get_node_text(...)) line consumed 86.3% of total execution time (2.25ms out of 2.61ms). That line executes 1,008 times per test run, so its small per-call costs accumulate:
- Method call overhead: Each self.get_node_text() invocation adds function call stack setup/teardown
- Attribute lookup: Accessing self.get_node_text requires traversing the instance's method resolution order
- Parameter passing: Copying child and source_bytes references to the new stack frame
By inlining, the optimized version reduces this hot path from 2.25ms to just 507μs (77% reduction), directly accounting for the overall speedup.
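The two variants can be compared side by side in a minimal sketch. FakeNode and Extractor are hypothetical names used only for illustration, and the body of get_node_text is assumed to be a plain slice-and-decode, as the explanation above implies; the real TreeSitterAnalyzer may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a tree-sitter node; only type and byte offsets are used."""
    type: str
    start_byte: int
    end_byte: int


class Extractor:
    """Illustrative class, not the real TreeSitterAnalyzer."""

    def get_node_text(self, node: FakeNode, source_bytes: bytes) -> str:
        # Assumed body: slice the source buffer and decode it.
        return source_bytes[node.start_byte : node.end_byte].decode("utf8")

    def via_method(self, children, source_bytes):
        # Original shape: one bound-method call per identifier.
        out = []
        for child in children:
            if child.type == "identifier":
                out.append(self.get_node_text(child, source_bytes))
        return out

    def inlined(self, children, source_bytes):
        # Optimized shape: the same slice-and-decode written directly in the loop.
        out = []
        for child in children:
            if child.type == "identifier":
                out.append(source_bytes[child.start_byte : child.end_byte].decode("utf8"))
        return out


src = b"compose(fn1, fn2)"
children = [FakeNode("identifier", 8, 11), FakeNode("identifier", 13, 16)]
ex = Extractor()
assert ex.via_method(children, src) == ex.inlined(children, src) == ["fn1", "fn2"]
```

The trade-off is that the inlined loop no longer picks up behavior from a subclass that overrides get_node_text, so the two forms are only interchangeable while the helper stays a plain slice-and-decode.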
Test Case Performance:
The optimization shows particularly strong results for workloads with many identifiers:
- Large-scale extraction (1000 identifiers): 25.6% faster (180μs → 143μs)
- Special character identifiers: 15.7% faster
- Single identifier: 12.3% faster
- Edge cases (no arguments, non-identifiers): Minimal overhead, maintaining correctness
The get_node_text() method is preserved for potential use elsewhere in the codebase, but is bypassed in this performance-critical loop where the same operation can be performed inline without abstraction cost.
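A rough way to check figures like these locally is a timeit micro-benchmark. The sketch below is a simplified harness under stated assumptions: Node, Analyzer, and build are hypothetical names, list comprehensions replace the original loop, and the absolute timings will vary by machine and Python version, so no expected numbers are shown.

```python
import timeit


class Node:
    """Hypothetical node carrying only the byte offsets needed for slicing."""
    __slots__ = ("start_byte", "end_byte")

    def __init__(self, start_byte, end_byte):
        self.start_byte = start_byte
        self.end_byte = end_byte


class Analyzer:
    """Illustrative class, not the real TreeSitterAnalyzer."""

    def get_node_text(self, node, source_bytes):
        return source_bytes[node.start_byte : node.end_byte].decode("utf8")

    def with_call(self, nodes, source_bytes):
        return [self.get_node_text(n, source_bytes) for n in nodes]

    def inlined(self, nodes, source_bytes):
        return [source_bytes[n.start_byte : n.end_byte].decode("utf8") for n in nodes]


def build(count):
    """Build a source like b"f0,f1,...", plus one Node per identifier."""
    names = [f"f{i}" for i in range(count)]
    src = ",".join(names).encode("utf8")
    nodes, offset = [], 0
    for name in names:
        end = offset + len(name)  # names are ASCII, so len() equals the byte length
        nodes.append(Node(offset, end))
        offset = end + 1  # skip the comma separator
    return src, nodes, names


src, nodes, names = build(1000)
a = Analyzer()
assert a.with_call(nodes, src) == a.inlined(nodes, src) == names

# Take the minimum over several repeats to reduce scheduling noise.
t_call = min(timeit.repeat(lambda: a.with_call(nodes, src), number=200, repeat=5))
t_inline = min(timeit.repeat(lambda: a.inlined(nodes, src), number=200, repeat=5))
print(f"with method call: {t_call:.4f}s  inlined: {t_inline:.4f}s")
```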
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import pytest # used for our unit tests
from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer
# function to test
# We will create a minimal Node-like structure compatible with the attributes and methods
# used by TreeSitterAnalyzer._extract_call_expression_identifiers. We intentionally
# avoid using external parsing libraries to keep tests deterministic and focused.
class _DummyNode:
    """
    Minimal compatible stand-in for tree-sitter Node for testing purposes.
    NOTE: The real analyzer expects a Node with:
    - .type (str)
    - .children (list of nodes)
    - .start_byte (int), .end_byte (int) for slicing source bytes
    - .child_by_field_name(name) -> node or None
    We provide these attributes so the method under test can operate normally.
    """

    def __init__(self, type_, start_byte=0, end_byte=0, children=None, arguments_node=None):
        self.type = type_
        self.start_byte = start_byte
        self.end_byte = end_byte
        # children should be a list of other _DummyNode instances
        self.children = children or []
        # arguments_node is returned when child_by_field_name("arguments") is called
        self._arguments_node = arguments_node

    def child_by_field_name(self, name: str):
        # Only "arguments" is used by the method under test
        if name == "arguments":
            return self._arguments_node
        return None
# Helper factory functions to build nodes used by tests
def make_identifier_node(source_bytes: bytes, start: int, end: int):
    """Create an identifier node that slices source_bytes[start:end]."""
    return _DummyNode("identifier", start_byte=start, end_byte=end, children=[])

def make_arguments_node(children):
    """Create an arguments node that contains a list of child nodes."""
    return _DummyNode("arguments", children=children)

def make_call_node(arguments_node: _DummyNode):
    """Create a call_expression node whose 'arguments' field returns arguments_node."""
    # start/end bytes on call node are irrelevant for the extraction logic
    return _DummyNode("call_expression", children=[arguments_node], arguments_node=arguments_node)
# Create an analyzer instance without invoking __init__ to avoid requiring TreeSitterLanguage.
# This is acceptable because the method under test does not depend on instance initialization
# other than bound methods (get_node_text & _extract_call_expression_identifiers) existing.
analyzer = object.__new__(TreeSitterAnalyzer)
def test_single_identifier_argument_basic():
    # Basic case: curry(traverseEntity) -> should extract ["traverseEntity"]
    src = b"curry(traverseEntity)"
    # locate the identifier substring
    start = src.index(b"traverseEntity")
    end = start + len(b"traverseEntity")
    ident_node = make_identifier_node(src, start, end)  # identifier node for traverseEntity
    args = make_arguments_node([ident_node])  # arguments node wrapping the identifier
    call = make_call_node(args)  # top-level call_expression node
    # Call the method under test and verify the single identifier is extracted
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.64μs -> 1.46μs (12.3% faster)
def test_multiple_identifier_arguments_basic():
    # Basic case: compose(fn1, fn2) -> should extract ["fn1", "fn2"]
    src = b"compose(fn1, fn2)"
    # find positions of fn1 and fn2
    start1 = src.index(b"fn1")
    end1 = start1 + len(b"fn1")
    start2 = src.index(b"fn2")
    end2 = start2 + len(b"fn2")
    ident1 = make_identifier_node(src, start1, end1)
    ident2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([ident1, ident2])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.80μs -> 1.71μs (5.25% faster)
def test_nested_call_expression_recursion():
    # Nested case: compose(curry(fn)) -> should extract ["fn"] by recursing into nested call_expression
    src = b"compose(curry(fn))"
    # locate fn
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)
    # inner curry(...) arguments node contains fn identifier
    inner_args = make_arguments_node([fn_node])
    inner_call = _DummyNode("call_expression", children=[inner_args], arguments_node=inner_args)
    # outer compose(...) arguments node contains the inner call expression node
    outer_args = make_arguments_node([inner_call])
    outer_call = make_call_node(outer_args)
    codeflash_output = analyzer._extract_call_expression_identifiers(outer_call, src); result = codeflash_output # 1.96μs -> 1.90μs (3.21% faster)
def test_no_arguments_returns_empty_list():
    # If the call node has no 'arguments' field (child_by_field_name returns None), result should be []
    src = b"noArgsCall()"
    # create a call node that returns None for arguments
    call = _DummyNode("call_expression", children=[], arguments_node=None)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 641ns -> 651ns (1.54% slower)
def test_non_identifier_arguments_are_ignored():
    # Arguments that are not identifiers (e.g., numeric literals) should be ignored
    src = b"call(42, 'string', { obj: 1 })"
    # create dummy children with types that are not "identifier"
    num_node = _DummyNode("number", start_byte=5, end_byte=7)  # "42"
    str_node = _DummyNode("string", start_byte=9, end_byte=17)  # "'string'"
    obj_node = _DummyNode("object", start_byte=19, end_byte=len(src))
    args = make_arguments_node([num_node, str_node, obj_node])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 981ns -> 932ns (5.26% faster)
def test_special_character_identifiers():
    # Identifiers may include characters like '$' and '_' commonly used in JS
    src = b"compose($fn, _fn)"
    start1 = src.index(b"$fn")
    end1 = start1 + len(b"$fn")
    start2 = src.index(b"_fn")
    end2 = start2 + len(b"_fn")
    id1 = make_identifier_node(src, start1, end1)
    id2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([id1, id2])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.77μs -> 1.53μs (15.7% faster)
def test_empty_source_bytes_for_identifier():
    # If source_bytes is empty but nodes have start/end 0, the extracted identifier is an empty string
    # This tests boundary behavior of get_node_text slicing an empty buffer
    src = b""
    ident_node = make_identifier_node(src, 0, 0)  # zero-length slice
    args = make_arguments_node([ident_node])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.34μs -> 1.25μs (7.27% faster)
def test_large_number_of_identifier_arguments_performance_and_correctness():
    # Large-scale test: create 1000 identifier arguments and ensure all are extracted in order
    count = 1000
    # Build a source like "f0,f1,f2,...,f999" to easily compute offsets
    identifiers = [f"f{i}" for i in range(count)]
    # Construct source bytes with comma separators
    src_str = ",".join(identifiers)
    src = src_str.encode("utf8")
    # Build identifier nodes with correct start/end positions
    children = []
    offset = 0
    for i, ident in enumerate(identifiers):
        b = ident.encode("utf8")
        start = offset
        end = start + len(b)
        children.append(make_identifier_node(src, start, end))
        # advance offset past the identifier and the comma (1 byte) except after the last
        offset = end + 1
    args = make_arguments_node(children)
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 180μs -> 143μs (25.6% faster)
def test_deeply_nested_multiple_levels():
    # Build nested calls like a(b(c(d(e(fn))))) and ensure the identifier is still found.
    src = b"a(b(c(d(e(fn)))))"
    # locate "fn"
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)
    # Build inner-most call
    args_inner = make_arguments_node([fn_node])
    call_inner = _DummyNode("call_expression", children=[args_inner], arguments_node=args_inner)
    # Wrap with additional nested call_expression nodes multiple times
    level = call_inner
    nesting = 10  # modest depth to test recursion without hitting recursion limits
    for _ in range(nesting):
        args = make_arguments_node([level])
        level = _DummyNode("call_expression", children=[args], arguments_node=args)
    # Top-level call: pass into extractor
    top_call = level
    codeflash_output = analyzer._extract_call_expression_identifiers(top_call, src); result = codeflash_output # 4.24μs -> 4.04μs (4.98% faster)
    # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To test or edit this optimization locally, run: git merge codeflash/optimize-pr1441-2026-02-10T21.37.43
- identifiers.append(self.get_node_text(child, source_bytes))
+ identifiers.append(source_bytes[child.start_byte : child.end_byte].decode("utf8"))
No description provided.