wrapped functions default export support by Saga4 · Pull Request #1441 · codeflash-ai/codeflash

Saga4 · 2026-02-10T20:34:52Z

No description provided.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-10T20:47:59Z

PR Review Summary

Prek Checks

Status: Fixed (pending push)

Issues found and fixed:

RUF059: Unused unpacked variable original_import → prefixed with _ (2 instances in find_references.py)
FURB171: Single-item tuple membership tests → converted to equality tests (in find_references.py and treesitter.py)
I001: Unsorted imports → auto-fixed (in import_resolver.py and support.py)
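
For illustration, the first two fixes look roughly like this. It is a minimal sketch with hypothetical names (`resolve_import`, `node_type`), not the actual code from find_references.py or treesitter.py.

```python
# Hypothetical stand-in: returns a pair, of which only the second value is used.
def resolve_import(name: str) -> tuple[str, str]:
    return (f"import {name}", f"./{name}.js")


# RUF059: the first unpacked value is never read, so it gets a "_" prefix.
_original_import, resolved_path = resolve_import("utils")

# FURB171: a membership test against a single-item tuple ...
node_type = "identifier"
before = node_type in ("identifier",)
# ... gives the same result as a plain equality test.
after = node_type == "identifier"
```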

All prek checks now pass.

Mypy: 121 errors found across changed files, but these are pre-existing type issues (e.g., str vs Path mismatches in functions_to_optimize.py, missing type parameters in support.py). These require logic changes/refactoring to fix and are not introduced by this PR.

Code Review

No critical issues found.

This PR does two things:

File rename: codeflash/languages/treesitter_utils.py → codeflash/languages/javascript/treesitter.py with all import paths updated across 14 files. All imports verified correct — no remaining references to the old path in Python files (2 stale references remain in MULTI_LANGUAGE_ARCHITECTURE.md documentation only).
New feature: Wrapped default exports — handles patterns like export default curry(traverseEntity) where a function is exported through a wrapper call. Implementation adds:
- wrapped_default_args field on ExportInfo dataclass
- _extract_call_expression_identifiers() for recursive AST extraction
- Check in is_function_exported() for wrapped args
- Comprehensive test class TestWrappedDefaultExports with 8 test cases covering curry, compose, nested wrappers, and the real-world Strapi pattern
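
A rough sketch of how the wrapped-args check could fit together. Only the `wrapped_default_args` field name and the `is_function_exported` name come from this PR; the other fields, the function signature, and the loop are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExportInfo:
    # Hypothetical, simplified shape; only wrapped_default_args is named in the review.
    default_export: str | None = None
    named_exports: list[str] = field(default_factory=list)
    wrapped_default_args: list[str] = field(default_factory=list)


def is_function_exported(name: str, exports: list[ExportInfo]) -> bool:
    """Return True if `name` is exported directly or through a wrapper call."""
    for export in exports:
        if name == export.default_export or name in export.named_exports:
            return True
        # Wrapped case: `export default curry(traverseEntity)` exposes traverseEntity.
        if name in export.wrapped_default_args:
            return True
    return False


# Models `export default curry(traverseEntity)`.
exports = [ExportInfo(wrapped_default_args=["traverseEntity"])]
wrapped_found = is_function_exported("traverseEntity", exports)
helper_found = is_function_exported("helper", exports)
```

Here `wrapped_found` is true because the name appears among the wrapper's arguments, while `helper_found` is false because `helper` is not exported in any form.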

Minor note: MULTI_LANGUAGE_ARCHITECTURE.md still references the old file path treesitter_utils.py in 2 places (lines 289, 683). Non-blocking documentation issue.

Test Coverage

File	PR	Main	Change
`javascript/treesitter.py` (was `treesitter_utils.py`)	93%	92%	+1%
`javascript/find_references.py`	86%	86%	0%
`javascript/support.py`	74%	74%	0%
`javascript/import_resolver.py`	72%	72%	0%
`javascript/instrument.py`	69%	69%	0%
`javascript/line_profiler.py`	77%	77%	0%
`code_utils/code_extractor.py`	68%	68%	0%
`code_utils/code_replacer.py`	83%	83%	0%
`code_utils/normalizers/javascript.py`	21%	21%	0%
`discovery/functions_to_optimize.py`	68%	68%	0%
Overall	79.39%	79.33%	+0.06%

New wrapped export feature is well-covered by 8 new test cases
No coverage regressions
Overall coverage slightly improved (+0.06%)

Note: 8 pre-existing test failures in test_tracer.py (unrelated to this PR, present on both main and PR branch)

Last updated: 2026-02-10T21:00Z

codeflash-ai · 2026-02-10T21:37:44Z

+        if args_node:
+            for child in args_node.children:
+                if child.type == "identifier":
+                    identifiers.append(self.get_node_text(child, source_bytes))


⚡️Codeflash found 24% (0.24x) speedup for TreeSitterAnalyzer._extract_call_expression_identifiers in codeflash/languages/javascript/treesitter.py

⏱️ Runtime : 195 microseconds → 157 microseconds (best of 165 runs)

📝 Explanation and details

The optimized code achieves a 24% speedup (195μs → 157μs, roughly a 19% reduction in runtime) by eliminating unnecessary function call overhead in a hot loop.

Key Optimization:

The critical change is inlining the text extraction operation for identifier nodes. Instead of calling self.get_node_text(child, source_bytes) for each identifier, the optimized version directly performs:

source_bytes[child.start_byte : child.end_byte].decode("utf8")
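
A small sketch of why the two forms are interchangeable, assuming `get_node_text` is a thin wrapper around the same slice-and-decode (its body is not shown in this PR). `FakeNode` and `Analyzer` are hypothetical stand-ins, not tree-sitter types.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    # Byte offsets into the source buffer, as on a tree-sitter node.
    start_byte: int
    end_byte: int


class Analyzer:
    def get_node_text(self, node: FakeNode, source_bytes: bytes) -> str:
        # Assumed body: slice the bytes, then decode.
        return source_bytes[node.start_byte : node.end_byte].decode("utf8")


source = b"export default curry(traverseEntity)"
start = source.index(b"traverseEntity")
node = FakeNode(start, start + len(b"traverseEntity"))

via_helper = Analyzer().get_node_text(node, source)
inlined = source[node.start_byte : node.end_byte].decode("utf8")
```

Because the offsets are byte offsets, the slice has to be taken on the bytes object before decoding; slicing an already decoded string would go wrong on multi-byte characters.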

Why This Improves Runtime:

The line profiler shows that in the original code, the identifiers.append(self.get_node_text(...)) line accounted for about 86% of total execution time (2.25ms of 2.61ms). The line runs 1,008 times per test run, so a small per-call overhead adds up:

Method call overhead: Each self.get_node_text() invocation adds function call stack setup/teardown

Attribute lookup: Resolving self.get_node_text on every iteration searches the instance and its class hierarchy and creates a new bound method object

Parameter passing: Copying child and source_bytes references to the new stack frame
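
The overhead described above can be measured with a micro-benchmark along these lines. It is a sketch under the same assumption about `get_node_text`; `Node` and `Analyzer` are hypothetical stand-ins, and absolute timings will vary by machine.

```python
import timeit


class Node:
    # Hypothetical stand-in exposing only the two byte offsets.
    __slots__ = ("start_byte", "end_byte")

    def __init__(self, start_byte: int, end_byte: int) -> None:
        self.start_byte = start_byte
        self.end_byte = end_byte


class Analyzer:
    def get_node_text(self, node, source_bytes):
        return source_bytes[node.start_byte : node.end_byte].decode("utf8")

    def extract_with_call(self, nodes, source_bytes):
        out = []
        for n in nodes:
            out.append(self.get_node_text(n, source_bytes))  # one extra call per node
        return out

    def extract_inlined(self, nodes, source_bytes):
        out = []
        for n in nodes:
            out.append(source_bytes[n.start_byte : n.end_byte].decode("utf8"))
        return out


# Build "f0,f1,...,f999" and one node per identifier.
source = ",".join(f"f{i}" for i in range(1000)).encode("utf8")
nodes = []
offset = 0
for name in source.split(b","):
    nodes.append(Node(offset, offset + len(name)))
    offset += len(name) + 1  # skip the comma

analyzer = Analyzer()
t_call = timeit.timeit(lambda: analyzer.extract_with_call(nodes, source), number=200)
t_inline = timeit.timeit(lambda: analyzer.extract_inlined(nodes, source), number=200)
print(f"with call: {t_call:.4f}s  inlined: {t_inline:.4f}s")
```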

By inlining, the optimized version reduces this hot path from 2.25ms to just 507μs (77% reduction), directly accounting for the overall speedup.
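
The percentages quoted in this report can be rechecked from the raw figures. The sketch below assumes the headline speedup is computed as old runtime divided by new runtime, minus one.

```python
# Figures quoted in the report.
runtime_before_us = 195
runtime_after_us = 157
hot_line_before_ms = 2.25
hot_line_after_ms = 0.507
total_before_ms = 2.61
calls = 1008

speedup = runtime_before_us / runtime_after_us - 1
runtime_reduction = 1 - runtime_after_us / runtime_before_us
hot_line_share = hot_line_before_ms / total_before_ms
hot_line_reduction = 1 - hot_line_after_ms / hot_line_before_ms
per_call_before_us = hot_line_before_ms * 1000 / calls

print(f"speedup:            {speedup:.1%}")             # 24.2%
print(f"runtime reduction:  {runtime_reduction:.1%}")   # 19.5%
print(f"hot line share:     {hot_line_share:.1%}")      # 86.2%
print(f"hot line reduction: {hot_line_reduction:.1%}")  # 77.5%
print(f"per call (before):  {per_call_before_us:.2f}μs")  # 2.23μs
```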

Test Case Performance:

The optimization shows particularly strong results for workloads with many identifiers:

Large-scale extraction (1000 identifiers): 25.6% faster (180μs → 143μs)

Special character identifiers: 15.7% faster

Single identifier: 12.3% faster

Edge cases (no arguments, non-identifier arguments): Within a few percent either way (1.54% slower to 5.26% faster), with unchanged results

The get_node_text() method is preserved for potential use elsewhere in the codebase, but is bypassed in this performance-critical loop where the same operation can be performed inline without abstraction cost.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 9 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

import pytest  # used for our unit tests
from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer  # function to test

# We will create a minimal Node-like structure compatible with the attributes and methods
# used by TreeSitterAnalyzer._extract_call_expression_identifiers. We intentionally
# avoid using external parsing libraries to keep tests deterministic and focused.


class _DummyNode:
    """
    Minimal compatible stand-in for tree-sitter Node for testing purposes.

    NOTE: The real analyzer expects a Node with:
      - .type (str)
      - .children (list of nodes)
      - .start_byte (int), .end_byte (int) for slicing source bytes
      - .child_by_field_name(name) -> node or None
    We provide these attributes so the method under test can operate normally.
    """

    def __init__(self, type_, start_byte=0, end_byte=0, children=None, arguments_node=None):
        self.type = type_
        self.start_byte = start_byte
        self.end_byte = end_byte
        # children should be a list of other _DummyNode instances
        self.children = children or []
        # arguments_node is returned when child_by_field_name("arguments") is called
        self._arguments_node = arguments_node

    def child_by_field_name(self, name: str):
        # Only "arguments" is used by the method under test
        if name == "arguments":
            return self._arguments_node
        return None


# Helper factory functions to build nodes used by tests
def make_identifier_node(source_bytes: bytes, start: int, end: int):
    """Create an identifier node that slices source_bytes[start:end]."""
    return _DummyNode("identifier", start_byte=start, end_byte=end, children=[])


def make_arguments_node(children):
    """Create an arguments node that contains a list of child nodes."""
    return _DummyNode("arguments", children=children)


def make_call_node(arguments_node: _DummyNode):
    """Create a call_expression node whose 'arguments' field returns arguments_node."""
    # start/end bytes on call node are irrelevant for the extraction logic
    return _DummyNode("call_expression", children=[arguments_node], arguments_node=arguments_node)


# Create an analyzer instance without invoking __init__ to avoid requiring TreeSitterLanguage.
# This is acceptable because the method under test does not depend on instance initialization
# other than bound methods (get_node_text & _extract_call_expression_identifiers) existing.
analyzer = object.__new__(TreeSitterAnalyzer)


def test_single_identifier_argument_basic():
    # Basic case: curry(traverseEntity) -> should extract ["traverseEntity"]
    src = b"curry(traverseEntity)"
    # locate the identifier substring
    start = src.index(b"traverseEntity")
    end = start + len(b"traverseEntity")
    ident_node = make_identifier_node(src, start, end)  # identifier node for traverseEntity
    args = make_arguments_node([ident_node])  # arguments node wrapping the identifier
    call = make_call_node(args)  # top-level call_expression node
    # Call the method under test and verify the single identifier is extracted
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.64μs -> 1.46μs (12.3% faster)


def test_multiple_identifier_arguments_basic():
    # Basic case: compose(fn1, fn2) -> should extract ["fn1", "fn2"]
    src = b"compose(fn1, fn2)"
    # find positions of fn1 and fn2
    start1 = src.index(b"fn1")
    end1 = start1 + len(b"fn1")
    start2 = src.index(b"fn2")
    end2 = start2 + len(b"fn2")
    ident1 = make_identifier_node(src, start1, end1)
    ident2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([ident1, ident2])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.80μs -> 1.71μs (5.25% faster)


def test_nested_call_expression_recursion():
    # Nested case: compose(curry(fn)) -> should extract ["fn"] by recursing into nested call_expression
    src = b"compose(curry(fn))"
    # locate fn
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)
    # inner curry(...) arguments node contains fn identifier
    inner_args = make_arguments_node([fn_node])
    inner_call = _DummyNode("call_expression", children=[inner_args], arguments_node=inner_args)
    # outer compose(...) arguments node contains the inner call expression node
    outer_args = make_arguments_node([inner_call])
    outer_call = make_call_node(outer_args)
    codeflash_output = analyzer._extract_call_expression_identifiers(outer_call, src); result = codeflash_output # 1.96μs -> 1.90μs (3.21% faster)


def test_no_arguments_returns_empty_list():
    # If the call node has no 'arguments' field (child_by_field_name returns None), result should be []
    src = b"noArgsCall()"
    # create a call node that returns None for arguments
    call = _DummyNode("call_expression", children=[], arguments_node=None)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 641ns -> 651ns (1.54% slower)


def test_non_identifier_arguments_are_ignored():
    # Arguments that are not identifiers (e.g., numeric literals) should be ignored
    src = b"call(42, 'string', { obj: 1 })"
    # create dummy children with types that are not "identifier"
    num_node = _DummyNode("number", start_byte=5, end_byte=7)  # "42"
    str_node = _DummyNode("string", start_byte=9, end_byte=17)  # "'string'"
    obj_node = _DummyNode("object", start_byte=19, end_byte=len(src))
    args = make_arguments_node([num_node, str_node, obj_node])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 981ns -> 932ns (5.26% faster)


def test_special_character_identifiers():
    # Identifiers may include characters like '$' and '_' commonly used in JS
    src = b"compose($fn, _fn)"
    start1 = src.index(b"$fn")
    end1 = start1 + len(b"$fn")
    start2 = src.index(b"_fn")
    end2 = start2 + len(b"_fn")
    id1 = make_identifier_node(src, start1, end1)
    id2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([id1, id2])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.77μs -> 1.53μs (15.7% faster)


def test_empty_source_bytes_for_identifier():
    # If source_bytes is empty but nodes have start/end 0, the extracted identifier is an empty string
    # This tests boundary behavior of get_node_text slicing an empty buffer
    src = b""
    ident_node = make_identifier_node(src, 0, 0)  # zero-length slice
    args = make_arguments_node([ident_node])
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.34μs -> 1.25μs (7.27% faster)


def test_large_number_of_identifier_arguments_performance_and_correctness():
    # Large-scale test: create 1000 identifier arguments and ensure all are extracted in order
    count = 1000
    # Build a source like "f0,f1,f2,...,f999" to easily compute offsets
    identifiers = [f"f{i}" for i in range(count)]
    # Construct source bytes with comma separators
    src_str = ",".join(identifiers)
    src = src_str.encode("utf8")
    # Build identifier nodes with correct start/end positions
    children = []
    offset = 0
    for i, ident in enumerate(identifiers):
        b = ident.encode("utf8")
        start = offset
        end = start + len(b)
        children.append(make_identifier_node(src, start, end))
        # advance offset past the identifier and the comma (1 byte) except after the last
        offset = end + 1
    args = make_arguments_node(children)
    call = make_call_node(args)
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 180μs -> 143μs (25.6% faster)


def test_deeply_nested_multiple_levels():
    # Build nested calls like a(b(c(d(e(fn))))) and ensure the identifier is still found.
    src = b"a(b(c(d(e(fn)))))"
    # locate "fn"
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)
    # Build inner-most call
    args_inner = make_arguments_node([fn_node])
    call_inner = _DummyNode("call_expression", children=[args_inner], arguments_node=args_inner)
    # Wrap with additional nested call_expression nodes multiple times
    level = call_inner
    nesting = 10  # modest depth to test recursion without hitting recursion limits
    for _ in range(nesting):
        args = make_arguments_node([level])
        level = _DummyNode("call_expression", children=[args], arguments_node=args)
    # Top-level call: pass into extractor
    top_call = level
    codeflash_output = analyzer._extract_call_expression_identifiers(top_call, src); result = codeflash_output # 4.24μs -> 4.04μs (4.98% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
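
The tests above build the analyzer with `object.__new__` so that `__init__` never runs. A minimal sketch of that technique, using a hypothetical class rather than `TreeSitterAnalyzer`:

```python
class NeedsSetup:
    def __init__(self, language):
        # Stand-in for setup that is expensive or unavailable in a unit test.
        raise RuntimeError("setup not available in this test")

    def shout(self, text: str) -> str:
        # Reads no instance attributes, so it works on an uninitialized object.
        return text.upper()


# Allocates the instance without calling __init__.
instance = object.__new__(NeedsSetup)
result = instance.shout("ok")
```

This only works for methods that do not read attributes set in `__init__`; any such access on the bare instance would raise AttributeError.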

To test or edit this optimization locally, run: git merge codeflash/optimize-pr1441-2026-02-10T21.37.43

Suggested change

- identifiers.append(self.get_node_text(child, source_bytes))
+ identifiers.append(source_bytes[child.start_byte : child.end_byte].decode("utf8"))

Saga4 and others added 4 commits February 11, 2026 02:04

wrapped functions default export support

b8597b2

refactor

fa56eb7

Merge branch 'main' into install_with_clone

b893220

style: auto-fix linting issues

78ce6e6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Saga4 requested a review from mohammedahmed18 February 10, 2026 20:45

codeflash-ai Bot reviewed Feb 10, 2026

Saga4 merged commit 97531dc into main Feb 10, 2026
28 of 30 checks passed

Saga4 deleted the install_with_clone branch February 10, 2026 22:16
