
wrapped functions default export support #1441

Merged
Saga4 merged 4 commits into main from install_with_clone
Feb 10, 2026


Saga4 commented Feb 10, 2026

No description provided.

claude Bot commented Feb 10, 2026

PR Review Summary

Prek Checks

Status: Fixed (pending push)

Issues found and fixed:

  • RUF059: Unused unpacked variable original_import → prefixed with _ (2 instances in find_references.py)
  • FURB171: Single-item tuple membership tests → converted to equality tests (in find_references.py and treesitter.py)
  • I001: Unsorted imports → auto-fixed (in import_resolver.py and support.py)
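For readers unfamiliar with these Ruff rules, the sketch below shows the shape of each fix. Apart from original_import, the names are hypothetical, and this is an illustration rather than the actual diff in find_references.py or treesitter.py.

```python
from __future__ import annotations


def local_name_of(spec: tuple[str, str]) -> str:
    # RUF059: the second unpacked value is never read, so it gets a leading underscore.
    # Before: local_name, original_import = spec
    local_name, _original_import = spec
    return local_name


def is_identifier(node_type: str) -> bool:
    # FURB171: membership in a one-item tuple is just an equality test.
    # Before: return node_type in ("identifier",)
    return node_type == "identifier"


print(local_name_of(("curry", "lodash/curry")))  # curry
print(is_identifier("identifier"))  # True
```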

All prek checks now pass.

Mypy: 121 errors across the changed files, all pre-existing type issues (e.g., str vs Path mismatches in functions_to_optimize.py, missing type parameters in support.py). None were introduced by this PR; fixing them requires logic changes or refactoring.

Code Review

No critical issues found.

This PR does two things:

  1. File rename: codeflash/languages/treesitter_utils.py → codeflash/languages/javascript/treesitter.py, with all import paths updated across 14 files. All imports were verified correct; no references to the old path remain in Python files (2 stale references remain in MULTI_LANGUAGE_ARCHITECTURE.md documentation only).

  2. New feature: Wrapped default exports — handles patterns like export default curry(traverseEntity) where a function is exported through a wrapper call. Implementation adds:

    • wrapped_default_args field on ExportInfo dataclass
    • _extract_call_expression_identifiers() for recursive AST extraction
    • Check in is_function_exported() for wrapped args
    • Comprehensive test class TestWrappedDefaultExports with 8 test cases covering curry, compose, nested wrappers, and the real-world Strapi pattern
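A minimal sketch of how the new field might feed the export check, assuming ExportInfo records named exports, a plain default export, and the identifiers found inside a wrapping call. Only wrapped_default_args and the name is_function_exported come from the summary above; the other fields and the function signature are hypothetical simplifications, not the PR's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExportInfo:
    # Simplified, hypothetical shape: only wrapped_default_args is named in the PR.
    named_exports: list[str] = field(default_factory=list)
    default_export: str | None = None  # `export default traverseEntity`
    # Identifiers found inside a wrapper call, e.g. `export default curry(traverseEntity)`
    wrapped_default_args: list[str] = field(default_factory=list)


def is_function_exported(function_name: str, exports: list[ExportInfo]) -> bool:
    """Return True if function_name is exported directly or through a default-export wrapper."""
    for info in exports:
        if function_name in info.named_exports or function_name == info.default_export:
            return True
        # The wrapped-args check: a function passed to curry()/compose()/... in a default export
        if function_name in info.wrapped_default_args:
            return True
    return False


exports = [ExportInfo(wrapped_default_args=["traverseEntity"])]
print(is_function_exported("traverseEntity", exports))  # True
print(is_function_exported("curry", exports))  # False
```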

Minor note: MULTI_LANGUAGE_ARCHITECTURE.md still references the old file path treesitter_utils.py in 2 places (lines 289, 683). Non-blocking documentation issue.
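Leftovers like these can be caught with a plain text scan after a module move. This is a sketch: stale_lines and find_stale_references are hypothetical helpers, not part of codeflash or its prek configuration, and the default file suffixes are an assumption.

```python
from __future__ import annotations

from pathlib import Path


def stale_lines(text: str, needle: str) -> list[int]:
    """Return 1-based line numbers in text that still contain needle."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if needle in line]


def find_stale_references(
    root: Path, needle: str, suffixes: tuple[str, ...] = (".py", ".md")
) -> dict[Path, list[int]]:
    """Map each matching file under root to the lines that still mention needle."""
    hits: dict[Path, list[int]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = stale_lines(path.read_text(encoding="utf8", errors="replace"), needle)
            if lines:
                hits[path] = lines
    return hits
```

Run from the repository root, find_stale_references(Path("."), "treesitter_utils") would list any file that still names the old module, such as the documentation references noted above.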

Test Coverage

| File | Coverage (PR) | Coverage (main) | Change |
| javascript/treesitter.py (was treesitter_utils.py) | 93% | 92% | +1% |
| javascript/find_references.py | 86% | 86% | 0% |
| javascript/support.py | 74% | 74% | 0% |
| javascript/import_resolver.py | 72% | 72% | 0% |
| javascript/instrument.py | 69% | 69% | 0% |
| javascript/line_profiler.py | 77% | 77% | 0% |
| code_utils/code_extractor.py | 68% | 68% | 0% |
| code_utils/code_replacer.py | 83% | 83% | 0% |
| code_utils/normalizers/javascript.py | 21% | 21% | 0% |
| discovery/functions_to_optimize.py | 68% | 68% | 0% |
| Overall | 79.39% | 79.33% | +0.06% |
  • New wrapped export feature is well-covered by 8 new test cases
  • No coverage regressions
  • Overall coverage slightly improved (+0.06%)

Note: 8 pre-existing test failures in test_tracer.py are unrelated to this PR (present on both main and the PR branch).


Last updated: 2026-02-10T21:00Z

if args_node:
    for child in args_node.children:
        if child.type == "identifier":
            identifiers.append(self.get_node_text(child, source_bytes))


⚡️Codeflash found a 24% (1.24x) speedup for TreeSitterAnalyzer._extract_call_expression_identifiers in codeflash/languages/javascript/treesitter.py

⏱️ Runtime: 195 microseconds → 157 microseconds (best of 165 runs)

📝 Explanation and details

The optimized code runs about 24% faster (195μs → 157μs, roughly a 19% cut in wall time) by eliminating function call overhead in a hot loop.
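The headline 24% figure is consistent with the speedup convention original / optimized - 1, which is not the same as the share of runtime saved. This small calculation, using the 195μs and 157μs timings above, computes both; the helper names are illustrative.

```python
def speedup_pct(original: float, optimized: float) -> float:
    """How much faster the new code is: original / optimized - 1, as a percentage."""
    return (original / optimized - 1) * 100


def runtime_reduction_pct(original: float, optimized: float) -> float:
    """How much wall time was saved, relative to the original runtime."""
    return (original - optimized) / original * 100


print(round(speedup_pct(195, 157), 1))  # 24.2
print(round(runtime_reduction_pct(195, 157), 1))  # 19.5
```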

Key Optimization:

The critical change is inlining the text extraction operation for identifier nodes. Instead of calling self.get_node_text(child, source_bytes) for each identifier, the optimized version directly performs:

source_bytes[child.start_byte : child.end_byte].decode("utf8")
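The loop from the review context can be made runnable as a free function to see the inlined slice in place. Node is a stand-in modeled on the _DummyNode test double in the generated tests (type, children, start_byte, end_byte, child_by_field_name). The recursion into nested call_expression arguments follows the PR summary's "recursive AST extraction"; the exact bodies of the real TreeSitterAnalyzer method and of get_node_text are assumed here, not copied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Minimal stand-in for a tree-sitter node (assumed shape, for illustration only)
    type: str
    start_byte: int = 0
    end_byte: int = 0
    children: list[Node] = field(default_factory=list)
    arguments: Node | None = None

    def child_by_field_name(self, name: str) -> Node | None:
        return self.arguments if name == "arguments" else None


def get_node_text(node: Node, source_bytes: bytes) -> str:
    # Assumed behavior of the helper that the optimization bypasses
    return source_bytes[node.start_byte : node.end_byte].decode("utf8")


def extract_call_identifiers(node: Node, source_bytes: bytes) -> list[str]:
    """Collect identifier arguments of a call, recursing into nested calls."""
    identifiers: list[str] = []
    args_node = node.child_by_field_name("arguments")
    if args_node:
        for child in args_node.children:
            if child.type == "identifier":
                # Inlined slice: same result as get_node_text(child, source_bytes)
                identifiers.append(source_bytes[child.start_byte : child.end_byte].decode("utf8"))
            elif child.type == "call_expression":
                identifiers.extend(extract_call_identifiers(child, source_bytes))
    return identifiers


src = b"compose(curry(fn), helper)"
fn = Node("identifier", src.index(b"fn"), src.index(b"fn") + len(b"fn"))
helper = Node("identifier", src.index(b"helper"), src.index(b"helper") + len(b"helper"))
inner = Node("call_expression", arguments=Node("arguments", children=[fn]))
outer = Node("call_expression", arguments=Node("arguments", children=[inner, helper]))

print(extract_call_identifiers(outer, src))  # ['fn', 'helper']
```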

Why This Improves Runtime:

The line profiler shows that in the original code the identifiers.append(self.get_node_text(...)) line consumed 86.3% of total execution time (2.25ms out of 2.61ms). That line executes 1,008 times per test run, so its per-call overhead accumulates:

  1. Method call overhead: Each self.get_node_text() invocation adds function call stack setup/teardown
  2. Attribute lookup: resolving self.get_node_text means checking the instance and then searching its class's method resolution order on every iteration
  3. Parameter passing: Copying child and source_bytes references to the new stack frame

By inlining, the optimized version reduces this hot path from 2.25ms to just 507μs (77% reduction), directly accounting for the overall speedup.
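The per-call overhead described above can be observed in isolation with timeit. In this hypothetical micro-benchmark, Analyzer and build_nodes are stand-ins (not the codeflash classes) and get_node_text's body is assumed. It first checks that both variants return identical strings, then prints timings rather than asserting a ratio, since absolute numbers vary with the machine and Python version.

```python
import timeit
from functools import partial
from types import SimpleNamespace


def build_nodes(source: bytes) -> list:
    """Create stand-in identifier nodes for a comma-separated source like b"f0,f1,f2"."""
    nodes, offset = [], 0
    for name in source.split(b","):
        nodes.append(SimpleNamespace(start_byte=offset, end_byte=offset + len(name)))
        offset += len(name) + 1  # step over the identifier and the following comma
    return nodes


class Analyzer:
    # Hypothetical stand-in for the analyzer; get_node_text's body is assumed.
    def get_node_text(self, node, source_bytes: bytes) -> str:
        return source_bytes[node.start_byte : node.end_byte].decode("utf8")

    def via_method(self, nodes, source_bytes: bytes) -> list:
        out = []
        for node in nodes:
            out.append(self.get_node_text(node, source_bytes))  # original style
        return out

    def inline(self, nodes, source_bytes: bytes) -> list:
        out = []
        for node in nodes:
            out.append(source_bytes[node.start_byte : node.end_byte].decode("utf8"))  # optimized style
        return out


SOURCE = ",".join(f"f{i}" for i in range(1000)).encode("utf8")
NODES = build_nodes(SOURCE)
analyzer = Analyzer()

# Both variants must return the same strings before timing means anything.
assert analyzer.via_method(NODES, SOURCE) == analyzer.inline(NODES, SOURCE)

RUNS = 200
for label, func in (("method call", analyzer.via_method), ("inline slice", analyzer.inline)):
    seconds = timeit.timeit(partial(func, NODES, SOURCE), number=RUNS)
    print(f"{label}: {seconds / RUNS * 1e6:.1f} µs per call over {len(NODES)} identifiers")
```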

Test Case Performance:

The optimization shows particularly strong results for workloads with many identifiers:

  • Large-scale extraction (1000 identifiers): 25.6% faster (180μs → 143μs)
  • Special character identifiers: 15.7% faster
  • Single identifier: 12.3% faster
  • Edge cases (no arguments, non-identifier arguments): essentially unchanged (from 1.5% slower to 5.3% faster), with identical results

The get_node_text() method is preserved for potential use elsewhere in the codebase, but is bypassed in this performance-critical loop where the same operation can be performed inline without abstraction cost.

Correctness verification report:

| Test | Status |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests:
import pytest  # used for our unit tests
from codeflash.languages.javascript.treesitter import TreeSitterAnalyzer

# function to test
# We will create a minimal Node-like structure compatible with the attributes and methods
# used by TreeSitterAnalyzer._extract_call_expression_identifiers. We intentionally
# avoid using external parsing libraries to keep tests deterministic and focused.
class _DummyNode:
    """
    Minimal compatible stand-in for tree-sitter Node for testing purposes.

    NOTE: The real analyzer expects a Node with:
      - .type (str)
      - .children (list of nodes)
      - .start_byte (int), .end_byte (int) for slicing source bytes
      - .child_by_field_name(name) -> node or None

    We provide these attributes so the method under test can operate normally.
    """
    def __init__(self, type_, start_byte=0, end_byte=0, children=None, arguments_node=None):
        self.type = type_
        self.start_byte = start_byte
        self.end_byte = end_byte
        # children should be a list of other _DummyNode instances
        self.children = children or []
        # arguments_node is returned when child_by_field_name("arguments") is called
        self._arguments_node = arguments_node

    def child_by_field_name(self, name: str):
        # Only "arguments" is used by the method under test
        if name == "arguments":
            return self._arguments_node
        return None

# Helper factory functions to build nodes used by tests
def make_identifier_node(source_bytes: bytes, start: int, end: int):
    """Create an identifier node that slices source_bytes[start:end]."""
    return _DummyNode("identifier", start_byte=start, end_byte=end, children=[])

def make_arguments_node(children):
    """Create an arguments node that contains a list of child nodes."""
    return _DummyNode("arguments", children=children)

def make_call_node(arguments_node: _DummyNode):
    """Create a call_expression node whose 'arguments' field returns arguments_node."""
    # start/end bytes on call node are irrelevant for the extraction logic
    return _DummyNode("call_expression", children=[arguments_node], arguments_node=arguments_node)

# Create an analyzer instance without invoking __init__ to avoid requiring TreeSitterLanguage.
# This is acceptable because the method under test does not depend on instance initialization
# other than bound methods (get_node_text & _extract_call_expression_identifiers) existing.
analyzer = object.__new__(TreeSitterAnalyzer)

def test_single_identifier_argument_basic():
    # Basic case: curry(traverseEntity) -> should extract ["traverseEntity"]
    src = b"curry(traverseEntity)"
    # locate the identifier substring
    start = src.index(b"traverseEntity")
    end = start + len(b"traverseEntity")
    ident_node = make_identifier_node(src, start, end)  # identifier node for traverseEntity
    args = make_arguments_node([ident_node])  # arguments node wrapping the identifier
    call = make_call_node(args)  # top-level call_expression node

    # Call the method under test and verify the single identifier is extracted
    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.64μs -> 1.46μs (12.3% faster)
    assert result == ["traverseEntity"]

def test_multiple_identifier_arguments_basic():
    # Basic case: compose(fn1, fn2) -> should extract ["fn1", "fn2"]
    src = b"compose(fn1, fn2)"
    # find positions of fn1 and fn2
    start1 = src.index(b"fn1")
    end1 = start1 + len(b"fn1")
    start2 = src.index(b"fn2")
    end2 = start2 + len(b"fn2")

    ident1 = make_identifier_node(src, start1, end1)
    ident2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([ident1, ident2])
    call = make_call_node(args)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.80μs -> 1.71μs (5.25% faster)
    assert result == ["fn1", "fn2"]

def test_nested_call_expression_recursion():
    # Nested case: compose(curry(fn)) -> should extract ["fn"] by recursing into nested call_expression
    src = b"compose(curry(fn))"
    # locate fn
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)

    # inner curry(...) arguments node contains fn identifier
    inner_args = make_arguments_node([fn_node])
    inner_call = _DummyNode("call_expression", children=[inner_args], arguments_node=inner_args)

    # outer compose(...) arguments node contains the inner call expression node
    outer_args = make_arguments_node([inner_call])
    outer_call = make_call_node(outer_args)

    codeflash_output = analyzer._extract_call_expression_identifiers(outer_call, src); result = codeflash_output # 1.96μs -> 1.90μs (3.21% faster)
    assert result == ["fn"]

def test_no_arguments_returns_empty_list():
    # If the call node has no 'arguments' field (child_by_field_name returns None), result should be []
    src = b"noArgsCall()"
    # create a call node that returns None for arguments
    call = _DummyNode("call_expression", children=[], arguments_node=None)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 641ns -> 651ns (1.54% slower)
    assert result == []

def test_non_identifier_arguments_are_ignored():
    # Arguments that are not identifiers (e.g., numeric literals) should be ignored
    src = b"call(42, 'string', { obj: 1 })"
    # create dummy children with types that are not "identifier"
    num_node = _DummyNode("number", start_byte=5, end_byte=7)  # "42"
    str_node = _DummyNode("string", start_byte=9, end_byte=17)  # "'string'"
    obj_node = _DummyNode("object", start_byte=19, end_byte=len(src))
    args = make_arguments_node([num_node, str_node, obj_node])
    call = make_call_node(args)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 981ns -> 932ns (5.26% faster)
    assert result == []

def test_special_character_identifiers():
    # Identifiers may include characters like '$' and '_' commonly used in JS
    src = b"compose($fn, _fn)"
    start1 = src.index(b"$fn")
    end1 = start1 + len(b"$fn")
    start2 = src.index(b"_fn")
    end2 = start2 + len(b"_fn")

    id1 = make_identifier_node(src, start1, end1)
    id2 = make_identifier_node(src, start2, end2)
    args = make_arguments_node([id1, id2])
    call = make_call_node(args)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.77μs -> 1.53μs (15.7% faster)
    assert result == ["$fn", "_fn"]

def test_empty_source_bytes_for_identifier():
    # If source_bytes is empty but nodes have start/end 0, the extracted identifier is an empty string
    # This tests boundary behavior of get_node_text slicing an empty buffer
    src = b""
    ident_node = make_identifier_node(src, 0, 0)  # zero-length slice
    args = make_arguments_node([ident_node])
    call = make_call_node(args)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 1.34μs -> 1.25μs (7.27% faster)
    assert result == [""]

def test_large_number_of_identifier_arguments_performance_and_correctness():
    # Large-scale test: create 1000 identifier arguments and ensure all are extracted in order
    count = 1000
    # Build a source like "f0,f1,f2,...,f999" to easily compute offsets
    identifiers = [f"f{i}" for i in range(count)]
    # Construct source bytes with comma separators
    src_str = ",".join(identifiers)
    src = src_str.encode("utf8")

    # Build identifier nodes with correct start/end positions
    children = []
    offset = 0
    for i, ident in enumerate(identifiers):
        b = ident.encode("utf8")
        start = offset
        end = start + len(b)
        children.append(make_identifier_node(src, start, end))
        # advance offset past the identifier and the following comma (1 byte); the final increment is unused
        offset = end + 1

    args = make_arguments_node(children)
    call = make_call_node(args)

    codeflash_output = analyzer._extract_call_expression_identifiers(call, src); result = codeflash_output # 180μs -> 143μs (25.6% faster)
    assert result == identifiers

def test_deeply_nested_multiple_levels():
    # Build nested calls like a(b(c(d(e(fn))))) and ensure the identifier is still found.
    src = b"a(b(c(d(e(fn)))))"
    # locate "fn"
    start_fn = src.index(b"fn")
    end_fn = start_fn + len(b"fn")
    fn_node = make_identifier_node(src, start_fn, end_fn)
    # Build inner-most call
    args_inner = make_arguments_node([fn_node])
    call_inner = _DummyNode("call_expression", children=[args_inner], arguments_node=args_inner)

    # Wrap with additional nested call_expression nodes multiple times
    level = call_inner
    nesting = 10  # modest depth to test recursion without hitting recursion limits
    for _ in range(nesting):
        args = make_arguments_node([level])
        level = _DummyNode("call_expression", children=[args], arguments_node=args)

    # Top-level call: pass into extractor
    top_call = level
    codeflash_output = analyzer._extract_call_expression_identifiers(top_call, src); result = codeflash_output # 4.24μs -> 4.04μs (4.98% faster)
    assert result == ["fn"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr1441-2026-02-10T21.37.43

Suggested change:
-            identifiers.append(self.get_node_text(child, source_bytes))
+            identifiers.append(source_bytes[child.start_byte : child.end_byte].decode("utf8"))


Saga4 merged commit 97531dc into main on Feb 10, 2026
28 of 30 checks passed
Saga4 deleted the install_with_clone branch on February 10, 2026 at 22:16