More ctl::string optimization #1232

Merged
merged 7 commits into from
Jun 20, 2024
mrdomino
Collaborator

@mrdomino mrdomino commented Jun 19, 2024

Moves some isbig checks into string.h, enabling smarter optimizations on
small strings. We also no longer zero out the string before calling the
various constructors, which buys back the performance we lost on big
strings when we added the small-string optimization. The big_string copy
constructor gets a small optimization too: if the source string is using
half or more of its capacity, we reuse its capacity instead of
recomputing one. Finally, the copy constructor now always makes a small
string when the contents fit, even when copying from a big string that
was truncated.
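
The copy-constructor policy described in this paragraph can be sketched as a small decision function. This is only an illustration, not the code in the PR: the names `plan_copy`, `CopyPlan`, and `recompute_capacity` are hypothetical, the 23-character inline limit is inferred from the 23- versus 24-character benchmark cases, and the power-of-two rounding in the "recompute" branch is an assumption about what recomputing a capacity might look like.

```cpp
#include <cassert>
#include <cstddef>

// Assumed: longest string stored inline (inferred from the benchmarks).
constexpr std::size_t kSmallMax = 23;

// Hypothetical result type: what the copy should allocate, if anything.
struct CopyPlan {
    bool small;            // store inline, no allocation
    std::size_t capacity;  // heap capacity to request when !small
};

// Assumed fallback: smallest power of two (>= 32) holding n bytes plus a NUL.
inline std::size_t recompute_capacity(std::size_t n) {
    std::size_t cap = 32;
    while (cap < n + 1) cap *= 2;
    return cap;
}

inline CopyPlan plan_copy(std::size_t src_size, std::size_t src_capacity) {
    // Callers are expected to pass a consistent size/capacity pair.
    assert(src_size <= src_capacity);
    // Go small whenever the contents fit, even if the source is a big
    // string that was truncated.
    if (src_size <= kSmallMax) return {true, 0};
    // Source uses half or more of its capacity: reuse that capacity.
    if (2 * src_size >= src_capacity) return {false, src_capacity};
    // Otherwise size a fresh capacity to the contents.
    return {false, recompute_capacity(src_size)};
}
```

Under these assumptions, a 100-byte string with capacity 128 is copied with capacity 128, while a 30-byte string sitting in a 4096-byte buffer is copied into a much smaller allocation.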

This also reworks the test to follow the idiom adopted elsewhere re stl,
and adds a helper function to tell if a string is small based on data().
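
One way such a helper could work is to check whether data() points inside the string object itself, which is where an inline (small) buffer lives. The sketch below is an assumption about the approach, not the helper added in this PR; `is_small` is a hypothetical name, and it is written as a template so it can be tried with `std::string`, whose mainstream implementations also store short strings inline.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true when s.data() points into the storage of the
// string object itself, i.e. the contents are held inline rather than on
// the heap. Works for any type exposing data().
template <class S>
bool is_small(const S& s) {
    auto obj = reinterpret_cast<std::uintptr_t>(&s);
    auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(S);
}
```

A test can then assert that a short string reports small and a long one does not, without reaching into private fields.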

@mrdomino mrdomino marked this pull request as ready for review June 19, 2024 01:20
@mrdomino
Collaborator Author

mrdomino commented Jun 19, 2024

master.txt
    8.2089 ns 10000000x { ctl::string s; s.append("hello "); s.append("world"); }
   3.70787 ns 1000000x { ctl::string s; for (int i = 0; i < 8; ++i) { s.append('a'); } }
    3.7205 ns 1000000x { ctl::string s; for (int i = 0; i < 16; ++i) { s.append('a'); } }
    3.7843 ns 1000000x { ctl::string s; for (int i = 0; i < 23; ++i) { s.append('a'); } }
   3.87171 ns 1000000x { ctl::string s; for (int i = 0; i < 24; ++i) { s.append('a'); } }
   3.95228 ns 1000000x { ctl::string s; for (int i = 0; i < 32; ++i) { s.append('a'); } }
      6.73 ns 1000000x { ctl::string s(small_c); }
     5.099 ns 1000000x { ctl::string s(small); }
     5.402 ns 1000000x { ctl::string s2(small_copy); }
     5.915 ns 1000000x { ctl::string s(small); ctl::string s2(std::move(s)); }
     9.453 ns 1000000x { ctl::string s(small); ctl::string s2(s); }
    17.503 ns 1000000x { ctl::string s(big_c); }
    16.119 ns 1000000x { ctl::string s(big); }
    16.431 ns 1000000x { ctl::string s2(big_copy); }
    18.026 ns 1000000x { ctl::string s(big); ctl::string s2(std::move(s)); }
    31.198 ns 1000000x { ctl::string s(big); ctl::string s2(s); }
     5.835 ns 1000000x { ctl::string s(23, 'a'); }
    14.572 ns 1000000x { ctl::string s(24, 'a'); }
     1.328 ns 1000000x { ctl::string_view s2(s); }
     6.185 ns 1000000x { ctl::string s(big_trunc); }

isbig_header.txt
      7.58 ns 10000000x { ctl::string s; s.append("hello "); s.append("world"); }
      3.72 ns 1000000x { ctl::string s; for (int i = 0; i < 8; ++i) { s.append('a'); } }
   3.64756 ns 1000000x { ctl::string s; for (int i = 0; i < 16; ++i) { s.append('a'); } }
   3.78161 ns 1000000x { ctl::string s; for (int i = 0; i < 23; ++i) { s.append('a'); } }
   3.88562 ns 1000000x { ctl::string s; for (int i = 0; i < 24; ++i) { s.append('a'); } }
   3.68587 ns 1000000x { ctl::string s; for (int i = 0; i < 32; ++i) { s.append('a'); } }
     6.321 ns 1000000x { ctl::string s(small_c); }
      4.88 ns 1000000x { ctl::string s(small); }
     4.894 ns 1000000x { ctl::string s2(small_copy); }
     5.275 ns 1000000x { ctl::string s(small); ctl::string s2(std::move(s)); }
     8.507 ns 1000000x { ctl::string s(small); ctl::string s2(s); }
    17.474 ns 1000000x { ctl::string s(big_c); }
    16.062 ns 1000000x { ctl::string s(big); }
    16.327 ns 1000000x { ctl::string s2(big_copy); }
    16.096 ns 1000000x { ctl::string s(big); ctl::string s2(std::move(s)); }
    30.975 ns 1000000x { ctl::string s(big); ctl::string s2(s); }
     5.208 ns 1000000x { ctl::string s(23, 'a'); }
    14.332 ns 1000000x { ctl::string s(24, 'a'); }
     1.327 ns 1000000x { ctl::string_view s2(s); }
     5.535 ns 1000000x { ctl::string s(big_trunc); }

further_opt.txt
    7.5789 ns 10000000x { ctl::string s; s.append("hello "); s.append("world"); }
     3.721 ns 1000000x { ctl::string s; for (int i = 0; i < 8; ++i) { s.append('a'); } }
   3.69875 ns 1000000x { ctl::string s; for (int i = 0; i < 16; ++i) { s.append('a'); } }
   3.78017 ns 1000000x { ctl::string s; for (int i = 0; i < 23; ++i) { s.append('a'); } }
   3.88446 ns 1000000x { ctl::string s; for (int i = 0; i < 24; ++i) { s.append('a'); } }
   3.68509 ns 1000000x { ctl::string s; for (int i = 0; i < 32; ++i) { s.append('a'); } }
     4.755 ns 1000000x { ctl::string s(small_c); }
     3.503 ns 1000000x { ctl::string s(small); }
     1.513 ns 1000000x { ctl::string s2(small_copy); }
     3.546 ns 1000000x { ctl::string s(small); ctl::string s2(std::move(s)); }
     3.917 ns 1000000x { ctl::string s(small); ctl::string s2(s); }
    13.375 ns 1000000x { ctl::string s(big_c); }
    11.427 ns 1000000x { ctl::string s(big); }
    11.641 ns 1000000x { ctl::string s2(big_copy); }
    11.968 ns 1000000x { ctl::string s(big); ctl::string s2(std::move(s)); }
    21.931 ns 1000000x { ctl::string s(big); ctl::string s2(s); }
     1.328 ns 1000000x { ctl::string s(23, 'a'); }
    12.255 ns 1000000x { ctl::string s(24, 'a'); }
     1.328 ns 1000000x { ctl::string_view s2(s); }
    13.567 ns 1000000x { ctl::string s(big_trunc); }

copy_small.txt
    7.5763 ns 10000000x { ctl::string s; s.append("hello "); s.append("world"); }
     3.706 ns 1000000x { ctl::string s; for (int i = 0; i < 8; ++i) { s.append('a'); } }
   3.65094 ns 1000000x { ctl::string s; for (int i = 0; i < 16; ++i) { s.append('a'); } }
   3.77835 ns 1000000x { ctl::string s; for (int i = 0; i < 23; ++i) { s.append('a'); } }
   3.88333 ns 1000000x { ctl::string s; for (int i = 0; i < 24; ++i) { s.append('a'); } }
   3.68775 ns 1000000x { ctl::string s; for (int i = 0; i < 32; ++i) { s.append('a'); } }
     4.766 ns 1000000x { ctl::string s(small_c); }
     3.502 ns 1000000x { ctl::string s(small); }
     1.474 ns 1000000x { ctl::string s2(small_copy); }
     3.503 ns 1000000x { ctl::string s(small); ctl::string s2(std::move(s)); }
     3.517 ns 1000000x { ctl::string s(small); ctl::string s2(s); }
    13.457 ns 1000000x { ctl::string s(big_c); }
    11.951 ns 1000000x { ctl::string s(big); }
    11.318 ns 1000000x { ctl::string s2(big_copy); }
    11.991 ns 1000000x { ctl::string s(big); ctl::string s2(std::move(s)); }
    22.523 ns 1000000x { ctl::string s(big); ctl::string s2(s); }
     1.325 ns 1000000x { ctl::string s(23, 'a'); }
    12.587 ns 1000000x { ctl::string s(24, 'a'); }
     1.327 ns 1000000x { ctl::string_view s2(s); }
     1.489 ns 1000000x { ctl::string s(big_trunc); }
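
Each line in the listings above has the shape "nanoseconds per iteration, iteration count, benchmarked code". A minimal sketch of a harness that prints lines in that shape follows. It is not the harness used for these numbers: `ns_per_iteration` and `BENCH_SKETCH` are hypothetical names, and the sketch makes no attempt to stop the compiler from optimizing the benchmarked code away.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: runs fn `iterations` times and returns the mean
// wall-clock cost of one call in nanoseconds.
template <class F>
double ns_per_iteration(long iterations, F&& fn) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (long i = 0; i < iterations; ++i) fn();
    auto stop = clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Hypothetical macro: times a braced block and prints
// "<ns> ns <count>x <code>". Variadic so commas in the block are kept.
#define BENCH_SKETCH(ITERATIONS, ...)                            \
    std::printf("%10g ns %ldx %s\n",                             \
                ns_per_iteration((ITERATIONS), [&] __VA_ARGS__), \
                static_cast<long>(ITERATIONS), #__VA_ARGS__)
```

Inside a function, `BENCH_SKETCH(1000000, { std::string s(23, 'a'); });` would time the block (with `<string>` included) and echo its source text after the timing.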

@mrdomino
Collaborator Author

For reference, here is master from before the small-string optimization with the current benchmarks:

   17.6038 ns 10000000x { ctl::string s; s.append("hello "); s.append("world"); }
   3.88975 ns 1000000x { ctl::string s; for (int i = 0; i < 8; ++i) { s.append('a'); } }
   3.89375 ns 1000000x { ctl::string s; for (int i = 0; i < 16; ++i) { s.append('a'); } }
   3.63517 ns 1000000x { ctl::string s; for (int i = 0; i < 23; ++i) { s.append('a'); } }
   4.17204 ns 1000000x { ctl::string s; for (int i = 0; i < 24; ++i) { s.append('a'); } }
   3.69219 ns 1000000x { ctl::string s; for (int i = 0; i < 32; ++i) { s.append('a'); } }
    12.699 ns 1000000x { ctl::string s(small_c); }
     11.85 ns 1000000x { ctl::string s(small); }
    12.188 ns 1000000x { ctl::string s2(small_copy); }
    16.282 ns 1000000x { ctl::string s(small); ctl::string s2(std::move(s)); }
    22.362 ns 1000000x { ctl::string s(small); ctl::string s2(s); }
    12.786 ns 1000000x { ctl::string s(big_c); }
    11.857 ns 1000000x { ctl::string s(big); }
    11.825 ns 1000000x { ctl::string s2(big_copy); }
    16.311 ns 1000000x { ctl::string s(big); ctl::string s2(std::move(s)); }
    22.344 ns 1000000x { ctl::string s(big); ctl::string s2(s); }
    12.459 ns 1000000x { ctl::string s(23, 'a'); }
    12.486 ns 1000000x { ctl::string s(24, 'a'); }
     1.328 ns 1000000x { ctl::string_view s2(s); }

mrdomino added 2 commits June 18, 2024 22:34
Takes 1ns off most benchmarks where the destructor is frequently called.
We create a small string whenever it will fit, instead of only when r is
small. This is safe because we always memcpy the whole blob when we call
reserve, and optimizes the case when r has been truncated.

This also reworks the test to follow the idiom adopted elsewhere re stl,
and adds a helper function to tell if a string is small based on data().
@mrdomino mrdomino force-pushed the string-opt branch 2 times, most recently from e99a014 to d17c155 Compare June 19, 2024 14:21
@mrdomino mrdomino force-pushed the string-opt branch 2 times, most recently from 8d4695f to 0ce98c7 Compare June 19, 2024 15:23
These two functions are the primitive pseudo-constructors for the struct
fields in the union; as such, it does not make sense for them to use the
checked accessors.
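
The reasoning in this commit message can be illustrated with a tagged union. This is a sketch under stated assumptions, not the ctl::string layout: `sketch_string`, `small_rep`, `big_rep`, and the separate `big_` tag are hypothetical, and the names `set_small_size` and `set_big_string` are chosen for illustration. The checked accessors assert on the tag, so the functions that establish the tag in the first place have to write the union fields directly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout for illustration; not the ctl::string definition.
struct small_rep {
    char buf[23];
    unsigned char size;
};

struct big_rep {
    char* p;
    std::size_t n;
    std::size_t c;
};

class sketch_string {
  public:
    // Starts as an empty small string; nothing is zeroed beforehand.
    sketch_string() { set_small_size(0); }

    // Adopts a caller-owned buffer; this sketch never allocates or frees.
    sketch_string(char* p, std::size_t n, std::size_t c) { set_big_string(p, n, c); }

    bool isbig() const { return big_; }
    std::size_t size() const { return big_ ? big().n : small().size; }
    const char* data() const { return big_ ? big().p : small().buf; }

  private:
    // Checked accessors: valid only once the tag says the field is active.
    const small_rep& small() const { assert(!big_); return rep_.small; }
    const big_rep& big() const { assert(big_); return rep_.big; }

    // Primitive pseudo-constructors: they set the tag and activate the
    // field, so they cannot go through the checked accessors above.
    void set_small_size(std::size_t n) {
        assert(n < sizeof(rep_.small.buf));
        big_ = false;
        rep_.small.size = static_cast<unsigned char>(n);
        rep_.small.buf[n] = '\0';
    }
    void set_big_string(char* p, std::size_t n, std::size_t c) {
        big_ = true;
        rep_.big = big_rep{p, n, c};
    }

    union rep {
        small_rep small;
        big_rep big;
    } rep_;
    bool big_;  // deliberately left uninitialized until a pseudo-constructor runs
};
```

Because the object is not zeroed before construction, calling a checked accessor inside either pseudo-constructor would read the tag before it has a meaningful value.
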
@mrdomino mrdomino merged commit 7e780e5 into jart:master Jun 20, 2024
@mrdomino mrdomino deleted the string-opt branch June 20, 2024 18:52