Pool buffers in IndexedCapturingReader #404

drewnoakes · 2024-02-06T11:58:09Z

Looking at traces over the regression suite shows ~2% CPU spent allocating chunks for the IndexedCapturingReader.

We can pull these buffers from the array pool, and return them when we're done. When processing multiple items, reusing these arrays will improve overall performance.

Looking at traces over the regression suite shows ~2% CPU spent allocating 2kB buffers for the `IndexedCapturingReader`. We can pull these buffers from the array pool, and return them when we're done. When processing multiple items, reusing these arrays will improve overall performance.

iamcarbon · 2024-02-06T21:37:17Z

The implementation here looks sound, but let me do another pass on all the callers to make sure we're disposing everything correctly.

If we add GetMemory() or GetSequence APIs in the future, we'll also need to be extra diligent to make sure we don't reference one of these buffers to a public API (which could be sequently reused / corrupted).

drewnoakes · 2024-02-06T22:08:39Z

I'm wondering if we make all our reader classes internal in 2.9.0, exposing instead interfaces. A lot of the Get* methods could actually just be extension methods on those types. Just an idea at this point.

iamcarbon · 2024-02-06T23:19:40Z

Making IO an internal concern would allow us to continue to make more dramatic changes in the future. 👍

I'm also wondering if we bump the chunk size here from 2KB to 4KB to be closer to the disk sector size / OS page size.

drewnoakes · 2024-02-07T03:27:56Z

Making IO an internal concern would allow us to continue to make more dramatic changes in the future

Internal APIs are definitely more flexible for library authors. They limit extensibility though. Most users won't be extending the library (most extensions I know of have been made into contributions to the library itself). I don't know where the right point is on that continuum. Historically, certain APIs were more stable than others, based on an intuitive sense of the probabilities of their use outside this repo. In 2.9 we're batching up a bunch of changes, so it's a good time to contemplate what else we might want to change. But making things internal takes them away and might not leave an alternative.

I'm also wondering if we bump the chunk size here from 2KB to 4KB to be closer to the disk sector size / OS page size.

Experimenting with buffer sizes is a good idea. Matching the page size makes sense if they're aligned to OS pages, but I suspect that'd rarely be the case. It's more likely that a 2kB buffer first in a single page than a 4kB. But memory pipelines and architectures are complex and varied across OS's. Benchmarking is likely more instructive than pure theory.

Something that I wonder about for IndexedCapturingReader is a way to skip over chunks of data without capturing. Currently that's not possible, so if a file needs some data at the beginning, and some at the end, then we buffer everything in the middle. Maybe there's a way we can know that those bytes aren't interesting when skipping them.

let me do another pass on all the callers to make sure we're disposing everything correctly

I searched for all invocations of the constructor so believe I found all the usages within this library, but please do double-check!

drewnoakes · 2024-02-07T03:29:11Z

I'm going to merge this and will apply any feedback in a follow up PR.

drewnoakes added format-tiff format-png performance labels Feb 6, 2024

drewnoakes requested review from iamcarbon and kwhopper February 6, 2024 12:25

drewnoakes merged commit 0caa4d2 into main Feb 7, 2024

drewnoakes deleted the pool-indexed-capturing-reader-chunks branch February 7, 2024 03:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Pool buffers in IndexedCapturingReader #404

Pool buffers in IndexedCapturingReader #404

Uh oh!

drewnoakes commented Feb 6, 2024

Uh oh!

iamcarbon commented Feb 6, 2024

Uh oh!

drewnoakes commented Feb 6, 2024

Uh oh!

iamcarbon commented Feb 6, 2024 •

edited

Loading

Uh oh!

drewnoakes commented Feb 7, 2024

Uh oh!

drewnoakes commented Feb 7, 2024

Uh oh!

Uh oh!

Pool buffers in IndexedCapturingReader #404

Pool buffers in IndexedCapturingReader #404

Uh oh!

Conversation

drewnoakes commented Feb 6, 2024

Uh oh!

iamcarbon commented Feb 6, 2024

Uh oh!

drewnoakes commented Feb 6, 2024

Uh oh!

iamcarbon commented Feb 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

drewnoakes commented Feb 7, 2024

Uh oh!

drewnoakes commented Feb 7, 2024

Uh oh!

Uh oh!

iamcarbon commented Feb 6, 2024 •

edited

Loading