Skip to content

Pool buffers in IndexedCapturingReader #404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

drewnoakes
Copy link
Owner

Looking at traces over the regression suite shows ~2% CPU spent allocating chunks for the IndexedCapturingReader.

We can pull these buffers from the array pool, and return them when we're done. When processing multiple items, reusing these arrays will improve overall performance.

image

Looking at traces over the regression suite shows ~2% CPU spent allocating 2kB buffers for the `IndexedCapturingReader`.

We can pull these buffers from the array pool, and return them when we're done. When processing multiple items, reusing these arrays will improve overall performance.
@iamcarbon
Copy link
Collaborator

The implementation here looks sound, but let me do another pass on all the callers to make sure we're disposing everything correctly.

If we add GetMemory() or GetSequence APIs in the future, we'll also need to be extra diligent to make sure we don't reference one of these buffers to a public API (which could be sequently reused / corrupted).

@drewnoakes
Copy link
Owner Author

I'm wondering if we make all our reader classes internal in 2.9.0, exposing instead interfaces. A lot of the Get* methods could actually just be extension methods on those types. Just an idea at this point.

@iamcarbon
Copy link
Collaborator

iamcarbon commented Feb 6, 2024

Making IO an internal concern would allow us to continue to make more dramatic changes in the future. 👍

I'm also wondering if we bump the chunk size here from 2KB to 4KB to be closer to the disk sector size / OS page size.

@drewnoakes
Copy link
Owner Author

Making IO an internal concern would allow us to continue to make more dramatic changes in the future

Internal APIs are definitely more flexible for library authors. They limit extensibility though. Most users won't be extending the library (most extensions I know of have been made into contributions to the library itself). I don't know where the right point is on that continuum. Historically, certain APIs were more stable than others, based on an intuitive sense of the probabilities of their use outside this repo. In 2.9 we're batching up a bunch of changes, so it's a good time to contemplate what else we might want to change. But making things internal takes them away and might not leave an alternative.

I'm also wondering if we bump the chunk size here from 2KB to 4KB to be closer to the disk sector size / OS page size.

Experimenting with buffer sizes is a good idea. Matching the page size makes sense if they're aligned to OS pages, but I suspect that'd rarely be the case. It's more likely that a 2kB buffer first in a single page than a 4kB. But memory pipelines and architectures are complex and varied across OS's. Benchmarking is likely more instructive than pure theory.

Something that I wonder about for IndexedCapturingReader is a way to skip over chunks of data without capturing. Currently that's not possible, so if a file needs some data at the beginning, and some at the end, then we buffer everything in the middle. Maybe there's a way we can know that those bytes aren't interesting when skipping them.

let me do another pass on all the callers to make sure we're disposing everything correctly

I searched for all invocations of the constructor so believe I found all the usages within this library, but please do double-check!

@drewnoakes
Copy link
Owner Author

I'm going to merge this and will apply any feedback in a follow up PR.

@drewnoakes drewnoakes merged commit 0caa4d2 into main Feb 7, 2024
@drewnoakes drewnoakes deleted the pool-indexed-capturing-reader-chunks branch February 7, 2024 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants