storage: fix ondemand read may cause IO err when enable prefetch #468


Merged
merged 1 commit into dragonflyoss:master on Jun 7, 2022

Conversation

@kevinXYin (Contributor)

For the fscache scenario, when prefetch workers set data chunks to pending,
the on-demand read procedure does not wait for these pending chunks to
be downloaded and persisted before replying to cread.

Make fetch_range_uncompressed() also wait for inflight IO to complete
for fscache.

Signed-off-by: Xin Yin <[email protected]>
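To make the failure mode concrete, here is a minimal sketch of the race, with hypothetical names (the real logic lives around FileCacheEntry::do_fetch_chunks in the storage crate; this is not the actual nydus code):

```rust
// Hypothetical sketch of the buggy flow described above.
//
//   prefetch worker                     on-demand read (replies cread)
//   ---------------                     ------------------------------
//   mark chunks pending                 fetch_range_uncompressed()
//   start backend download ...            -> pending list comes back empty,
//                                            since the prefetch worker owns
//                                            the download
//   ... write to cache file               -> returns Ok(0) immediately;
//                                            cread is answered before the
//                                            data is persisted -> IO error
fn buggy_fetch(pending: Vec<u32>) -> std::io::Result<usize> {
    if pending.is_empty() {
        // BUG: "empty" only means no chunk needs *this* thread to download
        // it; chunks may still be inflight elsewhere and not yet persisted.
        return Ok(0);
    }
    // ... download and persist the chunks in `pending` ...
    Ok(pending.len())
}
```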

@hsiangkao (Contributor) left a comment

I think we could do it like this as a start.

```diff
@@ -355,7 +355,16 @@ impl FileCacheEntry {
             None => return Ok(0),
             Some(v) => {
                 if v.is_empty() {
-                    return Ok(0);
+                    if wait_inflight {
```
```rust
if wait_inflight && !bitmap.wait_for_range_ready(chunk_index, count)? {
    return Err()
} else {
    return Ok(0)
}
```

?

@kevinXYin (Contributor, Author)

Yeah, makes sense, will update soon.

```diff
@@ -355,7 +355,16 @@ impl FileCacheEntry {
             None => return Ok(0),
             Some(v) => {
                 if v.is_empty() {
-                    return Ok(0);
+                    if wait_inflight {
+                        // we get inflight io , wait them
```
@imeoer (Collaborator), Jun 6, 2022

we get inflight io, wait for them

@kevinXYin (Contributor, Author)

Thanks, will fix it.

@changweige (Contributor)

It seems the two read paths into do_fetch_chunks can both read from the remote storage backend, since check_range_ready_and_mark_pending is called simultaneously to check whether the target chunks are ready, ending up with duplicated reads from the backend. The two read paths both have non-empty pending vectors.
So this problem exists not only for fscache usage but also for FUSE.
I suppose there is no need to add the wait_inflight flag to do_fetch_chunks.

For this single patch, I think an empty pending means all chunks are ready in the local cache file, so the wait is needless.
We should check whether chunks are in the process of being read from remote storage and wait there if they are.
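For reference, a minimal sketch of how a check_range_ready_and_mark_pending-style primitive typically behaves (hypothetical types and signature, not the nydus implementation): the first caller to find a chunk neither ready nor pending takes ownership of downloading it, while concurrent callers get an empty list back, which is exactly why an empty pending list does not imply the data is on disk.

```rust
// Hypothetical check-and-mark primitive (assumed names and signature).
use std::collections::HashSet;
use std::sync::Mutex;

struct ChunkState {
    ready: Mutex<HashSet<u32>>,   // chunks persisted in the cache file
    pending: Mutex<HashSet<u32>>, // chunks some thread is downloading
}

impl ChunkState {
    /// Return the chunks the caller must download itself, marking them
    /// pending. Chunks already ready or already pending are filtered out,
    /// so a concurrent caller can receive an empty vector while the data
    /// is still being downloaded by another thread.
    fn check_range_ready_and_mark_pending(&self, chunks: &[u32]) -> Vec<u32> {
        let ready = self.ready.lock().unwrap();
        let mut pending = self.pending.lock().unwrap();
        chunks
            .iter()
            .copied()
            .filter(|c| !ready.contains(c) && pending.insert(*c))
            .collect()
    }
}
```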

@kevinXYin (Contributor, Author)

> It seems the two read paths into do_fetch_chunks can both read from the remote storage backend, since check_range_ready_and_mark_pending is called simultaneously to check whether the target chunks are ready, ending up with duplicated reads from the backend. The two read paths both have non-empty pending vectors. So this problem exists not only for fscache usage but also for FUSE. I suppose there is no need to add the wait_inflight flag to do_fetch_chunks.
>
> For this single patch, I think an empty pending means all chunks are ready in the local cache file, so the wait is needless. We should check whether chunks are in the process of being read from remote storage and wait there if they are.

AFAIK, an empty 'pending' cannot guarantee that all chunks are persisted in the local file if we already have chunks inflight? Please correct me if I am wrong.

@changweige (Contributor)

> AFAIK, an empty 'pending' cannot guarantee that all chunks are persisted in the local file if we already have chunks inflight? Please correct me if I am wrong.

I checked the implementation of IndexedChunkMap::check_range_ready_and_mark_pending; no inflight tracer is working or holding inflight states there. So an empty pending, at least for IndexedChunkMap, has no effect of waiting for inflight IO.

Moreover, I suppose do_fetch_chunks should be a synchronous operation. Maybe we should clarify its semantics.
Then a simpler method might be just removing the return statement when pending is empty.

@changweige (Contributor)

> AFAIK, an empty 'pending' cannot guarantee that all chunks are persisted in the local file if we already have chunks inflight? Please correct me if I am wrong.

I see. There is a new type BlobStateMap wrapping IndexedChunkMap and the other one. Struct BlobStateMap has an inflight_tracer. BlobStateMap, IndexedChunkMap, and the other type of ChunkMap all implement RangeMap.
And only BlobStateMap is instantiated when creating a cache object.

@jiangliu Should we define do_fetch_chunks as a synchronous operation?
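As a rough illustration of the layering described above (type and field names follow the discussion; everything else is assumed, not the real definitions): the wrapper owns the inflight tracer, so waiting on inflight IO is only possible through BlobStateMap, not through the inner map.

```rust
// Hypothetical sketch of the BlobStateMap layering.
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

struct Slot {
    done: Mutex<bool>,
    cond: Condvar,
}

struct BlobStateMap<C> {
    inner: C, // e.g. IndexedChunkMap, which only tracks the ready bitmap
    // chunk index -> wait slot owned by the thread downloading that chunk
    inflight_tracer: Mutex<HashMap<u32, Arc<Slot>>>,
}

impl<C> BlobStateMap<C> {
    /// Block until `chunk` is no longer inflight; a no-op if nobody is
    /// downloading it.
    fn wait_for_chunk(&self, chunk: u32) {
        let slot = self.inflight_tracer.lock().unwrap().get(&chunk).cloned();
        if let Some(s) = slot {
            let mut done = s.done.lock().unwrap();
            while !*done {
                done = s.cond.wait(done).unwrap();
            }
        }
    }
}
```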

@jiangliu (Collaborator) commented on Jun 6, 2022

> I see. There is a new type BlobStateMap wrapping IndexedChunkMap and the other one. Struct BlobStateMap has an inflight_tracer. BlobStateMap, IndexedChunkMap, and the other type of ChunkMap all implement RangeMap. And only BlobStateMap is instantiated when creating a cache object.
>
> @jiangliu Should we define do_fetch_chunks as a synchronous operation?

Those fetch_xxx methods defined by the BlobObject trait have no output buffer to receive the data, so all data must be written to the underlying cache files synchronously; otherwise the client may get stale data.
We should make do_fetch_chunks a synchronous operation.
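A sketch of the constraint being stated, with an assumed signature (the actual trait lives in the storage crate and may differ): because no output buffer is passed in, the cache file is the only channel back to the caller, so the method must not return before the data is written there.

```rust
use std::io::Result;

// Assumed shape of the BlobObject fetch methods, for illustration only.
pub trait BlobObject {
    /// Fetch the uncompressed range [offset, offset + size) into the local
    /// cache file. No output buffer: the caller reads the cache file after
    /// this returns, so returning early would expose stale or missing data.
    fn fetch_range_uncompressed(&self, offset: u64, size: u64) -> Result<usize>;
}
```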

@kevinXYin (Contributor, Author)

> Those fetch_xxx methods defined by the BlobObject trait have no output buffer to receive the data, so all data must be written to the underlying cache files synchronously; otherwise the client may get stale data. We should make do_fetch_chunks a synchronous operation.

Referring to the comment for the BlobObject trait, it seems only fetch_range_uncompressed needs to make sure all data ranges are ready. Does that mean the other methods can return before the data is persisted in a local file?

And if do_fetch_chunks is called from a prefetch worker, is it still necessary to wait for all data to be ready? In that case the data does not need to be used immediately.

@jiangliu (Collaborator) commented on Jun 6, 2022

> Referring to the comment for the BlobObject trait, it seems only fetch_range_uncompressed needs to make sure all data ranges are ready. Does that mean the other methods can return before the data is persisted in a local file?
>
> And if do_fetch_chunks is called from a prefetch worker, is it still necessary to wait for all data to be ready? In that case the data does not need to be used immediately.

It seems better to enforce synchronous semantics for all fetch_xxx methods in BlobObject, to avoid possible data corruption issues. If it turns out that prefetch suffers from the synchronous semantics, we may add a new API for it. And we are trying to enable async IO recently, which will help too.

@kevinXYin (Contributor, Author)

> It seems better to enforce synchronous semantics for all fetch_xxx methods in BlobObject, to avoid possible data corruption issues. If it turns out that prefetch suffers from the synchronous semantics, we may add a new API for it. And we are trying to enable async IO recently, which will help too.

OK, so for this patch we just drop wait_inflight, and if 'pending' is empty, call wait_for_range_ready to wait for all chunks to be ready. Does that make sense?

And I still worry that this may cause a performance drop for prefetch workers.

@jiangliu (Collaborator) commented on Jun 7, 2022

> OK, so for this patch we just drop wait_inflight, and if 'pending' is empty, call wait_for_range_ready to wait for all chunks to be ready. Does that make sense?
>
> And I still worry that this may cause a performance drop for prefetch workers.

The prefetch performance issue could be traded off by adding more working threads, and we are working on enabling an async IO framework, which will then fix the issue.
#467

For the fscache scenario, when prefetch workers set data chunks to pending,
the on-demand read procedure does not wait for these pending chunks to
be downloaded and persisted before replying to cread.

Make do_fetch_chunks() also wait for inflight IO to complete.

Signed-off-by: Xin Yin <[email protected]>
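A minimal sketch of the merged behavior, assuming helper names from the review thread (check_range_ready_and_mark_pending, wait_for_range_ready); error handling and the surrounding types are assumptions, not the exact nydus source:

```rust
use std::io::{Error, ErrorKind, Result};

// Assumed chunk-map interface for the sketch.
trait RangeMapSketch {
    /// None: the whole range is ready; Some(v): the chunks in `v` must be
    /// downloaded by the caller (empty `v` means others own the downloads).
    fn check_range_ready_and_mark_pending(&self, index: u32, count: u32) -> Result<Option<Vec<u32>>>;
    /// Block until the range is ready; false on timeout.
    fn wait_for_range_ready(&self, index: u32, count: u32) -> Result<bool>;
}

fn do_fetch_chunks_sketch(map: &dyn RangeMapSketch, index: u32, count: u32) -> Result<usize> {
    match map.check_range_ready_and_mark_pending(index, count)? {
        None => Ok(0), // already ready in the cache file
        Some(v) if v.is_empty() => {
            // Chunks are inflight in another thread (e.g. a prefetch
            // worker): wait until they are downloaded and persisted
            // before reporting success, instead of returning early.
            if map.wait_for_range_ready(index, count)? {
                Ok(0)
            } else {
                Err(Error::new(ErrorKind::TimedOut, "waiting for inflight chunks"))
            }
        }
        Some(v) => download_and_persist(&v),
    }
}

// Backend read + cache-file write are elided in this sketch.
fn download_and_persist(chunks: &[u32]) -> Result<usize> {
    Ok(chunks.len())
}
```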
@kevinXYin (Contributor, Author)

> The prefetch performance issue could be traded off by adding more working threads, and we are working on enabling an async IO framework, which will then fix the issue. #467

Thanks, updated.

@changweige (Contributor) left a comment

LGTM

@changweige (Contributor)

> And I still worry that this may cause a performance drop for prefetch workers.

@kevinXYin
In fact, prefetch can also be an evil thing depending on your production environment, since prefetch can steal network bandwidth from normal user business IO. So prefetch IO is usually limited, especially when many nydusd processes are started on your nodes and cluster-wide.

So I personally don't think prefetch performance is a big concern.

@kevinXYin (Contributor, Author)

Thanks, understood.

@imeoer imeoer merged commit 081f609 into dragonflyoss:master Jun 7, 2022