fix: wcow: fix race condition in localmounter #5885

Merged on Apr 3, 2025 (1 commit)

@profnandaa (Collaborator) commented Mar 28, 2025

WIP: the investigation to determine exactly which
process is accessing the file is still ongoing; I will
update the issue #5807 thread for the record.

Fix the race condition by retrying at most 2 times.
In several test runs, 1 retry was
enough, even without backoff. Added a simple
linear backoff between retries, starting at 30 ms.

fixes #5807


Here is a recording of a repro run plus the mitigation: repro_run_8min.webm. See around t=1:55, on the 25th run. In this run the issue was hit once in 100 runs; it has mostly been 2-3 in 100 runs.

Here's the repro script I was using:

# .\repro.ps1
param(
	[Parameter(Position = 0)]
	[int]$Count = 50,
	[Parameter(Position = 1)]
	[int]$Pause = 3
)

$dockerfile = @"
FROM mcr.microsoft.com/windows/nanoserver:ltsc2022 AS build
RUN mkdir out\sub && mklink /D sub out\sub && mklink /D sub2 out\sub && echo data> sub\foo 

FROM mcr.microsoft.com/windows/nanoserver:ltsc2022
COPY --from=build /sub/foo .
COPY --from=build /sub2/foo bar
"@

if (-not (Test-Path -Path ".\Dockerfile")) {
	Set-Content -Path .\Dockerfile -Value $dockerfile
}

$cmd = "buildctl build --frontend dockerfile.v0 --local context=. --local dockerfile=. --output type=image,name=docker.io/profnandaa/repro-5807,push=false --progress plain --no-cache"

for ($i = 1; $i -le $Count; $i++) { 
	Write-Host "`n=== Run $i`n"
	iex $cmd
	sleep $Pause
}

To run, save the script as repro.ps1, start buildkitd with a breakpoint set at snapshot/localmounter_windows.go:78, then:

mkdir auto
cp repro.ps1 auto
cd auto
.\repro.ps1 100 0

profnandaa referenced this pull request in AnastaZIuk/docker-nanoserver-msvc-winsdk Mar 28, 2025
@profnandaa profnandaa force-pushed the fix-5807-localmounter-race branch from 1f58068 to 080fd84 Compare March 28, 2025 09:03
@profnandaa (Collaborator, Author) commented:

Just some more stats:

type .\build-log-1000-repro.txt | sls "error: failed to solve:" | measure
Count             : 11

type .\build-log-1000-fixed.txt | sls "error: failed to solve:" | measure
Count             : 0

@profnandaa profnandaa force-pushed the fix-5807-localmounter-race branch from 080fd84 to 34f9de9 Compare March 28, 2025 19:26
@profnandaa profnandaa force-pushed the fix-5807-localmounter-race branch 6 times, most recently from 1a65345 to 7a63cd3 Compare March 31, 2025 09:07
> WIP: still going on with investigation to determine
> exactly which process is accessing the file, will
> update on the issue moby#5807 thread for the records.

Fix the race condition with maximum 2 retries for now.
From several test runs, 1 retry seems to be
enough, even without backoff. Added a simple
linear backoff for each retry starting at 30 ms.

fixes moby#5807

Signed-off-by: Anthony Nandaa <[email protected]>
@profnandaa profnandaa force-pushed the fix-5807-localmounter-race branch from 7a63cd3 to b3c2303 Compare April 3, 2025 19:15
@tonistiigi tonistiigi merged commit 7e2b28d into moby:master Apr 3, 2025
107 checks passed
@crazy-max crazy-max added this to the v0.21.0 milestone Apr 9, 2025
@profnandaa profnandaa deleted the fix-5807-localmounter-race branch May 9, 2025 07:37

Successfully merging this pull request may close these issues.

WCOW: test fails with file used by another process