Description
Contributing guidelines and issue reporting guide
- I've read the contributing guidelines and wholeheartedly agree. I've also read the issue reporting guide.
Well-formed report checklist
- I have found a bug and the documentation does not mention anything about my problem
- I have found a bug and there are no open or closed issues related to my problem
- I have provided version/information about my environment and done my best to provide a reproducer
Description of bug
BuildKit client hangs after successful image push (v0.23.2)
Summary
BuildKit client (buildctl) hangs indefinitely after pushing an image to a registry, even though the push itself succeeds (the registry returns HTTP 201 Created). The client process never exits and must be forcefully terminated.
Environment
- BuildKit Version: v0.23.2 (official Docker Hub image)
- BuildKit Image: moby/buildkit:v0.23.2 from Docker Hub
- Platform: Linux (Kubernetes environment)
- Container Runtime: containerd
- Registry: Harbor (private registry)
- Worker: containerd worker
- Network: Host networking mode
- Deployment: Kubernetes using official Docker Hub image
Steps to Reproduce
- Execute a build with push to registry:
buildctl --addr tcp://buildkitd-service:12346 build \
--frontend dockerfile.v0 \
--no-cache \
--output type=image,name=registry.example.com/repo/image:tag,push=true \
--opt platform=linux/amd64 \
--opt force-network-mode=host \
--allow network.host \
--local context=. \
--local dockerfile=. \
--progress plain
- Observe that:
  - The build completes successfully
  - Image layers are pushed successfully
  - The manifest is pushed successfully (HTTP 201 Created)
  - The image appears in containerd (ctr -n buildkit images ls)
  - The image appears in the registry
  - But the buildctl process never exits
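Because the hang is intermittent, it can help to run each reproduction attempt under a wall-clock deadline so that a hung client is detected automatically instead of by hand. The sketch below assumes GNU coreutils timeout is available where buildctl runs; the helper name run_with_deadline and the deadline values are illustrative and not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BuildKit): run a command with a deadline.
# GNU timeout sends SIGTERM when the deadline expires and, because of
# --kill-after, SIGKILL 10 seconds later if the process is still alive.
# It exits with 124 on a timeout; if the SIGKILL escalation is needed,
# the exit status can instead be 137 (128 + 9).
run_with_deadline() {
  local seconds="$1"
  shift
  local rc=0
  timeout --kill-after=10s "${seconds}s" "$@" || rc=$?
  if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
    echo "command did not finish within ${seconds}s (possible hang)" >&2
  fi
  return "$rc"
}
```

Usage: prefix the build command from step 1, for example run_with_deadline 900 buildctl --addr tcp://buildkitd-service:12346 build ..., and treat an exit status of 124 or 137 as a reproduction of the hang rather than an ordinary build failure.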
Expected Behavior
The buildctl process should exit normally once the push completes successfully.
Actual Behavior
The buildctl process hangs indefinitely and must be killed with SIGKILL. It appears to be waiting for an internal operation that never completes.
Debugging Information
Client Process Analysis
Running strace on the hanging client process shows:
[pid 20119] futex(0x198e420, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
Multiple threads are blocked in FUTEX_WAIT_PRIVATE. A futex wait by itself only shows that the Go runtime has parked its threads, but it is consistent with the main goroutine waiting in errgroup.Wait().
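strace can only show that the runtime's threads are parked in futex calls; it cannot show which goroutine is blocked or where. A Go program that does not install its own SIGQUIT handler prints the stack of every goroutine to stderr and exits when it receives SIGQUIT, so sending kill -QUIT to the hung buildctl process (with its stderr captured to a file) should show the exact call the main goroutine is stuck in. This assumes buildctl keeps the Go runtime's default SIGQUIT behavior; sending SIGQUIT to a shared buildkitd would likely terminate the daemon as well. Each goroutine in the dump starts with a header such as goroutine 1 [semacquire, 12 minutes]:, and the small helper below (a hypothetical name, not a BuildKit tool) tallies those headers by wait reason.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a Go goroutine dump (the stderr output a
# Go program prints on SIGQUIT) by wait reason, most frequent first.
# Header lines look like: goroutine 1 [semacquire, 12 minutes]:
# Non-header lines (progress output, stack frames) are ignored.
summarize_goroutines() {
  sed -nE 's/^goroutine [0-9]+ .*\[([^],]+).*/\1/p' "$1" \
    | sort | uniq -c | sort -rn
}
```

Usage sketch: start the build with stderr redirected (for example buildctl ... 2> buildctl.stderr), run kill -QUIT on the buildctl PID once it hangs, then run summarize_goroutines buildctl.stderr. The full stack traces in the same file are what matter for the report; a frame such as golang.org/x/sync/errgroup.(*Group).Wait would confirm the errgroup hypothesis.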
Daemon Process Analysis
Running strace on the BuildKit daemon (official v0.23.2 Docker Hub image) shows:
[pid 29510] futex(0x2f4d920, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
Network connections remain established:
tcp6 0 0 daemon:12346 client:57532 ESTABLISHED
No data is queued in either direction (Recv-Q and Send-Q are both 0), so the connection is open but idle.
Log Analysis - Key Finding
Client logs show the push completing successfully:
11:48:51 #17 pushing manifest for harbor.XXX.io/test/XXXXX:main-271313406-18@sha256:1444138e6cb07041a256d102a9ec7f27628f2e4afd691e24340c94a03d68ff6e
11:48:52 #17 pushing manifest for harbor.XXX.io/test/XXXXX:main-271313406-18@sha256:1444138e6cb07041a256d102a9ec7f27628f2e4afd691e24340c94a03d68ff6e 0.4s done
11:48:52 #17 DONE 1.3s
Daemon logs (from official Docker Hub image) stop abruptly after the successful push:
time="2025-08-11T11:16:43Z" level=debug msg="fetch response received" response.status="201 Created" span="exporting to image"
🔍 Critical Observation: The buildkitd logs contain no "session finished" message after the push completes. In a normal run, we would expect to see:
time="2025-08-11T11:16:43Z" level=debug msg="session finished: <nil>" spanID=xxx traceID=xxx
The missing "session finished" entry strongly suggests that session cleanup is either stuck or never started, which would explain why the client keeps hanging.
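When comparing many runs for this symptom, the check can be automated. A minimal sketch, assuming the daemon log uses the logfmt lines shown above with debug logging enabled, and that the file holds a single build's entries (concurrent builds would interleave theirs); the helper name check_session_finished is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether a buildkitd debug log contains a
# "session finished" entry after the last registry 201 Created response.
# Exit status: 0 = found, 1 = missing (the symptom in this report),
# 2 = no 201 Created response in the log at all.
check_session_finished() {
  awk '
    /response\.status="201 Created"/ { last_push = NR }
    /msg="session finished/          { last_done = NR }
    END {
      if (!last_push) { print "no 201 Created response found"; exit 2 }
      if (last_done > last_push) { print "session finished after push"; exit 0 }
      print "no session finished after push"
      exit 1
    }' "$1"
}
```

Usage: save the daemon log (for example kubectl logs <buildkitd-pod> > buildkitd.log) and run check_session_finished buildkitd.log; an exit status of 1 matches the behavior described here.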
Session State Analysis
The absence of the "session finished" log points to one of the following:
- Session cleanup never started: the session termination process was never triggered
- Session cleanup got stuck: the cleanup process started but blocked somewhere
- Session reference leak: the session object is still referenced, which prevents cleanup
This is consistent with the resource state below, which shows Usage count: 0 but Reclaimable: false.
Resource State
Running buildctl du shows:
ID: 1jjc57vswpm46fex2yvsj97jt
Created at: 2025-08-11 11:16:22.613823175 +0000 UTC
Mutable: false
Reclaimable: false
Shared: false
Size: 0B
Description: local source for context
Usage count: 0
The record has Usage count: 0 but Reclaimable: false. An unused record that still cannot be reclaimed suggests a leaked reference, which fits with the missing session cleanup.
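To check whether other cache records are in the same state, the records can be filtered for Usage count: 0 together with Reclaimable: false. The sketch below only assumes the blank-line-separated Key: value record layout shown above; that this layout comes from the verbose form of buildctl du is an assumption, and the helper name find_stuck_records is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of cache records that have
# "Usage count: 0" but "Reclaimable: false", reading "Key: value"
# records separated by blank lines from the given file(s) or stdin.
find_stuck_records() {
  awk '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block
    {
      id = ""; reclaimable = ""; usage = ""
      for (i = 1; i <= NF; i++) {
        # Key is everything before the first colon; value keeps any later
        # colons (e.g. the timestamp in "Created at").
        key = $i; sub(/:.*/, "", key); sub(/^[ \t]+/, "", key)
        val = $i; sub(/^[^:]*:[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
        if (key == "ID") id = val
        else if (key == "Reclaimable") reclaimable = val
        else if (key == "Usage count") usage = val
      }
      if (id != "" && usage == "0" && reclaimable == "false") print id
    }' "$@"
}
```

Usage: pipe the du output into the helper, for example buildctl --addr tcp://buildkitd-service:12346 du --verbose | find_stuck_records, provided your buildctl version prints records in this layout.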
Additional Context
- Using the official open-source Docker Hub image moby/buildkit:v0.23.2
- Key diagnostic: Missing "session finished" log entries in daemon logs
- Issue occurs in Kubernetes environment with official image
- The problem appears to be in the session cleanup mechanism after successful operations
- Session state becomes inconsistent (usage count 0 but not reclaimable)
- The problem is reproducible but intermittent
Note: Happy to provide additional debugging information if needed.