runc exec: use CLONE_INTO_CGROUP #4812
base: main
aa873c8 to 115aa1f
OK I did some debugging and have very bad news to share. Apparently GHA moves the process we create (container's init) to a different cgroup. Here's an excerpt from debug logs (using fs2 cgroup driver):
Here ^^^ runc created a container and put its init to
Here ^^^ the same container init is unexpectedly in the
And here ^^^ Maybe this is what systemd does? But it doesn't do that on my machine. I need some time to digest this. Any feedback is welcome.
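For anyone trying to reproduce this kind of check outside of runc's debug logs, here is a minimal sketch of how the current cgroup of a process can be read from `/proc/<pid>/cgroup`. It is not runc code; the helper name `parseUnifiedCgroup` is made up for illustration, and it only looks at the cgroup v2 (unified) entry, which has the form `0::<path>`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseUnifiedCgroup (hypothetical helper) returns the cgroup v2 path found
// in the contents of a /proc/<pid>/cgroup file. The v2 entry is "0::<path>".
func parseUnifiedCgroup(data string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		// Each line has the form "hierarchy-ID:controller-list:path".
		parts := strings.SplitN(sc.Text(), ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return parts[2], true
		}
	}
	return "", false
}

func main() {
	// Inspect the current process; substitute another PID to watch a
	// container init instead.
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if p, ok := parseUnifiedCgroup(string(data)); ok {
		fmt.Println("cgroup v2 path:", p)
	} else {
		fmt.Println("no cgroup v2 entry found")
	}
}
```

Reading the file twice, once right after container creation and once later, would show whether something moved the process in between.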
We mark the transient unit as
This is the fs2 driver: no transient unit is created, and there is no way to say This is placed under When I do the same on my machine (Fedora 42, systemd v257), this does not happen. I wonder whether this is specific to Ubuntu, or maybe even to Azure/GHA.
467d16c to dfcf22a
I notice that all the failures occurred in rootless container tests. This might be related to: runc/libcontainer/process_linux.go Line 205 in dfcf22a
However, you mentioned we're seeing an ENOENT error here, so that may not be the cause.
dfcf22a to 6095b61
@kolyshkin Wait, I thought we always communicated with systemd when using Is this just for our testing, or are users actually using this? Because we will need to fix that if we have users on systemd-based systems using cgroups directly without transient units...
When you use I'm pretty sure it has been that way from the very beginning. One other thing is, when using systemd, we configure everything via systemd and then use the fs/fs2 driver to write to the cgroup directly. This is also how things have always been. One reason for that is that we did not care much about translating the OCI spec into systemd settings, which is now mostly fixed. Another reason is that systemd doesn't support all per-cgroup settings that the kernel has (so some of those can't be expressed as systemd unit properties).
The thing is, while the comment says "EBUSY", the actual code doesn't check for a particular error and takes this fallback on any error (including ENOENT). My guess is, with systemd driver we actually need
b98e111 to 6e3bf36
This fixes the following warning (seen on Fedora 42 and Ubuntu 24.04): + sudo chown -R rootless.rootless /home/rootless chown: warning: '.' should be ':': ‘rootless.rootless’ Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
The main idea is to maintain the code separately (and eventually kill V1 implementation). Signed-off-by: Kir Kolyshkin <[email protected]>
Remove cgroupPaths field from struct setnsProcess, because: - we can get base cgroup paths from p.manager.GetPaths(); - we can get sub-cgroup paths from p.process.SubCgroupPaths. But mostly because we are going to need separate cgroup paths when adopting cgroups.AddPid. Signed-off-by: Kir Kolyshkin <[email protected]>
The main benefit here is when we are using a systemd cgroup driver, we actually ask systemd to add a PID, rather than doing it ourselves. This way, we can add rootless exec PID to a cgroup. The implementation requires opencontainers/cgroups#26. Signed-off-by: Kir Kolyshkin <[email protected]>
This is based on work done in [1]. Since the functionality requires a recent kernel and might not work, implement a fallback. [1]: https://go-review.googlesource.com/c/go/+/417695 Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
6e3bf36 to 9489925
Apparently, we are also not placing rootless container execs into the proper cgroup (which is still possible when using cgroup v2 systemd driver, but we'd need to use Tracking it in #4822
Requires (and currently includes) PR #4822; draft until that one is merged.
It makes sense to make runc exec benefit from clone3(CLONE_INTO_CGROUP), when available. Since it requires a recent kernel and might not work, implement a fallback.
Based on work done in https://go-review.googlesource.com/c/go/+/417695.
Closes: #4782.