So far, d-run concentrates on namespaces. However, there are more building blocks to containers. An important one is cgroups.
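To make cgroup membership concrete: on Linux, each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:path` (on a pure cgroup v2 system there is a single line such as `0::/some/path`). The following is a minimal sketch of parsing that format. The names `CgroupEntry`, `parse_proc_cgroup`, and `read_cgroups` are made up for illustration and are not part of d-run.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int       # 0 for the unified (v2) hierarchy
    controllers: List[str]  # empty for cgroup v2 entries
    path: str               # path relative to the cgroup mount point


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into structured entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice so that a ':' inside the path is preserved.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                int(hierarchy_id),
                [c for c in controllers.split(",") if c],
                path,
            )
        )
    return entries


def read_cgroups(pid: str = "self") -> List[CgroupEntry]:
    """Read and parse the cgroup membership of a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as fh:
        return parse_proc_cgroup(fh.read())
```

On a cgroup v2 system, joining `/sys/fs/cgroup` with the `path` of the entry whose `hierarchy_id` is 0 should give the directory that holds the limit files for that process, assuming the usual mount point.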
Cgroups (control groups) allow setting resource limits for processes. The cgroup of a process can be found by looking into `/proc/{pid}/cgroup`. They are managed via a special file system in `/sys/fs/cgroup/` (spec). A cgroup is created by creating a folder in that subtree. Resource limits are set by writing to files in that folder. Cgroups can also be hierarchical. Who is allowed to create cgroups depends on the settings of the parent. At the root of the tree, only the root user can create cgroups.

Systemd uses cgroups extensively. It creates a cgroup under `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/` where user 1000 can create new cgroups. It also provides the systemd-run command, which creates a cgroup with some properties, runs a command in that cgroup, and then removes the cgroup again. As far as I understand, `systemd-run --scope` will communicate with the privileged systemd process to set up the cgroup, but will `exec` the container itself, so there is little chance of a privilege escalation. The available options for resource control are documented in `man systemd.resource-control`.

The OCI container spec does not contain any metadata for resource control. However, it reserves some keywords that are used by the docker spec. They are quite limited though (only `Memory`, `MemorySwap`, and `CpuShares`).

This pull request proposes to use `systemd-run` to implement support for the three config options defined by docker. This follows the principle "Use established tools for the complicated bits".

However, I am not convinced: