Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carry #1111: specs-go/config: add Landlock LSM support #1241

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Zheaoli
Copy link

@Zheaoli Zheaoli commented Jan 2, 2024

Carry #1111

@Zheaoli
Copy link
Author

Zheaoli commented Jan 2, 2024

cc @l0kod @AkihiroSuda @vbatts @gnoack @cyphar

@AkihiroSuda AkihiroSuda added this to the v1.2.0 (tentative) milestone Jan 2, 2024
@AkihiroSuda
Copy link
Member

cc @kailun-qin (author of #1111) PTAL

@AkihiroSuda
Copy link
Member

Before merging this PR, can we have a POC implementation of runc (or crun, youki) to confirm the design?

@Zheaoli
Copy link
Author

Zheaoli commented Jan 9, 2024

opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.

I will carry it

@@ -340,6 +340,26 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR should be carefully updated following the latest advances of landlock (rather than simply carried) as some of the spec proposals might be stale now. Pls see https://docs.kernel.org/userspace-api/landlock.html and https://github.com/landlock-lsm/go-landlock for details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you got time to continue this PR or I can continue carry(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls go ahead updating this PR. I'll need some time to follow up and dive into the updates of landlock (even for review).

@l0kod
Copy link

l0kod commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

@gnoack
Copy link

gnoack commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

FWIW the most common implementation problem so far has been the use of the "refer" right, whose logic unfortunately works slightly differently than for other access rights. I tried to explain this in https://blog.gnoack.org/post/landlock-best-effort/ last year.

With Linux 6.7, Landlock also has gained support for restricting the use of bind(2) and connect(2) for TCP. (Landlock ABI version 4)

@Zheaoli
Copy link
Author

Zheaoli commented Jan 10, 2024

FYI, I plan to review these changes next week.

Thanks for the important information, I will update this spec in this week ASAP I can

@utam0k
Copy link
Member

utam0k commented Jan 11, 2024

I wonder what the behavior should be when a new process(e.g. docker exec) jumps into the existing container.

@Zheaoli Zheaoli marked this pull request as draft February 27, 2024 16:33
@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch from 56a5a1d to 6730eb2 Compare March 1, 2024 03:18
@Zheaoli Zheaoli marked this pull request as ready for review March 1, 2024 03:23
@Zheaoli
Copy link
Author

Zheaoli commented Mar 1, 2024

I have update the PR, PTAL @kailun-qin @gnoack @l0kod

@Zheaoli Zheaoli requested a review from kailun-qin March 1, 2024 03:24
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule.
type LandlockFSAction string

// Define actions on files and directories that Landlock can restrict a sandboxed process to.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation nit: The word "define" is usually unnecessary in docstrings (all of the exported symbols are defining something after all)

config.md Outdated
The `ruleset` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure you meant "Access" here?

config.md Outdated
The `ruleset` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The sentences in parenthesis have broken grammar)

config.md Outdated
1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is slightly confusing in my mind that the boolean condition has the "inverted" logic here. I assume that in this config format, it is not possible to call this variable "bestEffortFallback" and to make true the default value instead?)

@gnoack
Copy link

gnoack commented Mar 1, 2024

High level remark: Looks good so far. I added a bunch of documentation nits.

As for Landlock-specific comments, I would recommend to re-think some of the type names, and I left comments in these places.

I think the bigger chance for mistakes is in the implementation itself. - Are you planning to add that in the same PR, or will that come separately?

@Zheaoli
Copy link
Author

Zheaoli commented Mar 1, 2024

I think the bigger chance for mistakes is in the implementation itself. - Are you planning to add that in the same PR, or will that come separately?

I think it would be great to implement it in a single PR(Actually, I don't know which is better to split the PR. Depend on the ABI version or else)

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.

I will carry it

As there are no tests there, can you share the example config you used to test it?

I'm curious about overlayfs and the behavior in old runtimes.

@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the runtime doesn't know this field? I guess in runc it will be completely ignored, right? In that case, I think we at least need to expose landlock support via the features command too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the runtime will ignore the field. I think it's not necessary to add feature command like other optional features.

Copy link
Member

@rata rata Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli why not? This is a security feature, I can see use cases where upper layers want to enforce this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for replying late. I misunderstand something before. I think it's necessary to export the feature we support for landlock in features command

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't done, right?

config.md Outdated
Comment on lines 350 to 363
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work on overlayfs? https://docs.kernel.org/userspace-api/landlock.html#bind-mounts-and-overlayfs

It is very widely used, can you clarify what works and what doesn't?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not testing it on overlayfs yet. But I think it still work on overlayfs. Because we will always target to restrict the rule on the merged hierarchy.

Copy link
Author

@Zheaoli Zheaoli Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you want to figure out is it working on the following circumstance?

  1. We got directories A and B, A and B will merged into C
  2. We give the process write permission for A and B
  3. The process still have write permission for C or not?

Copy link
Member

@rata rata Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understandig of the link is, if C is the merged dir by overlayfs, then no, you don't have permissions to write even if you can write to A and B.

But I think if we use C for the rules, it might work. I don't know if we also need permissions in A and B (not 100% clear to me when reading the doc), in which case it will be tricky.

IMHO, it will be nice to understand that, as overlay is a widely used fs for containers. But not sure that will change something on how we support it in OCI, so maybe it is okay?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/Zheaoli/py-landlock/blob/main/examples/demo.py

I have some overlayfs test case here. PTAL if you got time

Copy link
Member

@rata rata May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli I can try to download the python modules and run that locally. But it would be way simpler if you just share what works and what doesn't :)

At the OCI layer we don't know the lowerdir nor the upperdir, we just know the merged dir. It will be great to find if we apply a landlock policy restricting the merged dir, what effect that has on reading files AND also writing files.

Knowing if it works, or its limitations, is key to evaluate a PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. Here's the full test result:

  1. A and B are lower dirs, and C is a merged dirs.
  2. Policy to C would no longer affect A and B. The reverse action would be the same(policy on A and B would not affect C).
  3. The write and read (or any permission) on C is not depend on the permission on A and B. The permission between C and AB is independent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks! This LGTM.

I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.

To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go

With this json config:

JSON config:
{
  "forbidden_access": [
    "read_dir",
    "write_file",
    "make_reg"
  ],
  "exceptions": [
    {
      "paths": [
        "./rootfs-merged/tmp"
      ],
      "permitted_access": [
        "read_dir",
        "make_reg",
        "write_file"
      ]
    }
  ],
  "best_effort": false
}

And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh).

The files are created by overlayfs in the upper dir and all is as expected.

@AkihiroSuda AkihiroSuda requested a review from rata May 17, 2024 17:44
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a cursory look, I need to look later at what field are exactly proposed. But this seems in the right direction to me.

Added some comments/issues that I see.

config.md Outdated
1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't completely work, sadly, due to some historic reasons. I've mentioned this also here: #1241 (comment)

The thing is, if the container runtime doesn't know about landlock, the whole landlock section will be silently ignored. So, you might think you will have this landlock policy or an error, but the truth is, if the runtime doesn't know about landlock, you can end up without a landlock policy at all.

For this reason, we should expose this in the features section too. This way, users of the OCI runtime can know if the runtime knows about landlock and handle that case properly. Without exposing it there, it is not really usable for this security features.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we can remove the disableBestEffort field and export the features we support in features command instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do boths, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can output the feature check like this ?

{
    "linux": {
        "landlock": {
            "enabled": true,
            "bestEffort": {
                "enabled": true // False
            },
            "fs":{
                "enabled": true,
                "actions": [
                    "exectute",
                    "write_file",
                    "read_file",
                    "read_dir",
                    "remove_dir",
                    "remove_file",
                    "make_char",
                    "make_dir",
                    "make_reg",
                    "make_sock",
                    "make_fifo",
                    "make_block",
                    "make_sym",
                    "refer",
                    "truncate"
                ]
            },
            "network": {
                "enabled": true,
                "actions": [
                    "bind",
                    "connect"
                ]
            }
        }
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to expose all of that. I think just the landlock object with the enabled field will be enough.

Then, if the runtime doesn't recognize some of the constants or if the kernel ABI is not the one needed, then make the runtime fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

config.md Outdated
Comment on lines 350 to 363
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks! This LGTM.

I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.

To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go

With this json config:

JSON config:
{
  "forbidden_access": [
    "read_dir",
    "write_file",
    "make_reg"
  ],
  "exceptions": [
    {
      "paths": [
        "./rootfs-merged/tmp"
      ],
      "permitted_access": [
        "read_dir",
        "make_reg",
        "write_file"
      ]
    }
  ],
  "best_effort": false
}

And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh).

The files are created by overlayfs in the upper dir and all is as expected.

config.md Outdated
3. ABI version >= 3:
1. truncate
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

portBeneath? Is this for all ports below the number? Or is this just a c&p mistake?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's for all ports below the number. Actually I follow style with pathBeneath

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay, thanks. I'm in an off-site this week, so I'll check this up next week.

Copy link
Member

@rata rata Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make sense out of that.

If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.

But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?

Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, exactly -- this is still confusing and would better be named "net port" in line with the landlock.h header file.

It does not permit access for all ports below a given number, but only for that specific port number itself, and the word "hierarchy" is also misleading, as there is nothing hierarchical about that (probably copied over from the file system access rights documentation).

@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping?

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks for the PR! Added some comments.

I think the config.json interface needs more work, but I'd love for this to be ready.

I read @gnoack comments, that weren't addressed, and I fully agree with them. Please take a look their review comments.

config.md Outdated
3. ABI version >= 3:
1. truncate
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules.
Copy link
Member

@rata rata Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make sense out of that.

If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.

But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?

Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132

@AkihiroSuda
Copy link
Member

Needs rebase

@AkihiroSuda
Copy link
Member

Needs another rebase

@AkihiroSuda
Copy link
Member

@Zheaoli Could you rebase this? 🙏
Let's merge this soon

@Zheaoli
Copy link
Author

Zheaoli commented Dec 16, 2024

Could you rebase this? 🙏
Let's merge this soon

Sure, I will carry this today and fix the review idea during this week.

(BTW I'm watching new job opportunities)(Free man today

@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch 2 times, most recently from 7cea8e2 to af20627 Compare December 20, 2024 09:54
kailun-qin and others added 2 commits December 20, 2024 17:57
Linux kernel 5.13 adds support for Landlock Linux Security Module (LSM).
This allows unprivileged processes to create safe security sandboxes
that can securely restrict the ambient rights (e.g. global filesystem
access) for themselves.

opencontainers#1110

Co-authored-by: Zheao Li <me@manjusaka.me>
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Signed-off-by: Manjusaka <me@manjusaka.me>
Signed-off-by: Manjusaka <me@manjusaka.me>
@Zheaoli Zheaoli force-pushed the manjusaka/carry-1111 branch from af20627 to 77ec826 Compare December 20, 2024 09:57
@Zheaoli Zheaoli requested review from gnoack and rata December 20, 2024 09:58
@Zheaoli
Copy link
Author

Zheaoli commented Dec 20, 2024

cc @AkihiroSuda @gnoack @utam0k @rata I made a huge update based on the reivew idea. PTAL

@utam0k
Copy link
Member

utam0k commented Dec 22, 2024

I'd like to know the use cases with LandLock.
I'm worried that it will become a feature that no one uses, and it will only increase the burden of maintenance. I'm not familiar with this, so please let me know 🙏

@AkihiroSuda
Copy link
Member

Can we get the POC opencontainers/runc#3194 synced with this PR, so as to verify the feasibility and usability?

@Zheaoli
Copy link
Author

Zheaoli commented Dec 23, 2024

Can we get the POC opencontainers/runc#3194 synced with this PR, so as to verify the feasibility and usability?

SGTM, Let me carry the runc#3194

@rata
Copy link
Member

rata commented Jan 21, 2025

@Zheaoli thanks! It's very hard to review if you just "do huge updates" but don't really answer the comments. It's not easy to know what you tackled, what you didn't and why, and what you might just forgotten. If you can take the time to answer each comment, it would be greatly appreciated.

I had a quick look, it does look better from a 10k feet point of view at least, but the part of exposing this in features is completely missed. I fear without it it might not be possible to use it in some scenarios, as old runtimes without this will just ignore this. This doesn't play well with the disable_best_effort thing. It seems to do one thing (enforce it), but it doesn't (the old runtime will just ignore things it doesn't know about).

@Zheaoli Can you have a look at that?

Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great improvement on the previous iterations, but I don't think we should merge this, I think we still need changes :(

Besides some errors on how landlock behaves (and some typos that I removed the comments to make the review shorter, as there are other more important blockers IMHO), I don't like the idea of having the possible words in the spec as the only way to configure landlock (e.g. read_file and all allowed_access, the same for the network section, etc.).

In a lot of kernel releases (I think most) there were landlock changes, I mentioned the example of ioctls as the last recent example (this PR was updated in Dec and is already old). I think we need a better way to have something that is more "future proof", because the linux kernel is released every 6-8 weeks, we can't update the spec and runtimes so often for any new thing that can be restricted in a new kernel.

I suggested using the plain numbers, as defined in the landlock.h uapi header. This way, we can expose nice names in the go bindings, but users like containerd and so can still use new features if they just use the numbers (they can define a friendly constant for the number, etc.). This removes the runtime-spec and runtimes to create new releases to support every new "string" supported. We will need a new release for things like "now we can restrict a whole new category", like the network stuff that was added at some point in the past, but that isn't as often and seems like a reasonable trade-off for me. But I'm open to other ideas, of course :-)

* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset.
The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what landlock does? I thought it will allow everything that is not handled here. Otherwise, backwards compatibility (when you add a new access type here, like kernel X added access type Y) will be impossible, right?

Doing a local experiment, this is not indeed what landlock does, so we can't promise this. It will be the case if we don't mention it in path_beneath. But if we mention it here, this is not true. Maybe it is just c&p that went unnoticed?

The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the last parenthesis. That is the case with all the things, like some handledAccessFs types might not be supported, etc.

The `rules` currently contains the following types:
* **`pathBeneath`** (array of objects, OPTIONAL) is an array of the file-hierarchy typed rules.
Entries in the array contain the following properties:
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of FS typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want strings or ints here? If we use ints, we can handle new values without requiring to add them to the spec. Of course, we will need to provide a friendly name for known things in the go bindings, but until a new release is done, people can use a number and it will just work.

Furthermore, what is here is already out of date. ABI 5 now is supported by new kernels (to restrict ioctls). So I think we need a more "future proof" spec :-)

What do others think?

* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`networkPort`** (array of objects, OPTIONAL) is an array of the network socket rules.
Entries in the array contain the following properties:
* **`allowedAccess`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are allowed by a rule. The actions are grouped by the ABI version in the following description:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem here, we might get away with ints?

1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, without exposing landlock enabled as a bool in the features subcommand, a high level container runtime (like containerd or crio) can't know if the low-level container runtime implementing this (like crun or runc) supports landlock at all. And in that case, the runtimes that don't support landlock will just ignore this whole section and the container will be created just fine, no landlock enforced.

Therefore, I think we really need to expose this in the features subcommand too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we need to be careful to not give users a false sense of security.

* **`enableBestEffort`** (bool, OPTIONAL) the `enableBestEffort` field disables the best-effort security approach for Landlock access rights.
This is for conditions when the Landlock access rights explicitly configured by the container are not supported or available in the running kernel.
If the best-effort security approach is enabled (`false`), the runtime SHOULD enforce the strongest rules configured up to the current kernel support, and only be [logged as a warning](runtime.md#warnings) for those not supported.
If disabled (`true`), the runtime MUST [generate an error](runtime.md#errors) if one or more rules specified by the container is not supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the other comment, this only work for runtimes that use this spec version. For runc 1.1 or 1.2 and probably 1.3, it's mandated by the spec to ignore unrecognized fields. Therefore, this is not safe when you take into account that we need to support older runtimes from a high-level container like containerd or crio


* **`handledAccess`** (object, OPTIONAL) specifies the access rights that will be restricted by the ruleset.
The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int ideas can apply here too.

The `handledAccess` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAccessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are available when the ABI version >= 4. The behavior when the ABI version is less than 4 will depend on the **`enableBestEffort`**)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem, what about using ints for this too?

1. ABI version >= 4:
1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a typo, we want int for ports, right?

@l0kod
Copy link

l0kod commented Feb 6, 2025

Besides some errors on how landlock behaves (and some typos that I removed the comments to make the review shorter, as there are other more important blockers IMHO), I don't like the idea of having the possible words in the spec as the only way to configure landlock (e.g. read_file and all allowed_access, the same for the network section, etc.).

In a lot of kernel releases (I think most) there were landlock changes, I mentioned the example of ioctls as the last recent example (this PR was updated in Dec and is already old). I think we need a better way to have something that is more "future proof", because the linux kernel is released every 6-8 weeks, we can't update the spec and runtimes so often for any new thing that can be restricted in a new kernel.

I suggested using the plain numbers, as defined in the landlock.h uapi header. This way, we can expose nice names in the go bindings, but users like containerd and so can still use new features if they just use the numbers (they can define a friendly constant for the number, etc.). This removes the runtime-spec and runtimes to create new releases to support every new "string" supported. We will need a new release for things like "now we can restrict a whole new category", like the network stuff that was added at some point in the past, but that isn't as often and seems like a reasonable trade-off for me. But I'm open to other ideas, of course :-)

Using keywords with a well-defined semantic should be favored, but I agree that giving the possibility for users to use currently-unknown-to-the-runtime access rights would enable them to benefit from more security features. I'd like to be able to specify both keywords and numbers though. Current Landlock users are looking for groups of access rights (e.g. read-only vs. read-write), and specific keywords could also be used for that (taking into account compatibility).

I started working on a standalone tool and library that reads a JSON configuration and creates a Landlock sandbox. The overall approach looks pretty similar to this specification and I'd like to combine our effort to get a common configuration format for Landlock. This would also be easier to synchronize and maintain wrt. to the kernel changes (that I maintain). The configuration should be a bit more flexible, closer to the kernel's UAPI, handle file descriptors as well, and ease handling of backward and forward compatibility. I plan to release an initial experimental version next week and I'll add a comment to this PR to point to the repository.

@rata
Copy link
Member

rata commented Feb 6, 2025

@l0kod +1 on not creating two different json specs to describe a landlock policy. I'll monitor this for when you post the link to the experimental release :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants