Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arbitrary_self_types + derive_coerce_pointee allows calling methods whose where clauses are violated #136702

Open
RalfJung opened this issue Feb 7, 2025 · 27 comments · May be fixed by #136776
Open
Assignees
Labels
F-arbitrary_self_types `#![feature(arbitrary_self_types)]` F-derive_coerce_pointee Feature: RFC 3621's oft-renamed implementation needs-triage This issue may need triage. Remove it if it has been sufficiently triaged.

Comments

@RalfJung
Copy link
Member

RalfJung commented Feb 7, 2025

Here's an example by @steffahn:

#![forbid(unsafe_code)]
#![feature(arbitrary_self_types, derive_coerce_pointee)]

use std::any::TypeId;
use std::marker::CoercePointee;
use std::marker::PhantomData;

#[derive(CoercePointee)]
#[repr(transparent)]
struct SelfPtr<T: ?Sized>(*const T);

impl<T: ?Sized> std::ops::Deref for SelfPtr<T> {
    type Target = T;
    fn deref(&self) -> &T {
        panic!("please don't call me, I just want the `Receiver` impl!");
    }
}

trait GetTypeId {
    fn get_type_id(self: SelfPtr<Self>) -> TypeId
    where
        Self: 'static;
}

impl<T: ?Sized> GetTypeId for PhantomData<T> {
    fn get_type_id(self: SelfPtr<Self>) -> TypeId
    where
        Self: 'static,
    {
        TypeId::of::<T>()
    }
}

// no `T: 'static` bound 🐈‍⬛ necessary
fn type_id_of<T: ?Sized>() -> TypeId {
    SelfPtr(&PhantomData::<T> as *const (dyn GetTypeId + '_) as *const (dyn GetTypeId + 'static)).get_type_id()
}
fn type_id_of_val<T: ?Sized>(_: &T) -> TypeId {
    type_id_of::<T>()
}

fn main() {
    let mut x = 1234;
    let reference = &mut x;
    let tid_of_ref = type_id_of_val(&reference);
    let closure = || *reference += 1;
    let tid_of_closure = type_id_of_val(&closure);
    dbg!(tid_of_ref, tid_of_closure);
}

The double as *const dyn GetTypeId as *const dyn GetTypeId is necessary; the second as changes the lifetime something short-lived to 'static. That means we call get_type_id on a type that does not satisfy Self: 'static.

@BoxyUwU has a similar example:

#![feature(arbitrary_self_types, derive_coerce_pointee)]

use std::marker::CoercePointee;

#[derive(CoercePointee)]
#[repr(transparent)]
struct MyPtr<T: ?Sized>(*const T);

use std::ops::Receiver;

impl<T: ?Sized> Receiver for MyPtr<T> {
    type Target = T;
}

trait Trait {
    fn foo(self: MyPtr<Self>)
    where
        Self: Send;
}

impl Trait for *mut () {
    fn foo(self: MyPtr<Self>) {
        unreachable!()
    }
}

fn main() {
    let a = 0x1 as *const *mut ();
    let a = a as *const dyn Trait as *const (dyn Trait + Send);
    MyPtr(a).foo();
}

Boxy's example is mitigated by a FCW lint. However, the lifetime example doesn't trigger a lint, and while the code above is "fine", we should at least have a good idea for why this cannot cause UB in other ways. (My gut feeling is that this can cause UB but I haven't tried.)

Cc @rust-lang/types

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 7, 2025
@RalfJung RalfJung added F-arbitrary_self_types `#![feature(arbitrary_self_types)]` F-derive_coerce_pointee Feature: RFC 3621's oft-renamed implementation labels Feb 7, 2025
@traviscross
Copy link
Contributor

cc @adetaylor @rust-lang/lang

@steffahn
Copy link
Member

steffahn commented Feb 7, 2025

My gut feeling is that this can cause UB but I haven't tried.

I've had the same feeling about this and then I did try for a while and didn't find a way to make this UB. Which proves nothing, but changed my gut feeling to “this might be fine”.

@compiler-errors
Copy link
Member

compiler-errors commented Feb 7, 2025

I think this boils down essentially to the question: Can you use an impossible non-higher-ranked type-outlives where clause to do UB if you don't have a value of the type?

That's because for you to do something sneaky, for example, you'd need to have a value of the type to do something with that type-outlives where clause (e.g. move it into a thread), and you can't do that without that type showing up in a function signature, and you can't do that (in an dyn-compatible way) unless it shows up in the args of a dyn Trait<...>, at which point the 'static where clause doesn't hold bc dyn Tr<A, B, ...> + 'o: 'static only if 'o: 'static and all A: 'static, B: 'static, ....

So I'm semi-reasonably convinced this is not possible yet. However, it does still fill me with a bit of worry that we don't change the way implied bounds work or something that could make this UB in the future...


One way we could prevent this becoming a problem is by strengthening the derive(CoercePointee)/DispatchFromDyn impl validation to not allow dispatching to *const T/*mut T inner pointer types without the arbitrary_self_types_pointers feature gate. This essentially punts the question onwards to that feature gate.

@Darksonn
Copy link
Contributor

Darksonn commented Feb 7, 2025

One way we could prevent this becoming a problem is by strengthening the derive(CoercePointee)/DispatchFromDyn impl validation to not allow dispatching to *const T/*mut T inner pointer types without the arbitrary_self_types_pointers feature gate. This essentially punts the question onwards to that feature gate.

The primary motivating use-case for derive(CoercePointee) is Arc, which is implemented as a wrapper around a raw pointer.

@RalfJung
Copy link
Member Author

RalfJung commented Feb 8, 2025

The other option would be to disallow changing that lifetime in as casts, just like we already disallow changing marker traits. If raw pointers have an important invariant about their vtable, we shouldn't allow safe casts to alter things which are relevant for that vtable.

@Darksonn
Copy link
Contributor

Darksonn commented Feb 8, 2025

My guess is that disallowing lifetime casts in general would be too breaking, but perhaps we could do it only for traits with a function that has a trait bound that implies Self: 'static.

@Darksonn
Copy link
Contributor

Darksonn commented Feb 8, 2025

The other option here is to make #[derive(CoercePointee)] unsafe, but fundamentally I think that permitting the lifetime cast goes against what was decided in #101336. We shouldn't permit you to make casts that change which vtable functions are available.

@Darksonn
Copy link
Contributor

Darksonn commented Feb 8, 2025

I just had an idea: What about we disallow all raw pointer casts of raw dyn pointers (both lifetime and auto trait case), but only when the trait has any methods with a where clause on it?

@compiler-errors
Copy link
Member

That seems pretty arbitrary, and also interacts poorly with adding certain kinds of trivial where clauses to methods which today which is not a breaking change. I'd really rather we don't try to design some fix for this that's cookie-cutter designed just for this problem.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 9, 2025
…o_to_object-hard-error, r=<try>

Make `ptr_cast_add_auto_to_object` lint into hard error

In Rust 1.81, we added a FCW lint (including linting in dependencies) against pointer casts that add an auto trait to dyn bounds.  This was part of work making casts of pointers involving trait objects stricter, and was part of the work needed to restabilize trait upcasting.

We considered just making this a hard error, but opted against it at that time due to breakage found by crater.  This breakage was mostly due to the `anymap` crate which has been a persistent problem for us.

It's now a year later, and the fact that this is not yet a hard error is giving us pause about stabilizing arbitrary self types and `derive(CoercePointee)`.  So let's see about making a hard error of this.

r? ghost

cc `@adetaylor` `@Darksonn` `@BoxyUwU` `@RalfJung` `@compiler-errors` `@oli-obk` `@WaffleLapkin`

Related:

- rust-lang#135881
- rust-lang#136702

Tracking:

- rust-lang#127323
- rust-lang#44874
- rust-lang#123430
@BoxyUwU
Copy link
Member

BoxyUwU commented Feb 9, 2025

For the record we already forbid casting dyn Trait<'a> to dyn Trait<'b>. So the only thing here is casting of the trait object lifetime. I would like to see how breaking forbidding object lifetime casts is in practice and see if we can make it a FCW which I think would be fine for stabilization if we think this can't be exploited in practice

@BoxyUwU BoxyUwU self-assigned this Feb 9, 2025
@BoxyUwU BoxyUwU linked a pull request Feb 9, 2025 that will close this issue
bors added a commit to rust-lang-ci/rust that referenced this issue Feb 9, 2025
…, r=<try>

[WIP] Forbid object lifetime changing pointer casts

Fixes rust-lang#136702

r? `@ghost`
@steffahn
Copy link
Member

steffahn commented Feb 9, 2025

@BoxyUwU is this even possible as a FCW? Realistically, it’ll create borrow-check errors, I’d be surprised if the borrow checker supports conditions like “I want 'a: 'b, but not really, but pls warn if this constraint isn’t met”

@Darksonn
Copy link
Contributor

Darksonn commented Feb 10, 2025

I've seen the cast been used a fair number of times with dyn Future to pass it across thread boundaries in cases where the lifetime is enforced in other ways. For example, this is common when people implement their own scoped spawns.

I guess the real rule here is that if you cast from dyn A to dyn B where dyn B has callable methods that dyn A does not, then that's a problem. Even if the concrete type does have the new method that's only on dyn B, we could imagine that the vtable doesn't have it since the vtable was generated for dyn A.

So the precise check we need is to go through each method on the trait and verify that if it's callable on dyn B, then it's also callable on dyn A.

@adetaylor
Copy link
Contributor

@Darksonn are you proposing that the check you describe is done at the time of the cast, or at method call time?

I've been digging into what a "valid vtable pointer" might mean, and this example surprises me:

use std::any::TypeId;
use std::marker::PhantomData;

#[allow(dead_code)]
trait GetTypeId {
    fn foo(&self);
    fn get_type_id(&self) -> TypeId
    where
        Self: 'static;
}

impl<T: ?Sized> GetTypeId for PhantomData<T> {
    fn foo(&self) {}
    fn get_type_id(&self) -> TypeId
    where
        Self: 'static,
    {
        TypeId::of::<T>()
    }
}

fn make_bad_get_type_id<T: ?Sized>(_: &T) -> *const (dyn GetTypeId + 'static) {
    &PhantomData::<T> as *const (dyn GetTypeId + '_) as *const (dyn GetTypeId + 'static)
}

fn make_good_get_type_id<'a, T: ?Sized>(_: &'a T) -> *const (dyn GetTypeId + 'a) {
    &PhantomData::<T> as *const dyn GetTypeId
}

fn main() {
    let mut x = 1234;
    let reference = &mut x;
    let ok_tid_of_ref = make_good_get_type_id(reference);
    let rf = unsafe { &*ok_tid_of_ref };
    // let ok_type_id = rf.get_type_id(); // does not compile
    let bad_tid_of_ref = make_bad_get_type_id(reference);
    let rf = unsafe { &*bad_tid_of_ref };
    let bad_type_id = rf.get_type_id();
    dbg!(bad_type_id);
}

The surprises are:

  • The compiler generates identical vtables for ok_tid_of_ref and bad_tid_of_ref, even though there's no legal way to call get_type_id on bad_tid_of_ref.
  • The line marked "does not compile" doesn't compile. Somehow, the lifetime information persists through the raw pointer so that the compiler knows it can't allow calls to get_type_id.

I'm assuming your intention is that the checks you propose are done at cast-time, whereas to be consistent with the above behavior, maybe it's OK to allow *const dyn SomethingInvalid to exist, but then we need to somehow check the lifetime information within it at method-call time. I'm not sure how to do that. Advice welcome.

@Darksonn
Copy link
Contributor

My intent is that the checks I propose are done at cast time. This reinforces that even raw pointers must always have a valid vtable for the type when they are wide. This is consistent with #101336.

As for your examples, *const (dyn GetTypeId + 'static) has a vtable with two valid functions in it, whereas *const (dyn GetTypeId + 'a) has a vtable containing one valid function, and an invalid function. Casting the latter to *const (dyn GetTypeId + 'static) involves treating an invalid function to a valid one, which is not okay if we require raw pointers to hold a valid vtable.

@compiler-errors
Copy link
Member

Please be aware that type- and lifetime-outlives where clauses may never be used to disqualify a trait from a vtable, since we erase lifetime information post-borrowck we literally do not (and will never) have the information to know whether it may hold or not. So no, today *const (dyn GetTypeId + 'a) has two valid functions in its vtable regardless of the value of 'a.

@Darksonn
Copy link
Contributor

The entire crux of this issue is that calling the get_type_id function pointer in the vtable of *const (dyn GetTypeId + 'a) is not okay and never will be, because the where clauses are not satisfied. Surely that implies that the get_type_id function in the vtable should be treated as invalid? That's the case, even if we happen to put a valid function in that vtable entry due to implementation details related to lifetime erasure.

Or do you mean to say that it's okay to call the get_type_id function pointer in the vtable of *const (dyn GetTypeId + 'a)?

@compiler-errors
Copy link
Member

compiler-errors commented Feb 10, 2025

The entire crux of this issue is that calling the get_type_id function pointer in the vtable of *const (dyn GetTypeId + 'a) is not okay and never will be, because the where clauses are not satisfied

Yes, for some substitutions of 'a it's not valid to call get_type_id, but obviously not all substitutions of 'a (e.g. 'a = 'static). That means that the only place I can truly verify whether a method is going to be impossible to call ever is post-monomorphization. There is a distinction here, though, because lifetimes are erased after borrowck. Unlike constructing vtables with impossible trait where clauses where we have to re-check those where clauses post-mono, this is not possible to do for type- and lifetime-outlives since we simply do not have sufficient information during codegen time.

The notion that lifetime generics in general are not treated on the same footing as type and const generics in the type system is pervasive, and type- and lifetime-outlives where clauses are never practically going to be checkable in codegen.

All that we can do with lifetimes is gather their side-effects during borrowck and validate them; this is why what Boxy's checking in #136776 is likely going to be the only way forward here.

@nikomatsakis
Copy link
Contributor

I'm trying to understand exactly what this example is showing me and in what way it truly requires arbitrary-self-types.

I guess the wacky thing we are now able to do is to combine

(1) the pre-existing rules that allow you to cast a *X to *Y without much of a safety check;
(2) with the fact that you can a *const dyn;
(3) with the fact that you can then dispatch based on self-type (potentially through dyn, even) without unsafe?

@Darksonn
Copy link
Contributor

Basically by combining the two features, you can call a function pointer in the vtable of a raw pointer safely.

@nikomatsakis
Copy link
Contributor

Yeah. I regret permitting the cast between raw pointers in quite the way we do, but I feel like this is the "heart of the problem"

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Feb 12, 2025

I've been thinking a bit about this problem. I think the root of what is going is something like this:

  • In the past, arbitrary casts between raw pointers was allowed. The reason was that we assumed that any data behind that raw pointer was going to be itself unsafe, so really the proof obligation occurs there, the idea being that you have to be able to justify the provenance of your pointer and also that its type is accurate.
  • With arbitrary-self-types+coerce-pointee, however, it is possible to make a method invocation based on the vtable. This doesn't in-and-of-itself create UB, but—as we've seen—it can give you access to an environment with false lifetime-based premises (notably that Self: 'static).

I presume the reason that @steffahn had trouble creating UB is that to actually access the raw pointer, you still have to use unsafe, at which point you have that proof obligation, and the idea is that to be able to responsibly prove it, you'd have to rule out this kind of cast.

But we can see the risk from those predicates, in that it can allow access to safe abstractions that are gated on 'static (like type_of).

So I guess I can see the answer to "why is this a problem NOW when it wasn't before". I still feel like the root problem is indeed casting raw pointers in safe code, but migrating away from that will be tricky.

Does this sound like an accurate summary, @RalfJung or others?

@RalfJung
Copy link
Member Author

RalfJung commented Feb 12, 2025 via email

@steffahn
Copy link
Member

I presume the reason that @steffahn had trouble creating UB is that to actually access the raw pointer, you still have to use unsafe, at which point you have that proof obligation, and the idea is that to be able to responsibly prove it, you'd have to rule out this kind of cast.

To be clear, that accessing the pointer’s target would require unsafe is a given; the open question to begin with – as far as I understand – was only about whether or not there is unsoundness that doesn’t involve dereferencing the pointer. To this end, as far as I remember my experience trying to break soundness here (which must have been several weeks ago), it mirrors the description that @compiler-errors gave above, i.e.

for you to do something sneaky, for example, you'd need to have a value of the type to do something with that type-outlives where clause (e.g. move it into a thread), and you can't do that without that type showing up in a function signature, and you can't do that (in an dyn-compatible way) unless it shows up in the args of a dyn Trait<...>, at which point the 'static where clause doesn't hold bc dyn Tr<A, B, ...> + 'o: 'static only if 'o: 'static and all A: 'static, B: 'static, ....

So I'm semi-reasonably convinced this is not possible yet.

@BoxyUwU
Copy link
Member

BoxyUwU commented Feb 12, 2025

I wrote down my thoughts on arbitrary self types from a soundness pov which involved writing down stuff about this issue: here. Relevant part would be:

Something that is interesting to think about is why raw pointers allowing safe casting of object types hasn't caused problems before arbitrary_self_types/derive_coerce_pointee (#127323, #136702). The answer to this, I believe, is just that all types in std that allow performing vtable calls also require unsafe to be constructed from a raw pointer.

For example while you can:

  • Create some Box<Foo>
  • Coerce it to Box<dyn Trait>
  • Call Box::into_raw
  • Cast to*const dyn Trait + Send + Sync + 'static
  • Call Box::from_raw
  • Do a vtable call to some method that required Self: Send + 'static

The Box::from_raw step requires unsafe, and this is true of all smart pointers in std that allow for vtable calls. What arbitrary_self_types/derive_coerce_pointee do is allow you to safely construct a smart pointer type that can make vtable calls.

If it's going to be safe to make vtable calls, and safe to construct types that can make vtable calls, then it just has to be unsafe to have an object type that allows methods to be called on its vtable that shouldnt be called1.

In some sense raw pointers and vtable calls have never been sound, but as the unsoundness only affects implementors of std/core and the traits involved were relatively obviously "special" it just didn't matter very much.


I need to spend some time looking at the results of the crater run myself I know @RalfJung has looked over them already though. Hopefully we can just forbid safe pointer casting of object lifetimes though if we can't there are probably other paths forwards

Footnotes

  1. An alternative might be for derive(CoercePointee) to require the field containing the vtable to be unsafe (though such a feature is not even implemented unstabley). This would effectively force construction of the smart pointer to be unsafe (slash safely encapsulated)

@adetaylor
Copy link
Contributor

I also went through the same motions of trying to do this using any of the existing std types, and reached the same conclusion.

(The other experiment I was doing was trying to determine if the sheer existence of *const dyn SomethingInvalid could possibly have been problematic prior to arbitrary self types/derive_coerce_pointee. I was trying to see if I could persuade the type system "extract" a lifetime from the *const dyn to use it elsewhere, and cause unsoundness that way. I couldn't figure out a way to do this either.)

@steffahn
Copy link
Member

steffahn commented Feb 12, 2025

If we focus on the case of parameter-less trait bounds (which also looks like it’s the most common use-case) [and perhaps additionally traits with type parameters that are : 'static; or more concretely even lifetime-free, so e.g. the () parameter of FnOnce() is covered] then the question of soundness boils down to the question of:

  • will it always be true that every type T has a corresponding type T0: 'static, identical in representation, obtained by “replacing all free lifetime in T with 'static
    • in particular, could this “replace all free lifetimes with'static” result in an invalid type (e.g. violation of implied bounds) – or more generally, considering other trait bounds on the methods are possible, could any trait bound (or constraint) that doesn't involve other free lifetimes, be met by T but not by T0?
  • even if all of the above is true, would exposing this fact perhaps still be able to break certain API’s soundness, anyway?
    • probably very technically that’s a clear yes; e.g. an API could offer a sealed trait Foo and no visible implementor; and some function fn foo<T: Foo + 'static> that’s UB to call; and another function fn bar(f: impl Callback) calling some trait Callback { fn call<T: Foo>(); }.
      This way the user only ever sees an implementor of Foo through this generic API, and this generic API never provides T: 'static.
      The assumption is thus that foo is sound because it’s impossible to call (using safe Rust), but with a way to turn T: Foo into a T0: 'static + Foo, it becomes callable.1
      • the relevant question is if any practical API might want to make reasonably want to depend on these soundness assumptions?

If all of the above is true / unproblematic, one could even expose this via a trait, I suppose

trait RemoveFreeLifetimes {
    type WithoutFreeLifetimes: ?Sized + 'static;
}
// automatically implemented by the compiler
// for every type `T: ?Sized`

Then for any type T, the abovementioned corresponding T0 is <T as RemoveFreeLifetimes>::WithoutFreeLifetimes.

(With such an API, you still don’t really express the “T: TraitBound”-implies-T0: TraitBound relation for lifetime-free TraitBounds)

One can furthermore think of *const T as *const dyn Trait as *const (dyn Trait + 'static) as equivalent to *const T as *const <T as RemoveFreeLifetimes>::WithoutFreeLifetimes as *const (dyn Trait + 'static).


I’m not sure if the future-compatibility concerns @compiler-errors mentioned would also apply to this case of parameter-less (or lifetime-free-only parameter) trait bounds. Though it’s also not clear to me whether it’s technically possible to begin with2 to introduce an error only for cases outside of this more limited scope.

Footnotes

  1. This should be possible to do by using *const T to *const dyn Trait to *const dyn (Trait + 'static) casts with some trait Trait { fn call(self: *const Self) where Self: 'static; } and a impl<T: Foo> Trait for Foo { fn call(self: *const Self) where Self: 'static { foo::<Self>(); } }, relying on the arbitrary-self-type + casts mechanism this issue is about.

  2. While also not changing the semver contracts of Rust

@steffahn
Copy link
Member

steffahn commented Feb 13, 2025

which must have been several weeks ago

nevermind that… months ago: https://rust-lang.zulipchat.com/#narrow/channel/136281-t-opsem/topic/Can.20.60*mut.20dyn.20T.20.2B.20'a.20-.3E.20*mut.20dyn.20T.20.2B.20'b.60.20cast.20be.20exploited.3F/near/424767839

Edit: Actually, I think both might be true, I do believe that when I came up with the TypeId example stuff last month, I re-did some of this exploration. Yay for re-doing work of one’s past self 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-arbitrary_self_types `#![feature(arbitrary_self_types)]` F-derive_coerce_pointee Feature: RFC 3621's oft-renamed implementation needs-triage This issue may need triage. Remove it if it has been sufficiently triaged.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants