Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Type Decorators #1100

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Implement Type Decorators #1100

wants to merge 3 commits into from

Conversation

rvcas
Copy link
Member

@rvcas rvcas commented Feb 15, 2025

For a while now people have been asking for a way to modify certain aspects of their types. Some examples include controlling the underlying Constr tag for type constructors or enforcing that a ByteArray field is of a specific length.

We also plan on supporting encoding such that you can ask for a type to be treated as a PlutusData::List instead of a Constr. But code gen support for this will come later.

There are certain rules that will be enforced during typechecking like which tags can combine with which and whether or not they are being used improperly. The parser is left fairly liberal in this regard, it just accumulates as many tags as it sees and assumes the type checker will sort out what is right or wrong to use where.

Ex:

Custom tags

pub type Thing {
  @tag(420)
  Yes
  @tag(69)
  No
}

Enforcing Length

pub Wow {
  @len(32)
  address: ByteArray
}

@KtorZ
Copy link
Member

KtorZ commented Feb 17, 2025

My two lovelace:

  1. Nice to see decorators making their way. Regarding naming, I'd strive for consistency with stdlib/prelude as much as possible and typically opt for longer names (optimize for reading, not writing). In particular, we've never been referring to tag, but we have referred to constr_index or constructor_index -- and similarly, length instead of len seems reasonable given the shape of the standard lib.

  2. While it makes total sense to constructor index decorators to sit near record fields; it seems wrong to extend that to type invariants. Here, what len does is really to reduce the surface of valid bytearrays that can be ByteArray; it reduces the domain ... so, if we squint a little -> it's a type refinement. It is similar to defining ,say, a ByteArray32 module that would encapsulate a ByteArray of exactly 32 bytes and provide smart constructors and accessors into it.

So, I strongly believe this type of decorator should live around type definition, not around record field (or if anything, only as a syntactic sugar!). Internally though, it is the type that should carry this information. This will be a lot easier to implement as well, from type-checking to codegen and for the blueprint too.

@rvcas
Copy link
Member Author

rvcas commented Feb 18, 2025

@KtorZ the naming thing I understand but I'm not following on the field thing with len

Right now the plan is to have these only really work during casting that is to say "at the boundary".

So like we support two ways of using len if you double check the validation rules.

  1. on a field that is of type ByteArray only
  2. on a type alias for a ByteArray

the second one I believe fits into what you are describing? and from there is seems harmless to allow it to modify the field.

so the only runtime behaviors for this would be that "at the boundary" an extra length check is generated. but the intention is not to have this work outside of this situation because having this behavior generally in other situations like when you instantiate new objects could produce large amounts of unnecessary checks which eat away at your budget in a non-transparent way.

if I understood what you're saying, this does indeed do that but also a little more by being available on fields.

you can always use this on an alias and then use the alias as the field's type, so not much of a difference functionally speaking.

it's important to note as well, that as is implemented I consider the type checking done for this. meaning, I did not plan on doing any other things only the validation rules. after discussing with @MicroProofs we were happy with how it is and then we were going to just do code gen now. i didn't really want the type system aware of these decorators in any significant way which is why I don't perceive them to be refinement types. the length thing is mostly cute. the tag one is actually useful.

this means that

pub type Thing {
  @length(12)
  wow: ByteArray
}

fn who() {
  Thing { wow: #"" }
}

would NOT be a type error because trying to handle this in the general case (i.e. what if it's a variable) leads to a situation were we start to look into the abyss that is whole program analysis to avoid redundantly inserting runtime checks for the length which seems largely out of scope for the moment.

I also was going to leave this information completely out of the blueprints. the end goal is just so that people can interop with other datums via controlling the tag. so I genuinely believe this is not type information but structural information. used only for casting from PlutusData to known types.

and for the names, although we use the word constructor index in many places, I believe in practice most people call these tags and has been how people have asked for this feature. so it seemed natural to name it tag so that it aligns with the language people use when discussing datums and various CIPs. but I don't feel strongly about this, @index is also ok but feels weird. I would also like to avoid extra long names as they can be aesthetically kind of ugly. if you let me have tag, I'll give you length lol :)

@KtorZ
Copy link
Member

KtorZ commented Feb 18, 2025

@rvcas the naming thing I understand but I'm not following on the field thing with len

Sad story then, because the actual interesting feedback is about the field thing 😅 ... Let me give it another try.

Clarifying intent

Already, thanks for clarifying the intent: it is not a global type invariant, but only something that happens at the boundary. So right off the bat, I think the syntax might be even more confusing because it suggests a global invariant. To convey the right idea, we probably need the syntax to hint at data somehow.

pub type Thing {
  @data(constructor.index = 420)
  Yes
  @data(constructor.index = 69)
  No
}
pub type Thing {
  @data(bytearray.length = 12)
  wow: ByteArray
}

From there, it becomes clearer that these are only about the underlying serialization rules when interfacing with data. And we can quite easily extend this little DSL to cover, e.g. all of the validation rules specified in CIP-0057 (which would be relatively straightforward with this Decorator you've now introduced):

type Natural {
  @data(integer >= 0) 
  inner: Int
}

The case of literals

@rvcas this means that

pub type Thing {
  @length(12)
  wow: ByteArray
}

fn who() {
  Thing { wow: #"" }
}

would NOT be a type error because trying to handle this in the general case (i.e. what if it's a variable) leads to a situation were we start to look into the abyss that is whole program analysis to avoid redundantly inserting runtime checks for the length which seems largely out of scope for the moment.

I see where you're coming from, and the second part makes sense (and seems also way more approachable indeed). I would however argue that in this particular instance, we should raise a warning (or even an error); because it is straightforward to do. If assigning a literal into something we know to have a specific length, then a warning is the least we can do.

We could push it a bit to constants (since they're compile-time executions, we could also very much measure the length of bytes produced from a constant, and carry that information throughout...), but let's leave that as an idea for now.

Scope

My main point in the previous comment was regarding the scope of those decorators. Right now, the scope is on fields and type-aliases. I did miss the type-aliases part initially, though it doesn't really change my point. Since these decorators are about restricting the domain of acceptable inputs, they should belong to types. The constructor annotations are necessary done on fields and they already belong to records (i.e. DataType) definitions; so all good -- or almost, see point below.

The bytearray length however, should also be a type-level annotation; specifically internally. Imagine the following (yes, using my suggested syntax to gaslight you into adopting it as well -- although to be frank, I am more attached to the data part than I am to the rest; @data(len = 32) is less clear in my opinion but also works):

@data(bytearray.length = 32)
type RootHash = ByteArray

type Datum {
  root_hash: RootHash,
}

Now things get a little confusing because, internally, the Datum type will hold an empty vec of Decorator, whereas in practice, this is ought to be equivalent to:

type Datum {
  @data(bytearray.length = 32)
  root_hash: ByteArray,
}

If instead, types were carrying this information; then the Datum wouldn't contain a mere ByteArray, but it would contain a Decorated<ByteArray> (or something like that). From a code-gen perspective then, it's also much easier to navigate because you can handle decorated cases easily in the same way you would have with a dedicated type; while also ignoring them easily in places where it's not needed.

Plus, the annotations composes this way and the first example would do what one would normally expect. So rather than extending the definition of records or type-aliases; I strongly believe we should augment the definition of Type.

Blueprint

@rvcas: I also was going to leave this information completely out of the blueprints. the end goal is just so that people can interop with other datums via controlling the tag. so I genuinely believe this is not type information but structural information. used only for casting from PlutusData to known types.

Well, blueprints are specifically there to document structural information! The standard was also thought with all kind of validation rules which we never made use of (minimum, maximum, minLength, minItems, etc ...). This is actually a perfect use-case for those.

Remember that the point of blueprints is that one can programmatically follow them to build valid input data to a contract. Now, if my contract expects bytearrays to be of length 32, but the blueprint indicates it can take any bytearray, I am going to run into troubles. So now we have an easy way to embed this information into blueprint. And, it becomes quite straightforward if decorators are carried by types 😬

@KtorZ
Copy link
Member

KtorZ commented Feb 18, 2025

Note that, a possible evolution I see with these is now to alleviate some of the restrictions that come with opaque types. With a furnished-enough decorator language, we can possibly specify rather complex deserialisation behaviors and declare some specific opaque types safe.

Note

This goes way beyond this PR, but just recording it for the sake of having this conversation. Can move to a proper discussions if everyone likes the idea.

Imagine the followings:

/// Opaque type used to ensure the sign of the Rational is managed strictly in the numerator.
@data(allow)
pub opaque type Rational {
  numerator: Int,
  @data(integer > 0)
  denominator: Int,
}

or

@data(allow)
pub opaque type Value {
  @data(map.order = asc) 
  inner: Dict<PolicyId, Assets>,
}

@data(map.order = asc, map.each_value != 0) 
type Assets = Dict<AssetName, Int>

And now, you could use Rational and Value on the lhs of expect with a rhs being Data, or simply, in Datum and Redeemer. As a starter, we could simply make decorators elaborate-enough for the need of the standard lib, and then introduce new decorators on a case-by-case.

@rvcas
Copy link
Member Author

rvcas commented Feb 18, 2025

@KtorZ yea ok, that's fire. we'll do it like that then, I didn't realize it could go that far.

@MicroProofs
Copy link
Member

MicroProofs commented Feb 18, 2025

Sad story then, because the actual interesting feedback is about the field thing 😅 ... Let me give it another try.

Clarifying intent

Already, thanks for clarifying the intent: it is not a global type invariant, but only something that happens at the boundary. So right off the bat, I think the syntax might be even more confusing because it suggests a global invariant. To convey the right idea, we probably need the syntax to hint at data somehow.


pub type Thing {

  @data(constructor.index = 420)

  Yes

  @data(constructor.index = 69)

  No

}


pub type Thing {

  @data(bytearray.length = 12)

  wow: ByteArray

}

From there, it becomes clearer that these are only about the underlying serialization rules when interfacing with data. And we can quite easily extend this little DSL to cover, e.g. all of the validation rules specified in CIP-0057 (which would be relatively straightforward with this Decorator you've now introduced):


type Natural {

  @data(integer >= 0) 

  inner: Int

}

The case of literals

@rvcas this means that

pub type Thing {

@Length(12)

wow: ByteArray

}

fn who() {

Thing { wow: #"" }

}

would NOT be a type error because trying to handle this in the general case (i.e. what if it's a variable) leads to a situation were we start to look into the abyss that is whole program analysis to avoid redundantly inserting runtime checks for the length which seems largely out of scope for the moment.

I see where you're coming from, and the second part makes sense (and seems also way more approachable indeed). I would however argue that in this particular instance, we should raise a warning (or even an error); because it is straightforward to do. If assigning a literal into something we know to have a specific length, then a warning is the least we can do.

We could push it a bit to constants (since they're compile-time executions, we could also very much measure the length of bytes produced from a constant, and carry that information throughout...), but let's leave that as an idea for now.

Scope

My main point in the previous comment was regarding the scope of those decorators. Right now, the scope is on fields and type-aliases. I did miss the type-aliases part initially, though it doesn't really change my point. Since these decorators are about restricting the domain of acceptable inputs, they should belong to types. The constructor annotations are necessary done on fields and they already belong to records (i.e. DataType) definitions; so all good -- or almost, see point below.

The bytearray length however, should also be a type-level annotation; specifically internally. Imagine the following (yes, using my suggested syntax to gaslight you into adopting it as well -- although to be frank, I am more attached to the data part than I am to the rest; @data(len = 32) is less clear in my opinion but also works):


@data(bytearray.length = 32)

type RootHash = ByteArray



type Datum {

  root_hash: RootHash,

}

Now things get a little confusing because, internally, the Datum type will hold an empty vec of Decorator, whereas in practice, this is ought to be equivalent to:


type Datum {

  @data(bytearray.length = 32)

  root_hash: ByteArray,

}

If instead, types were carrying this information; then the Datum wouldn't contain a mere ByteArray, but it would contain a Decorated<ByteArray> (or something like that). From a code-gen perspective then, it's also much easier to navigate because you can handle decorated cases easily in the same way you would have with a dedicated type; while also ignoring them easily in places where it's not needed.

Plus, the annotations composes this way and the first example would do what one would normally expect. So rather than extending the definition of records or type-aliases; I strongly believe we should augment the definition of Type.

Blueprint

Well, blueprints are specifically there to document structural information! The standard was also thought with all kind of validation rules which we never made use of (minimum, maximum, minLength, minItems, etc ...). This is actually a perfect use-case for those.

Remember that the point of blueprints is that one can programmatically follow them to build valid input data to a contract. Now, if my contract expects bytearrays to be of length 32, but the blueprint indicates it can take any bytearray, I am going to run into troubles. So now we have an easy way to embed this information into blueprint. And, it becomes quite straightforward if decorators are carried by types 😬

I like the idea too seems good and we can remove alias to simplify things.

@rvcas
Copy link
Member Author

rvcas commented Mar 2, 2025

we can probably sneak in #1090 and #1108 into this one

@KtorZ
Copy link
Member

KtorZ commented Mar 3, 2025

Wasn't that the entire goal 😅?

@rvcas
Copy link
Member Author

rvcas commented Mar 4, 2025

@KtorZ yes, just linking the issues so I remember to close them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🫡 In Progress
Development

Successfully merging this pull request may close these issues.

3 participants