Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify input and local declarations in data model #799

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented May 22, 2024

Closes #786

This change is a part of what was discussed in #718, i.e. the part that didn't get any critique in the issue.

In the syntax, keeping .input and .local as separate operations makes really good sense from a readability point of view. In the data model, as those concerns are not present, there is no reason to keep the separate type: 'input' and type: 'local' blocks. Both will almost certainly be processed the same way, and the input/local-ness can be inferred when necessary by

type = value.arg?.type == 'variable' && value.arg.name == name ? 'input' : 'local'

This also drops the need for a VariableExpression, which simplifies the TS & JSON Schema definitions quite a bit.

@eemeli eemeli added the data model Issues related with MF data Model label May 22, 2024
Copy link
Collaborator

@catamorphism catamorphism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've commented on this before, but the implication of this change is that either:

  1. the parser has to check for all duplicate declaration errors
  2. duplicate declaration error checking has to be split, with a special case in the parser for .local $x = {$x :func}, and then the more general check in a subsequent error-checking pass.

IMO it's useful for .local $x = {$x :func} to be representable in the data model, so that all duplicate declarations can be checked in a single pass that's separate from the parser.

@@ -78,13 +78,13 @@ type Message = PatternMessage | SelectMessage;

interface PatternMessage {
type: "message";
declarations: Declaration[];
declarations: (Declaration | UnsupportedStatement)[];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the declarations field be renamed to something else? I'm not sure what, but it seems confusing to me that a list called "declarations" can include something that's not a declaration.

Copy link
Collaborator

@echeran echeran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does seem to introduce an unnecessary complexity (and thus burden) on implementations, as @catamorphism. As I understand it, this PR would require that an implementation's parser would have to complect together the separate concerns of parsing and duplicate declaration.

I like the attempt at simplifying the spec, but it currently doesn't seem to me to an improvement b/c it undoes an important distinction. Although, this discussion is useful and perhaps can be converted into important documentation explaining why these constructs have to remain separate. WDYT?

FWIW, for the record, even though the PR description says that it represents the uncritiqued parts of the discussion in #718, I see concerns on this topic in that discussion that didn't get resolved.

If the `value` has a `VariableRef` `arg` with the same `name`
as the `Declaration`,
it represents an _input-declaration_.
Otherwise, it represents a _local-declaration_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of the blanket value: Expression, which requires this wording here. I'd much prefer if the data model didn't need these additional validity constraints. I think the current data model achieves this to a larger extent.

I wouldn't want anyone to think that it's OK to have .input {:func} or .input {|1|} -- which I think the proposed change does suggest.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is in fact removing a MUST validity constraint on InputDeclaration, in addition to removing a similar implicit constraint on the LocalDeclaration that it must not contain in its value a VariableExpression which uses the same name as used at the top level of the declaration.

Could you explain how this proposed language suggests that something like .input {:func} is ok? As it explicitly requires the expression to have a VariableRef argument with a matching name, isn't that the same set of requirements that are expressed in the syntax?

annotation?: FunctionAnnotation | UnsupportedAnnotation;
attributes: Attributes;
}
type Expression = OperandExpression | AnnotationExpression;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting suggestion worth pursuing in a separate PR.

@mihnita
Copy link
Collaborator

mihnita commented Jul 22, 2024

I don't think these should be unified, they look the same, but are different concepts.

For example a translator can define as many local variables as they want.
But they MUST NOT define inputs, as inputs correspond to something in the calling code.

Similar to programming languages:

function foo(String arg1, int arg2 = 42) { // input
   int bar = 7 // local
   int baz = arg2 // local
   String locStr = arg1 // local

TLDR: not unify, they look superficially the same, but are different concepts.

@eemeli
Copy link
Collaborator Author

eemeli commented Jul 29, 2024

Responding to @echeran:

I like the attempt at simplifying the spec, but it currently doesn't seem to me to an improvement b/c it undoes an important distinction.

With this change, the .input and .local declarations are still distinct and distinguishable from each other in the data model, as the sets of valid "input" and "local" object contents are exclusive of each other. This means that we're removing a current duplication of data.

Responding to @mihnita:

I don't think these should be unified, they look the same, but are different concepts.

For example a translator can define as many local variables as they want. But they MUST NOT define inputs, as inputs correspond to something in the calling code.

As discussed during the 2024-07-15 call, there are situations in which it may be valid for a translator (or their tooling) to add a .input declaration where one did not exist previously.

I can provide a real-world example with this Firefox message, which in MF2 is effectively

{$num :integer} (default)

in the source locale (en-US), but its Japanese translation depends on the OS, requiring the use of a custom :platform selector:

.input {$num :integer}
.match {:platform}
macos {{{$num} (デフォルト)}}
* {{{$num} (既定)}}

Note how the introduction of the .input declaration has not changed the requirements or restrictions on the num value, as in both the source and target locales it's formatted the same way using :integer.

MF2 allows for this, while also allowing for a specific user to choose to restrict changes such as the above. It's also good to note that similar restrictions on the input may also be imposed by a .local declaration:

.local $foo = {$num :integer}
.match {:platform}
macos {{{$foo} (デフォルト)}}
* {{{$foo} (既定)}}

As in the example above using .input, this message requires the num value to be usable as an integer.

In other words, the more appropriate "MUST NOT" to impose on translators (e.g. via tooling) is changes to expressions where the operand is externally provided. Say, if the original was instead

{$num} (default)

then it would not be valid to change the expression to {$num :number} in any locale based on a presumption about the value's type from its name.

That is a restriction which applies to all .input declarations, but it also applies to many .local declarations and placeholders as well; we really should not have .input be special in the data model, as it could mislead other developers the same way about what sorts of changes and transforms could be safe, and which are not.

@aphillips aphillips added blocker-candidate The submitter thinks this might be a block for the Technology Preview Future Deferred for future standardization labels Sep 10, 2024
@aphillips
Copy link
Member

Discussed in 2024-09-10 call. Moving to post-46 but pre-2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker-candidate The submitter thinks this might be a block for the Technology Preview data model Issues related with MF data Model Future Deferred for future standardization
Projects
None yet
6 participants