Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Define Ambiguity node #337

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Define Ambiguity node #337

wants to merge 2 commits into from

Conversation

dennwc
Copy link
Member

@dennwc dennwc commented Dec 5, 2018

This PR defines a new node type for SemUAST called Ambiguity.

This node will represent multiple alternative meanings of the same node in case driver cannot determine its type with a 100% certainty (without running the compiler or symbol resolution), but can propose a small number of alternative meanings.

The first use case is that languages like Python, Go and some others have built-in names like True/False that can be redefined. The driver might mark them as Identifier and be done with it, but this won't be that helpful for the end user. In fact, the driver knows that in this specific language the True identifier with 99% certainty is a boolean literal. So instead of dumping a generic Identifier, the driver may emit an Ambiguity node with both Bool and Identifier nodes.

The second use case is that sometimes operators might be overloaded in the language itself. I don't want to state that this will solve an issue of user-defined operator overloading, though. The current use case is a bit simpler for now. For example, in Python there is a % operator that can be a modulo operator or a string interpolation operator. The driver may emit both possible versions as an Ambiguity node if it cannot assert the type of the first operand.

Later, this may help to identify places in UAST that needs additional analysis. And it will immediately allow searching for nodes of more specific types in Babelfish instead of generic ones.


This change is Reviewable

Denys Smirnov added 2 commits December 5, 2018 21:29
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Signed-off-by: Denys Smirnov <denys@sourced.tech>
@juanjux
Copy link
Contributor

juanjux commented Dec 6, 2018

While I think this is a nice feature to have, it was unplanned and the impact of this and the Bool PR is not trivial since would require to update/debug all drivers again so we should probably discuss not so much when to merge this (since it seems to be backward compatible until you start upgrading nodes) but when to upgrade the drivers.

I suspect having finished C# and Typescript drivers (plus issues that are starting to pile) have more priority than doing another pass over the existing ones, but this is something that @creachadair will have to talk with management.

I also have couple questions:

  1. Imagine I have some annotation (for example Argument) that set some node that is, on a previous annotation, converted into another semantic node (for example, Identifier) that, with this, would be ambiguous so instead of just Identifier it could be Ambiguous:Identifier|String. Wont this break all those annotations? (Argument in this case).

  2. If the answer to the previous question is "Yes", how would the new annotation work to express "This argument name is an Identifier or an Ambiguitiy node that could be an Identifier". We should strive to avoid making annotations/the DSL more complex than they already are so if there was some way in the SDK to get an Ambiguity node match any annotation that expect one of its possible values it would be great.

@creachadair
Copy link
Contributor

While I think this is a nice feature to have, it was unplanned and the impact of this and the Bool PR is not trivial

If I understand the proposal correctly, this is effectively adding sum types to our abstract type system. That's not necessarily a bad idea—several languages have variations of sum-typing that we might want to incorporate anyway. However, I agree with @juanjux that it's a much larger design question, and we should consider it as a new feature rather than adding it in piecewise.

I tend to agree that short term we should probably be aiming for "breadth" of coverage (e.g., more languages) before "depth" of coverage (e.g., more detailed types). Both are valuable, but I think the immediate impact for our customers is higher if we go broad first.

@dennwc
Copy link
Member Author

dennwc commented Dec 6, 2018

The aim of the proposal is to start a discussion about this specific problem. Just in case - I'm not proposing to update all the drivers :)

I also agree that extending language coverage should be the highest priority. This PR is more like a brain-dump of the problem I was thinking about in the background. If you think the proposal is reasonable - let's discuss it (same - in the background).

@juanjux Those are really good questions. First, this node type won't break annotation code, because Semantic pass happens before the Annotation pass and the last one is already prepared to deal with arbitrary Semantic types. And annotations won't ever target Semantic types directly, because it doesn't make sense to do so (Semantic types are "roles" themselves).
For the second question, I think you are referring to the normalization pass, not to the annotation pass. There is a way to make this "or" selector work, but I think it's unnecessary. Adding Ambiguity, by the definition, will happen as the last transform, and all other transforms will work on a "pure" tree.
And if you are thinking about XPath as well, we can solve it in a better way. Since we control the projection now, and projection knows the schema, we can "squash" the Ambiguity node and pretend that there are two alternative nodes in that XML tree. Queries will just work with both types of nodes in this case.

@creachadair I'm not sure that this is similar to sum types. Sum types assume that there is a set of defined types and the sum type may consist of few of them. And this is somehow expressed in AST/code/type system/whatever.
This change, in contrast, aims to describe a specific instance of an AST node and provides alternative interpretations for it. This won't happen automatically for specific node types.

Instead, the transformation code will search for specific patterns (x % y, for example, which is py:BinOp %) and will try to assert some condition (x is a string).
If the assertion is successful, the transform will replace the node with a specific UAST type (string interpolation in this case, if we imagine we define it).
And if the assertion cannot be made (x is an identifier, and the driver won't bother with flow analysis to find out the type), the transform will replace the node with an Ambiguity (either "modulo" or "string interpolation" node).

I think it's a useful change, in general. For most languages, we know 2-3 possible meanings of operators or some identifiers and we can easily be more specific than "unknown binop %" or "identifier true".

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants