Changes for UTF8 character representation within string literals #74

Open: 1 commit to merge into base: master


Ian-Grant
  • Added UTF8 Unit in mosmllib

  • Modified PP.sml to correctly format multi-octet UTF8 strings (it needs to parse the UTF8 representations to do this)

  • Changed Lexer.lex to check UTF8 encodings in string literals and allow full ISO/IEC 10646 UCS encodings to be used in numerical escapes:

    \U+XXXXXX

    as well as

    \uXXXX

    as specified in the Standard ML Definition.
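
To make the two escape forms concrete, here is a minimal Python sketch (not the SML from this PR) of how a numeric escape could be turned into UTF8 octets. It assumes `\uXXXX` takes exactly four hex digits and `\U+XXXXXX` exactly six, as the placeholders above suggest, and that code points are limited to U+10FFFF with surrogates rejected; the actual Lexer.lex rules may differ. The names `encode_code_point` and `expand_escapes` are hypothetical.

```python
import re

# Assumed escape shapes, taken from the placeholders in the description:
# \uXXXX (four hex digits) and \U+XXXXXX (six hex digits).
# Other escapes (such as an escaped backslash) are deliberately ignored here.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U\+([0-9A-Fa-f]{6})")


def encode_code_point(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (1 to 4 octets)."""
    # Assumption: reject surrogates and anything above U+10FFFF.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def expand_escapes(literal: str) -> bytes:
    """Replace numeric escapes in a literal's body with their UTF-8 octets."""
    out = bytearray()
    pos = 0
    for m in ESCAPE.finditer(literal):
        # Copy the text before the escape unchanged.
        out += literal[pos:m.start()].encode("utf-8")
        # Exactly one of the two groups matched; parse its hex digits.
        out += encode_code_point(int(m.group(1) or m.group(2), 16))
        pos = m.end()
    out += literal[pos:].encode("utf-8")
    return bytes(out)
```

Under these assumptions, `expand_escapes(r"\u00e9")` produces the two octets C3 A9, and a six-digit escape such as `\U+01F600` produces a four-octet sequence.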

UTF8 checking within character strings is non-standard compiler behavior, so the switch Meta.utf8 is provided to turn this checking on. Thinking about it now, the extended syntax for numeric character literals should probably be conditional on that switch too.
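
As an illustration of what such checking might involve, the following Python sketch (again, not the SML in this PR) tests whether the octets of a string literal form well-formed UTF8 and only applies the test when a flag is set. The flag `check_utf8` is a hypothetical stand-in for Meta.utf8, and the rejection rules (overlong forms, surrogates, values above U+10FFFF) are an assumption about what a checker would enforce, not a description of the patch.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the expected sequence length for a lead octet, or 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:  # 0xC0 and 0xC1 would only start overlong forms
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:  # 0xF5 and above exceed U+10FFFF
        return 4
    return 0


def is_well_formed_utf8(data: bytes) -> bool:
    """Check every sequence in data: lead octet, continuations, and ranges."""
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0 or i + n > len(data):
            return False  # bad lead octet or truncated sequence
        tail = data[i + 1:i + n]
        if any((b & 0xC0) != 0x80 for b in tail):
            return False  # continuation octets must look like 10xxxxxx
        if n == 3:
            if data[i] == 0xE0 and tail[0] < 0xA0:
                return False  # overlong three-octet form
            if data[i] == 0xED and tail[0] > 0x9F:
                return False  # encoded surrogate U+D800..U+DFFF
        if n == 4:
            if data[i] == 0xF0 and tail[0] < 0x90:
                return False  # overlong four-octet form
            if data[i] == 0xF4 and tail[0] > 0x8F:
                return False  # above U+10FFFF
        i += n
    return True


def accept_string_literal(data: bytes, check_utf8: bool) -> bool:
    """Accept any octets when the (hypothetical) switch is off."""
    return is_well_formed_utf8(data) if check_utf8 else True
```

With the flag off, any octet sequence is accepted, which mirrors the standard behavior the paragraph above describes; with it on, ill-formed sequences such as a lone `0xFF` octet are rejected.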

Some of the logic came from the HOL theorem prover, although they do it differently now; see src/portableML/UTF8.sml in https://github.com/HOL-Theorem-Prover/HOL

  - Added UTF8 Unit in mosmllib
  - Changed Lexer.lex to check UTF8 encodings in string literals and allow
    full ISO/IEC 10646 UCS encodings to be used in numerical escapes:

      \U+XXXXXX

    as well as

      \uXXXX

    as specified in the Standard ML Definition.

    UTF8 checking within character strings is non-standard compiler behavior
    and the switch Meta.utf8 is provided to switch this checking on. Thinking
    about it now, the extended syntax for numeric character literals should
    probably be conditional on that too.