Commit

Clarify various descriptions
Chris Whealy committed Mar 31, 2023
1 parent 45bddb6 commit 9ff7c61
Showing 7 changed files with 49 additions and 40 deletions.
3 changes: 2 additions & 1 deletion _posts/2023-03-17-hieroglyphy.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ Here is my version of [Hieroglyphy](https://github.com/ChrisWhealy/hieroglyphy).
# Overview

There has been some investigation into encoding the source code of a JavaScript program such that it uses a reduced alphabet, but remains syntactically valid and executable.
Irrespective of whether or not the encoded program remains human readable, you must still be able to `eval` or execute it.

The object of the exercise here is not to create a program that remains human readable, but one that can be `eval`ed and executed.

For example:

5 changes: 3 additions & 2 deletions chriswhealy/hieroglyphy/bootstraps/README.md
Expand Up @@ -31,9 +31,10 @@ Finally, we can derive integer `1` by performing numeric coercion on `!![]`

![Coerce Integer One](/chriswhealy/hieroglyphy/img/coerce_1.png)

## Natural Numbers
## Counting Numbers

We have seen above that when a Boolean value appears in an arithmetic expression, `false` is coerced to `0` and `true` is coerced to `1`. Knowing this we can derive the natural counting numbers.
We have seen above that when a Boolean value appears in an arithmetic expression, `false` is coerced to `0` and `true` is coerced to `1`.
Knowing this we can now derive the counting numbers.

Since `2` is `1 + 1`, we can rewrite `1 + 1` as `true + true` and still get `2`.
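
As a minimal sketch of this idea, the snippet below builds the encoding for each counting number by repeatedly adding `!![]` (that is, `true`). The helper name `encodeCount` is hypothetical and is not taken from `hieroglyphy.mjs`; it only illustrates the coercion rule described above.

```javascript
// Hypothetical helper: encode a small non-negative integer using only the characters + ! [ ]
// 0 is +[], 1 is +!![], and every larger number is a sum of n copies of !![]
const encodeCount = n =>
  n === 0 ? "+[]" : n === 1 ? "+!![]" : Array(n).fill("!![]").join("+");

// Each encoded string evaluates back to the number it represents
const decoded = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].map(n => eval(encodeCount(n)));

console.log(encodeCount(2)); // !![]+!![]
console.log(decoded);
```

Because `true + true` is an arithmetic expression, both operands are coerced to `1`, which is why a bare sum of `!![]` terms is enough for `2` and above, while `1` on its own needs the leading unary `+`.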

19 changes: 11 additions & 8 deletions chriswhealy/hieroglyphy/but-why/README.md
Expand Up @@ -7,20 +7,20 @@

This exercise is not entirely pointless because it explores the area of language encoding.

When deciding on the size of an encoding alphabet, there needs to be a balance between providing sufficient characters to allow for expressiveness and readability, yet not providing every possible character simply because it has a non-zero probability of being used.
When deciding on the size of an encoding alphabet, there needs to be a balance between providing enough characters to allow for expressiveness and readability, and not providing every possible character simply because it has a non-zero probability of being used within your language.

Given that most high-level programming languages use English keywords, we naturally expect the encoding alphabet to include:

| Type | Characters | Count
|---|---|--:
| The letters of the Roman alphabet | `[a..z][A..Z]` | 52
| The digits | `[0..9]` | 10
| Graphic and currency characters | `@#%^_\`, `£$€` | 9
| Punctuation characters | `!?:;,."'` | 8
| Mathematical operators | `&|+-*/=<>` | 9
| Different styles of delimiter | `(){}[]` | 6
| The letters of the Roman alphabet | `[a..z][A..Z]` | 52
| The digits | `[0..9]` | 10
| Graphic and currency characters | `@#%^_\` and `£$€`| 9
| Punctuation characters | `!?:;,."'` | 8
| Mathematical operators | `&|+-*/=<>` | 9
| Parenthesis Pairs | `(){}[]` | 6

This is why a regular English keyboard makes provision for at least 94 characters; and in languages that need diacritics, their keyboards often have more.
This is why a regular English keyboard makes provision for at least 94 characters; and in languages that use diacritics, their keyboards often require more.

However, as we reduce the size of our alphabet, we will see a corresponding drop in legibility and an increase in word length.
This is simply because as the number of letters in your alphabet decreases, so the number of letters needed to represent a unique word increases.
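
To put rough numbers on this trade-off, here is a small sketch that calculates the shortest fixed word length needed to give every item in a set its own unique word. The helper `minWordLength` is hypothetical and only illustrates the counting argument; it is a theoretical lower bound and says nothing about how long the encodings produced by Hieroglyphy actually are.

```javascript
// Hypothetical helper: the smallest fixed word length that gives each of
// `distinctWords` items a unique word over an alphabet of `alphabetSize` symbols.
// Assumes alphabetSize >= 2, otherwise the loop would never terminate.
const minWordLength = (distinctWords, alphabetSize) => {
  let length = 1;
  let capacity = alphabetSize; // number of distinct words of the current length

  while (capacity < distinctWords) {
    capacity *= alphabetSize;
    length += 1;
  }

  return length;
};

const with94 = minWordLength(94, 94); // 1 symbol per keyboard character
const with8 = minWordLength(94, 8);   // 3 symbols, since 8 * 8 = 64 < 94 <= 512
const with2 = minWordLength(94, 2);   // 7 symbols, since 64 < 94 <= 128

console.log(with94, with8, with2);
```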
Expand All @@ -39,6 +39,9 @@ Many alphabets could be chosen here of varying sizes, but just for fun, we're go
| `()` | Function invocation, Expression delimiter to avoid parsing errors
| `{}` | Gets us `NaN` and the infamous string `[object Object]`

As an option, I have extended the coding in [hieroglyphy.mjs](https://github.com/ChrisWhealy/hieroglyphy/blob/master/hieroglyphy.mjs) to allow you to include the digit characters `['0'..'9']` in the encoding alphabet.
This increases the alphabet size from 8 characters up to 18, but has the benefit of reducing the encoded length by approximately 40%.

> ***FYI:***<br>
> This 8-character alphabet is close to minimal in size.<br>
> A minimal alphabet drops the use of curly braces `{}`.
2 changes: 1 addition & 1 deletion chriswhealy/hieroglyphy/checkpoint1/README.md
Expand Up @@ -4,7 +4,7 @@
|---|---|---
| [Pulling Some Strings](/chriswhealy/hieroglyphy/strings/) | What Have We Achieved So Far? | [Extracting Characters From Keywords](/chriswhealy/hieroglyphy/keywords/)

So far, we are able to encode the 10 digit characters `0` to `9`, so we will place them into a character cache that so far contains:
So far, we are able to encode the 10 digit characters `'0'` to `'9'`, so we will place them into a character cache that so far contains:

| Character | Derived From | Encoding
|---|---|---
12 changes: 6 additions & 6 deletions chriswhealy/hieroglyphy/keywords/README.md
Expand Up @@ -10,13 +10,13 @@ Now that we have both the integers and their string representations, we can star
## undefined

***Q:*** What does JavaScript return if you access a non-existent array element?<br>
***A:*** `undefined`
***A:*** The keyword `undefined`

Using our close-to-minimal alphabet, we can obtain the keyword `undefined` by accessing element `0` of an empty array: `[][0]`.

Further to this, we know that integer `0` can be encoded as `+[]`, so we can get `undefined` from `[][+[]]`.

If we now concatenate this value to an empty list, we can convert the reserved word into the string `'undefined'`, from which we can then extract the individual letters:
If we now concatenate this value to an empty list, we can convert the keyword `undefined` into the string `'undefined'`, from which we can then extract the individual letters:

```javascript
// Access element zero of an empty array
Expand All @@ -34,8 +34,8 @@ If we now concatenate this value to an empty list, we can convert the reserved w
([][+[]]+[])[5] // 'undefined'[5] -> 'i'
```

The only thing we need to modify here is the fact that we cannot directly use an integer as the array index.
So we need to substitute each integer for its encoded representation:
If we have switched off digit encoding, then the above representation is complete.
However, by default, digit encoding is switched on, so we need to replace the index digits with their encoded representation:

```javascript
// Substitute the integer index for the encoded integer
Expand All @@ -49,7 +49,7 @@ So we need to substitute each integer for its encoded representation:

## Booleans

Let's now repeat the same trick, but this time, extract the characters from the reserved words `true`, `false`, `NaN` and `[object Object]`.
Let's now repeat the same character extraction trick, but this time on the keywords `true`, `false`, `NaN` and the string `[object Object]`.

```javascript
![] // Reserved word false
Expand All @@ -70,7 +70,7 @@ Let's now repeat the same trick, but this time, extract the characters from the
(!![]+[])[+!![]+!![]+!![]] // 'true'[3] = 'e'
```

In cases where we have multiple ways to encode the same character (so far, we have three ways to encode the letter `'e'`), the shortest encoding will be used.
In cases where we have multiple ways to encode the same character (for example, we have three ways to encode the letter `'e'`), the shortest encoding should be used.
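
The following sketch shows one way the "shortest encoding wins" rule could be applied to the letter `'e'`. The three candidate strings are built from the `undefined`, `true` and `false` tricks described on this page; the variable names are illustrative and are not taken from `hieroglyphy.mjs`.

```javascript
// Three candidate encodings for the letter 'e' (illustrative, not the library's own table)
const candidates = [
  "([][+[]]+[])[!![]+!![]+!![]]",   // 'undefined'[3]
  "(!![]+[])[!![]+!![]+!![]]",      // 'true'[3]
  "(![]+[])[!![]+!![]+!![]+!![]]",  // 'false'[4]
];

// Every candidate must evaluate to the same character
const allDecodeToE = candidates.every(c => eval(c) === "e");

// Keep whichever candidate has the fewest characters
const shortest = candidates.reduce((best, c) => (c.length < best.length ? c : best));

console.log(allDecodeToE, shortest);
```

Here the encoding derived from `true` wins because the string `'true'` is the cheapest to construct and the index `3` is smaller than `4`.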

## Not a Number: NaN

18 changes: 9 additions & 9 deletions chriswhealy/hieroglyphy/numbers/README.md
Expand Up @@ -4,24 +4,24 @@
|---|---|---
| [Extracting Characters From Keywords](/chriswhealy/hieroglyphy/keywords/) | Tricks With Big Numbers | [So Where Are We Now?](/chriswhealy/hieroglyphy/checkpoint2/)

Now that we have the string representation of all the digits and the letter `'e'`, we can construct strings to represent large numbers such as <code>10<sup>100</sup></code> and <code>10<sup>1000</sup></code>.
Now that we have the string representation of all the digits and the letter `'e'`, we can use JavaScript's exponent notation to construct strings that represent large numbers such as <code>10<sup>100</sup></code> and <code>10<sup>1000</sup></code>:

We first form the strings `'1e100'` and `'1e1000'`, then coerce these strings to numbers which returns the numeric values `1e+100` and `Infinity` (notice that JavaScript has helpfully included a `'+'` for us).

Then, by coercing `1e+100` and `Infinity` back to strings, we can obtain the characters `'+'`, `'I'` and `'y'`.
* Form the strings `'1e100'` and `'1e1000'`
* Coerce these strings to numbers, giving `1e+100` and `Infinity`
* Coerce these numbers back to strings, giving `'1e+100'` and `'Infinity'`
* Extract the characters `'+'`, `'I'` and `'y'`
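
Before any encoding is applied, the four steps above can be sketched in plain JavaScript as follows. The variable names are purely illustrative.

```javascript
// Step 1: form the strings
const bigString = '1e100';
const tooBigString = '1e1000';

// Step 2: coerce the strings to numbers using unary plus
const big = +bigString;        // 1e+100
const tooBig = +tooBigString;  // Infinity, because the value overflows a 64-bit float

// Step 3: coerce the numbers back to strings by concatenating an empty list
const bigAgain = big + [];        // '1e+100'
const tooBigAgain = tooBig + [];  // 'Infinity'

// Step 4: extract the new characters
const plus = bigAgain[2];       // '+'
const upperI = tooBigAgain[0];  // 'I'
const lowerY = tooBigAgain[7];  // 'y'

console.log(plus, upperI, lowerY);
```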

## Plus Sign

To obtain the `'+'` sign, we first need to construct the number `1e+100`.
This can be done as follows:
It might seem almost magical that the `'+'` sign can be obtained from a string containing only the characters `1e100`; however, when this string is coerced to a number then back to a string, JavaScript helpfully inserts the `'+'` for us...

```javascript
+'1e100' // Create the string and coerce to a number -> 1e+100
(+'1e100')+[] // Overload plus to convert the number back to a string -> '1e+100'
((+'1e100')+[])[2] // Extract character at index 2 -> '+'
((+'1e100')+[])[2] // Extract the character at index 2 -> '+'
```

Here's the above code with encoded values substituted for `'e'`, `1` and `0`:
Here's the above code with `'e'`, `1` and `0` represented in their encoded form:

```javascript
+(+!![]+(!![]+[])[+!![]+!![]+!![]]+(+!![])+(+[])+(+[])) // Coerce string '1e100' to number 1e+100
Expand All @@ -31,7 +31,7 @@ Here's the above code with encoded values substituted for `'e'`, `1` and `0`:

## Two More Alphabetic Characters

The number `1e+1000` is too large for JavaScript to store as a 64-bit floating point number, so instead, it simply returns the word `Infinity`.
The number `1e+1000` is too large for JavaScript to store as a 64-bit floating point number, so instead, it simply returns the keyword `Infinity`.
This is very helpful because it contains the previously unavailable characters `'I'` and `'y'`:

```javascript
30 changes: 17 additions & 13 deletions chriswhealy/hieroglyphy/strings/README.md
Expand Up @@ -31,26 +31,30 @@ Using this naïve scheme of repeatedly adding one, the number `17` would be repr
!![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] // 17
```

However, if we convert the digits of index `17` to the strings `'1'` and `'7'`, encode these digits then concatenate and coerce to a number, we will have a much shorter representation.
However, we can create a much shorter representation by:

* Encoding each digit of integer `17` to the strings `'1'` and `'7'`
* Concatenating the string representations
* Coercing the resulting string back to a number

Given that our minimal alphabet consists only of the characters `+!(){}[]`, how do we coerce a value to a string?

The answer is to overload the plus `+` operator, thus forcing the conversion of the operands to strings.
This can be done by concatenating our numeric value to an empty list:
The answer comes from realising that when coerced to a string, an empty list `[]` becomes an empty string `''`.
Knowing this, we can overload the plus `+` operator by concatenating our numeric value to an empty list:

![Coerce String One](/chriswhealy/hieroglyphy/img/coerce_str_1.png)

So simply by adding `+[]` to the end of each digit, we can obtain that digit's string representation:

```javascript
+[] + [] // 0 + []-> '0'
+!![] + [] // 1 + []-> '1'
!![] + !![] + [] // 2 + []-> '2'
!![] + !![] + !![] + [] // 3 + []-> '3'
!![] + !![] + !![] + !![] + [] // 4 + []-> '4'
!![] + !![] + !![] + !![] + !![] + [] // 5 + []-> '5'
!![] + !![] + !![] + !![] + !![] + !![] + [] // 6 + []-> '6'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 7 + []-> '7'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 8 + []-> '8'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 9 + []-> '9'
+[] + [] // 0 + [] -> '0'
+!![] + [] // 1 + [] -> '1'
!![] + !![] + [] // 2 + [] -> '2'
!![] + !![] + !![] + [] // 3 + [] -> '3'
!![] + !![] + !![] + !![] + [] // 4 + [] -> '4'
!![] + !![] + !![] + !![] + !![] + [] // 5 + [] -> '5'
!![] + !![] + !![] + !![] + !![] + !![] + [] // 6 + [] -> '6'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 7 + [] -> '7'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 8 + [] -> '8'
!![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + !![] + [] // 9 + [] -> '9'
```
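
Returning to the number `17`, here is a sketch of how the digit strings can be combined into the shorter representation. The expression held in `short` is one possible encoding written by hand for illustration; the exact output of `hieroglyphy.mjs` may differ.

```javascript
// Naive encoding: seventeen copies of !![] added together
const naive = Array(17).fill("!![]").join("+");

// Shorter encoding: build the strings '1' and '7', concatenate them to get '17',
// then coerce the result back to a number with a leading unary plus
const short = "+((+!![]+[])+(!![]+!![]+!![]+!![]+!![]+!![]+!![]+[]))";

const naiveValue = eval(naive);
const shortValue = eval(short);

console.log(naiveValue, shortValue, short.length < naive.length);
```

The saving grows quickly with the size of the number, because the length of the digit-based form depends on the digits used rather than on the magnitude of the value.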
