Lorenzi edited this page Oct 4, 2020 · 13 revisions

So, you want to write programs for that new microprocessor you've built, or that custom virtual machine you've created? We'll take a look at how to define your instruction set and then use it to assemble programs.

Assembling your first program

Let's start off by creating the following file, which we can name main.asm:

#bits 8

#ruledef
{
    nop => 0x00
    hlt => 0xff
}

nop
nop
hlt

You can assemble this by running:

$ customasm main.asm

The command above will write a binary file to disk. You can also print the output to screen, which might make it easier to understand what is being produced, by running:

$ customasm main.asm -p

Yet another option is to use the online version, which doesn't require any downloads. You can copy and paste the code above into the page, and hit the "Assemble" button.

After it is assembled, you should see three bytes of output: 00 00 ff, one byte for each instruction in the three-instruction program.

Anatomy of a customasm file

We'll now take a closer look at the contents of our main.asm file.

There are two parts to this file: a #ruledef block that defines your instruction set, and the list of instructions that make up your program. Here the #ruledef lives in the same file as the program, but you can also split things across multiple files using #include directives.
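
As a rough sketch of the multi-file layout, assume the rules are moved into a separate file called cpu.asm. That name is hypothetical and chosen only for illustration. The file would hold just the instruction set:

```asm
; cpu.asm (hypothetical file name): holds only the instruction set
#ruledef
{
    nop => 0x00
    hlt => 0xff
}
```

main.asm would then pull in the rules before listing the program:

```asm
; main.asm: includes the rules, then lists the program
#bits 8

#include "cpu.asm"

nop
nop
hlt
```

Assembling main.asm this way should produce the same three bytes as the single-file version.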

Defining the smallest addressable unit

You can start the file with a #bits directive, which sets the size, in bits, of the smallest addressable unit for your machine. If you omit it, the default is 8, which is the most common value for modern CPUs.

Defining an instruction set

You should use a #ruledef block to list mnemonics and their binary representations. You can have as many #ruledef blocks as you need, so you can easily split up your declarations. You can combine any number of letters, words, and punctuation for a given mnemonic. For example, these are valid patterns:

#ruledef
{
    mov a, #b => 0x35

    sub x, [hl] => 0b11010001

    add.gt r0, r3, r4, LSL #6 => 0x46
}

For the binary representation, the way you write out values matters: their size is derived from the number of digits given. So, for example, 0x0 is four bits long, since it's a single hexadecimal digit, and 0x001 is 12 bits long (which is to say, leading zeroes do matter). In the example below, you can also see single-line comments, which start with a ;.

#ruledef
{
    ; a single-byte instruction
    mov a, b => 0x35

    ; double-byte instructions
    add a, b => 0x6834
    sub a, b => 0x0002
}
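
To make the leading-zero rule concrete, here is a small sketch in which the same numeric value is written at different widths. The mnemonics ld1, ld2, and ld3 are made up for illustration.

```asm
#ruledef
{
    ; two hex digits = 8 bits, assembles to: 02
    ld1 => 0x02

    ; four hex digits = 16 bits, assembles to: 00 02
    ld2 => 0x0002

    ; eight binary digits = 8 bits, same encoding as ld1
    ld3 => 0b00000010
}
```

All three literals have the value 2, but only the number of digits decides how many bits each instruction occupies.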

You can also split up these values for visual aid by using the concatenation operator @:

#ruledef
{
    ; a single-byte instruction
    ; 3 bits + 2 bits + 3 bits = 8 bits
    mov a, b => 0b101 @ 0b11 @ 0b001

    ; a double-byte instruction
    ; 8 bits + 4 bits + 4 bits = 16 bits
    add a, b => 0x08 @ 0x3 @ 0b1001
}
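
As a worked check of the two rules above: concatenation places the leftmost value in the most significant bits, so each rule could also be written as a single literal. The sketch below gives the hand-computed equivalents.

```asm
#ruledef
{
    ; 0b101 @ 0b11 @ 0b001 = 0b10111001 = 0xb9 (8 bits)
    mov a, b => 0xb9

    ; 0x08 @ 0x3 @ 0b1001 = 0x08, then 0x3, then 0x9 = 0x0839 (16 bits)
    add a, b => 0x0839
}
```

The split form is usually easier to read when the pieces correspond to separate fields of the instruction, such as an opcode and register numbers.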

[to be finished...]