hlorenzi edited this page Jan 17, 2019 · 13 revisions

So, you want to write programs for that new microprocessor you've built, or that custom virtual machine you've created? We'll take a look at how to define your instruction set and then use it to assemble programs.

Assembling your first program

Let's start off by creating the following file, which we can name main.asm:

#cpudef
{
    #bits 8
    
    nop -> 0x00
    hlt -> 0xff
}

nop
nop
hlt

You can assemble this by running:

$ customasm main.asm

You can also print the resulting binary to the screen, which can make it easier to see what is being produced, by running:

$ customasm main.asm -f hexdump

Yet another option is to use the online version, which doesn't require any downloads: copy and paste the code above into the page, then hit the "Assemble" button.

After it is assembled, you should see three bytes' worth of code being output: 00 00 ff, corresponding to the three-instruction program given.
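If it helps to see why those three bytes come out, here is a toy model of the translation in Python. It is only a sketch of the idea, not how customasm is implemented: the `CPUDEF` table and the `assemble` function are hypothetical names made up for this illustration, and the model only handles one bare mnemonic per line.

```python
# Toy model of the example above; NOT customasm's real implementation.
# Hypothetical table mirroring the #cpudef block: mnemonic -> one 8-bit value.
CPUDEF = {"nop": 0x00, "hlt": 0xFF}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into its byte, skipping blank lines."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.strip()
        if mnemonic:
            out.append(CPUDEF[mnemonic])
    return bytes(out)


program = "nop\nnop\nhlt\n"
print(assemble(program).hex(" "))  # prints: 00 00 ff
```

Each line of the program is looked up in the table and its value is appended to the output, in order.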

Anatomy of a customasm file

We'll now take a closer look at the contents of our main.asm file.

There are two parts to this file: a #cpudef structure defining your instruction set, and a list of actual instructions that form your assembly program. Here the #cpudef lives in the same file as the rest of your program, but you can also split them up into multiple files and use #includes.

Defining a CPU

The #cpudef structure starts with a #bits declaration. This may be thought of as the number of bits in a byte for your particular CPU. While this is usually 8 for modern CPUs, you can use any value if you have a more esoteric machine. The size of a byte determines the granularity of your machine's address space: you won't be able to reference anything at a finer grain than a single byte. customasm will also use this value to verify the size of the binary representation of every instruction you define: each instruction must have a size that is a multiple of a byte. So, for a CPU with 8-bit bytes, valid instruction sizes are 8, 16, 24 bits, and so on.
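The size rule can be written down as a one-line check. The Python sketch below is just a model of the rule as described here, with a hypothetical function name; it says nothing about how customasm actually performs or reports the check.

```python
def is_valid_instruction_size(size_in_bits: int, bits_per_byte: int = 8) -> bool:
    """Model of the rule: an instruction must span a whole number of bytes."""
    return size_in_bits % bits_per_byte == 0


# With #bits 8, a 12-bit instruction would not fit the rule.
for size in (8, 12, 16, 24):
    print(size, is_valid_instruction_size(size))
```

Changing `bits_per_byte` models a different #bits value: with 12-bit bytes, for instance, a 12-bit instruction passes while a 16-bit one does not.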

Next, the #cpudef structure defines the instruction set by listing mnemonics together with their binary representations. You can combine any number of letters, words, and punctuation in a given mnemonic. For example:

#cpudef
{
    #bits 8
    
    mov a, #b -> 0x35

    sub x, [hl] -> 0b11010001

    add.gt r0, r3, r4, LSL #6 -> 0x46
}

For the binary representation, the way you write out values matters: their size is derived from the number of digits given. So, for example, 0x0 is four bits long, since it's a single hexadecimal digit, and 0x001 is 12 bits long (which is to say that leading zeroes matter). In the example below, you can also see single-line comments, which start with a ;.

#cpudef
{
    #bits 8

    ; a single-byte instruction
    mov a, b -> 0x35

    ; double-byte instructions
    add a, b -> 0x6834
    sub a, b -> 0x0002
}
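To double-check the sizes in your own definitions, the digit-counting rule can be modelled in a few lines of Python. This is a sketch under assumptions: `literal_width` is a hypothetical helper, and it only understands the plain 0x and 0b literals used on this page.

```python
def literal_width(literal: str) -> int:
    """Bit width of a 0x/0b literal, counted from its digits (leading zeroes included)."""
    bits_per_digit = {"0x": 4, "0b": 1}  # one hex digit = 4 bits, one binary digit = 1 bit
    prefix, digits = literal[:2], literal[2:]
    return len(digits) * bits_per_digit[prefix]


for lit in ("0x0", "0x001", "0x35", "0x6834", "0x0002"):
    print(lit, literal_width(lit))
```

Note that 0x0002 counts as 16 bits even though its numeric value is small, which is why it works as a double-byte instruction.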

You can also split up these values to make them easier to read by using the concatenation operator @:

#cpudef
{
    #bits 8

    ; a single-byte instruction
    ; 3 bits + 2 bits + 3 bits = 8 bits
    mov a, b -> 0b101 @ 0b11 @ 0b001

    ; a double-byte instruction
    ; 8 bits + 4 bits + 4 bits = 16 bits
    add a, b -> 0x08 @ 0x3 @ 0b1001
}
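The arithmetic in those comments can be checked with a small Python model of concatenation. It is a sketch, not customasm's implementation: `concat` is a hypothetical helper, it handles only 0x and 0b literals, and it assumes the leftmost value ends up in the most significant bits.

```python
def concat(*literals: str) -> tuple[int, int]:
    """Join 0x/0b literals left to right; return (value, total width in bits)."""
    bits_per_digit = {"0x": 4, "0b": 1}
    value, width = 0, 0
    for lit in literals:
        prefix, digits = lit[:2], lit[2:]
        w = len(digits) * bits_per_digit[prefix]
        # Shift what we have so far to make room, then append the new bits.
        value = (value << w) | int(digits, 16 if prefix == "0x" else 2)
        width += w
    return value, width


for parts in (("0b101", "0b11", "0b001"), ("0x08", "0x3", "0b1001")):
    value, width = concat(*parts)
    print(" @ ".join(parts), "->", f"{value:0{width}b}", f"({width} bits)")
```

Under that assumption, the first definition comes out as the single byte 10111001 (0xb9) and the second as the 16-bit value 0000100000111001 (0x0839).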

[to be finished...]