
How an emulator works, Part II: Structure and components for a Chip8 emulator

Camilo Andrés Mella Lagos edited this page Jun 13, 2017 · 12 revisions

In part one we studied the relationship between variable types and sizes, and we put that knowledge to the test by creating the memory containers and the program counter.

By reading about registers you should also have been able to create the container variables for the V registers, but again, if the logic behind your selection is not clear... Well, I'll walk you through it. Don't get used to it tho', hand holding will just slow down your progress.

As a rule to live by... Programmers must challenge themselves in order to keep growing. Don't stay in your comfort zone or use the same approach twice to solve the same problem.

But I digress, onto registers! Okay, let's see: 16 8-bit registers numbered 0-F. This is an easy one. We already know which 8-bit container to use, so we have several possible ways of doing it. Let's explore two possible ways of implementing registers.

We could do:

unsigned char V0, V1, V2, V3, V4, V5, V6, V7, V8, V9, VA, VB, VC, VD, VE, VF;

And we will be fine, we still have lots of room for variables in our program memory, don't we?

NO! Another golden rule while programming is: "be smart". Code for your future self, and make them proud of you by sparing them a costly and time-consuming rewrite.

Always imagine your code is going to be discovered in a few years by an IT archaeologist... You only get one chance to make a good impression, and guess what...?? You blew it ahahah

So... try to use your space efficiently. Not only will your code be neater, it will also be understandable and easy to follow. Remember, longer programs don't equate to working harder... or rather they do, but not in a good way

Putting that rant aside, another way we could approach our register creation is just using an array thus:

unsigned char V[16];

An innate advantage of doing it this way is that we can refer to each register using hex values, so if you want to assign a value to the VE register you just use:

V[0xE] = 0xnn   (0xnn being an unsigned number between 0x00 and 0xFF, that is, 0 to 255)
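To make the array approach concrete, here is a minimal sketch. The helper names `set_register` and `reset_registers` are made up for this illustration and are not part of the tutorial's code; only the `V` array comes from the text.

```cpp
#include <cassert>

// Sixteen 8-bit registers, V0 through VF, exactly as declared in the text.
unsigned char V[16] = {};

// Hypothetical helper: writes a value into register VX,
// where x is the register number 0x0..0xF.
void set_register(unsigned int x, unsigned char value) {
    assert(x < 16);  // only V0..VF exist
    V[x] = value;
}

// Hypothetical helper: clears every register with a single loop,
// something sixteen separate variables would not let us do.
void reset_registers() {
    for (unsigned char &reg : V) {
        reg = 0;
    }
}
```

Calling `set_register(0xE, 0x2A)` touches the same slot as `V[14]`, because 0xE and 14 are the same number written two ways.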

To keep going we also need to understand a bit how a console works.

Let's say you are playing Space Invaders. Every time you destroy a ship, the game enters a subroutine that increases your score and removes the ship sprite from the screen, but it also has to be able to go back to where it was originally, before entering the subroutine, so you can go on playing your game. To do this, and exercising only our powers of logical deduction, we should be able to see what we need.

First we need a structure which can hold several levels of STACKED subroutines

Only being able to stack subroutines won't do much unless you have some way of counting which level is active, so we also need a STACK POINTER (SP). Finally, what we want to save on the STACK is a given position in memory, an address. Let's back up a bit

unsigned char memory[4096];
unsigned short PC;

An address is, pardon the pun, addressed by walking through the memory using the program counter (PC), so if you want to access the element stored at an address you write

memory[PC]

Now don't get confused by this.

PC is an unsigned short variable, while memory is an array of unsigned char. So how is it that I'm combining them?

To answer this, you can just pop open the calculator once more and use your raw logic! First, memory is an 8-bit entity which is declared as an array of 4096 positions, so by using PC you are just saying: HEY! I want to access 1 of those 4096 elements... and then save it to a container variable. (PC is only the index, and it has to count up to 4095, which is why it needs 16 bits.) THUS! That container variable should be of type???

unsigned char I;  

WRONG! Haha I'm sorry, allow me to take you on a hypothetical journey.

Let's say we use this way. Suppose I have an arbitrary string of numbers like this (representing a memory)

AF 23 89 DE CF F2

We are on DE and we are about to enter a subroutine, so using our trusty unsigned char we assign

I = memory[PC];

Now, we are back from the subroutine... We are deep into an unknown address, but don't worry... Let's just jump to I... **Except that**

I = 0xDE

NICE! exactly what I wanted, now where was 0xDE??? Ohhh shiiiii#@$&

I bet you see the conundrum we are in. "I" shouldn't contain the element stored at the address... it should rather store the position in memory, the address itself. And that's why "I" mimics the PC and is of type unsigned short (the register is called I btw :P ). One caveat so you don't get confused later: in CHIP-8, I is the index register that opcodes use to point at memory, while the return address of a subroutine lives on the stack.

So we need a Stack, which in this case has 16 levels, and we are storing 2-byte variables on it... So
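The difference between "what is stored there" and "where it is stored" can be sketched in a few lines. `Snapshot`, `snapshot_at` and `check_example` are names invented for this illustration, and the six bytes are the arbitrary string from the example above.

```cpp
#include <cassert>

// Hypothetical pair, only for this illustration: the byte stored at an
// address, and the address itself.
struct Snapshot {
    unsigned char value;     // the element: one byte read from memory
    unsigned short address;  // the position: where that byte lives
};

Snapshot snapshot_at(const unsigned char *memory, unsigned short pc) {
    return Snapshot{memory[pc], pc};
}

// Replays the example: we are standing on DE, the fourth byte (index 3).
void check_example() {
    const unsigned char memory[6] = {0xAF, 0x23, 0x89, 0xDE, 0xCF, 0xF2};
    const Snapshot s = snapshot_at(memory, 3);
    assert(s.value == 0xDE);  // tells us what was there...
    assert(s.address == 3);   // ...but only this tells us where to go back to
}
```

The value fits in 8 bits, but the address has to be able to reach 4095, so it needs the 16-bit type.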

unsigned short Stack[16]; 
unsigned int SP; //Only integer positive numbers to travel up and down the stack
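Here is one way the stack and its pointer could work together, as a minimal sketch rather than the final implementation. The function names are invented for this example, the starting value 0x200 for PC is an assumption, and so is the convention that PC already points at the instruction to resume when the call happens.

```cpp
#include <cassert>

unsigned short Stack[16] = {};
unsigned int SP = 0;        // index of the next free stack slot
unsigned short PC = 0x200;  // assumed start address, only for this example

// Hypothetical helper: remember where we are, then jump to the subroutine.
void call_subroutine(unsigned short target) {
    assert(SP < 16);  // only 16 levels of nesting fit
    Stack[SP] = PC;   // save the position (not the byte stored there)
    ++SP;
    PC = target;
}

// Hypothetical helper: go back to the most recently saved position.
void return_from_subroutine() {
    assert(SP > 0);  // nothing to return to if the stack is empty
    --SP;
    PC = Stack[SP];
}
```

Because SP always points at the next free slot, nested calls unwind in reverse order: the last address saved is the first one restored.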

We are just gonna speedrun through the last ones, so strap in.

We also need some Canvas for the graphics, and if you have been doing your homework then you should already know that...

 Original CHIP-8 Display resolution is 64×32 pixels, and color is monochrome.

This is all we need. Once more we have a choice here, and it all depends on how comfortable you feel when implementing GFX drawing routines. I'm gonna take you down one path, just to trigger that AHA! later on that will ultimately cause you to marry one way and run with it for the rest of the project

Let's see, monochrome means only two colors, black and white. Sadly C++ doesn't have a BIT variable, but how about we use unsigned char once more?... But ]-[Dx, I hear you say in protest, don't we have like a gazillion numbers in an 8-bit unsigned variable?

Actually 256 (0 through 255), and we only need two of them, which are kinda self-explanatory. If you have some knowledge of how RGB color works you are set, and are probably urging me to go on and stop rambling :D, but I'm here to address those who want insight and a complete understanding of the WHYs and HOWs of the topic at hand.

The thing is really simple. Think of Pink Floyd's famous prism logo. All colors combined form white (and splitting white light into its wavelengths is where the colors we see are born)

So what's the maximum value we can store in an unsigned char variable? YEP!

255 or 0xFF. But what about black? Black is just your TV turned OFF... a dead pixel, in other words 0

Okay! So we have our colors, and we have chosen our variable type, so naturally it should be

unsigned char Gfx[64*32]; // Yeah I'm that lazy
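A flat array still has to be addressed with an x and a y. The sketch below assumes a row-major layout (one full row of 64 pixels after another), which is my assumption and not something the text has fixed yet; `set_pixel` and `pixel_is_on` are hypothetical helper names.

```cpp
#include <cassert>

const unsigned int WIDTH = 64;
const unsigned int HEIGHT = 32;

// One byte per pixel: 0x00 is black (off), 0xFF is white (on).
unsigned char Gfx[WIDTH * HEIGHT] = {};

// Hypothetical helper: turns the pixel at column x, row y on or off.
// Assumes rows are stored one after another (row-major order).
void set_pixel(unsigned int x, unsigned int y, bool on) {
    assert(x < WIDTH && y < HEIGHT);
    Gfx[y * WIDTH + x] = on ? 0xFF : 0x00;
}

// Hypothetical helper: reports whether the pixel at (x, y) is lit.
bool pixel_is_on(unsigned int x, unsigned int y) {
    assert(x < WIDTH && y < HEIGHT);
    return Gfx[y * WIDTH + x] != 0;
}
```

With this layout the bottom-right pixel (63, 31) lands on index 31 * 64 + 63 = 2047, the last element of the array.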

Next... keys? Are you familiar with ASCII? ASCII maps symbols to numeric codes, and in its extended 8-bit flavors they go... you guessed it, 0-255 (I bet you know where I'm going with this). But how many keys should I be able to catch? It depends, but just stick with Wikipedia

Input is done with a hex keyboard that has 16 keys which range from 0 to F. The '8', '4', '6', and '2' keys are 
typically used for directional input. Three opcodes are used to detect input. One skips an instruction if a specific 
key is pressed, while another does the same if a specific key is not pressed. The third waits for a key press, and 
then stores it in one of the data registers.

Not even going to help you with this one :P The only head scratching moment you should have from that definition is... OPCODE????

I hope you are still with me. So far it's been easy, but I'm about to unlock the piece of information that will connect all of this, the piece of information that many people use without truly understanding it. Knowing this will make you marginally less popular at parties hehe, but it will open your universe in terms of connected knowledge... Yeah, so much hype, but opcodes deserve it haha

OKAY, so what the hell is an OPCODE?

An opcode, as its name implies, is an "operation code", and tho' it may sound trivial and you could be quick to dismiss it as... okay, the operations the CPU does, it's way more than that. Don't jump ahead, concentrate on this next question

What is a ROM?

If you are an emulation junkie, I bet you said... "Well, that's easy, a GAME!" And you would be right. ROM is the acronym for Read Only Memory, and it's basically a game. But what's so important about a ROM being a game? Well, for starters, it means the contents can't be changed, just accessed and addressed. Now... Ever tried opening a ROM in Notepad?

  ¢.Â.2.¢.Ð.p.0@..`.q.1 ....€@ . @€.

Beautiful!! I know. By the way... that's the whole code of MAZE (I hope I don't get in trouble for redistributing ROMs lol)

So what's so unique about a bunch of garbage characters? Well everything actually... You just can't see it YET

A2 1E C2 01 32 01 A2 1A D0 14 70 04 30 40 12 00 60 00 71 04 31 20 12 00 12 18 80 40 20 10 20 40 80 10

This is the same data, with every byte written out as a two-digit hexadecimal number instead of a text character
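If you'd rather not trust Notepad, a few lines of code can print each byte as two hex digits. This is only a sketch; `to_hex` is a made-up helper name, and loading the ROM file into the byte array is left out on purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: renders a run of bytes as "A2 1E C2 ...",
// two uppercase hex digits per byte, separated by spaces.
std::string to_hex(const unsigned char *data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) {
            out += ' ';
        }
        out += digits[data[i] >> 4];    // high four bits, first hex digit
        out += digits[data[i] & 0x0F];  // low four bits, second hex digit
    }
    return out;
}
```

Feeding it the first four bytes of the ROM gives back the start of the line above, since each byte always produces exactly two hex digits.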

Now, from Wikipedia, we know the following tidbit of information

CHIP-8 has 35 opcodes, which are all two bytes long and stored big-endian. The opcodes are listed below, in hexadecimal and with the following symbols:

Hopefully you are not too lost, let's start with the opening statement

CHIP-8 has 35 opcodes, which are all **two bytes long**

So the CPU can process 35 different types of instructions, and furthermore they are ALL 2 bytes long... sound familiar? It should. 2 bytes are 16 bits, so an opcode fits in an unsigned short variable.

But... How is that related to the bunch of Hexadecimals we saw earlier? Let's bring them again

A2 1E C2 01 32 01 A2 1A D0 14 70 04 30 40 12 00 60 00 71 04 31 20 12 00 12 18 80 40 20 10 20 40 80 10

Take a look at the first 4 CHARACTERS or, what's the same, the first TWO NUMBERS. Yep, remember that a number between 0 and 255 (0x00 - 0xFF) is what each position can store. You are basically reading MEMORY[]. So let's take a look at the first two positions

A2 1E

Well... that didn't do much... Wait, let's dig a little deeper... It says an opcode is a 2-byte entity, and here we have two 1-byte numbers by themselves, so how do we stick them together??? Bitwise shifting and ORing, naturally ahhah. Now... that was anticlimactic, and if you are a newcomer it sounded like a bunch of things you have no idea how to do... but I promise you, after this next section, this will all make, to quote Donald Trump, "SO MUCH SENSE!"

To better explain this, let me illustrate with my once again... awesome ASCII skills

1010 0010 = Binary representation of A2
0001 1110 = Binary representation of 1E

So here we have two 1byte numbers, and we need them to fit SIDE BY SIDE on a 2byte container like this

           1010 0010 = Binary representation of A2
           0001 1110 = Binary representation of 1E
0000 0000  0000 0000 = Empty container for the opcode of type unsigned short (2byte long)

Remember, we're looking for the end result **A21E**. This is where it gets fun.

You know these symbols, ">" and "<"? Yes, the greater-than and less-than symbols... I know most of you use them to make >.< faces on the internet, but their power reaches waaaaay beyond making girls fall for your charms (I was a late bloomer, sue me) hahaha. Well, **using them doubled, like << or >>, after a number** tells the CPU to shift the bits of that number to whichever side you choose. Let's continue with our example and shift the first one 8 bits to the left

           1010 0010 (A2) << 8
           0001 1110 (1E)
0000 0000  0000 0000 (opcode)

           0000 0000 (00)
           0001 1110 (1E)
0000 0000  0000 0000 (opcode)

AWWWW DUDE!! What the hell!!! Where did my number go??? hahaha. You, my friend, just threw that poor number overboard. How, you ask? Let me play back the last play in slow motion for you haha

1010 0010 (A2) << 8    = Original Instruction
0100 0100 (44) << 7    = After the first shift
1000 1000 (88) << 6    = After the second shift
0001 0000 (10) << 5    = After the third shift

So basically you took that poor number's value and scrolled it to its death
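The slow-motion replay can be reproduced in code. One detail worth labeling: in C++ the bits only fall overboard when the shifted result is stored back into an 8-bit variable, which is what the sketch below does on purpose. Both function names are invented for this example.

```cpp
#include <cassert>

// Hypothetical helper: shifts left one bit at a time, storing the result
// back into an 8-bit variable after every step so the top bit is lost.
unsigned char shift_left_in_8_bits(unsigned char value, unsigned int steps) {
    for (unsigned int i = 0; i < steps; ++i) {
        value = static_cast<unsigned char>(value << 1);
    }
    return value;
}

// Reproduces the slow-motion table from the text.
void check_slow_motion() {
    assert(shift_left_in_8_bits(0xA2, 1) == 0x44);  // after the first shift
    assert(shift_left_in_8_bits(0xA2, 2) == 0x88);  // after the second shift
    assert(shift_left_in_8_bits(0xA2, 3) == 0x10);  // after the third shift
    assert(shift_left_in_8_bits(0xA2, 8) == 0x00);  // all eight: nothing left
}
```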

Okay, don't worry, let's rewind... Clearly just shifting won't do the trick, but take a look at this. We can OR the opcode container with one of them (the one that leads in A21E is A2, so let's use that). What's that? What is ORing?

This is OR:

 0011
 0100      You compare the two numbers bit by bit and ask: is either bit a 1? If so, the result is 1; if neither is,
 ----      the result is 0. But why is this relevant to our case? Let's get back to our example...
 0111
 

           1010 0010 (A2)        Let's bitwise OR these two together
0000 0000  0000 0000 (opcode)
--------------------
0000 0000  1010 0010  = Result from the operation

Holy sh#$T! that's neat!, but HEY! isn't it in the wrong place?.... Well let's try the shifting once more

0000 0000  1010 0010  << 8
1010 0010  0000 0000   = after the shift

Yep! Shifting where there is no information just adds a bunch of zeros, and that's really useful. I bet you already know the last step. Just OR the other number against the result

           0001 1110 (1E)        That fits perfectly, like Tetris! One last OR
1010 0010  0000 0000
--------------------
1010 0010  0001 1110   =  Result A21E
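The three steps drawn above (OR, shift, OR) translate almost literally into code. This is a sketch for following along, with `combine_bytes` as a made-up name.

```cpp
#include <cassert>

// Hypothetical helper: glues two 8-bit numbers side by side into one
// 16-bit number, following the same three steps as the diagrams.
unsigned short combine_bytes(unsigned char high, unsigned char low) {
    unsigned short opcode = 0;  // 0000 0000  0000 0000, the empty container
    opcode |= high;             // OR in the leading byte  -> 0x00A2
    opcode <<= 8;               // slide it into the left half -> 0xA200
    opcode |= low;              // OR in the second byte   -> 0xA21E
    return opcode;
}

// Checks the worked example from the text.
void check_first_opcode() {
    assert(combine_bytes(0xA2, 0x1E) == 0xA21E);
}
```

The hex values in the comments assume the inputs A2 and 1E from the example; any other pair of bytes goes through the same motions.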

GRATZ! You just created your first opcode! Wait... I thought opcodes were on the CPU??? That's the beauty of this! Earlier on, I asked you if you had ever opened a ROM in Notepad. Well, ROMs/games are nothing more than an ordered set of instructions

The memory on a ROM is nothing more than a bunch of opcodes, and thankfully for us they are all (IN THIS CASE) 2 bytes long. Don't get used to it tho'. After a while you will be seeing machines like the Game Boy, whose instructions range from 1 to 3 bytes in length, so it's important to take into consideration that opcodes can differ in length and that what follows them are their parameters. BUT! We are getting ahead of ourselves. Let's go back to our comfy Chip8 and its cool 2-byte instructions. So, to recap, if you want to do it programmatically you first need to access the two chars you want to merge

We already know how to access a member of the memory.

memory[PC]

But we need two... and we also need to OR them into the container... well, how about this?

opcode = (memory[PC] << 8) | (memory[PC+1] );

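I can already hear you protesting... Wait, I thought we had to OR two times? ------ Do we???? If the canvas is empty we can just assign it... Also notice that doing it this way the number doesn't get thrown overboard. That is not thanks to the parentheses (<< already binds tighter than |, they are only there for readability). It's because C and C++ promote memory[PC] to an int before shifting, so there is plenty of room on the left. The bits are only lost if you store the shifted value back into an 8-bit variable. Take that into consideration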
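Wrapped into a function, the one-liner could look like the sketch below. `fetch_opcode` is a hypothetical name, and taking the address as a parameter instead of reading the global PC is my choice for the example. The shift is safe because `memory[pc]` is promoted to `int` before `<<` is applied.

```cpp
#include <cassert>

unsigned char memory[4096] = {};

// Hypothetical helper: reads the two bytes at pc and pc + 1 and merges
// them into one 16-bit opcode, first byte in the high half (big-endian).
unsigned short fetch_opcode(unsigned short pc) {
    assert(pc + 1 < 4096);  // both bytes must be inside memory
    // memory[pc] becomes an int before the shift, so no bits are lost.
    return static_cast<unsigned short>((memory[pc] << 8) | memory[pc + 1]);
}
```

For instance, with 0xA2 and 0x1E stored at two consecutive addresses, fetching from the first one yields 0xA21E.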

Phew! That was way too much work to get an opcode... I agree, and I bet you are tired. How about we call it a night :P Don't worry, you are closer than you think to being an emu author :D

I'm sorry for the heavy binary-oriented chapter, but I believe it's better to get these things out of the way first than to blindly accept code and take it for granted

See you in the next one. I'll try to address the main program loop and how we should go about implementing it. I hope you are excited. I know I am :D

]-[Dx