15 minutes
Compiler Devlog #1 - Stack-based Intermediate Representation (IR)
Prologue
So I started writing a compiler around a year ago, worked on it for about a month or two and then lost interest as with most of my side-projects… tsc tsc.
Let’s be honest, there’s very little reason to write a compiler nowadays, if any. Hardware has gotten so powerful and software has gone so complex that we should all just use what’s already there. I feel like a Baby-Boomer born inside a Millenial body, because I miss the “good old days” (when I wasn’t even conceived yet). To be fair, I remember my dad having a 386 and also him trying to teach me BASIC when I was 6 or 7 years old, I remember being excited about it but of course my kid brain wanted to just play games and have fun.
The point is, looking back at it, the software development landscape looked much simpler and it felt like a single person or a really small team could really do something innovative and disruptive, e.g. id Software creating state of the art FPS game engines, Linus Torvalds with linux, and so on. Of course that is still perfectly possible, just at a much higher level. I’m not going to google it right now but I’m pretty sure both Bitcoin and Ethereum were created mostly by a single individual. This web3 stuff really seems where most of the innovations are going to happen, I’m just… not really into it.
Anyway, I digress, but my passion is in learning how the hardware that I’m in contact with every day does stuff. How does a computer like… execute applications? What it’s actually doing it? Who’s controlling all that stuff? Oh, the CPU, okay… How does it know what to execute? How can I type 2+2 in my keyboard, see it in my screen, and actually get 4 back? You get the point, I want to understand what the CPU understands, so around a year ago I decided to learn Assembly, since my daily computer is a Windows machine with a x86_64 architecture, I decided to start with that.
Unfortunately I don’t have the habit of reading books aside from fiction. I’m easily bothered doing exercises from books to try and learn stuff. I prefer to try and do something until it works, if it doesn’t works, there’s always the Internet… Oh yes the Internet 😁. Also tinkering with stuff yourself is way more exciting and since it’s not an actual job you are free to experiment different ways of thinking, so I decided to try and create an actual compiler to learn Assembly and what my code actually get’s transformed when it’s running in the CPU.
The Programming Language
Having previously written some toy interpreters and virtual machines, I wasn’t threading in completely new waters. After these toy projects, it’s pretty much ingrained in my mind how to write a Recursive Descent Parser, so I decided to start with that. I decided to “create” a new programming language instead of writing it for an existing one, because we all have our favourite features from a number of languages that we think would be nice together.
I’m not going to spend too much time in this section for now, but just to give an idea of the programming language itself, here’s the Game Of Life written in it:
import std::system as system
import test as t
import string
type Field2D = struct {
data: &int // Pointer to int
width: int
height: int
}
type Automaton = struct {
field: Field2D
new_field: Field2D
}
func makeField(sx: int, sy: int): Field2D {
let data, err = system::alloc(
sizeof(int)*(cast(usize)sx)*(cast(usize)sy)
)
return Field2D{ cast(&int)data, sx, sy }
}
func set(f: &Field2D, x: int, y: int, val: int) {
f.data[y * f.width + x] = val
}
func get(f: &pure Field2D, x: int, y: int): int {
return f.data[y * f.width + x]
}
func clear(f: &Field2D) {
memset(f.data, 0, f.width*f.height*sizeof(int))
}
func makeAutomaton(ftext: []pure uint8): Automaton {
let f = split(ftext, "\n")
defer free(f)
let height = cast(int) f.len
let width = 0
for (y in 0,f.len) {
if (width < f[y].len) {
width = cast(int) f[y].len
}
}
let field = makeField(width, height)
let newfield = makeField(width, height)
for (y in 0,field.height) {
for (x in 0,field.width) {
field->set(x, y, (
(x < f[y].len && f[y][x] == '#') then 1 else 0
))
}
}
let a = Automaton{}
a.field = field
a.new_field = newfield
return a
}
func update(a: &Automaton) {
a.new_field->clear()
for (y in 1,a.field.height - 1) {
for (x in 1,a.field.width - 1) {
let moore_sum = (
a.field->get(x-1, y-1) + a.field->get(x, y-1)
+ a.field->get(x+1, y-1) + a.field->get(x+1, y)
+ a.field->get(x+1, y+1) + a.field->get(x, y+1)
+ a.field->get(x-1, y+1) + a.field->get(x-1, y)
)
let cell = a.field->get(x, y)
let alive = (cell == 1)
then (moore_sum == 2 || moore_sum == 3)
else (moore_sum == 3)
a.new_field->set(x, y, alive then 1 else 0)
}
}
let tmp = a.field
a.field = a.new_field
a.new_field = tmp
}
func print(a: &Automaton) {
let buf, err = system::alloc(a.field.width * a.field.height + 256)
if (err != nil) { system::exit(1) }
defer system::free(buf)
for (y in 0,a.field.height) {
memset(buf, 0, a.field.width * a.field.height + 256)
let s = []uint8{ (cast(&uint8)buf), 0 }
for (x in 0,a.field.width) {
let cell = a.field->get(x,y)
s[x] = cell == 1 then '@' else '.'
s.len += 1
}
println(s)
}
}
func makeGun(): Automaton {
let gunfield = "*******************************************
* *
* *
* *
* *
* *
* # # *
* ## # # *
* ## # # *
* *
* *
* *
* *
*******************************************"
return makeAutomaton(gunfield)
}
func gameOfLife() {
let a = makeGun()
for (i in 0,150) {
a->update()
a->print()
system::sleepMs(200)
}
}
There’s a handful of things that still are temporary or not really well-designed, but if I give you the compiler now, you could compile it and run it. At the moment, both Windows and Linux x64 are supported, the most important thing missing right now is floating-point arithmetic, but that will come very soon, or maybe not. 😳
Architecture
Again, in a possible future Devlog I may give a better overview of the overall architecture and design, but the truth is it’s all a mess right now 😅.
I started very simple:
Lexer (Tokens) -> Parser (AST) -> Type Checker -> x64 ASM Generator
However this didn’t work out at all, the generated Assembly had expressions stepping in each other’s registers all the time and some complex expressions I simply had no idea how to transform into valid Assembly code.
In my research I learned that most compilers have some sort of IR (Intermediate Representation), so after some consideration I decided I needed an IR to be a middle-point between the high-level AST and the low-level Assembly code, so the architecture looks like this right now:
Lexer (Tokens) -> Parser (AST) -> Type Checker -> Desugar -> IR -> x64 ASM
The IR
Finally we are at the main subject of this post. 🙃
Much like the Parser, I am also pretty familiar with Stack machines, having written at least a couple of them. The idea is very simple: you push values to the stack, and pop it when you need to perform operations on them.
So the expression:
print(2 + 5)
Could be performed as:
PUSH 2
PUSH 5
ADD // operates on the top 2 operands on the stack and push the result
CALL print 1 // pop 1 operand from the stack to use as the argument
This is pretty similar to some Bytecodes from some interpreted programming languages, it’s simple and gets the job done.
At the time when I created this IR, I did not know about SSA or Three-address code. Looking back at it now, I feel like I should have done it like that, but what is done is done and I’m not going to refactor all the IR and Codegen right now LOL.
Anyway, my IR is kind of a hybrid between Stack machine and Three-address code. Here’s the IR for the clear function of the Game of Life example above:
func gameoflife::clear -> void
arg #0 f: &Field2D;
ir_deref ARG(0); // (push)
ir_stack_dup; // (push)
ir_stack_dup; // (push)
ir_mul [POP() . 1] [POP() . 2]; // (push)
ir_mul POP() 4; // (push)
ir_cast POP(); // (push)
ir_call memset [POP() . 0] 0 POP();
endf
As you can see, there’s a stack that operands are being pushed to, but there’s also direct operands, and in fact one of the operands (POP()) is a placeholder that says “Please pop from the stack to get this operand”.
Again, this might not be the correct way to do it, especially because some of the most well-known optimizations that are done to Three-address code and SSA are not applicable here, but I think if someone wants to seriously use this compiler, they need professional help, or at least to write a C back-end for it. 🤣
The cool thing about it though, is that I don’t need to worry about Register Allocation here, I just push things to an imaginary stack and someone else will have to implement this imaginary stack, in this case it’s myself, but at least I know that a stack WORKS, I just need to emulate it using the x64 assembly code.
IR to Assembly
First, here’s the Assembly (Intel syntax) generated from the IR above (without name mangling for clarity):
clear:
;func clear(&Field2D): {}
mov qword[rsp+8],rcx
push rbp
push rbx
push r12
mov rbp,rsp
sub rsp,32
;prolog end
mov rbx,qword[rbp+32]
;ir_deref A0; (push)
;ir_stack_dup; (push)
;ir_stack_dup; (push)
mov r10d,dword[rbx+8]
imul r10d,dword[rbx+12]
mov eax,r10d
;ir_mul [POP() . 1] [POP() . 2]; (push)
imul eax,4
;ir_mul POP() 4; (push)
movsxd rax,eax
;ir_cast POP(); (push)
mov r8,rax
xor dl,dl
mov rcx,qword[rbx+0]
call memset
;ir_call memset [POP() . 0] 0 POP();
clear$end:
add rsp,32
pop r12
pop rbx
pop rbp
ret
The original IR instructions are commented right after the generated Assembly so you can see which IR instruction generated which parts of the code. Let’s walk through this piece py piece:
;func clear(&Field2D): {}
mov qword[rsp+8],rcx
push rbp
push rbx
push r12
mov rbp,rsp
sub rsp,32
;prolog end
This is the function prologue, a common piece of code that is similar in the beginning of all functions, it transfer the function arguments to memory (sometimes), saves any callee-saved registers used by the function, and allocates space for it’s stack frame (local variables). In fact, I just found a bug here, the r12 register should not be pushed here 😅. But the concept is the same regardless.
mov rbx,qword[rbp+32]
;ir_deref A0; (push)
This bit here is implementing a pointer dereferencing, it’s taking the 1st argument of the function (which is a pointer to the Field2D struct) and moving the address contained there to the rbx register.
As you can see, this IR instructions pushes to the imaginary stack I was talking about, so let’s go in more detail about that.
First let’s visualize a stack:
| ------------------ |
| |
| 0 |
| |
| ------------------ |
This is a stack of one element (the number 0), so we say that 0 is at the top of the stack, let’s push one more element, the number 1:
| ------------------ |
| |
| 1 |
| |
| ------------------ |
| |
| 0 |
| |
| ------------------ |
And the number 2:
| ------------------ |
| |
| 2 |
| |
| ------------------ |
| |
| 1 |
| |
| ------------------ |
| |
| 0 |
| |
| ------------------ |
Now the number 2 is at the top of the stack, great, let’s pop 2 elements from it:
| ------------------ |
| |
Bye bye number 2: | 2 |
| |
| ------------------ |
| ------------------ |
| |
| 1 |
| |
| ------------------ |
| |
| 0 |
| |
| ------------------ |
The stack is like this now:
| ------------------ |
| |
| 1 |
| |
| ------------------ |
| |
| 0 |
| |
| ------------------ |
Let’s pop one more element:
| ------------------ |
| |
Bye bye number 1: | 1 |
| |
| ------------------ |
| ------------------ |
| |
| 0 |
| |
| ------------------ |
And the stack is back to just the number 0:
| ------------------ |
| |
| 0 |
| |
| ------------------ |
(OK this is basic CS knowledge but I love to visualize stuff in articles so bear with me)
Anyway, going back to the imaginary IR stack, there’s a number of registers that I classified as eligible to be used by the stack, I called them the IR stack registers, they change from platform to platform, but when compiling for Windows they are:
rbx, r12, r13, r14, r15
Also, rax is always pushed to the IR stack if the next instruction consumes at least one operand from the stack, because it is a volatile register, unlike the others above.
So, remember this ASM code?
mov rbx,qword[rbp+32]
;ir_deref A0; (push)
When we are moving to rbx we are actually pushing a value to the IR stack. So the stack looks like this now:
| ------------------ |
| |
| rbx |
| |
| ------------------ |
Okay, let’s look at the next instruction:
;ir_stack_dup; (push)
;ir_stack_dup; (push)
As you can see, these two instructions didn’t generate any assembly code, but they manipulate the internal IR stack at compile-time, ir_stack_dup will duplicate the value at the top of the stack, pushing it again. So the stack now is:
| ------------------ |
| |
| rbx | (pushed by second ir_stack_dup)
| |
| ------------------ |
| |
| rbx | (pushed by first ir_stack_dup)
| |
| ------------------ |
| |
| rbx | (pushed by ir_deref)
| |
| ------------------ |
Let’s advance to the next instructions:
mov r10d,dword[rbx+8]
imul r10d,dword[rbx+12]
mov eax,r10d
;ir_mul [POP() . 1] [POP() . 2]; (push)
Now there’s two POP() operands, they are evaluated from right-to-left, but in this case it doesn’t matter because multiplication is commutative and the operands are actually the same (the rbx register).
I’m not going into to much detail, but the [POP() . 1]
and [POP() . 2]
expressions means a struct field access, which at this level is just applying some offset to a memory address, resulting in the dword[rbx+8]
and dword[rbx+12]
assembly expressions, respectively. So the struct field expressions will POP an operand from the stack (the rbx register) and apply the field offsets to the address contained in there. And since the rbx is duplicated in there, both expressions will use it. What that means, is that they are both accessing a different struct field from the same struct, in this case, the 1st argument for the clear function.
Just for reference, going back to the source code, these two assembly instructions represent the expression highlighted here:
/*func clear(f: &Field2D) {*/
/*memset(f.data, 0, */f.width*f.height/**sizeof(int))*/
/*}*/
Okay, so the IR stack was popped two times, right? ;ir_mul [POP() . 1] [POP() . 2]; (push)
So how does it look now?
| ------------------ |
| |
| rbx |
| |
| ------------------ |
Only the good old original rbx register is alone in there without it’s duplicated clones.
Let’s look back for a moment at these instructions again:
mov r10d,dword[rbx+8]
imul r10d,dword[rbx+12]
mov eax,r10d
;ir_mul [POP() . 1] [POP() . 2]; (push)
At that last one mov eax,r10d
, we are moving the result of the multiplication, which lives in the r10d register, into the eax register, which is actually the lower 32-bits of the rax register, meaning they are actually the same.
This actually means we are pushing the result of the multiplication back to the IR stack, let’s visualize it:
| ------------------ |
| |
| eax |
| |
| ------------------ |
| |
| rbx |
| |
| ------------------ |
But now we are using the rax/eax register because the next instruction will pop it, let’s check it:
imul eax,4
;ir_mul POP() 4; (push)
movsxd rax,eax
;ir_cast POP(); (push)
We are popping from the stack, but also pushing to it again, and because the next instructions also pops ir_cast POP()
, we can just work with the eax register, modifying it’s value in-place, in this case, multiplying it by 4 (4 being the sizeof(int)
from the source code).
The movsxd rax,eax
is just sign-extending the 32-bit value in eax to itself, but in the 64-bit version, rax. So now the top of the stack actually contains the rax:
| ------------------ |
| |
| rax |
| |
| ------------------ |
| |
| rbx |
| |
| ------------------ |
Now let’s see the last instruction of the function:
mov r8,rax
xor dl,dl
mov rcx,qword[rbx+0]
call memset
;ir_call memset [POP() . 0] 0 POP();
In this particular ABI (Windows x64 fastcall), we use the rcx, rdx/dl, r8 and r9 as arguments for making a function call. And since the instruction operands are evaluated right-to-left, each line of assembly above the call
is representing a instruction operand.
So there’s two POP() in there, and the first evaluated is the rightmost one. Remember what is at the top of the stack?
| ------------------ |
| |
| rax |
| |
| ------------------ |
| |
| rbx |
| |
| ------------------ |
Okay, so we’ll pop rax and use it as the third argument for the function call:
mov r8,rax
And the stack now is:
| ------------------ |
| |
| rbx |
| |
| ------------------ |
For the second argument we’re just putting 0 in the register using the xor instruction:
xor dl,dl
And the first argument [POP() . 0]
is a struct field access that uses a value from the stack to get the struct pointer, which translates to:
mov rcx,qword[rbx+0]
So we popped the last rbx from the stack, which is now empty. And we have finished the function, great!
Oh, there’s also the function epilogue, which is just reversing everything that was done in the prologue, but that is for another time!
Outro
Well, all this was just to share some thoughts about how a noob in compiler-crafting solved register allocation.
I’m working slowly at this compiler and hope to at least have something that someone can make a snake game in it or something, the important part is that I’m having fun while doing it. Thank you for reading!
3100 Words
2022-04-28 00:30