- Acronyms
- Intro
- Scanners
- Parsing
- Abstract Syntax Tree
- JOOS
- Type Checking
- Virtual Machines
- {Bite}Code Generation
- Optimization
- Midterm Review
- Garbage Collection
- Virtual Machines (VirtualRISC)
- Native Code Generation
- Web Assembly
- GPU
- Final Exam
AOT | Ahead Of Time compilation |
AST | Abstract Syntax Tree |
CST | Concrete Syntax Tree |
GC | Garbage Collection |
IL | Intermediate Language |
IR | Intermediate Representation |
JIT | Just In Time compilation |
JVM | Java Virtual Machine |
LALR | Look Ahead Left-to-right Right derivation |
LL | Left-to-right Left derivation |
LR | Left-to-right Right derivation |
LSP | Local Stack Pointer |
PC | Program Counter |
SP | Stack Pointer |
VM | Virtual Machine |
- A compiler transforms source files (eg in Java, C) into target code (eg x86)
- A scanner transforms source file characters into more meaningful tokens
- Keyword - eg
if
- Variable identifiers - eg
var0
- Syntax/structural elements - eg
{
- Note that keywords take precedence over any other rules
- Whitespace is typically ignored
- Keyword - eg
- Language (Comp 330 Review)
- Σ - alphabet, set of symbols, typically finite
- Word - finite sequence of symbols from alphabet
- Σ* - set of all possible words in Σ
- Language - subset of Σ*
- Regular languages - languages that can be accepted by a DFA, and for which regular expressions exist
- At its core, a regular expression contains
- ∅ - language with no strings
- ε - empty string
- α for α ∈ Σ
- | - alternation
- · - concatenation
- * - zero or more occurrences
- Scanners
- First phase of compiler
- List of regular expressions; one per token type
- Internally, transforms regular expressions to DFAs
- Algorithm is to match all DFAs against input
- Find the longest match
- If not empty, remove input and return a matching token
- Otherwise, return input as output
- If max length is the same for multiple DFAs, first one is typically used
- Find the longest match
- Rules can use look-aheads to increase match length, even though token length remains unchanged
- Eg FORTRAN
363.EQ.363
; avoids mattching363.
as float
- Eg FORTRAN
- Context-sensitive grammars
- Eg in C,
(a) * b
is either a type cast or a multiplication ifa
is a type or variable respectively - Solution is to either feed semantic language to the scanner, or run multiple passes
- Eg in C,
- Scanners should have a
.
rule at the end to match any unexpected character if no other rules match - Line numbers in
flex
- Manually counting by matching against
\n
- Use
%option yylineno
to get the updated variable
- Manually counting by matching against
- Scanner actions
- Do nothing
- Perform arbitrary computation
- Return a token
- Scanner efficiency
- Reduce number of regular expressions; note that keywords are already valid identifiers
- Scanner error handler
- Eg if positive ints cannot start with 0
- Parser error - check for two consecutive int tokens and fail on match
- Lexical error - create a new rule capturing all integers (allowing
0
prefix), and throw if rule matches
- Parsers
- Second phase of compiler
- Aka syntactic analysis
- Takes tokens from scanner and generates a parse tree using CFG
- Pushdown automata - FSM + unbounded stack
- Used ot recognize CFG
- CFG - Context-free grammar
- V - set of variables
- Σ - set of terminals where V ∩ Σ = ∅
- R - set of rules, where LHS ∈ V & RHS ∈ V ∪ Σ
- S - start variable where S ∈ V
- While DFAs and NFAs are equally powerful, DPDAs do not recognize all CFGs
- BNF - backus-naur form
- EBNF
- Parse tree - aka concrete syntax tree
- Built exactly from CFG rules
- Parent nodes form LHS of readwrite rule
- Child nodes form RHS of readwrite rules
- Leaves form parsed input sentence
- Ambiguous grammar
- Multiple parse tree results
- Precedence - order of operations
- Associativity - grouping of operations with same precedence
- Rewriting grammar
- Operands must not expand to other operations of lower precedence
- If left associative, only LHS may expand; if right associative, only RHS may expand
- Dangling else problem - ambiguous if statements without terminating token; solution in C is to match each
else
with the closest unmatchedif
- Look Ahead Left-to-right Rightmost derivation parser (LALR)
- Bottom up
- Eg yacc + bison (LALR(1)), SableCC
- Left-to-right Leftmost-derivation parser (LL)
- Top down
- For LL(k), look k tokens ahead and determine when to reduce
- Note that not all grammars are LL(k) for any fixed k
- Some rules require unbounded lookahead
- Recursive Descent Parsers
- Repeatedly expand left non-terminal
- If one occurrence, use rule
- If multiple, there's a conflict
- If none, there's an error
- Problematic when rules have same prefixes of length greater than the lookahead count
- Resolved by factoring the grammar
- Eg
if then end
,if then else end
becomesif then ifend
, whereifend
→end | else end
- Left recursion also a problem. A → A β | ε expands to ε | A | A β | A β β ...
- Repeatedly expand left non-terminal
- Two token collections, one for input stream and one for stack representation
- Shift - place token from stream to stack; more symbols are needed before applying rule
- Reduce - replace multiple symbols on stack with single symbol according to grammar
- Accept - when start symbol (
S'
) is on the stack - Using a table, effectively like LR parsing but in reverse
- Deciding between shift and reduce
- Implemented as stack of states; represents top content without caring about sub contents
- Parsing conflicts
- Resolving conflicts
- For operations with same precedence & left associativity, prefer reducing
- When reduction contains operation of lower precedence than lookahead token, prefer shifting
- Compiler architecture
- One-pass compiler - scans only once
- Prevents modularity and limits optimization
- Historically good as it's fast and space efficient
- Multi-pass compiler - 5-15 phases
- Requires intermediate representation (IR) of program
- One-pass compiler - scans only once
- Abstract syntax tree (AST)
- Only important terminals kept
- Tree is not language dependent
- Building IRs
- Extend parser & execute semantic actions
- Semantic actions - arbitrary actions executed during parser execution
- Semantic values
- Terminals - provided by scanner
- Non-terminal - created by parser
- Building IRs
- Extends parser
- Executes semantic actions during process
- Values are terminal (provided by scanner) or non-terminal (created by parser)
- LALR(1)
- Left recursion, for efficiency
- Subset of Java
- Types - boolean, int, char, external (Object, Boolean, Integer, String, etc)
- Expressions - evaluate to a value
- Constants, variables, binary op, unary op, class creation, expression cast, method invocation
- Statements - no associated values
- Expression statements, blocks, control (if, while), return
- If we wish to create certain constraints (ie avoid division by 0), updating the parser will double the code
- Instead, we can weed out exceptions by parsing invalid code and running pass-throughs afterwards
- Can also be used in more complicated cases, like distinguishing casting or parentheses around expressions.
- Semantic analysis - analyzes meaning of program
- Symbol table - analyzes variable definitions & uses
- Maps identifier to meanings
- Symbol table - analyzes variable definitions & uses
- Used in JOOS for
- Class hierarchy - what classes, inheritance
- Class members - what fields, what methods, signatures
- Identifier - when defined, when used
- Type checking - analyzes expression types & uses
- Previously, var declarations were done in one pass. In this case, we wouldn't be able to use elements in the global scope until they were defined
- Identifiers in the same scope can be unique, but can shadow parent scopes
- Dynamic scoping - uses program state to resolve symbols
- Mutual recursion
- Requires two traversals, one to collect definitions and another to collect uses
- Type annotations are invariant about the run-time behaviour
- Rules
- Declaration - introduce variables
- Propagation - expression type determines enclosing expression type
- Restrictions - expression type constrained by usage context
- Logical rules
- (Γ ⊢ P) / ( Γ ⊢ C) - if P is provable under context Γ, then C is provable under context Γ
- Γ ⊢ E : T - under context Γ, it is provable that E is well typed with type T
- (Γ [x ↦ T] ⊢ S) / (Γ ⊢ x; S) - modify context
- x maps to T within context Γ, and S typechecks with x added to the symbol table
- (Γ(x) = T) / (Γ ⊢ x : T) - access context
- x := T - x is assignable to a value of type T
- a ∨ b - or conditions
- L - class library
- C - current class
- M - current method
- V - variables
- Type proof
- Internal nodes are inferences
- Leaves are axioms or true predicates
- Program is statically type correct iff it is root of some type proof
- Ahead of time compilation (AOT) - source code to machine code before execution
- Advantages
- Fast execution
- Allows optimization
- Intermediate languages facilitate code generation for target architectures
- Disadvantages
- Runtime information ignored
- Code generator must be built
- Virtual machines
- Not tied to any architecture
- Delays generating native code until execution time
- Advantages
- Interpreted - instructions read one at a time; not compiled
- Advantages
- Easy to generate virtual machine code
- Code is architecture independent
- Bytecode can be more compact
- Disadvantages
- Poor performance
- Advantages
- Just in time (JIT) - generate native code during program execution
- Advantages
- Target specific architectural details
- Observe program properties at runtime
- Efficiently allocate optimization time towards important methods
- Disadvantages
- Program performance depends on compile time
- Advantages
- Abstract machine - intermediate language
.class
files- Magic number (
0xCAFEBABE
) - Minor version/major version
- Constant pool
- Access flags
- This class
- Super class
- Interfaces
- Fields
- Methods
- Attributes
- Magic number (
- Java class loaders
- Extend
java.lang.ClassLoader
- Allow for loading classes from other sources & transforming classes during loading
- Extend
- Consists of
- Memory
- Stack (function call frames)
- Call stack (function call frames)
- Reference to constant pool
- Reference to current object (
this
) if any - Method arguments
- Local variables
- Local stack for intermediate results (baby stack)
- Baby/operand/local stack - operands & results from instructions
- Call stack (function call frames)
- Heap (dynamically allocated memory)
- Constant pool (shared constant data)
- Code segment (JVM instructions of currently loaded class files)
- Stack (function call frames)
- Registers
- No general purpose registers
- Stack pointer (
sp
) pointing to top of stack - Local stack pointer (
lsp
) points to location in current stack frame - Program counter (
pc
) points to current instruction
- Condition codes
- Execution unit
- Memory
- Data Types
- Primitives - boolean, integral (byte, short, int, ...), floating, internal
- Reference - class, array, interface
- Jasmin Code
- Z - boolean
- F - float
- I - int
- J - long
- V - void
- Reference types represented by fully qualified names
- Methods
- Signature -
.method <modifiers> <name>(<parameter types>) <return type>
- Stack limits
.limit stack <limit>
,.limit locals <limit>
- Method body
- Termination line
.end method
- Signature -
- JVM contains 256 instructions for arithmetic ops, constant loading, local operations, branch operations, stack operations, class operations, and method operations
- Unary arithmetic ops
ineg
- negatei2c
- to char,% 65536
- Binary arithmetic ops
iadd
isub
irem
- Direct ops
iinc k a
-local[k] += a
- Constant loading
iconst_0
- load 0aconst_null
-loadnull
ldc_int i
- loadint
- Local ops
iload k
- addlocal[k]
to stackistore k
- pop stack and store tolocal[k]
aload
,astore
, counterparts for references
- Field ops
getfield f sig
,putfield f sig
, modify fields of objects using value from stack.
- Branch ops
goto L
ifeq L
,ifgt L
,ifnonnull L
- reads first stack valueif_icmpeq_L
- compares top two elements
- Stack ops
dup
- repeat stack valuepop
swap
nop
- does nothing
- Class ops
new C
- allocate spaceinvokespecial C/<init>()V
- execute constructorinstance_of C
- puts 0 or 1 on stackcheckcast
- Method ops
invokevirtual m sig
- calls method taking ino
(self) plus all argsireturn
,return
- Unary arithmetic ops
- Java Class Loading
ClassLoader
finds class and checks that method exists- If method not loaded, it is verified
- Method body is interpreted
- If executed multiple times, bytecode is translated to native code
- If method becomes hot, native code is optimized
- Bytecode verification syntax
- First 4 bytes should be
0xCAFEBABE
- Bytecodes should be syntactically correct
- All instructions complete
- Branch targets within code segment
- Only legal offsets referenced
- Constants have appropriate types
- Execution must return
- Stack (Important)
- Should be same size along all execution paths
- Should have the same types along
- First 4 bytes should be
- Resource offsets
- For variables that can never exist together (same scope level), we can reuse offsets and lower our limit
- Extra slot required for non-static method
- Labels
if
- 1 labelifelse
- 2 labelswhile
- 2 labels||
and&&
- 1 label (short circuit)==
,<
,>
,<=
,>=
, and!=
- 2 labels (likeifelse
branching, as we are not saving a value in the stack)!:
- 2 labelstoString
coercion - 2 labels
- Code templates allow code generation from AST, without worrying about surrounding context
- Simple, recursive strategy
Template | Jasmin |
---|---|
if (E) S | E ifeq stop S stop: |
if (E) S_1 else S_2 | E ifeq else S_1 goto stop else: S_2 stop: |
- Focus on
- Reducing execution time
- Reducing code size
- Reducing power consumption
- Duff's Device - For loop unrolling for any count
register count; { register n = (count + 7) / 8; switch (count % 8) { case 0: do { *to = *from++; case 7: *to = *from++; case 6: *to = *from++; case 5: *to = *from++; case 4: *to = *from++; case 3: *to = *from++; case 2: *to = *from++; case 1: *to = *from++; } while (--n > 0); } }
- Peephole optimization focuses on a small window at a time, without program context
- We can look at the instruction at the current and following positions
- We can check if a label is alive
- We can modify the current instruction, or delete a label
- When writing peephole patterns, ensure that it is sound
- Local variables have same values
- Stack height changes by same amount
- All paths yield same outcome
- Compiler phases
- Scan - chars to tokens
- Parse - tokens to syntax tree
- Weed - rejects invalid trees (syntax)
- Symbol - tracks context
- Type - AST to AST + types; rejects programs that are semantically correct but syntactically wrong
- Resource
- Code
- Optimize
- Emit
- Type checking definitions
- a / b - if a then b
- a ⊢ b - if under the assumptions of a, then b is provable
- a : b - a has type b
- Memory allocation
- In stack
- Function call info, local variables, return values
- Allocated & deallocated at beginning & end of function
- Contains fixed size data
- Contains constant pool, local vars, baby stack
- Memory allocation is easy
- In heap
- Dynamic - unknown size & time
- Requires runtime support for managing heap space
- Data never point from stack to heap
- Heap deallocation
- Manual - user code
- Advantages
- Reduces runtime complexity
- Programmer has full control
- May be more efficient
- Disadvantages
- Requires extensive effort
- Error prone
- Often less efficient
- Can free larger blocks at once
- Advantages
- Continuous - determined on the spot
- Periodic - interval checks
- Manual - user code
- In stack
- When are records dead?
- Ideally, when they are never accessed again, but this is undecidable
- Conservative assumption - if they are no longer used within stacks; dead records may still be referenced from other dead records
- Continuous GC
- Extra field to track pointers
- Increment on creation, decrement on destruction
- Dead when reference count is 0
- Disadvantage
- Cyclic references don't get GC'd
- Incremental; continuous slow down
- Automatic Reference Counting (ARC)
- For objective-C & Swift
retain
andrelease
inserted at compile time- Unnecessary updates are optimized
- Periodic GC
- Mark - starting from root (eg stack), if not marked, mark and apply dfs
- Sweep - go through all records and reclaim unmarked once
- Unmark all marked records as we go
- Assumes that
- We know the size of each record
- We know which fields are pointers
- Can run in parallel
- Disadvantage
- Scanning can be expensive
- Heap may become fragmented
- Can be resolved using compaction, where we collect all live objects and reorganize them to be contiguous
- Periodic GC
- Divides heap into two, only uses one at a time
- Runs fully copy of live record to other allocation, then switches roles of two parts
- Copy is done using BFS, and invokes
Forward
to ensure that all reference go to the to-space (before the switch) - Avoids fragmentation
- Avoids stack & pointer reversal
- Disadvantage
- Wastes half the memory
- Generational Collection
- The young die quickly; those that remain live for a long time will likely continue to remain live
- Divide heap into generations, and check GC more frequently for younger generations; promote if record survives several collections
- Incremental collection
- Interweave GC with program execution, to reduce undesirable pauses
- Two-player access
- Use mutator and collector
- Register-based IR
- Stack - for function call frames
- Heap - for dynamically allocated memory
- Global pool - for global variables
- Code segment - for VirtualRISC instructions
- Registers
Ri
- unbounded general purpose registerssp
- stack pointer to top of stackfp
- frame pointer to current stack framepc
- program counter to current instruction- Automatically incremented per instruction execution
- May also be changed by function calls & branches
- Instructions
- Movement
[..]
indicates memory location store in register- Store -
st <src>, <dst>
st Ri, [Rj]
;[Rj] := Ri
- Load -
ld <src>, <dst>
ld [Ri + C], Rj
;Rj := [Ri + C]
- Move -
mov <src>, <dst>
mov Ri, Rj
;Rj := Ri
- Math Ops
op <src1>, <src2>, <dst>
add Ri, Rj, Rk
;Rk := Ri + Rj
; (sub
,mul
,div
)
- Branching
b L
; (bg
,bge
,bl
,ble
,bne
)if R1 <= 0 goto L1
→cmp R1, 0; ble L1
- Special
call L
;R15 := pc; pc := L
save sp, -C, sp
; save registers, allocating C bytes on stackrestore
; restore registersret
;pc := R15 + 8
nop
; do nothing
- Movement
- Stack Frame
- Created by function call;
push fp; fp := sp; sp := sp + C
- Popped by function return
sp := fp; fp = pop
- Local variables stored relative to
fp
- Created by function call;
- Calling
- Function starts by
save sp, -C, sp
, where C is arbitrarily large (eg 112; multiple of 4) - Function ends by
restore
, thenret
, where the value isR0
- Parameters
- Passed in registers
R0
,R1
, etc - If in memory, convention is
fp = 68 + 4k
, where k is a non-negative integer; this is stored in the callers frame
- Passed in registers
- Local variables
- Use any general purpose register
- May use memory; convention is
fp = 4k
, where k is a non-negative integer (in class, starts at[fp = 12]
)
- Function starts by
- JOOS Code executes through
- Interpreter
- Ahead-of-time compiler
- Translate directly to native code
- Not as useful for Java/JOOS
- Method code fetched as needed
- Needs to allow code across internet
- Should support different native code sets
- Just-in-time compiler
- Merge interpreters with traditional compilation techniques
- When method is invoked for the first time
- Bytecode fetched
- Translated into native code
- Control given to generated code
- Needs to be fast to be useful
- Important problems
- Instruction selection - choose correct instructions from native code instruction set
- Memory modelling - decide where to store variables & allocate registers
- Method calling - determine calling conventions
- Branch handling - allocate branch targets
- Map locals/stacks to frame
- Stack memory can be arbitrarily large, but is limited by hardware
ulimit -a
- Registers are very limited, usually much less than number of program variables
- Want to allocate as many variables in register as possible
- Liveness analysis
- Variable is live if it can be read in the future
- Undecidable, but can be approximated
- Stack memory can be arbitrarily large, but is limited by hardware
- No assumptions between instructions; each must load respective contents to proper register
- Eg: loading and adding requires loading to registers, storing to stack, then loading from stack (to registers) and adding; optimal is to use registers directly
- m registers to first m locals
- n registers to first n stack locations
- k scratch registers
- Spill remaining locals & locations into memory
- Registers allocated once, and does not change
- No difficult control flow paths
- Wastes registers, and assumes first locals/stack locations are more important
- Registers allocated on demand
- Slots kept in registers within basic block
- Registers not wasted
- Less spill code
- Much more complex
- Spill occurs at end of basic block
- When branching forward in native code, target address is not yet known
- Unresolved branches are kept in a table, then patched when the code is generated
- ASM - LLVM to JS static compiler (2011)
- Load store consistently - ensuring that stored value and loaded value are of the same size
- For instance, setting an int and loading as a char breaks the consistency
- ASM assumes the program respects it, and does not make verification; can therefore store single int in one value of the stack
- Still no parallelism, and no garbage collection
- To support more features, JS must also grow
- WebAssembly aims to bridge the gap in performance
- Supports integers & float types natively
- Increased loading speed via binary decoding
- Parallelism via SIMD
- Garbage collector for main memory
- Code validation
- Hardware-independent
- Pipeline - decode → validate → instantiate & invoke
- Decoding/encoding
- Bytes as is
- Type map; eg
0x7F
→i32
- Validation
- Decoded values must be within limit (no overflow)
- Optimized for throughput
- Slow cores
- Larger number of hardware threads
- SIMT - Single Instruction, Multiple Thread
- Program specifies behaviour of single thread
- SIMD - Single Instruction, Multiple Data
- Program specifies behaviour of all threads in width
- Thread groups
- Collection of threads
- May work together
- Can be synchronized and share data
- Thread hierarchy
- Thread identifier - unique per thread
- Group identifier - unique per thread group
- Memory hierarchy
- Each thread has private memory
- Each thread group has shared memory
- Requires synchronization
- Cannot be accessed by other groups
- Coalescing - based on transaction size, we can minimize the number of accesses by retrieving a larger contiguous block of values
- Scanner & Parser
- How to implement scanner in
flex
orSableCC
- DFA's/NFA's, CFGs
- Parser ambiguity
- No precedence directives
- Scanner tokens
- Keywords
- Syntax elements
- Literals
- Identifier
- Comments
- Errors (.)
- Parsing
- Lists (empty, nonempty)
- Recursion
- Expressions
- How to implement scanner in
- Typechecking
- Can specify multiple rules
- Bytecode Generation
- Review both midterm and final
- Jasmin syntax
-
dup
,swap
,putfield Class