This page gives a brief introduction to how Mips2cs works. It is not intended to be a full description (for that you will have to look at the source code) but it should give you a rough idea of how things work.
The basic "pipeline" of Mips2cs is summarized in the following diagram:
The sequence of events during a typical run is as follows:
The following sections give more details.
The Intermediate representation is comprised of three main elements: Exprs, CondExprs and Statements. A full definition of these can be found in the source code (in module Mips2cs.Intermediate); here we give a brief description of each.
Represents an arbitrary integer expression. Examples:
(LitExpr 42)– the number 42
(LoadRegExpr "v0")– the contents of MIPS register
(LoadMemExpr MemWord (LitExpr 0x1234))– the 32-bit word currently stored at address 0x1234
(UnExpr Negate (LoadRegExpr "s1"))– the expression
(BinExpr Add (LoadRegExpr "t2") (LitExpr 1))– the expression
(t2 + 1)
(VarExpr "foo")– the current contents of the variable
Represents a condition (boolean expression). Examples:
(BinCond Equal (LoadRegExpr "v0") (LitExpr 23))– true if
v0is equal to 23
(BinCond LessThan (LoadRegExpr "t0") (LoadRegExpr "t1"))– true if
t0is less than
(LitCond True)– always true
Represents an operation performed by the MIPS machine. Examples include:
(StoreReg "v0" (LitExpr 2))– writes the number 2 into register
(StoreMem MemByte (LitExpr 0x1234) (LitExpr 42))– writes the byte 42 into memory location 0x1234
(Jump (BinCond Equal (LoadRegExpr "v0") (LitExpr 0)) 0x1234 0x5678)– if
v0==0, then goto address 0x1234, else goto 0x5678.
(Jump (LitCond True) 0x1234 0x5678)– jumps unconditionally to 0x1234
(IndirectJump (LoadRegExpr "ra"))– jumps to the address pointed to by the
(Let "foo" (LitExpr 31))– creates a variable called
fooand assigns the value 31 to it. (See below for more about variables.)
The Intermediate language contains the concept of variables. Variables are created by the
Let statement and can be used in expressions via the
VarExpr construct (see above).
Variables must follow two rules:
Variables are used extensively during the optimization passes of Mips2cs (this will be discussed further in the section on optimization, below).
The idea is that the entire MIPS machine code program can be converted into an equivalent program in the Intermediate language. To do this, each MIPS instruction is converted into zero or more Statements. This is usually straightforward. For example:
addiu v1, a0, 1→
StoreReg "v1" (BinExpr Add (LoadRegExpr "a0") (LitExpr 1))
subu a0, a1, s0→
StoreReg "a0" (BinExpr Sub (LoadRegExpr "a1") (LoadRegExpr "s0"))
mult t0, t1→
StoreReg "hi" (BinExpr MultHi (LoadRegExpr "t0") (LoadRegExpr "t1"))and
StoreReg "lo" (BinExpr Mult (LoadRegExpr "t0") (LoadRegExpr "t1"))
lw s0, 4(sp)→
StoreReg "s0" (LoadMemExpr MemWord (BinExpr Add (LitExpr 4) (LoadRegExpr "sp")))
sb s0, 4(sp)→
StoreMem MemByte (LoadRegExpr "s0") (BinExpr Add (LitExpr 4) (LoadRegExpr "sp"))
IndirectJump (LoadRegExpr "ra")
Jump (LitCond True) 0x1234 0x1234
Branches require special handling because of the MIPS delay slot mechanism. Mips2cs creates two Statements for a branch instruction: one for the delay slot, and one for the branch itself. In addition, any registers referenced in the branch condition are saved (using Let statements) to ensure that the correct value gets used.
For example, the following disassembly:
400084: 1060000c beqz v1,4000b8 400088: 24630001 addiu v1,v1,1
would be translated to the following three Statements:
Let "delay1" (LoadRegExpr "v1") StoreReg "v1" (BinExpr Add (LoadRegExpr "v1") (LitExpr 1)) Jump (BinCond Equal (VarExpr "delay1") (LitExpr 0)) 0x4000b8 0x40008c
Note that the
Let is essential in this example; without it, the wrong value of v1 would be used in the branch condition.
After generating the Statements for each machine code instruction, we organize them into basic blocks. (A basic block is a chunk of code that has exactly one entry point and one exit point.)
In Mips2cs basic blocks are represented by
The following invariants are maintained:
The results of the basic block analysis are used extensively during the optimization and code generation passes.
Once the Intermediate code has been produced, Mips2cs runs a series of optimization passes on it, in order to make it shorter and simpler. (This results in a shorter and faster C# program at the end of the process.)
The optimizations are done by the module Mips2cs.Simplifier, which runs several optimization passes, as follows.
We run a pass to simplify Exprs at compile time. This uses simple pattern matching to calculate compile time constants (e.g. "1 + 2" is changed into "3") and apply arithmetic identities (e.g. "x + 0" is changed into "x", for any expression x).
A naive translation of machine code into C# would tend to produce sequences like the following:
v0 = s0 + s1; v0 = v0 + s2; v0 = v0 + 10;
(where each line corresponds to a single machine code instruction). However, it would be more natural to write this in a single line, as follows:
v0 = s0 + s1 + s2 + 10;
Not only does this result in a shorter C# program, it also results in a smaller final compiled executable (at least when using the Microsoft C# compiler – which appears to generate a more efficient bytecode sequence for the second form of the code, as compared to the first).
Therefore, we introduce two optimization passes which are designed to turn the first form of the code into the second. These are as follows:
If a register is written multiple times, all writes bar the last can be changed into local variables (i.e.
Let statements). For example, the following:
StoreReg "v0" (BinExpr Add (LoadRegExpr "s0") (LoadRegExpr "s1")) StoreReg "v0" (BinExpr Add (LoadRegExpr "v0") (LoadRegExpr "s2")) StoreReg "v0" (BinExpr Add (LoadRegExpr "v0") (LitExpr 10))
can be changed into:
Let "dummy_var_1" (BinExpr Add (LoadRegExpr "s0") (LoadRegExpr "s1")) Let "dummy_var_2" (BinExpr Add (VarExpr "dummy_var_1") (LoadRegExpr "s2")) StoreReg "v0" (BinExpr Add (VarExpr "dummy_var_2") (LitExpr 10))
Because "Lets" are, by definition, local variables, it is legitimate to substitute the right-hand side of the Let directly into the expression where the variable is used, and then delete the Let. So, continuing our example, we may first substitute for
dummy_var_1 to obtain:
Let "dummy_var_2" (BinExpr Add (BinExpr Add (LoadRegExpr "s0") (LoadRegExpr "s1")) (LoadRegExpr "s2")) StoreReg "v0" (BinExpr Add (VarExpr "dummy_var_2") (LitExpr 10))
and then substitute for
dummy_var_2 to obtain:
StoreReg "v0" (BinExpr Add (BinExpr Add (BinExpr Add (LoadRegExpr "s0") (LoadRegExpr "s1")) (LoadRegExpr "s2")) (LitExpr 10))
We have now arrived at the desired form of the code, i.e. one single assignment instead of three separate assignments.
Note that there are some cases where we do not want to substitute out a Let statement. These are as follows:
If we have something like
Let "dummy_var_1" (some huge complex expression) StoreReg "v0" (VarExpr "dummy_var_1") StoreReg "v1" (VarExpr "dummy_var_1")
then we don't want to substitute for dummy_var_1, because we will then be evaluating the huge complex expression more than once. Therefore, before doing a substitution, Mips2cs checks that it is not going to result in repeated calculation being done.
Sometimes it is not safe to substitute a variable, e.g. in
Let "foo" (LoadRegExpr "v1") StoreReg "v1" (LitExpr 1) StoreReg "v2" (VarExpr "foo")
we cannot substitute
(LoadRegExpr "v1") in place of
(VarExpr "foo") , because the value of v1 has changed in the interim. We refer to this situation as a data hazard, and in such cases, substitution is not performed.
We say that a register is "dead" if its value is no longer needed after the current basic block completes. In other words, future basic blocks will not read from that register (or at least, they will always write to it before they read it).
Mips2cs includes a liveness analysis pass which can detect dead registers in many cases. If a dead register is found, the
StoreReg statement (where the dead value gets written) is changed into a
Let. The substitution pass is then run again; this often allows the
Let to be substituted into other expressions, or even removed entirely.
A common example is the MIPS multiply instruction which writes a 64-bit result to the
lo registers. In many cases, the
hi value is never used. In these cases, the liveness analysis (and following substitution pass) is usually able to remove the assignment to
There are also examples where liveness analysis does not remove a value entirely, but it does lead to further optimizations being made. For an example see the Worked example.
Once we have our optimized Intermediate code we can use it to generate the final C# code. By this point, all of the hard optimization work has been done, so we can translate the Intermediate code into C# in a relatively straightforward and mechanical way.
An example of the C# code generated by Mips2cs is given in cs_example.html.
Points to note are the following:
MipsCallbacksinterface allows the MIPS code to call back to arbitrary C# code.
MipsVMclass contains the state of the MIPS machine (data members) as well as the translated MIPS code (methods).
intvariables in the
intarray (the first index being the page number, and the second being the offset within the page).
new) without having to copy the entire memory contents.
WriteWordetc) rather than being done inline.
Exec65etc). There is also a top-level
Runmethod which dispatches to the appropriate
Execmethod (depending on the current position within the code).
Runmethod, is that the smaller
Execmethods are (in theory) more JIT-friendly, and should therefore run faster. (However, this has not been tested in practice, nor has the optimal size of the
Execmethods been determined.)
Execmethods themselves are structured as large
switchstatements. Each basic block of the original code becomes a single
caseblock in one of these methods.
StoreRegoperations become assignments, e.g.
StoreReg "v1" (BinExpr Add (LoadRegExpr "s0") (LoadRegExpr "s1")))translates into
v1 = s0 + s1;.
Letstatements become assignments to local variables.
Jumpstatements translate into
goto(if the destination is within the same
return(if we need to go to a different
Execmethod; in this case, the code goes back up to
Run(), which then sends us back down into the new
IndirectJumpstatements need special handling because we need to map the jump address into a basic block number. This mapping is handled by a hash table (called
case_tablein the code).
If you want to see how the above works in a real example then you may want to look at the Worked example page. This illustrates the complete pipeline (including translation to Intermediate code, optimization, and C# code generation) for a simple factorial function.