The semantic of the MiniImp language is implemented in the \href{../lib/miniImp/Semantics.mli}{Semantics.mli} and \href{../lib/miniImp/Semantics.ml}{Semantics.ml} file.
A \texttt{reduce} function is provided that transforms an AST into the evaluated value or an error.
The AST type is defined in \href{../lib/miniImp/Types.mli}{Types.mli} and in \href{../lib/miniImp/Types.ml}{Types.ml}.
A program \texttt{p} is defined as follows:
\begin{grammar}
<p> \(\defeq\) `def main with input' <x> `output' <y> as <c>
{\langle\sigma, c_1\rangle\to_c \sigma_1 &\langle\sigma_1, \texttt{while } b \texttt{ do } c_3 \texttt{; } c_2 \rangle\to_c \sigma_2}% chktex 1
{\langle\sigma, \texttt{for}\texttt{(} c_1 \texttt{, } b \texttt{, } c_2 \texttt{)}\texttt{ do } c_3 \rangle\to_c \sigma_2}% chktex 1 chktex 9
\end{center}
but the implementation exploits the structure and doesn't simply rewrite the for loop as a while loop.
\end{subsection}
\begin{subsection}{MiniFun Semantics}
The semantic of the MiniFun language is implemented in the \href{../lib/miniFun/Semantics.mli}{Semantics.mli} and \href{../lib/miniFun/Semantics.ml}{Semantics.ml} file.
As reflected in the grammar, tuples have been implemented and the unary functions fst and scn return respectively the first element of the tuple and the second.
In the files \href{../lib/miniFun/TypeChecker.mli}{TypeChecker.mli} and \href{../lib/miniFun/TypeChecker.ml}{TypeChecker.ml} there is the implementation of the deduction rules, but returns either the valid type of the expression or an error instead of simply the required option type of the valid type.
As seen in class, lexing and parsing is done with ocamellex and menhir in the files \href{../lib/miniImp/Lexer.mli}{Lexer.mli} and \href{../lib/miniImp/Parser.ml}{Parser.ml}.
Operators listed in order of precedence from highest to lowest:
The expressions \(c_1\texttt{;} c_2\) and \(c_3\texttt{;}\) are both recognized and give respectively \(\texttt{SEQUENCE(} c_1\texttt{,} c_2\texttt{)}\)% chktex 9
and \(c_3\), such that semicolons can be placed always at the end of a command.
As seen in class, lexing and parsing is done with ocamellex and menhir in the files \href{../lib/miniFun/Lexer.mli}{Lexer.mli} and \href{../lib/miniFun/Parser.ml}{Parser.ml}.
A decision was made to interpret \texttt{\textbackslash}, \texttt{lambda} and \texttt{fun} all as the start of the definition of a function just for ease of typing. They are associated to the same token \texttt{LAMBDA}.
Operators listed in order of precedence from highest to lowest:
Tuples require parenthesis in their definition, but the tuple type does not since there is no ambiguity. The symbol \texttt{->} that defines the function type is right associative and has lowest precedence.
Both MiniImp and MiniFun have each an interpreter (\href{../bin/miniFunInterpreter.ml}{miniFunInterpreter.ml} and \href{../bin/miniFunInterpreter.ml}{miniFunInterpreter.ml}) that uses the package \href{https://opam.ocaml.org/packages/clap/}{Clap} to parse command line arguments and generate help pages.
The input to the program can be supplied both in stdin or as a command parameter after \texttt{-v}. The MiniFun interpreter also check the types before computing the output of the program and returns an error in case the types mismatch.
\end{subsection}
\end{section}
\begin{section}{Control Flow Graph}
The control flow graph data structure is implemented in the analysis library in the files \href{../lib/analysis/Cfg.ml}{Cfg.ml} and \href{../lib/analysis/Cfg.mli}{Cfg.mli}.
Each node contains only an id to distinguish from others.
The control flow structure is composed of a flag to know if it is empty or contains nodes and the set of all contained nodes.
Since each node can only have at maximum 2 nodes as next nodes, the data structure contains a map from each node to a tuple of the two nodes or to a node.
The structure also contains the back edges of each node implemented as a map from each node to a list of nodes, the input value, the variables that are the input and output, the initial node and the terminal node.
MiniImp Simple Statements \(t\) is defined in the files \href{../lib/miniImp/CfgImp.ml}{CfgImp.ml} and \href{../lib/miniImp/CfgImp.mli}{CfgImp.mli} as follows:
The implemented CFG is neither minimal nor maximal, but can be either or both for some programs. In particular each node as associated a list of statements and sequence of statements in the AST is put, if possible, in the same node.
\texttt{?} is only allowed as the last element of the list of statements associated with a node and a node has associated a \texttt{?} if and only if they have two next nodes.
\node[above] at (externalboxall.north) {becomes:};
\end{tikzpicture}
\end{center}
We highlight the fact that the operation powermod is absent in the grammar of simple statements. In fact all powermod are replaced in the AST before translating into CFG with the function \texttt{rewrite_instructions} in \href{../lib/miniImp/replacePowerMod.ml}{replacePowerMod.ml} and \href{../lib/miniImp/replacePowerMod.mli}{replacePowerMod.mli}.
The variables \texttt{pow}, \texttt{exp}, \texttt{mod} and \texttt{res} are all fresh and the value of res is then substituted into powmod place. This might need some more scope than only the expression since \texttt{powmod} may be included in a \texttt{if} guard, thus it is placed before the \texttt{if}; in case it is in the guard of a \texttt{while} or a \texttt{for} loop it is also updated at the end of the body.
The reason for substituting \texttt{powmod} in the AST is that we would need to add nodes to form the \texttt{if} and \texttt{while} and it would prove more difficult.
\end{subsection}
\end{section}
\begin{section}{Intermediate Code Generation}
\begin{subsection}{MiniRISC CFG}
In the files \href{../lib/miniImp/CfgRISC.ml}{CfgRISC.ml} and \href{../lib/miniImp/CfgRISC.mli}{CfgRISC.mli} the CFG generated from the AST gets translated into intermediate code with the following MiniRISC simple statements:
\begin{grammar}\label{grammar:MiniRISC}
<t> \(\defeq\) Nop
\alt{} BRegOp <brop> <r> <r> \(\Rightarrow\) <r>
\alt{} BImmOp <biop> <r> <n> \(\Rightarrow\) <r>
\alt{} URegOp <urop> <r> \(\Rightarrow\) <r>
\alt{} Load <r> \(\Rightarrow\) <r>
\alt{} LoadI <n> \(\Rightarrow\) <r>
\alt{} Store <r> \(\Rightarrow\) <r>
<brop> \(\defeq\) Add | Sub | Mult | Div | Mod | Pow | And | Or
\alt{} Eq | Less | LessEq | More | MoreEq
<biop> \(\defeq\) AddI | SubI | MultI | DivI | ModI | PowI | AndI | OrI
\alt{} EqI | LessI | LessEqI | MoreI | MoreEqI
<urop> \(\defeq\) Not | Copy | Rand
\end{grammar}
Since we stride towards shorter code and less instructions, we would prefer to use the \texttt{biop} version of each operation whenever possible. So for some operations that are commutative if the first term is the immediate value we swap the terms and use the \texttt{biop} variant instead of loading the value into a register and using the register for the calculation. Also some operations like \texttt{>} and \texttt{<} are opposite, so to invert the order we need to use the other \texttt{biop} version.
The input variable and the output variable are also mapped to \texttt{in} and \texttt{out} registers, while all other variables are given fresh registers.
The MiniRISC CFG is finally translated into MiniRISC intermediate code by the function \texttt{convert} in the files \href{../lib/miniImp/RISC.ml}{RISC.ml} and \href{../lib/miniImp/RISC.mli}{RISC.mli}.
The grammar of MiniRISC is analogous to the one for \hyperref[grammar:MiniRISC]{MiniRISC Simple Statements}:
\begin{grammar}
<t> \(\defeq\) Nop
\alt{} BRegOp <brop> <r> <r> \(\Rightarrow\) <r>
\alt{} BImmOp <biop> <r> <n> \(\Rightarrow\) <r>
\alt{} URegOp <urop> <r> \(\Rightarrow\) <r>
\alt{} Load <r> \(\Rightarrow\) <r>
\alt{} LoadI <n> \(\Rightarrow\) <r>
\alt{} Store <r> \(\Rightarrow\) <r>
\alt{} Jump <l>
\alt{} CJump <r> <l> <l>
\alt{} Label <l>
<brop> \(\defeq\) Add | Sub | Mult | Div | Mod | Pow | And | Or
\alt{} Eq | Less | LessEq | More | MoreEq
<biop> \(\defeq\) AddI | SubI | MultI | DivI | ModI | PowI | AndI | OrI
\alt{} EqI | LessI | LessEqI | MoreI | MoreEqI
<urop> \(\defeq\) Not | Copy | Rand
\end{grammar}
where \texttt{l} is a string that uniquely identifies a label.
\end{subsection}
\begin{subsection}{RISC Semantics}
It is also implemented in the files \href{../lib/miniImp/RISCSemantics.ml}{RISCSemantics.ml} and \href{../lib/miniImp/RISCSemantics.mli}{RISCSemantics.mli} a reduce function, that evaluates MiniRISC code.
The labels are used as is and not replaced by offsets, so the code is translated into a map from labels to code blocks for ease of computation.
A refined CFG structure used for analysis is defined in \href{../lib/analysis/Dataflow.ml}{Dataflow.ml} and \href{../lib/analysis/Dataflow.mli}{Dataflow.mli}. The CFG is supplemented with a map from each node to the support structure that stores the list of defined variables or live variables. Since the CFG is not minimal, there is also a list for each simple statement. A fixed point function then applies the input function until the map does not change. Simple structural equality is not appropriate since order in the lists should not matter; an internal function for equality is used.
In the files \href{../lib/miniImp/definedVariables.ml}{definedVariables.ml} and \href{../lib/miniImp/definedVariables.mli}{definedVariables.mli} three functions are defined: \texttt{compute_defined_variables}, \texttt{compute_cfg} and \texttt{check_undefined_variables}.
\texttt{compute_defined_variables} creates the appropriate structure for the analysis and runs it. It returns the whole analysis structure.
\texttt{compute_cfg} returns the CFG from the analysis data structure; in the case of defined variables analysis the CFG returned is the same as the one in input of \texttt{compute_defined_variables}.
\texttt{check_undefined_variables} returns all variables that might be undefined at time of use.
Since the greatest fixed point is computed, first all variables are retrieved from all code, then assigned to each input and output list of variables for each line of code.
Since it is an approximation some behaviour might not be intuitive. For example:
\begin{lstlisting}
for (x := 0, x < 10, x := x + 1) do (
y := rand(x);
);
output := y;
\end{lstlisting}
will return the register associated with \texttt{y} as undefined since the guard of the for cycle might never be true.
\end{subsection}
\begin{subsection}{Live Variables}
In the files \href{../lib/miniImp/liveVariables.ml}{liveVariables.ml} and \href{../lib/miniImp/liveVariables.mli}{liveVariables.mli} three functions are defined: \texttt{compute_live_variables}, \texttt{compute_cfg} and \texttt{optimize_cfg}.
\texttt{compute_live_variables} creates the appropriate structure for the analysis and runs it. It returns the whole analysis structure.
\texttt{compute_cfg} returns the CFG from the analysis data structure.
\texttt{optimize_cfg} applies liveness analysis to reduce the number of registers used; returns the analysis structure (not the RISC CFG).
In the files \href{../lib/miniImp/reduceRegisters.ml}{reduceRegisters.ml} and \href{../lib/miniImp/reduceRegisters.mli}{reduceRegisters.mli} the function \texttt{reduce_registers} reduces the number of used registers by counting the syntactic occurrence of each variable and partitioning the set keeping the most used as registers. All registers are either renamed or put into memory. It is allowed for the input or output registers to be put in memory, in the latter case some code is added at the end of the program to retrieve the value and put into a register (in particular register \texttt{2}).
The file \href{../bin/miniImpInterpreterReg.ml}{miniImpInterpreterReg.ml} compiles from MiniImp to MiniRISC or execute the MiniRISC code. It uses the package \href{https://opam.ocaml.org/packages/clap/}{Clap} to parse command line arguments and generate help pages.
The input to the program can be supplied both in stdin or as a command parameter after \texttt{-v}. The flags for disabling the check for undefined variables or liveness analysis optimization are \texttt{-u} and \texttt{-l} respectively.
\end{subsection}
\end{section}
\begin{section}{Running the code}
The project uses the following packages: \href{https://dune.build/}{Dune}, \href{https://gallium.inria.fr/~fpottier/menhir/}{Menhir} and \href{https://github.com/rbardou/clap}{Clap}. They can be installed via \href{https://opam.ocaml.org/}{Opam} with the command \texttt{opam install dune menhir clap}.
To compile the project simply run \texttt{dune build}. To run the test run \texttt{dune runtest}.
To see a list of all options run \texttt{dune exec <interpreter> {-}{-} -h}. A binary version of the executables can also be found in the build directory: \href{./_build/default/bin/}{./\_build/default/bin/}.