First draft report, removing exercises

This commit is contained in:
elvis
2025-01-15 00:09:58 +01:00
parent 9b48cc005f
commit 11adaa5103
11 changed files with 624 additions and 294 deletions

425
report/report.tex Normal file
View File

@ -0,0 +1,425 @@
\begin{section}{Semantics}
\begin{subsection}{MiniImp}
The semantic of the MiniImp language is implemented in the \href{../lib/miniImp/Semantics.mli}{Semantics.mli} and \href{../lib/miniImp/Semantics.ml}{Semantics.ml} file.
A \texttt{reduce} function is provided that transforms an AST into the evaluated value or an error.
The AST type is defined in \href{../lib/miniImp/Types.mli}{Types.mli} and in \href{../lib/miniImp/Types.ml}{Types.ml}.
A program \texttt{p} is defined as follows:
\begin{grammar}
<p> \(\defeq\) `def main with input' <x> `output' <y> as <c>
<c> \(\defeq\) skip
\alt{} <x> `:=' <a>
\alt{} <c> `;' <c>
\alt{} `if' <b> `then' <c> `else' <c>
\alt{} `while' <b> `do' <c>
\alt{} `for' `(' <c> `,' <b> `,' <c> `)' `do' <c>
<b> \(\defeq\) <v> | <b> `\&\&' <b> | <b> `||' <b> | `not' <b>
\alt{} <a> `<' <a> | <a> `<=' <a> | <a> `>' <a> | <a> `>=' <a>
\alt{} <a> `==' <a>
<a> \(\defeq\) <x> | <n> | <a> `+' <a> | <a> `-' <a> | <a> `*' <a> | <a> `/' a
\alt{} <a> `\%' <a> | <a> `^' <a> | `powmod' `(' <a> `,' <a> `,' <a> `)' | `rand' `(' <a> `)'
\end{grammar}
Where \texttt{\%} is the modulo operator and \texttt{a a \% a} is the powermod operator;
the variables are all integers, \texttt{n} is an integer and \texttt{v} is a boolean litteral.
The additional arithmetic expressions' semantics are implemented in a similar manner as with the other.
The sematic of \texttt{for} is as follows:
\begin{center}
\inference[\texttt{for}]
{\langle\sigma, c_1\rangle \to_c \sigma_1 & \langle\sigma_1, \texttt{while } b \texttt{ do } c_3 \texttt{; } c_2 \rangle \to_c \sigma_2} % chktex 1
{\langle\sigma, \texttt{for} \texttt{(} c_1 \texttt{, } b \texttt{, } c_2 \texttt{)} \texttt{ do } c_3 \rangle \to_c \sigma_2} % chktex 1 chktex 9
\end{center}
but the implementation exploits the structure and doesn't simply rewrite the for loop as a while loop.
\end{subsection}
\begin{subsection}{MiniFun Semantics}
The semantic of the MiniFun language is implemented in the \href{../lib/miniFun/Semantics.mli}{Semantics.mli} and \href{../lib/miniFun/Semantics.ml}{Semantics.ml} file.
A \texttt{reduce} function is provided that transforms the AST into the avluated value or an error.
The AST type is defined in \href{../lib/miniFun/Types.mli}{Types.mli} and in \href{../lib/miniFun/Types.ml}{Types.ml}.
A program \texttt{t} is defined as follows:
\begin{grammar}
<t> \(\defeq\) <n> | <v> | <x> | `(' <t> `,' <t> `)'
\alt{} `fun' <x> `:' <type> `=>' <t> | <t> <t> | <op\textsubscript{1}> <t> | <t> <op\textsubscript{2}> <t> % chktex 38
\alt{} `powmod' `(' <t> `,' <t> `,' <t> `)'
\alt{} `rand' `(' <t> `)' |
\alt{} `if' <t> `then' <t> `else' <t>
\alt{} `let' <x> `=' <t> `in' <t>
\alt{} `let' `rec' <x> <y> `: ' <type> `=' <t> `in' <t>
<op\textsubscript{1}> \(\defeq\) `not' | `fst' | `scn'
<op\textsubscript{2}> \(\defeq\) `+' | `-' | `*' | `/' | `\%' | `^' | `\&\&' | `||' | `=='
\alt{} `<' | `<=' | `>' | `>='
\end{grammar}
As reflected in the grammar, tuples have been implemented and the unary functions fst and scn return respectively the first element of the tuple and the second.
\end{subsection}
\end{section}
\begin{section}{Types for MiniFun}
A type \(\tau\) is defined as either {\it int}, {\it bool}, a touple or a function.
\begin{equation*}
\tau \defeq {\it int\/}\ \vert\ {\it bool\/}\ \vert\ (\tau,\tau)\ \vert\ \tau \to \tau
\end{equation*}
The deduction rules regarding tuples are similar to those for functions:
\begin{center}
\inference[\texttt{Tuple}]
{\Gamma \vdash t_1 \triangleright \tau_1 & \Gamma \vdash t_2 \triangleright \tau_2} % chktex 1
{\Gamma \vdash (t_1, t_2) \triangleright \tau_1 * \tau_2} % chktex 1
\end{center}
\begin{center}
\inference[\texttt{Fst}]
{\Gamma \vdash t_1 \triangleright \tau_1 } % chktex 1
{\Gamma \vdash \texttt{fst} (t_1, t_2) \triangleright \tau_1} % chktex 1
\end{center}
\begin{center}
\inference[\texttt{Snd}]
{\Gamma \vdash t_2 \triangleright \tau_2 } % chktex 1
{\Gamma \vdash \texttt{snd} (t_1, t_2) \triangleright \tau_2} % chktex 1
\end{center}
The rules for function declaration with type annotations are thus:
\begin{center}
\inference[\texttt{Fun}]
{\Gamma[x \mapsto \tau] \vdash t \triangleright \tau'} % chktex 1
{\Gamma \vdash \texttt{fun} x \texttt{:} \tau \to \tau' \texttt{=>} t \triangleright \tau \to \tau'} % chktex 1
\end{center}
\begin{center}
\inference[\texttt{FunRec}]
{\Gamma[f \mapsto \tau \to \tau'; x \mapsto \tau] \vdash t_1 \triangleright \tau' & \Gamma[f \mapsto \tau \to \tau'] \vdash t_2 \triangleright \tau''} % chktex 1
{\Gamma \vdash \texttt{let rec} f x \texttt{:} \tau \to \tau' \texttt{ = } t_1 \texttt{ in } t_2 \triangleright \tau''} % chktex 1
\end{center}
In the files \href{../lib/miniFun/TypeChecker.mli}{TypeChecker.mli} and \href{../lib/miniFun/TypeChecker.ml}{TypeChecker.ml} there is the implementation of the deduction rules, but returns either the valid type of the expression or an error instead of simply the required option type of the valid type.
\end{section}
\begin{section}{Parsing}
\begin{subsection}{MiniImp}
Operators listed in order of precedence from highest to lowest:
\begin{center}
\begin{tblr}{colspec={|c|c|}, rowspec={|Q|QQQQQQQ|}}
Operator & Associativity \\
while & left \\
\^{} & right \\
* / mod & left \\
not & {-} \\
+ {-} \(\vert\vert\) \&\& & left \\
if & left \\
{;} & left \\
\end{tblr}
\end{center}
The expressions \(c_1 \texttt{;} c_2\) and \(c_1 \texttt{;}\) are both recognized and give respectively \(\texttt{SEQUENCE(} c_1 \texttt{,} c_2 \texttt{)}\) % chktex 9
and \(c_1\), such that semicolons can be placed always at the end of a command.
Integers with a preceding minus sign can be interpreted as the opposite integer, with obviously lower precedence than the binary operator minus.
\end{subsection}
\begin{subsection}{MiniFun}
A decision was made to interpret \texttt{\textbackslash}, \texttt{lambda} and \texttt{fun} all as the start of the definition of a function just for ease of typing. They are associated to the same token \texttt{LAMBDA}.
Operators listed in order of precedence from highest to lowest:
\begin{center}
\begin{tblr}{colspec={|c|c|}, rowspec={|Q|QQQQQQQQQQQQ|}}
Operator & Associativity \\
function application & right \\
let & left \\
fun & left \\
fst snd & left \\
not rand & {-} \\
\^{} & right \\
* / mod & left \\
+ {--} & left \\
== {\(<\)} {\(\leq\)} {\(>\)} {\(\geq\)} & left \\
\(\vert\vert\) \&\& & left \\
powmod & left \\
\(\lambda\) if let letrec & left \\
\end{tblr}
\end{center}
Tuples require parentesis in their definition, but the tuple type does not since there is no ambiguity. The symbol \texttt{->} that defines the function type is right associative and has lowest precedence.
\end{subsection}
\begin{subsection}{Interpreters}
Both MiniImp and MiniFun have each an interpreter (\href{../bin/miniFunInterpreter.ml}{miniFunInterpreter.ml} and \href{../bin/miniFunInterpreter.ml}{miniFunInterpreter.ml}) that uses the package \href{https://opam.ocaml.org/packages/clap/}{Clap} to parse command line arguments and generate help pages.
The input to the program can be supplied both in stdin or as a command parameter after \texttt{-v}. The MiniFun interpreter also check the types before computing the output of the program and returns an error in case the types mismatch.
\end{subsection}
\end{section}
\begin{section}{Control Flow Graph}
The control flow graph data structure is implemented in the analysis library in the files \href{../lib/analysis/Cfg.ml}{Cfg.ml} and \href{../lib/analysis/Cfg.mli}{Cfg.mli}.
Each node contains only an id to distinguish from others.
The control flow structure is composed of a flag to know if it is empty or contains nodes and the set of all contained nodes.
Since each node can only have at maximum 2 nodes as next nodes, the data structure contains a map from each node to a tuple of the two nodes or to a node.
The structure also contains the back edges of each node implemented as a map from each node to a list of nodes, the input value, the variables that are the input and ouput, the initial node and the terminal node.
Finally there is a map from each node to a list of generic elements that in our case are simple statements.
\begin{subsection}{MiniImp Simple Statement}
MiniImp Simple Statements \(t\) is defined as follows:
\begin{grammar}
<t> \(\defeq\) skip | <x> `:=' <a> | <b> `{?}'
<b> \(\defeq\) <v> | <b> `\&\&' <b> | <b> `||' <b> | `not' <b>
\alt{} <a> `==' <a> | <a> `<' <a> | <a> `<=' <a> | <a> `>' <a> | <a> `>=' <a>
<a> \(\defeq\) <n> | <x> | <a> `+' <a> | <a> `-' <a> | <a> `*' <a> | <a> `/' <a>
\alt{} <a> `mod' <a> | <a> `^' <a> | `rand' <a>
\end{grammar}
The implemented cfg is neither minimal nor maximal, but can be either or both for some programs. In particular each node as associated a list of statements and sequence of statements in the AST is put, if possible, in the same node.
\texttt{?} is only allowed as the last element of the list of statemets associated with a node and a node has associated a \texttt{?} if and only if they have two next nodes.
The for loop is translated as:
\begin{center}
\begin{tikzpicture}[
node/.style = {draw,rounded corners,blur shadow, fill=white,align=center},
box/.style={rectangle,draw,inner sep=10pt}
]
\node[node] at (0, 1.3) (b11) {$i_1^1$};
\node[node] at (0, 0) (b12) {$f_1^1$};
\node[box,fit = (b11) (b12)] (externalbox1) {};
\node[node, opacity=0] at (3, 1.3) (bb1) {$i_1^b$};
\node[node, opacity=0] at (3, 0) (bb2) {$f_1^b$};
\node[node] at (3, 0.65) (bball) {$i^b$};
\node[box,fit = (bb1) (bb2)] (externalboxb) {};
\node[node] at (6, 1.3) (b21) {$i_1^2$};
\node[node] at (6, 0) (b22) {$f_1^2$};
\node[box,fit = (b21) (b22)] (externalbox2) {};
\node[node] at (9, 1.3) (b31) {$i_1^3$};
\node[node] at (9, 0) (b32) {$f_1^3$};
\node[box,fit = (b31) (b32)] (externalbox3) {};
\node[fit = (externalbox1) (externalbox2) (externalbox3) (externalboxb)] (boxall) {};
\begin{scope}[rounded corners,-latex]
\path (externalbox1) edge (b11.north);
\path[dashed] (b11) edge (b12);
\path (b12) edge (externalbox1);
\path (externalboxb) edge (bball.north);
\path (bball) edge (externalboxb.south);
\path (externalbox2) edge (b21.north);
\path[dashed] (b21) edge (b22);
\path (b22) edge (externalbox2);
\path (externalbox3) edge (b31.north);
\path[dashed] (b31) edge (b32);
\path (b32) edge (externalbox3);
\end{scope}
\node[above] at (boxall.north) {\texttt{for (}\(c_1\)\texttt{,} \(b\)\texttt{,} \(c_2\)\texttt{) do} \(c_3\)};
\node[left] at (externalbox1.west) {\(c_1\):};
\node[left] at (externalboxb.west) {\(b\):};
\node[left] at (externalbox2.west) {\(c_2\):};
\node[left] at (externalbox3.west) {\(c_3\):};
\node[node] at (4.5, -2.7) (b11) {$i_1^1$};
\node[node] at (4.5, -4) (b12) {$f_1^1$};
\node[node] at (4.5, -5.3) (guard) {$i^b$};
\node[node] at (4.5, -6.6) (b31) {$i_1^3$};
\node[node] at (4.5, -7.9) (b32) {$f_1^3$};
\node[node] at (4.5, -9.2) (b21) {$i_1^2$};
\node[node] at (4.5, -10.5) (b22) {$f_1^2$};
\node[node] at (4.5, -11.8) (exitnode) {\texttt{skip}};
\node[box, fit = (b11) (b12) (guard) (b31) (b32) (b21) (b22) (exitnode),
inner sep=15pt] (externalboxall) {};
\begin{scope}[rounded corners,-latex]
\path[dashed] (b11) edge (b12);
\path[dashed] (b21) edge (b22);
\path[dashed] (b31) edge (b32);
\path (b12) edge (guard);
\path (guard) edge (b31);
\path (b32) edge (b21);
\path (b22.135) edge[bend left=20] (guard.220);
\path (guard.-40) edge[bend left=15] (exitnode.35);
\path (externalboxall.north) edge (b11.north);
\path (exitnode.south) edge (externalboxall.south);
\end{scope}
\node[above] at (externalboxall.north) {becomes:};
\end{tikzpicture}
\end{center}
We highlight the fact that the operation powermod is absent in the grammar of simple statements. In fact all powermod are replaced in the AST before translating into CFG with the function \texttt{rewrite_instructions} in \href{../lib/miniImp/replacePowerMod.ml}{replacePowerMod.ml} and \href{../lib/miniImp/replacePowerMod.mli}{replacePowerMod.mli}.
\texttt{powmod(}\(a_1\)\texttt{, }\(a_2\)\texttt{, }\(a_3\)\texttt{)} % chktex 9 chktex 36
is translated into:
\begin{lstlisting}
pow := £\(a_1\)£;
exp := £\(a_2\)£;
mod := £\(a_3\)£;
res := 1;
if exp < 0 then
exp := 0 - exp;
else
skip;
while exp > 0 do (
if 1 = exp % 2 then
res := (res * pow) % mod;
else
skip;
pow := (pow * pow) % mod;
exp := exp / 2;
)
\end{lstlisting}
The variables \texttt{pow}, \texttt{exp}, \texttt{mod} and \texttt{res} are all fresh and the value of res is then substituted into powmod place. This might need some more scope than only the expression since \texttt{powmod} may be included in a \texttt{if} guard, thus it is placed before the \texttt{if}; in case it is in the guard of a \texttt{while} or a \texttt{for} loop it is also updated at the end of the body.
The reason for substituting \texttt{powmod} in the AST is that we would need to add nodes to form the \texttt{if} and \texttt{while} and it would prove more difficult.
\end{subsection}
\end{section}
\begin{section}{Intermediate Code Generation}
\begin{subsection}{MiniRISC CFG}
In the files \href{../lib/miniImp/CfgRISC.ml}{CfgRISC.ml} and \href{../lib/miniImp/CfgRISC.mli}{CfgRISC.mli} the CFG generated from the AST gets translated into intermediate code with the following MiniRISC simple statements:
\begin{grammar}\label{grammar:MiniRISC}
<t> \(\defeq\) Nop
\alt{} BRegOp <brop> <r> <r> \(\Rightarrow\) <r>
\alt{} BImmOp <biop> <r> <n> \(\Rightarrow\) <r>
\alt{} URegOp <urop> <r> \(\Rightarrow\) <r>
\alt{} Load <r> \(\Rightarrow\) <r>
\alt{} LoadI <n> \(\Rightarrow\) <r>
\alt{} Store <r> \(\Rightarrow\) <r>
<brop> \(\defeq\) Add | Sub | Mult | Div | Mod | Pow | And | Or
\alt{} Eq | Less | LessEq | More | MoreEq
<biop> \(\defeq\) AddI | SubI | MultI | DivI | ModI | PowI | AndI | OrI
\alt{} EqI | LessI | LessEqI | MoreI | MoreEqI
<urop> \(\defeq\) Not | Copy | Rand
\end{grammar}
Since we stride towards shorter code and less instructions, we would prefer to use the \texttt{biop} version of each operation whenever possible. So for some operations that are commutative if the first term is the immediate value we swap the terms and use the \texttt{biop} variant instead of loading the value into a register and using the register for the calculation. Also some operations like \texttt{>} and \texttt{<} are opposite, so to invert the order we need to use the other \texttt{biop} version.
The input variable and the output variable are also mapped to \texttt{in} and \texttt{out} registers, while all other variables are given fresh registers.
\end{subsection}
\begin{subsection}{MiniRISC}
The MiniRISC CFG is finally tranlated into MiniRISC intermediate code by the function \texttt{convert} in the files \href{../lib/miniImp/RISC.ml}{RISC.ml} and \href{../lib/miniImp/RISC.mli}{RISC.mli}.
The grammar of MiniRISC is analogous to the one for \hyperref[grammar:MiniRISC]{MiniRISC Simple Statements}:
\begin{grammar}
<t> \(\defeq\) Nop
\alt{} BRegOp <brop> <r> <r> \(\Rightarrow\) <r>
\alt{} BImmOp <biop> <r> <n> \(\Rightarrow\) <r>
\alt{} URegOp <urop> <r> \(\Rightarrow\) <r>
\alt{} Load <r> \(\Rightarrow\) <r>
\alt{} LoadI <n> \(\Rightarrow\) <r>
\alt{} Store <r> \(\Rightarrow\) <r>
\alt{} Jump <l>
\alt{} CJump <r> <l> <l>
\alt{} Label <l>
<brop> \(\defeq\) Add | Sub | Mult | Div | Mod | Pow | And | Or
\alt{} Eq | Less | LessEq | More | MoreEq
<biop> \(\defeq\) AddI | SubI | MultI | DivI | ModI | PowI | AndI | OrI
\alt{} EqI | LessI | LessEqI | MoreI | MoreEqI
<urop> \(\defeq\) Not | Copy | Rand
\end{grammar}
where \texttt{l} is a string that uniquely identifies a label.
\end{subsection}
\begin{subsection}{RISC Semantics}
It is also implemented in the files \href{../lib/miniImp/RISCSemantics.ml}{RISCSemantics.ml} and \href{../lib/miniImp/RISCSemantics.mli}{RISCSemantics.mli} a reduce function, that evaluates MiniRISC code.
The labels are used as is and not replaced by offsets, so the code is translated into a map from labels to code blocks for ease of computation.
\end{subsection}
\end{section}
\begin{section}{Dataflow Analysis}
A refined CFG structure used for analysis is defined in \href{../lib/analysis/Dataflow.ml}{Dataflow.ml} and \href{../lib/analysis/Dataflow.mli}{Dataflow.mli}. The CFG is supplemented with a map from each node to the support structure that stores the list of defined variables or live variables. Since the CFG is not minimal, there is also a list for each simple statement. A fixed point function then applies the input fuction until the map does not change. Simple structural equality is not appropriate since order in the lists should not matter; an internal function for equality is used.
\begin{subsection}{Defined Variables}
In the files \href{../lib/miniImp/definedVariables.ml}{definedVariables.ml} and \href{../lib/miniImp/definedVariables.mli}{definedVariables.mli} three functions are defined: \texttt{compute_defined_variables}, \texttt{compute_cfg} and \texttt{check_undefined_variables}.
\texttt{compute_defined_variables} creates the appropriate structure for the analysis and runs it. It returns the whole analysis structure.
\texttt{compute_cfg} returns the CFG from the analysis data structure; in the case of defined variables analysis the CFG returned is the same as the one in input of \texttt{compute_defined_variables}.
\texttt{check_undefined_variables} returns all variables that might be undefined at time of use.
Since the greatest fixed point is computed, first all variables are retrived from all code, then assigned to each input and ouput list of variables for each line of code.
Since it is an approximation some behaviour might not be intuitive. For example:
\begin{lstlisting}
for (x := 0, x < 10, x := x + 1) do (
y := rand(x);
);
output := y;
\end{lstlisting}
will return the register associated with \texttt{y} as undefined since the guard of the for cycle might never be true.
\end{subsection}
\begin{subsection}{Live Variables}
In the files \href{../lib/miniImp/liveVariables.ml}{liveVariables.ml} and \href{../lib/miniImp/liveVariables.mli}{liveVariables.mli} three functions are defined: \texttt{compute_live_variables}, \texttt{compute_cfg} and \texttt{optimize_cfg}.
\texttt{compute_live_variables} creates the appropriate structure for the analysis and runs it. It returns the whole analysis structure.
\texttt{compute_cfg} returns the CFG from the analysis data structure.
\texttt{optimize_cfg} applies liveness analysis to reduce the number of registers used; returns the analysis structure (not the RISC CFG).
\end{subsection}
\end{section}
\begin{section}{Target Code Generation}
In the files \href{../lib/miniImp/reduceRegisters.ml}{reduceRegisters.ml} and \href{../lib/miniImp/reduceRegisters.mli}{reduceRegisters.mli} the function \texttt{reduceregisters} reduces the number of used registers by counting the syntactic occurrence of each variable and partitioning the set keeping the most used as registers. All registers are either renamed or put into memory. It is allowed for the input or output registers to be put in memory, in the latter case some code is added at the end of the program to retrive the value and put into a register (register \texttt{2}).
\begin{subsection}{MiniImp to MiniRISC compiler}
The file \href{../bin/miniImpInterpreterReg.ml}{miniImpInterpreterReg.ml} compiles from MiniImp to MiniRISC or execute the MiniRISC code. It uses the package \href{https://opam.ocaml.org/packages/clap/}{Clap} to parse command line arguments and generate help pages.
The input to the program can be supplied both in stdin or as a command parameter after \texttt{-v}. The flags for disabling the check for undefined variables or liveness analysis optimization are \texttt{-u} and \texttt{-l} respectively.
\end{subsection}
\end{section}
\begin{section}{Running the code}
The project uses the following packages: \href{https://dune.build/}{Dune}, \href{https://gallium.inria.fr/~fpottier/menhir/}{Menhir} and \href{https://github.com/rbardou/clap}{Clap}. They can be installed via \href{https://opam.ocaml.org/}{Opam} with the command \texttt{opam install dune menhir clap}.
To compile the project simply run \texttt{dune build}. To run the test run \texttt{dune runtest}.
In order to execute one of the interpreters run \texttt{dune exec <interpreter> {-}{-} <flags and options>}. To see a list of all options run \texttt{dune exec <interpreter> {-}{-} -h}. A binary version of the executables can also be found in \href{./_build/default/bin/}{./_build/default/bin/}.
\end{section}
%%% Local Variables:
%%% TeX-command-extra-options: "-shell-escape"
%%% TeX-master: "document.tex"
%%% End: