Compiler Construction PA2 Worklog

Team:

Apoorva Ranade (ar6496)   Sanjar Ahmadov (sa5640)   Yinqi Sun (ys3540)

Passes:

We generally used 2 passes to populate the symbol table and do the error checking:

  • Initialization: Every symbol table (incl. the global one and the underlying class/function/temporary scopes) is created with the built-in types int, str, None, list, Empty, object and the built-in functions print, len, input.
  • The First Pass creates the global symbol table by traversing the syntax tree's declaration part (non-recursively; explained below). It adds all the global symbols (incl. classes, global functions, global variables) to the global symbol table. Because we don't need type checking to populate the global symbol table, this pass is done entirely in the DeclarationAnalyzer class. The driver for this pass is DeclarationAnalyzer.analyze(Program).
  • The Second Pass does type checking for each statement and definition. It also recursively 'expands' every class and function definition and creates sub-scopes for them. When expanding, we first process the underlying declarations and add them to the sub-SymbolTable of the corresponding scope; the statements inside these classes/functions or blocks are then checked against their corresponding sub-SymbolTable. More specifically, each function, class, and variable declaration is visited twice: the first time to create (sub-)SymbolTables and the second time to determine types. We reused the DeclarationAnalyzer code to process declarations in such sub-scopes: in the TypeChecker class, we create a DeclarationAnalyzer object, dispatch to it the nodes that need to be analyzed, and take the updated current SymbolTable back from the declaration analyzer. After analyzing a sub-scope, every function and class declaration is visited a second time to dispatch the underlying statements. This process repeats recursively until we reach the deepest structure (function/class). Since dispatch is basically a function call and the traversal order over the AST nodes follows the scoping hierarchy, we naturally use the call stack to push and pop the sub-SymbolTables (a simplified sketch follows this list). Overall, each declaration is visited exactly twice, and each statement is visited once.
  • Compilation stops when errors are found during the first pass. We didn't use more passes because this 2-pass architecture is sufficient to complete type checking for ChocoPy, and more passes would only add to the complexity of the algorithm.
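
The sketch below illustrates the two-pass flow in simplified form. All names in it (Scope, Decl, declarePass, checkPass, ...) are stand-ins invented for illustration, not the actual DeclarationAnalyzer/TypeChecker/SymbolTable API of our compiler:

```java
import java.util.*;

class Scope {
    final Scope parent;
    final Map<String, String> symbols = new HashMap<>(); // name -> type/kind

    Scope(Scope parent) {
        this.parent = parent;
        // Initialization: every scope starts with (a subset of) the built-ins.
        for (String t : Arrays.asList("int", "str", "bool", "object")) symbols.put(t, "<type>");
        for (String f : Arrays.asList("print", "len", "input")) symbols.put(f, "<function>");
    }
}

abstract class Decl { String name; }
class VarDecl extends Decl { String declaredType; }
class FuncDecl extends Decl { List<Decl> declarations = new ArrayList<>(); /* + body statements */ }

class TwoPassSketch {
    static final List<String> errors = new ArrayList<>();

    // First pass: populate one scope from its declarations, without recursing
    // into nested functions/classes.
    static void declarePass(List<Decl> decls, Scope scope) {
        for (Decl d : decls) {
            if (scope.symbols.containsKey(d.name)) {
                errors.add("Duplicate declaration of identifier in same scope: " + d.name);
            } else if (d instanceof FuncDecl) {
                scope.symbols.put(d.name, "<function>");
            } else {
                scope.symbols.put(d.name, ((VarDecl) d).declaredType);
            }
        }
    }

    // Second pass: expand each nested definition, reuse declarePass for its
    // declarations, then recurse. The recursion itself pushes and pops
    // sub-scopes via the call stack.
    static void checkPass(List<Decl> decls, Scope scope) {
        for (Decl d : decls) {
            if (d instanceof FuncDecl) {
                FuncDecl f = (FuncDecl) d;
                Scope sub = new Scope(scope);
                declarePass(f.declarations, sub); // first visit: build the sub-SymbolTable
                checkPass(f.declarations, sub);   // second visit: types + deeper scopes
                // ... type-check the statements of f's body against `sub` here
            }
        }
    }

    public static void main(String[] args) {
        Scope global = new Scope(null);
        List<Decl> program = new ArrayList<>(); // would come from the parser
        declarePass(program, global);           // pass 1: global symbol table
        if (errors.isEmpty()) {
            checkPass(program, global);         // pass 2: only if pass 1 was clean
        }
        errors.forEach(System.err::println);
    }
}
```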

Recovery:

  • Whenever an error that causes ambiguity is encountered, we choose a default action and continue the compilation process. For example, when a type mismatch happens, the default action is that the lhs keeps its original type (a small sketch follows this list).
  • The compilation process stops when errors are found while constructing the global symbol table, because declaration errors introduce too much ambiguity for further compilation to be meaningful.
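
A minimal sketch of this default action, with illustrative names (checkAssign, compatible), an illustrative error message, and a simplified compatibility test rather than our actual type-checking code:

```java
import java.util.*;

class RecoverySketch {
    static final List<String> errors = new ArrayList<>();

    // Simplified compatibility test: exact match only (the real rules also
    // handle subclassing, None, and empty lists).
    static boolean compatible(String target, String source) {
        return target.equals(source);
    }

    // On a mismatch we log an error but let the left-hand side keep its
    // declared type, so the following statements can still be checked.
    static String checkAssign(String lhsType, String rhsType) {
        if (!compatible(lhsType, rhsType)) {
            errors.add("Expected type `" + lhsType + "`; got type `" + rhsType + "`");
        }
        return lhsType; // default action: lhs keeps its original type
    }

    public static void main(String[] args) {
        System.out.println(checkAssign("int", "str")); // prints "int"; one error recorded
        System.out.println(errors);
    }
}
```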

Challenges:

  • Nested structures were a challenge. A function inside a function or class requires us to build the correct scoping as well as handle dependencies.
    • This is handled by the declaration-statement-definition recursion described in the second pass above.
  • Error reporting in PA2 is more complex than in PA1, mainly because more types of errors can occur in semantic analysis and each type of error needs a default action. Because of that, the correctness of error handling is hard to verify.
    • To make correctness easy to determine, we intentionally match our error messages to the reference implementation.
    • However, there are certainly discrepancies with the reference compiler because of implementation or architectural differences. We didn't match those differences that don't seem to affect the overall correctness; we show these differences in diff.py.
  • Assignment compatibility.
    • We dealt with this by finding the least common ancestor of the two classes (a sketch follows this list). Special cases such as empty lists are dealt with separately. This is implemented as a static helper method in class StudentAnalysis.
  • Testing various scenarios with similarly defined variables was time-consuming. Instead, we defined a set of variables at the beginning of the student-contributed test programs and then reused the same variables throughout the programs to cover various good and bad scenarios.
  • Another challenge was coming up with good test cases for broader coverage. Our approach was to study the type-checking rules and write code that violates them to see whether our analyzer makes the correct inferences.
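
A minimal sketch of the least-common-ancestor computation, with an invented class-hierarchy representation rather than the actual StudentAnalysis helper (the empty-list and None special cases mentioned above are omitted):

```java
import java.util.*;

class LcaSketch {
    // Maps each class to its superclass; "object" is the root of the hierarchy.
    static final Map<String, String> superOf = new HashMap<>();

    static String leastCommonAncestor(String a, String b) {
        // Collect all ancestors of a (including a itself) ...
        Set<String> ancestorsOfA = new HashSet<>();
        for (String c = a; c != null; c = superOf.get(c)) ancestorsOfA.add(c);
        // ... then walk up from b until we hit one of them.
        for (String c = b; c != null; c = superOf.get(c)) {
            if (ancestorsOfA.contains(c)) return c;
        }
        return "object"; // every class ultimately derives from object
    }

    public static void main(String[] args) {
        superOf.put("A", "object");
        superOf.put("B", "A");
        superOf.put("C", "A");
        System.out.println(leastCommonAncestor("B", "C"));      // A
        System.out.println(leastCommonAncestor("B", "object")); // object
    }
}
```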

Improvements:

  • Added more tests to rigorously check program flow, and a test (diff.py) that shows a case where our implementation recovers better than the reference compiler.