fixed regression on join condition awareness

dev
Bill 2 years ago
parent e5ba3f63d6
commit b666d6d9b2

@@ -11,8 +11,13 @@ AQuery++ Database is a cross-platform, In-Memory Column-Store Database that inco
- Finally run the image in **interactive** mode (`docker run -it --rm aquery`)
- If there is a need to access the system shell, type `dbg` to activate python interpreter and type `os.system('sh')` to launch a shell.
## Native Installation:
### Requirements
## CIMS Computer Lab (Only for NYU affiliates who have access)
1. Clone this git repo in CIMS.
2. Download the [patch](https://drive.google.com/file/d/1YkykhM6u0acZ-btQb4EUn4jAEXPT81cN/view?usp=sharing)
3. Decompress the patch to any directory and execute the script inside by typing `source ./cims.sh`. Use the `source` command or `. ./cims.sh` (dot space) to run it, because the script sets environment variables that must persist in your shell.
4. Execute `python3 ./prompt.py`
# Native Installation:
## Requirements
1. A recent version of Linux, Windows, or macOS, with a recent C++ compiler that supports C++17 (1z). (C++20 is recommended if available, for heterogeneous lookup on unordered containers.)
- GCC: 9.0 or above (g++ 7.x, 8.x fail to handle fold-expressions due to a compiler bug)
- Clang: 5.0 or above (Recommended)
@@ -25,7 +30,7 @@ AQuery++ Database is a cross-platform, In-Memory Column-Store Database that inco
3. Python 3.6 or above; install the required packages from requirements.txt with `python3 -m pip install -r requirements.txt`
### Installation
## Installation
AQuery is tested on mainstream operating systems such as Windows, macOS, and Linux.
### Windows
@@ -78,9 +83,9 @@ There're multiple options to run AQuery on Windows. You can use the native toolc
In this case, upgrade Anaconda or your compiler, or use the Python from your OS or package manager instead. Alternatively (**NOT recommended**), copy/link the library from your system (e.g. /usr/lib/x86_64-linux-gnu/libstdc++.so.6) into Anaconda's library directory (e.g. ~/Anaconda3/lib/).
## Usage
# Usage
`python3 prompt.py` will launch the interactive command prompt. The server binary will be automatically rebuilt and started.
#### Commands:
### Commands:
- `<sql statement>`: parse AQuery statement
- `f <filename>`: parse all AQuery statements in file
- `exec`: execute the last parsed statement(s) with the Hybrid Execution Engine. The Hybrid Execution Engine decouples the query into two parts: the standard SQL (MonetDB dialect) part is executed by an embedded version of MonetDB, and everything else is executed by a post-processing module that is generated in C++ by the AQuery++ Compiler and then compiled and executed.
@@ -94,30 +99,30 @@ There're multiple options to run AQuery on Windows. You can use the native toolc
- `save <OPTIONAL: filename>`: save the current code snippet; a random filename is used if none is specified.
- `exit`: quit the prompt
- `r`: run the last generated code snippet
### Example:
## Example:
`f moving_avg.a` <br>
`xexec`
See ./tests/ for more examples.
### Automated Testing Scripts
## Automated Testing Scripts
- A series of commands can be put in a script file and executed using the `script` command.
- See `test.aquery` as an example
## Architecture
# Architecture
![Architecture](./docs/arch-hybrid.svg)
### AQuery Compiler
## AQuery Compiler
- The query is first processed by the AQuery Compiler, which is composed of a frontend that parses the query into an AST and a backend that generates the target code that carries out the query.
- The front end of the AQuery++ Compiler is built on top of [mo-sql-parsing](https://github.com/klahnakoski/mo-sql-parsing) with modifications to handle the AQuery dialect and extensions.
- The back end of the AQuery++ Compiler generates target code that depends on the execution engine: C++ code for the AQuery Execution Engine, SQL plus a C++ post-processor for the Hybrid Engine, or K9 code for the K9 Engine.
### Execution Engines
## Execution Engines
- AQuery++ supports different execution engines thanks to the decoupled compiler structure.
- AQuery Execution Engine: executes queries by compiling the query plan to C++ code. Does not support joins or user-defined functions (UDFs).
- Hybrid Execution Engine: decouples the query into two parts. The SQL-compliant part is executed by an embedded version of MonetDB, and everything else is executed by a post-processing module that is generated in C++ by the AQuery++ Compiler and then compiled and executed (see the conceptual sketch after this list).
- K9 Execution Engine: (discontinued).
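
The split of work in the Hybrid Engine can be pictured with a small stand-in sketch. This is not AQuery's generated code: Python and sqlite3 stand in for the generated C++ post-processor and the embedded MonetDB instance, and the table and query are made up.

```python
# Conceptual sketch only: sqlite3 stands in for the embedded MonetDB instance,
# the Python callback for the generated C++ post-processing module.
import sqlite3

def run_hybrid(sql_part, post_process):
    con = sqlite3.connect(':memory:')           # stand-in embedded database
    con.execute('CREATE TABLE t(a INT)')
    con.executemany('INSERT INTO t VALUES (?)', [(1,), (2,), (3,)])
    rows = con.execute(sql_part).fetchall()     # 1) SQL-compliant part runs in the DB
    return post_process(rows)                   # 2) the rest runs as post-processing

# e.g. a running sum, the kind of computation the post-processing step handles
print(run_hybrid('SELECT a FROM t ORDER BY a',
                 lambda rows: [sum(r[0] for r in rows[:i + 1]) for i in range(len(rows))]))
# -> [1, 3, 6]
```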
## Roadmap
# Roadmap
- [x] SQL Parser -> AQuery Parser (Front End)
- [x] AQuery-C++ Compiler (Back End)
- [x] Schema and Data Model
@@ -140,7 +145,7 @@ See ./tests/ for more examples.
- [x] SDK and User Module
- [ ] Triggers
## Known Issues:
# Known Issues:
- [ ] Interval based triggers
- [ ] Hot reloading server binary

@@ -674,8 +674,19 @@ class join(ast_node):
tablename += f' ON {_ex}'
elif keys[1].lower() == 'using':
if _ex.is_ColExpr:
self.join_conditions += (_ex.raw_col, j.get_cols(_ex.raw_col.name))
self.join_conditions.append( (_ex.raw_col, j.get_cols(_ex.raw_col.name)) )
tablename += f' USING {_ex}'
if keys[0].lower().startswith('natural'):
ltbls : List[TableInfo] = []
if isinstance(self.parent, join):
ltbls = self.parent.tables
elif isinstance(self.parent, TableInfo):
ltbls = [self.parent]
for tl in ltbls:
for cl in tl.columns:
cr = j.get_cols(cl.name)
if cr:
self.join_conditions.append( (cl, cr) )
self.joins.append((tablename, self.have_sep))
self.tables += j.tables
self.tables_dir = {**self.tables_dir, **j.tables_dir}
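
The change from `+=` to `append` above is presumably the regression the commit title refers to: augmenting a list with a tuple splices the tuple's elements into the list, so each (left column, right columns) pair was being flattened into two separate entries instead of being stored as one join condition. A standalone illustration of the difference (plain Python, unrelated to the AQuery classes):

```python
# `+=` on a list treats a tuple as an iterable and extends the list element-wise,
# while `append` stores the pair as a single element.
join_conditions = []
join_conditions += ('l.id', 'r.id')
print(join_conditions)                      # ['l.id', 'r.id']   -- the pair is lost

join_conditions = []
join_conditions.append(('l.id', 'r.id'))
print(join_conditions)                      # [('l.id', 'r.id')] -- one (left, right) pair
```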
@@ -686,13 +697,14 @@ class join(ast_node):
else:
print(f'Error: table {node} not found.')
def get_cols(self, colExpr: str) -> ColRef:
def get_cols(self, colExpr: str) -> Optional[ColRef]:
for t in self.tables:
if colExpr in t.columns_byname:
col = t.columns_byname[colExpr]
if type(self.rec) is set:
self.rec.add(col)
return col
return None
def parse_col_names(self, colExpr:str) -> ColRef:
parsedColExpr = colExpr.split('.')
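
The `Optional[ColRef]` annotation and the explicit `return None` above make the "column not found" case part of the function's contract, which is what the `if cr:` guard in the natural-join handling relies on. A small generic sketch of the pattern, using made-up names rather than the project's classes:

```python
from typing import Dict, Optional

def lookup(name: str, columns: Dict[str, str]) -> Optional[str]:
    # Explicit None return: readers and type checkers both see that a missing
    # column is an expected outcome, not an accident of falling off the end.
    if name in columns:
        return columns[name]
    return None

col = lookup('price', {'qty': 'INT'})
if col:  # same guard shape as `if cr:` above
    print(col)
else:
    print('no such column')
```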
@@ -771,7 +783,7 @@ class create_table(ast_node):
def produce(self, node):
ct = node[self.name]
tbl = self.context.add_table(ct['name'], ct['columns'])
self.sql = f'CREATE TABLE {tbl.table_name}('
self.sql = f'CREATE TABLE IF NOT EXISTS {tbl.table_name}('
columns = []
for c in tbl.columns:
columns.append(f'{c.name} {c.type.sqlname}')
@@ -787,7 +799,9 @@ class drop(ast_node):
node = node['drop']
tbl_name = node['table']
if tbl_name in self.context.tables_byname:
tbl_obj = self.context.tables_byname[tbl_name]
tbl_obj : TableInfo = self.context.tables_byname[tbl_name]
for a in tbl_obj.alias:
self.context.tables_byname.pop(a, None)
# TODO: delete in postproc engine
self.context.tables_byname.pop(tbl_name)
self.context.tables.remove(tbl_obj)

@@ -121,7 +121,7 @@ class Context:
def __init__(self):
self.tables_byname = dict()
self.col_byname = dict()
self.tables = []
self.tables : List[TableInfo] = []
self.cols = []
self.datasource = None
self.module_stubs = ''
@@ -176,6 +176,15 @@ class Context:
self.queries.insert(self.module_init_loc, 'P__builtin_init_user_module')
return ret + '}\n'
def finalize_query(self):
# clear aliases
for t in self.tables:
for a in t.alias:
if a != t.table_name:
self.tables_byname.pop(a, None)
t.alias.clear()
t.alias.add(t.table_name)
def sql_begin(self):
self.sql = ''
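
The new `finalize_query` hook above unregisters every alias except the table's own name once a query has been emitted, so an alias introduced while parsing one query cannot be resolved by the next. A minimal sketch of the failure mode it guards against, with a plain dict and made-up table names:

```python
# Hypothetical illustration of a stale alias leaking across queries.
tables_byname = {'sales': '<TableInfo sales>'}

# query 1: "... FROM sales s ..." registers the alias 's'
tables_byname['s'] = tables_byname['sales']

# without cleanup, query 2's "... FROM stores s ..." would collide with, or
# silently resolve to, the stale 's' left over from query 1
tables_byname.pop('s', None)   # what finalize_query now does for each non-primary alias
```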
@@ -195,6 +204,7 @@
self.procs.append(self.ccode + 'return 0;\n}')
self.ccode = ''
self.queries.append('P' + proc_name)
self.finalize_query()
def finalize_udf(self):
if self.udf is not None:
