Transferred AQuery grammar from private repo

2 years ago · b6022a7e0f
parent 1097344647
commit b6022a7e0f
2 changed files with 152 additions and 13 deletions
--- a/README.md
+++ b/README.md
@ -125,18 +125,117 @@ See files in ./tests/ for more examples.
 - See `test.aquery` as an example
 # User Manual
 AQuery++ has similar syntax to standard SQL with extensions for time-series analysis and user extensibility.
 ## Basic Grammar
 ```
 program : [query | create | insert | load | udf ]*
 /********* Queries *********/
 query : [WITH ID ['('columns')'] AS '(' single-query ')'] single-query
 single-query : SELECT projections FROM datasource assumption where-clause groupby-clause
 projections: [val as ID | val] (, [val as ID | val])*
 datasource : ID [ID | AS ID] |
  ID, datasource |
  ID [INNER] JOIN datasource [USING columns | ON conditions] |
  ID NATURAL JOIN datasource
 order-clause: ASSUMING ([ASC|DESC] ID)+
 where-clause: WHERE conditions;
 groupby-clause: GROUP BY expr (, expr )* [HAVING conditions]
 conditions: <a boolean expression>
 /********* Creating data *********/
 create: CREATE TABLE ID [AS query | '(' schema ')']
 schema: ID type (, ID type)*
 insert: INSERT INTO ID [query | VALUES '(' literals ')']
 literals: literal (, literal)*;
 /********* Loading/Saving data *********/
 load: LOAD DATA INFILE string INTO TABLE ID FIELDS TERMINATED BY string
 save: query INTO OUTFILE string FIELDS TERMINATED BY string
 /********* User defined functions *********/
 udf: FUNCTION ID '(' arg-list ')' '{' fun-body '}'
 arg_list: ID (, ID)*
 fun_body: [stmts] expr
 /********* See more udf grammar later. **********/
 stmts: stmt+ 
 stmt: assignment; | if-stmt | for-stmt | ;
 assignment: l_value := expr
 l_value: ID | ID '[' ID ']'
 if-stmt: if '(' expr ')' if-body [else (stmt|block) ]
 if-body: stmt | block (elif '(' expr ')' if-body)*
 for-stmt: for '(' assignment (, assignment)* ';' expr ';' assignment ')' for-body
 for-body: stmt|block
 block:  '{' [stmts] '}'
 /********* Expressions *********/
 expr: expr binop expr | fun_call | unaryop expr | ID | literal
 fun: ID | sqrt | avg[s] | count | deltas | distinct 
  | first | last | max[s] | min[s] | next
  | prev | sum[s] | ratios | <... To be added> 
 fun_call: fun '(' expr (, expr)* ')'
 binop: +|-|=|*|+=|-=|*=|/=|!=|<|>|>=|<=| and | or
 unaryop: +|-| not
 literal:  numbers | strings | booleans
 ```
 ## Data Types
 - String Types: `STRING` and `TEXT` are variable-length strings with unlimited length. `VARCHAR(n)` is for strings with upper-bound limits.
 - Integer Types: `INT` and `INTEGER` are 32-bit integers, `SMALLINT` is for 16-bit integers, `TINYINT` is for 8-bit integers and `BIGINT` is 64-bit integers. On Linux and macOS, `HGEINT` is 128-bit integers. 
 - Floating-Point Types: `REAL` denotes 32-bit floating point numbers while `DOUBLE` denotes 64-bit floating point numbers. 
 - Temporal Types: `DATE` only supports the format of `yyyy-mm-dd`, and `TIME` uses 24-hour format and has the form of `hh:mm:ss:ms` the milliseconds part can range from 0 to 999, `TIMESTAMP` has the format of `yyyy-mm-dd hh:mm:ss:ms`. When importing data from CSV files, please make sure the spreadsheet software (if they were used) doesn't change the format of the date and timestamp by double-checking the file with a plain-text editor.
- Boolean Type: `BOOLEAN` is a boolean type with values `TRUE` and `FALSE`.
+- Boolean Type: `BOOLEAN` or `BOOL` is a boolean type with values `TRUE` and `FALSE`.
 ## Create Table
 Tables can be created using `CREATE TABLE` statement. For example
 ```
 CREATE TABLE my_table (c1 INT, c2 INT, c3 STRING)
 INSERT INTO my_table VALUES(10, 20, "example")
 INSERT INTO my_table SELECT * FROM my_table
 ```
 You can also create tables using a query. For example:
 ```
 CREATE TABLE my_table_derived
 AS
  SELECT c1, c2 * 2 as twice_c2 FROM my_table
 ```
 ## Drop Table:
 Tables can be dropped using `DROP TABLE` statement. For example:
 ```
 DROP TABLE my_table IF EXISTS
 ```
 ## Load Data:
 - Use query like `LOAD DATA INFILE <filename> INTO <table_name> [OPTIONS <options>]`
 - File name is the relative path to the AQuery root directory (where prompy.py resides)
 - File name can also be absolute path.
 - See `data/q1.sql` for more information 
 ## Delete Data:
 - Use a query like `DELETE FROM <table_name> [WHERE <conditions>]` to delete rows from a table that matches the conditions.
 ## Built-in functions: 
 - `avg[s]`: average of a column. `avgs(col), avgs(w, col)` is rolling and moving average with window `w` of the column `col`.
 - `var[s]`, `stddev[s]`: [moving/rolling] **population** variance, standard deviation.
 - `sum[s]`, `max[s]`, `min[s]`: similar to `avg[s]`
 - `ratios(w = 1, col)`: moving ratio of a column, e.g. `ratios(w, col)[i]=col[i-w]/col[i]`. Window `w` has default value of 1.  
 - `next(col), prev(col)`: moving column back and forth by 1, e.g. `next(col)[i] = col[i+1]`.
 - `first(col), last(col)`: first and last value of a column, i.e. `first(col)= col[0]`, `last(col) = col[n-1]`.
 - `sqrt(x), trunc(x), and other builtin math functions`: value-wise math operations. `sqrt(x)[i] = sqrt(x[i])`
 - `pack(cols, ...)`: pack multiple columns with exact same type into a single column. 
 # Architecture 
 ![Architecture](./docs/arch-hybrid.svg)
@ -147,8 +246,7 @@ See files in ./tests/ for more examples.
 ## Execution Engines
 - AQuery++ supports different execution engines thanks to the decoupled compiler structure.
 - Hybrid Execution Engine: decouples the query into two parts. The sql-compliant part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
- AQuery Execution Engine: executes queries by compiling the query plan to C++ code. Doesn't support joins and udf functions. 
+- AQuery Library: A set of header based libraries that provide column arithmetic and operations inspired by array programming languages like kdb. This library is used by C++ post-processor code which can significantly reduce the complexity of generated code, reducing compile time while maintaining the best performance. The set of libraries can also be used by UDFs as well as User modules which makes it easier for users to write simple but powerful extensions. 
 - K9 Execution Engine: (discontinued).
 # Roadmap
 - [x] SQL Parser -> AQuery Parser (Front End)
@ -156,14 +254,21 @@ See files in ./tests/ for more examples.
   -  [x] Schema and Data Model 
   -  [x] Data acquisition/output from/to csv file
 - [ ] Execution Engine
-   -  [x] Projections and single-group Aggregations 
+  - [x] Single Query
-   -  [x] Group by Aggregations
+     -  [x] Projections and single-group Aggregations 
-   -  [x] Filters
+     -  [x] Group by Aggregations
-   -  [x] Order by
+     -  [x] Filters
-   -  [x] Assumption
+     -  [x] Order by
-   -  [x] Flatten
+     -  [x] Assumption
-   -  [x] Join (Hybrid Engine only)
+     -  [x] Flatten
-   -  [ ] Subqueries 
+     -  [x] Join (Hybrid Engine only)
  - [ ] Subquery
     -  [ ] With Clause
     -  [ ] From subquery
     -  [ ] Select sunquery
     -  [ ] Where subquery
     -  [ ] Subquery in group by
     -  [ ] Subquery in order by
 - [x] Query Optimization
  - [x] Selection/Order by push-down
  - [x] Join Optimization (Only in Hybrid Engine)
@ -185,3 +290,24 @@ See files in ./tests/ for more examples.
 - [ ] Bug: Order By after Group By
 - [ ] Functionality: Having clause, With clause
 - [ ] Decouple expr.py
 # Credit:
 - [mo-sql-parsing](https://github.com/klahnakoski/mo-sql-parsing) <br>
  Author: Kyle Lahnakoski <br>
  License (Mozilla Public License 2.0): https://github.com/klahnakoski/mo-sql-parsing/blob/dev/LICENSE 
 - [Fast C++ CSV Parser](https://github.com/ben-strasser/fast-cpp-csv-parser) <br>
  Author: Ben Strasser <br>
  License (BSD 3-Clause License): https://github.com/ben-strasser/fast-cpp-csv-parser/blob/master/LICENSE
 - [Dragonbox](https://github.com/jk-jeon/dragonbox)<br>
  Author: Junekey Jeon
  License (Boost, Apache2-LLVM): <br>https://github.com/jk-jeon/dragonbox/blob/master/LICENSE-Boost <br>
  https://github.com/jk-jeon/dragonbox/blob/master/LICENSE-Apache2-LLVM
 - [itoa](https://github.com/jeaiii/itoa) <br>
  Author: James Edward Anhalt III <br>
  License (MIT): https://github.com/jeaiii/itoa/blob/main/LICENSE
 - [MonetDB](https://www.monetdb.org) <br>
  License (Mozilla Public License): https://github.com/MonetDB/MonetDB/blob/master/license.txt
--- a/reconstruct/ast.py
+++ b/reconstruct/ast.py
@ -855,7 +855,8 @@ class filter(ast_node):
    def produce(self, node):
        filter_expr = expr(self, node)
        self.add(filter_expr.sql)
-        self.datasource.join_conditions += filter_expr.join_conditions
+        if self.datasource is not None:
            self.datasource.join_conditions += filter_expr.join_conditions
 class create_table(ast_node):
    name = 'create_table'
@ -945,7 +946,19 @@ class insert(ast_node):
                pass
        self.sql += ', '.join(list_values) + ')'
-  
+
 class delete_table(ast_node):
    name = 'delete'
    first_order = name
    def init(self, node):
        super().init(node)
    def produce(self, node):
        tbl = node['delete']
        self.sql = f'DELETE FROM {tbl} '
        if 'where' in node:
            self.sql += filter(self, node['where']).sql
 class load(ast_node):
    name="load"
    first_order = name