Transferred AQuery grammar from private repo

dev
Bill 2 years ago
parent 1097344647
commit b6022a7e0f

@ -125,18 +125,117 @@ See files in ./tests/ for more examples.
- See `test.aquery` as an example - See `test.aquery` as an example
# User Manual # User Manual
AQuery++ has similar syntax to standard SQL with extensions for time-series analysis and user extensibility.
## Basic Grammar
```
program : [query | create | insert | load | udf ]*
/********* Queries *********/
query : [WITH ID ['('columns')'] AS '(' single-query ')'] single-query
single-query : SELECT projections FROM datasource assumption where-clause groupby-clause
projections: [val as ID | val] (, [val as ID | val])*
datasource : ID [ID | AS ID] |
ID, datasource |
ID [INNER] JOIN datasource [USING columns | ON conditions] |
ID NATURAL JOIN datasource
order-clause: ASSUMING ([ASC|DESC] ID)+
where-clause: WHERE conditions;
groupby-clause: GROUP BY expr (, expr )* [HAVING conditions]
conditions: <a boolean expression>
/********* Creating data *********/
create: CREATE TABLE ID [AS query | '(' schema ')']
schema: ID type (, ID type)*
insert: INSERT INTO ID [query | VALUES '(' literals ')']
literals: literal (, literal)*;
/********* Loading/Saving data *********/
load: LOAD DATA INFILE string INTO TABLE ID FIELDS TERMINATED BY string
save: query INTO OUTFILE string FIELDS TERMINATED BY string
/********* User defined functions *********/
udf: FUNCTION ID '(' arg-list ')' '{' fun-body '}'
arg_list: ID (, ID)*
fun_body: [stmts] expr
/********* See more udf grammar later. **********/
stmts: stmt+
stmt: assignment; | if-stmt | for-stmt | ;
assignment: l_value := expr
l_value: ID | ID '[' ID ']'
if-stmt: if '(' expr ')' if-body [else (stmt|block) ]
if-body: stmt | block (elif '(' expr ')' if-body)*
for-stmt: for '(' assignment (, assignment)* ';' expr ';' assignment ')' for-body
for-body: stmt|block
block: '{' [stmts] '}'
/********* Expressions *********/
expr: expr binop expr | fun_call | unaryop expr | ID | literal
fun: ID | sqrt | avg[s] | count | deltas | distinct
| first | last | max[s] | min[s] | next
| prev | sum[s] | ratios | <... To be added>
fun_call: fun '(' expr (, expr)* ')'
binop: +|-|=|*|+=|-=|*=|/=|!=|<|>|>=|<=| and | or
unaryop: +|-| not
literal: numbers | strings | booleans
```
## Data Types ## Data Types
- String Types: `STRING` and `TEXT` are variable-length strings with unlimited length. `VARCHAR(n)` is for strings with upper-bound limits. - String Types: `STRING` and `TEXT` are variable-length strings with unlimited length. `VARCHAR(n)` is for strings with upper-bound limits.
- Integer Types: `INT` and `INTEGER` are 32-bit integers, `SMALLINT` is for 16-bit integers, `TINYINT` is for 8-bit integers and `BIGINT` is 64-bit integers. On Linux and macOS, `HGEINT` is 128-bit integers. - Integer Types: `INT` and `INTEGER` are 32-bit integers, `SMALLINT` is for 16-bit integers, `TINYINT` is for 8-bit integers and `BIGINT` is 64-bit integers. On Linux and macOS, `HGEINT` is 128-bit integers.
- Floating-Point Types: `REAL` denotes 32-bit floating point numbers while `DOUBLE` denotes 64-bit floating point numbers. - Floating-Point Types: `REAL` denotes 32-bit floating point numbers while `DOUBLE` denotes 64-bit floating point numbers.
- Temporal Types: `DATE` only supports the format of `yyyy-mm-dd`, and `TIME` uses 24-hour format and has the form of `hh:mm:ss:ms` the milliseconds part can range from 0 to 999, `TIMESTAMP` has the format of `yyyy-mm-dd hh:mm:ss:ms`. When importing data from CSV files, please make sure the spreadsheet software (if they were used) doesn't change the format of the date and timestamp by double-checking the file with a plain-text editor. - Temporal Types: `DATE` only supports the format of `yyyy-mm-dd`, and `TIME` uses 24-hour format and has the form of `hh:mm:ss:ms` the milliseconds part can range from 0 to 999, `TIMESTAMP` has the format of `yyyy-mm-dd hh:mm:ss:ms`. When importing data from CSV files, please make sure the spreadsheet software (if they were used) doesn't change the format of the date and timestamp by double-checking the file with a plain-text editor.
- Boolean Type: `BOOLEAN` is a boolean type with values `TRUE` and `FALSE`. - Boolean Type: `BOOLEAN` or `BOOL` is a boolean type with values `TRUE` and `FALSE`.
## Create Table
Tables can be created using `CREATE TABLE` statement. For example
```
CREATE TABLE my_table (c1 INT, c2 INT, c3 STRING)
INSERT INTO my_table VALUES(10, 20, "example")
INSERT INTO my_table SELECT * FROM my_table
```
You can also create tables using a query. For example:
```
CREATE TABLE my_table_derived
AS
SELECT c1, c2 * 2 as twice_c2 FROM my_table
```
## Drop Table:
Tables can be dropped using `DROP TABLE` statement. For example:
```
DROP TABLE my_table IF EXISTS
```
## Load Data: ## Load Data:
- Use query like `LOAD DATA INFILE <filename> INTO <table_name> [OPTIONS <options>]` - Use query like `LOAD DATA INFILE <filename> INTO <table_name> [OPTIONS <options>]`
- File name is the relative path to the AQuery root directory (where prompy.py resides) - File name is the relative path to the AQuery root directory (where prompy.py resides)
- File name can also be absolute path. - File name can also be absolute path.
- See `data/q1.sql` for more information - See `data/q1.sql` for more information
## Delete Data:
- Use a query like `DELETE FROM <table_name> [WHERE <conditions>]` to delete rows from a table that matches the conditions.
## Built-in functions:
- `avg[s]`: average of a column. `avgs(col), avgs(w, col)` is rolling and moving average with window `w` of the column `col`.
- `var[s]`, `stddev[s]`: [moving/rolling] **population** variance, standard deviation.
- `sum[s]`, `max[s]`, `min[s]`: similar to `avg[s]`
- `ratios(w = 1, col)`: moving ratio of a column, e.g. `ratios(w, col)[i]=col[i-w]/col[i]`. Window `w` has default value of 1.
- `next(col), prev(col)`: moving column back and forth by 1, e.g. `next(col)[i] = col[i+1]`.
- `first(col), last(col)`: first and last value of a column, i.e. `first(col)= col[0]`, `last(col) = col[n-1]`.
- `sqrt(x), trunc(x), and other builtin math functions`: value-wise math operations. `sqrt(x)[i] = sqrt(x[i])`
- `pack(cols, ...)`: pack multiple columns with exact same type into a single column.
# Architecture # Architecture
![Architecture](./docs/arch-hybrid.svg) ![Architecture](./docs/arch-hybrid.svg)
@ -147,8 +246,7 @@ See files in ./tests/ for more examples.
## Execution Engines ## Execution Engines
- AQuery++ supports different execution engines thanks to the decoupled compiler structure. - AQuery++ supports different execution engines thanks to the decoupled compiler structure.
- Hybrid Execution Engine: decouples the query into two parts. The sql-compliant part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed. - Hybrid Execution Engine: decouples the query into two parts. The sql-compliant part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
- AQuery Execution Engine: executes queries by compiling the query plan to C++ code. Doesn't support joins and udf functions. - AQuery Library: A set of header based libraries that provide column arithmetic and operations inspired by array programming languages like kdb. This library is used by C++ post-processor code which can significantly reduce the complexity of generated code, reducing compile time while maintaining the best performance. The set of libraries can also be used by UDFs as well as User modules which makes it easier for users to write simple but powerful extensions.
- K9 Execution Engine: (discontinued).
# Roadmap # Roadmap
- [x] SQL Parser -> AQuery Parser (Front End) - [x] SQL Parser -> AQuery Parser (Front End)
@ -156,14 +254,21 @@ See files in ./tests/ for more examples.
- [x] Schema and Data Model - [x] Schema and Data Model
- [x] Data acquisition/output from/to csv file - [x] Data acquisition/output from/to csv file
- [ ] Execution Engine - [ ] Execution Engine
- [x] Projections and single-group Aggregations - [x] Single Query
- [x] Group by Aggregations - [x] Projections and single-group Aggregations
- [x] Filters - [x] Group by Aggregations
- [x] Order by - [x] Filters
- [x] Assumption - [x] Order by
- [x] Flatten - [x] Assumption
- [x] Join (Hybrid Engine only) - [x] Flatten
- [ ] Subqueries - [x] Join (Hybrid Engine only)
- [ ] Subquery
- [ ] With Clause
- [ ] From subquery
- [ ] Select sunquery
- [ ] Where subquery
- [ ] Subquery in group by
- [ ] Subquery in order by
- [x] Query Optimization - [x] Query Optimization
- [x] Selection/Order by push-down - [x] Selection/Order by push-down
- [x] Join Optimization (Only in Hybrid Engine) - [x] Join Optimization (Only in Hybrid Engine)
@ -185,3 +290,24 @@ See files in ./tests/ for more examples.
- [ ] Bug: Order By after Group By - [ ] Bug: Order By after Group By
- [ ] Functionality: Having clause, With clause - [ ] Functionality: Having clause, With clause
- [ ] Decouple expr.py - [ ] Decouple expr.py
# Credit:
- [mo-sql-parsing](https://github.com/klahnakoski/mo-sql-parsing) <br>
Author: Kyle Lahnakoski <br>
License (Mozilla Public License 2.0): https://github.com/klahnakoski/mo-sql-parsing/blob/dev/LICENSE
- [Fast C++ CSV Parser](https://github.com/ben-strasser/fast-cpp-csv-parser) <br>
Author: Ben Strasser <br>
License (BSD 3-Clause License): https://github.com/ben-strasser/fast-cpp-csv-parser/blob/master/LICENSE
- [Dragonbox](https://github.com/jk-jeon/dragonbox)<br>
Author: Junekey Jeon
License (Boost, Apache2-LLVM): <br>https://github.com/jk-jeon/dragonbox/blob/master/LICENSE-Boost <br>
https://github.com/jk-jeon/dragonbox/blob/master/LICENSE-Apache2-LLVM
- [itoa](https://github.com/jeaiii/itoa) <br>
Author: James Edward Anhalt III <br>
License (MIT): https://github.com/jeaiii/itoa/blob/main/LICENSE
- [MonetDB](https://www.monetdb.org) <br>
License (Mozilla Public License): https://github.com/MonetDB/MonetDB/blob/master/license.txt

@ -855,7 +855,8 @@ class filter(ast_node):
def produce(self, node): def produce(self, node):
filter_expr = expr(self, node) filter_expr = expr(self, node)
self.add(filter_expr.sql) self.add(filter_expr.sql)
self.datasource.join_conditions += filter_expr.join_conditions if self.datasource is not None:
self.datasource.join_conditions += filter_expr.join_conditions
class create_table(ast_node): class create_table(ast_node):
name = 'create_table' name = 'create_table'
@ -945,7 +946,19 @@ class insert(ast_node):
pass pass
self.sql += ', '.join(list_values) + ')' self.sql += ', '.join(list_values) + ')'
class delete_table(ast_node):
name = 'delete'
first_order = name
def init(self, node):
super().init(node)
def produce(self, node):
tbl = node['delete']
self.sql = f'DELETE FROM {tbl} '
if 'where' in node:
self.sql += filter(self, node['where']).sql
class load(ast_node): class load(ast_node):
name="load" name="load"
first_order = name first_order = name

Loading…
Cancel
Save