- String Types: `STRING` and `TEXT` are variable-length strings with unlimited length. `VARCHAR(n)` is for strings with upper-bound limits.
- String Types: `STRING` and `TEXT` are variable-length strings with unlimited length. `VARCHAR(n)` is for strings with upper-bound limits.
- Integer Types: `INT` and `INTEGER` are 32-bit integers, `SMALLINT` is for 16-bit integers, `TINYINT` is for 8-bit integers and `BIGINT` is 64-bit integers. On Linux and macOS, `HGEINT` is 128-bit integers.
- Integer Types: `INT` and `INTEGER` are 32-bit integers, `SMALLINT` is for 16-bit integers, `TINYINT` is for 8-bit integers and `BIGINT` is 64-bit integers. On Linux and macOS, `HGEINT` is 128-bit integers.
- Floating-Point Types: `REAL` denotes 32-bit floating point numbers while `DOUBLE` denotes 64-bit floating point numbers.
- Floating-Point Types: `REAL` denotes 32-bit floating point numbers while `DOUBLE` denotes 64-bit floating point numbers.
- Temporal Types: `DATE` only supports the format of `yyyy-mm-dd`, and `TIME` uses 24-hour format and has the form of `hh:mm:ss:ms` the milliseconds part can range from 0 to 999, `TIMESTAMP` has the format of `yyyy-mm-dd hh:mm:ss:ms`. When importing data from CSV files, please make sure the spreadsheet software (if they were used) doesn't change the format of the date and timestamp by double-checking the file with a plain-text editor.
- Temporal Types: `DATE` only supports the format of `yyyy-mm-dd`, and `TIME` uses 24-hour format and has the form of `hh:mm:ss:ms` the milliseconds part can range from 0 to 999, `TIMESTAMP` has the format of `yyyy-mm-dd hh:mm:ss:ms`. When importing data from CSV files, please make sure the spreadsheet software (if they were used) doesn't change the format of the date and timestamp by double-checking the file with a plain-text editor.
- Boolean Type: `BOOLEAN` is a boolean type with values `TRUE` and `FALSE`.
- Boolean Type: `BOOLEAN`or `BOOL`is a boolean type with values `TRUE` and `FALSE`.
## Create Table
Tables can be created using `CREATE TABLE` statement. For example
```
CREATE TABLE my_table (c1 INT, c2 INT, c3 STRING)
INSERT INTO my_table VALUES(10, 20, "example")
INSERT INTO my_table SELECT * FROM my_table
```
You can also create tables using a query. For example:
```
CREATE TABLE my_table_derived
AS
SELECT c1, c2 * 2 as twice_c2 FROM my_table
```
## Drop Table:
Tables can be dropped using `DROP TABLE` statement. For example:
```
DROP TABLE my_table IF EXISTS
```
## Load Data:
## Load Data:
- Use query like `LOAD DATA INFILE <filename> INTO <table_name> [OPTIONS <options>]`
- Use query like `LOAD DATA INFILE <filename> INTO <table_name> [OPTIONS <options>]`
- File name is the relative path to the AQuery root directory (where prompy.py resides)
- File name is the relative path to the AQuery root directory (where prompy.py resides)
- File name can also be absolute path.
- File name can also be absolute path.
- See `data/q1.sql` for more information
- See `data/q1.sql` for more information
## Delete Data:
- Use a query like `DELETE FROM <table_name> [WHERE <conditions>]` to delete rows from a table that matches the conditions.
## Built-in functions:
- `avg[s]`: average of a column. `avgs(col), avgs(w, col)` is rolling and moving average with window `w` of the column `col`.
- `var[s]`, `stddev[s]`: [moving/rolling] **population** variance, standard deviation.
- `sum[s]`, `max[s]`, `min[s]`: similar to `avg[s]`
- `ratios(w = 1, col)`: moving ratio of a column, e.g. `ratios(w, col)[i]=col[i-w]/col[i]`. Window `w` has default value of 1.
- `next(col), prev(col)`: moving column back and forth by 1, e.g. `next(col)[i] = col[i+1]`.
- `first(col), last(col)`: first and last value of a column, i.e. `first(col)= col[0]`, `last(col) = col[n-1]`.
- `sqrt(x), trunc(x), and other builtin math functions`: value-wise math operations. `sqrt(x)[i] = sqrt(x[i])`
- `pack(cols, ...)`: pack multiple columns with exact same type into a single column.
# Architecture
# Architecture
![Architecture](./docs/arch-hybrid.svg)
![Architecture](./docs/arch-hybrid.svg)
@ -147,8 +246,7 @@ See files in ./tests/ for more examples.
## Execution Engines
## Execution Engines
- AQuery++ supports different execution engines thanks to the decoupled compiler structure.
- AQuery++ supports different execution engines thanks to the decoupled compiler structure.
- Hybrid Execution Engine: decouples the query into two parts. The sql-compliant part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
- Hybrid Execution Engine: decouples the query into two parts. The sql-compliant part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
- AQuery Execution Engine: executes queries by compiling the query plan to C++ code. Doesn't support joins and udf functions.
- AQuery Library: A set of header based libraries that provide column arithmetic and operations inspired by array programming languages like kdb. This library is used by C++ post-processor code which can significantly reduce the complexity of generated code, reducing compile time while maintaining the best performance. The set of libraries can also be used by UDFs as well as User modules which makes it easier for users to write simple but powerful extensions.
- K9 Execution Engine: (discontinued).
# Roadmap
# Roadmap
- [x] SQL Parser -> AQuery Parser (Front End)
- [x] SQL Parser -> AQuery Parser (Front End)
@ -156,14 +254,21 @@ See files in ./tests/ for more examples.
- [x] Schema and Data Model
- [x] Schema and Data Model
- [x] Data acquisition/output from/to csv file
- [x] Data acquisition/output from/to csv file
- [ ] Execution Engine
- [ ] Execution Engine
- [x] Projections and single-group Aggregations
- [x] Single Query
- [x] Group by Aggregations
- [x] Projections and single-group Aggregations
- [x] Filters
- [x] Group by Aggregations
- [x] Order by
- [x] Filters
- [x] Assumption
- [x] Order by
- [x] Flatten
- [x] Assumption
- [x] Join (Hybrid Engine only)
- [x] Flatten
- [ ] Subqueries
- [x] Join (Hybrid Engine only)
- [ ] Subquery
- [ ] With Clause
- [ ] From subquery
- [ ] Select sunquery
- [ ] Where subquery
- [ ] Subquery in group by
- [ ] Subquery in order by
- [x] Query Optimization
- [x] Query Optimization
- [x] Selection/Order by push-down
- [x] Selection/Order by push-down
- [x] Join Optimization (Only in Hybrid Engine)
- [x] Join Optimization (Only in Hybrid Engine)
@ -185,3 +290,24 @@ See files in ./tests/ for more examples.