AQuery Database
Bill 522e9e267b
WIP: new decoupled expr implementation
2 years ago
aquery_parser Bug fixes for alias&join. Add test in presentation. 2 years ago
data bug fixes 2 years ago
docs regression: nested aggregation support 2 years ago
engine bug fix on select into 2 years ago
lib clean trash 2 years ago
monetdb correct line_ending 2 years ago
msc-plugin bug fix 2 years ago
msvs-py correct line_ending 2 years ago
reconstruct WIP: new decoupled expr implementation 2 years ago
sdk bug fix on select into 2 years ago
server WIP: new decoupled expr implementation 2 years ago
tests updated documentation 2 years ago
.gitignore updated documentation 2 years ago
Dockerfile restructure 2 years ago
LICENSE correct line_ending 2 years ago
Makefile updated documentation 2 years ago
README.md updated documentation 2 years ago
aquery_config.py Bug fixes for alias&join. Add test in presentation. 2 years ago
build.py imporved build driver, basic support for count() 2 years ago
build_instructions.txt correct line_ending 2 years ago
csv.h fix gitw 3 years ago
datagen.cpp Bug fixes for alias&join. Add test in presentation. 2 years ago
dbconn.py correct line_ending 2 years ago
header.cxx fix gitw 3 years ago
mmw.cpp fix gitw 3 years ago
prompt.py imporved build driver, basic support for count() 2 years ago
requirements.txt Merge branch 'master' of https://github.com/sunyinqi0508/AQuery2 2 years ago
sample_ast.json correct line_ending 2 years ago
test.aquery imporved build driver, basic support for count() 2 years ago

README.md

AQuery++ Database

Introduction

AQuery++ Database is a cross-platform, in-memory column-store database that uses compiled query execution.

Architecture


AQuery Compiler

  • The query is first processed by the AQuery Compiler, which consists of a front end that parses the query into an AST and a back end that generates target code to execute the query.
  • The front end of the AQuery++ Compiler is built on top of mo-sql-parsing, with modifications to handle the AQuery dialect and extensions.
  • The back end of the AQuery++ Compiler generates target code that depends on the execution engine: C++ code for the AQuery Execution Engine, SQL plus a C++ post-processor for the Hybrid Engine, or k9 for the K9 Engine.
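
The front end/back end split described above can be illustrated with a minimal sketch. It is a toy, not the project's compiler: the real front end is built on mo-sql-parsing, and every name below (`parse_toy`, `gen_cpp_stub`, `gen_sql`, `BACKENDS`, `compile_query`) is hypothetical. The point is only that one AST can feed several code generators.

```python
# Toy illustration of a decoupled compiler: one front end, several back ends.
# All function and variable names are hypothetical, not AQuery++ APIs.

def parse_toy(query):
    """Front end: turn 'select <cols> from <table>' into a tiny dict AST."""
    tokens = query.strip().rstrip(";").split()
    lowered = [t.lower() for t in tokens]
    if not tokens or lowered[0] != "select" or "from" not in lowered:
        raise ValueError("unsupported query: %r" % query)
    from_idx = lowered.index("from")
    if from_idx < 2 or from_idx + 1 >= len(tokens):
        raise ValueError("unsupported query: %r" % query)
    cols = [c.strip() for c in " ".join(tokens[1:from_idx]).split(",")]
    return {"select": cols, "from": tokens[from_idx + 1]}


def gen_sql(ast):
    """Back end 1: emit plain SQL text from the AST."""
    return "SELECT %s FROM %s;" % (", ".join(ast["select"]), ast["from"])


def gen_cpp_stub(ast):
    """Back end 2: emit a placeholder line standing in for generated C++."""
    return "// C++ stub: project %s from %s" % (", ".join(ast["select"]), ast["from"])


# The engine name selects the code generator; the front end is shared.
BACKENDS = {"hybrid": gen_sql, "aquery": gen_cpp_stub}


def compile_query(query, engine):
    return BACKENDS[engine](parse_toy(query))


if __name__ == "__main__":
    print(compile_query("select a, b from t", "hybrid"))
    print(compile_query("select a, b from t", "aquery"))
```

Adding another engine in this sketch means registering one more generator in `BACKENDS`; the parser is untouched.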

Execution Engines

  • AQuery++ supports different execution engines thanks to the decoupled compiler structure.
  • AQuery Execution Engine: executes queries by compiling the query plan to C++ code. It does not support joins or UDFs.
  • Hybrid Execution Engine: splits the query into two parts. The SQL-compliant part is executed by an embedded version of MonetDB, and everything else is executed by a post-processing module that the AQuery++ Compiler generates in C++, then compiles and runs.
  • K9 Execution Engine: (discontinued).
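
As a rough sketch of the Hybrid Engine idea above, the snippet below partitions the expressions of a select list into a part that standard SQL can evaluate and a part left for post-processing. This is an assumption about how such a split could look, not the project's actual algorithm. The names `SQL_NATIVE` and `split_query` and the function name `moving_avg` are hypothetical.

```python
# Hypothetical sketch: split select expressions between an SQL engine and a
# post-processing step. Not taken from the AQuery++ source code.

# Aggregates assumed to be handled by the SQL engine (illustrative set).
SQL_NATIVE = frozenset({"sum", "avg", "min", "max", "count"})


def split_query(select_exprs):
    """select_exprs: list of (func_or_None, column) pairs.

    Returns (sql_part, post_part). Expressions the SQL engine cannot
    evaluate are replaced by a plain column fetch in sql_part and
    recorded in post_part for later processing.
    """
    sql_part, post_part = [], []
    for func, col in select_exprs:
        if func is None or func.lower() in SQL_NATIVE:
            if (func, col) not in sql_part:
                sql_part.append((func, col))
        else:
            # Fetch the raw column through SQL, apply func afterwards.
            if (None, col) not in sql_part:
                sql_part.append((None, col))
            post_part.append((func, col))
    return sql_part, post_part


if __name__ == "__main__":
    sql_part, post_part = split_query(
        [("sum", "price"), ("moving_avg", "price"), (None, "id")]
    )
    print("SQL part:", sql_part)
    print("post-process part:", post_part)
```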

Roadmap

  • SQL Parser -> AQuery Parser (Front End)
  • AQuery-C++ Compiler (Back End)
    • Schema and Data Model
    • Data acquisition/output from/to csv file
  • Execution Engine
    • Projections and single-group Aggregations
    • Group by Aggregations
    • Filters
    • Order by
    • Assumption
    • Flatten
    • UDFs (Hybrid Engine only)
    • User Module
    • Triggers
    • Join (Hybrid Engine only)
    • Subqueries
  • Query Optimization
    • Selection/Order by push-down
    • Join Optimization (Only in Hybrid Engine)

Known Issues:

  • User Module test
  • Interval based triggers
  • Hot reloading server binary
  • Bug fixes: type deduction misaligned in Hybrid Engine
  • Investigation: Using postproc only for q1 in Hybrid Engine (make is_special always on)
  • Limitation: putting ColRefs back to monetdb. (Comparison)
  • C++ Meta-Programming: Eliminate template recursions as much as possible.
  • Limitation: Date and Time, String operations, Funcs in groupby agg.
  • Functionality: Basic helper functions in aquery
  • Improvement: More DDLs, e.g. drop table, update table, etc.
  • Bug: Join-Aware Column management
  • Bug: Order By after Group By

Installation

Requirements

  1. A recent version of Linux, Windows or macOS, with a recent C++ compiler that supports C++17 (1z). C++20 is recommended if available, for heterogeneous lookup on unordered containers.

    • GCC: 9.0 or above (g++ 7.x, 8.x fail to handle fold-expressions due to a compiler bug)
    • Clang: 5.0 or above (Recommended)
    • MSVC: 2017 or later (2022 or above is recommended)
  2. MonetDB, for the Hybrid Engine

    • On Windows, the required libraries and headers are already included in the repo.
    • On Linux, see Monetdb Easy Setup for instructions.
    • On macOS, MonetDB can be installed with Homebrew: brew install monetdb
  3. Python 3.6 or above. Install the required packages listed in requirements.txt with python3 -m pip install -r requirements.txt
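
A quick way to sanity-check part of the requirements above is a short script like the following. It is a minimal sketch: it confirms the Python version and looks for common compiler executables on PATH, but it does not verify compiler versions or the MonetDB installation. The names `check_environment` and `COMPILER_CANDIDATES` are made up for this example.

```python
# Minimal environment check (illustrative; not part of the repository).
import shutil
import sys

MIN_PYTHON = (3, 6)
# Executable names to look for; which ones exist depends on the platform.
COMPILER_CANDIDATES = ("clang++", "g++", "cl")


def check_environment():
    """Report whether Python is new enough and which compilers are on PATH."""
    python_ok = sys.version_info >= MIN_PYTHON
    # shutil.which returns the full path, or None if the name is not found.
    compilers = {name: shutil.which(name) for name in COMPILER_CANDIDATES}
    return {"python_ok": python_ok, "compilers": compilers}


if __name__ == "__main__":
    report = check_environment()
    print("Python >= %d.%d:" % MIN_PYTHON, report["python_ok"])
    for name, path in report["compilers"].items():
        print("%-8s %s" % (name, path or "not found"))
```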

Usage

python3 prompt.py launches the interactive command prompt. The server binary is automatically rebuilt and started.

Commands:

  • <sql statement>: parse AQuery statement

  • f <filename>: parse all AQuery statements in file

  • dbg: start a debugging session

  • print: print out the parsed AQuery statements

  • xexec: execute the last parsed statement(s) with the Hybrid Execution Engine. The Hybrid Execution Engine splits the query into two parts: the standard SQL (MonetDB dialect) part is executed by an embedded version of MonetDB, and everything else is executed by a post-processing module that the AQuery++ Compiler generates in C++, then compiles and runs.

  • save <OPTIONAL: filename>: save the current code snippet. A random filename is used if none is specified.

  • exit: quit the prompt

  • exec: execute the last parsed statement(s) with the AQuery Execution Engine (old). The AQuery Execution Engine executes a query by compiling it to C++ code and then running it.

  • r: run the last generated code snippet

Example:

f moving_avg.a
xexec

See ./tests/ for more examples.

Notes for arm64 macOS users

  • In theory, AQuery++ can run either natively on arm64 or as x86_64 through Rosetta. For maximum performance, running natively is preferred.
  • The two architectures can't be mixed, however: every component (the Python binary, the C++ compiler, the MonetDB library, and system command-line utilities such as uname) must have the same architecture.
  • Because I can't get access to an Arm-based Mac to fully test this setup, there might still be issues. Please open an issue if you encounter any problems.
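
A rough first check for the architecture requirement above is to compare what the Python interpreter and the uname utility report. This is only a sketch under assumptions: it does not inspect the C++ compiler or the MonetDB library, and a child process started from Python may inherit Python's architecture, so a match here is not proof that everything is consistent. The helper names `collect_architectures` and `is_consistent` are hypothetical.

```python
# Illustrative architecture check (not part of the repository).
import platform
import shutil
import subprocess


def collect_architectures():
    """Return the machine architecture as seen by Python and by `uname -m`."""
    archs = {"python": platform.machine()}
    uname = shutil.which("uname")  # absent on a stock Windows install
    if uname:
        result = subprocess.run(
            [uname, "-m"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # Python 3.6-compatible text mode
        )
        if result.returncode == 0:
            archs["uname"] = result.stdout.strip()
    return archs


def is_consistent(archs):
    """True when every reported architecture string is identical."""
    return len(set(archs.values())) <= 1


if __name__ == "__main__":
    found = collect_architectures()
    for component, arch in found.items():
        print("%-8s %s" % (component, arch))
    print("consistent:", is_consistent(found))
```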