Added drop table, 'if (not) exists' support. Bug fixes

3 years ago · eac25ddbb3
parent 3bbca0d6b0
commit eac25ddbb3
15 changed files with 151 additions and 76 deletions
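
Note on the `drop` node added in reconstruct/ast.py: a minimal sketch of the intended behavior, assuming the compiler context keeps a list of tables plus a name-to-table map. `Catalog`, `Table` and `drop_table` are hypothetical stand-ins, not the project's classes, and the full `DROP TABLE IF EXISTS ...` string is an assumption about what is finally sent to the SQL engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str


@dataclass
class Catalog:
    # Hypothetical stand-in for the compiler context (not the real Context class).
    tables: List[Table] = field(default_factory=list)
    tables_byname: Dict[str, Table] = field(default_factory=dict)

    def add(self, name: str) -> None:
        tbl = Table(name)
        self.tables.append(tbl)
        self.tables_byname[name] = tbl


def drop_table(catalog: Catalog, name: str, if_exists: bool = False) -> str:
    """Return the SQL to run, or '' when there is nothing to drop."""
    tbl = catalog.tables_byname.pop(name, None)
    if tbl is not None:
        # Known table: forget it locally and let the SQL engine drop it too.
        catalog.tables.remove(tbl)
        return f'DROP TABLE IF EXISTS {name}'
    if not if_exists:
        # Unknown table without IF EXISTS: report, but emit no SQL.
        print(f'Error: table {name} not found.')
    return ''
```

With `if_exists=True` an unknown table is skipped silently; without it the error is printed, and in both cases no SQL is emitted.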
--- a/2
+++ b/2
@@ -97,6 +97,6 @@ docker:
 	docker build -t aquery .

 clean:
-	rm *.shm *.o dll.so server.so server.bin libaquery.a libaquery.lib -rf 2> $(NULL_DEVICE) || true
+	rm .cached *.shm *.o dll.so server.so server.bin libaquery.a libaquery.lib -rf 2> $(NULL_DEVICE) || true


--- a/README.md
+++ b/README.md
@@ -4,6 +4,43 @@

 AQuery++ Database is a cross-platform, In-Memory Column-Store Database that incorporates compiled query execution.

+# Installation
+## Requirements
+1. Recent version of Linux, Windows or MacOS, with recent C++ compiler that has C++17 (1z) support. (however c++20 is recommended if available for heterogeneous lookup on unordered containers)
+     - GCC: 9.0 or above (g++ 7.x, 8.x fail to handle fold-expressions due to a compiler bug)
+     - Clang: 5.0 or above (Recommended)
+     - MSVC: 2017 or later (2022 or above is recommended)
+
+2. Monetdb for Hybrid Engine
+   - On windows, the required libraries and headers are already included in the repo.
+   - On Linux, see [Monetdb Easy Setup](https://www.monetdb.org/easy-setup/) for instructions.
+   - On MacOS, Monetdb can be easily installed in homebrew `brew install monetdb`.
+
+3. Python 3.6 or above and install required packages in requirements.txt by `python3 -m pip install -r requirements.txt` 
+## Usage
+`python3 prompt.py` will launch the interactive command prompt. The server binary will be autometically rebuilt and started.
+#### Commands:
+- `<sql statement>`: parse AQuery statement
+- `f <filename>`: parse all AQuery statements in file
+- `dbg` start debugging session 
+- `print`: printout parsed AQuery statements
+
+- `xexec`: execute last parsed statement(s) with Hybrid Execution Engine. Hybrid Execution Engine decouples the query into two parts. The standard SQL (MonetDB dialect) part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
+- `save <OPTIONAL: filename>`: save current code snippet. will use random filename if not specified.
+- `exit`: quit the prompt
+- `exec`: execute last parsed statement(s) with AQuery Execution Engine (Old). AQuery Execution Engine executes query by compiling it to C++ code and then executing it.
+- `r`: run the last generated code snippet
+### Example:
+   `f moving_avg.a` <br>
+   `xexec`
+
+See ./tests/ for more examples. 
+
+## Notes for arm64 macOS users
+- In theory, AQuery++ could work on both native arm64 and x86_64 through Rosetta. But for maximum performance, running native is preferred. 
+- However, they can't be mixed up, i.e. make sure every component, `python` binary, `C++ compiler`, `monetdb` library and system commandline utilities such as `uname` should have the same architecture. 
+- Because I can't get access to an arm-based mac to fully test this setup, there might still be issues. Please open an issue if you encounter any problems.
+
 ## Architecture 
 ![Architecture](./docs/arch-hybrid.svg)

@@ -40,53 +77,15 @@ AQuery++ Database is a cross-platform, In-Memory Column-Store Database that inco

 ## Known Issues:

-- [x] User Module test
 - [ ] Interval based triggers
-- [x] Hot reloading server binary
+- [ ] Hot reloading server binary
 - [x] Bug fixes: type deduction misaligned in Hybrid Engine
 - [ ] Investigation: Using postproc only for q1 in Hybrid Engine (make is_special always on)
 - [x] Limitation: putting ColRefs back to monetdb. (Comparison)
 - [ ] C++ Meta-Programming: Eliminate template recursions as much as possible.
-- [x] Limitation: Date and Time, String operations, Funcs in groupby agg.
 - [ ] Functionality: Basic helper functions in aquery 
 - [ ] Improvement: More DDLs, e.g. drop table, update table, etc.
 - [ ] Bug: Join-Aware Column management
 - [ ] Bug: Order By after Group By


-# Installation
-## Requirements
-1. Recent version of Linux, Windows or MacOS, with recent C++ compiler that has C++17 (1z) support. (however c++20 is recommended if available for heterogeneous lookup on unordered containers)
-     - GCC: 9.0 or above (g++ 7.x, 8.x fail to handle fold-expressions due to a compiler bug)
-     - Clang: 5.0 or above (Recommended)
-     - MSVC: 2017 or later (2022 or above is recommended)
-
-2. Monetdb for Hybrid Engine
-   - On windows, the required libraries and headers are already included in the repo.
-   - On Linux, see [Monetdb Easy Setup](https://www.monetdb.org/easy-setup/) for instructions.
-   - On MacOS, Monetdb can be easily installed in homebrew `brew install monetdb`.
-
-3. Python 3.6 or above and install required packages in requirements.txt by `python3 -m pip install -r requirements.txt` 
-## Usage
-`python3 prompt.py` will launch the interactive command prompt. The server binary will be autometically rebuilt and started.
-#### Commands:
-- `<sql statement>`: parse AQuery statement
-- `f <filename>`: parse all AQuery statements in file
-- `dbg` start debugging session 
-- `print`: printout parsed AQuery statements
-
-- `xexec`: execute last parsed statement(s) with Hybrid Execution Engine. Hybrid Execution Engine decouples the query into two parts. The standard SQL (MonetDB dialect) part is executed by an Embedded version of Monetdb and everything else is executed by a post-process module which is generated by AQuery++ Compiler in C++ and then compiled and executed.
-- `save <OPTIONAL: filename>`: save current code snippet. will use random filename if not specified.
-- `exit`: quit the prompt
-- `exec`: execute last parsed statement(s) with AQuery Execution Engine (Old). AQuery Execution Engine executes query by compiling it to C++ code and then executing it.
-- `r`: run the last generated code snippet
-### Example:
-   `f moving_avg.a` <br>
-   `xexec`
-
-See ./tests/ for more examples. 
-
-## Notes for arm64 macOS users
-- In theory, AQuery++ could work on both native arm64 and x86_64 through Rosetta. But for maximum performance, running native is preferred. 
-- However, they can't be mixed up, i.e. make sure every component, `python` binary, `C++ compiler`, `monetdb` library and system commandline utilities such as `uname` should have the same architecture. 
-- Because I can't get access to an arm-based mac to fully test this setup, there might still be issues. Please open an issue if you encounter any problems.
--- a/aquery_config.py
+++ b/aquery_config.py
@@ -2,7 +2,7 @@

 ## GLOBAL CONFIGURATION FLAGS

-version_string = '0.4.5a'
+version_string = '0.4.6a'
 add_path_to_ldpath = True
 rebuild_backend = False
 run_backend = True
@@ -13,7 +13,7 @@ os_platform = 'unknown'
 build_driver = 'Makefile'

 def init_config():
-    global __config_initialized__, os_platform, msbuildroot
+    global __config_initialized__, os_platform, msbuildroot, build_driver
 ## SETUP ENVIRONMENT VARIABLES
    # __config_initialized__ = False
    #os_platform = 'unkown'
@@ -48,6 +48,7 @@ def init_config():
            vsloc = vswhere.find(prerelease = True, latest = True, prop = 'installationPath')
            if vsloc:
                msbuildroot = vsloc[0] + '/MSBuild/Current/Bin/MSBuild.exe'
+                build_driver = 'MSBuild'
            else:
                print('Warning: No Visual Studio installation found.')
            # print("adding path")
--- a/engine/types.py
+++ b/engine/types.py
@@ -112,7 +112,7 @@ VarcharT = Types(200, name = 'varchar', cname = 'const char*', sqlname='VARCHAR'
 VoidT = Types(200, name = 'void', cname = 'void', sqlname='Null', ctype_name = 'types::None')

 class VectorT(Types):
-    def __init__(self, inner_type : Types, vector_type:str = 'ColRef'):
+    def __init__(self, inner_type : Types, vector_type:str = 'vector_type'):
        self.inner_type = inner_type
        self.vector_type = vector_type
        
@@ -121,7 +121,7 @@ class VectorT(Types):
        return f'{self.vector_type}<{self.inner_type.name}>'
    @property
    def sqlname(self) -> str:
-        return 'BINARY'
+        return 'BIGINT'
    @property
    def cname(self) -> str:
        return self.name
--- a/reconstruct/init.py
+++ b/reconstruct/init.py
@@ -18,6 +18,8 @@ def generate(ast, cxt):
            ast_node.types[k](None, ast, cxt)

 def exec(stmts, cxt = None, keep = False):
+    if 'stmts' not in stmts:
+        return
    cxt = initialize(cxt, keep)
    stmts_stmts = stmts['stmts']
    if type(stmts_stmts) is list:
--- a/reconstruct/ast.py
+++ b/reconstruct/ast.py
@@ -2,6 +2,7 @@ from copy import deepcopy
 from dataclasses import dataclass
 from enum import Enum, auto
 from typing import Set, Tuple, Dict, Union, List, Optional
+
 from engine.types import *
 from engine.utils import enlist, base62uuid, base62alp, get_legal_name
 from reconstruct.storage import Context, TableInfo, ColRef
@@ -151,7 +152,8 @@ class projection(ast_node):
                        name = enlist(sql_expr.eval(False, y, count=count))
                        this_type = enlist(this_type)
                        proj_expr = enlist(proj_expr)
-                    for t, n, pexpr in zip(this_type, name, proj_expr):
+                    for t, n, pexpr, cp in zip(this_type, name, proj_expr, compound):
+                        t = VectorT(t) if cp else t
                        offset = len(col_exprs)
                        if n not in self.var_table:
                            self.var_table[n] = offset
@@ -285,11 +287,11 @@ class projection(ast_node):
                    val[2].cols_mentioned.intersection(
                        self.datasource.all_cols().difference(self.group_node.refs))
                    ) and val[2].is_compound # compound val not in key
-                    or 
-                    val[2].is_compound > 1
+                    # or 
+                    # val[2].is_compound > 1
                    # (not self.group_node and val[2].is_compound)
                    ):
-                    out_typenames[key] = f'ColRef<{out_typenames[key]}>'
+                    out_typenames[key] = f'vector_type<{out_typenames[key]}>'
                    self.out_table.columns[key].compound = True
        outtable_col_nameslist = ', '.join([f'"{c.name}"' for c in self.out_table.columns])
        self.outtable_col_names = 'names_' + base62uuid(4)
@@ -530,7 +532,7 @@ class groupby_c(ast_node):
                    materialize_builtin['_builtin_len'] = len_var
                if '_builtin_ret' in ex.udf_called.builtin_used:
                    define_len_var()
-                    gscanner.add(f'{ce[0]}.emplace_back({{{len_var}}});\n')
+                    gscanner.add(f'{ce[0]}.emplace_back({len_var});\n')
                    materialize_builtin['_builtin_ret'] = f'{ce[0]}.back()'
                    gscanner.add(f'{ex.eval(c_code = True, y=get_var_names, materialize_builtin = materialize_builtin)};\n')
                    continue
@@ -763,9 +765,37 @@ class create_table(ast_node):
        if self.context.use_columnstore:
            self.sql += ' engine=ColumnStore'
                    
+class drop(ast_node):
+    name = 'drop'
+    first_order = name
+    def produce(self, node):
+        node = node['drop']
+        tbl_name = node['table']
+        if tbl_name in self.context.tables_byname:
+            tbl_obj = self.context.tables_byname[tbl_name]
+            # TODO: delete in postproc engine
+            self.context.tables_byname.pop(tbl_name)
+            self.context.tables.remove(tbl_obj)
+            self.sql += 'TABLE IF EXISTS ' + tbl_name
+            return
+        elif 'if_exists' not in node or not node['if_exists']:
+            print(f'Error: table {tbl_name} not found.')
+        self.sql = ''
+        
 class insert(ast_node):
    name = 'insert'
    first_order = name
+    def init(self, node):
+        values = node['query']
+        complex_query_kw = ['from', 'where', 'groupby', 'having', 'orderby', 'limit']
+        if any([kw in values for kw in complex_query_kw]):
+            values['into'] = node['insert']
+            projection(None, values, self.context)
+            self.produce = lambda*_:None
+            self.spawn = lambda*_:None
+            self.consume = lambda*_:None
+        else:
+            super().init(node)
            
    def produce(self, node):
        values = node['query']['select']
@@ -773,6 +803,7 @@ class insert(ast_node):
        self.sql = f'INSERT INTO {tbl} VALUES('
        # if len(values) != table.n_cols:
        #     raise ValueError("Column Mismatch")
+
        list_values = []
        for i, s in enumerate(values):
            if 'value' in s:
--- a/reconstruct/storage.py
+++ b/reconstruct/storage.py
@@ -59,7 +59,7 @@ class TableInfo:
        cxt.tables_byname[self.table_name] = self # construct reverse map

    def add_cols(self, cols, new = True):
-        for c in cols:
+        for c in enlist(cols):
            self.add_col(c, new)
            
    def add_col(self, c, new = True):
--- a/server/aggregations.h
+++ b/server/aggregations.h
@@ -137,6 +137,7 @@ decayed_t<VT, types::GetLongType<T>> sums(const VT<T>& arr) {
 		ret[i] = ret[i-1] + arr[i];
 	return ret;
 }
+
 template<class T, template<typename ...> class VT>
 decayed_t<VT, types::GetFPType<types::GetLongType<T>>> avgs(const VT<T>& arr) {
 	const uint32_t& len = arr.size;
@@ -149,6 +150,7 @@ decayed_t<VT, types::GetFPType<types::GetLongType<T>>> avgs(const VT<T>& arr) {
 		ret[i] = (s+=arr[i])/(FPType)(i+1);
 	return ret;
 }
+
 template<class T, template<typename ...> class VT>
 decayed_t<VT, types::GetLongType<T>> sumw(uint32_t w, const VT<T>& arr) {
 	const uint32_t& len = arr.size;
@@ -162,6 +164,7 @@ decayed_t<VT, types::GetLongType<T>> sumw(uint32_t w, const VT<T>& arr) {
 		ret[i] = ret[i-1] + arr[i] - arr[i-w];
 	return ret;
 }
+
 template<class T, template<typename ...> class VT>
 decayed_t<VT, types::GetFPType<types::GetLongType<T>>> avgw(uint32_t w, const VT<T>& arr) {
 	typedef types::GetFPType<types::GetLongType<T>> FPType;
--- a/server/io.cpp
+++ b/server/io.cpp
@@ -265,16 +265,16 @@ string base62uuid(int l) {
 }


-template<typename _Ty>
-inline void vector_type<_Ty>::out(uint32_t n, const char* sep) const
-{
-	n = n > size ? size : n;
-	std::cout << '(';
-	{	
-		uint32_t i = 0;
-		for (; i < n - 1; ++i)
-			std::cout << this->operator[](i) << sep;
-		std::cout << this->operator[](i);
-	}
-	std::cout << ')';
-}
+// template<typename _Ty>
+// inline void vector_type<_Ty>::out(uint32_t n, const char* sep) const
+// {
+// 	n = n > size ? size : n;
+// 	std::cout << '(';
+// 	{	
+// 		uint32_t i = 0;
+// 		for (; i < n - 1; ++i)
+// 			std::cout << this->operator[](i) << sep;
+// 		std::cout << this->operator[](i);
+// 	}
+// 	std::cout << ')';
+// }
--- a/server/table.h
+++ b/server/table.h
@@ -129,7 +129,7 @@ public:
 	}

 	// defined in table_ext_monetdb.hpp
-	void* monetdb_get_col();
+	void* monetdb_get_col(void** gc_vecs, uint32_t& cnt);
 	
 };
 template<>
--- a/server/table_ext_monetdb.hpp
+++ b/server/table_ext_monetdb.hpp
@@ -22,7 +22,7 @@ inline constexpr monetdbe_types AQType_2_monetdbe[] = {
 #else 
 		monetdbe_int64_t,
 #endif
-		monetdbe_int16_t, monetdbe_int8_t, monetdbe_bool, monetdbe_int64_t,
+		monetdbe_int16_t, monetdbe_int8_t, monetdbe_bool, monetdbe_int128_t,
 		monetdbe_timestamp, monetdbe_int64_t, monetdbe_int64_t
 };

@@ -35,10 +35,13 @@ void TableInfo<Ts ...>::monetdb_append_table(void* srv, const char* alt_name) {
 	monetdbe_column** monetdbe_cols = new monetdbe_column * [sizeof...(Ts)];
 	
 	uint32_t i = 0;
+	constexpr auto n_vecs = count_vector_type((tuple_type*)(0));
+	void* gc_vecs[1 + n_vecs];
 	puts("getcols...");
-	const auto get_col = [&monetdbe_cols, &i, *this](auto v) {
+	uint32_t cnt = 0;
+	const auto get_col = [&monetdbe_cols, &i, *this, &gc_vecs, &cnt](auto v) {
 		printf("%d %d\n", i, (ColRef<void>*)v - colrefs);
-		monetdbe_cols[i++] = (monetdbe_column*)v->monetdb_get_col();
+		monetdbe_cols[i++] = (monetdbe_column*)v->monetdb_get_col(gc_vecs, cnt);
 	};
 	(get_col((ColRef<Ts>*)(colrefs + i)), ...);
 	puts("getcols done");
@@ -47,7 +50,7 @@ void TableInfo<Ts ...>::monetdb_append_table(void* srv, const char* alt_name) {
 		printf("no:%d name: %s count:%d data: %p type:%d \n", 
 		i, monetdbe_cols[i]->name, monetdbe_cols[i]->count, monetdbe_cols[i]->data, monetdbe_cols[i]->type);
 	}
-	std::string create_table_str = "CREATE TABLE ";
+	std::string create_table_str = "CREATE TABLE IF NOT EXISTS ";
 	create_table_str += alt_name;
 	create_table_str += " (";
 	i = 0;
@@ -70,12 +73,14 @@ void TableInfo<Ts ...>::monetdb_append_table(void* srv, const char* alt_name) {
 			return;
 		}
 	}
+	// for(uint32_t i = 0; i < n_vecs; ++i) 
+	// 		free(gc_vecs[i]);
 	puts("Error! Empty table.");
 }


 template<class Type>
-void* ColRef<Type>::monetdb_get_col() {
+void* ColRef<Type>::monetdb_get_col(void** gc_vecs, uint32_t& cnt) {
 	auto aq_type = AQType_2_monetdbe[types::Types<Type>::getType()];
 	monetdbe_column* col = (monetdbe_column*)malloc(sizeof(monetdbe_column));

@@ -83,7 +88,13 @@ void* ColRef<Type>::monetdb_get_col() {
 	col->count = this->size;
 	col->data = this->container;
 	col->name = const_cast<char*>(this->name);
-
+	// auto arr = (types::timestamp_t*) malloc (sizeof(types::timestamp_t)* this->size);
+	// if constexpr (is_vector_type<Type>){
+	// 	for(uint32_t i = 0; i < this->size; ++i){
+	// 		memcpy(arr + i, this->container + i, sizeof(types::timestamp_t));
+	// 	}
+	// 	gc_vecs[cnt++] = arr;
+	// }
 	return col;
 }

--- a/server/types.h
+++ b/server/types.h
@@ -29,7 +29,7 @@ namespace types {
 	static constexpr const char* printf_str[] = { "%d", "%f", "%s", "%lf", "%Lf", "%ld", "%d", "%hi", "%s", "%s", "%c",
 		"%u", "%lu", "%s", "%hu", "%hhu", "%s", "%s", "Vector<%s>", "%s", "NULL", "ERROR" };
 	static constexpr const char* SQL_Type[] = { "INT", "REAL", "TEXT", "DOUBLE", "DOUBLE", "BIGINT", "HUGEINT", "SMALLINT", "DATE", "TIME", "TINYINT",
-		"INT", "BIGINT", "HUGEINT", "SMALLINT", "TINYINT", "BOOL", "BIGINT", "TIMESTAMP", "NULL", "ERROR"};
+		"INT", "BIGINT", "HUGEINT", "SMALLINT", "TINYINT", "BOOL", "HUGEINT", "TIMESTAMP", "NULL", "ERROR"};
 	
 	
 	// TODO: deal with data/time <=> str/uint conversion
@@ -197,12 +197,12 @@ namespace types {
 struct astring_view {
 	const unsigned char* str = 0;
 	
-#if defined(__clang__) or !defined(__GNUC__)
+#if defined(__clang__) || !defined(__GNUC__)
 	constexpr 
 #endif
 	astring_view(const char* str) noexcept :
 		str((const unsigned char*)(str))  {}
-#if defined(__clang__) or !defined(__GNUC__)
+#if defined(__clang__) || !defined(__GNUC__)
 	constexpr
 #endif 
 	astring_view(const signed char* str) noexcept :
@@ -373,4 +373,10 @@ constexpr size_t count_type(std::tuple<Types...>* ts) {
    size_t t[] = {sum_type<Types, T1...>() ...};
    return sum_type(t, sizeof...(Types));
 }
+template<class ...Types> 
+constexpr size_t count_vector_type(std::tuple<Types...>* ts) {
+    size_t t[] = {is_vector_type<Types> ...};
+    return sum_type(t, sizeof...(Types));
+}
+
 #endif // !_TYPES_H
--- a/server/vector_type.hpp
+++ b/server/vector_type.hpp
@@ -12,11 +12,16 @@
 #include <iterator>
 #include <initializer_list>
 #include <unordered_set>
+#include <iostream>
 #include "hasher.h"
 #include "types.h"

 #pragma pack(push, 1)
 template <typename _Ty>
+class slim_vector {
+
+};
+template <typename _Ty>
 class vector_type {
 public:
 	typedef vector_type<_Ty> Decayed_t;
@@ -249,7 +254,25 @@ public:
 		}
 		size = this->size + dist;
 	}
-	void out(uint32_t n = 4, const char* sep = " ") const;
+	inline void out(uint32_t n = 4, const char* sep = " ") const
+	{
+		const char* more = "";
+		if (n < this->size)
+			more = " ... ";
+		else
+			n = this->size;
+
+		std::cout << '(';
+		if (n > 0)
+		{
+			uint32_t i = 0;
+			for (; i < n - 1; ++i)
+				std::cout << this->operator[](i) << sep;
+			std::cout << this->operator[](i);
+		}
+		std::cout<< more;
+		std::cout << ')';
+	}
 	vector_type<_Ty> subvec_memcpy(uint32_t start, uint32_t end) const {
 		vector_type<_Ty> subvec(end - start);
 		memcpy(subvec.container, container + start, sizeof(_Ty) * (end - start));
--- a/test.aquery
+++ b/test.aquery
--- a/tests/network.a
+++ b/tests/network.a
@@ -10,4 +10,3 @@ FROM	 network
 	     ASSUMING ASC src, ASC dst, ASC _time 
 GROUP BY src, dst, sums (deltas(_time) > 120)

-
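
Note on the rewritten `vector_type::out` in server/vector_type.hpp: the old body looped up to `n - 1` with an unsigned `n`, which wraps around when the vector is empty. The new body guards with `n > 0` and marks truncated output with " ... ". The Python below is a rough transliteration of that formatting logic for illustration only; `format_vector` is a hypothetical name, and it returns the string rather than printing it.

```python
from typing import Sequence


def format_vector(values: Sequence, n: int = 4, sep: str = ' ') -> str:
    """Format at most n leading elements, flagging truncation with ' ... '."""
    more = ''
    if n < len(values):
        more = ' ... '      # some elements are left out
    else:
        n = len(values)     # never read past the end
    # Joining an empty slice yields '', so the empty case needs no special loop.
    shown = sep.join(str(v) for v in values[:n])
    return '(' + shown + more + ')'
```

For example, `format_vector([1, 2, 3, 4, 5, 6])` gives `(1 2 3 4 ... )`, while an empty input gives `()`.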