vectorizer.vectorizer
Vectorizer Objects
class Vectorizer()
Constructs a vector consisting of an operator code and a normalized value for each predicate in the SQL query that was set with the set_query method.
__init__
def __init__()
Initialises the Vectorizer object by defining the available operators.
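To illustrate the class description, the sketch below encodes one predicate as an operator code followed by a normalized value. The operator table, the three-bit code layout, the normalization formula, and the name encode_predicate are all assumptions made for this example; this documentation does not show which operators __init__ defines or how values are normalized.

```python
from typing import Dict, List, Tuple

# Hypothetical operator table; the real set defined in __init__ is not documented here.
OPERATORS: Dict[str, List[int]] = {
    "=": [0, 0, 1],
    ">": [0, 1, 0],
    "<": [1, 0, 0],
    "<=": [1, 0, 1],
    ">=": [0, 1, 1],
    "!=": [1, 1, 0],
}


def encode_predicate(operator: str, value: int,
                     min_max_step: Tuple[int, int, int]) -> List[float]:
    """Return operator code + normalized value for one predicate (illustrative only)."""
    minimum, maximum, step = min_max_step
    # Assumed normalization: shift by one step so the minimum maps to a value above 0
    normalized = (value - minimum + step) / (maximum - minimum + step)
    return [float(bit) for bit in OPERATORS[operator]] + [normalized]


# One predicate "year > 1990" with bounds taken from the min_max_step example format
vector = encode_predicate(">", 1990, (1878, 2115, 1))
```

A full query vector would then be the concatenation of such per-predicate blocks in lexicographic predicate order, if the implementation follows this layout.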
add_queries_with_cardinalities
def add_queries_with_cardinalities(queries_with_cardinalities_path: str)
Reads a CSV file with the format (querySetID;query;encodings;max_card;min_max_step;estimated_cardinality;true_cardinality), where min_max_step is an array of the form [[1, 2, 1], [1, 113, 1], [1878, 2115, 1]], sorted by the lexicographic order of the corresponding predicates, and encodings is an empty array if only integer values are processed. For each querySetID, all predicates are collected and sorted in lexicographic order so that a given predicate maps to the correct index (e.g. in encodings and min_max_step). The queries read are added to the list of vectorization tasks.
Arguments:
- queries_with_cardinalities_path: path to a CSV file containing all queries and their estimated and true cardinalities
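The semicolon-separated layout described above can be read with the standard library alone. The following is a minimal sketch, not the library's reader: the sample row is made up, the helper read_rows is hypothetical, the array columns are assumed to be JSON-compatible, and the predicate names are supplied by hand rather than extracted from the query.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, List

# Made-up sample using the documented column layout; not real data.
SAMPLE = (
    "querySetID;query;encodings;max_card;min_max_step;"
    "estimated_cardinality;true_cardinality\n"
    "0;SELECT COUNT(*) FROM t WHERE year > 1990 AND kind = 2;[];1000;"
    "[[1, 113, 1], [1878, 2115, 1]];120;100\n"
)


def read_rows(handle) -> Dict[int, List[dict]]:
    """Group parsed rows by querySetID (illustrative reader only)."""
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=";"):
        # Assumption: the array-valued columns are JSON-compatible literals
        row["min_max_step"] = json.loads(row["min_max_step"])
        row["encodings"] = json.loads(row["encodings"])
        grouped[int(row["querySetID"])].append(row)
    return grouped


rows = read_rows(io.StringIO(SAMPLE))

# Sorting the predicates lexicographically aligns them with the min_max_step entries
predicates = sorted(["year", "kind"])
bounds = dict(zip(predicates, rows[0][0]["min_max_step"]))
```

Here bounds maps "kind" to [1, 113, 1] and "year" to [1878, 2115, 1], which is the index alignment the lexicographic sort is meant to guarantee.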
vectorize
def vectorize() -> List[np.array]
Vectorizes all vectorization tasks added.
Returns:
List of np.array vectors, where each row contains the vectorized query and appended maximal,
save
def save(base_path: str, result_folder: str, base_filename: str, filetypes: str)
Stores the SQL query and corresponding vector at given path as NPY and TXT file.
Arguments:
- base_path: path to a directory for saving
- result_folder: name of the folder to create for storing multiple files. This argument is separated from base_path to emphasize the need for an extra folder, since multiple files are saved.
- base_filename: filename without file type. The querySetID is appended for differentiation.
- filetypes: string of file types; must contain "csv" or "npy"
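The interplay of the four arguments can be sketched as a path-building helper. This is only an illustration under assumptions: the function output_paths, the underscore separator before the querySetID, and the substring check on filetypes are hypothetical and may differ from what save actually does.

```python
import os
from typing import List


def output_paths(base_path: str, result_folder: str, base_filename: str,
                 filetypes: str, query_set_id: int) -> List[str]:
    """Build one output path per requested file type (illustrative naming only)."""
    folder = os.path.join(base_path, result_folder)
    paths = []
    for extension in ("csv", "npy"):
        # filetypes is a plain string, so a substring check is used here
        if extension in filetypes:
            filename = f"{base_filename}_{query_set_id}.{extension}"
            paths.append(os.path.join(folder, filename))
    if not paths:
        raise ValueError('filetypes must contain "csv" or "npy"')
    return paths


# Example: both file types requested for querySetID 0
paths = output_paths("results", "vectors", "queries", "csv, npy", 0)
```

Keeping result_folder separate from base_path, as the argument description suggests, makes it explicit that a dedicated folder receives one file per querySetID and file type.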
vectorize_query_original
def vectorize_query_original(query: str, min_max: Dict[str, Tuple[int, int, int]], encoders: List[Dict[str, int]]) -> np.array
Method copied from the original implementation for testing purposes; only join detection was added.
Arguments:
- query: the query to vectorize
- min_max: dictionary of all min, max, step values for each predicate
- encoders: dictionary, which maps predicates to encoders
Returns:
the normalized vector without cardinalities
vectorizer_tests
def vectorizer_tests()
Test method that compares the original implementation with the Jupyter notebook output (ground truth) or with the Vectorizer implementation. Succeeds if no assertion fails.