vectorizer.vectorizer

Vectorizer Objects

class Vectorizer()

Constructs a vector consisting of an operator code and a normalized value for each predicate in the SQL query set with the set_query method.
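A minimal sketch of the per-predicate layout described above. It assumes the operator code is one-hot encoded and that each value has already been normalized to [0, 1]. The operator list and the function vectorize_query are hypothetical and not taken from the Vectorizer source.

```python
from typing import Dict, Tuple

import numpy as np

# Hypothetical operator list; the real Vectorizer defines its own in __init__.
OPERATORS = ["=", "<", ">", "<=", ">=", "!="]


def vectorize_query(predicates: Dict[str, Tuple[str, float]]) -> np.ndarray:
    """Concatenate [one-hot operator code, normalized value] for each predicate.

    `predicates` maps a column name to (operator, value already scaled to [0, 1]).
    Columns are visited in lexicographic order so that every query of a set
    uses the same vector layout.
    """
    parts = []
    for column in sorted(predicates):
        op, normalized = predicates[column]
        code = np.zeros(len(OPERATORS))
        code[OPERATORS.index(op)] = 1.0  # one-hot operator code (assumption)
        parts.append(np.append(code, normalized))
    return np.concatenate(parts) if parts else np.zeros(0)
```

Each predicate contributes len(OPERATORS) + 1 entries, so a query with two predicates yields a vector of length 14 under these assumptions.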

__init__

def __init__()

Initialises the Vectorizer object by defining the available operators.
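One reason the set of operators has to be fixed up front is that predicate strings must be split unambiguously. The sketch below is hypothetical (the class VectorizerSketch, its attributes and the operator list are assumptions): it keeps the declared operator order for assigning codes but matches longer operators first, so ">=" is never read as ">" followed by "=30".

```python
import re
from typing import Tuple


class VectorizerSketch:
    """Hypothetical stand-in for Vectorizer.__init__; names are assumptions."""

    def __init__(self) -> None:
        # Declared order determines the operator code.
        self.operators = ["=", "<", ">", "<=", ">=", "!="]
        self.operator_codes = {op: i for i, op in enumerate(self.operators)}
        # Match multi-character operators first (stable sort keeps ties in order).
        by_length = sorted(self.operators, key=len, reverse=True)
        self._pattern = re.compile(
            r"^\s*(\w+(?:\.\w+)?)\s*("
            + "|".join(map(re.escape, by_length))
            + r")\s*(.+?)\s*$"
        )

    def split_predicate(self, predicate: str) -> Tuple[str, str, str]:
        """Split 'column op value' into its three parts."""
        match = self._pattern.match(predicate)
        if match is None:
            raise ValueError(f"unsupported predicate: {predicate!r}")
        return match.group(1), match.group(2), match.group(3)
```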

add_queries_with_cardinalities

def add_queries_with_cardinalities(queries_with_cardinalities_path: str)

Reads a CSV file with the format (querySetID;query;encodings;max_card;min_max_step;estimated_cardinality;true_cardinality), where min_max_step is an array of the format [[1, 2, 1], [1, 113, 1], [1878, 2115, 1]] sorted by the lexicographic order of the corresponding predicates, and encodings is an empty array if only integer values are processed. For each querySetID, all predicates are collected and sorted in lexicographic order to provide the correct indices (e.g. in encodings and min_max_step) for a given predicate. The queries read are added to the list of vectorization tasks.

Arguments:

  • queries_with_cardinalities_path: path to a CSV file containing all queries and their estimated and true cardinalities
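A minimal sketch of the reading and index-alignment step. It assumes the file has a header row with the column names listed above, that list-valued columns are stored as Python literals, and that predicates appear as "column op value" terms after WHERE. read_query_sets is a hypothetical helper that takes the file contents as a string; it is not the method itself.

```python
import ast
import csv
import io
import re
from collections import defaultdict
from typing import Dict, List

# Assumption: a predicate starts with a (possibly table-qualified) column name
# followed by one of these comparison operators.
PREDICATE_COLUMN = re.compile(r"([\w.]+)\s*(?:<=|>=|!=|=|<|>)")


def read_query_sets(text: str) -> Dict[str, dict]:
    rows_by_set: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        # List-valued columns are parsed as Python literals (assumption).
        row["encodings"] = ast.literal_eval(row["encodings"])
        row["min_max_step"] = ast.literal_eval(row["min_max_step"])
        rows_by_set[row["querySetID"]].append(row)

    result = {}
    for set_id, rows in rows_by_set.items():
        # Collect every predicate column of the query set and sort it, so the
        # i-th column lines up with the i-th entry of min_max_step.
        columns = set()
        for row in rows:
            where = row["query"].split("WHERE", 1)[-1]
            columns.update(PREDICATE_COLUMN.findall(where))
        ordered = sorted(columns)
        result[set_id] = {
            "rows": rows,
            # Assumption: min_max_step is identical for all rows of a set.
            "min_max_step": dict(zip(ordered, rows[0]["min_max_step"])),
        }
    return result
```

Sorting the predicate names once per query set is what lets an entry of min_max_step (or of encodings) be looked up by predicate name instead of by position.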

vectorize

def vectorize() -> List[np.array]

Vectorizes all added vectorization tasks.

Returns:

List of np.array vectors, where each row contains the vectorized query and the appended maximal, …
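A minimal sketch of how each returned row could be built, assuming the appended values are per-query numbers such as the max_card, estimated_cardinality and true_cardinality columns of the input CSV. The helper assemble_rows is hypothetical and not part of the Vectorizer API.

```python
from typing import List, Sequence

import numpy as np


def assemble_rows(query_vectors: Sequence[np.ndarray],
                  extras: Sequence[Sequence[float]]) -> List[np.ndarray]:
    """Append per-query extra values (assumed: cardinalities) to each query vector."""
    if len(query_vectors) != len(extras):
        raise ValueError("one set of extra values is needed per query vector")
    return [
        np.concatenate([np.asarray(v, dtype=float), np.asarray(e, dtype=float)])
        for v, e in zip(query_vectors, extras)
    ]
```

If all rows have the same length, np.vstack(rows) turns the list into a single 2-D matrix, which is convenient for saving.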

save

def save(base_path: str, result_folder: str, base_filename: str, filetypes: str)

Stores the SQL queries and the corresponding vectors at the given path as NPY and TXT files.

Arguments:

  • base_path: path to a directory for saving
  • result_folder: name of the folder to create for storing multiple files. This argument is separated from base_path to emphasize the need for an extra folder, since multiple files are saved.
  • base_filename: filename without file type. The querySetID is appended for differentiation.
  • filetypes: string of file types; must contain "csv" or "npy"
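A minimal sketch of the storage logic, assuming one 2-D array per querySetID and a file name of the form <base_filename>_<querySetID>. The function save_vectors and the underscore separator are assumptions; only np.save and np.savetxt from NumPy are used.

```python
import os
from typing import Dict, List

import numpy as np


def save_vectors(vectors_by_set: Dict[str, np.ndarray], base_path: str,
                 result_folder: str, base_filename: str, filetypes: str) -> List[str]:
    """Write one file per querySetID and file type; return the written paths."""
    if "csv" not in filetypes and "npy" not in filetypes:
        raise ValueError('filetypes must contain "csv" or "npy"')
    # Separate result folder, because several files are written per call.
    folder = os.path.join(base_path, result_folder)
    os.makedirs(folder, exist_ok=True)
    written = []
    for set_id, matrix in vectors_by_set.items():
        stem = os.path.join(folder, f"{base_filename}_{set_id}")  # naming is an assumption
        if "npy" in filetypes:
            np.save(stem + ".npy", matrix)
            written.append(stem + ".npy")
        if "csv" in filetypes:
            np.savetxt(stem + ".csv", matrix, delimiter=";")
            written.append(stem + ".csv")
    return written
```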

vectorize_query_original

def vectorize_query_original(query: str, min_max: Dict[str, Tuple[int, int, int]], encoders: List[Dict[str, int]]) -> np.array

Method copied from the original implementation for testing purposes; the only addition is join detection.

Arguments:

  • query: the query to vectorize
  • min_max: dictionary of the min, max and step values for each predicate
  • encoders: dictionary, which maps predicates to encoders

Returns:

the normalized vector without cardinalities
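The role of the min_max triples and the encoders can be illustrated with a hypothetical helper, normalize_value. It assumes plain min-max scaling and a per-column Dict[str, int] encoder for non-integer values; the original implementation may scale differently (for example by also using the step value).

```python
from typing import Dict, Optional, Tuple, Union


def normalize_value(column: str, raw: Union[int, str],
                    min_max: Dict[str, Tuple[int, int, int]],
                    encoder: Optional[Dict[str, int]] = None) -> float:
    """Map a predicate value to [0, 1] using the column's (min, max, step) triple."""
    # Non-integer values are first mapped to integers by the encoder (assumption).
    value = encoder[raw] if encoder is not None else int(raw)
    low, high, _step = min_max[column]
    if high == low:
        return 0.0  # degenerate column: every value maps to 0
    return (value - low) / (high - low)
```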

vectorizer_tests

def vectorizer_tests()

Test method that compares the original implementation with the Jupyter notebook output (ground truth) or with the Vectorizer implementation. It succeeds if no assertion fails.
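A comparison of this kind can be sketched with numpy.testing.assert_allclose, which raises an AssertionError when two arrays differ in shape or beyond a tolerance. The helper assert_same_vectors and its default tolerance are assumptions.

```python
from typing import Callable, Iterable

import numpy as np


def assert_same_vectors(reference: Callable[[str], np.ndarray],
                        candidate: Callable[[str], np.ndarray],
                        queries: Iterable[str], atol: float = 1e-9) -> None:
    """Raise AssertionError on the first query whose two vectors differ."""
    for query in queries:
        np.testing.assert_allclose(
            candidate(query), reference(query), rtol=0, atol=atol,
            err_msg=f"vectors differ for query {query!r}",
        )
```

Using an absolute tolerance instead of exact equality avoids spurious failures from floating-point rounding when the two implementations normalize values in a slightly different order.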
