query_parser.query_parser

QueryFormat Objects

class QueryFormat(Enum)

Enum for the different supported query-formats.

CROSS_PRODUCT: SELECT COUNT(*) FROM movie_companies mc,title t,movie_info_idx mi_idx WHERE t.id=mc.movie_id AND t.id=mi_idx.movie_id AND mi_idx.info_type_id=112 AND mc.company_type_id=2;

JOIN_ON: SELECT COUNT(*) FROM movie_companies mc INNER JOIN title t ON (t.id=mc.movie_id) INNER JOIN movie_info_idx mi_idx ON (t.id=mi_idx.movie_id) WHERE mi_idx.info_type_id=112 AND mc.company_type_id=2;
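The two formats differ only in where the join conditions are written: in the WHERE clause (CROSS_PRODUCT) or in ON clauses (JOIN_ON). A minimal sketch of the enum follows. The member names come from this page, but the member values and the helper `guess_query_format` are assumptions made for illustration and are not part of the documented API.

```python
from enum import Enum, auto


class QueryFormat(Enum):
    # Member names are taken from the documentation; the values are assumed.
    CROSS_PRODUCT = auto()
    JOIN_ON = auto()


def guess_query_format(query: str) -> QueryFormat:
    """Hypothetical helper: guess the format from the presence of INNER JOIN."""
    if " INNER JOIN " in query.upper():
        return QueryFormat.JOIN_ON
    return QueryFormat.CROSS_PRODUCT
```

In the documented API the format is passed explicitly through the query_format parameter, so a guess like this would only be a convenience.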

QueryParser Objects

class QueryParser()

Class for the query_parser. It is responsible for reading a given file (.csv/.tsv or .sql) which contains sql queries (for more details see the Readme), parsing them, and producing a .yaml file containing the aggregated information of the input file. This aggregated .yaml file is required by the MetaCollector.

operators

The possible operators which can occur in the queries. ["<=", "!=", ">=", "=", "<", ">", "IS"]
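The order of the list matters if predicates are split by simple substring search: the two-character operators have to be tried before "=", "<" and ">". The following is a minimal sketch of such a split, assuming predicates of the form `attribute OP value`; the helper name `split_predicate` is hypothetical and the real parser may work differently.

```python
from typing import Optional, Tuple

# Same operators, in the same order, as listed above.
OPERATORS = ["<=", "!=", ">=", "=", "<", ">", "IS"]


def split_predicate(predicate: str) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: split 'attr OP value' at the first matching operator."""
    for op in OPERATORS:
        # Word operators such as IS need surrounding spaces so that they
        # do not match inside attribute names.
        token = f" {op} " if op.isalpha() else op
        if token in predicate:
            left, right = predicate.split(token, 1)
            return left.strip(), op, right.strip()
    return None
```

Because "!=" is tried before "=", a predicate such as `a != 3` is split at the full operator. The sketch does not handle values that themselves contain operator characters.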

read_file

def read_file(file_path: str, inner_separator: str = None, outer_separator: str = None, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Tuple[Dict, str, str, str]

Generic method for routing the processing of the input file, which contains the queries, according to the given file type. This is necessary because .csv/.tsv files need to be processed differently from .sql files. The parameters inner_separator and outer_separator allow the user to use customized .csv/.tsv files. The parameter query_format allows the user to choose between the two most common join formats.

Arguments:

  • file_path: Path to the file containing the sql statements. This path has to end with .csv/.tsv or .sql. No other file types are supported at the moment. This path could be absolute as well as relative.
  • inner_separator: The column separator used in the file. You can use '\t' for .tsv files. See documentation for details.
  • outer_separator: The block separator used in the file. See documentation for details.
  • query_format: The format of the sql query. Look at documentation of QueryFormat for details.

Returns:

A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
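The routing by file type can be sketched as a check on the file extension. This is only an illustration under the assumption that the extension alone decides the processing path; `detect_file_type` is a hypothetical helper, not a documented method.

```python
import os


def detect_file_type(file_path: str) -> str:
    """Hypothetical helper: return 'csv', 'tsv' or 'sql' based on the extension."""
    extension = os.path.splitext(file_path)[1].lower().lstrip(".")
    if extension not in ("csv", "tsv", "sql"):
        raise ValueError(f"Unsupported file type: {file_path!r}")
    return extension
```

With such a helper, 'sql' would lead to read_sql_file and 'csv'/'tsv' to read_csv_file.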

read_sql_file

@staticmethod
def read_sql_file(file_path: str, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Tuple[Dict, str, str, str]

Read and parse the sql statements from the given sql file. Most parts of the sql syntax are processed and removed; for example, 'SELECT COUNT(*)' and 'INNER JOIN' are removed from the query.

Arguments:

  • file_path: Path to the file containing the sql statements. This path has to end with .sql. No other file types are supported at the moment. This path could be absolute as well as relative.
  • query_format: The format of the sql query. Look at documentation of QueryFormat for details.

Returns:

A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
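To illustrate what stripping the sql syntax can look like, here is a minimal sketch for a single CROSS_PRODUCT query. It assumes upper-case keywords, single spaces around them, and no AND inside a predicate (for example no BETWEEN); `parse_cross_product` is a hypothetical helper and not the library's implementation.

```python
from typing import List, Tuple


def parse_cross_product(query: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (sorted table-string, list of where clauses)."""
    body = query.strip().rstrip(";")
    # Drop the projection; only the FROM and WHERE parts carry information here.
    body = body.replace("SELECT COUNT(*) FROM ", "", 1)
    from_part, _, where_part = body.partition(" WHERE ")
    # Sort the 'table alias' entries so equal table sets give equal keys.
    tables = sorted(table.strip() for table in from_part.split(","))
    predicates = [p.strip() for p in where_part.split(" AND ")] if where_part else []
    return ",".join(tables), predicates
```

The sorted table-string can then serve as a dictionary key under which the where clauses of all queries over the same tables are collected.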

read_csv_file

@staticmethod
def read_csv_file(file_path: str, inner_separator: str = ",", outer_separator: str = "#") -> Tuple[Dict, str, str, str]

Read the csv-formatted sql statements from the given file. For more details on the format, see the Readme.

Arguments:

  • file_path: Path to the file containing the sql statements formatted as .csv or .tsv. This path has to end with .csv or .tsv. No other file types are supported at the moment. This path could be absolute as well as relative.
  • inner_separator: The column separator used in the file. You can use '\t' for .tsv files. See documentation for details.
  • outer_separator: The block separator used in the file. See documentation for details.

Returns:

A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
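The interplay of the two separators can be sketched as a two-level split: the outer_separator divides a line into blocks, and the inner_separator divides each block into fields. The concrete block layout used in the example line (tables, joins, selections) is an assumption for illustration; the Readme defines the actual format. `split_csv_line` is a hypothetical helper.

```python
from typing import List


def split_csv_line(line: str, inner_separator: str = ",",
                   outer_separator: str = "#") -> List[List[str]]:
    """Hypothetical helper: split one line into blocks, and each block into fields."""
    blocks = line.rstrip("\n").split(outer_separator)
    # An empty block (nothing between two outer separators) becomes an empty list.
    return [block.split(inner_separator) if block else [] for block in blocks]


example = "movie_companies mc,title t#t.id=mc.movie_id#mc.company_type_id,=,2"
blocks = split_csv_line(example)
```

For a .tsv file the same helper would be called with `inner_separator="\t"`.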

create_solution_dict

def create_solution_dict(command_dict: Dict[str, List[str] or List[Tuple[str, str]]], file_type: str, inner_separator: str) -> Dict[int, Dict[str, List[str or Tuple[str, str]]]]

Method for building the solution dict. Before calling it, the given file with the queries must be parsed and the command_dict must be created.

Arguments:

  • command_dict: Dict with an alphabetically sorted string of the joining tables as key and, as value, a list of where clauses as strings if the file type is sql, or a list of tuples containing the join-attribute-string in first and the selection-attribute-string in second place.
  • file_type: String with 'csv'/'tsv' or 'sql' which tells the file type of the read file.
  • inner_separator: The column separator used in the file. You can use '\t' for .tsv files. See documentation for details.

Returns:

The solution dict containing 'table_names', 'join_attributes' and 'selection_attributes'.

table_name_unpacker

@staticmethod
def table_name_unpacker(from_string: str, separator: str = ",") -> List[Tuple[str, str]]

Takes the sorted string of the from clause and extracts the tables with their aliases.

Arguments:

  • from_string: Alphabetically ordered string containing all tables to join, separated by the separator.
  • separator: The column separator used in the file. You can use '\t' for .tsv files.

Returns:

List of tuples where the first element of the tuple is the table name and the second one is the alias.
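A minimal sketch of this unpacking, assuming each entry has the form `table alias` as in the QueryFormat examples. The function name `unpack_table_names` and the fallback for entries without an alias are assumptions, not documented behaviour.

```python
from typing import List, Tuple


def unpack_table_names(from_string: str, separator: str = ",") -> List[Tuple[str, str]]:
    """Sketch: turn 'table alias' entries into (table, alias) tuples."""
    pairs = []
    for entry in from_string.split(separator):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries, e.g. from a trailing separator
        # Assumption: an entry without an explicit alias uses the table name as alias.
        pairs.append((parts[0], parts[-1]))
    return pairs
```

For example, `unpack_table_names("movie_companies mc,title t")` gives `[("movie_companies", "mc"), ("title", "t")]`.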

sql_attribute_unpacker

def sql_attribute_unpacker(where_string_list: List[str]) -> Tuple[List[str], List[str]]

Unpack the attribute strings from sql-file into sets containing the attributes.

Arguments:

  • where_string_list: A list of strings from the where clauses. These have to be separated into join- and selection-attributes.

Returns:

A tuple containing the list of join-attributes in first and the list of selection-attributes in second place.
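One way to separate the two kinds of where clauses is to treat a clause as a join when both sides of an "=" are `alias.column` references, and as a selection otherwise. The sketch below follows that rule. It is an assumption about the criterion (only equi-joins are recognised), and `separate_predicates` is a hypothetical helper that keeps whole clauses rather than extracting attribute names.

```python
import re
from typing import List, Tuple

# An 'alias.column' reference such as 't.id'.
COLUMN_REF = re.compile(r"^[A-Za-z_]\w*\.[A-Za-z_]\w*$")


def separate_predicates(where_string_list: List[str]) -> Tuple[List[str], List[str]]:
    """Sketch: return (join clauses, selection clauses)."""
    joins: List[str] = []
    selections: List[str] = []
    for predicate in where_string_list:
        left, sep, right = predicate.partition("=")
        is_join = (bool(sep)
                   and COLUMN_REF.match(left.strip()) is not None
                   and COLUMN_REF.match(right.strip()) is not None)
        (joins if is_join else selections).append(predicate)
    return joins, selections
```

Numeric and quoted values never match the column pattern, so clauses like `mc.company_type_id=2` end up in the selection list.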

csv_attribute_unpacker

def csv_attribute_unpacker(attribute_tuples: List[Tuple[str, str]], separator: str = ",") -> Tuple[List[str], List[str]]

Unpack the attribute strings from csv-file into sets containing the attributes.

Arguments:

  • attribute_tuples: A list of tuples of strings where the first string is the string for all join-attributes, while the second string contains all selection-attributes.
  • separator: The column separator used in the file. You can use '\t' for .tsv files.

Returns:

A tuple containing the list of join-attributes in first and the list of selection-attributes in second place.
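A minimal sketch of the csv variant follows. It assumes that the join string holds complete join conditions separated by the separator, and that the selection string repeats the triple attribute, operator, value. Both layouts are assumptions for illustration (the Readme defines the real format), and `unpack_csv_attributes` is a hypothetical name.

```python
from typing import List, Tuple


def unpack_csv_attributes(attribute_tuples: List[Tuple[str, str]],
                          separator: str = ",") -> Tuple[List[str], List[str]]:
    """Sketch: collect distinct join conditions and selection attributes."""
    joins: List[str] = []
    selections: List[str] = []
    for join_string, selection_string in attribute_tuples:
        for join in filter(None, join_string.split(separator)):
            if join not in joins:
                joins.append(join)
        fields = selection_string.split(separator) if selection_string else []
        # Assumed layout: attribute, operator, value repeated; keep only the attribute.
        for attribute in fields[0::3]:
            if attribute not in selections:
                selections.append(attribute)
    return joins, selections
```

Duplicates are dropped while the order of first occurrence is kept, which gives set-like content in list form.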

save_solution_dict

@staticmethod
def save_solution_dict(solution_dict: Dict[int, Dict[str, List[str or Tuple[str, str]]]], save_file_path: str = "solution_dict")

Save the solution to file with specified filename.

Arguments:

  • solution_dict: The dict containing the data to save.
  • save_file_path: The path for the file in which the data should be saved. The .yaml ending is added automatically.

run

def run(file_path: str, save_file_path: str, inner_separator: str = None, outer_separator: str = None, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Dict[int, Dict[str, List[str or Tuple[str, str]]]]

Method for the whole parsing process.

Arguments:

  • file_path: The file to read in which the sql-statements are saved.
  • save_file_path: The path where to save the results.
  • inner_separator: The column separator used in the file. You can use '\t' for .tsv files. See documentation for details.
  • outer_separator: The block separator used in the file. See documentation for details.
  • query_format: The indicator for the format of the .sql query-file. If the given file is not .sql, then this is not used.

Returns:
