query_parser.query_parser
QueryFormat Objects
class QueryFormat(Enum)
Enum for the different supported query formats, shown here with the same query written in each format:
CROSS_PRODUCT: SELECT COUNT(*) FROM movie_companies mc,title t,movie_info_idx mi_idx WHERE t.id=mc.movie_id AND t.id=mi_idx.movie_id AND mi_idx.info_type_id=112 AND mc.company_type_id=2;
JOIN_ON: SELECT COUNT(*) FROM movie_companies mc INNER JOIN title t ON (t.id=mc.movie_id) INNER JOIN movie_info_idx mi_idx ON (t.id=mi_idx.movie_id) WHERE mi_idx.info_type_id=112 AND mc.company_type_id=2;
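Both formats express the same query. In JOIN_ON the join predicates sit in the ON clauses, while in CROSS_PRODUCT they are part of the WHERE clause. The following is a minimal sketch, not the parser's own code, that rewrites a simple JOIN_ON query into the CROSS_PRODUCT form. The function name `join_on_to_cross_product` is hypothetical. It assumes only plain `INNER JOIN ... ON (...)` joins without nested parentheses and a single `SELECT ... FROM ... WHERE ...` statement.

```python
import re


def join_on_to_cross_product(query: str) -> str:
    """Hypothetical helper: rewrite INNER JOIN ... ON (...) into FROM a,b WHERE ...

    Assumes plain INNER JOINs whose ON (...) conditions contain no nested parentheses.
    """
    query = query.strip().rstrip(";")
    head, _, where = query.partition(" WHERE ")
    select, _, from_part = head.partition(" FROM ")
    # Collect the join predicates from the ON (...) clauses.
    join_conditions = re.findall(r"ON \((.*?)\)", from_part)
    # Drop the ON clauses and split the rest into "table alias" entries.
    tables = re.sub(r"\s+ON \(.*?\)", "", from_part).split(" INNER JOIN ")
    predicates = join_conditions + ([where] if where else [])
    result = f"{select} FROM {','.join(t.strip() for t in tables)}"
    if predicates:
        result += " WHERE " + " AND ".join(predicates)
    return result + ";"
```

Applied to the JOIN_ON example above, this returns exactly the CROSS_PRODUCT example string: the join predicates are placed in front of the original WHERE predicates.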
QueryParser Objects
class QueryParser()
Class for the query parser. It is responsible for reading a given file (.csv/.tsv or .sql) that contains SQL queries (see the Readme for details), parsing them and writing a .yaml file containing the aggregated information of the input file. This aggregated .yaml file is the input required by the MetaCollector.
operators
The operators that can occur in the queries: ["<=", "!=", ">=", "=", "<", ">", "IS"]
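If predicates are split by testing the operators one after another, the order of this list matters: two-character operators such as `<=` must be tried before `=` and `<`, otherwise `a<=5` would be split into `a<`, `=`, `5`. A minimal sketch of such a split, assuming a simple substring check; `split_predicate` is a hypothetical name, not part of the documented API.

```python
from typing import Tuple

# Same order as the documented operator list: two-character operators come first.
OPERATORS = ["<=", "!=", ">=", "=", "<", ">", "IS"]


def split_predicate(predicate: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split 'left<op>right' at the first operator found in list order."""
    for operator in OPERATORS:
        if operator in predicate:
            left, right = predicate.split(operator, 1)
            return left.strip(), operator, right.strip()
    raise ValueError(f"no supported operator in {predicate!r}")
```

A plain substring check on `IS` would also fire on upper-case column names containing those letters, so a real parser may need token boundaries.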
read_file
def read_file(file_path: str, inner_separator: str = None, outer_separator: str = None, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Tuple[Dict, str, str, str]
Generic method for routing the processing of the input file containing the queries according to the given file type, because .csv/.tsv files need to be processed differently from .sql files. The parameters inner_separator and outer_separator allow the user to use customized .csv/.tsv files. The parameter query_format allows the user to choose between the two most common join formats.
Arguments:
- file_path: Path to the file containing the sql statements. This path has to end with .csv/.tsv or .sql. No other file types are supported at the moment. The path can be absolute or relative.
- inner_separator: The column separator used in the file. You can use '\t' for .tsv files. -> See documentation for details.
- outer_separator: The block separator used in the file. -> See documentation for details.
- query_format: The format of the sql query. Look at documentation of QueryFormat for details.
:return A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
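The routing by file type can be sketched as follows. This is an illustration under assumptions, not the actual method body: `reader_for` is a hypothetical name, the extension check is assumed to be case-insensitive, and the returned strings only name the two documented readers.

```python
from pathlib import Path


def reader_for(file_path: str) -> str:
    """Hypothetical helper: name the reader that would handle file_path."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(file_path).suffix.lower()
    if suffix in (".csv", ".tsv"):
        return "read_csv_file"
    if suffix == ".sql":
        return "read_sql_file"
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Because `Path.suffix` only looks at the last path component, absolute and relative paths are handled the same way.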
read_sql_file
@staticmethod
def read_sql_file(file_path: str, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Tuple[Dict, str, str, str]
Read and parse the SQL statements from the given .sql file. Most parts of the SQL syntax are processed and removed; for example, 'SELECT COUNT(*)' and 'INNER JOIN' are stripped from the query.
Arguments:
- file_path: Path to the file containing the sql statements. This path has to end with .sql. No other file types are supported at the moment. The path can be absolute or relative.
- query_format: The format of the sql query. Look at documentation of QueryFormat for details.
:return A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
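A rough sketch of how CROSS_PRODUCT statements could be grouped into such a dictionary. The grouping key (table entries sorted alphabetically and joined with `,`) follows the description of `command_dict` under `create_solution_dict`; the function name `parse_sql_statements` and the layout of one WHERE string per query are assumptions. Splitting on `;` also assumes there are no semicolons inside string literals.

```python
from collections import defaultdict
from typing import Dict, List


def parse_sql_statements(sql_text: str) -> Dict[str, List[str]]:
    """Hypothetical: group CROSS_PRODUCT statements by their sorted FROM list."""
    command_dict: Dict[str, List[str]] = defaultdict(list)
    for statement in sql_text.split(";"):  # assumes no ';' inside string literals
        statement = " ".join(statement.split())  # collapse newlines and repeated spaces
        if not statement:
            continue
        # Drop "SELECT COUNT(*) FROM" and separate the FROM list from the WHERE clause.
        _, _, rest = statement.partition(" FROM ")
        from_part, _, where_part = rest.partition(" WHERE ")
        key = ",".join(sorted(table.strip() for table in from_part.split(",")))
        command_dict[key].append(where_part)
    return dict(command_dict)
```

Sorting the FROM entries makes `title t,movie_companies mc` and `movie_companies mc,title t` land under the same key.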
read_csv_file
@staticmethod
def read_csv_file(file_path: str, inner_separator: str = ",", outer_separator: str = "#") -> Tuple[Dict, str, str, str]
Read the CSV-formatted SQL statements from the given file. For details on the format, see the Readme.
Arguments:
- file_path: Path to the file containing the sql statements formatted as .csv or .tsv. This path has to end with .csv or .tsv. No other file types are supported at the moment. The path can be absolute or relative.
- inner_separator: The column separator used in the file. You can use '\t' for .tsv files. -> See documentation for details.
- outer_separator: The block separator used in the file. -> See documentation for details.
:return A tuple containing a dictionary with the table-string as key and a list of selection attributes as value, the file-type, the inner_separator and the outer_separator.
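The authoritative row format is described in the Readme and is not reproduced here. Purely as an illustration, the sketch below ASSUMES a row layout of `tables<outer>join_attributes<outer>selection_attributes`, with the items inside each block separated by the inner separator. The function name `parse_csv_lines` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_csv_lines(lines: Iterable[str], inner_separator: str = ",",
                    outer_separator: str = "#") -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical reader for rows of the ASSUMED form
    tables<outer>join_attributes<outer>selection_attributes.
    """
    command_dict: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        tables, joins, selections = line.split(outer_separator)[:3]
        # Sort the table entries so equal table sets share one key.
        key = inner_separator.join(sorted(t.strip() for t in tables.split(inner_separator)))
        command_dict[key].append((joins, selections))
    return dict(command_dict)
```

Passing `"\t"` as `inner_separator` covers the .tsv case without any other change.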
create_solution_dict
def create_solution_dict(command_dict: Dict[str, List[str] or List[Tuple[str, str]]], file_type: str, inner_separator: str) -> Dict[int, Dict[str, List[str or Tuple[str, str]]]]
Build the solution dict. This requires that the file with the queries has already been parsed and the command_dict has been created.
Arguments:
- command_dict: Dict with an alphabetically sorted string of the joining tables as key and, as value, a list of where clauses as strings if the file type is sql, or otherwise a list of tuples containing the join-attribute-string in first and the selection-attribute-string in second place.
- file_type: String with 'csv'/'tsv' or 'sql' which tells the file type of the read file.
- inner_separator: The column separator used in the file. You can use '\t' for .tsv files. -> See documentation for details.
:return The solution dict containing 'table_names', 'join_attributes' and 'selection_attributes'.
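The shape of the result can be illustrated with a small assembly loop. This sketch assumes that the integer keys are consecutive indices starting at 0 and takes the two unpacking steps as parameters; `assemble_solution_dict` and its parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def assemble_solution_dict(
    command_dict: Dict[str, list],
    unpack_tables: Callable[[str], List[Tuple[str, str]]],
    unpack_attributes: Callable[[list], Tuple[List[str], List[str]]],
) -> Dict[int, Dict[str, list]]:
    """Hypothetical: one numbered entry per distinct set of joined tables."""
    solution: Dict[int, Dict[str, list]] = {}
    # Assumption: entries are numbered from 0 in command_dict order.
    for index, (table_string, clauses) in enumerate(command_dict.items()):
        join_attributes, selection_attributes = unpack_attributes(clauses)
        solution[index] = {
            "table_names": unpack_tables(table_string),
            "join_attributes": join_attributes,
            "selection_attributes": selection_attributes,
        }
    return solution
```

A caller would pick the SQL or the CSV attribute unpacker based on `file_type`; passing them in keeps the assembly step independent of the input format.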
table_name_unpacker
@staticmethod
def table_name_unpacker(from_string: str, separator: str = ",") -> List[Tuple[str, str]]
Takes the sorted string of the from clause and extracts the tables with their aliases.
Arguments:
- from_string: Alphabetically ordered string containing all tables to join, separated by the separator.
- separator: The column separator used in the file. You can use '\t' for .tsv files.
Returns:
List of tuples where the first element of the tuple is the table name and the second one is the alias.
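A minimal sketch of this unpacking, assuming each entry is written as `table alias` separated by a single space (no `AS` keyword) and, as a further assumption, that a table without an alias is referred to by its own name. `split_table_aliases` is a hypothetical name.

```python
from typing import List, Tuple


def split_table_aliases(from_string: str, separator: str = ",") -> List[Tuple[str, str]]:
    """Hypothetical: 'movie_companies mc,title t' -> [('movie_companies', 'mc'), ('title', 't')]."""
    tables = []
    for entry in from_string.split(separator):
        name, _, alias = entry.strip().partition(" ")
        # Assumption: a table without an alias is referred to by its own name.
        tables.append((name, alias.strip() or name))
    return tables
```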
sql_attribute_unpacker
def sql_attribute_unpacker(where_string_list: List[str]) -> Tuple[List[str], List[str]]
Unpack the attribute strings from sql-file into sets containing the attributes.
Arguments:
- where_string_list: A list of strings from the where clauses. These have to be separated into join- and selection-attributes.
Returns:
A tuple containing the list of join-attributes in first and the list of selection-attributes in second place.
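The separation into join- and selection-attributes can be sketched with a simple heuristic. This is an assumption, not the documented rule: a predicate counts as a join when it compares two `alias.column` references with `=`, and any other comparison is reported as a selection on its left-hand column. Splitting on `AND` means `BETWEEN ... AND ...` and `OR` are not handled. `classify_where_clauses` is a hypothetical name.

```python
import re
from typing import List, Tuple

OPERATORS = ["<=", "!=", ">=", "=", "<", ">", "IS"]
# An "alias.column" reference such as mc.movie_id.
COLUMN_REF = re.compile(r"^[A-Za-z_]\w*\.[A-Za-z_]\w*$")


def classify_where_clauses(where_string_list: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical: split WHERE clauses into join predicates and selection columns."""
    joins, selections = set(), set()
    for where in where_string_list:
        for predicate in re.split(r"\s+AND\s+", where.strip().rstrip(";")):
            operator = next((op for op in OPERATORS if op in predicate), None)
            if operator is None:
                continue  # not a comparison this sketch understands
            left, right = (side.strip() for side in predicate.split(operator, 1))
            if operator == "=" and COLUMN_REF.match(right):
                joins.add(f"{left}={right}")  # column = column: a join
            else:
                selections.add(left)  # column compared with a constant: a selection
    return sorted(joins), sorted(selections)
```

Collecting into sets first removes duplicates when several queries share a predicate; the lists are sorted only to make the output deterministic.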
csv_attribute_unpacker
def csv_attribute_unpacker(attribute_tuples: List[Tuple[str, str]], separator: str = ",") -> Tuple[List[str], List[str]]
Unpack the attribute strings from csv-file into sets containing the attributes.
Arguments:
- attribute_tuples: A list of tuples of strings where the first string contains all join-attributes, while the second string contains all selection-attributes.
- separator: The column separator used in the file. You can use '\t' for .tsv files.
Returns:
A tuple containing the list of join-attributes in first and the list of selection-attributes in second place.
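A minimal sketch of this unpacking, assuming that the attributes inside each of the two strings are separated by `separator` and that empty pieces should be ignored. `split_csv_attribute_tuples` is a hypothetical name.

```python
from typing import List, Tuple


def split_csv_attribute_tuples(attribute_tuples: List[Tuple[str, str]],
                               separator: str = ",") -> Tuple[List[str], List[str]]:
    """Hypothetical: each tuple is (all join attributes, all selection attributes),
    with the attributes inside each string separated by `separator`."""
    joins, selections = set(), set()
    for join_string, selection_string in attribute_tuples:
        # Split each block, drop empty pieces and collect unique attributes.
        joins.update(a.strip() for a in join_string.split(separator) if a.strip())
        selections.update(a.strip() for a in selection_string.split(separator) if a.strip())
    return sorted(joins), sorted(selections)
```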
save_solution_dict
@staticmethod
def save_solution_dict(solution_dict: Dict[int, Dict[str, List[str or Tuple[str, str]]]], save_file_path: str = "solution_dict")
Save the solution to a file with the specified filename.
Arguments:
- solution_dict: The dict containing the data to save.
- save_file_path: The path for the file in which the data should be saved. The .yaml ending is added automatically.
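How the .yaml ending could be added is sketched below. The sketch assumes the ending is appended only when it is missing; whether the real method also does this for paths that already end in .yaml is not stated here. `yaml_target_path` is a hypothetical name. Writing the dict itself would typically use a YAML library such as PyYAML (`yaml.safe_dump`), which is left out so the sketch runs on the standard library alone.

```python
from pathlib import Path


def yaml_target_path(save_file_path: str = "solution_dict") -> Path:
    """Hypothetical: append '.yaml' unless the path already ends with it."""
    path = Path(save_file_path)
    if path.suffix == ".yaml":
        return path
    # with_name keeps dots that are part of the name, e.g. run.v2 -> run.v2.yaml
    return path.with_name(path.name + ".yaml")
```

Using `with_name` instead of `with_suffix` avoids silently replacing a dotted part of the name such as `.v2`.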
run
def run(file_path: str, save_file_path: str, inner_separator: str = None, outer_separator: str = None, query_format: QueryFormat = QueryFormat.CROSS_PRODUCT) -> Dict[int, Dict[str, List[str or Tuple[str, str]]]]
Run the whole parsing process, from reading the input file to saving the resulting solution dict.
Arguments:
- file_path: The file containing the sql statements to read.
- save_file_path: The path where to save the results.
- inner_separator: The column separator used in the file. You can use '\t' for .tsv files. -> See documentation for details.
- outer_separator: The block separator used in the file. -> See documentation for details.
- query_format: The indicator for the format of the .sql query file. If the given file is not .sql, then this is not used.
Returns:
The solution dict containing 'table_names', 'join_attributes' and 'selection_attributes'.
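The steps combined by `run` can be condensed into one self-contained sketch for CROSS_PRODUCT input. It is not the actual implementation and does not save anything. It assumes that the grouping key is the alphabetically sorted FROM list, that an `=` between two `alias.column` references is a join predicate while any other comparison is a selection on its left-hand column, and that entries are numbered from 0. `parse_sql_text` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List

OPERATORS = ["<=", "!=", ">=", "=", "<", ">", "IS"]
COLUMN_REF = re.compile(r"^[A-Za-z_]\w*\.[A-Za-z_]\w*$")  # e.g. mc.movie_id


def parse_sql_text(sql_text: str) -> Dict[int, Dict[str, list]]:
    """Hypothetical end-to-end pass over CROSS_PRODUCT statements; nothing is written to disk."""
    # Step 1: group WHERE clauses by the alphabetically sorted FROM list.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for statement in sql_text.split(";"):
        statement = " ".join(statement.split())
        if not statement:
            continue
        _, _, rest = statement.partition(" FROM ")
        from_part, _, where = rest.partition(" WHERE ")
        grouped[",".join(sorted(t.strip() for t in from_part.split(",")))].append(where)

    # Step 2: build one numbered entry per table combination.
    solution: Dict[int, Dict[str, list]] = {}
    for index, (table_string, wheres) in enumerate(grouped.items()):
        joins, selections = set(), set()
        for where in wheres:
            for predicate in re.split(r"\s+AND\s+", where):
                operator = next((op for op in OPERATORS if op in predicate), None)
                if operator is None:
                    continue
                left, right = (side.strip() for side in predicate.split(operator, 1))
                if operator == "=" and COLUMN_REF.match(right):
                    joins.add(f"{left}={right}")
                else:
                    selections.add(left)
        tables = [entry.partition(" ") for entry in table_string.split(",")]
        solution[index] = {
            "table_names": [(name, alias or name) for name, _, alias in tables],
            "join_attributes": sorted(joins),
            "selection_attributes": sorted(selections),
        }
    return solution
```

Fed the CROSS_PRODUCT example from the QueryFormat section, this yields a single entry with three tables, two join predicates and two selection columns.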