estimator.estimator

Estimator Objects

class Estimator()

Class containing the neural network for cardinality estimation. The specifications of the neural network can be changed in 'config.yaml'.

__init__

def __init__(config: Dict[str, Any] = None, config_file_path: str = "config.yaml", data: np.ndarray = None, model: Model = None, model_path: str = None, debug: bool = False)

Initializer for the Estimator.

Configuration options for the neural network are optionally passed via a config dict. It must contain at least the fields "loss_function", "dropout", "learning_rate", "kernel_initializer", "activation_strategy" and "layer".

Arguments:

  • config: Only used if neither a model nor a model_path is passed. If given, it must contain at least the fields "loss_function", "dropout", "learning_rate", "kernel_initializer", "activation_strategy" and "layer". If not given, the config file 'config.yaml' is used for these settings.
  • config_file_path: Path to the config file; only necessary if no config is given.
  • data: Optional parameter for passing the data for training and testing. If given, it has to be a Dict with at least "x" and "y" and optionally "postgres_estimate" as keys. The values have to be numpy.ndarray: for key "x" the vectorized queries, for key "y" the true cardinalities in the same order, and for the optional key "postgres_estimate" the estimates of the Postgres optimizer for the queries.
  • model: Option to pass a Model which can be used.
  • model_path: Option to pass a path to a saved model in an .h5 file.
  • debug: Boolean whether to print additional information while processing.
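To make the required configuration fields concrete, here is a minimal sketch of a config dict. The field names come from the docstring above; the values shown are illustrative assumptions, not the library's defaults:

```python
# Hypothetical config dict for Estimator. The keys are the required fields
# named in the docstring; the values are illustrative assumptions only.
config = {
    "loss_function": "mean_squared_error",
    "dropout": 0.2,
    "learning_rate": 0.001,
    "kernel_initializer": "glorot_uniform",
    "activation_strategy": "relu",
    "layer": [512, 256, 128],
}

# Sanity check: all required fields are present.
required = {"loss_function", "dropout", "learning_rate",
            "kernel_initializer", "activation_strategy", "layer"}
assert required <= set(config)

# estimator = Estimator(config=config)  # assumes the Estimator import
```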

get_model

def get_model(len_input: int, override: bool = False) -> Model

Function for creating the neural-network model from the information in self.config.

Arguments:

  • len_input: The size of the input vector.
  • override: Whether an existing model should be overridden.

Returns:

The model for the neural network with the given properties.

load_model

def load_model(model_path: str)

Method for loading an already existing model which was saved to a file.

Arguments:

  • model_path: Path to the file containing the model to load

denormalize

@staticmethod
def denormalize(y, y_min: float, y_max: float)

Arguments:

  • y: tensor filled with values to denormalize
  • y_min: minimum value for y
  • y_max: maximum value for y

Returns:

tensor with denormalized values

denormalize_np

@staticmethod
def denormalize_np(y: np.ndarray, y_min: float, y_max: float) -> np.ndarray

Arguments:

  • y: numpy-array filled with values to denormalize
  • y_min: minimum value for y
  • y_max: maximum value for y

Returns:

numpy-array with denormalized values
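The docstring does not state the normalization scheme, so as an illustration only, here is a sketch of min-max denormalization in log space, a common choice for cardinalities. The actual formula used by the Estimator may differ:

```python
import math

def denormalize_sketch(y, y_min, y_max):
    # Illustrative log-space min-max denormalization: maps a value in [0, 1]
    # back to a cardinality in [y_min, y_max]. This is an assumption about
    # the scheme, not the Estimator's documented formula.
    log_min, log_max = math.log(y_min), math.log(y_max)
    return [math.exp(v * (log_max - log_min) + log_min) for v in y]

vals = denormalize_sketch([0.0, 1.0], 1.0, 1000.0)
# 0.0 maps back to y_min, 1.0 maps back to y_max
```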

load_data_file

def load_data_file(file_path: str, override: bool = False) -> Dict[str, np.ndarray]

Method for loading the data from file.

Arguments:

  • file_path: Path for the file where the data is stored. Has to be a .csv or .npy file.
  • override: Boolean whether to override already existing data.

Returns:

The data which is set for the Estimator.
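Since only .csv and .npy files are accepted, a caller may want to validate the path up front. A minimal sketch of the extension check this implies (the helper name is hypothetical, not part of the Estimator API):

```python
from pathlib import Path

def check_data_file(file_path: str) -> str:
    # load_data_file accepts only .csv or .npy files; reject anything else
    # before handing the path to the Estimator.
    suffix = Path(file_path).suffix
    if suffix not in (".csv", ".npy"):
        raise ValueError(f"unsupported data file type: {suffix}")
    return suffix
```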

set_data

def set_data(loaded_data: np.ndarray, override: bool = False)

Method for setting data and dependent values like max_card and input_length.

Arguments:

  • loaded_data: The data loaded from the file.
  • override: Boolean whether to override already existing data.

split_data

def split_data(split: float = 0.9)

Function to split the data into a training and a test set by a parameterized split value.

Arguments:

  • split: Percentage of the data going into training set. (split=0.9 means 90% of data is training set)
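The split above can be sketched as follows. The Estimator reportedly uses numpy.random.choice for this; the pure-Python version below (with a hypothetical helper name) shows the same idea:

```python
import random

def split_indices(n: int, split: float = 0.9, seed: int = 0):
    # Shuffle the indices 0..n-1 and cut at the split point: the first
    # split*n indices form the training set, the rest the test set.
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * split)
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_indices(100, split=0.9)
# 90 training indices, 10 test indices, no overlap
```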

train

def train(epochs: int = 100, verbose: int = 1, shuffle: bool = True, batch_size: int = 32, validation_split: float = 0.1) -> Union[History, History]

Method for training the previously created model.

Arguments:

  • epochs: Number of epochs for training.
  • verbose: How much information to print while training. 0 = silent, 1 = progress bar, 2 = one line per epoch.
  • shuffle: Whether to shuffle the training data; not necessary if the split was done by numpy.random.choice().
  • batch_size: Size of the batches. Smaller batches may (possibly) train the neural network better but enlarge training time, while bigger batches may lead to a less well trained network while training faster.
  • validation_split: How much of the data should be taken as validation set; these samples are taken from the training data, not the test data, and are reselected for every epoch.

Returns:

Training history as dict.

test

def test() -> np.ndarray

Let the trained neural network predict the test data.

Returns:

numpy-array containing the normalized predictions of the neural network for the test data

predict

def predict(data: np.ndarray) -> np.ndarray

Let the trained neural network predict the given data.

Arguments:

  • data: numpy-array containing at least one vectorized query which should be predicted

Returns:

numpy-array containing the normalized predictions of the neural network for the given data

run

def run(data_file_path: str = None, epochs: int = 100, verbose: int = 1, shuffle: bool = True, batch_size: int = 32, validation_split: float = 0.1, override_model: bool = False, save_model: bool = True, save_model_file_path: str = "model") -> np.ndarray

Method for a full run of the Estimator, with training and testing.

Arguments:

  • data_file_path: Optional path to a saved data file. Only necessary if no data has been set before.
  • epochs: Number of epochs for training.
  • verbose: How much information to print while training. 0 = silent, 1 = progress bar, 2 = one line per epoch.
  • shuffle: Whether to shuffle the training data; not necessary if the split was done by numpy.random.choice().
  • batch_size: Size of the batches. Smaller batches may (possibly) train the neural network better but enlarge training time, while bigger batches may lead to a less well trained network while training faster.
  • validation_split: How much of the data should be taken as validation set; these samples are taken from the training data, not the test data, and are reselected for every epoch.
  • override_model: Whether to override an already existing model.
  • save_model: Whether to save the trained model to file.
  • save_model_file_path: When save_model==True, this parameter gives the path where the model should be saved.

Returns:

A numpy.ndarray containing the calculated q-error.
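The docstring does not define the q-error, but the standard definition in cardinality estimation is the factor by which an estimate deviates from the truth, in whichever direction. A sketch under that assumption:

```python
def q_error(estimates, truths):
    # Standard q-error: max(est/true, true/est) per query. A value of 1.0
    # is a perfect estimate; 2.0 means off by a factor of two either way.
    # Assumed to match what run() returns; the docstring gives no formula.
    return [max(e / t, t / e) for e, t in zip(estimates, truths)]

errs = q_error([10.0, 200.0], [20.0, 100.0])
# both estimates are off by a factor of two -> [2.0, 2.0]
```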

save_model

def save_model(filename: str = "model")

Method for saving the Model to file.

Arguments:

  • filename: Name of the file where the model should be stored, without the file ending (".h5" is added to the filename).
