baybe.utils.dataframe.fuzzy_row_match
- baybe.utils.dataframe.fuzzy_row_match(left_df: pd.DataFrame, right_df: pd.DataFrame, parameters: Sequence[Parameter], numerical_measurements_must_be_within_tolerance: bool)
Match rows of the right dataframe to the rows of the left dataframe.
This is useful for validity checks and to automatically match measurements to entries in the search space, e.g. to detect which ones have been measured. For categorical parameters, there needs to be an exact match with any of the allowed values. For numerical parameters, the user can decide via a flag whether values outside the tolerance should be accepted.
- Parameters:
left_df (pd.DataFrame) – The data that serves as lookup reference.
right_df (pd.DataFrame) – The data that should be checked for matching rows in the left dataframe.
parameters (Sequence[Parameter]) – List of baybe parameter objects that are needed to identify potential tolerances.
numerical_measurements_must_be_within_tolerance (bool) – If True, numerical parameters are matched with the search space elements only if there is a match within the parameter tolerance. If False, the closest match is considered, irrespective of the distance.
- Return type:
pd.Index
- Returns:
The index of the matching rows in left_df.
- Raises:
ValueError – If some rows are present in the right but not in the left dataframe.
ValueError – If the input data has invalid values.
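The matching rules described above can be illustrated with a minimal pandas-only sketch. This is not BayBE's implementation: the helper name sketch_fuzzy_row_match, its arguments categorical_cols and numerical_tolerances, and the example data are hypothetical stand-ins for the information that the real function reads from its parameters argument. Raising a ValueError for an out-of-tolerance value is an assumption made here to mirror the documented "invalid values" error.

```python
import pandas as pd


def sketch_fuzzy_row_match(
    left_df: pd.DataFrame,
    right_df: pd.DataFrame,
    categorical_cols: list[str],
    numerical_tolerances: dict[str, float],
    must_be_within_tolerance: bool,
) -> pd.Index:
    """Hypothetical illustration of the matching rules (not BayBE code)."""
    matched = []
    for _, row in right_df.iterrows():
        # Categorical columns: only exact matches qualify.
        mask = pd.Series(True, index=left_df.index)
        for col in categorical_cols:
            mask &= left_df[col] == row[col]
        candidates = left_df[mask]
        if candidates.empty:
            raise ValueError(f"No row in left_df matches {row.to_dict()}")

        # Numerical columns: keep the candidates with the smallest distance.
        for col, tol in numerical_tolerances.items():
            dist = (candidates[col] - row[col]).abs()
            # Assumption: an out-of-tolerance value is treated as invalid input.
            if must_be_within_tolerance and dist.min() > tol:
                raise ValueError(f"{col}={row[col]} is outside tolerance {tol}")
            candidates = candidates[dist == dist.min()]

        # Ties are possible in principle; this sketch keeps the first one.
        matched.append(candidates.index[0])
    return pd.Index(matched)


# Made-up reference grid (left) and measurements (right).
left = pd.DataFrame(
    {
        "solvent": ["water", "water", "ethanol", "ethanol"],
        "temp": [50.0, 100.0, 50.0, 100.0],
    }
)
right = pd.DataFrame({"solvent": ["ethanol", "water"], "temp": [98.0, 51.0]})

matched = sketch_fuzzy_row_match(
    left, right, ["solvent"], {"temp": 5.0}, must_be_within_tolerance=True
)
# list(matched) is [3, 0]: one left_df index per row of right_df.
```

With must_be_within_tolerance=False, a measurement such as ("ethanol", 80.0) would still be assigned to the closest grid point (index 3 in this data), whereas with True the sketch rejects it because the distance of 20 exceeds the tolerance of 5.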