baybe.utils.dataframe.filter_df

baybe.utils.dataframe.filter_df(df: DataFrame, /, to_keep: DataFrame, complement: bool = False)

Filter a dataframe based on a second dataframe defining filtering conditions.

Filtering is done by joining the input dataframe with the filter dataframe. The complement argument selects the join type: an inner join keeps the matching rows, while an anti-join keeps the non-matching rows.

Parameters:
  • df (DataFrame) – The dataframe to be filtered.

  • to_keep (DataFrame) – The dataframe defining the filtering conditions. By default (see complement argument), it defines the rows to be kept in the sense of an inner join.

  • complement (bool) – If False, the filter dataframe determines the rows to be kept (i.e. selection via inner join). If True, the filtering mechanism is inverted so that the complement set of rows is kept (i.e. selection via anti-join).

Return type:

DataFrame

Returns:

A new dataframe containing the result of the filtering process.
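The inner-join and anti-join semantics described above can be illustrated with plain pandas. The following is a minimal sketch, not BayBE's actual implementation: the helper name filter_df_sketch is hypothetical, and it assumes that rows are matched on the columns of the filter dataframe and that an empty filter matches no rows (consistent with the examples on this page).

```python
import numpy as np
import pandas as pd


def filter_df_sketch(
    df: pd.DataFrame, to_keep: pd.DataFrame, complement: bool = False
) -> pd.DataFrame:
    """Hypothetical illustration of inner-join / anti-join filtering."""
    if to_keep.empty:
        # Assumption: an empty filter matches no rows at all.
        matched = np.zeros(len(df), dtype=bool)
    else:
        keys = list(to_keep.columns)
        # A left merge against de-duplicated filter rows keeps exactly one
        # output row per input row, in the original order. The indicator
        # column "_merge" records whether a matching filter row was found.
        merged = df.merge(
            to_keep.drop_duplicates(), on=keys, how="left", indicator=True
        )
        matched = (merged["_merge"] == "both").to_numpy()

    # Inner join keeps the matched rows, anti-join keeps the rest.
    mask = ~matched if complement else matched
    return df[mask]


df = pd.DataFrame(
    [[0, "a"], [0, "b"], [1, "a"], [1, "b"]], columns=["num", "cat"]
)
flt = pd.DataFrame([0], columns=["num"])
kept = filter_df_sketch(df, flt)  # rows where num == 0
dropped = filter_df_sketch(df, flt, complement=True)  # rows where num != 0
```

The boolean mask is applied to the original dataframe, so the returned rows keep their original index labels, as in the examples below.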

Examples

>>> import pandas as pd
>>> from baybe.utils.dataframe import filter_df
>>> df = pd.DataFrame(
...         [[0, "a"], [0, "b"], [1, "a"], [1, "b"]],
...         columns=["num", "cat"]
... )
>>> df
   num cat
0    0   a
1    0   b
2    1   a
3    1   b
>>> filter_df(df, pd.DataFrame([0], columns=["num"]), complement=False)
   num cat
0    0   a
1    0   b
>>> filter_df(df, pd.DataFrame([0], columns=["num"]), complement=True)
   num cat
2    1   a
3    1   b
>>> filter_df(df, pd.DataFrame(), complement=True)
   num cat
0    0   a
1    0   b
2    1   a
3    1   b
>>> filter_df(df, pd.DataFrame(), complement=False)
Empty DataFrame
Columns: [num, cat]
Index: []