baybe.utils.dataframe.filter_df
- baybe.utils.dataframe.filter_df(df: DataFrame, /, to_keep: DataFrame, complement: bool = False)
Filter a dataframe based on a second dataframe defining filtering conditions.
Filtering is done via a join (see the complement argument for details) between the input dataframe and the filter dataframe.
- Parameters:
df (DataFrame) – The dataframe to be filtered.
to_keep (DataFrame) – The dataframe defining the filtering conditions. By default (see the complement argument), it defines the rows to be kept in the sense of an inner join.
complement (bool) – If False, the filter dataframe determines the rows to be kept (i.e. selection via inner join). If True, the filtering mechanism is inverted so that the complement set of rows is kept (i.e. selection via anti-join).
- Return type:
DataFrame
- Returns:
A new dataframe containing the result of the filtering process.
Examples
>>> import pandas as pd
>>> from baybe.utils.dataframe import filter_df

>>> df = pd.DataFrame(
...     [[0, "a"], [0, "b"], [1, "a"], [1, "b"]],
...     columns=["num", "cat"]
... )
>>> df
   num cat
0    0   a
1    0   b
2    1   a
3    1   b

>>> filter_df(df, pd.DataFrame([0], columns=["num"]), complement=False)
   num cat
0    0   a
1    0   b

>>> filter_df(df, pd.DataFrame([0], columns=["num"]), complement=True)
   num cat
2    1   a
3    1   b

>>> filter_df(df, pd.DataFrame(), complement=True)
   num cat
0    0   a
1    0   b
2    1   a
3    1   b

>>> filter_df(df, pd.DataFrame(), complement=False)
Empty DataFrame
Columns: [num, cat]
Index: []