baybe.utils.dataframe.df_uncorrelated_features

baybe.utils.dataframe.df_uncorrelated_features(df: DataFrame, exclude_list: list[str] | None = None, threshold: float = 0.7)

Return an uncorrelated set of features.

Adapted from edbo (https://github.com/b-shields/edbo, https://doi.org/10.1038/s41586-021-03213-y).

Parameters:
  • df (DataFrame) – The dataframe to be cleaned.

  • exclude_list (Optional[list[str]]) – If provided, the columns that should be ignored. Defaults to None.

  • threshold (float) – Threshold for column-column correlation; columns whose correlation exceeds it are dropped. Defaults to 0.7.

Returns:

A new dataframe containing the uncorrelated set of features.
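
The idea of correlation-based filtering can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not BayBE's actual implementation: the helper name `drop_correlated_columns` is hypothetical, it assumes absolute Pearson correlation, it keeps the first column of each correlated group, and it assumes that excluded columns are left out of the result.

```python
import numpy as np
import pandas as pd


def drop_correlated_columns(df, exclude_list=None, threshold=0.7):
    """Hypothetical greedy correlation filter (illustration only)."""
    # Assumption: excluded columns are removed before the correlation check.
    data = df.drop(columns=exclude_list) if exclude_list else df.copy()
    # Assumption: absolute Pearson correlation between all column pairs.
    corr = data.corr().abs()
    keep = []
    for col in corr.columns:
        # Keep a column only if it stays below the threshold
        # against every column that has already been kept.
        if all(corr.loc[col, kept] < threshold for kept in keep):
            keep.append(col)
    return data[keep]


rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame(
    {
        "a": x,
        "b": 2.0 * x + 0.01 * rng.normal(size=200),  # nearly a rescaled copy of "a"
        "c": rng.normal(size=200),  # drawn independently of "a"
        "label": np.arange(200),  # column passed via exclude_list
    }
)
result = drop_correlated_columns(demo, exclude_list=["label"], threshold=0.7)
print(list(result.columns))
```

In this sketch, "b" is dropped because it is almost perfectly correlated with "a", while "c" survives. Lowering `threshold` makes the filter stricter, and the result depends on column order because the first column of a correlated group is the one that is kept.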