Insights

In BayBE, insights provide a way of analyzing your experimental results beyond what is required for the basic measure-recommend loop. Dependencies needed for insights are optional and available by installing baybe with the respective dependency group, e.g. via pip install baybe[insights].

Examples On This Page

In what follows, we show results for the campaign studied in the full lookup example, which aims at maximizing the yield of a chemical reaction and involves three substance parameters and
two discrete numerical parameters. We randomly sample 100 measurements from the lookup table and add them to the campaign, providing a basis for creating an insight.
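
A minimal sketch of this setup could look as follows, assuming the lookup table is available as a pandas DataFrame named lookup and that the campaign object is configured as in the full lookup example:

# Draw 100 random rows from the lookup table and register them as measurements
random_measurements = lookup.sample(n=100, random_state=1337)
campaign.add_measurements(random_measurements)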

Parameter Importance via SHAP

SHapley Additive exPlanations (SHAP) are a popular way of interpreting models to gain insight into the importance of the features they utilize. In the context of Bayesian optimization (BO), this enables analyzing the importance of the parameters spanning the search space. This can be useful for identifying which parameters play a key role and which do not – learnings that can be applied when designing future campaigns. The interface is provided by the SHAPInsight class.

Model Interpretation in BO

While feature importance is a well-studied method, it is usually applied in regimes where models are trained on plenty of data. In BO, however, we often operate in the low-to-no-data regime, which makes the interpretation of feature importances potentially tricky. We urge users to keep this in mind and to be careful with their interpretations. For instance, we suggest a study in which the available data is sub-sampled to check the obtained parameter importances for convergence and consistency, as sketched below.
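
As a rough illustration, the following hedged sketch runs such a consistency check by rebuilding the insight on random subsets of the measurements, using the from_recommender constructor introduced under Basic Usage below. The import paths, the choice of BotorchRecommender, the subset fractions, and the random seed are assumptions, not a prescribed recipe:

from baybe.insights import SHAPInsight
from baybe.recommenders import BotorchRecommender

measurements = campaign.measurements  # all data collected so far
for fraction in (0.25, 0.5, 0.75, 1.0):
    # Rebuild the insight on a random subset of the measurements ...
    subset = measurements.sample(frac=fraction, random_state=42)
    insight = SHAPInsight.from_recommender(
        BotorchRecommender(), campaign.searchspace, campaign.objective, subset
    )
    # ... and compare the resulting importance rankings across fractions
    insight.plot("bar")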

Basic Usage

A SHAPInsight can be obtained in several ways:

  • From a Campaign via from_campaign:

    insight = SHAPInsight.from_campaign(campaign)
    
  • From a surrogate model via from_surrogate:

    insight = SHAPInsight.from_surrogate(surrogate, data)
    
  • From a recommender that has an underlying surrogate model and implements get_surrogate, via from_recommender:

    insight = SHAPInsight.from_recommender(recommender, searchspace, objective, data)
    

In these examples, data is the background data used to build the underlying explainer model. Typically, you would set this to the measurements obtained during your experimental campaign (for instance, from_campaign automatically extracts the measurements from the campaign object).
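
For illustration, the following hedged sketch makes the background data explicit; the import path and the already-trained surrogate object are assumptions:

from baybe.insights import SHAPInsight

# The measurements collected so far serve as background data for the explainer
background_data = campaign.measurements
insight = SHAPInsight.from_surrogate(surrogate, background_data)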

Plots

After creating the insight, the results can be visualized via the .plot interface; for the possible plot kinds, please refer to the available SHAP plots.

insight.plot("bar")

[Figure: SHAP bar plot (experimental representation)]

This result agrees well with the chemical intuition that ligands are the most important reactants to activate the conversion, resulting in higher yields.

Such plots can also be created for data sets other than the background data that was used to generate the insight. If this is desired, pass your data frame as the second argument:

insight.plot("beeswarm", new_measurements)

[Figure: SHAP beeswarm plot (experimental representation)]

The force plot type requires the user to additionally select which single data point they want to visualize by specifying the corresponding explanation_index:

insight.plot(
    "force", explanation_index=3
)  # plots the force analysis of the measurement at positional index 3

[Figure: SHAP force plot]

Explainers

In general, SHAP is an exhaustive method testing all combinations of features. This exhaustive algorithm (implemented by the shap.ExactExplainer class) is often not feasible in practice, and various approximate variants are available (see supported explainers). For details about their inner mechanics, we refer to the SHAP documentation.

The explainer can be changed when creating the insight:

insight = SHAPInsight.from_campaign(
    campaign, explainer_cls="KernelExplainer"
)  # default explainer

Experimental and Computational Representations

SHAPInsight by default analyzes the experimental representation of the measurements, i.e., the representation that specifies parameter and target values in terms of their actual (physical) quantities. This comes with certain limitations:

Experimental Representation Limits

If the experimental representation contains parameters with non-numeric values (such as CategoricalParameter, SubstanceParameter or CustomDiscreteParameter), the only supported explainer is the KernelExplainer. Attempts to use other explainers will result in an IncompatibleExplainerError.

A feature importance study can still be performed by looking at the computational representation of the data points, activated via the use_comp_rep flag. Since all entries in this representation are numeric by construction, there are no limitations on the explainer type used. A study of the computational representation can also be useful if a deeper analysis of the descriptors used is of interest. In general, each non-numerical parameter in the experimental representation corresponds to several descriptors in the computational representation:

insight = SHAPInsight.from_campaign(campaign, use_comp_rep=True)
insight.plot("bar")

[Figure: SHAP bar plot (computational representation)]

In addition to SHAP-based explainers, we also support LIME and MAPLE variants. For example:

insight = SHAPInsight.from_campaign(
    campaign, explainer_cls="LimeTabular", use_comp_rep=True
)
insight.plot("bar")

[Figure: SHAP bar plot (LimeTabular explainer)]

As expected, the results from LimeTabular are very similar to those from the SHAP KernelExplainer, as both methods involve local linear approximations.