Insights¶
In BayBE, insights provide a way of analyzing your experimental results beyond what is
required for the basic measure-recommend loop. The dependencies needed for insights are
optional and can be installed via the respective dependency group, e.g.
pip install baybe[insights].
Examples On This Page
In what follows, we show results for the campaign studied in the
full lookup example, which aims at
maximizing the yield of a chemical reaction and involves three substance parameters and
two discrete numerical parameters. We randomly sample 100 measurements from the
lookup table and add them to the campaign, providing a basis for creating an insight.
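Roughly, this setup corresponds to the following sketch, where the campaign and the lookup data frame are assumed to come from the full lookup example (the seed is arbitrary):
# Draw 100 random measurements from the lookup table and feed them to the campaign
measurements = lookup.sample(n=100, random_state=1337)
campaign.add_measurements(measurements)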
Parameter Importance via SHAP¶
SHapley Additive exPlanations (SHAP) are a popular way of interpreting models to gain
insight into the importance of the features they use. In the context of Bayesian
optimization (BO), this enables analyzing the importance of the parameters spanning the
search space. This can be useful for identifying which parameters play a key role and
which do not – learnings that can be applied when designing future campaigns. The
interface is provided by the SHAPInsight class.
Model Interpretation in BO
While feature importance is a well-studied method, it is usually applied in regimes where models are trained on plenty of data. In BO, however, we often operate in the low-to-no-data regime, which can make feature importance interpretation tricky. We urge users to keep this in mind and be careful with their interpretations. For instance, we suggest sub-sampling the available data and checking the obtained parameter importances for convergence and consistency (see the sketch below).
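A minimal sketch of such a sub-sampling study, assuming the searchspace, objective, and lookup data frame from the example above are available (names and subset sizes are illustrative):
from baybe import Campaign
from baybe.insights import SHAPInsight

# Rebuild the campaign on random subsets of increasing size and compare the
# resulting importance plots; the rankings should stabilize as data accumulates.
for n in (25, 50, 100):
    subset = lookup.sample(n=n, random_state=1337)
    sub_campaign = Campaign(searchspace=searchspace, objective=objective)
    sub_campaign.add_measurements(subset)
    SHAPInsight.from_campaign(sub_campaign).plot("bar")
If the importance ranking changes substantially between subset sizes, conclusions drawn from the full data set should be treated with caution.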
Basic Usage¶
A SHAPInsight can be obtained in several ways:

- From a Campaign via from_campaign:
  insight = SHAPInsight.from_campaign(campaign)
- From a surrogate model via from_surrogate:
  insight = SHAPInsight.from_surrogate(surrogate, data)
- From a recommender that has an underlying surrogate model and implements get_surrogate, via from_recommender:
  insight = SHAPInsight.from_recommender(recommender, searchspace, objective, data)
In these examples, data is the background data used to build the underlying explainer
model. Typically, you would set this to the measurements obtained during your
experimental campaign (for instance, from_campaign automatically extracts the
measurements from the campaign object).
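As an illustration, the following sketch passes the campaign's measurements explicitly as background data to from_recommender. Here, BotorchRecommender serves merely as an example of a recommender exposing get_surrogate, and the attributes accessed are those of a standard Campaign object:
from baybe.insights import SHAPInsight
from baybe.recommenders import BotorchRecommender

# Explicitly provide the campaign's measurements as background data, similar in
# spirit to what from_campaign does automatically.
insight = SHAPInsight.from_recommender(
    BotorchRecommender(),
    campaign.searchspace,
    campaign.objective,
    campaign.measurements,
)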
Plots¶
After creating the insight, the results can be visualized in various ways via the plot interface; for an overview of the options, please refer to the available SHAP plots.
insight.plot("bar")
This result agrees well with the chemical intuition that ligands are the most important reactants to activate the conversion, resulting in higher yields.
Such plots can also be created for data sets other than the background data that was used to generate the insight. If this is desired, pass your data frame as the second argument:
insight.plot("beeswarm", new_measurements)
The force plot type additionally requires the user to select the single data point they want to visualize by specifying the corresponding explanation_index:
insight.plot(
"force", explanation_index=3
) # plots the force analysis of the measurement at positional index 3
Explainers¶
In general, SHAP is an exhaustive method that tests all combinations of features. This
exhaustive algorithm (implemented by the shap.ExactExplainer class) is often not
feasible in practice, and various approximate variants are available (see the
supported explainers). For details about their inner mechanics, we refer to the SHAP
documentation.
The explainer can be changed when creating the insight:
insight = SHAPInsight.from_campaign(
campaign, explainer_cls="KernelExplainer"
) # default explainer
Experimental and Computational Representations¶
By default, SHAPInsight analyzes the experimental representation of the measurements,
i.e. the representation that specifies parameter and target values in terms of their
actual (physical) quantities. This comes with certain limitations:
Experimental Representation Limits
If the experimental representation contains parameters with non-numeric values (such as
CategoricalParameter, SubstanceParameter, or CustomDiscreteParameter), the only
supported explainer is the KernelExplainer. Attempts to use other explainers will
result in an IncompatibleExplainerError.
A feature importance study can still be performed by looking at the computational
representation of the data points, activated via the use_comp_rep flag. Since all
entries in this representation are numeric by construction, there are no limitations on
the explainer type used. A study of the computational representation can also be useful
if a deeper analysis of the descriptors is of interest. In general, each non-numerical
parameter in the experimental representation gives rise to several descriptors in the
computational representation:
insight = SHAPInsight.from_campaign(campaign, use_comp_rep=True)
insight.plot("bar")
In addition to SHAP-based explainers, we also support LIME and MAPLE variants. For example:
insight = SHAPInsight.from_campaign(
campaign, explainer_cls="LimeTabular", use_comp_rep=True
)
insight.plot("bar")
As expected, the results from LimeTabular are very similar to those from the SHAP
KernelExplainer, as both methods involve local linear approximations.