# Surrogates

Surrogate models are used to model and estimate the unknown objective function of the
DoE campaign. BayBE offers a diverse array of surrogate models, while also allowing for
the utilization of custom models. All surrogate models are based upon the general
[`Surrogate`](baybe.surrogates.base.Surrogate) class. Some models even support transfer
learning, as indicated by the `supports_transfer_learning` attribute.

## Available Models

BayBE provides a comprehensive selection of surrogate models, empowering you to choose
the most suitable option for your specific needs. The following surrogate models are
available within BayBE:

* [`GaussianProcessSurrogate`](baybe.surrogates.gaussian_process.core.GaussianProcessSurrogate)
* [`BayesianLinearSurrogate`](baybe.surrogates.linear.BayesianLinearSurrogate)
* [`MeanPredictionSurrogate`](baybe.surrogates.naive.MeanPredictionSurrogate)
* [`NGBoostSurrogate`](baybe.surrogates.ngboost.NGBoostSurrogate)
* [`RandomForestSurrogate`](baybe.surrogates.random_forest.RandomForestSurrogate)

(multi_output_modeling)= 
## Multi-Output Modeling
Depending on the use case at hand, it may be necessary to model multiple output
variables simultaneously. However, not all surrogate types natively provide (joint)
predictive distributions for more than one variable, as indicated by their
{attr}`~baybe.surrogates.base.Surrogate.supports_multi_output` attribute. 

In multi-output contexts, it may therefore be necessary to assemble several
single-output surrogates into a composite model to build a joint predictive model from
independent components for each output. BayBE provides two convenient mechanisms to
achieve this, both built upon the
{class}`~baybe.surrogates.composite.CompositeSurrogate` class:

### Surrogate Replication
The simplest way to construct a multi-output surrogate is to replicate a given
single-output model architecture for each of the existing output dimensions.

To replicate a given surrogate, you can either call its 
{meth}`~baybe.surrogates.base.Surrogate.replicate` method or use the
[`CompositeSurrogate.from_replication()`](baybe.surrogates.composite.CompositeSurrogate.from_replication)
convenience constructor:
```python
from baybe.surrogates import CompositeSurrogate, GaussianProcessSurrogate

composite_a = GaussianProcessSurrogate().replicate()
composite_b = CompositeSurrogate.from_replication(GaussianProcessSurrogate())

assert composite_a == composite_b
```

However, there are very few cases where such an explicit conversion is required. Because
using a single-output surrogate model in a multi-output context would trivially fail, and
because BayBE cares deeply about its users' lives, it automatically performs this conversion
for you behind the scenes:

(auto_replication)=
```{admonition} Auto-Replication
:class: important

When using a single-output surrogate model in a multi-output context, BayBE
automatically replicates the surrogate on the fly.
```
The consequence of the above is that you can use the same model object regardless
of the modeling context and its multi-output capabilities.

There is *one* notable exception where an explicit replication may still make
sense: if you want to bypass the existing multi-output mechanics of a surrogate that is
inherently multi-output compatible.

### Composite Surrogates
An alternative to surrogate replication is to manually assemble your
{class}`~baybe.surrogates.composite.CompositeSurrogate`. This can be useful if you want
to
* use the same model architecture but with different settings for each output or
* use different architectures for the outputs to begin with.

```python
from baybe.surrogates import (
    CompositeSurrogate,
    GaussianProcessSurrogate,
    RandomForestSurrogate,
)

surrogate = CompositeSurrogate(
    {
        "target_a": GaussianProcessSurrogate(),
        "target_b": RandomForestSurrogate(),
    }
)
```

A noticeable difference to the replication approach is that manual assembly requires
the exact set of target variables to be known at the time the object is created.


## Extracting the Model for Advanced Study

In principle, the surrogate model does not need to be a persistent object during
Bayesian optimization since each iteration performs a new fit anyway. However, for
advanced study, such as investigating the posterior predictions, acquisition functions
or feature importance, it can be useful to directly extract the current surrogate model.

For this, BayBE provides the ``get_surrogate`` method, which is available for the
[``Campaign``](baybe.campaign.Campaign.get_surrogate) or for 
[recommenders](baybe.recommenders.pure.bayesian.base.BayesianRecommender.get_surrogate).
Below an example of how to utilize this in conjunction with the popular SHAP package:

~~~python
# Assuming we already have a campaign created and measurements added
data = campaign.measurements[[p.name for p in campaign.parameters]]
model = lambda x: campaign.get_surrogate().posterior(x).mean

# Apply SHAP
explainer = shap.Explainer(model, data)
shap_values = explainer(data)
shap.plots.bar(shap_values)
~~~

```{admonition} Current Scalarization Limitations
:class: note
Currently, ``get_surrogate`` always returns the surrogate model with respect to the
transformed target(s) / objective. This means that if you are using a
``SingleTargetObjective`` with a transformed target or a ``DesirabilityObjective``, the
model's output will correspond to the transformed quantities and not the original
untransformed target(s). If you are using the model for subsequent analysis this should
be kept in mind.
```

## Using Custom Models

BayBE goes one step further by allowing you to incorporate custom models based on the
ONNX architecture. Note however that these cannot be retrained. For a detailed
explanation on using custom models, refer to the comprehensive examples provided in the
corresponding [example folder](./../../examples/Custom_Surrogates/Custom_Surrogates).