# Modeling a Mixture in Slot-Based Representation

## Terminology

A mixture can also be modeled in a non-traditional way using a concept we refer to
as a **slot**. A slot is represented by a pair of parameters: one indicating the
*amount* of a mixture ingredient, and another indicating the *type* of the ingredient
(as a label) populating the slot. Unlike in [traditional mixture
modeling](/examples/Mixtures/traditional.md), the total number of parameters is not
determined by how many ingredient choices we have, but by the maximum number of slots
we allow. For instance, to design a mixture with *up to three* ingredients, we create
three slots, which are represented by six parameters (three labels and three amounts).
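
To illustrate how the two parameters of each slot combine, the following minimal
sketch collapses a slot-based configuration into a plain ingredient-to-amount mapping.
The helper `slots_to_composition` is hypothetical and not part of BayBE. The column
names follow the `Slot{k}_Label` / `Slot{k}_Amount` scheme used in the remainder of
this example.

```python
from collections import defaultdict


def slots_to_composition(row: dict[str, object], n_slots: int) -> dict[str, float]:
    """Collapse a slot-based row into an ingredient -> total amount mapping.

    Hypothetical helper for illustration only (not a BayBE function).
    """
    composition: dict[str, float] = defaultdict(float)
    for k in range(1, n_slots + 1):
        amount = float(row[f"Slot{k}_Amount"])
        if amount > 0:  # slots with zero amount do not contribute to the mixture
            composition[str(row[f"Slot{k}_Label"])] += amount
    return dict(composition)


# One configuration with three populated slots
row = {
    "Slot1_Label": "Solvent1", "Slot1_Amount": 10,
    "Slot2_Label": "Solvent5", "Slot2_Amount": 20,
    "Slot3_Label": "Solvent4", "Slot3_Amount": 70,
}
print(slots_to_composition(row, n_slots=3))
# {'Solvent1': 10.0, 'Solvent5': 20.0, 'Solvent4': 70.0}
```

Regardless of how many solvents are available, the row always consists of six
entries, which is what distinguishes this representation from the traditional one.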

A corresponding search space could look like this:

| Slot1_Label | Slot1_Amount | Slot2_Label | Slot2_Amount | Slot3_Label | Slot3_Amount |
|:------------|:-------------|:------------|:-------------|:------------|:-------------|
| Solvent1    | 10           | Solvent5    | 20           | Solvent4    | 70           |
| Solvent1    | 30           | Solvent8    | 40           | Solvent2    | 30           |
| Solvent3    | 20           | Solvent1    | 35           | Solvent9    | 45           |
| Solvent2    | 15           | Solvent3    | 40           | Solvent1    | 45           |

The slot-based representation has one decided advantage over traditional
modeling: We can use BayBE's label encodings for the label parameters. For
instance, when mixing small molecules, the
[`SubstanceParameter`](baybe.parameters.substance.SubstanceParameter) can be used to
smartly encode the slot labels, enabling the algorithm to perform a chemically-aware
mixture optimization.

In this example, we show how to design such a search space, including the various
discrete constraints we need to impose. We consider a situation where we want to mix
up to three solvents, whose respective amounts must add up to 100.

```{admonition} Discrete vs. Continuous Modeling
:class: important

Here, we only use discrete parameters, although in principle the parameters
corresponding to amounts could also be modeled as continuous numbers. However, this
would imply that some of the constraints would have to act on both discrete and
continuous parameters, which is not currently supported.
```

## Imports


```python
import math
```


```python
import numpy as np
import pandas as pd
```


```python
from baybe.constraints import (
    DiscreteDependenciesConstraint,
    DiscreteNoLabelDuplicatesConstraint,
    DiscretePermutationInvarianceConstraint,
    DiscreteSumConstraint,
    ThresholdCondition,
)
from baybe.parameters import NumericalDiscreteParameter, SubstanceParameter
from baybe.searchspace import SubspaceDiscrete
from baybe.utils.dataframe import pretty_print_df
```

Basic example settings:


```python
SUM_TOLERANCE = 0.1  # tolerance allowed to fulfill the sum constraints
RESOLUTION = 5  # resolution for discretizing the slot amounts
```

## Parameter Setup

First, we create the parameters for the slot labels. Each slot offers a choice of
four solvents:


```python
dict_solvents = {
    "water": "O",
    "ethanol": "CCO",
    "methanol": "CO",
    "acetone": "CC(=O)C",
}
slot1_label = SubstanceParameter(
    name="Slot1_Label", data=dict_solvents, encoding="MORDRED"
)
slot2_label = SubstanceParameter(
    name="Slot2_Label", data=dict_solvents, encoding="MORDRED"
)
slot3_label = SubstanceParameter(
    name="Slot3_Label", data=dict_solvents, encoding="MORDRED"
)
```

Next, we create the parameters representing the slot amounts:


```python
slot1_amount = NumericalDiscreteParameter(
    name="Slot1_Amount", values=np.linspace(0, 100, RESOLUTION), tolerance=0.2
)
slot2_amount = NumericalDiscreteParameter(
    name="Slot2_Amount", values=np.linspace(0, 100, RESOLUTION), tolerance=0.2
)
slot3_amount = NumericalDiscreteParameter(
    name="Slot3_Amount", values=np.linspace(0, 100, RESOLUTION), tolerance=0.2
)
```
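
As a quick sanity check of what the chosen resolution means, the following sketch
reproduces the amount grid in plain Python. It assumes the default behavior of
`np.linspace`, where both endpoints are included, and the variable names `grid` and
`step` are introduced here for illustration only.

```python
RESOLUTION = 5  # same value as in the example settings

# Evenly spaced amounts from 0 to 100, both endpoints included
grid = [100 * i / (RESOLUTION - 1) for i in range(RESOLUTION)]
step = grid[1] - grid[0]  # spacing between neighboring amounts

print(grid)  # [0.0, 25.0, 50.0, 75.0, 100.0]
print(step)  # 25.0
```

A resolution of 5 thus corresponds to four elemental steps of 25 each, which is large
compared to the parameter tolerance of 0.2 used above.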

We collect all parameters in a single list:


```python
parameters = [
    slot1_label,
    slot2_label,
    slot3_label,
    slot1_amount,
    slot2_amount,
    slot3_amount,
]
```

## Constraint Setup

For the sake of demonstration, we consider a scenario where we do *not* care about the
order of addition of components to the mixture, which imposes two additional
constraints: one for removing duplicates and one for imposing permutation invariance.

```{admonition} Order of Addition
:class: note
Whether you need to impose the constraints for removing duplicates and imposing
permutation invariance depends on your use case. If the order of addition is relevant
to your mixture, the permutation invariance constraint should be discarded and one
could further argue that adding the same substance multiple times should be allowed.
```

### Duplicate Substances

Assuming that the order of addition is irrelevant, there is no difference between
having two slots with the same substance or having only one slot with the combined
amounts. Thus, we want to make sure that there are no such duplicate label entries,
which can be achieved using a
{class}`~baybe.constraints.discrete.DiscreteNoLabelDuplicatesConstraint`:


```python
no_duplicates_constraint = DiscreteNoLabelDuplicatesConstraint(
    parameters=["Slot1_Label", "Slot2_Label", "Slot3_Label"]
)
```
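
To get a feeling for the effect of this constraint, the sketch below mimics the
filtering on the label columns alone using only the standard library. It is an
illustration of the idea, not BayBE's internal implementation, and the variable names
are chosen for this example.

```python
from itertools import product

solvents = ["water", "ethanol", "methanol", "acetone"]

# Full Cartesian product of labels over the three slots
all_label_combos = list(product(solvents, repeat=3))

# Keep only combinations in which every slot holds a different label
no_duplicates = [combo for combo in all_label_combos if len(set(combo)) == len(combo)]

print(len(all_label_combos))  # 64
print(len(no_duplicates))  # 24
```

The remaining 24 label combinations still contain every ordering of the same three
solvents, which is what the permutation invariance constraint addresses.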

### Permutation Invariance

Next, we need to take care of permutation invariance. If the order of addition does
not matter, interchanging any two slots does not alter the overall mixture, i.e., the
mixture slots are considered permutation-invariant.

A complication with permutation invariance arises from the fact that we do not only
have a label per slot, but also a numerical amount. If this amount is zero, then the
label of the slot becomes meaningless, because adding zero of the corresponding
substance does not change the mixture. In BayBE, we call this a "dependency", i.e.
the slot labels depend on the slot amounts and are only relevant if the amount
satisfies some condition (in this case "amount > 0").

The {class}`~baybe.constraints.discrete.DiscreteDependenciesConstraint` informs the
{class}`~baybe.constraints.discrete.DiscretePermutationInvarianceConstraint` about
these dependencies so that they are correctly included in the filtering process:


```python
perm_inv_constraint = DiscretePermutationInvarianceConstraint(
    parameters=["Slot1_Label", "Slot2_Label", "Slot3_Label"],
    dependencies=DiscreteDependenciesConstraint(
        parameters=["Slot1_Amount", "Slot2_Amount", "Slot3_Amount"],
        conditions=[
            ThresholdCondition(threshold=0.0, operator=">"),
            ThresholdCondition(threshold=0.0, operator=">"),
            ThresholdCondition(threshold=0.0, operator=">"),
        ],
        affected_parameters=[["Slot1_Label"], ["Slot2_Label"], ["Slot3_Label"]],
    ),
)
```
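
The interplay between permutation invariance and the dependency on the amounts can be
sketched with a small canonicalization function. The helper `canonical_mixture` is
hypothetical and only conveys the idea: two configurations are regarded as the same
mixture if they agree after dropping empty slots and ignoring the slot order. It does
not reflect how BayBE performs the filtering internally.

```python
def canonical_mixture(labels: list[str], amounts: list[float]) -> tuple:
    """Return an order-independent key; labels of empty slots are ignored.

    Hypothetical helper for illustration only (not a BayBE function).
    """
    # A label is only relevant if its slot amount satisfies "amount > 0"
    active = [(label, amount) for label, amount in zip(labels, amounts) if amount > 0]
    # Sorting removes the dependence on the slot order
    return tuple(sorted(active))


a = canonical_mixture(["acetone", "ethanol", "water"], [0.0, 25.0, 75.0])
b = canonical_mixture(["water", "methanol", "ethanol"], [75.0, 0.0, 25.0])
c = canonical_mixture(["acetone", "ethanol", "water"], [25.0, 0.0, 75.0])

print(a == b)  # True: same mixture, only slot order and the empty slot's label differ
print(a == c)  # False: a different solvent is present
```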

### Substance Amounts

Interpreting the slot amounts as percentages, we need to ensure that their total is
always 100:


```python
sum_constraint = DiscreteSumConstraint(
    parameters=["Slot1_Amount", "Slot2_Amount", "Slot3_Amount"],
    condition=ThresholdCondition(threshold=100, operator="=", tolerance=SUM_TOLERANCE),
)
```
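
To see how strongly this constraint reduces the amount combinations, the following
sketch enumerates all amount triples on the grid and keeps those whose sum is within
the tolerance of 100. It uses only the standard library, hard-codes the grid obtained
for `RESOLUTION = 5`, and is meant as an illustration rather than a description of
BayBE's filtering code.

```python
from itertools import product

SUM_TOLERANCE = 0.1  # same value as in the example settings
grid = [0.0, 25.0, 50.0, 75.0, 100.0]  # amounts for RESOLUTION = 5

# All amount triples on the grid
all_triples = list(product(grid, repeat=3))

# Keep only the triples whose total is 100 within the tolerance
valid = [t for t in all_triples if abs(sum(t) - 100) <= SUM_TOLERANCE]

print(len(all_triples))  # 125
print(len(valid))  # 15
```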

We store all constraints in a single list:


```python
constraints = [perm_inv_constraint, sum_constraint, no_duplicates_constraint]
```

## Search Space Creation

With all building blocks in place, we can now assemble our discrete space and inspect
its configurations:


```python
space = SubspaceDiscrete.from_product(parameters=parameters, constraints=constraints)
print(
    pretty_print_df(
        space.exp_rep,
        max_rows=len(space.exp_rep),
        max_columns=len(space.exp_rep.columns),
    )
)
```

       Slot1_Label Slot2_Label Slot3_Label  Slot1_Amount  Slot2_Amount  Slot3_Amount
    0      acetone     ethanol    methanol           0.0           0.0         100.0
    1      acetone     ethanol    methanol           0.0          25.0          75.0
    2      acetone     ethanol    methanol           0.0          50.0          50.0
    3      acetone     ethanol    methanol           0.0          75.0          25.0
    4      acetone     ethanol    methanol           0.0         100.0           0.0
    5      acetone     ethanol    methanol          25.0           0.0          75.0
    6      acetone     ethanol    methanol          25.0          25.0          50.0
    7      acetone     ethanol    methanol          25.0          50.0          25.0
    8      acetone     ethanol    methanol          25.0          75.0           0.0
    9      acetone     ethanol    methanol          50.0           0.0          50.0
    10     acetone     ethanol    methanol          50.0          25.0          25.0
    11     acetone     ethanol    methanol          50.0          50.0           0.0
    12     acetone     ethanol    methanol          75.0           0.0          25.0
    13     acetone     ethanol    methanol          75.0          25.0           0.0
    14     acetone     ethanol    methanol         100.0           0.0           0.0
    15     acetone     ethanol       water           0.0           0.0         100.0
    16     acetone     ethanol       water           0.0          25.0          75.0
    17     acetone     ethanol       water           0.0          50.0          50.0
    18     acetone     ethanol       water           0.0          75.0          25.0
    19     acetone     ethanol       water          25.0           0.0          75.0
    20     acetone     ethanol       water          25.0          25.0          50.0
    21     acetone     ethanol       water          25.0          50.0          25.0
    22     acetone     ethanol       water          50.0           0.0          50.0
    23     acetone     ethanol       water          50.0          25.0          25.0
    24     acetone     ethanol       water          75.0           0.0          25.0
    25     acetone    methanol       water           0.0          25.0          75.0
    26     acetone    methanol       water           0.0          50.0          50.0
    27     acetone    methanol       water           0.0          75.0          25.0
    28     acetone    methanol       water          25.0          25.0          50.0
    29     acetone    methanol       water          25.0          50.0          25.0
    30     acetone    methanol       water          50.0          25.0          25.0
    31     ethanol    methanol       water          25.0          25.0          50.0
    32     ethanol    methanol       water          25.0          50.0          25.0
    33     ethanol    methanol       water          50.0          25.0          25.0


````{admonition} Simplex Construction
:class: tip
In this example, we use the
{meth}`~baybe.searchspace.discrete.SubspaceDiscrete.from_product` constructor in order
to demonstrate the explicit creation of all involved constraints. However, for
creating mixture representations, the
{meth}`~baybe.searchspace.discrete.SubspaceDiscrete.from_simplex` constructor should
generally be used. It takes care of the overall sum constraint already during search
space creation, providing a more efficient path to the same result.

The alternative in our case would look like:
```python
space = SubspaceDiscrete.from_simplex(
    max_sum=100.0,
    boundary_only=True,
    simplex_parameters=[slot1_amount, slot2_amount, slot3_amount],
    product_parameters=[slot1_label, slot2_label, slot3_label],
    constraints=[perm_inv_constraint, no_duplicates_constraint],
)
```
Note that {meth}`~baybe.searchspace.discrete.SubspaceDiscrete.from_simplex`
inherently ensures the sum constraint, hence we do not pass it to `constraints`.
````

## Verification of Constraints

Let us programmatically assert that all constraints are satisfied:


```python
amounts = space.exp_rep[["Slot1_Amount", "Slot2_Amount", "Slot3_Amount"]]
labels = space.exp_rep[["Slot1_Label", "Slot2_Label", "Slot3_Label"]]
slots = space.exp_rep.apply(
    lambda row: pd.Series(
        [(row[f"Slot{k}_Label"], row[f"Slot{k}_Amount"]) for k in range(1, 4)]
    ),
    axis=1,
)
```

* All amounts sum to 100:


```python
n_wrong_sum = amounts.sum(axis=1).apply(lambda x: x - 100).abs().gt(SUM_TOLERANCE).sum()
assert n_wrong_sum == 0
print("Number of configurations whose amounts do not sum to 100: ", n_wrong_sum)
```

    Number of configurations whose amounts do not sum to 100:  0


* There are no duplicate slot labels:


```python
n_duplicates = labels.nunique(axis=1).ne(3).sum()
assert n_duplicates == 0
print("Number of configurations with duplicate slot labels: ", n_duplicates)
```

    Number of configurations with duplicate slot labels:  0


* There are no configurations that are mere permutations of one another:


```python
n_permute = slots.apply(frozenset, axis=1).duplicated().sum()
assert n_permute == 0
print("Number of permuted configurations: ", n_permute)
```

    Number of permuted configurations:  0


## Verification of Span

Finally, we also verify that we have completely spanned the space of allowed
configurations by comparing the number of unique `K`-solvent entries against its
theoretical value.

```{admonition} Theoretical Span
:class: info

The number of possible `K`-solvent entries can be found by imagining the corresponding
[traditional mixture representation](/examples/Mixtures/traditional.md) and solving a
slightly more complex version of the ["stars and bars"
problem](https://en.wikipedia.org/wiki/Stars_and_bars_(combinatorics)), where the
number of non-empty bins is fixed. That is, we need to ask in how many ways we can
distribute `N` items (= number of elemental steps for the amounts, in our case
`RESOLUTION - 1`) across `M` bins (= number of available solvents) if exactly `K` bins
are non-empty (= number of solvents actually present in the mixture).

There are `(M choose K)` ways to select the non-empty bins. When distributing the
`N` items, one item needs to go to each of the `K` bins for it to be non-empty.
The remaining `N - K` items can be freely distributed among the `K` bins. The
number of configurations for the latter is given by the "stars and bars" formula,
which states that `X` indistinguishable items can be placed in `Y` distinguishable
bins in `((X + Y - 1) choose (Y - 1))` ways. Setting `X = N - K` and `Y = K` gives
`((N - 1) choose (K - 1))`. Multiplying this with the former count yields the formula
implemented in the helper function below.
```

Helper function to compute the theoretical numbers:


```python
def n_combinations(N: int, M: int, K: int) -> int:
    """Get number of ways to put `N` items into `M` bins yielding `K` non-empty bins."""
    return math.comb(M, K) * math.comb(N - 1, K - 1)
```
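
As an independent cross-check of the closed-form expression, the following sketch
counts the configurations by brute force. The function `n_combinations_brute_force`
is a hypothetical helper introduced here for illustration. The formula is repeated so
that the snippet runs on its own.

```python
import math
from itertools import product


def n_combinations(N: int, M: int, K: int) -> int:
    """Closed-form count, identical to the helper function above."""
    return math.comb(M, K) * math.comb(N - 1, K - 1)


def n_combinations_brute_force(N: int, M: int, K: int) -> int:
    """Count distributions of `N` items over `M` bins with exactly `K` non-empty bins."""
    count = 0
    for bins in product(range(N + 1), repeat=M):  # every possible bin filling
        if sum(bins) == N and sum(b > 0 for b in bins) == K:
            count += 1
    return count


# N = RESOLUTION - 1 = 4 elemental steps, M = 4 available solvents
for K in range(1, 4):
    print(K, n_combinations(4, 4, K), n_combinations_brute_force(4, 4, K))
# 1 4 4
# 2 18 18
# 3 12 12
```

The brute-force enumeration is only feasible for small `N` and `M`, but it confirms
the formula for the setting used in this example.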

Verify that the space is fully spanned:


```python
for K in range(1, 4):
    n_combinations_expected = n_combinations(RESOLUTION - 1, len(dict_solvents), K)
    n_combinations_actual = (amounts != 0).sum(axis=1).eq(K).sum()
    assert n_combinations_expected == n_combinations_actual
    print(
        f"Number of unique {K}-solvent entries: "
        f"{n_combinations_actual} ({n_combinations_expected} expected)"
    )
```

    Number of unique 1-solvent entries: 4 (4 expected)
    Number of unique 2-solvent entries: 18 (18 expected)
    Number of unique 3-solvent entries: 12 (12 expected)