Modeling a Mixture in Traditional Representation

When modeling mixtures, we often face a large set of ingredients to choose from. A common way to formalize this type of selection problem is to assign each ingredient its own numerical parameter representing the amount of that ingredient in the mixture. A sum constraint imposed on all parameters then ensures that the amounts always add up to 100%. Further constraints can be added, for instance, to restrict individual subgroups of ingredients. In BayBE’s language, we call this the traditional mixture representation.
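As a compact way to read this representation, the conditions can be written down as follows. The symbols are notation introduced here for illustration only: $x_i$ is the amount of ingredient $i$ in percent, $u_i$ is its upper bound, and $\ell_G$, $u_G$ are optional lower and upper limits on a subgroup $G$ of ingredients.

```latex
\sum_{i=1}^{n} x_i = 100,
\qquad 0 \le x_i \le u_i \quad (i = 1, \dots, n),
\qquad \ell_G \le \sum_{i \in G} x_i \le u_G \quad \text{for selected subgroups } G
```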

In this example, we demonstrate how to create a search space in this representation, using a simple mixture of up to six components, which are divided into three subgroups: solvents, bases and phase agents.

Slot-based Representation

For an alternative way to describe mixtures, see our slot-based representation.

Imports

import numpy as np
import pandas as pd
from baybe.constraints import ContinuousLinearConstraint
from baybe.parameters import NumericalContinuousParameter
from baybe.recommenders import RandomRecommender
from baybe.searchspace import SearchSpace

Parameter Setup

We start by creating lists containing our substance labels according to their subgroups:

g1 = ["Solvent1", "Solvent2"]
g2 = ["Base1", "Base2"]
g3 = ["PhaseAgent1", "PhaseAgent2"]

Next, we create continuous parameters describing the substance amounts for each group. The maximum amount of each substance depends on its group: we allow up to 80% of a solvent, but only up to 20% of a base and up to 5% of a phase agent:

p_g1_amounts = [
    NumericalContinuousParameter(name=name, bounds=(0, 80)) for name in g1
]
p_g2_amounts = [
    NumericalContinuousParameter(name=name, bounds=(0, 20)) for name in g2
]
p_g3_amounts = [
    NumericalContinuousParameter(name=name, bounds=(0, 5)) for name in g3
]

Constraints Setup

Now, we set up our constraints. We start with the overall mixture constraint, ensuring the total of all ingredients is 100%:

c_total_sum = ContinuousLinearConstraint(
    parameters=g1 + g2 + g3,
    operator="=",
    coefficients=(1,) * len(g1 + g2 + g3),
    rhs=100,
)

Additionally, we require that bases make up at least 10% of the mixture:

c_g2_min = ContinuousLinearConstraint(
    parameters=g2,
    operator=">=",
    coefficients=(1,) * len(g2),
    rhs=10,
)

By contrast, phase agents should make up no more than 5%:

c_g3_max = ContinuousLinearConstraint(
    parameters=g3,
    operator="<=",
    coefficients=(1,) * len(g3),
    rhs=5,
)
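To make the meaning of the three constraints concrete, here is a minimal, library-independent sketch that checks a single candidate mixture against them and against the parameter bounds. The names `GROUPS`, `UPPER_BOUNDS` and `is_feasible` are hypothetical helpers for this illustration and are not part of BayBE; the numbers are the ones used in the definitions above.

```python
import math

# Hypothetical helper data, mirroring the groups and bounds defined above
GROUPS = {
    "solvents": ["Solvent1", "Solvent2"],
    "bases": ["Base1", "Base2"],
    "phase_agents": ["PhaseAgent1", "PhaseAgent2"],
}
UPPER_BOUNDS = {"solvents": 80, "bases": 20, "phase_agents": 5}


def is_feasible(mixture: dict[str, float], tol: float = 1e-5) -> bool:
    """Return True if the mixture satisfies the bounds and all three constraints."""
    # Individual bounds: 0 <= amount <= group-specific upper bound
    for group, names in GROUPS.items():
        for name in names:
            if not -tol <= mixture[name] <= UPPER_BOUNDS[group] + tol:
                return False

    total = sum(mixture.values())
    bases = sum(mixture[name] for name in GROUPS["bases"])
    phase_agents = sum(mixture[name] for name in GROUPS["phase_agents"])
    return (
        math.isclose(total, 100, abs_tol=tol)  # corresponds to c_total_sum
        and bases >= 10 - tol  # corresponds to c_g2_min
        and phase_agents <= 5 + tol  # corresponds to c_g3_max
    )


ok = {
    "Solvent1": 60, "Solvent2": 25,
    "Base1": 8, "Base2": 4,
    "PhaseAgent1": 2, "PhaseAgent2": 1,
}
# Same total of 100, but the bases now add up to only 7
too_few_bases = {**ok, "Solvent1": 65, "Base1": 3}

print(is_feasible(ok))             # True
print(is_feasible(too_few_bases))  # False
```

The small tolerance `tol` is an assumption of this sketch; it only guards against floating-point rounding when comparing sums.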

Search Space Creation

Having both parameter and constraint definitions at hand, we can create our search space:

searchspace = SearchSpace.from_product(
    parameters=[*p_g1_amounts, *p_g2_amounts, *p_g3_amounts],
    constraints=[c_total_sum, c_g2_min, c_g3_max],
)
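The sum constraint couples the groups: once the bases and phase agents are fixed, the total solvent amount follows. As a small worked calculation (plain arithmetic on the numbers above, with hypothetical variable names), the following sketch derives the range that the combined solvent amount can take in this search space.

```python
# Per-substance upper bounds from the parameter definitions above
solvent_cap, base_cap, agent_cap = 80, 20, 5
n_per_group = 2  # each group contains two substances

# Range of the group totals implied by the bounds and group constraints
bases_min, bases_max = 10, n_per_group * base_cap            # c_g2_min, bounds
agents_min, agents_max = 0, min(5, n_per_group * agent_cap)  # bounds, c_g3_max

# The total must be 100, so the solvents take whatever remains
solvents_min = 100 - bases_max - agents_max
solvents_max = min(100 - bases_min - agents_min, n_per_group * solvent_cap)

print(solvents_min, solvents_max)  # 55 90
```

In other words, every feasible mixture in this example contains between 55% and 90% solvents in total, even though no constraint states this explicitly.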

Verification of Constraints

To verify that the constraints imposed above are fulfilled, let us draw some random points from the search space:

recommendations = RandomRecommender().recommend(batch_size=10, searchspace=searchspace)
print(recommendations)
       Base1      Base2  PhaseAgent1  PhaseAgent2   Solvent1   Solvent2
0   5.962248  15.091470     0.617988     2.548859  65.135942  10.643492
1  10.731245  17.527701     2.706310     0.860732  15.074918  53.099093
2  13.538111   4.028894     1.280748     1.446167  73.517748   6.188333
3  17.246565   9.477427     4.151038     0.176811  53.451651  15.496508
4  18.763944   9.595472     0.082182     0.412835  19.062188  52.083380
5   8.380212   5.018878     0.987167     2.417629  23.917630  59.278485
6   3.031890   7.806274     1.553746     0.458131  49.503789  37.646169
7  12.432642  15.208097     1.889968     2.736214  26.619129  41.113950
8  14.350522  12.110974     1.564600     3.153687  15.195388  53.624829
9  11.268059  14.890784     1.077251     1.289359  41.041423  30.433124

Computing the row sums over all ingredients and over each constrained subgroup confirms that the constraints hold:

stats = pd.DataFrame(
    {
        "Total": recommendations.sum(axis=1),
        "Total_Bases": recommendations[g2].sum(axis=1),
        "Total_Phase_Agents": recommendations[g3].sum(axis=1),
    }
)
print(stats)
   Total  Total_Bases  Total_Phase_Agents
0  100.0    21.053719            3.166847
1  100.0    28.258947            3.567043
2  100.0    17.567005            2.726915
3  100.0    26.723992            4.327850
4  100.0    28.359416            0.495017
5  100.0    13.399090            3.404796
6  100.0    10.838165            2.011877
7  100.0    27.640739            4.626182
8  100.0    26.461496            4.718287
9  100.0    26.158842            2.366610

Finally, we assert that every recommendation satisfies all three constraints:

assert np.allclose(stats["Total"], 100)
assert (stats["Total_Bases"] >= 10).all()
assert (stats["Total_Phase_Agents"] <= 5).all()
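The total is compared with a tolerance rather than with `==` because floating-point sums are rarely exact. The sketch below illustrates this with the values of row 0 as printed above, which are presumably rounded to six decimals by the default pandas display. It uses `math.isclose` from the standard library as a dependency-free stand-in for `np.allclose`; the tolerance of `1e-5` is an assumption chosen for this illustration.

```python
import math

# Row 0 of the printed recommendations (displayed values, six decimals)
row0 = [5.962248, 15.091470, 0.617988, 2.548859, 65.135942, 10.643492]
total = sum(row0)

# Exact comparison fails because the displayed values sum to about 99.999999
print(total == 100)  # False

# A tolerance-based comparison accepts the small deviation
print(math.isclose(total, 100, abs_tol=1e-5))  # True
```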