Example for a mixture use case in a discrete searchspace

This example shows how to impose sum constraints on discrete parameters. The constraints simulate a situation where we want to mix up to three solvents whose respective fractions must sum up to 100. In addition, no solvent should be chosen more than once, which requires several further constraints.

This example assumes some basic familiarity with using BayBE. We thus refer to campaign for a basic example.

Necessary imports for this example

import math
import os
import numpy as np
from baybe import Campaign
from baybe.constraints import (
    DiscreteDependenciesConstraint,
    DiscreteNoLabelDuplicatesConstraint,
    DiscretePermutationInvarianceConstraint,
    DiscreteSumConstraint,
    ThresholdCondition,
)
from baybe.objective import Objective
from baybe.parameters import NumericalDiscreteParameter, SubstanceParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget
from baybe.utils.dataframe import add_fake_results

Experiment setup

This parameter denotes the tolerance that is allowed when checking whether the fractions sum up to 100.

SUM_TOLERANCE = 1.0
SMOKE_TEST = "SMOKE_TEST" in os.environ
# This parameter denotes the resolution of the discretization of the parameters
RESOLUTION = 5 if SMOKE_TEST else 12
dict_solvents = {
    "water": "O",
    "C1": "C",
    "C2": "CC",
    "C3": "CCC",
}
solvent1 = SubstanceParameter(name="Solv1", data=dict_solvents, encoding="MORDRED")
solvent2 = SubstanceParameter(name="Solv2", data=dict_solvents, encoding="MORDRED")
solvent3 = SubstanceParameter(name="Solv3", data=dict_solvents, encoding="MORDRED")

Parameters for representing the fractions.

fraction1 = NumericalDiscreteParameter(
    name="Frac1", values=list(np.linspace(0, 100, RESOLUTION)), tolerance=0.2
)
fraction2 = NumericalDiscreteParameter(
    name="Frac2", values=list(np.linspace(0, 100, RESOLUTION)), tolerance=0.2
)
fraction3 = NumericalDiscreteParameter(
    name="Frac3", values=list(np.linspace(0, 100, RESOLUTION)), tolerance=0.2
)
parameters = [solvent1, solvent2, solvent3, fraction1, fraction2, fraction3]
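
To illustrate why the sum check needs a tolerance at all, the following sketch builds the fraction grid in plain Python and measures how far three grid values that should add up to exactly 100 can drift from it. The helpers `fraction_grid` and `worst_sum_error` are hypothetical names used only for this illustration, and the grid is a pure-Python stand-in for `np.linspace(0, 100, RESOLUTION)`.

```python
from itertools import product


def fraction_grid(resolution: int) -> list[float]:
    """Evenly spaced values from 0 to 100 (pure-Python stand-in for np.linspace)."""
    step = 100 / (resolution - 1)
    return [i * step for i in range(resolution)]


def worst_sum_error(resolution: int) -> float:
    """Largest deviation from 100 among triples that sum to 100 on paper."""
    grid = fraction_grid(resolution)
    n = resolution - 1
    # Index triples (i, j, n - i - j) sum to exactly 100 in exact arithmetic.
    return max(
        abs(grid[i] + grid[j] + grid[n - i - j] - 100.0)
        for i, j in product(range(resolution), repeat=2)
        if i + j <= n
    )


print(fraction_grid(5))            # [0.0, 25.0, 50.0, 75.0, 100.0]
print(worst_sum_error(12) < 1.0)   # True: rounding errors stay far below SUM_TOLERANCE
```

With five grid points all values are exactly representable, whereas with twelve points the step of 100/11 is not, so a strict equality check could reject valid combinations.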

Creating the constraints

Since the constraints are required for the creation of the searchspace, we create them next. Note that we need a DiscretePermutationInvarianceConstraint here. The reason is that constraints are normally applied in a specific order. However, the mixture should be invariant under permutations of its solvent and fraction slots. We thus require an explicit constraint for this.

perm_inv_constraint = DiscretePermutationInvarianceConstraint(
    parameters=["Solv1", "Solv2", "Solv3"],
    dependencies=DiscreteDependenciesConstraint(
        parameters=["Frac1", "Frac2", "Frac3"],
        conditions=[
            ThresholdCondition(threshold=0.0, operator=">"),
            ThresholdCondition(threshold=0.0, operator=">"),
            ThresholdCondition(threshold=0.0, operator=">"),
        ],
        affected_parameters=[["Solv1"], ["Solv2"], ["Solv3"]],
    ),
)
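
Conceptually, permutation invariance means that a mixture is defined by its set of (solvent, fraction) pairs and not by the slot each pair occupies. The sketch below illustrates this idea in plain Python. It is not BayBE's implementation, and the helper `canonical_mixture` is a hypothetical name. Pairs with a fraction of zero are dropped, which mirrors the role of the dependencies passed to the constraint above: a solvent label only matters if its fraction is greater than zero.

```python
def canonical_mixture(solvents, fractions):
    """Return an order-independent representation of a mixture.

    Pairs with a zero fraction are dropped because the solvent label is
    irrelevant when nothing of that solvent is used.
    """
    pairs = [(s, f) for s, f in zip(solvents, fractions) if f > 0.0]
    return frozenset(pairs)


a = canonical_mixture(["water", "C1", "C2"], [25.0, 75.0, 0.0])
b = canonical_mixture(["C1", "water", "C3"], [75.0, 25.0, 0.0])
c = canonical_mixture(["water", "C1", "C2"], [75.0, 25.0, 0.0])

print(a == b)  # True: same mixture, slots permuted, unused solvent differs
print(a == c)  # False: the fractions belong to different solvents
```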

This is now the actual sum constraint

sum_constraint = DiscreteSumConstraint(
    parameters=["Frac1", "Frac2", "Frac3"],
    condition=ThresholdCondition(threshold=100, operator="=", tolerance=SUM_TOLERANCE),
)
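
As a rough illustration of what this constraint filters, the following sketch enumerates all fraction triples on the coarse grid (RESOLUTION = 5) and keeps those whose sum lies within the tolerance around 100. It uses plain Python rather than BayBE, and the `<=` comparison is an assumption made for this sketch; the exact comparison BayBE performs may differ.

```python
from itertools import product

SUM_TOLERANCE = 1.0
grid = [0.0, 25.0, 50.0, 75.0, 100.0]  # fraction values for RESOLUTION = 5

# Keep only triples whose sum is within the tolerance of 100 (assumed "<=").
valid = [
    combo
    for combo in product(grid, repeat=3)
    if abs(sum(combo) - 100.0) <= SUM_TOLERANCE
]

print(len(grid) ** 3)  # 125 candidate fraction triples
print(len(valid))      # 15 triples remain
```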

The permutation invariance might create duplicate labels. We thus include a constraint to remove them.

no_duplicates_constraint = DiscreteNoLabelDuplicatesConstraint(
    parameters=["Solv1", "Solv2", "Solv3"]
)
constraints = [perm_inv_constraint, sum_constraint, no_duplicates_constraint]
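
The effect of excluding duplicate labels can be sketched with the standard library alone. The snippet below is an illustration, not BayBE code: it enumerates every assignment of the four solvent labels to the three slots and keeps those where all three labels differ.

```python
from itertools import product

labels = ["water", "C1", "C2", "C3"]

all_assignments = list(product(labels, repeat=3))
# An assignment is free of duplicates if its three labels are pairwise distinct.
without_duplicates = [combo for combo in all_assignments if len(set(combo)) == 3]

print(len(all_assignments))     # 64
print(len(without_duplicates))  # 24
```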

Creating the searchspace and the objective

searchspace = SearchSpace.from_product(parameters=parameters, constraints=constraints)
objective = Objective(
    mode="SINGLE", targets=[NumericalTarget(name="Target_1", mode="MAX")]
)

Creating and printing the campaign

campaign = Campaign(searchspace=searchspace, objective=objective)
print(campaign)
Campaign
         
 Meta Data
 Batches Done: 0
 Fits Done: 0

 Search Space
          
  Search Space Type: DISCRETE
  
  Discrete Search Space
               
   Discrete Parameters
       Name                        Type  Num_Values                   Encoding
   0  Solv1          SubstanceParameter           4  SubstanceEncoding.MORDRED
   1  Solv2          SubstanceParameter           4  SubstanceEncoding.MORDRED
   2  Solv3          SubstanceParameter           4  SubstanceEncoding.MORDRED
   3  Frac1  NumericalDiscreteParameter           5                       None
   4  Frac2  NumericalDiscreteParameter           5                       None
   5  Frac3  NumericalDiscreteParameter           5                       None
               
   Experimental Representation
               
       Solv1 Solv2  ... Frac2  Frac3
   0   water    C1  ...   0.0  100.0
   1   water    C1  ...  25.0   75.0
   2   water    C1  ...  50.0   50.0
   ..    ...   ...  ...   ...    ...
   31     C1    C2  ...  25.0   50.0
   32     C1    C2  ...  50.0   25.0
   33     C1    C2  ...  25.0   25.0
   
   [34 rows x 6 columns]
   
   Metadata:
   was_recommended: 0/34
   was_measured: 0/34
   dont_recommend: 0/34

   Constraints
                                         Type    Affected_Parameters
   0  DiscretePermutationInvarianceConstraint  [Solv1, Solv2, Solv3]
   1                    DiscreteSumConstraint  [Frac1, Frac2, Frac3]
   2      DiscreteNoLabelDuplicatesConstraint  [Solv1, Solv2, Solv3]
               
   Computational Representation
               
       Solv1_MORDRED_nHetero  Solv1_MORDRED_AATSC1v  ...  Frac2  Frac3
   0                     1.0             -18.543836  ...    0.0  100.0
   1                     1.0             -18.543836  ...   25.0   75.0
   2                     1.0             -18.543836  ...   50.0   50.0
   ..                    ...                    ...  ...    ...    ...
   31                    0.0             -36.020386  ...   25.0   50.0
   32                    0.0             -36.020386  ...   50.0   25.0
   33                    0.0             -36.020386  ...   25.0   25.0
   
   [34 rows x 12 columns]
 
 Objective
          
  Mode: SINGLE
          
  Targets 
                Type      Name Mode  Lower_Bound  Upper_Bound Transformation  \
  0  NumericalTarget  Target_1  MAX         -inf          inf           None   
  
     Weight  
  0   100.0  
          
  Combine Function: GEOM_MEAN
 
 TwoPhaseMetaRecommender(allow_repeated_recommendations=None, allow_recommending_already_measured=None, initial_recommender=RandomRecommender(allow_repeated_recommendations=False, allow_recommending_already_measured=True), recommender=SequentialGreedyRecommender(allow_repeated_recommendations=False, allow_recommending_already_measured=True, surrogate_model=GaussianProcessSurrogate(model_params={}, _model=None), acquisition_function_cls='qEI', _acquisition_function=None, hybrid_sampler='None', sampling_percentage=1.0), switch_after=1)

Manual verification of the constraints

The following loop performs some recommendations and manually verifies the given constraints.

N_ITERATIONS = 2 if SMOKE_TEST else 3
for kIter in range(N_ITERATIONS):
    print(f"\n#### ITERATION {kIter+1} ####")

    print("## ASSERTS ##")
    print(
        "No. of searchspace entries where fractions do not sum to 100.0:      ",
        campaign.searchspace.discrete.exp_rep[["Frac1", "Frac2", "Frac3"]]
        .sum(axis=1)
        .apply(lambda x: x - 100.0)
        .abs()
        .gt(SUM_TOLERANCE)
        .sum(),
    )
    print(
        "No. of searchspace entries that have duplicate solvent labels:       ",
        campaign.searchspace.discrete.exp_rep[["Solv1", "Solv2", "Solv3"]]
        .nunique(axis=1)
        .ne(3)
        .sum(),
    )
    print(
        "No. of searchspace entries with permutation-invariant combinations:  ",
        campaign.searchspace.discrete.exp_rep[["Solv1", "Solv2", "Solv3"]]
        .apply(frozenset, axis=1)
        .to_frame()
        .join(campaign.searchspace.discrete.exp_rep[["Frac1", "Frac2", "Frac3"]])
        .duplicated()
        .sum(),
    )
    # The following checks only work if the tolerance of the threshold condition in
    # the constraint is not 0. Otherwise, the sum/prod constraints will remove more
    # points than intended due to numeric rounding.
    print(
        f"No. of unique 1-solvent entries (exp. {math.comb(len(dict_solvents), 1)*1})",
        (campaign.searchspace.discrete.exp_rep[["Frac1", "Frac2", "Frac3"]] == 0.0)
        .sum(axis=1)
        .eq(2)
        .sum(),
    )
    print(
        f"No. of unique 2-solvent entries (exp."
        f" {math.comb(len(dict_solvents), 2)*(RESOLUTION-2)})",
        (campaign.searchspace.discrete.exp_rep[["Frac1", "Frac2", "Frac3"]] == 0.0)
        .sum(axis=1)
        .eq(1)
        .sum(),
    )
    print(
        f"No. of unique 3-solvent entries (exp."
        f" {math.comb(len(dict_solvents), 3)*((RESOLUTION-3)*(RESOLUTION-2))//2})",
        (campaign.searchspace.discrete.exp_rep[["Frac1", "Frac2", "Frac3"]] == 0.0)
        .sum(axis=1)
        .eq(0)
        .sum(),
    )

    rec = campaign.recommend(batch_size=5)
    add_fake_results(rec, campaign)
    campaign.add_measurements(rec)
#### ITERATION 1 ####
## ASSERTS ##
No. of searchspace entries where fractions do not sum to 100.0:       0
No. of searchspace entries that have duplicate solvent labels:        0
No. of searchspace entries with permutation-invariant combinations:   0
No. of unique 1-solvent entries (exp. 4) 4
No. of unique 2-solvent entries (exp. 18) 18
No. of unique 3-solvent entries (exp. 12) 12

#### ITERATION 2 ####
## ASSERTS ##
No. of searchspace entries where fractions do not sum to 100.0:       0
No. of searchspace entries that have duplicate solvent labels:        0
No. of searchspace entries with permutation-invariant combinations:   0
No. of unique 1-solvent entries (exp. 4) 4
No. of unique 2-solvent entries (exp. 18) 18
No. of unique 3-solvent entries (exp. 12) 12
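
The expected counts printed in the loop can be reproduced in isolation. The sketch below restates the formulas from the f-strings above in a small function; `expected_counts` is a hypothetical helper name. A single solvent must take the full 100, two solvents can split 100 at any of the `RESOLUTION - 2` interior grid points, and three solvents with non-zero fractions admit `(RESOLUTION - 3) * (RESOLUTION - 2) / 2` splits, each multiplied by the number of ways to choose the solvents.

```python
import math


def expected_counts(n_solvents: int, resolution: int) -> tuple[int, int, int]:
    """Expected numbers of 1-, 2- and 3-solvent entries in the searchspace."""
    one = math.comb(n_solvents, 1)
    two = math.comb(n_solvents, 2) * (resolution - 2)
    three = math.comb(n_solvents, 3) * ((resolution - 3) * (resolution - 2)) // 2
    return one, two, three


counts = expected_counts(n_solvents=4, resolution=5)
print(counts)       # (4, 18, 12)
print(sum(counts))  # 34
```

For four solvents and a resolution of 5, the three counts add up to 34, which matches the number of rows reported in the campaign printout above.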