baybe.utils.sampling_algorithms.farthest_point_sampling

baybe.utils.sampling_algorithms.farthest_point_sampling(points: ndarray, n_samples: int = 1, initialization: Literal['farthest', 'random'] | Collection[int] = 'farthest', random_tie_break: bool = True)

Select a subset of points using farthest point sampling.

Creates a subset of a given collection of points by repeatedly adding the point whose minimum Euclidean distance to the points selected so far is largest. The mechanism used for selecting the initial point(s) is configurable.
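The greedy max-min rule described above can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration: the name `fps_sketch` and its `first` parameter are not part of BayBE. It starts from a single given index, tracks the minimum distance from every point to the current selection, and breaks ties deterministically by taking the lowest index (comparable to `random_tie_break=False`). Input validation and the other initialization modes are omitted.

```python
import numpy as np


def fps_sketch(points: np.ndarray, n_samples: int, first: int = 0) -> list[int]:
    """Greedy farthest point sampling starting from index `first` (hypothetical sketch)."""
    selected = [first]
    # Minimum Euclidean distance from every point to the current selection.
    min_dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(n_samples - 1):
        # Choose the point whose nearest selected point is farthest away.
        # np.argmax returns the lowest index among ties (deterministic tie-break).
        # Already selected points have distance 0; assumes the points are distinct.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Account for the newly selected point in the minimum distances.
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return selected
```

For the collinear points (0, 0), (1, 0), (10, 0) and (5, 0), `fps_sketch(points, 3)` returns `[0, 2, 3]`: after the start point, the far end at 10 is picked, then the point at 5, which is farthest from both.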

Parameters:
  • points (ndarray) – The points that are available for selection, represented as a 2-D array of shape (n, k), where n is the number of points and k is the dimensionality of the points.

  • n_samples (int) – The total number of points to be selected.

  • initialization (Union[Literal['farthest', 'random'], Collection[int]]) –

    Determines how the first points are selected:

    • "farthest": The first two selected points are the pair with the largest pairwise Euclidean distance. If only a single point is requested, a deterministic choice based on the point coordinates is made instead.

    • "random": The first point is selected uniformly at random.

    • Indices: The points at these positional indices are pre-selected.

  • random_tie_break (bool) – Determines how ties are broken when several candidates share the largest minimum distance to the current selection. If True, one of the tied candidates is selected at random; otherwise, the first of them is selected. When there is no tie, the point with the largest minimum distance is always selected.
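The initialization modes and the tie-breaking rule can be illustrated with the following sketch. The helpers `initial_selection` and `pick_next` are hypothetical and not BayBE functions. Two simplifications are assumptions: ties are detected with exact floating-point equality, and the deterministic single-point fallback of "farthest" is not modeled because its exact rule is not documented here.

```python
from __future__ import annotations

from collections.abc import Collection
from typing import Literal

import numpy as np


def initial_selection(
    points: np.ndarray,
    initialization: Literal["farthest", "random"] | Collection[int],
    rng: np.random.Generator,
) -> list[int]:
    """Return the indices chosen before the greedy loop starts (hypothetical sketch)."""
    if isinstance(initialization, str):
        if initialization == "random":
            # One point drawn uniformly at random.
            return [int(rng.integers(len(points)))]
        if initialization == "farthest":
            # All pairwise Euclidean distances, shape (n, n); O(n^2) memory.
            diffs = points[:, None, :] - points[None, :, :]
            dists = np.linalg.norm(diffs, axis=-1)
            # Row and column of the largest entry give the farthest pair.
            i, j = np.unravel_index(np.argmax(dists), dists.shape)
            return [int(i), int(j)]
        raise ValueError(f"Unknown initialization method: {initialization!r}")
    # Any other collection is treated as user-provided indices.
    return [int(k) for k in initialization]


def pick_next(
    min_dist: np.ndarray, random_tie_break: bool, rng: np.random.Generator
) -> int:
    """Pick the index with the largest minimum distance, breaking ties (hypothetical sketch)."""
    # All candidates sharing the largest value (exact equality is an assumption).
    candidates = np.flatnonzero(min_dist == min_dist.max())
    if random_tie_break:
        # Uniformly random choice among the tied candidates.
        return int(rng.choice(candidates))
    # Deterministic: the first tied candidate in index order.
    return int(candidates[0])
```

In a greedy loop, a helper like `pick_next` would be applied to the vector of minimum distances at each step, in place of a plain `np.argmax`.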

Return type:

list[int]

Returns:

A list containing the positional indices of the selected points.

Raises:
  • ValueError – If the number of requested samples is less than 1.

  • ValueError – If the provided array is not two-dimensional.

  • ValueError – If the array contains no points.

  • ValueError – If the input space has no dimensions.

  • ValueError – If the indices provided for initialization are not unique.

  • ValueError – If more initialization indices are provided than there are points available.

  • ValueError – If any initialization index is out of bounds.

  • ValueError – If the initialization method is unknown.

  • ValueError – If more points are requested than are available.
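The documented error conditions can be mirrored by a validation helper such as the one below. `validate_inputs` is hypothetical; the order of the checks, the message texts, and the treatment of negative indices as out of bounds are assumptions and may differ from BayBE's actual implementation.

```python
from __future__ import annotations

from collections.abc import Collection

import numpy as np


def validate_inputs(
    points: np.ndarray, n_samples: int, initialization: str | Collection[int]
) -> None:
    """Raise ValueError for the documented invalid inputs (hypothetical sketch)."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1.")
    if points.ndim != 2:
        raise ValueError("points must be a two-dimensional array.")
    n, k = points.shape
    if n == 0:
        raise ValueError("points must contain at least one point.")
    if k == 0:
        raise ValueError("points must have at least one dimension.")
    if isinstance(initialization, str):
        if initialization not in ("farthest", "random"):
            raise ValueError(f"Unknown initialization method: {initialization!r}")
    else:
        idx = list(initialization)
        if len(set(idx)) != len(idx):
            raise ValueError("Initialization indices must be unique.")
        if len(idx) > n:
            raise ValueError("More initialization indices than available points.")
        # Assumption: negative indices count as out of bounds.
        if any(i < 0 or i >= n for i in idx):
            raise ValueError("Initialization indices are out of bounds.")
    if n_samples > n:
        raise ValueError("More points requested than available.")
```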