Gaussian Process Force Fields

class flare.gp.GaussianProcess(kernels: List[str] = None, component: str = 'mc', hyps: ndarray = None, cutoffs: dict = None, hyps_mask: dict = None, hyp_labels: List[T] = None, opt_algorithm: str = 'L-BFGS-B', maxiter: int = 10, parallel: bool = False, per_atom_par: bool = True, n_cpus: int = 1, n_sample: int = 100, output: flare.output.Output = None, name='default_gp', energy_noise: float = 0.01, **kwargs)

Gaussian process force field. Implementation is based on Algorithm 2.1 (pg. 19) of “Gaussian Processes for Machine Learning” by Rasmussen and Williams.

Methods within GaussianProcess allow you to make predictions on AtomicEnvironment objects (see env.py) generated from FLARE Structures (see struc.py), and after data points are added, optimize hyperparameters based on available training data (train method).

Parameters:
  • kernels (list, optional) – Determine the type of kernels. Example: [‘twobody’, ‘threebody’], [‘2’, ‘3’, ‘mb’], [‘2’]. Defaults to [‘twobody’, ‘threebody’].
  • component (str, optional) – Determine single- (“sc”) or multi- component (“mc”) kernel to use. Defaults to “mc”
  • hyps (np.ndarray, optional) – Hyperparameters of the GP.
  • cutoffs (Dict, optional) – Cutoffs of the GP kernel. For simple hyperparameter setups, formatted like {“twobody”:7, “threebody”:4.5}, etc.
  • hyp_labels (List, optional) – List of hyperparameter labels. Defaults to None.
  • opt_algorithm (str, optional) – Hyperparameter optimization algorithm. Defaults to ‘L-BFGS-B’.
  • maxiter (int, optional) – Maximum number of iterations of the hyperparameter optimization algorithm. Defaults to 10.
  • parallel (bool, optional) – If True, the covariance matrix K of the GP is computed in parallel. Defaults to False.
  • n_cpus (int, optional) – Number of CPUs used for parallel calculations. Defaults to 1 (serial).
  • n_sample (int, optional) – Size of submatrix to use when parallelizing predictions.
  • output (Output, optional) – Output object used to dump hyperparameters during optimization. Defaults to None.
  • hyps_mask (dict, optional) – Specifies which hyperparameter is used for which interaction. For details, see kernels/mc_sephyps.py.
  • name (str, optional) – Name for the GP instance which dictates global memory access.
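The kernels and cutoffs arguments work together: each kernel named in kernels is assumed to need its own entry in cutoffs. The sketch below is not FLARE code; the alias table and both helpers are hypothetical, and the mapping of ‘2’, ‘3’ and ‘mb’ to the two-body, three-body and many-body kernels is inferred from the examples above.

```python
from typing import Dict, List, Optional

# Hypothetical alias table (not FLARE's internal one): it assumes '2', '3' and
# 'mb' are shorthands for the two-body, three-body and many-body kernels.
KERNEL_ALIASES: Dict[str, str] = {"2": "twobody", "3": "threebody", "mb": "manybody"}


def normalize_kernels(kernels: Optional[List[str]] = None) -> List[str]:
    """Expand shorthand kernel names; fall back to the documented default."""
    if kernels is None:
        kernels = ["twobody", "threebody"]
    return [KERNEL_ALIASES.get(k, k) for k in kernels]


def missing_cutoffs(kernels: List[str], cutoffs: Dict[str, float]) -> List[str]:
    """Return the kernels that have no entry in the cutoffs dictionary."""
    return [k for k in kernels if k not in cutoffs]


kernels = normalize_kernels(["2", "3"])
cutoffs = {"twobody": 7, "threebody": 4.5}
print(kernels)                              # ['twobody', 'threebody']
print(missing_cutoffs(kernels, cutoffs))    # []
print(missing_cutoffs(normalize_kernels(["2", "3", "mb"]), cutoffs))  # ['manybody']
```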
add_one_env(env: flare.env.AtomicEnvironment, force: Optional[np.ndarray] = None, train: bool = False, **kwargs)

Add a single local environment to the training set of the GP.

Parameters:
  • env (AtomicEnvironment) – Local environment to be added to the training set of the GP.
  • force (np.ndarray) – Force on the central atom of the local environment in the form of a 3-component Numpy array containing the x, y, and z components.
  • train (bool) – If True, the GP is trained after the local environment is added.
adjust_cutoffs(new_cutoffs: Union[list, tuple, np.ndarray] = None, reset_L_alpha=True, train=True, new_hyps_mask=None)

Loop through the atomic environment objects stored in the training data and recompute each of them with the new cutoffs. This is useful for gauging the impact of the cutoffs on a given training set. Unless you know exactly what you are doing (for example, for development or testing), you should afterwards call set_L_alpha and re-optimize the hyperparameters, which is the default behavior here.

A helpful way to update the cutoffs and kernel for an extant GP is to run the following commands:

    >>> hyps_mask = pm.as_dict()
    >>> hyps = hyps_mask['hyps']
    >>> cutoffs = hyps_mask['cutoffs']
    >>> kernels = hyps_mask['kernels']
    >>> gp_model.update_kernel(kernels, 'mc', hyps, cutoffs, hyps_mask)

Parameters:
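  • new_cutoffs (list, tuple, or np.ndarray, optional) – New cutoffs used to recompute the stored atomic environments.
  • reset_L_alpha (bool, optional) – If True, recompute L and alpha after the cutoffs are changed. Defaults to True.
  • train (bool, optional) – If True, re-optimize the hyperparameters after the cutoffs are changed. Defaults to True.
  • new_hyps_mask (dict, optional) – New hyps_mask to use together with the new cutoffs. Defaults to None.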
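Why the stored environments, and therefore K, L and alpha, must be recomputed when the cutoffs change can be seen from a toy neighbor count. This NumPy sketch is an illustration only: the helper neighbors_within is hypothetical, and it ignores periodic boundary conditions, which FLARE environments handle.

```python
import numpy as np


def neighbors_within(positions: np.ndarray, center: int, cutoff: float) -> np.ndarray:
    """Indices of atoms closer than `cutoff` to atom `center` (no periodic images)."""
    dists = np.linalg.norm(positions - positions[center], axis=1)
    mask = (dists < cutoff) & (np.arange(len(positions)) != center)
    return np.flatnonzero(mask)


# Toy, non-periodic structure: four atoms on a line, 1.5 length units apart.
positions = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0], [4.5, 0.0, 0.0]])

for cutoff in (2.0, 3.5):
    counts = [len(neighbors_within(positions, i, cutoff)) for i in range(len(positions))]
    print(cutoff, counts)
# 2.0 [1, 2, 2, 1]
# 3.5 [2, 3, 3, 2]
```

Every atom gains neighbors when the cutoff grows, so every descriptor and every kernel value computed from the old environments is stale.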
as_dict()

Dictionary representation of the GP model.

static backward_arguments(kwargs, new_args={})

Update initialization arguments that have been renamed.

static backward_attributes(dictionary)

Add new attributes to an old instance or update attribute types.

check_L_alpha()

Check that the alpha vector is up to date with the training set. If not, update_L_alpha is called.

check_instantiation()

Run a series of checks to ensure that the user has not supplied contradictory arguments, which would result in undefined behavior when multiple hyperparameters are used.

compute_matrices()

When the covariance matrix is known, reconstruct the other matrices. Used when reloading large GPs.

static from_dict(dictionary)

Create GP object from dictionary representation.

static from_file(filename: str, format: str = '')

One-line convenience method to load a GP from a file stored using write_model.

Parameters:
  • filename (str) – path to GP model
  • format (str) – Either ‘json’ or ‘pickle’; needed only if the format cannot be inferred from the filename.
Returns:

The loaded GP model.
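The format argument only matters when the filename does not reveal the format. A minimal sketch of that rule, using the standard json and pickle modules; infer_format and load_dict are hypothetical helpers, and treating ‘.pkl’ as a pickle extension is an assumption.

```python
import json
import os
import pickle
import tempfile


def infer_format(filename: str, format: str = "") -> str:
    """Hypothetical helper: prefer the file extension, else the `format` argument."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".json":
        return "json"
    if ext in (".pickle", ".pkl"):      # '.pkl' accepted here as an assumption
        return "pickle"
    if format in ("json", "pickle"):
        return format
    raise ValueError(f"cannot infer format of {filename!r}; pass format='json' or 'pickle'")


def load_dict(filename: str, format: str = "") -> dict:
    """Load a dictionary representation from a JSON or pickle file."""
    if infer_format(filename, format) == "json":
        with open(filename) as f:
            return json.load(f)
    with open(filename, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gp_model.json")
    with open(path, "w") as f:
        json.dump({"name": "default_gp"}, f)
    loaded = load_dict(path)            # format inferred from the '.json' extension
```

In the real method the loaded dictionary would then be turned into a GP (compare from_dict above); here it is simply returned.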
par

Backwards-compatibility attribute.

predict(x_t: flare.env.AtomicEnvironment, d: int) → (float, float)

Predict a force component of the central atom of a local environment.

Parameters:
  • x_t (AtomicEnvironment) – Input local environment.
  • d (int) – Force component to be predicted (1 is x, 2 is y, and 3 is z).
Returns:

Mean and epistemic variance of the prediction.

Return type:

(float, float)
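Following Rasmussen and Williams’ Algorithm 2.1, the predictive mean is k*ᵀα and the epistemic variance is k(x*, x*) − vᵀv with v = L⁻¹k*. The sketch below applies this to a toy one-dimensional GP with a squared-exponential kernel; it stands in for FLARE’s force kernels, which compare atomic environments rather than scalars.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, length: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)                   # toy labels
noise = 1e-2                                # noise standard deviation

L = np.linalg.cholesky(rbf(x_train, x_train) + noise**2 * np.eye(len(x_train)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))


def predict(x_star: float) -> tuple:
    """Return (mean, epistemic variance) at a single test input."""
    k_star = rbf(x_train, np.array([x_star]))[:, 0]  # covariances with training points
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star)
    var = rbf(np.array([x_star]), np.array([x_star]))[0, 0] - v @ v
    return float(mean), float(var)
```

Near the training data the variance is small; far from it the variance returns to the prior value of 1.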

predict_efs(x_t: flare.env.AtomicEnvironment)

Predict the local energy, forces, and partial stresses of an atomic environment and their predictive variances.

predict_force_xyz(x_t: flare.env.AtomicEnvironment) → (np.ndarray, np.ndarray)

Simple wrapper that predicts all three components of the force in one call.

Parameters: x_t (AtomicEnvironment) – Input local environment.
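Such a wrapper amounts to looping predict over d = 1, 2, 3. The sketch below takes the per-component predictor as a callable so it runs without FLARE; returning variances (rather than, say, standard deviations) is an assumption about what the real method reports.

```python
from typing import Callable, Tuple

import numpy as np


def predict_force_xyz(predict: Callable[[object, int], Tuple[float, float]],
                      x_t: object) -> Tuple[np.ndarray, np.ndarray]:
    """Call a per-component predictor for d = 1 (x), 2 (y) and 3 (z)."""
    means, variances = [], []
    for d in (1, 2, 3):              # 1-based component index, as in predict()
        mean, var = predict(x_t, d)
        means.append(mean)
        variances.append(var)
    return np.array(means), np.array(variances)


# Stand-in predictor for illustration: returns (d, 0.1 * d) for component d.
def fake_predict(env: object, d: int) -> Tuple[float, float]:
    return float(d), 0.1 * d


forces, variances = predict_force_xyz(fake_predict, x_t=None)
```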

predict_local_energy(x_t: flare.env.AtomicEnvironment) → float

Predict the local energy of a local environment.

Parameters: x_t (AtomicEnvironment) – Input local environment.
Returns: Local energy predicted by the GP.
Return type: float
predict_local_energy_and_var(x_t: flare.env.AtomicEnvironment)

Predict the local energy of a local environment and its uncertainty.

Parameters: x_t (AtomicEnvironment) – Input local environment.
Returns: Mean and predictive variance predicted by the GP.
Return type: (float, float)
remove_force_data(indexes: Union[int, List[int]], update_matrices: bool = True) → Tuple[List[flare.struc.Structure], List[ndarray]]

Remove force components from the model. Convenience function which deletes individual data points.

Matrices should always be updated if you intend to use the GP to make predictions afterwards. Because updating can be time-consuming for large GPs, skipping it is provided as an option, but use it only with extreme caution: undefined behavior may result if you make predictions or add to the training set afterwards.

Like a pop method, returns the training data that was removed, ordered from the lowest to the highest index passed in.

Parameters:
  • indexes (int or List[int]) – Indexes of the environments in the training data to remove.
  • update_matrices (bool) – If False, the GP’s matrices are not updated afterwards (updating can be time-consuming for large models). This should essentially always be True except in niche development applications.
Returns:

The removed training data and force labels, as a tuple of two lists.
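The bookkeeping behind removing data points can be sketched with NumPy: pop the data and labels in ascending index order, and delete the matching rows and columns of K. This is a simplification (one row of K per point, whereas each force environment in the GP contributes three components), and remove_points is a hypothetical helper; L and alpha must then be recomputed from the reduced K.

```python
from typing import List, Union

import numpy as np


def remove_points(K: np.ndarray, data: list, labels: list,
                  indexes: Union[int, List[int]]):
    """Pop-style removal; returns (reduced K, removed data, removed labels)."""
    if isinstance(indexes, int):
        indexes = [indexes]
    indexes = sorted(set(indexes))                 # lowest to highest, as documented
    removed_data = [data[i] for i in indexes]
    removed_labels = [labels[i] for i in indexes]
    # Delete from the highest index down so earlier deletions don't shift later ones.
    for i in reversed(indexes):
        del data[i]
        del labels[i]
    K_reduced = np.delete(np.delete(K, indexes, axis=0), indexes, axis=1)
    return K_reduced, removed_data, removed_labels


K = np.arange(16.0).reshape(4, 4)
data = ["env0", "env1", "env2", "env3"]
labels = [0.0, 1.0, 2.0, 3.0]
K_new, removed, removed_labels = remove_points(K, data, labels, [2, 0])
# removed == ['env0', 'env2']; K_new keeps rows/columns 1 and 3 of K.
```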
set_L_alpha()

Invert the covariance matrix, setting L (a lower-triangular matrix such that L L^T = K + sig_n^2 I) and alpha, the inverse covariance matrix multiplied by the vector of training labels. Predicted forces are later obtained from alpha; their variances also require the inverted covariance matrix (equivalently, L).
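The L and alpha described above follow the first steps of Rasmussen and Williams’ Algorithm 2.1. A NumPy sketch on random toy data, with a squared-exponential kernel standing in for FLARE’s kernels:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # toy inputs
y = rng.normal(size=5)               # toy training labels
sig_n = 0.1                          # noise standard deviation

# Squared-exponential covariance matrix K between all pairs of inputs.
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

L = np.linalg.cholesky(K + sig_n**2 * np.eye(len(y)))   # L L^T = K + sig_n^2 I
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # (K + sig_n^2 I)^{-1} y
```

The two solves use np.linalg.solve for simplicity; scipy.linalg.solve_triangular would exploit the triangular structure of L and is cheaper for large training sets.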

train(logger_name: str = None, custom_bounds=None, grad_tol: float = 0.0001, x_tol: float = 1e-05, line_steps: int = 20, print_progress: bool = False)

Train Gaussian Process model on training data. Tunes the hyperparameters to maximize the likelihood, then computes L and alpha (related to the covariance matrix of the training set).

Parameters:
  • logger_name (str) – Name of the logger to which the progress of the optimization is written.
  • custom_bounds (np.ndarray) – Custom bounds on the hyperparameters.
  • grad_tol (float) – Tolerance on the hyperparameter gradient that determines when hyperparameter optimization is terminated.
  • x_tol (float) – Tolerance on the hyperparameter values used to decide when Nelder-Mead hyperparameter optimization is terminated.
  • line_steps (int) – Maximum number of line steps for L-BFGS hyperparameter optimization.
  • print_progress (bool) – If True, print the progress of the optimization.
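The quantity being maximized is the log marginal likelihood, log p(y|X) = −½ yᵀα − Σᵢ log Lᵢᵢ − (n/2) log 2π (Rasmussen and Williams, Algorithm 2.1). The sketch below tunes only a length scale on toy 1-D data and swaps L-BFGS-B for a plain grid search to stay NumPy-only; in practice an optimizer such as scipy.optimize.minimize(..., method="L-BFGS-B") would be used over all hyperparameters.

```python
import numpy as np


def neg_log_marginal_likelihood(length: float, x: np.ndarray, y: np.ndarray,
                                sig_n: float = 0.1) -> float:
    """-log p(y | x) for a squared-exponential GP with the given length scale."""
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length**2)
    L = np.linalg.cholesky(K + sig_n**2 * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 1/2 y^T alpha + sum(log diag L) + n/2 log(2 pi)
    return float(0.5 * y @ alpha + np.log(np.diag(L)).sum()
                 + 0.5 * len(y) * np.log(2 * np.pi))


x = np.linspace(0.0, 5.0, 20)
y = np.sin(x)
grid = np.linspace(0.2, 3.0, 57)     # candidate length scales (grid search stand-in)
nll = [neg_log_marginal_likelihood(l, x, y) for l in grid]
best_length = grid[int(np.argmin(nll))]
```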
training_statistics

Return a dictionary with statistics about the current training data, useful for quickly summarizing the GP.

update_L_alpha()

Update the GP’s L matrix and alpha vector without recalculating the entire covariance matrix K.
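Avoiding a full recomputation of K means evaluating the kernel only for pairs that involve new training points. This sketch (toy scalar kernel, hypothetical helpers) grows K block by block and counts kernel calls; it then refactors L from scratch, which is one option among several (a block Cholesky update is another), and whether FLARE does exactly this is not stated here.

```python
import numpy as np

calls = 0


def kernel(a: float, b: float) -> float:
    """Toy squared-exponential kernel on scalars; counts evaluations."""
    global calls
    calls += 1
    return float(np.exp(-0.5 * (a - b) ** 2))


def full_K(x: list) -> np.ndarray:
    return np.array([[kernel(a, b) for b in x] for a in x])


def extend_K(K_old: np.ndarray, x_old: list, x_new: list) -> np.ndarray:
    """Grow K by computing only the blocks that involve new points."""
    B = np.array([[kernel(a, b) for b in x_new] for a in x_old]).reshape(len(x_old), len(x_new))
    C = np.array([[kernel(a, b) for b in x_new] for a in x_new]).reshape(len(x_new), len(x_new))
    return np.block([[K_old, B], [B.T, C]])


x_old = [0.0, 1.0, 2.0]
K_old = full_K(x_old)
calls = 0                                  # count only the update's evaluations
K_new = extend_K(K_old, x_old, [3.0, 4.0])
update_calls = calls                       # 3*2 + 2*2 = 10, versus 25 for a full rebuild

x_all = x_old + [3.0, 4.0]
y = np.sin(np.array(x_all))
L = np.linalg.cholesky(K_new + 0.1**2 * np.eye(len(x_all)))
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
```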

update_db(struc: flare.struc.Structure, forces: ndarray = None, custom_range: List[int] = (), energy: float = None, stress: ndarray = None)

Given a structure and forces, add local environments from the structure to the training set of the GP. If energy is given, add the entire structure to the training set.

Parameters:
  • struc (Structure) – Input structure. Local environments of atoms in this structure will be added to the training set of the GP.
  • forces (np.ndarray) – Forces on atoms in the structure.
  • custom_range (List[int]) – Indices of atoms whose local environments will be added to the training set of the GP.
  • energy (float) – Energy of the structure.
  • stress (np.ndarray) – Stress tensor of the structure. The stress tensor components should be given in the following order: xx, xy, xz, yy, yz, zz.
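The documented stress order (xx, xy, xz, yy, yz, zz) is the row-major upper triangle of the symmetric 3×3 tensor, which np.triu_indices(3) enumerates in exactly that order. A sketch of the conversion in both directions (hypothetical helpers, not FLARE functions). Note that ASE’s Atoms.get_stress() uses Voigt order (xx, yy, zz, yz, xz, xy), so stresses taken from ASE would need reordering first.

```python
import numpy as np

UPPER = np.triu_indices(3)   # (0,0), (0,1), (0,2), (1,1), (1,2), (2,2)


def stress_to_tensor(stress) -> np.ndarray:
    """Expand a 6-component stress (xx, xy, xz, yy, yz, zz) into a symmetric 3x3 tensor."""
    t = np.zeros((3, 3))
    t[UPPER] = np.asarray(stress, dtype=float)
    return t + np.triu(t, k=1).T     # mirror the off-diagonal entries


def tensor_to_stress(tensor) -> np.ndarray:
    """Inverse: read the upper triangle in the documented order."""
    return np.asarray(tensor, dtype=float)[UPPER]


def voigt_to_documented(voigt) -> np.ndarray:
    """Reorder (xx, yy, zz, yz, xz, xy) into (xx, xy, xz, yy, yz, zz)."""
    v = np.asarray(voigt, dtype=float)
    return v[[0, 5, 4, 1, 3, 2]]


t = stress_to_tensor([1, 2, 3, 4, 5, 6])
# [[1, 2, 3],
#  [2, 4, 5],
#  [3, 5, 6]]
```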
write_model(name: str, format: str = None, split_matrix_size_cutoff: int = 5000)

Write the model to a file in one of several formats for later reuse. JSON files can be inspected visually and are easier to use across different versions of FLARE or GP implementations; however, they are larger and take longer to load (a new GP is set up from the stored specifications). Pickled files can be faster to read and write, and they are smaller on disk.

Parameters:
  • name (str) – Output name.
  • format (str) – Output format, e.g. ‘json’ or ‘pickle’.
  • split_matrix_size_cutoff (int) – If there are more than this number of training points in the set, save the matrices separately.
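The split_matrix_size_cutoff rule can be sketched as follows: small covariance matrices are inlined in the JSON file, while large ones go to a separate binary .npy file referenced by name. The function and the ky_mat / ky_mat_file keys are hypothetical and need not match FLARE’s file layout.

```python
import json
import os
import tempfile

import numpy as np


def write_model_sketch(model: dict, ky_mat: np.ndarray, name: str, n_train: int,
                       split_matrix_size_cutoff: int = 5000) -> dict:
    """Hypothetical writer: inline ky_mat in the JSON file unless the training set is large."""
    record = dict(model)
    if n_train > split_matrix_size_cutoff:
        matrix_file = f"{name}_ky_mat.npy"
        np.save(matrix_file, ky_mat)          # stored separately in binary form
        record["ky_mat_file"] = matrix_file
    else:
        record["ky_mat"] = ky_mat.tolist()    # small enough to inline as nested lists
    with open(f"{name}.json", "w") as f:
        json.dump(record, f)
    return record


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "gp")
    small = write_model_sketch({"name": "default_gp"}, np.eye(3), base, n_train=1)
    big = write_model_sketch({"name": "default_gp"}, np.eye(3), base, n_train=1,
                             split_matrix_size_cutoff=0)
    split_saved = np.array_equal(np.load(big["ky_mat_file"]), np.eye(3))
```

Keeping large matrices out of the JSON text avoids serializing millions of floats as strings, which is slow to write and to parse.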