Advanced Hyperparameters Set Up

For multi-component systems, the configurational space can be highly complicated. One may want to use different hyper-parameters and cutoffs for different interactions, or perform constrained optimisation of the hyper-parameters.

To use more hyper-parameters, we need a special kernel function that can distinguish between different pairs, triplets and other descriptors, and determine which hyper-parameters to use for each interaction.

This kernel is enabled through the hyps_mask argument of the GaussianProcess class. It contains multiple arrays that describe how to break down the array of hyper-parameters and apply them when computing the kernel. A detailed description of this argument can be found in kernel/mc_sephyps.py.

The ParameterHelper class generates the hyps_mask through a more human-readable interface.

Example:

>>> pm = ParameterHelper(species=['C', 'H', 'O'],
...                      kernels={'twobody':[['*', '*'], ['O','O']],
...                      'threebody':[['*', '*', '*'],
...                          ['O','O', 'O']]},
...                      parameters={'twobody0':[1, 0.5, 1], 'twobody1':[2, 0.2, 2],
...                            'threebody0':[1, 0.5], 'threebody1':[2, 0.2],
...                            'cutoff_threebody':1},
...                      constraints={'twobody0':[False, True]})
>>> hm = pm.as_dict()
>>> hyps = hm['hyps']
>>> kernels = hm['kernels']
>>> gp_model = GaussianProcess(kernels=kernels,
...                            hyps=hyps, hyps_mask=hm)

In this example, three atomic species are involved. There are many kinds of twobody and threebody interactions, but we only want to use eight different signal variances and length scales.
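To get a feel for how many interaction types exist before any grouping, the distinct pairs and triplets of the three species can be enumerated. This is a minimal sketch that assumes the order of species within a pair or triplet does not matter; it uses only the standard library and is not part of ParameterHelper.

```python
from itertools import combinations_with_replacement

species = ['C', 'H', 'O']

# Unordered pairs and triplets, allowing repeated species (assumption:
# the order of species inside a pair or triplet is irrelevant).
pairs = list(combinations_with_replacement(species, 2))
triplets = list(combinations_with_replacement(species, 3))

print(len(pairs), "pair types,", len(triplets), "triplet types")
```

Under this assumption, giving every type its own signal variance and length scale would take 2 × (6 + 10) = 32 values, compared with the 8 used by the grouping above.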

To do so, we first assign all twobody interactions to group “twobody0” by listing “*-*” as the first element of the twobody argument. The second element, O-O, is then assigned to group “twobody1”. Note that the order matters here: a later element overrides an earlier one. If the twobody list were [['O', 'O'], ['*', '*']], all twobody interactions would belong to group “twobody1”.

Similarly, O-O-O is defined as threebody1, while all remaining ones are left as threebody0.
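The override rule can be illustrated with a small stand-alone resolver. This is a sketch under stated assumptions, not the logic FLARE uses internally: the helper names matches and assign_groups are hypothetical, and pairs and triplets are treated as unordered.

```python
from itertools import combinations_with_replacement, permutations


def matches(pattern, combo):
    """True if some ordering of `combo` fits `pattern`; '*' fits any species."""
    return any(all(p in ('*', c) for p, c in zip(pattern, perm))
               for perm in permutations(combo))


def assign_groups(species, definitions, prefix):
    """Map each unordered tuple of species to a group name.

    The n-th definition is named prefix + str(n). Definitions are applied
    in order, so a later match overwrites an earlier one.
    """
    size = len(definitions[0])
    assignment = {}
    for n, pattern in enumerate(definitions):
        for combo in combinations_with_replacement(sorted(species), size):
            if matches(pattern, combo):
                assignment[combo] = prefix + str(n)
    return assignment


species = ['C', 'H', 'O']
twobody = assign_groups(species, [['*', '*'], ['O', 'O']], 'twobody')
swapped = assign_groups(species, [['O', 'O'], ['*', '*']], 'twobody')
threebody = assign_groups(species, [['*', '*', '*'], ['O', 'O', 'O']],
                          'threebody')
```

With the original order, only O-O lands in twobody1; with the order swapped, the wildcard comes last and overwrites everything.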

The hyperparameters for each group are listed in the order [sig, ls, cutoff] in the parameters argument. In this example, the O-O interaction uses [2, 0.2, 2] as its sigma, length scale, and cutoff.

For threebody, the parameter arrays only come with two elements. So there is no cutoff associated with threebody0 or threebody1; instead, a universal cutoff is used, which is defined as ‘cutoff_threebody’.

The constraints argument defines which hyper-parameters will be optimized: True means the hyper-parameter is optimized, False means it is held fixed.
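As an illustration of what such flags imply, the sketch below flattens two groups into one list of values and splits it into an optimized part and a fixed part. The flat ordering is an assumption made for the example and is not necessarily how FLARE lays out its hyps array; cutoffs are left out for brevity.

```python
# Signal variance and length scale for two groups (cutoffs omitted).
parameters = {'twobody0': [1, 0.5], 'twobody1': [2, 0.2]}
constraints = {'twobody0': [False, True]}

hyps, train = [], []
for name, values in parameters.items():
    # Assumption: a group without an entry in `constraints` is fully optimized.
    flags = constraints.get(name, [True] * len(values))
    hyps.extend(values)
    train.extend(flags)

# Partition the flat list with the boolean mask.
optimized = [h for h, t in zip(hyps, train) if t]
fixed = [h for h, t in zip(hyps, train) if not t]
```

Here only the signal variance of twobody0 is held fixed, so an optimizer would see three free values instead of four.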

Here are a couple more simple examples.

Define a 5-parameter 2+3 kernel (1, 0.5, 1, 0.5, 0.05)

>>> pm = ParameterHelper(kernels=['twobody', 'threebody'],
...                      parameters={'sigma': 1,
...                                  'lengthscale': 0.5,
...                                  'cutoff_twobody': 2,
...                                  'cutoff_threebody': 1,
...                                  'noise': 0.05})

Define a 5-parameter 2+3 kernel (1, 1, 1, 1, 0.05)

>>> pm = ParameterHelper(kernels=['twobody', 'threebody'],
...                      parameters={'cutoff_twobody': 2,
...                                  'cutoff_threebody': 1,
...                                  'noise': 0.05},
...                      ones=True,
...                      random=False)

Define a 9-parameter 2+3 kernel

>>> pm = ParameterHelper()
>>> pm.define_group('specie', 'O', ['O'])
>>> pm.define_group('specie', 'rest', ['C', 'H'])
>>> pm.define_group('twobody', '**', ['*', '*'])
>>> pm.define_group('twobody', 'OO', ['O', 'O'])
>>> pm.define_group('threebody', '***', ['*', '*', '*'])
>>> pm.define_group('threebody', 'Oall', ['O', 'O', 'O'])
>>> pm.set_parameters('**', [1, 0.5])
>>> pm.set_parameters('OO', [1, 0.5])
>>> pm.set_parameters('Oall', [1, 0.5])
>>> pm.set_parameters('***', [1, 0.5])
>>> pm.set_parameters('cutoff_twobody', 5)
>>> pm.set_parameters('cutoff_threebody', 4)

See more examples in the functions ParameterHelper.define_group and ParameterHelper.set_parameters, and in the tests in tests/test_parameters.py.

If you want to add in a new hyperparameter set to an already-existing GP, you can perform the following steps:

>>> hyps_mask = pm.as_dict()
>>> hyps = hyps_mask['hyps']
>>> kernels = hyps_mask['kernels']
>>> gp_model.update_kernel(kernels, 'mc', hyps_mask)
>>> gp_model.hyps = hyps

class flare.utils.parameter_helper.ParameterHelper(hyps_mask=None, species=None, kernels={}, cutoff_groups={}, parameters=None, constraints={}, allseparate=False, random=False, ones=False, verbose='WARNING')

A helper class to construct the hyps_mask dictionary for AtomicEnvironment, GaussianProcess and MappedGaussianProcess.

Parameters:
  • hyps_mask (dict) – Not implemented yet
  • species (dict, list) – Define specie groups
  • kernels (dict, list) – Define kernels and groups for the kernels
  • cutoff_groups (dict) – Define different cutoffs for different species
  • parameters (dict) – Define signal variance, length scales, and cutoffs
  • constraints (dict) – If listed as False, the corresponding hyperparameters will not be trained
  • allseparate (bool) – If True, define each type pair/triplet into a separate group.
  • random (bool) – If True, randomize all signal variances and lengthscales
  • ones (bool) – If True, set all signal variances and lengthscales to one
  • verbose (str) – Level to print with “ERROR”, “WARNING”, “INFO”, “DEBUG”
  • The species argument is optional. It can be left as None if the user only wants to set up one group of hyper-parameters for each kernel.

  • The kernels can be defined with or without groups, but the latter mode is not compatible with the allseparate flag.

    >>> kernels=['twobody', 'threebody'],
    

    or

    >>> kernels={'twobody':[['*', '*'], ['O','O']],
    ...          'threebody':[['*', '*', '*'],
    ...                       ['O','O', 'O']]},
    

    Current options for the kernels are twobody, threebody and manybody (based on coordination number).

  • See the format of species, kernels (dict), and cutoff_groups in the list_groups() function.

  • See the format of parameters and constraints in the list_parameters() function.

all_separate_groups(group_type)

Separate all possible types of twobodys, threebodys, and manybodys, with one type per group.

Parameters: group_type (str) – “specie”, “twobody”, “threebody”, “cut3b”, “manybody”

as_dict()

Dictionary representation of the mask. The output can be used for AtomicEnvironment or GaussianProcess.

define_group(group_type, name, element_list, parameters=None, atomic_str=False)

Define specie/twobody/threebody/3b cutoff/manybody group

Parameters:
  • group_type (str) – “specie”, “twobody”, “threebody”, “cut3b”, “manybody”
  • name (str) – the name used for indexing; can be anything but “*”
  • element_list (list) – list of elements
  • parameters (list) – corresponding parameters for this group
  • atomic_str (bool) – whether the elements in element_list are specified by group names or periodic table element names.

This function defines groups for specie/twobody/threebody/3b cutoff/manybody terms. It can be called many times; a later call always overrides an earlier one.

The name of the group has to be a string other than “*” that identifies a group of species, twobodys, etc. If the same name is used in two function calls, the definitions are merged and both calls take effect.

element_list has to be a list of atomic elements, a list of specie group names (which must be defined in previous calls), or “*”. “*” loops the function over all previously defined species. The list must have two elements for twobody/3b cutoff/manybody terms and three elements for threebody terms. A specie group definition can have as many elements as you want.

If multiple define_group calls conflict on the same elements, the later call has higher priority. For example, if twobody 1-2 is assigned to group1 in the first call and to group2 in the second call, it ends up in group2.

Example 1:

>>> define_group('specie', 'water', ['H', 'O'])
>>> define_group('specie', 'salt', ['Cl', 'Na'])

They define H and O to be group water, and Na and Cl to be group salt.

Example 2.1:

>>> define_group('twobody', 'in-water', ['H', 'H'], atomic_str=True)
>>> define_group('twobody', 'in-water', ['H', 'O'], atomic_str=True)
>>> define_group('twobody', 'in-water', ['O', 'O'], atomic_str=True)

Example 2.2:

>>> define_group('twobody', 'in-water', ['water', 'water'])

Example 2.1 is equivalent to Example 2.2.
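The equivalence can be checked with a stand-alone sketch that expands each definition into the element pairs it covers. The expand_pair helper and the specie_groups dictionary are hypothetical names introduced for illustration; they mimic the role of atomic_str but are not FLARE code, and pairs are assumed to be unordered.

```python
from itertools import product

# Specie groups as defined in Example 1.
specie_groups = {'water': ['H', 'O'], 'salt': ['Cl', 'Na']}


def expand_pair(element_list, atomic_str=False):
    """Return the set of unordered element pairs covered by one definition."""
    if atomic_str:
        # Entries are already periodic-table element names.
        choices = [[element] for element in element_list]
    else:
        # Entries are specie group names; look up their member elements.
        choices = [specie_groups[group] for group in element_list]
    return {tuple(sorted(pair)) for pair in product(*choices)}


# Example 2.1: three calls under the same name are merged.
in_water_21 = (expand_pair(['H', 'H'], atomic_str=True)
               | expand_pair(['H', 'O'], atomic_str=True)
               | expand_pair(['O', 'O'], atomic_str=True))

# Example 2.2: a single call that refers to the specie group by name.
in_water_22 = expand_pair(['water', 'water'])
```

Both routes cover the same three pairs (H-H, H-O, O-O), which is why a single call with group names can replace several calls with element names.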

Example 3.1:

>>> define_group('specie', '1', ['H'])
>>> define_group('specie', '2', ['O'])
>>> define_group('twobody', 'Hgroup', ['H', 'H'], atomic_str=True)
>>> define_group('twobody', 'Hgroup', ['H', 'O'], atomic_str=True)
>>> define_group('twobody', 'OO', ['O', 'O'], atomic_str=True)

Example 3.2:

>>> define_group('specie', '1', ['H'])
>>> define_group('specie', '2', ['O'])
>>> define_group('twobody', 'Hgroup', ['H', '*'], atomic_str=True)
>>> define_group('twobody', 'OO', ['O', 'O'], atomic_str=True)

Example 3.3:

>>> list_groups('specie', ['H', 'O'])
>>> define_group('twobody', 'Hgroup', ['H', '*'])
>>> define_group('twobody', 'OO', ['O', 'O'])

Example 3.4:

>>> list_groups('specie', ['H', 'O'])
>>> define_group('twobody', 'OO', ['*', '*'])
>>> define_group('twobody', 'Hgroup', ['H', '*'])

Examples 3.1 to 3.4 are all equivalent.

fill_in_parameters(group_type, random=False, ones=False, universal=False)

Separate all possible types of twobodys, threebodys, and manybodys, with one type per group. Then fill in either the universal ls and sigma pre-defined through set_parameters(“sigma”, ..) and set_parameters(“ls”, ..), or random parameters if random is True.

Parameters:
  • group_type (str) – “specie”, “twobody”, “threebody”, “cut3b”, “manybody”
  • definition_list (list, dict) – list of elements

find_group(group_type, element_list, atomic_str=False)

Find the group that contains the input pair.

Parameters:
  • group_type (str) – species, twobody, threebody, cut3b, manybody
  • element_list (list) – list of elements for a pair/triplet/coordination-pair
  • atomic_str (bool) – whether the elements in element_list are specified by group names or periodic table element names.
Returns: name (str)

static from_dict(hyps_mask, verbose=False, init_spec=[])

Convert a dictionary mask to an HM instance. This function is not tested yet.

list_groups(group_type, definition_list)

Define groups in batches.

Parameters:
  • group_type (str) – “specie”, “twobody”, “threebody”, “cut3b”, “manybody”
  • definition_list (list, dict) – list of elements

This function runs define_group in batch. Please read the documentation of define_group first.

If definition_list is a list, it is equivalent to calling define_group on each term of definition_list in order:

>>> for n, term in enumerate(definition_list):
...     define_group(group_type, group_type + str(n), term)

So the first twobody defined will be group twobody0, and the second will be group twobody1. For specie, each listed element is defined as a single-element group named after the element.
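The naming rule for list mode can be sketched without FLARE. The function auto_named_groups below is a hypothetical helper that only reproduces the names described above; it assumes that a specie list contains plain element names.

```python
def auto_named_groups(group_type, definition_list):
    """Pair each definition with the name that list mode would give it."""
    if group_type == 'specie':
        # Each listed element becomes its own group, named after itself.
        return {element: [element] for element in definition_list}
    # Other group types are numbered in the order they are listed.
    return {group_type + str(n): term
            for n, term in enumerate(definition_list)}


twobody_names = auto_named_groups('twobody', [['*', '*'], ['O', 'O']])
specie_names = auto_named_groups('specie', ['H', 'O'])
```

These automatically generated names (twobody0, twobody1, ...) are the keys to use later when setting parameters for groups defined in list mode.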

If definition_list is a dictionary, it is equivalent to

>>> for k, v in definition_list.items():
...     define_group(group_type, k, v)

The dictionary mode is not recommended, especially when the group definitions conflict with each other: there is no guarantee that the priority order is the one you want.

Unlike ParameterHelper.define_group(), it can only be called once for each group_type, and not after any ParameterHelper.define_group() calls.

list_parameters(parameter_dict: dict, constraints: dict = {})

Define many groups of parameters

Parameters:
  • parameter_dict (dict) – dictionary of all parameters
  • constraints (dict) – dictionary of all constraints

Example:

>>> parameter_dict={"group_name":[sig, ls, cutoffs], ...}
>>> constraints={"group_name":[True, False, False], ...}

The name of the parameters can be a group name previously defined through the define_group or list_groups function. Aside from group names, noise, cutoff_twobody, cutoff_threebody, and cutoff_manybody are reserved for the noise parameter and universal cutoffs, while sigma and lengthscale are reserved for universal signal variances and length scales.

For non-reserved keys, the value should be a list of 2 to 3 elements, corresponding to the sigma, lengthscale and cutoff (if the third one is defined). For reserved keys, the value should be a float number.
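The shape rules for reserved and non-reserved keys can be sketched as a small validator, together with the fallback to a universal cutoff when a group lists only two values. Both check_value and split_group_value are hypothetical helpers written for this illustration; whether ParameterHelper itself raises errors like these is not stated in this document.

```python
# Reserved keys named in the text; each takes a single number.
RESERVED_KEYS = {'noise', 'sigma', 'lengthscale',
                 'cutoff_twobody', 'cutoff_threebody', 'cutoff_manybody'}


def check_value(name, value):
    """Raise ValueError if `value` has the wrong shape for `name`."""
    is_sequence = isinstance(value, (list, tuple))
    if name in RESERVED_KEYS:
        if is_sequence:
            raise ValueError(f"{name} expects a single number")
    elif not (is_sequence and len(value) in (2, 3)):
        raise ValueError(f"{name} expects [sigma, lengthscale] or "
                         "[sigma, lengthscale, cutoff]")


def split_group_value(value, universal_cutoff):
    """Return (sigma, lengthscale, cutoff), using the universal cutoff
    when the group does not carry its own third element."""
    sigma, lengthscale = value[0], value[1]
    cutoff = value[2] if len(value) == 3 else universal_cutoff
    return sigma, lengthscale, cutoff


parameter_dict = {'twobody1': [2, 0.2, 2],
                  'threebody0': [1, 0.5],
                  'cutoff_threebody': 1}
for name, value in parameter_dict.items():
    check_value(name, value)

threebody0 = split_group_value(parameter_dict['threebody0'],
                               parameter_dict['cutoff_threebody'])
```

In this sketch threebody0 has no cutoff of its own, so it picks up the value stored under cutoff_threebody.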

The parameter_dict and constraints should use the same set of keys. If a key in constraints is not used in parameter_dict, it will be ignored.

The value in constraints can be either a single bool, which applies to all parameters, or a list of bools that apply to each parameter.
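A minimal sketch of how the two accepted forms could be brought to a common shape is shown below. The normalize_constraints function is hypothetical, and the default of optimizing parameters that have no constraint entry is an assumption based on the opt=True default of set_parameters.

```python
def normalize_constraints(parameter_dict, constraints):
    """Return one list of bools per key of `parameter_dict`."""
    flags = {}
    for name, value in parameter_dict.items():
        count = len(value) if isinstance(value, (list, tuple)) else 1
        # Assumption: parameters without a constraint entry are optimized.
        opt = constraints.get(name, True)
        if isinstance(opt, bool):
            # A single bool applies to every parameter under this name.
            opt = [opt] * count
        flags[name] = list(opt)
    # Keys that appear only in `constraints` are never visited, so they
    # are ignored.
    return flags


flags = normalize_constraints(
    {'twobody0': [1, 0.5], 'threebody0': [1, 0.5], 'noise': 0.05},
    {'twobody0': [False, True], 'threebody0': False, 'unused_group': False})
```

After normalization every entry is a list of bools, whichever form it was written in.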

set_constraints(name, opt)

Set the constraints for a certain group

Parameters:
  • name (str) – name of the parameters
  • opt (bool, list) – whether to optimize the parameter or not

The name of the parameters can be a group name previously defined through the define_group or list_groups function. Aside from group names, noise, cutoff_twobody, cutoff_threebody, and cutoff_manybody are reserved for the noise parameter and universal cutoffs, while sigma and lengthscale are reserved for universal signal variances and length scales.

The optimization flag can be a single bool, which applies to all parameters under that name, or a list of bools that apply to each parameter.

set_parameters(name, parameters, opt=True)

Set the parameters for a certain group

Parameters:
  • name (str) – name of the parameters
  • parameters (list) – the sigma, lengthscale, and cutoff of each group.
  • opt (bool, list) – whether to optimize the parameter or not

The name of the parameters can be a group name previously defined through the define_group or list_groups function. Aside from group names, noise, cutoff_twobody, cutoff_threebody, and cutoff_manybody are reserved for the noise parameter and universal cutoffs, while sigma and lengthscale are reserved for universal signal variances and length scales.

The parameters should be a list of 2 to 3 elements: sigma, lengthscale, and (if a third element is given) cutoff.

The optimization flag can be a single bool, which applies to all parameters, or a list of bools that apply to each parameter.

summarize_group(group_type)

Sort and combine all the previous definitions into internal variables.

Parameters: group_type (str) – species, twobody, threebody, cut3b, manybody