API Reference

dgp module

class dgpsi.dgp.dgp(X, Y, all_layer=None, check_rep=True, block=True, vecchia=False, m=25, ord_fun=None)

Bases: object

Class that contains the deep GP hierarchy for stochastic imputation inference.

Parameters:

X (ndarray) – a numpy 2d-array where each row is an input data point and each column is an input dimension.
Y (ndarray) – a numpy 2d-array containing the observed output data. Its rows are output data points and its columns are output dimensions (the number of columns equals the number of GP nodes in the final layer).
all_layer (list, optional) – a list containing L (the number of layers) sub-lists, each of which contains the GPs defined by the kernel class in that layer. The sub-lists are placed in the list in the same order as the layers of the specified DGP model. The final layer of the DGP hierarchy can be set to a likelihood layer by putting an object created by a likelihood class (in likelihood_class) into the final sub-list of all_layer. Defaults to None. If a DGP structure is not provided, an input-connected two-layered DGP structure (for deterministic model emulation without a likelihood layer), where the number of GP nodes in the first layer equals the dimension of X, is automatically constructed.
check_rep (bool, optional) – whether to check the repetitions in the dataset, i.e., if one input position has multiple outputs. Defaults to True.
block (bool, optional) – whether to use the blocked (layer-wise) ESS for the imputations during the training. Defaults to True.
vecchia (bool) – a bool indicating if Vecchia approximation will be used. Defaults to False.
m (int) – an integer that gives the size of the conditioning set for the Vecchia approximation in the training. Defaults to 25.
ord_fun (function, optional) – a function that decides the ordering of the input of the GP nodes in the DGP structure for the Vecchia approximation. If set to None, then the default random ordering is used. Defaults to None.
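The check_rep option concerns input positions that appear more than once with different outputs. As a rough, library-independent illustration of what such a check looks for (this is not dgpsi's implementation, and the helper name find_repetitions is hypothetical), the sketch below groups output rows by identical input rows using plain Python lists:

```python
from collections import defaultdict


def find_repetitions(X, Y):
    """Group output rows by identical input rows.

    Hypothetical helper for illustration only; it is not part of dgpsi.
    X and Y are sequences of rows (lists of floats) of equal length.
    Returns a dict mapping each repeated input row (as a tuple) to the
    list of outputs observed at that position.
    """
    groups = defaultdict(list)
    for x_row, y_row in zip(X, Y):
        groups[tuple(x_row)].append(list(y_row))
    # Keep only the input positions that appear more than once.
    return {x: ys for x, ys in groups.items() if len(ys) > 1}


X = [[0.0, 1.0], [0.5, 0.5], [0.0, 1.0]]
Y = [[1.2], [0.7], [1.4]]
repeated = find_repetitions(X, Y)
print(repeated)
```

Here the first and third rows of X coincide, so the position (0.0, 1.0) is reported with its two outputs.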

Remark: This class is used for DGP structures in which the internal I/O are unobservable. When some internal layers are fully observable, the DGP model reduces to a linked (D)GP model. In that case, use the lgp class for inference, which allows separate input/output training data for each (D)GP. See the lgp class for implementation details.

Examples

To build a list that represents a three-layer DGP with three GPs in the first two layers and one GP (i.e., only one dimensional output) in the final layer, do:

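import numpy as np
from dgpsi.kernel_class import kernel, combine

# Each sub-list holds the GP nodes (kernel objects) of one layer.
layer1, layer2, layer3 = [], [], []
for _ in range(3):
    layer1.append(kernel(length=np.array([1])))
for _ in range(3):
    layer2.append(kernel(length=np.array([1])))
# A single GP in the final layer gives a one-dimensional output.
layer3.append(kernel(length=np.array([1])))
all_layer = combine(layer1, layer2, layer3)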

aggregate_r2(burnin=0.75, agg='median')

Compute the aggregated R2 of all GP nodes on a DGP hierarchy.

Parameters:

burnin (float, optional) – a value between 0 and 1 that gives the fraction of stored R2 values to be discarded before the R2 values are aggregated. With the default, only the last 25% of R2 values are used. Defaults to 0.75.
agg (str, optional) – either ‘median’ or ‘mean’, the statistic used to aggregate the R2 values that remain after the first burnin fraction of each R2 sequence is discarded. Defaults to ‘median’.

Returns:

a list of average R2 values that correspond to the DGP hierarchy.

Return type:

list
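To make the roles of burnin and agg concrete, here is a minimal sketch of burn-in followed by aggregation on a single trace. It is an illustration under stated assumptions, not the library's code: the function name aggregate_after_burnin is hypothetical, and truncating the cut-off index with int() is an assumption about how the fraction is turned into a position.

```python
import statistics


def aggregate_after_burnin(r2_trace, burnin=0.75, agg="median"):
    """Discard the first `burnin` fraction of a trace, then aggregate the rest."""
    if not 0 <= burnin < 1:
        raise ValueError("burnin must be in [0, 1)")
    start = int(len(r2_trace) * burnin)  # assumed rounding rule
    kept = r2_trace[start:]
    if agg == "median":
        return statistics.median(kept)
    if agg == "mean":
        return statistics.fmean(kept)
    raise ValueError("agg must be 'median' or 'mean'")


trace = [0.10, 0.40, 0.70, 0.85, 0.90, 0.92, 0.91, 0.93]
print(aggregate_after_burnin(trace))              # median of the last 25%
print(aggregate_after_burnin(trace, agg="mean"))  # mean of the last 25%
```

With eight stored values and burnin=0.75, only the last two values enter the aggregate, so early values from before the sampler has settled do not affect the result.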

change_init_scale()

compute_r2()

estimate(burnin=None)

Compute the point estimates of the DGP model parameters and output the trained DGP.

Parameters:

burnin (int, optional) – the number of SEM iterations to be discarded for point estimate calculation. Must be smaller than the number of SEM iterations implemented. If this is not specified, only the last 25% of iterations are used. Defaults to None.

Returns:

an updated list that represents the trained DGP hierarchy.

Return type:

list

initialize(): Initialise all_layer attribute for training.

plot(layer_no, ker_no, width=4.0, height=1.0, ticksize=5.0, labelsize=8.0, hspace=0.1)

Plot the traces of model parameters of a particular GP node in the DGP hierarchy.

Parameters:

layer_no (int) – the index of the interested layer.
ker_no (int) – the index of the interested GP in the layer specified by layer_no.
width (float, optional) – the overall plot width. Defaults to 4.
height (float, optional) – the overall plot height. Defaults to 1.
ticksize (float, optional) – the size of sub-plot ticks. Defaults to 5.
labelsize (float, optional) – the font size of y labels. Defaults to 8.
hspace (float, optional) – the space between sub-plots. Defaults to 0.1.

ptrain(N=500, ess_burn=10, disable=False, core_num=None)

Train the DGP model with parallel GP optimizations in each layer.

Parameters:

N (int) – number of iterations for stochastic EM. Defaults to 500.
ess_burn (int, optional) – number of burnin steps for the ESS-within-Gibbs at each I-step of the SEM. Defaults to 10.
disable (bool, optional) – whether to disable the training progress bar. Defaults to False.
core_num (int, optional) – the number of cores/workers to be used. Defaults to None. If not specified, the number of cores is set to (max physical cores available - 1).

reinit_all_layer(reset_lengthscale)

Reinitialise all_layer attribute with new input and output.

Parameters:

reset_lengthscale (bool) – whether to reset the hyperparameters of the DGP emulator to their initial values.

remove_vecchia(): Remove the Vecchia mode from the DGP structure.

to_vecchia(m=25, ord_fun=None)

Convert the DGP structure to the Vecchia mode.

Parameters:

m (int) – an integer that gives the size of the conditioning set for the Vecchia approximation in the training. Defaults to 25.
ord_fun (function, optional) – a function that decides the ordering of the input of the GP nodes in the DGP structure for the Vecchia approximation. If set to None, then the default random ordering is used. Defaults to None.
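For readers unfamiliar with the roles of m and the ordering, the following sketch builds conditioning sets in the textbook nearest-neighbour Vecchia fashion: points are placed in a random order, and each point is conditioned on at most m of its nearest predecessors. This is a generic illustration, not dgpsi's internal routine; the function name and the use of Euclidean distance are assumptions.

```python
import math
import random


def vecchia_conditioning_sets(X, m=25, seed=0):
    """Return a random ordering and, for each ordered point, the indices of
    up to m nearest points that come earlier in the ordering.

    Illustrative only: a textbook nearest-neighbour construction.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)                      # random ordering of the inputs
    cond_sets = []
    for pos, i in enumerate(order):
        earlier = order[:pos]               # only previously ordered points
        earlier.sort(key=lambda j: math.dist(X[i], X[j]))
        cond_sets.append(earlier[:m])       # at most m nearest predecessors
    return order, cond_sets


X = [[0.0], [0.1], [0.5], [0.9], [1.0]]
order, cond_sets = vecchia_conditioning_sets(X, m=2)
for i, c in zip(order, cond_sets):
    print(i, c)
```

A smaller m gives sparser conditioning and cheaper computations at the cost of a coarser approximation; a custom ordering function would replace the shuffle step in this sketch.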

train(N=500, ess_burn=10, disable=False)

Train the DGP model.

Parameters:

N (int) – number of iterations for stochastic EM. Defaults to 500.
ess_burn (int, optional) – number of burnin steps for the ESS-within-Gibbs at each I-step of the SEM. Defaults to 10.
disable (bool, optional) – whether to disable the training progress bar. Defaults to False.

update_all_layer(all_layer): Update the class with a new dgp structure with given hyperparameter and latent layer values.

update_all_layer_larger(sub_idx): Update all_layer attribute with new input and output when the original input is a subset of the new one.

update_all_layer_smaller(sub_idx): Update all_layer attribute with new input and output when the new input is a subset of the original one.

update_xy(X, Y, reset=False)

Update the trained DGP with new input and output data.

Parameters:

X (ndarray) – a numpy 2d-array where each row is an input data point and each column is an input dimension.
Y (ndarray) – a numpy 2d-array containing the observed output data. Its rows are output data points and its columns are output dimensions (the number of columns equals the number of GP nodes in the final layer).
reset (bool, optional) – whether to reset latent layers and hyperparameter values of the DGP emulator. Defaults to False.

emulation module

class dgpsi.emulation.emulator(all_layer, N=10, block=True)

Bases: object

Class to make predictions from the trained DGP model.

Parameters:

all_layer (list) – a list that contains the trained DGP model produced by the method estimate() of the dgp class.
N (int, optional) – the number of imputations to produce the predictions. Increase the value to account for more imputation uncertainties. Defaults to 10.
block (bool, optional) – whether to use the blocked (layer-wise) ESS for the imputations. Defaults to True.

change_vecch_state()

loo(X, method=None, sample_size=50, m=30)

Implement the Leave-One-Out cross-validation from a DGP emulator.

Parameters:

X (ndarray) – the training input data used to build the DGP emulator via the dgp class.
method (str, optional) – the prediction approach: mean-variance (mean_var) or sampling (sampling) approach for the LOO. If set to None, sampling (sampling) approach is used for DGP emulators with a categorical likelihood. Otherwise, mean-variance (mean_var) approach is used. mean-variance (mean_var) approach is not applicable to DGP emulators with a categorical likelihood. Defaults to None.
sample_size (int, optional) – the number of samples to draw for each given imputation if method = ‘sampling’. Defaults to 50.
m (int, optional) – the size of the conditioning set for loo calculations if the GP was built under the Vecchia approximation. Defaults to 30.

Returns:

if the argument method = ‘mean_var’, a tuple is returned. The tuple contains two numpy 2d-arrays, one for the predictive means and another for the predictive variances. Each array has its rows corresponding to training input positions and columns corresponding to DGP output dimensions (i.e., the number of GP/likelihood nodes in the final layer);

if the argument method = ‘sampling’, a list is returned. The list contains D elements, where D is either the number of GP/likelihood nodes in the final layer or, when the emulator uses a categorical likelihood, the number of classes. Each element is a numpy 2d-array whose rows correspond to training input positions and whose columns correspond to the N * sample_size samples.

Return type:

tuple_or_list
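The N * sample_size column count in the sampling output follows from drawing sample_size samples under each of the N imputations and placing them side by side. The sketch below shows only that bookkeeping with nested Python lists. The helper name stack_samples is hypothetical, and the column order (imputation by imputation) is an assumption rather than a documented property.

```python
import random


def stack_samples(per_imputation):
    """Concatenate samples from several imputations column-wise.

    per_imputation: list of N matrices, each with one row per input
    position and sample_size columns. Returns one matrix with
    N * sample_size columns. Hypothetical helper, for illustration only.
    """
    n_rows = len(per_imputation[0])
    return [
        [value for matrix in per_imputation for value in matrix[row]]
        for row in range(n_rows)
    ]


rng = random.Random(1)
N, sample_size, n_points = 10, 50, 3
# Stand-in draws: one (n_points x sample_size) matrix per imputation.
draws = [
    [[rng.gauss(0.0, 1.0) for _ in range(sample_size)] for _ in range(n_points)]
    for _ in range(N)
]
stacked = stack_samples(draws)
print(len(stacked), len(stacked[0]))  # 3 500
```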

metric(x_cand, method='ALM', obj=None, nugget_s=1.0, m=50, score_only=False)

Compute the value of the ALM, MICE, or VIGF criterion for sequential designs.

Parameters:

x_cand (ndarray) – a numpy 2d-array that represents a candidate input design where each row is a design point and each column is a design input dimension.
method (str, optional) – the sequential design approach: MICE (MICE), ALM (ALM), or VIGF (VIGF). Defaults to ALM.
obj (class, optional) – the dgp object that is used to build the DGP emulator when method = ‘VIGF’. Defaults to None.
nugget_s (float, optional) – the value of the smoothing nugget term used when method = ‘MICE’. Defaults to 1.0.
m (int, optional) – the size of the conditioning set for metric calculations if the DGP was built under the Vecchia approximation. Defaults to 50.
score_only (bool, optional) – whether to return only the scores of ALM or MICE criterion at all design points contained in x_cand. Defaults to False.

Returns:

if the argument score_only = True, a numpy 2d-array is returned that gives the scores of ALM, MICE, or VIGF criterion with rows corresponding to design points in the candidate design set x_cand and columns corresponding to output dimensions;
if the argument score_only = False, a tuple of two numpy 1d-arrays is returned. The first one gives the indices (i.e., row numbers) of the design points in the candidate design set x_cand that have the largest criterion values, which are given by the second array, across different outputs of the DGP emulator.

Return type:

ndarray_or_tuple
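The relationship between the two return forms can be pictured as a column-wise arg-max over the score matrix: the matrix is what score_only = True describes, and the pair of index and value arrays is what score_only = False describes. The sketch below illustrates that reduction on plain lists. The function name best_candidates is hypothetical and the code is not taken from dgpsi.

```python
def best_candidates(scores):
    """Return (indices, values) of the column-wise maxima of a score matrix.

    scores: list of rows, one row per candidate design point and one
    column per output dimension.
    """
    n_outputs = len(scores[0])
    indices, values = [], []
    for d in range(n_outputs):
        column = [row[d] for row in scores]
        best = max(range(len(column)), key=column.__getitem__)
        indices.append(best)
        values.append(column[best])
    return indices, values


scores = [
    [0.2, 1.5],   # candidate 0
    [0.9, 0.3],   # candidate 1
    [0.4, 1.1],   # candidate 2
]
idx, val = best_candidates(scores)
print(idx, val)  # [1, 0] [0.9, 1.5]
```

Each output dimension can select a different candidate, which is why the indices are returned as an array rather than a single number.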

nllik(x, y, m=50)

Compute the negative predicted log-likelihood from a trained DGP model with likelihood layer.

Parameters:

x (ndarray) – a numpy 2d-array where each row is an input testing data point and each column is an input dimension.
y (ndarray) – a numpy 2d-array where each row is a scalar-valued testing output data point.
m (int, optional) – the size of the conditioning set if the DGP was built under the Vecchia approximation. Defaults to 50.

Returns:

a tuple of two 1d-arrays. The first one is the average negative predicted log-likelihood across all testing data points. The second one is the negative predicted log-likelihood for each testing data point.

Return type:

tuple

ploo(X, method=None, sample_size=50, m=30, core_num=None)

Implement the parallel Leave-One-Out cross-validation from a DGP emulator.

Parameters:

X – see descriptions of the method emulator.loo().
method – see descriptions of the method emulator.loo().
sample_size – see descriptions of the method emulator.loo().
m – see descriptions of the method emulator.loo().
core_num (int, optional) – the number of processes to be used. Defaults to None. If not specified, the number of cores is set to max physical cores available // 2.

Returns:

Same as the method emulator.loo().

pmetric(x_cand, method='ALM', obj=None, nugget_s=1.0, m=50, score_only=False, chunk_num=None, core_num=None)

Compute the value of the ALM, MICE, or VIGF criterion for sequential designs in parallel.

Parameters:

x_cand – see descriptions of the method emulator.metric().
method – see descriptions of the method emulator.metric().
obj – see descriptions of the method emulator.metric().
nugget_s – see descriptions of the method emulator.metric().
m – see descriptions of the method emulator.metric().
score_only – see descriptions of the method emulator.metric().
chunk_num (int, optional) – the number of chunks that the candidate design set x_cand will be divided into. Defaults to None. If not specified, the number of chunks is set to core_num.
core_num (int, optional) – the number of processes to be used. Defaults to None. If not specified, the number of cores is set to max physical cores available // 2.

Returns:

Same as the method emulator.metric().

ppredict(x, method='mean_var', full_layer=False, sample_size=50, m=50, chunk_num=None, core_num=None)

Implement parallel predictions from the trained DGP model.

Parameters:

x – see descriptions of the method emulator.predict().
method – see descriptions of the method emulator.predict().
full_layer – see descriptions of the method emulator.predict().
sample_size – see descriptions of the method emulator.predict().
m – see descriptions of the method emulator.predict().
chunk_num (int, optional) – the number of chunks that the testing input array x will be divided into. Defaults to None. If not specified, the number of chunks is set to core_num.
core_num (int, optional) – the number of processes to be used. Defaults to None. If not specified, the number of cores is set to max physical cores available // 2.

Returns:

Same as the method emulator.predict().
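The interplay between chunk_num and core_num can be sketched as follows: the rows of x are split into chunk_num contiguous chunks (defaulting to core_num), each chunk is handled by a worker, and the results are joined in the original order. This is a generic illustration only. The names split_rows, chunked_map and toy_predict are hypothetical, and a thread pool stands in for whatever process-based parallelism the library uses.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(x, chunk_num):
    """Split the rows of x into chunk_num nearly equal, contiguous chunks."""
    base, extra = divmod(len(x), chunk_num)
    chunks, start = [], 0
    for k in range(chunk_num):
        stop = start + base + (1 if k < extra else 0)
        chunks.append(x[start:stop])
        start = stop
    return chunks


def chunked_map(predict_fn, x, core_num=2, chunk_num=None):
    """Apply predict_fn to chunks of x in parallel and re-join the results."""
    if chunk_num is None:
        chunk_num = core_num              # mirrors the documented default
    chunks = [c for c in split_rows(x, chunk_num) if c]
    with ThreadPoolExecutor(max_workers=core_num) as pool:
        results = list(pool.map(predict_fn, chunks))  # keeps chunk order
    return [row for chunk in results for row in chunk]


def toy_predict(chunk):
    # Stand-in for a real predictor: returns the sum of each row.
    return [sum(row) for row in chunk]


x = [[i, 2 * i] for i in range(7)]
print(chunked_map(toy_predict, x, core_num=3))
```

Setting chunk_num larger than core_num yields smaller chunks, which can lower peak memory per worker at the price of more scheduling overhead.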

predict(x, method='mean_var', full_layer=False, sample_size=50, m=50, aggregation=True)

Implement predictions from the trained DGP model.

Parameters:

x (ndarray) – a numpy 2d-array where each row is an input testing data point and each column is an input dimension.
method (str, optional) – the prediction approach: mean-variance (mean_var) or sampling (sampling) approach. Defaults to mean_var.
full_layer (bool, optional) – whether to output the predictions of all layers. Defaults to False.
sample_size (int, optional) – the number of samples to draw for each given imputation if method = ‘sampling’. Defaults to 50.
m (int, optional) – the size of the conditioning set for predictions if the DGP was built under the Vecchia approximation. Defaults to 50.
aggregation (bool, optional) – whether to aggregate mean and variance predictions from imputed linked GPs when method = ‘mean_var’ and full_layer = False. Defaults to True.

Returns:

if the argument method = ‘mean_var’, a tuple is returned:

If full_layer = False and aggregation = True, the tuple contains two numpy 2d-arrays, one for the predictive means and another for the predictive variances. Each array has its rows corresponding to testing positions and columns corresponding to DGP output dimensions (i.e., the number of GP/likelihood nodes in the final layer). For categorical likelihood, the arrays represent the predictive means and variances of class probabilities. The number of columns corresponds to the number of classes. In the binary classification case, a single column is returned, representing the probabilities of class 1;

If full_layer = False and aggregation = False, the tuple contains two lists, one for the predictive means and another for the predictive variances from the imputed linked GPs. Each list contains N (i.e., the number of imputations) numpy 2d-arrays. Each array has its rows corresponding to testing positions and columns corresponding to DGP output dimensions (i.e., the number of GP/likelihood nodes in the final layer). For categorical likelihood, arrays in each list represent the predictive means and variances of class probabilities obtained from the imputed linked GPs. The number of columns of the arrays corresponds to the number of classes. In the binary classification case, arrays have a single column, representing the probabilities of class 1;

If full_layer = True, the tuple contains two lists, one for the predictive means and another for the predictive variances. Each list contains L (i.e., the number of layers) numpy 2d-arrays. Each array has its rows corresponding to testing positions and columns corresponding to output dimensions (i.e., the number of GP nodes from the associated layer and in case of the final layer, it may be the number of the likelihood nodes). For categorical likelihood, the final arrays in each list represent the predictive means and variances of class probabilities. The number of columns of the final arrays corresponds to the number of classes. In the binary classification case, a single column is returned, representing the probabilities of class 1;

if the argument method = ‘sampling’, a list is returned:

If full_layer = False, the list contains D (i.e., the number of GP/likelihood nodes in the final layer) numpy 2d-arrays. Each array has its rows corresponding to testing positions and columns corresponding to samples of size N * sample_size. For the categorical likelihood, the list contains numpy 2d-arrays of class probabilities, with the number of arrays equal to the number of classes. In the binary classification case, only a single array is returned, representing samples of the probability of class 1;

If full_layer = True, the list contains L (i.e., the number of layers) sub-lists. Each sub-list represents samples drawn from the GPs/likelihoods in the corresponding layer, and contains D (i.e., the number of GP nodes in the corresponding layer or likelihood nodes in the final layer) numpy 2d-arrays. Each array gives samples of the output from one of the D GPs/likelihoods at the testing positions, and has its rows corresponding to testing positions and columns corresponding to samples of size N * sample_size. For the categorical likelihood, the final sub-list contains numpy 2d-arrays of class probabilities, with the number of arrays equal to the number of classes. In the binary classification case, the final sub-list only contains a single array, representing samples of the probability of class 1.

Return type:

tuple_or_list
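When aggregation = True, the per-imputation means and variances are combined into a single mean and variance. One standard way to do this, shown below as an assumption rather than as the library's confirmed formula, is to treat the N imputed predictions as an equally weighted Gaussian mixture and match its first two moments. The function name aggregate_mean_var is hypothetical.

```python
def aggregate_mean_var(means, variances):
    """Moment-match an equally weighted mixture of N Gaussian predictions.

    means, variances: lists of N numbers for one test point and one output.
    Returns the mixture mean and the mixture variance, where the variance
    is the average variance plus the spread of the individual means.
    """
    n = len(means)
    mu = sum(means) / n
    second_moment = sum(v + m * m for m, v in zip(means, variances)) / n
    return mu, second_moment - mu * mu


means = [1.0, 1.2, 0.8]
variances = [0.10, 0.20, 0.15]
mu, var = aggregate_mean_var(means, variances)
print(mu, var)
```

Under this rule the aggregated variance is never smaller than the average of the individual variances, because disagreement between imputations adds to the uncertainty. With aggregation = False the N individual results are kept instead.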

predict_mice(x_cand, islikelihood, m): Implement predictions from the trained DGP model that are required to calculate the MICE criterion.

predict_mice_2layer_likelihood(x_cand, m): Implement predictions from the trained DGP model with 2 layers (including a likelihood layer) that are required to calculate the MICE criterion.

predict_vigf(x_cand, index, islikelihood, m): Implement predictions from the trained DGP model that are required to calculate the VIGF criterion.

predict_vigf_2layer_likelihood(x_cand, index, m): Implement predictions from the trained DGP model with 2 layers (including a likelihood layer) that are required to calculate the VIGF criterion.

remove_vecchia(): Remove the Vecchia mode from the DGP emulator.

to_vecchia(): Convert the DGP emulator to the Vecchia mode.

gp module

class dgpsi.gp.gp(X, Y, kernel, check_rep=True, vecchia=False, m=25, ord_fun=None)

Bases: object

Class for Gaussian process emulation.

Parameters:

X (ndarray) – a numpy 2d-array where each row is an input data point and each column is an input dimension.
Y (ndarray) – a numpy 2d-array with only one column and each row being an output data point.
kernel (class) – a kernel class that specifies the features of the GP.
vecchia (bool) – a bool indicating if Vecchia approximation will be used. Defaults to False.
m (int) – an integer that gives the size of the conditioning set for the Vecchia approximation in the training. Defaults to 25.
ord_fun (function, optional) – a function that decides the ordering of the input of the GP for the Vecchia approximation. If set to None, then the default random ordering is used. Defaults to None.

export(): Export the trained GP.

initialize(): Assign input/output data to the kernel for training.

loo(method='mean_var', sample_size=50, m=30)

Implement the Leave-One-Out cross-validation of a GP model.

Parameters:

method (str, optional) – the prediction approach: mean-variance (mean_var) or sampling (sampling) approach for the LOO. Defaults to mean_var.
sample_size (int, optional) – the number of samples to draw from the predictive distribution of GP if method = ‘sampling’. Defaults to 50.
m (int, optional) – the size of the conditioning set for loo calculations if the GP was built under the Vecchia approximation. Defaults to 30.

Returns:

if the argument method = ‘mean_var’, a tuple is returned. The tuple contains two numpy 2d-arrays, one for the predictive means and another for the predictive variances. Each array has only one column with its rows corresponding to training data positions.
if the argument method = ‘sampling’, a numpy 2d-array is returned. The array has its rows corresponding to training data positions and columns corresponding to sample_size number of samples drawn from the predictive distribution of GP.

Return type:

tuple_or_ndarray

metric(x_cand, method='MICE', nugget_s=1.0, m=50, score_only=False)

Compute the value of the ALM, MICE, or VIGF criterion for sequential designs.

Parameters:

x_cand (ndarray) – a numpy 2d-array that represents a candidate input design where each row is a design point and each column is a design input dimension.
method (str, optional) – the sequential design approach: MICE (MICE), ALM (ALM) or VIGF (VIGF). Defaults to MICE.
nugget_s (float, optional) – the value of the smoothing nugget term used when method = ‘MICE’. Defaults to 1.0.
m (int, optional) – the size of the conditioning set for metric calculations if the GP was built under the Vecchia approximation. Defaults to 50.
score_only (bool, optional) – whether to return only the scores of ALM or MICE criterion at all design points contained in x_cand. Defaults to False.

Returns:

if the argument score_only = True, a numpy 2d-array is returned that gives the scores of ALM, MICE, or VIGF criterion with rows corresponding to design points in the candidate design set x_cand;
if the argument score_only = False, a tuple of two numpy 1d-arrays is returned. The first one gives the index (i.e., row number) of the design point in the candidate design set x_cand that has the largest criterion value, which is given by the second element.

Return type:

ndarray_or_tuple

pmetric(x_cand, method='MICE', nugget_s=1.0, m=50, score_only=False, chunk_num=None, core_num=None)

Implement parallel computation of the ALM, MICE, or VIGF criterion for sequential designs.

Parameters:

x_cand – see descriptions of the method gp.metric().
method – see descriptions of the method gp.metric().
nugget_s – see descriptions of the method gp.metric().
m – see descriptions of the method gp.metric().
score_only – see descriptions of the method gp.metric().
chunk_num (int, optional) – the number of chunks that the candidate design set x_cand will be divided into. Defaults to None. If not specified, the number of chunks is set to core_num.
core_num (int, optional) – the number of processes to be used. Defaults to None. If not specified, the number of cores is set to max physical cores available // 2.

Returns:

Same as the method gp.metric().

ppredict(x, method='mean_var', sample_size=50, m=50, chunk_num=None, core_num=None)

Implement parallel predictions from the trained GP model.

Parameters:

x – see descriptions of the method gp.predict().
method – see descriptions of the method gp.predict().
sample_size – see descriptions of the method gp.predict().
m – see descriptions of the method gp.predict().
chunk_num (int, optional) – the number of chunks that the testing input array x will be divided into. Defaults to None. If not specified, the number of chunks is set to core_num.
core_num (int, optional) – the number of processes to be used. Defaults to None. If not specified, the number of cores is set to max physical cores available // 2.

Returns:

Same as the method gp.predict().

predict(x, method='mean_var', sample_size=50, m=50)

Implement predictions from the trained GP model.

Parameters:

x (ndarray) – a numpy 2d-array where each row is an input testing data point and each column is an input dimension.
method (str, optional) – the prediction approach: mean-variance (mean_var) or sampling (sampling) approach. Defaults to mean_var.
sample_size (int, optional) – the number of samples to draw from the predictive distribution of GP if method = ‘sampling’. Defaults to 50.
m (int, optional) – the size of the conditioning set for predictions if the GP was built under the Vecchia approximation. Defaults to 50.

Returns:

if the argument method = ‘mean_var’, a tuple is returned:

the tuple contains two numpy 2d-arrays, one for the predictive means and another for the predictive variances. Each array has only one column with its rows corresponding to testing positions.

if the argument method = ‘sampling’, a numpy 2d-array is returned:

the array has its rows corresponding to testing positions and columns corresponding to sample_size number of samples drawn from the predictive distribution of GP.

Return type:

tuple_or_ndarray

remove_vecchia(): Remove the Vecchia mode from the GP emulator.

to_vecchia(m=25, ord_fun=None)

Convert the GP emulator to the Vecchia mode.

Parameters:

m (int) – an integer that gives the size of the conditioning set for the Vecchia approximation in the training. Defaults to 25.
ord_fun (function, optional) – a function that decides the ordering of the input of the GP for the Vecchia approximation. If set to None, then the default random ordering is used. Defaults to None.

train(): Train the GP model.

update_kernel(reset_lengthscale)

Assign new input/output data to the kernel.

Parameters:

reset_lengthscale (bool) – whether to reset the hyperparameters of the GP emulator to their initial values.

update_xy(X, Y, reset=False)

Update the trained GP emulator with new input and output data.

Parameters:

X (ndarray) – a numpy 2d-array where each row is an input data point and each column is an input dimension.
Y (ndarray) – a numpy 2d-array with only one column and each row being an output data point.
reset (bool, optional) – whether to reset hyperparameter values of the GP emulator. Defaults to False.

imputation module

class dgpsi.imputation.imputer(all_layer, block=True)

Bases: object

Class to implement imputation of latent variables.

Parameters:

all_layer (list) – a list that contains the DGP model.
block (bool, optional) – whether to use the blocked (layer-wise) ESS for the imputations. Defaults to True.

key_stats(): Compute and store key statistics used in predictions

static one_sample(target_kernel, linked_upper_kernels, k)

Impute one latent variable produced by a particular GP.

Parameters:

target_kernel (class) – the GP whose output is a latent variable that needs to be imputed.
linked_upper_kernels (list) – a list of GPs (in the next layer) that link the output produced by the GP defined by the argument target_kernel.
k (int) – the index indicating the position of the GP defined by the argument target_kernel in its layer.

static one_sample_block(target_layer, upper_layer)

Impute a latent layer.

Parameters:

target_layer (list) – a list of GPs that produce a latent layer that needs to be imputed.
upper_layer (list) – a list of GPs (in the next layer) that are fed by the output of GPs in target_layer.

sample(burnin=0)

Implement the imputation via the ESS-within-Gibbs.

Parameters:

burnin (int, optional) – the number of burnin iterations for the ESS-within-Gibbs sampler to generate one realisation of latent variables. Defaults to 0.
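For orientation, the sketch below shows one generic elliptical slice sampling (ESS) update for a state with a zero-mean Gaussian prior, followed by a loop that discards burnin updates and keeps the last one. It follows the standard published ESS algorithm and is not dgpsi's implementation. The names ess_step, log_lik and sample_prior, the toy Gaussian likelihood, and the reading of burnin as "extra updates before the retained draw" are all assumptions made for illustration.

```python
import math
import random


def ess_step(f, log_lik, sample_prior, rng):
    """One elliptical slice sampling update.

    f: current state (list of floats) with a zero-mean Gaussian prior.
    log_lik: function returning the log-likelihood of a state.
    sample_prior: function returning a fresh draw from the same prior.
    """
    nu = sample_prior()                                 # defines the ellipse
    log_y = log_lik(f) + math.log(1.0 - rng.random())   # slice level
    theta = rng.uniform(0.0, 2.0 * math.pi)
    theta_min, theta_max = theta - 2.0 * math.pi, theta
    while True:
        proposal = [a * math.cos(theta) + b * math.sin(theta)
                    for a, b in zip(f, nu)]
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the bracket towards theta = 0, i.e. the current state.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)


rng = random.Random(0)
obs = [0.5, -0.2, 1.0]


def sample_prior():
    # Toy prior: independent standard normal components.
    return [rng.gauss(0.0, 1.0) for _ in obs]


def log_lik(f):
    # Toy Gaussian likelihood with noise variance 0.1.
    return -0.5 * sum((o - a) ** 2 for o, a in zip(obs, f)) / 0.1


state = [0.0] * len(obs)
burnin = 10
for _ in range(burnin + 1):      # discard `burnin` updates, keep the last
    state = ess_step(state, log_lik, sample_prior, rng)
print(state)
```

ESS has no step size to tune and every update ends in an accepted proposal, which makes it convenient inside a Gibbs scheme that cycles over latent variables or layers.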

update_ord_nn(): Update order and KNN in each GP node for Vecchia approximation

dgpsi.imputation.uniform(low=0.0, high=1.0, size=None)

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Note

New code should use the uniform method of a default_rng() instance instead; please see the random-quick-start.

Parameters

low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than or equal to high. The high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). The default value is 1.0.
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.

Returns

out (ndarray or scalar) – Drawn samples from the parameterized uniform distribution.

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), and zero elsewhere.

When high == low, values of low will be returned. If high < low, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition. The high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). For example:

>>> x = np.float32(5*0.99999999)
>>> x
5.0

Examples

Draw samples from the distribution:

>>> s = np.random.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 15, density=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()

kernel_class module

dgpsi.kernel_class.combine(*layers)

Combine layers into one list as a DGP or linked (D)GP structure.

Parameters:: layers (list) – a sequence of lists, each of which contains the GP nodes (defined by the kernel class), likelihood nodes (e.g., defined by the Poisson class), or containers (defined by the container class) in that layer.
Returns:: a list of layers defining the DGP or linked (D)GP structure.
Return type:: list

class dgpsi.kernel_class.kernel(length, scale=1.0, nugget=1e-06, name='sexp', prior_name='ga', prior_coef=None, bds=None, nugget_est=False, scale_est=False, input_dim=None, connect=None)

Bases: object

Class that defines the GPs in the DGP hierarchy.

Parameters:

length (ndarray) –
a numpy 1d-array whose length equals:
1. either one if the lengthscales in the kernel function are assumed same across input dimensions; or
2. the total number of input dimensions, which is the sum of the number of feeding GPs in the last layer (defined by the argument input_dim) and the number of connected global input dimensions (defined by the argument connect), if the lengthscales in the kernel function are assumed different across input dimensions.
scale (float, optional) – the variance of a GP. Defaults to 1.
nugget (float, optional) – the nugget term of a GP. Defaults to 1e-6.
name (str, optional) – kernel function to be used. Either sexp for squared exponential kernel or matern2.5 for Matern2.5 kernel. Defaults to sexp.
prior_name (str, optional) – prior options for the lengthscales and nugget term. Either gamma (ga), inverse gamma (inv_ga) or the reference prior (ref) for the lengthscales and nugget term. Set None to disable the prior. Defaults to ga.
prior_coef (ndarray, optional) – if prior_name is either ga or inv_ga, it is a numpy 1d-array that contains two values specifying the shape and rate parameters of gamma prior, or shape and scale parameters of inverse gamma prior. If prior_name is ref, it is a numpy 1d-array that gives the value of the coefficient a in the reference prior. When set to None, it defaults to np.array([1.6,0.3]) for gamma or inverse gamma priors. When set to the reference prior, it defaults to np.array([0.2]). Defaults to None.
bds (ndarray, optional) – a numpy 1d-array of length two that gives the lower and upper bounds of the lengthscales. Defaults to None.
nugget_est (bool, optional) – set to True to estimate nugget term or to False to fix the nugget term as specified by the argument nugget. If set to True, the value set to the argument nugget is used as the initial value. Defaults to False.
scale_est (bool, optional) – set to True to estimate the variance or to False to fix the variance as specified by the argument scale. Defaults to False.
input_dim (ndarray, optional) –
a numpy 1d-array that contains either
1. the indices of GPs in the feeding layer whose outputs feed into the GP; or
2. the indices of dimensions in the global input if the GP is in the first layer.
When set to None,
1. all outputs from GPs in the feeding layer; or
2. all global input dimensions feed into the GP.
Defaults to None.
connect (ndarray, optional) – a numpy 1d-array that contains the indices of dimensions in the global input connecting to the GP as additional input dimensions to the input obtained from the output of GPs in the feeding layer (as determined by the argument input_dim). When set to None, no global input connection is implemented. Defaults to None. When the kernel class is used in GP/DGP emulators for linked emulation and some input dimensions to the computer models are not connected to some feeding computer models, set connect to a 1d-array of indices of these external global input dimensions, and accordingly, set input_dim to a 1d-array of indices of the remaining input dimensions that are connected to the feeding computer models.
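
To make the length and name arguments more concrete, the following sketch evaluates the two named kernel families for a single pair of points, with either one shared lengthscale or one lengthscale per input dimension. The function names are hypothetical, and the parameterisation (exp(-r^2) for sexp, the standard Matern-2.5 formula on a scaled Euclidean distance) is an assumption; the library's exact scaling and multi-dimensional form may differ.

```python
import math

def scaled_distance_sq(x1, x2, length):
    """Sum of squared differences, each divided by its squared lengthscale."""
    if len(length) == 1:                 # one shared lengthscale
        length = list(length) * len(x1)
    if len(length) != len(x1):
        raise ValueError("length must have 1 or len(x) entries")
    return sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, length))

def sexp(x1, x2, length):
    """Squared exponential correlation (assumed form exp(-r^2))."""
    return math.exp(-scaled_distance_sq(x1, x2, length))

def matern25(x1, x2, length):
    """Matern-2.5 correlation in its standard form (assumed)."""
    r = math.sqrt(scaled_distance_sq(x1, x2, length))
    s = math.sqrt(5.0) * r
    return (1.0 + s + s * s / 3.0) * math.exp(-s)

x, y = [0.0, 1.0], [0.5, 1.5]
print(sexp(x, y, [1.0]))            # shared lengthscale
print(sexp(x, y, [1.0, 2.0]))       # one lengthscale per input dimension
print(matern25(x, x, [1.0]))        # identical points give correlation 1
```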

type

identifies that the kernel is a GP.

Type:: str

g

a function giving the log probability density function of gamma or inverse gamma distribution ignoring the constant part.

Type:: function

gfod

a function giving the first order derivative of g with respect to the log-transformed lengthscales and nugget.

Type:: function
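
The relationship between g and gfod can be sketched for the gamma prior as follows. The function names are hypothetical, the default shape and rate are the documented defaults of prior_coef, and the sketch takes the derivative of the log density itself by the chain rule; whether the library adds any further term (such as a Jacobian for the log transform) is not stated here, so treat this as an assumption.

```python
import math

def log_gamma_prior(x, shape=1.6, rate=0.3):
    """Gamma log density up to an additive constant: (shape-1)*log(x) - rate*x."""
    return (shape - 1.0) * math.log(x) - rate * x

def log_gamma_prior_fod(x, shape=1.6, rate=0.3):
    """Derivative with respect to log(x), by the chain rule: x * d/dx."""
    return (shape - 1.0) - rate * x

# Central finite difference in log(x) as a sanity check.
x0, h = 0.8, 1e-6
fd = (log_gamma_prior(x0 * math.exp(h)) - log_gamma_prior(x0 * math.exp(-h))) / (2 * h)
print(fd, log_gamma_prior_fod(x0))
```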

para_path

a numpy 2d-array that contains the trace of model parameters. Each row is a parameter estimate produced by one SEM iteration. The model parameters in each row are ordered as follows: np.array([scale estimate, lengthscale estimate (whose length>=1), nugget estimate]).

Type:: ndarray

global_input

a numpy 2d-array that contains the connected global input dimensions determined by the argument connect. The value of the attribute is assigned during the initialisation of dgp class. If connect is set to None, this attribute is also None.

Type:: ndarray

input

a numpy 2d-array (each row as a data point and each column as a data dimension) that contains the input training data (according to the argument input_dim) to the GP. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

output

a numpy 2d-array with only one column that contains the output training data to the GP. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

rep

a numpy 1d-array used to re-construct repetitions in the data according to the repetitions in the global input, i.e., rep is assigned during the initialisation of dgp class if one input position has multiple outputs. Otherwise, it is None. Defaults to None.

Type:: ndarray
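
The idea behind rep can be illustrated with NumPy alone: when an input position appears more than once, an index array maps each original row back to its unique position. This is a sketch of the concept using np.unique, not necessarily how the attribute is computed internally.

```python
import numpy as np

# Three distinct input positions; the first one is observed twice.
X = np.array([[0.1, 0.2],
              [0.5, 0.5],
              [0.1, 0.2],
              [0.9, 0.3]])

X_unique, rep = np.unique(X, axis=0, return_inverse=True)
rep = rep.ravel()  # guard against NumPy versions that return a 2d inverse

# Indexing the unique rows with rep rebuilds the original, repeated inputs.
print(X_unique.shape)                     # (3, 2)
print(np.array_equal(X_unique[rep], X))   # True
```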

Rinv

a numpy 2d-array that stores the inversion of correlation matrix. Defaults to None.

Type:: ndarray

Rinv_y

a numpy 1d-array that stores the product of correlation matrix inverse and the output Y. Defaults to None.

Type:: ndarray

vecch

indicates whether the Vecchia approximation is used. Defaults to None.

Type:: bool

D

the dimension of input data to the GP node. Defaults to None.

Type:: int

ord

a 1d-array that gives the ordering of input for the Vecchia approximation. Defaults to None.

Type:: ndarray

rev_ord

a 1d-array that reconstructs the ordering of input from the ordered one for the Vecchia approximation. Defaults to None.

Type:: ndarray
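
The roles of ord and rev_ord can be sketched with a random permutation and its inverse. The variable names below are illustrative, and the random ordering simply mirrors the default mentioned for ord_fun.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 2))             # six input points in two dimensions

ord_idx = rng.permutation(len(X))  # a random ordering of the rows
rev_ord = np.argsort(ord_idx)      # inverse permutation

X_ordered = X[ord_idx]
# Applying rev_ord to the ordered data restores the original row order.
print(np.array_equal(X_ordered[rev_ord], X))
```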

m

the number of conditioning points in Vecchia approximation. Defaults to None.

Type:: int

NNarray

a 2d-array that gives the m NN for each data point after ordering for the Vecchia approximation. Defaults to None.

Type:: ndarray

R2

a 2d-array that stores the R2 of the linear regression between global_input and input. Defaults to None.

Type:: ndarray

add_to_path(): Add updated model parameter estimates to the class attribute para_path.

callback(xk)

compute_cl()

compute_stats(): Compute and store key statistics for the GP predictions

gfod(x)

gp_prediction(x, z)

Make GP predictions.

Parameters:

x (ndarray) – a numpy 2d-array that contains the input testing data (whose rows correspond to testing data points and columns correspond to testing data dimensions) with the number of columns same as the input attribute.
z (ndarray) – a numpy 2d-array that contains additional input testing data (with the same number of columns of the global_input attribute) from the global testing input if the argument connect is not None. Set to None if the argument connect is None.

Returns:

a tuple of two 1d-arrays giving the means and variances at the testing input data positions.

Return type:

tuple
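
For orientation, the textbook zero-mean GP predictive equations can be written in a few lines of NumPy. This is a minimal sketch under assumptions (a squared exponential correlation of the form exp(-r^2), no global input z, and hypothetical function names), not the library's implementation; it only shows how quantities such as Rinv and Rinv_y enter the mean and variance.

```python
import numpy as np

def corr(A, B, length):
    """Squared exponential correlation between rows of A and rows of B."""
    d = (A[:, None, :] - B[None, :, :]) / length
    return np.exp(-np.sum(d ** 2, axis=2))

def gp_predict_sketch(x, X, y, length, scale, nugget):
    """Zero-mean GP predictive means and variances at the rows of x."""
    R = corr(X, X, length) + nugget * np.eye(len(X))
    Rinv = np.linalg.inv(R)            # plays the role of the Rinv attribute
    Rinv_y = Rinv @ y                  # plays the role of the Rinv_y attribute
    r = corr(x, X, length)             # cross-correlations, one row per test point
    mean = r @ Rinv_y
    var = scale * (1.0 + nugget - np.sum((r @ Rinv) * r, axis=1))
    return mean, var

X = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
mean, var = gp_predict_sketch(X, X, y, length=np.array([0.15]), scale=1.0, nugget=1e-6)
print(np.max(np.abs(mean - y)))   # close to zero at the training inputs
```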

k_matrix(fod_eval=False)

Compute the correlation matrix and/or first order derivatives of the correlation matrix wrt log-transformed lengthscales and nugget.

Parameters:

fod_eval (bool) – indicates if the gradient information is also computed along with the correlation matrix. Defaults to False.

Returns:

If fod_eval = False, a numpy 2d-array K is returned as the correlation matrix.
If fod_eval = True, a tuple is returned. It includes K and fod, a numpy 3d-array that contains the first order derivatives of the correlation matrix wrt log-transformed lengthscales and nugget. The length of the array equals the total number of model parameters (i.e., the total number of lengthscales and nugget).

Return type:

ndarray_or_tuple

linkgp_prediction(m, v, z)

Make linked GP predictions.

Parameters:

m (ndarray) – a numpy 2d-array that contains predictive means of testing outputs from the GPs in the last layer. The number of rows equals the number of testing positions and the number of columns equals the length of the argument input_dim. If the argument input_dim is None, then the number of columns equals the number of GPs in the last layer.
v (ndarray) – a numpy 2d-array that contains predictive variances of testing outputs from the GPs in the last layer. It has the same shape of m.
z (ndarray) – a numpy 2d-array that contains additional input testing data (with the same number of columns of the global_input attribute) from the global testing input if the argument connect is not None. Set to None if the argument connect is None.

Returns:

a tuple of two 1d-arrays giving the means and variances at the testing input data positions (that are represented by predictive means and variances).

Return type:

tuple
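
One ingredient that makes predictions with uncertain inputs tractable is that the expectation of a squared exponential correlation under a Gaussian input has a closed form. The sketch below checks that identity numerically for a single input dimension. It assumes the form exp(-(x - c)^2 / length^2) and uses hypothetical function names; it is not the library's routine.

```python
import math

def expected_sexp(m, v, c, length):
    """E[exp(-(x - c)^2 / length^2)] for x ~ N(m, v), in closed form."""
    return math.exp(-(m - c) ** 2 / (length ** 2 + 2.0 * v)) / math.sqrt(1.0 + 2.0 * v / length ** 2)

def expected_sexp_numeric(m, v, c, length, n=20001):
    """The same expectation by trapezoidal quadrature over m +/- 10 sd."""
    sd = math.sqrt(v)
    lo, hi = m - 10.0 * sd, m + 10.0 * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        pdf = math.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * pdf * math.exp(-((x - c) / length) ** 2)
    return total * h

print(expected_sexp(0.3, 0.04, 0.5, 0.7))
print(expected_sexp_numeric(0.3, 0.04, 0.5, 0.7))
```

With v = 0 the closed form reduces to the ordinary correlation at the point m, which matches the deterministic-input case.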

linkgp_prediction_full(m, v, m_z, v_z, z)

Make linked GP predictions with additional input also generated by GPs/DGPs.

Parameters:

m (ndarray) – a numpy 2d-array that contains predictive means of testing outputs from the GPs in the last layer. The number of rows equals the number of testing positions and the number of columns equals the length of the argument input_dim. If the argument input_dim is None, then the number of columns equals the number of GPs in the last layer.
v (ndarray) – a numpy 2d-array that contains predictive variances of testing outputs from the GPs in the last layer. It has the same shape of m.
m_z (ndarray) – a numpy 2d-array that contains predictive means of additional input testing data from GPs.
v_z (ndarray) – a numpy 2d-array that contains predictive variances of additional input testing data from GPs.
z (ndarray) – a numpy 2d-array that contains additional input testing data from the global testing input that are not from GPs. Set to None if the argument connect is None.

Returns:

a tuple of two 1d-arrays giving the means and variances at the testing input data positions (that are represented by predictive means and variances).

Return type:

tuple

llik(x)

Compute the negative log-likelihood function of the GP and the first order derivatives of the negative log-likelihood function wrt log-transformed model parameters.

Parameters:: x (ndarray) – a numpy 1d-array that contains the values of log-transformed model parameters: log-transformed lengthscales followed by the log-transformed nugget.
Returns:: a tuple is returned. The tuple contains two numpy 1d-arrays. The first one gives the negative log-likelihood. The second one (whose length equals the total number of lengthscales and nugget) contains first order derivatives of the negative log-likelihood function wrt log-transformed lengthscales and nugget.
Return type:: tuple

llik_vecch(x)

Compute the negative log-likelihood function of the GP under Vecchia approximation.

Parameters:: x (ndarray) – a numpy 1d-array that contains the values of log-transformed model parameters: log-transformed lengthscales followed by the log-transformed nugget.
Returns:: a tuple is returned. The tuple contains two numpy 1d-arrays. The first one gives the negative log-likelihood. The second one (whose length equals the total number of lengthscales and nugget) contains first order derivatives of the negative log-likelihood function wrt log-transformed lengthscales and nugget.
Return type:: tuple

log_likelihood_func()

log_likelihood_func_vecch(): Compute Gaussian log-likelihood function using the Vecchia approximation.

log_prior()

Compute the value of log priors specified to the lengthscales and nugget.

Returns:: a numpy 1d-array giving the sum of log priors of the lengthscales and nugget.
Return type:: ndarray

log_prior_fod()

Compute the first order derivatives of log priors wrt the log-transformed lengthscales and nugget.

Returns:: a numpy 1d-array (whose length equals the total number of lengthscales and nugget) giving the first order derivatives of log priors wrt the log-transformed lengthscales and nugget.
Return type:: ndarray

log_t()

Log transform the model parameters (lengthscales and nugget).

Returns:: a numpy 1d-array of log-transformed model parameters
Return type:: ndarray

maximise(method='L-BFGS-B')

Optimise and update model parameters by minimising the negative log-likelihood function.

Parameters:: method (str, optional) – optimisation algorithm. Defaults to L-BFGS-B.

ord_nn(ord=None, NNarray=None, pointer=False): Specify the ordering and NN for the Vecchia approximation

r2(overwritten=False): Compute R2 of the linear regression between global_input and input.

update(log_theta)

Update the model parameters (lengthscales and nugget).

Parameters:: log_theta (ndarray) – optimised numpy 1d-array of log-transformed lengthscales and nugget.
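
The round trip between log_t() and update() can be pictured as a log transform followed by exponentiation. The sketch below uses plain Python lists and illustrative variable names; whether the nugget is always part of the vector (for example when nugget_est is False) is not specified here and is assumed for illustration.

```python
import math

lengthscales = [0.5, 2.0]
nugget = 1e-6

# Log-transform: lengthscales first, then the nugget (the order llik expects).
log_theta = [math.log(l) for l in lengthscales] + [math.log(nugget)]

# An optimiser would now move log_theta freely; positivity is automatic
# because the parameters are recovered with exp().
new_lengthscales = [math.exp(t) for t in log_theta[:-1]]
new_nugget = math.exp(log_theta[-1])
print(new_lengthscales, new_nugget)
```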

likelihood_class module

class dgpsi.likelihood_class.Categorical(num_classes=None, input_dim=None, link='logit')

Bases: object

Class to implement categorical likelihood for binary and multi-class classifications. It can only be added as the final layer of a DGP model.

Parameters:

num_classes (int, optional) – an integer indicating the number of classes in the training data.
input_dim (ndarray, optional) – a numpy 1d-array that contains the index of one GP (if the output has two classes) or the indices of K GPs (if the output has K > 2 classes) in the feeding layer whose outputs feed into the likelihood node. When set to None, all outputs from GPs of the feeding layer feed into the likelihood node, and in this case one needs to ensure there is only one GP node (for binary classification) or K GP nodes (for multi-class classification) specified in the feeding layer. Defaults to None.
link (str, optional) – the link function to be used for binary classification. Either ‘probit’ or ‘logit’. Defaults to ‘logit’.
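
For the binary case, the two link options correspond to the standard logistic and probit transforms of a latent GP value into a class probability. The sketch below shows the textbook definitions with hypothetical function names; it does not reproduce the class's internal code.

```python
import math

def logit_link_inverse(f):
    """Logistic function: maps a latent value to P(class 1)."""
    return 1.0 / (1.0 + math.exp(-f))

def probit_link_inverse(f):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

for f in (-2.0, 0.0, 2.0):
    print(f, logit_link_inverse(f), probit_link_inverse(f))
```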

type

identifies that the node is a likelihood node.

Type:: str

input

a numpy 2d-array (each row as a data point and each column as a likelihood parameter from the DGP part) that contains the input data (according to the argument input_dim) to the likelihood node. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

output

a numpy 2d-array with only one column that contains the output data to the likelihood node. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

exact_post_idx

a numpy 1d-array that indicates the indices of the likelihood parameters that allow closed-form conditional posterior distributions. Defaults to None.

Type:: ndarray

rep

a numpy 1d-array used to re-construct repetitions in the data according to the repetitions in the global input, i.e., rep is assigned during the initialisation of dgp class if one input position has multiple outputs. Otherwise, it is None. Defaults to None.

Type:: ndarray

llik()

The log-likelihood function of Categorical distribution.

Returns:: a numpy 1d-array of log-likelihood.
Return type:: ndarray

pllik(y, f)

prediction(m, v)

sampling(f_sample)

class dgpsi.likelihood_class.Hetero(input_dim=None)

Bases: object

Class to implement Heteroskedastic Gaussian likelihood. It can only be added as the final layer of a DGP model.

Parameters:: input_dim (ndarray, optional) – a numpy 1d-array of length two that contains the indices of two GPs in the feeding layer whose outputs feed into the likelihood node. When set to None, all outputs from GPs of the feeding layer feed into the likelihood node, and in this case one needs to ensure there are only two GP nodes specified in the feeding layer. Defaults to None.

llik()

static pllik(y, f)

static post_het1(v, Gamma, y_mask): Calculate the conditional posterior mean and covariance of the mean of the heteroskedastic Gaussian likelihood when there are no repetitions in the training data.

static post_het2(v, Gamma, mask_f, y_mask): Calculate the conditional posterior mean and covariance of the mean of the heteroskedastic Gaussian likelihood when there are repetitions in the training data.

static post_het_vecch(U_sp_l, U_sp_ol, y): Calculate the conditional posterior mean and covariance of the mean of the heteroskedastic Gaussian likelihood when there are repetitions in the training data under the Vecchia approximation.

posterior(idx, v): Sampling from the conditional posterior distribution of the mean in heteroskedastic Gaussian likelihood.

posterior_vecch(idx, U_sp_l, U_sp_ol, ord, rev_ord, invd=None, invg=None): Sampling from the conditional posterior distribution of the mean in heteroskedastic Gaussian likelihood under the Vecchia Approximation.

static prediction(m, v)

static sampling(f_sample)

class dgpsi.likelihood_class.NegBin(input_dim=None)

Bases: object

Class to implement Negative Binomial likelihood. It can only be added as the final layer of a DGP model.

Parameters:: input_dim (ndarray, optional) – a numpy 1d-array of length two that contains the indices of two GPs in the feeding layer whose outputs feed into the likelihood node. When set to None, all outputs from GPs of the feeding layer feed into the likelihood node, and in this case one needs to ensure there are only two GP nodes specified in the feeding layer. Defaults to None.

llik()

static pllik(y, f)

static prediction(m, v)

static sampling(f_sample)

class dgpsi.likelihood_class.Poisson(input_dim=None)

Bases: object

Class to implement Poisson likelihood. It can only be added as the final layer of a DGP model.

Parameters:: input_dim (ndarray, optional) – a numpy 1d-array of length one that contains the indices of one GP in the feeding layer whose outputs feed into the likelihood node. When set to None, all outputs from GPs of the feeding layer feed into the likelihood node, and in this case one needs to ensure there is only one GP node specified in the feeding layer. Defaults to None.

type

identifies that the node is a likelihood node.

Type:: str

input

a numpy 2d-array (each row as a data point and each column as a likelihood parameter from the DGP part) that contains the input data (according to the argument input_dim) to the likelihood node. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

output

a numpy 2d-array with only one column that contains the output data to the likelihood node. The value of this attribute is assigned during the initialisation of dgp class.

Type:: ndarray

exact_post_idx

a numpy 1d-array that indicates the indices of the likelihood parameters that allow closed-form conditional posterior distributions. Defaults to None.

Type:: ndarray

rep

a numpy 1d-array used to re-construct repetitions in the data according to the repetitions in the global input, i.e., rep is assigned during the initialisation of dgp class if one input position has multiple outputs. Otherwise, it is None. Defaults to None.

Type:: ndarray

llik()

The log-likelihood function of Poisson distribution.

Returns:: a numpy 1d-array of log-likelihood.
Return type:: ndarray

static pllik(y, f)

The predicted log-likelihood function of Poisson distribution.

Parameters:

y (ndarray) – a numpy 3d-array of output data with shape (N,1,1), where N is the number of output data points.
f (ndarray) – a numpy 3d-array of sample points with shape (N,S,Q), where S is the number of sample points and Q is the number of parameters in the distribution (e.g., Q = 1 for Poisson distribution).

Returns:

a numpy 3d-array of log-likelihood for given f.

Return type:

ndarray
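
The shapes in pllik rely on NumPy broadcasting: y with shape (N,1,1) is combined with f of shape (N,S,Q). The sketch below evaluates a Poisson log-pmf under the assumption of a log link (rate = exp(f)) with Q = 1. The function name is hypothetical and the link is an assumption, not something stated above.

```python
import math
import numpy as np

def poisson_loglik_sketch(y, f):
    """Poisson log-pmf with rate exp(f); y is (N,1,1), f is (N,S,1)."""
    log_factorial = np.vectorize(math.lgamma)(y + 1.0)
    return y * f - np.exp(f) - log_factorial   # broadcasts to (N,S,1)

N, S = 4, 3
y = np.array([0.0, 1.0, 2.0, 5.0]).reshape(N, 1, 1)
f = np.zeros((N, S, 1))            # log-rate 0, i.e. rate 1, for every sample
ll = poisson_loglik_sketch(y, f)
print(ll.shape)                    # (4, 3, 1)
```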

static prediction(m, v)

Compute mean and variance of the DGP+Poisson model given the predictive mean and variance of DGP model for Poisson parameter.

Parameters:

m (ndarray) – a numpy 2d-array of predictive mean from the DGP model for the Poisson parameter.
v (ndarray) – a numpy 2d-array of predictive variance from the DGP model for the Poisson parameter.

Returns:

a tuple of two 1d-arrays giving the means and variances at the testing input data positions (that are represented by predictive means and variances).

Return type:

tuple
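
If one assumes a log link and a Gaussian predictive distribution for the latent value (both assumptions made here for illustration), the mean and variance of the count follow from the log-normal moments and the law of total variance. The helper name is hypothetical.

```python
import math

def poisson_moments_sketch(m, v):
    """Mean and variance of y when y | f ~ Poisson(exp(f)) and f ~ N(m, v)."""
    rate_mean = math.exp(m + 0.5 * v)                  # E[exp(f)]
    rate_var = (math.exp(v) - 1.0) * rate_mean ** 2    # Var[exp(f)]
    # Law of total variance: Var[y] = E[rate] + Var[rate].
    return rate_mean, rate_mean + rate_var

print(poisson_moments_sketch(0.0, 0.0))   # (1.0, 1.0): a plain Poisson with rate 1
print(poisson_moments_sketch(1.0, 0.25))
```

Any uncertainty in the latent value (v > 0) makes the predictive variance exceed the predictive mean, i.e. the counts become overdispersed relative to a plain Poisson.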

sampling(f_sample)

Generate samples of DGP+Poisson model given samples of DGP model for the Poisson parameter.

Parameters:: f_sample (ndarray) – a numpy 2d-array (with one column) of samples from the DGP model for the Poisson parameter.
Returns:: a tuple containing one 1d-array that gives samples at the testing input data positions.
Return type:: tuple

linkgp module

class dgpsi.linkgp.container(structure, local_input_idx=None, block=True)

Bases: object

Class to contain the trained GP or DGP emulator of a computer model for linked (D)GP emulation.

Parameters:

structure (list) – a list that contains the trained structure of GP or DGP of a computer model. For GP, this is the list exported from the export() method of the gp class. For DGP, this is the list exported from the estimate() method of the dgp class.
local_input_idx (ndarray_or_list) –
a numpy 1d-array or a list:
1. If local_input_idx is a 1d-array, it specifies the indices of outputs (a 2d-array) produced by all emulators in the feeding layer that are input to the emulator represented by the structure argument. The indices should be ordered in such a way that the extracted output from the feeding layer is sorted in the same order as the training input used for the GP/DGP emulator that the structure argument represents. When the emulator is in the first layer, local_input_idx gives the indices of its input in the global testing input set, see lgp.predict() for descriptions of the global testing input set.
2. If local_input_idx is a list, the emulator must be in layer 2 or deeper layers. The list should have a number (the same number of preceding layers, e.g., when an emulator is in the second layer, the list is of length 1) of elements. Each element is a 1d-array that specifies the indices of outputs produced by all emulators in the corresponding layer that feed to the emulator represented by the structure argument. If there is no output connections from a certain layer, set None instead in the list.
Defaults to None. When the argument is None, one needs to set its value using the set_local_input().
block (bool, optional) – whether to use the blocked (layer-wise) ESS for the imputations. Defaults to True.
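
When local_input_idx is a 1d-array, it can be read as a column selection on the outputs of the feeding layer placed side by side. The sketch below uses made-up arrays for two hypothetical feeding emulators and assumes their outputs are concatenated column-wise in the order the emulators appear in the layer.

```python
import numpy as np

# Outputs of two hypothetical emulators in the feeding layer (5 test points).
out_a = np.arange(10.0).reshape(5, 2)          # emulator A: 2 output columns
out_b = np.arange(100.0, 105.0).reshape(5, 1)  # emulator B: 1 output column

# All feeding-layer outputs side by side: columns 0,1 from A and column 2 from B.
feeding_outputs = np.hstack([out_a, out_b])

# The downstream emulator was trained on inputs ordered as (B's output, A's 2nd output).
local_input_idx = np.array([2, 1])
local_input = feeding_outputs[:, local_input_idx]
print(local_input.shape)   # (5, 2)
```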

remove_vecchia(): Remove the Vecchia mode from the container.

set_local_input(idx, new=False)

Set the local_input_idx argument and optionally output a copy of the container with a different local_input_idx.

Parameters:

idx (ndarray_or_list) – see container for details.
new (bool, optional) – whether to output a copy of the container with a different local_input_idx. Defaults to False.

Remark:

This method is useful in the following scenarios:

when different models are emulated by different teams. Each team can create the container of their model even without knowing how different models are connected together. When this information is available and containers of different emulators are collected, the connections between emulators can then be set by assigning values to local_input_idx of each container with this method.

when local_input_idx was not correctly specified when the container was created, one can correct local_input_idx swiftly without recreating it.

when the same emulator in the container is repeatedly used in a system, one can set new to True to create copies of the container by assigning different local_input_idx to the copies swiftly without generating the containers repeatedly.

to_vecchia(): Convert the container to the Vecchia mode.

class dgpsi.linkgp.lgp(all_layer, N=10)

Bases: object

Class to store a system of GP and DGP emulators for predictions.

Parameters:

all_layer (list) – a list that contains L (the number of layers of a system of computer models) sub-lists, each of which represents a layer and contains the GP/DGP emulators of computer models represented by the container class. The sub-lists are placed in the list in the same order as the specified computer model system.
N (int) – the number of imputations used to produce the predictions. Increase the value to account for more imputation uncertainty. If the system consists only of GP emulators, N is set to 1 automatically. Defaults to 10.

static dgp_pred(x, m, v, z, structure, pred_m): Compute predictive mean and variance from a DGP (DGP+likelihood) emulator when the testing input is either deterministic or normally distributed.

static gp_pred(x, m, v, z, structure, m_pred): Compute predictive mean and variance from a GP emulator when the testing input is either deterministic or normally distributed.

ppredict(x, method='mean_var', full_layer=False, sample_size=50, m=50, chunk_num=None, core_num=None)

Implement parallel predictions from the linked (D)GP model.

Parameters:

x – see descriptions of the method lgp.predict().
method – see descriptions of the method lgp.predict().
full_layer – see descriptions of the method lgp.predict().
sample_size – see descriptions of the method lgp.predict().
m – see descriptions of the method lgp.predict().
chunk_num (int, optional) – the number of chunks that the testing input array x will be divided into. Defaults to None. If not specified, the number of chunks is set to core_num.
core_num (int, optional) – the number of cores/workers to be used. Defaults to None. If not specified, the number of cores is set to (max physical cores available - 1).

Returns:

Same as the method predict.

predict(x, method='mean_var', full_layer=False, sample_size=50, m=50)

Implement predictions from the linked (D)GP model.

Parameters:

x (ndarray_or_list) –
a numpy 2d-array or a list.
1. If x is a 2d-array, it is the global testing input set to the computer emulators in the first layer, where rows are input testing data points and columns are input dimensions across all computer emulators in the first layer of the system. In this case, it is assumed that x is the only global input to the computer system, i.e., there is no external global input to computer emulators in layers other than the first layer.
2. If x is a list, it has L (the number of layers of a system of emulators) elements. The first element is a numpy 2d-array that represents the global testing input set to the computer emulators in the first layer. The remaining L-1 elements are L-1 sub-lists, each of which contains a number (same as the number of computer emulators in the corresponding layer) of numpy 2d-arrays (rows being testing points and columns being input dimensions) that represent the external global testing input to the computer models in the corresponding layer. The order of 2d-arrays in each sub-list must be the same as the order of the emulators placed in the corresponding layer of the all_layer argument to lgp class. If there is no external global input to a certain computer emulator of the system, set None in the corresponding sub-list (i.e., layer) of x.
method (str, optional) – the prediction approach: mean-variance (mean_var) or sampling (sampling) approach. Defaults to mean_var.
full_layer (bool, optional) – whether to output the predictions from all GP/DGP emulators in the system. Defaults to False.
sample_size (int, optional) – the number of samples to draw for each given imputation if method = ‘sampling’. Defaults to 50.
m (int, optional) – the size of the conditioning set for predictions if the DGP was built under the Vecchia approximation. Defaults to 50.

Returns:

if the argument method = ‘mean_var’, a tuple is returned:

If full_layer = False, the tuple contains two lists, one for the predictive means and another for the predictive variances. Each list contains a number (same number of emulators in the final layer of the system) of numpy 2d-arrays. Each 2d-array has its rows corresponding to global testing positions and columns corresponding to output dimensions of the associated emulator in the final layer;

If full_layer = True, the tuple contains two lists, one for the predictive means and another for the predictive variances. Each list contains L (i.e., the number of layers of the emulated system) sub-lists. Each sub-list represents a layer and contains a number (same number of emulators in the corresponding layer of the system) of numpy 2d-arrays. Each array has its rows corresponding to global testing positions and columns corresponding to output dimensions of the associated GP/DGP emulator in the corresponding layer.

if the argument method = ‘sampling’, a list is returned:

If full_layer = False, the list contains a number (same number of emulators in the final layer of the system) of numpy 3d-arrays. Each array corresponds to an emulator in the final layer, and has its 0-axis corresponding to the output dimensions of the GP/DGP emulator, 1-axis corresponding to global testing positions, and 2-axis corresponding to samples of size N * sample_size;

If full_layer = True, the list contains L (i.e., the number of layers of the emulated system) sub-lists. Each sub-list represents a layer and contains a number (same number of emulators in the corresponding layer of the system) of numpy 3d-arrays. Each array corresponds to an emulator in the associated layer, and has its 0-axis corresponding to the output dimensions of the GP/DGP emulators, 1-axis corresponding to global testing positions, and 2-axis corresponding to samples of size N * sample_size.

Return type:

tuple_or_list
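
To show how the 3d-arrays returned with method = 'sampling' might be summarised, the sketch below builds a random stand-in array with the documented axis layout (output dimensions, testing positions, N * sample_size samples) and reduces over the sample axis. The array contents are synthetic; only the shapes follow the description above.

```python
import numpy as np

N, sample_size = 10, 50          # imputations and samples per imputation
n_out, n_test = 2, 7             # output dimensions and testing positions

rng = np.random.default_rng(1)
# Stand-in for one array returned with method='sampling' (final-layer emulator).
samples = rng.standard_normal((n_out, n_test, N * sample_size))

# Summaries over the sample axis (axis 2) give one value per output and position.
mean = samples.mean(axis=2)
lower, upper = np.quantile(samples, [0.025, 0.975], axis=2)
print(mean.shape, lower.shape)   # (2, 7) (2, 7)
```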

set_vecchia(mode)

Convert the (D)GP emulators in the linked system to Vecchia or non-Vecchia mode.

Parameters:

mode (bool_or_list) –
a bool or a list of bools.
1. If mode is a bool, it indicates whether to set all GP/DGP emulators in the linked system to the Vecchia (True) or non-Vecchia (False) mode.
2. If mode is a list, it is a list that contains L (the number of layers) sub-lists, each of which represents a layer and contains the same number of bools as that of the GP/DGP emulators of computer models in the same layer. The list has the same shape as the all_layer argument of lgp class.
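
The two accepted forms of mode can be pictured with nested Python lists. In the sketch below, placeholder strings stand in for container objects and same_layout is a hypothetical helper; it only illustrates the requirement that a list-valued mode mirrors the layout of all_layer.

```python
# Placeholder strings stand in for container objects in a two-layer system.
all_layer = [["emulator_1", "emulator_2"], ["emulator_3"]]

# A single bool is expanded to every emulator ...
mode_all = True
mode_expanded = [[mode_all for _ in layer] for layer in all_layer]

# ... while a list sets the mode emulator by emulator.
mode_list = [[True, False], [True]]

def same_layout(a, b):
    """Check that two nested lists have the same number of items per layer."""
    return len(a) == len(b) and all(len(x) == len(y) for x, y in zip(a, b))

print(mode_expanded)                      # [[True, True], [True]]
print(same_layout(mode_list, all_layer))  # True
```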

temp_all_layer()

utils module

class dgpsi.utils.NystromKPCA(n_components, m=200)

Bases: object

demean_matrices(K_mm, K_nm)

fit_transform(X)

static flip_dimensions(scores)

static get_inverse(K, is_sqrt=False)

dgpsi.utils.get_thread(): Get the number of Numba threads.

dgpsi.utils.have_same_shape(list1, list2)

dgpsi.utils.multistart(func: Callable[[ndarray, Any], float], initials: ndarray, lb: ndarray, up: ndarray, args: Tuple = (), method: str = 'L-BFGS-B', core_num: int | None = None, out_dim: int | None = 0, int_mask: ndarray | None = None) → ndarray

Perform parallel multistart optimization and return the best optimized x.

Parameters:

func (Callable) – the objective function to be minimized. Should accept (x, *args).
initials (ndarray) – a numpy 2d-array where each row is a starting point.
lb (ndarray) – a numpy 1d-array of lower bounds for each parameter.
up (ndarray) – a numpy 1d-array of upper bounds for each parameter.
args (Tuple) – additional arguments to pass to the objective function.
method (str) – optimization method. Defaults to L-BFGS-B.
core_num (int, optional) – number of worker processes to use.
out_dim (int, optional) – the index of the output to which the optimization is to be implemented.
int_mask (ndarray, optional) – Boolean mask indicating which variables in x must be integers.

Returns:

best_x, the optimized parameters corresponding to the lowest target value.

Return type:

ndarray

dgpsi.utils.nb_seed(value): Set seed for Numba functions.

dgpsi.utils.read(pkl_file)

Load the .pkl file that stores the emulator.

Parameters:: pkl_file (strings) – the path to and the name of the .pkl file where the emulator is stored.
Returns:: an emulator class. For GP, it is the gp class. For DGP, it is the emulator class. For linked GP/DGP, it is the lgp class.
Return type:: class

dgpsi.utils.set_thread(value): Set the number of Numba threads.

dgpsi.utils.summary(obj, tablefmt='fancy_grid')

Summarize key information of GP, DGP, and Linked (D)GP structures.

Parameters:

obj (class) –
obj can be one of the following:
1. an instance of kernel class;
2. an instance of gp class;
3. an instance of dgp class;
4. an instance of emulator class;
5. an instance of lgp class
tablefmt (str) – the style of output summary table. See https://pypi.org/project/tabulate/ for different options. Defaults to fancy_grid.

Returns:

a table summarizing key information contained in obj.

Return type:

string

dgpsi.utils.write(emu, pkl_file)

Save the constructed emulator to a .pkl file.

Parameters:

emu (class) – an emulator class. For GP, it is the gp class after training. For DGP, it is the emulator class. For linked GP/DGP, it is the lgp class.
pkl_file (strings) – the path to and the name of the .pkl file to which the emulator specified by emu is saved.
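
The write/read pair follows the usual save-then-load pattern for .pkl files. The sketch below reproduces that pattern with the standard-library pickle module and a placeholder dictionary in place of a trained emulator; the serializer that write() and read() actually use is not stated here, so this shows the workflow only.

```python
import os
import pickle
import tempfile

emu = {"type": "placeholder emulator", "lengthscales": [0.5, 2.0]}

with tempfile.TemporaryDirectory() as tmp:
    pkl_file = os.path.join(tmp, "emulator.pkl")
    with open(pkl_file, "wb") as fh:
        pickle.dump(emu, fh)          # analogous to write(emu, pkl_file)
    with open(pkl_file, "rb") as fh:
        restored = pickle.load(fh)    # analogous to read(pkl_file)

print(restored == emu)   # True
```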

functions module

dgpsi.functions.IJ_matern(X, z_m, z_v, length)

dgpsi.functions.IJ_sexp(X, z_m, z_v, length, R2sexp, Psexp)

dgpsi.functions.Pmatrix(X)

dgpsi.functions.cond_mean(x, z, w1, global_w1, Rinv_y, length, name): Make GP predictions.

dgpsi.functions.fmvn(cov): Generate multivariate Gaussian random samples without means.

dgpsi.functions.fmvn_mu(mu, cov): Generate multivariate Gaussian random samples with means.
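
One common way to draw such samples is through a Cholesky factor of the covariance matrix. The sketch below assumes that approach; fmvn_sketch and fmvn_mu_sketch are hypothetical names, the explicit rng argument is an addition for reproducibility, and the dgpsi functions may be implemented differently.

```python
import numpy as np


def fmvn_sketch(cov, rng):
    """Draw one zero-mean Gaussian sample with covariance `cov`."""
    L = np.linalg.cholesky(cov)  # cov = L @ L.T, L lower triangular
    return L @ rng.standard_normal(cov.shape[0])


def fmvn_mu_sketch(mu, cov, rng):
    """Draw one Gaussian sample with mean `mu` and covariance `cov`."""
    return mu + fmvn_sketch(cov, rng)


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6], [0.6, 2.0]])
mu = np.array([1.0, -1.0])

# Many draws, so the empirical moments can be compared with mu and cov.
draws = np.array([fmvn_mu_sketch(mu, cov, rng) for _ in range(20000)])
```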

dgpsi.functions.fod_exp(X, K)

dgpsi.functions.g(coef1, coef2, x, name)

dgpsi.functions.ghdiag(fct, mu, var, y)

dgpsi.functions.gp(x, z, w1, global_w1, Rinv, Rinv_y, scale, length, nugget, name): Make GP predictions.

dgpsi.functions.gp_non_parallel(x, z, w1, global_w1, Rinv, Rinv_y, scale, length, nugget, name): Make GP predictions.

dgpsi.functions.k_one_vec(X, z, length, name): Compute cross-correlation matrix between the testing and training input data.

dgpsi.functions.link_gp(m, v, z, w1, global_w1, Rinv, Rinv_y, R2sexp, Psexp, scale, length, nugget, name): Make linked GP predictions.

dgpsi.functions.link_gp_non_parallel(m, v, z, w1, global_w1, Rinv, Rinv_y, R2sexp, Psexp, scale, length, nugget, name): Make linked GP predictions.

dgpsi.functions.logdet_nb(L)

dgpsi.functions.matern_coef(v, u)

dgpsi.functions.matern_multi(v, u)

dgpsi.functions.matern_one(v, u)

dgpsi.functions.mice_var(x, x_extra, input_dim, connect, name, length, scale, nugget, nugget_s): Calculate smoothed predictive variances of the GP using the candidate design set.

dgpsi.functions.pdist_matern_coef(X)

dgpsi.functions.pdist_matern_multi(X)

dgpsi.functions.pdist_matern_one(X)

dgpsi.functions.randn(d0, d1, ..., dn)

Return a sample (or samples) from the “standard normal” distribution.

Note

This is a convenience function for users porting code from Matlab, and wraps standard_normal. That function takes a tuple to specify the size of the output, which is consistent with other NumPy functions like numpy.zeros and numpy.ones.

Note

New code should use the standard_normal method of a default_rng() instance instead; please see the random-quick-start.

If positive int_like arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate “normal” (Gaussian) distribution of mean 0 and variance 1. A single float randomly sampled from the distribution is returned if no argument is provided.

Parameters

d0, d1, …, dn (int, optional): The dimensions of the returned array, must be non-negative. If no argument is given a single Python float is returned.

Returns

Z (ndarray or float): A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.

Notes

For random samples from \(N(\mu, \sigma^2)\), use:

sigma * np.random.randn(...) + mu

Examples

>>> np.random.randn()
2.1923875335537315  # random

Two-by-four array of samples from N(3, 6.25):

>>> 3 + 2.5 * np.random.randn(2, 4)
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
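
The note above recommends the standard_normal method of a default_rng() instance for new code. A minimal sketch of the same two-by-four draw from N(3, 6.25) in that style follows; the seed value and variable names are illustrative choices.

```python
import numpy as np

# A seeded Generator makes the draws reproducible.
rng = np.random.default_rng(seed=42)

mu, sigma = 3.0, 2.5  # N(3, 6.25): mean 3, standard deviation 2.5

# Shift and scale standard-normal draws, as in the randn examples.
samples = mu + sigma * rng.standard_normal(size=(2, 4))

# A larger draw whose sample mean should sit close to mu.
big = mu + sigma * rng.standard_normal(size=100_000)
```

Unlike randn, standard_normal takes the output shape as a single size tuple.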

dgpsi.functions.trace_nb(K)

dgpsi.functions.trace_sum(A, B)

dgpsi.functions.update_f(f, nu, theta): Update ESS proposal samples.
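
In elliptical slice sampling (ESS), a proposal is usually formed by rotating the current state f towards an auxiliary Gaussian draw nu by an angle theta. The sketch below assumes update_f follows that standard rule; update_f_sketch is a hypothetical name and the actual implementation is not shown in this reference.

```python
import numpy as np


def update_f_sketch(f, nu, theta):
    """Standard ESS proposal: a point on the ellipse through f and nu."""
    return f * np.cos(theta) + nu * np.sin(theta)


f = np.array([1.0, -2.0, 0.5])   # current latent state
nu = np.array([0.3, 0.1, -0.7])  # auxiliary draw from the prior

same = update_f_sketch(f, nu, 0.0)            # theta = 0 keeps the current state
swapped = update_f_sketch(f, nu, np.pi / 2)   # theta = pi/2 returns nu
```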

synthetic module

class dgpsi.synthetic.path(X, all_layer)

Bases: object

generate(N)

static k_matrix(X, length, name)
