ramsey.experimental
Experimental modules such as Gaussian processes or Bayesian neural networks.
Note
Experimental code is not part of the stable Ramsey API: it is subject to change and might even be removed in the future. Avoid building critical code bases around the :code:`ramsey.experimental` submodule.
Distributions
ramsey.experimental.Autoregressive
Bases: Distribution
An autoregressive model.
Attributes:
Name | Type | Description |
---|---|---|
loc | Array | location (intercept) of the process |
ar_coefficients | Array | the autoregressive coefficients |
scale | Array | scale (standard deviation) of the innovations |
length | Optional[int] | length of sequences to be sampled |
ar_coefficients = ar_coefficients (instance attribute)
arg_constraints = {'loc': constraints.real, 'ar_coefficients': constraints.real_vector, 'scale': constraints.positive} (class attribute)
length = length (instance attribute)
loc = loc (instance attribute)
p = len(ar_coefficients) (instance attribute)
reparametrized_params = ['loc', 'scale', 'ar_coefficients'] (class attribute)
scale = scale (instance attribute)
support = constraints.real_vector (class attribute)
__init__(loc, ar_coefficients, scale, length=None)
Construct an autoregressive distribution.
log_prob(value: Array)
Compute the log probability of a value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
value | Array | one-dimensional array of floats | required |
Returns:
Type | Description |
---|---|
float | returns the log probability of the value |
mean(length: Optional[int] = None, initial_state: Optional[float] = None)
Compute the mean of the autoregressive distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
length | Optional[int] | length of the autoregressive sequence; if None, uses the length supplied to the constructor | None |
initial_state | Optional[float] | initial state of the distribution; if None, uses the mean | None |
Returns:
Type | Description |
---|---|
float | returns the mean |
sample(rng_key: jr.PRNGKey, length: Optional[int] = None, initial_state: Optional[float] = None, sample_shape=())
Sample from the distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rng_key | PRNGKey | a random key for seeding | required |
length | Optional[int] | length of the sequence | None |
initial_state | Optional[float] | an initial value | None |
sample_shape | tuple | a tuple of the form (shape,) | () |
Returns:
Type | Description |
---|---|
Array | returns an array of values |
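A minimal usage sketch based on the signatures above; the coefficients, noise scale, and lengths are illustrative:

```python
import jax.numpy as jnp
from jax import random as jr

from ramsey.experimental import Autoregressive

# an AR(2) process; the coefficients and noise scale are illustrative
ar = Autoregressive(
    loc=0.0,
    ar_coefficients=jnp.array([0.5, -0.25]),
    scale=0.1,
    length=50,
)
y = ar.sample(jr.PRNGKey(0))  # draw a sequence of 50 values
lp = ar.log_prob(y)           # log probability of the draw
mu = ar.mean(length=100)      # mean of a length-100 sequence
```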
Models
ramsey.experimental.BNN
Bases: Module
A Bayesian neural network.
The BNN layers can be a mix of Bayesian layers and conventional layers. The training objective is the ELBO, which is computed according to [1].
Attributes:
Name | Type | Description |
---|---|---|
layers | Iterable[Module] | layers of the BNN |
family | Family | exponential family of the response |
References
[1] Blundell C., Cornebise J., Kavukcuoglu K., Wierstra D. "Weight Uncertainty in Neural Networks". ICML, 2015.
family: Family = Gaussian() (class attribute)
layers: Iterable[nn.Module] (instance attribute)
__call__(x: Array, **kwargs)
Transform the inputs through the Bayesian neural network.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Array | input data of dimension (*batch_dims, spatial_dims..., feature_dims) | required |
**kwargs | | keyword arguments; can include outputs: jax.Array. If an argument called 'outputs' is provided, computes the loss (negative ELBO) together with a predictive posterior distribution | {} |
Returns:
Type | Description |
---|---|
Union[distribution, Tuple[distribution, float]] | if 'outputs' is provided as a keyword argument, returns a tuple of the predictive distribution and the negative ELBO, which can be used as a loss for optimization; if 'outputs' is not provided, returns the predictive distribution only |
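A minimal sketch of assembling and calling a BNN, based on the signatures above. The layer sizes and data are illustrative, and the name of the RNG stream used for weight sampling ("sample") is an assumption:

```python
import jax.numpy as jnp
from jax import random as jr
from flax import linen as nn

from ramsey.experimental import BNN, BayesianLinear

# a mix of Bayesian and conventional layers; sizes are illustrative.
# with a Gaussian response, the last layer emits mean and scale, hence 2 outputs
bnn = BNN(layers=(
    BayesianLinear(output_size=16),
    nn.Dense(16),
    BayesianLinear(output_size=2),
))

x = jnp.linspace(0.0, 1.0, 100).reshape(-1, 1)
y = jnp.sin(10.0 * x) + 0.1 * jr.normal(jr.PRNGKey(1), x.shape)

params = bnn.init({"params": jr.PRNGKey(0), "sample": jr.PRNGKey(2)}, x)
# with 'outputs', the call returns the predictive distribution and the negative ELBO
posterior, loss = bnn.apply(
    params, x, outputs=y, rngs={"sample": jr.PRNGKey(3)}
)
```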
ramsey.experimental.RANP
Bases: ANP
A recurrent attentive neural process.
Implements the core structure of a recurrent attentive neural process, including its cross-attention module.
Attributes:
Name | Type | Description |
---|---|---|
decoder | Sequential | the decoder can be any network, but is typically an MLP. Note that the last layer of the decoder needs to have twice the number of nodes as the data you are trying to model |
latent_encoder | Optional[Tuple[Module, Module]] | a tuple of two flax.linen.Modules |
deterministic_encoder | Optional[Tuple[Module, Attention]] | a tuple of a flax.linen.Module and an Attention object |
family | Family | distributional family of the response variable |
decoder: nn.Module (instance attribute)
deterministic_encoder: Optional[Tuple[nn.Module, Attention]] = None (class attribute)
family: Family = Gaussian() (class attribute)
latent_encoder: Optional[Tuple[nn.Module, nn.Module]] = None (class attribute)
__call__(x_context: Array, y_context: Array, x_target: Array, **kwargs)
Transform the inputs through the neural process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x_context | Array | input data of dimension (*batch_dims, spatial_dims..., feature_dims) | required |
y_context | Array | input data of dimension (*batch_dims, spatial_dims..., response_dims) | required |
x_target | Array | input data of dimension (*batch_dims, spatial_dims..., feature_dims) | required |
**kwargs | | keyword arguments; can include y_target: jax.Array. If an argument called 'y_target' is provided, computes the loss (negative ELBO) together with a predictive posterior distribution | {} |
Returns:
Type | Description |
---|---|
Union[distribution, Tuple[distribution, float]] | if 'y_target' is provided as a keyword argument, returns a tuple of the predictive distribution and the negative ELBO, which can be used as a loss for optimization; if 'y_target' is not provided, returns the predictive distribution only |
setup()
Construct all networks.
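Constructing the decoder and encoder stacks is omitted here; the following is only a hedged sketch of the call pattern implied by the signature above, assuming `ranp` has already been built with suitable modules (additional RNG streams for latent sampling may be required):

```python
import jax.numpy as jnp
from jax import random as jr

# `ranp` is assumed to be an already-constructed RANP; building its
# decoder/encoder modules is elided
x_context = jnp.linspace(0.0, 1.0, 10).reshape(1, -1, 1)
y_context = jnp.sin(x_context)
x_target = jnp.linspace(0.0, 1.0, 25).reshape(1, -1, 1)

params = ranp.init(jr.PRNGKey(0), x_context, y_context, x_target)
posterior = ranp.apply(params, x_context, y_context, x_target)
# supplying y_target additionally returns the negative ELBO as a loss
posterior, loss = ranp.apply(
    params, x_context, y_context, x_target, y_target=jnp.sin(x_target)
)
```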
ramsey.experimental.GP
Bases: Module
A Gaussian process.
Attributes:
Name | Type | Description |
---|---|---|
kernel | Kernel | a covariance function |
sigma_init | Optional[Initializer] | an initializer object from Flax |
kernel: Kernel (instance attribute)
sigma_init: Optional[initializers.Initializer] = None (class attribute)
__call__(x: Array, **kwargs)
Evaluate the Gaussian process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Array | training inputs | required |
**kwargs | | keyword arguments; can include outputs: jax.Array (training targets to condition on) and inputs_star: jax.Array (test inputs at which to evaluate the posterior predictive) | {} |
Returns:
Type | Description |
---|---|
distribution | returns a multivariate normal distribution object |
References
[1] Rasmussen, Carl E. and Williams, Chris K. I. "Gaussian Processes for Machine Learning". MIT Press, 2006.
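A minimal sketch based on the signature above; the meaning of the outputs/inputs_star keywords is inferred from the keyword list, and in practice the parameters would come from train_gaussian_process (documented below):

```python
import jax.numpy as jnp
from jax import random as jr

from ramsey.experimental import GP, ExponentiatedQuadratic

x = jnp.linspace(-jnp.pi, jnp.pi, 50).reshape(-1, 1)
y = jnp.sin(x) + 0.1 * jr.normal(jr.PRNGKey(0), x.shape)

gp = GP(kernel=ExponentiatedQuadratic())
params = gp.init(jr.PRNGKey(1), x)

# condition on (x, y) and evaluate the posterior predictive at x_star
x_star = jnp.linspace(-2.0 * jnp.pi, 2.0 * jnp.pi, 100).reshape(-1, 1)
posterior = gp.apply(params, x, outputs=y, inputs_star=x_star)
```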
ramsey.experimental.SparseGP
Bases: Module
A sparse Gaussian process.
Attributes:
Name | Type | Description |
---|---|---|
kernel | Kernel | a covariance function |
n_inducing | int | number of inducing points |
jitter | float | jitter to add to the diagonal of the covariance matrix |
log_sigma_init | Optional[Initializer] | an initializer object from Flax |
inducing_init | Optional[Initializer] | an initializer object from Flax |
References
[1] Titsias, Michalis K. "Variational Learning of Inducing Variables in Sparse Gaussian Processes". AISTATS, 2009
inducing_init: Optional[initializers.Initializer] = initializers.uniform(1) (class attribute)
jitter: Optional[float] = 1e-07 (class attribute)
kernel: Kernel (instance attribute)
log_sigma_init: Optional[initializers.Initializer] = initializers.constant(jnp.log(1.0)) (class attribute)
n_inducing: int (instance attribute)
__call__(x: Array, **kwargs)
Call the sparse GP.
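A minimal sketch combining SparseGP with the train_sparse_gaussian_process helper documented below; the number of inducing points and the data are illustrative:

```python
import jax.numpy as jnp
from jax import random as jr

from ramsey.experimental import (
    ExponentiatedQuadratic,
    SparseGP,
    train_sparse_gaussian_process,
)

x = jnp.linspace(-jnp.pi, jnp.pi, 200).reshape(-1, 1)
y = jnp.sin(x) + 0.1 * jr.normal(jr.PRNGKey(0), x.shape)

# a sparse GP with 20 inducing points; the number is illustrative
sparse_gp = SparseGP(kernel=ExponentiatedQuadratic(), n_inducing=20)
params, losses = train_sparse_gaussian_process(
    jr.PRNGKey(1), sparse_gp, x=x, y=y
)
```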
Modules
ramsey.experimental.BayesianLinear
Bases: Module
Linear Bayesian layer.
A linear Bayesian layer using distributions over weights and bias. The KL divergences between the variational posteriors and priors for weights and bias are calculated; these KL divergence terms can be used to obtain the ELBO as an objective to train a Bayesian neural network.
Attributes:
Name | Type | Description |
---|---|---|
output_size | int | number of layer outputs |
use_bias | bool | controls usage of the bias term |
w_prior | Optional[Distribution] | prior distribution for the weights |
b_prior | Optional[Distribution] | prior distribution for the bias |
name | Optional[str] | name of the layer |
kwargs | keyword arguments | you can supply initializers for the parameters of the priors via keyword arguments. For instance, if your prior on the weights is a dist.Normal(loc, scale), you can supply Flax initializer objects named w_loc_init and w_scale_init as keyword arguments. Likewise, you can supply initializers called b_loc_init and b_scale_init for the prior on the bias. If your prior on the weights is a dist.Uniform(low, high), you will need to supply initializers called w_low_init and w_high_init |
References
[1] Blundell C., Cornebise J., Kavukcuoglu K., Wierstra D. "Weight Uncertainty in Neural Networks". ICML, 2015.
b_prior: Optional[dist.Distribution] = dist.Normal(loc=0.0, scale=1.0) (class attribute)
mc_sample_size: int = 10 (class attribute)
name: Optional[str] = None (class attribute)
output_size: int (instance attribute)
use_bias: bool = True (class attribute)
w_prior: Optional[dist.Distribution] = dist.Normal(loc=0.0, scale=1.0) (class attribute)
__call__(x: Array, is_training: bool = False)
Call the linear Bayesian layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Array | layer inputs | required |
is_training | bool | training mode, where KL divergence terms are calculated and returned | False |
setup()
Construct a linear Bayesian layer.
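A small construction sketch; `dist` is assumed to be numpyro.distributions, matching the defaults shown above:

```python
from numpyro import distributions as dist

from ramsey.experimental import BayesianLinear

# a layer with explicit Normal priors on weights and bias
layer = BayesianLinear(
    output_size=8,
    w_prior=dist.Normal(loc=0.0, scale=1.0),
    b_prior=dist.Normal(loc=0.0, scale=1.0),
)
```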
Covariance functions
ramsey.experimental.ExponentiatedQuadratic
Bases: Kernel, Module
Exponentiated quadratic covariance function.
Attributes:
Name | Type | Description |
---|---|---|
active_dims | Optional[list] | either None or a list of integers specifying the dimensions of the data on which the kernel operates |
rho_init | Optional[Initializer] | an initializer object from Flax or None |
sigma_init | Optional[Initializer] | an initializer object from Flax or None |
name | Optional[str] | name of the layer |
active_dims: Optional[list] = None (class attribute)
rho_init: Optional[initializers.Initializer] = None (class attribute)
sigma_init: Optional[initializers.Initializer] = None (class attribute)
__add__(other)
Add two kernels.
__call__(x1: Array, x2: Array = None)
Call the covariance function.
__mul__(other)
Multiply two kernels.
setup()
Construct a stationary covariance.
ramsey.experimental.exponentiated_quadratic(x1: Array, x2: Array, sigma: float, rho: Union[float, jnp.ndarray])
Exponentiated-quadratic covariance function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x1 | Array | an (n, p)-dimensional array of inputs | required |
x2 | Array | an (m, p)-dimensional array of inputs | required |
sigma | float | the standard deviation of the kernel function | required |
rho | Union[float, ndarray] | the lengthscale of the kernel function; can be a float or a p-dimensional vector | required |
Returns:
Type | Description |
---|---|
Array | returns an (n, m)-dimensional covariance matrix |
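A quick sketch of evaluating the function; the shapes follow the (n, p)/(m, p) convention assumed above:

```python
import jax.numpy as jnp

from ramsey.experimental import exponentiated_quadratic

x1 = jnp.linspace(0.0, 1.0, 5).reshape(-1, 1)  # (n, p) = (5, 1)
x2 = jnp.linspace(0.0, 1.0, 3).reshape(-1, 1)  # (m, p) = (3, 1)
K = exponentiated_quadratic(x1, x2, sigma=1.0, rho=0.5)
print(K.shape)  # expected: (5, 3)
```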
ramsey.experimental.Linear
Bases: Kernel, Module
Linear covariance function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
active_dims | | the indexes of the dimensions the kernel acts upon | required |
sigma_b_init | | an initializer object from Flax or None | required |
sigma_v_init | | an initializer object from Flax or None | required |
offset_init | | an initializer object from Flax or None | required |
active_dims: Optional[list] = None (class attribute)
offset_init: Optional[initializers.Initializer] = initializers.uniform() (class attribute)
sigma_b_init: Optional[initializers.Initializer] = initializers.uniform() (class attribute)
sigma_v_init: Optional[initializers.Initializer] = initializers.uniform() (class attribute)
__add__(other)
Add two kernels.
__call__(x1: Array, x2: Array = None)
Call the covariance function.
__mul__(other)
Multiply two kernels.
setup()
Construct parameters.
ramsey.experimental.linear(x1: Array, x2: Array, sigma_b, sigma_v, offset)
Linear covariance function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x1 | Array | an (n, p)-dimensional array of inputs | required |
x2 | Array | an (m, p)-dimensional array of inputs | required |
sigma_b | | the standard deviation of the constant (bias) term | required |
sigma_v | | the standard deviation of the linear term | required |
offset | | the offset of the kernel | required |
Returns:
Type | Description |
---|---|
Array | returns an (n, m)-dimensional covariance matrix |
ramsey.experimental.Periodic
Bases: Kernel, Module
Periodic covariance function.
Attributes:
Name | Type | Description |
---|---|---|
period | float | the period of the periodic kernel |
active_dims | Optional[list] | either None or a list of integers specifying the dimensions of the data on which the kernel operates |
rho_init | Optional[Initializer] | an initializer object from Flax or None |
sigma_init | Optional[Initializer] | an initializer object from Flax or None |
active_dims: Optional[list] = None (class attribute)
period: float (instance attribute)
rho_init: Optional[initializers.Initializer] = initializers.uniform() (class attribute)
sigma_init: Optional[initializers.Initializer] = initializers.uniform() (class attribute)
__add__(other)
Add two kernels.
__call__(x1: Array, x2: Array = None)
Call the covariance function.
__mul__(other)
Multiply two kernels.
setup()
Construct the covariance function.
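Because the kernel classes implement __add__ and __mul__, they can be composed; the particular combination below is illustrative:

```python
from ramsey.experimental import ExponentiatedQuadratic, Linear, Periodic

# a locally periodic component plus a linear trend
kernel = ExponentiatedQuadratic() * Periodic(period=1.0) + Linear()
```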
ramsey.experimental.periodic(x1: Array, x2: Array, period, sigma, rho)
Periodic covariance function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x1 | Array | an (n, p)-dimensional array of inputs | required |
x2 | Array | an (m, p)-dimensional array of inputs | required |
period | | the period of the kernel function | required |
sigma | | the standard deviation of the kernel function | required |
rho | | the lengthscale of the kernel function; can be a float or a p-dimensional vector | required |
Returns:
Type | Description |
---|---|
Array | returns an (n, m)-dimensional covariance matrix |
Train functions
ramsey.experimental.train_gaussian_process(rng_key: jr.PRNGKey, gaussian_process: GP, x: Array, y: Array, optimizer=optax.adam(0.003), n_iter=1000, verbose=False)
Train a Gaussian process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rng_key | PRNGKey | a key for seeding random number generators | required |
gaussian_process | GP | a GP object | required |
x | Array | an input array of dimension (n, p) | required |
y | Array | an output array of dimension (n, 1) | required |
optimizer | | an optax optimizer | adam(0.003) |
n_iter | | number of training iterations | 1000 |
verbose | | print training details | False |
Returns:
Type | Description |
---|---|
Tuple[dict, Array] | a tuple of trained parameters and training losses |
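A minimal end-to-end training sketch using the signature above; the data and hyperparameters are illustrative:

```python
import jax.numpy as jnp
import optax
from jax import random as jr

from ramsey.experimental import (
    GP,
    ExponentiatedQuadratic,
    train_gaussian_process,
)

x = jnp.linspace(-jnp.pi, jnp.pi, 50).reshape(-1, 1)
y = jnp.sin(x) + 0.1 * jr.normal(jr.PRNGKey(0), x.shape)

gp = GP(kernel=ExponentiatedQuadratic())
params, losses = train_gaussian_process(
    jr.PRNGKey(1), gp, x=x, y=y, optimizer=optax.adam(0.003), n_iter=1000
)
```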
ramsey.experimental.train_sparse_gaussian_process(rng_key: jr.PRNGKey, gaussian_process: SparseGP, x: Array, y: Array, optimizer=optax.adam(0.003), n_iter=1000, verbose=False)
Train a sparse Gaussian process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rng_key | PRNGKey | a key for seeding random number generators | required |
gaussian_process | SparseGP | a SparseGP object | required |
x | Array | an input array of dimension (n, p) | required |
y | Array | an output array of dimension (n, 1) | required |
optimizer | | an optax optimizer | adam(0.003) |
n_iter | | number of training iterations | 1000 |
verbose | | print training details | False |
Returns:
Type | Description |
---|---|
Tuple[dict, Array] | a tuple of trained parameters and training losses |