Layers¶

This submodule contains Keras layers for building SPNs. Besides leaf layers and regularization layers, there are two main groups of layers:

Region layers for arbitrary decompositions of variables. These must be preceded by a FlatToRegions - BaseLeaf - PermuteAndPadScopes block. Regions are arbitrary sets of variables. A region graph describes how these sets of variables hierarchically define a probability distribution.
Spatial layers for Deep Generalized Convolutional Sum Product Networks

All layers propagate log probabilities in the forward pass. To obtain the ‘raw’ probability in linear space, pass the output of a layer through \(\exp\).
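
As a minimal plain-Python sketch (not libspn_keras code; the values below are made up for illustration), the following shows the mapping back to linear space and one common motivation for working in log-space, namely that long products underflow in linear space:

```python
import math

# Hypothetical log probabilities, standing in for the output of a layer.
log_probs = [math.log(0.25), math.log(0.5)]

# Linear-space probabilities are recovered with exp.
probs = [math.exp(lp) for lp in log_probs]

# A product of many small probabilities underflows in linear space,
# while the equivalent sum of log probabilities stays finite.
linear_product = 1.0
log_product = 0.0
for _ in range(1000):
    linear_product *= 1e-5
    log_product += math.log(1e-5)

print(probs)
print(linear_product, log_product)
```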

Leaf layers¶

Leaf layers transform raw observations to probabilities.

NormalLeaf, CauchyLeaf and LaplaceLeaf can be used for continuous inputs.
IndicatorLeaf should be used for discrete inputs.

If a variable is not part of the evidence, it should be marginalized out. This can be done by replacing the output of the corresponding components with 0, since a log probability of 0 corresponds to a probability of 1 in linear space.
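
The marginalization rule can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's implementation: `leaf_out`, `marginalize` and `is_evidence` are hypothetical names, and the leaf output is represented as nested lists rather than tensors.

```python
import math

# Hypothetical leaf output for two variables with two components each,
# indexed as leaf_out[variable][component]; values are log probabilities.
leaf_out = [
    [math.log(0.3), math.log(0.6)],
    [math.log(0.1), math.log(0.8)],
]


def marginalize(leaf_out, is_evidence):
    """Replace the components of every non-evidence variable with 0.0 (log 1)."""
    return [
        row if observed else [0.0] * len(row)
        for row, observed in zip(leaf_out, is_evidence)
    ]


# Variable 1 is not part of the evidence.
masked = marginalize(leaf_out, is_evidence=[True, False])

# A product over the first component of both variables is a sum in log-space.
# The marginalized variable contributes log(1) = 0, so only variable 0 remains.
log_product = masked[0][0] + masked[1][0]
print(math.exp(log_product))
```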

Continuous leaf layers¶

class libspn_keras.layers.NormalLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, scale_trainable=False, accumulator_initializer=None, use_accumulators=False, scale_constraint=None, **kwargs)¶

Computes the log probability of multiple components per variable along the final axis.

Each component is modelled as a normal distribution with a diagonal covariance matrix.

Parameters

num_components (int) – Number of components per variable
location_initializer (Optional[Initializer]) – Initializer for location variable
location_trainable (bool) – Boolean that indicates whether location is trainable
scale_initializer (Optional[Initializer]) – Initializer for scale variable
scale_trainable (bool) – Boolean that indicates whether scale is trainable
**kwargs – kwargs to pass on to the keras.Layer super class
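
To make "multiple components per variable along the final axis" concrete, here is a plain-Python sketch of the computation such a leaf performs. It assumes univariate normal components per variable and uses hypothetical helper names (`normal_log_pdf`, `normal_leaf`); it does not use or reproduce the library's code.

```python
import math


def normal_log_pdf(x, loc, scale):
    """Log density of a univariate normal distribution."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def normal_leaf(batch, locations, scales):
    """Map [batch][var] observations to [batch][var][component] log densities.

    locations and scales are indexed as [var][component].
    """
    return [
        [
            [normal_log_pdf(x, loc, scale) for loc, scale in zip(locs, scls)]
            for x, locs, scls in zip(sample, locations, scales)
        ]
        for sample in batch
    ]


batch = [[0.0, 1.0]]                   # one sample, two variables
locations = [[0.0, 2.0], [1.0, -1.0]]  # two components per variable
scales = [[1.0, 1.0], [1.0, 2.0]]
out = normal_leaf(batch, locations, scales)
print(out)
```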

class libspn_keras.layers.CauchyLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, accumulator_initializer=None, use_accumulators=False, scale_trainable=False, **kwargs)¶

Computes the log probability of multiple components per variable along the final axis.

Each component is modelled as a Cauchy distribution with a diagonal location matrix.

Parameters

num_components (int) – Number of components per variable
location_initializer (Optional[Initializer]) – Initializer for location variable
location_trainable (bool) – Boolean that indicates whether location is trainable
scale_initializer (Optional[Initializer]) – Initializer for scale variable
**kwargs – kwargs to pass on to the keras.Layer super class

class libspn_keras.layers.LaplaceLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, accumulator_initializer=None, use_accumulators=False, **kwargs)¶

Computes the log probability of multiple components per variable along the final axis.

Each component is modelled as a Laplace distribution with a diagonal location matrix.

Parameters

num_components (int) – Number of components per variable
location_initializer (Optional[Initializer]) – Initializer for location variable
location_trainable (bool) – Boolean that indicates whether location is trainable
scale_initializer (Optional[Initializer]) – Initializer for scale variable
**kwargs – kwargs to pass on to the keras.Layer super class

Discrete leaf layers¶

class libspn_keras.layers.IndicatorLeaf(num_components, dtype=tf.int32, **kwargs)¶

Indicator leaf distribution taking integer inputs and producing a discrete indicator representation.

This effectively comes down to computing a one-hot representation along the final axis.

Parameters

num_components (int) – Number of components, or indicators in this context.
dtype (DType) – Dtype of input
**kwargs – Kwargs to pass onto the keras.layers.Layer superclass.
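
A one-hot representation along the final axis can be sketched in plain Python as below. The helper names are hypothetical. The log-space variant assumes that the indicators are emitted as log probabilities like the output of other layers, which this page does not state explicitly for IndicatorLeaf.

```python
import math


def one_hot(index, num_components):
    """Linear-space indicators: 1 for the observed value, 0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(num_components)]


def log_one_hot(index, num_components):
    """Log-space indicators (assumption): log(1) = 0 and log(0) = -inf."""
    return [0.0 if k == index else -math.inf for k in range(num_components)]


inputs = [2, 0]  # integer observations for two variables
indicators = [one_hot(i, 3) for i in inputs]
log_indicators = [log_one_hot(i, 3) for i in inputs]
print(indicators)
print(log_indicators)
```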

Region layers¶

Region layers assume the tensors that are passed between them are of the shape [num_scopes, num_decomps, num_batch, num_nodes]. One region is given by the scope index + the decomposition (so it is indexed on the first two axes). This shape is chosen so that matmul operations done in DenseSum layers don’t always require transposing first.

class libspn_keras.layers.FlatToRegions(num_decomps, **kwargs)¶

Flat representation to a dense representation.

Reshapes a flat input of shape [batch, num_vars[, var_dimensionality]] to [batch, num_vars == scopes, decomp, var_dimensionality]

If var_dimensionality is 1, the shape can also be [batch, num_vars].

Parameters: **kwargs – Keyword arguments to pass on to the keras.Layer super class

Permutation layers¶

These layers permute variables so that this only has to be done once at the bottom of the network:

libspn_keras.layers.PermuteAndPadScopes
libspn_keras.layers.PermuteAndPadScopesRandom

Sum Layers¶

class libspn_keras.layers.DenseSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)¶

Computes densely connected sums per scope and decomposition.

Expects incoming Tensor to be of shape [num_scopes, num_decomps, num_batch, num_nodes]. If your input is passed through a FlatToRegions layer this is already taken care of.

Parameters

num_sums (int) – Number of sums per scope
logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default) it will be set to True for SumOpGradBackprop and False otherwise.
accumulator_initializer (Optional[Initializer]) – Initializer for accumulator. Will automatically be converted to log-space values if logspace_accumulators is enabled.
accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulator (experimental)
linear_accumulator_constraint (Optional[Constraint]) – Constraint for linear accumulators. Defaults to a constraint that ensures a minimum of a small positive constant. Will be ignored if logspace_accumulators is set to True.
sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
**kwargs – kwargs to pass on to keras.Layer super class
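
The statement that weights are obtained by normalizing the accumulators per sum can be illustrated for a single sum node with linear-space accumulators. This is a plain-Python sketch with hypothetical helper names; it shows a plain weighted sum in log-space and says nothing about how a particular SumOpBase computes its backward pass.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def normalize(accumulators):
    """Turn positive linear-space accumulators into weights that sum to 1."""
    total = sum(accumulators)
    return [a / total for a in accumulators]


def log_weighted_sum(child_log_probs, accumulators):
    """log(sum_k w_k * p_k), computed from log p_k without leaving log-space."""
    weights = normalize(accumulators)
    return logsumexp([math.log(w) + lp for w, lp in zip(weights, child_log_probs)])


children = [math.log(0.2), math.log(0.4)]
accumulators = [1.0, 3.0]  # normalizes to weights 0.25 and 0.75
result = log_weighted_sum(children, accumulators)
print(math.exp(result))
```

Because the weights sum to one, a sum over normalized children is itself normalized, which is the sense in which the SPN stays normalized.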

class libspn_keras.layers.RootSum(return_weighted_child_logits=True, logspace_accumulators=None, accumulator_initializer=None, trainable=True, logspace_accumulator_constraint=None, accumulator_regularizer=None, linear_accumulator_constraint=None, sum_op=None, **kwargs)¶

Final sum of an SPN. Expects input to be in log-space and produces log-space output.

Parameters

return_weighted_child_logits (bool) – If True, returns a weighted child log probability, which can be used for e.g. (Sparse)CategoricalCrossEntropy losses. If False, computes the weighted sum of the input, which effectively is the log probability of the distribution defined by the SPN.
logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default) it will be set to True for SumOpGradBackprop and False otherwise.
accumulator_initializer (Optional[Initializer]) – Initializer for accumulator. If None, defaults to initializers.Constant(1.0)
accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulator.
linear_accumulator_constraint (Optional[Constraint]) – Constraint for linear accumulators. Defaults to a constraint that ensures a minimum of a small positive constant. If logspace_accumulators is set to True, this constraint will be ignored.
sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
**kwargs – kwargs to pass on to the keras.Layer super class
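
The two output modes of the root can be related to each other with a small plain-Python sketch. It rests on an assumed reading of the description above: the weighted child log probabilities are taken to be log w_k + log p_k, and reducing them with log-sum-exp yields the log probability of the SPN. The function `root_sum` is a hypothetical stand-in, not the library class.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def root_sum(child_log_probs, accumulators, return_weighted_child_logits=True):
    """Hypothetical root: per-child log(w_k * p_k), or their log-sum-exp."""
    total = sum(accumulators)
    weighted = [
        math.log(a / total) + lp for a, lp in zip(accumulators, child_log_probs)
    ]
    if return_weighted_child_logits:
        return weighted
    return logsumexp(weighted)


children = [math.log(0.2), math.log(0.4)]
logits = root_sum(children, [1.0, 1.0], return_weighted_child_logits=True)
log_prob = root_sum(children, [1.0, 1.0], return_weighted_child_logits=False)

# A softmax over the weighted child logits gives a posterior over children,
# which is what a categorical cross-entropy loss on logits operates on.
posterior = [math.exp(l - logsumexp(logits)) for l in logits]
print(math.exp(log_prob), posterior)
```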

Product Layers¶

class libspn_keras.layers.DenseProduct(num_factors, **kwargs)¶

Computes products per decomposition and scope by an ‘n-order’ outer product.

Assumes the incoming tensor is of shape [num_scopes, num_decomps, num_batch, num_nodes] and produces an output of [num_scopes // num_factors, num_decomps, num_batch, num_nodes ** num_factors]. It can be considered a dense product as it computes all possible products given the scopes it has to merge.

Parameters

num_factors (int) – Number of factors per product
**kwargs – kwargs to pass on to the keras.Layer super class
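
The growth of the last axis to num_nodes ** num_factors follows from taking every combination of one node per merged scope. Below is a plain-Python sketch for a single product over a single sample; `dense_product` is a hypothetical helper and the batch and decomposition axes are omitted.

```python
import math
from itertools import product


def dense_product(scopes):
    """Outer product in log-space.

    scopes holds num_factors lists of num_nodes log probabilities each.
    Multiplying probabilities corresponds to adding log probabilities.
    """
    return [sum(combo) for combo in product(*scopes)]


scope_a = [math.log(0.5), math.log(0.2)]
scope_b = [math.log(0.1), math.log(0.4)]
out = dense_product([scope_a, scope_b])

# Two factors with two nodes each give 2 ** 2 = 4 product nodes.
print(len(out), [round(math.exp(v), 4) for v in out])
```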

class libspn_keras.layers.ReduceProduct(num_factors, **kwargs)¶

Computes products per decomposition and scope by reduction.

Assumes the incoming tensor is of shape [num_batch, num_scopes, num_decomps, num_nodes] and produces an output of [num_batch, num_scopes // num_factors, num_decomps, num_nodes].

Parameters

num_factors (int) – Number of factors per product
**kwargs – kwargs to pass on to the keras.Layer super class.
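
In contrast to the outer product, a product by reduction keeps num_nodes unchanged. The sketch below assumes that consecutive scopes are grouped and that nodes are combined element-wise, which matches the stated output shape but is not confirmed by this page. `reduce_product` is a hypothetical helper working on a single sample and decomposition.

```python
def reduce_product(scopes, num_factors):
    """Sum log probabilities over groups of num_factors consecutive scopes.

    scopes is indexed as [num_scopes][num_nodes]; the result is indexed as
    [num_scopes // num_factors][num_nodes].
    """
    if len(scopes) % num_factors != 0:
        raise ValueError("num_scopes must be divisible by num_factors")
    return [
        [sum(node_values) for node_values in zip(*scopes[i:i + num_factors])]
        for i in range(0, len(scopes), num_factors)
    ]


scopes = [[-1.0, -2.0], [-0.5, -0.25], [-3.0, -1.0], [-1.0, -1.0]]
out = reduce_product(scopes, num_factors=2)
print(out)
```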

Spatial layers¶

Spatial layers are layers needed for building DGC-SPNs. The final layer of an SPN should still be a RootSum. Use SpatialToRegions to convert the output from a spatial SPN to a region SPN.

class libspn_keras.layers.Local2DSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)¶

Computes a spatial local sum, i.e. all cells will have unique weights.

In other words, there is no weight sharing across the spatial axes.

Parameters

num_sums (int) – Number of sums per spatial cell. Corresponds to the number of channels in the output
logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default) it will be set to True for SumOpGradBackprop and False otherwise.
accumulator_initializer (Optional[Initializer]) – Initializer for accumulator
accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulators
linear_accumulator_constraint (Optional[Constraint]) – Constraint for accumulators (only applied if logspace_accumulators is False)
sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
**kwargs – kwargs to pass on to the keras.Layer super class

class libspn_keras.layers.Conv2DSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)¶

Computes a convolutional sum, i.e. weights are shared across the spatial axes.

Parameters

num_sums (int) – Number of sums per spatial cell. Corresponds to the number of channels in the output
logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default) it will be set to True for SumOpGradBackprop and False otherwise.
accumulator_initializer (Optional[Initializer]) – Initializer for accumulator
sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulators
linear_accumulator_constraint (Optional[Constraint]) – Constraint for accumulators (only applied if logspace_accumulators is False)
**kwargs – kwargs to pass on to the keras.Layer super class

class libspn_keras.layers.Conv2DProduct(strides, dilations, kernel_size, num_channels=None, padding='valid', depthwise=False, **kwargs)¶

Convolutional product as described in (Van de Wolfshaar and Pronobis, 2019).

Expects log-space inputs and produces log-space outputs.

Parameters

strides (List[int]) – A tuple or list of strides
dilations (List[int]) – A tuple or list of dilations
kernel_size (List[int]) – A tuple or list of kernel sizes
num_channels (Optional[int]) – Number of channels. If None, will be set to num_in_channels ** prod(kernel_size). This can quickly become a source of out-of-memory (OOM) problems.
padding (str) – Can be either ‘full’, ‘valid’ or ‘final’. Use ‘final’ for the top Conv2DProduct of a DGC-SPN. The other choices have the standard interpretation. Valid padding usually requires non-overlapping patches, whilst full padding is used with overlapping patches and exponentially increasing dilation rates; see also (Van de Wolfshaar and Pronobis, 2019).
depthwise (bool) – Whether to use depthwise convolutions. If True, the value of num_channels will be ignored
**kwargs – Keyword arguments to pass on to the keras.Layer superclass.

References

Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations, Van de Wolfshaar, Pronobis (2019)
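
The default channel count of the convolutional product grows exponentially with the kernel area, which is why it can exhaust memory quickly. The arithmetic can be checked with a short sketch; `default_num_channels` is a hypothetical helper that only evaluates the formula given in the parameter description.

```python
import math


def default_num_channels(num_in_channels, kernel_size):
    """Evaluate num_in_channels ** prod(kernel_size)."""
    return num_in_channels ** math.prod(kernel_size)


print(default_num_channels(4, [2, 2]))   # 4 ** 4
print(default_num_channels(16, [2, 2]))  # 16 ** 4
print(default_num_channels(4, [3, 3]))   # 4 ** 9
```

Passing an explicit num_channels or using depthwise=True avoids relying on this default.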

class libspn_keras.layers.SpatialToRegions(*args, **kwargs)¶

Reshapes spatial SPN layer to a dense layer.

The dense output has leading dimensions for scopes and decomps (which will be [1, 1]).

Dynamic SPN layers¶

For reusing SPN structures along the temporal dimension one can implement dynamic SPNs. These rely on template SPNs, top SPNs and an interface. The interface of the previous timestep and the template at the current timestep can be combined through TemporalDenseProduct.

class libspn_keras.layers.TemporalDenseProduct(*args, **kwargs)¶

Computes ‘temporal’ dense products.

This is used to connect an interface stack at \(t - 1\) of a dynamic SPN with a template SPN at \(t\). Computes a product of all possible combinations of nodes along the last axis of the two incoming layers.

Parameters: **kwargs – kwargs to pass on to the keras.Layer super class.

Regularization layers¶

class libspn_keras.layers.LogDropout(rate, noise_shape=None, seed=None, axis_at_least_one=None, **kwargs)¶

Log dropout layer.

Applies dropout in log-space. Should not precede product layers in an SPN, since their scope probability then potentially becomes -inf, resulting in NaN-values during training.

Parameters

rate (float) – Rate at which inputs are randomly dropped.
noise_shape (Optional[Tuple[int, …]]) – Shape of dropout noise tensor
seed (Optional[int]) – Random seed
**kwargs – kwargs to pass on to the keras.Layer super class
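
Why dropout in log-space is harmless before a sum but problematic before a product can be seen in a plain-Python sketch. It assumes that a dropped input is set to -inf (probability 0) and that no rescaling is applied; `log_dropout` is a hypothetical helper, not the library layer.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_dropout(log_probs, rate, rng):
    """Set each input to -inf with probability `rate` (assumed behaviour)."""
    return [-math.inf if rng.random() < rate else lp for lp in log_probs]


rng = random.Random(0)
log_probs = [math.log(0.5), math.log(0.25)]
kept = log_dropout(log_probs, 0.0, rng)     # nothing is dropped
dropped = log_dropout(log_probs, 1.0, rng)  # everything is dropped

# A sum simply ignores a dropped child ...
sum_out = logsumexp([-math.inf, math.log(0.5)])
# ... but a product (a sum of logs) collapses to -inf,
product_out = -math.inf + math.log(0.5)
# and differences of such values are NaN.
nan_value = product_out - product_out
print(sum_out, product_out, nan_value)
```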

Normalization¶

Normalize axes¶

class libspn_keras.layers.NormalizeAxes(value)¶

Enum for normalization axes.

Enumerates possible choices for normalization axes. SAMPLE_WISE corresponds to normalizing each sample. VARIABLE_WISE is for normalizing the values for each variable using statistics across all samples, while GLOBAL corresponds to statistics gathered from all input values (no specific axes excluded from reduction).

SAMPLE_WISE = 'sample-wise'¶: Normalize each sample

VARIABLE_WISE = 'variable-wise'¶: Normalize each variable

GLOBAL = 'global'¶: Normalize using all variables across all samples

Normalize layers¶

class libspn_keras.layers.NormalizeStandardScore(normalization_epsilon=1e-08, axes=<NormalizeAxes.SAMPLE_WISE: 'sample-wise'>, **kwargs)¶

Normalizes samples to a standard score.

In other words, the output is the input minus its mean and divided by the standard deviation. This can be used to achieve the same kind of normalization as used in (Poon and Domingos, 2011).

Parameters

normalization_epsilon (float) – Small positive constant to prevent division by zero, but could also be used as a ‘smoothing’ factor.
axes (NormalizeAxes) – If NormalizeAxes.SAMPLE_WISE, will compute z-scores where statistics are computed sample-wise; otherwise, computes z-scores using cross-sample statistics. To use cross-sample statistics, call NormalizeStandardScore.adapt(train_ds) where train_ds is an instance of tf.data.Dataset.
**kwargs – kwargs to pass on to the keras.Layer super class

References

Sum-Product Networks: A New Deep Architecture, Poon and Domingos (2011)

adapt(ds)¶

Compute cross-sample statistics. Assumes that ds is a batched dataset.

Parameters: ds (DatasetV2) – Instance of tf.data.Dataset containing train data to compute statistics from.
Raises: RuntimeError – Raised when axes are set to SAMPLE_WISE, in which case there is no point in calling adapt.
Return type: None
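
The sample-wise standard score described for NormalizeStandardScore can be sketched in plain Python as follows. This is an illustration, not the layer's code: the helper name is hypothetical, and adding the epsilon to the standard deviation is an assumption about where the constant enters.

```python
import math


def standard_score(sample, epsilon=1e-8):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = sum(sample) / len(sample)
    variance = sum((x - mean) ** 2 for x in sample) / len(sample)
    std = math.sqrt(variance)
    # epsilon keeps the division well-defined for constant samples.
    return [(x - mean) / (std + epsilon) for x in sample]


z = standard_score([1.0, 2.0, 3.0, 4.0])
constant = standard_score([5.0, 5.0, 5.0])
print(z)
print(constant)
```

For VARIABLE_WISE or GLOBAL normalization the mean and standard deviation would instead come from statistics gathered across samples, which is what adapt is for.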