Layers
This submodule contains Keras layers for building SPNs. Besides leaf layers and regularization layers, there are two main groups of layers:
- Region layers for arbitrary decompositions of variables. They must be preceded by a FlatToRegions -> BaseLeaf -> PermuteAndPadScopes block. Regions are arbitrary sets of variables. A region graph describes how these sets of variables hierarchically define a probability distribution.
- Spatial layers for Deep Generalized Convolutional Sum-Product Networks (DGC-SPNs).
All layers propagate log probabilities in the forward pass. To obtain the ‘raw’ probability in linear space, pass the output of a layer through \(\exp\).
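As a small illustration of why log-space propagation is convenient, the sketch below evaluates a toy SPN over two variables by hand, using only the standard library. Products become sums of log probabilities, weighted sums become a log-sum-exp, and \(\exp\) recovers the linear-space probability at the end. The toy structure, the helper `logsumexp` and all numbers are made up for illustration; this is not libspn_keras code.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

# Hypothetical leaf log probabilities for one observed sample (x1, x2).
log_p_x1 = [math.log(0.2), math.log(0.7)]  # two components for x1
log_p_x2 = [math.log(0.5), math.log(0.4)]  # two components for x2

# Product nodes: in log-space, a product of factors is a sum of logs.
log_products = [log_p_x1[0] + log_p_x2[0], log_p_x1[1] + log_p_x2[1]]

# Root sum with normalized weights: log(sum_i w_i * p_i) = logsumexp(log w_i + log p_i).
weights = [0.3, 0.7]
log_root = logsumexp([math.log(w) + lp for w, lp in zip(weights, log_products)])

# Pass the log-space output through exp to get the linear-space probability.
p_root = math.exp(log_root)
print(p_root)  # approximately 0.226 (= 0.3 * 0.1 + 0.7 * 0.28)
```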
Leaf layers
Leaf layers transform raw observations into log probabilities. NormalLeaf, CauchyLeaf and LaplaceLeaf can be used for continuous inputs. IndicatorLeaf should be used for discrete inputs.
If a variable is not part of the evidence, it should be marginalized out. This can be done by replacing the output of the corresponding components with 0, since 0 in log-space corresponds to 1 in linear space.
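To see why 0 is the right value, consider a product over two independent univariate normal leaves, written by hand here rather than with the library's layers. Replacing the log density of the missing variable with 0 multiplies by 1 in linear space, so the product reduces to the marginal density of the observed variable. The helpers `normal_log_pdf` and `log_product` are hypothetical.

```python
import math

def normal_log_pdf(x, loc, scale):
    """Log density of a univariate normal distribution."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def log_product(leaf_log_probs):
    """A product node in log-space: the sum of its children's log probabilities."""
    return sum(leaf_log_probs)

x1 = 0.3
evidence_mask = [True, False]  # x2 is not part of the evidence
# The value used for x2 (1.5) is a placeholder; its output is discarded below.
leaf_outputs = [normal_log_pdf(x1, 0.0, 1.0), normal_log_pdf(1.5, 2.0, 0.5)]

# Marginalize: replace the missing variable's log probability with 0 (= log 1).
masked = [lp if observed else 0.0 for lp, observed in zip(leaf_outputs, evidence_mask)]

joint_marginal = log_product(masked)
# Only the density of x1 remains, i.e. x2 has been integrated out.
print(joint_marginal, normal_log_pdf(x1, 0.0, 1.0))
```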
Continuous leaf layers
class libspn_keras.layers.NormalLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, scale_trainable=False, accumulator_initializer=None, use_accumulators=False, scale_constraint=None, **kwargs)
Computes the log probability of multiple components per variable along the final axis.
Each component is modelled as a normal distribution with a diagonal covariance matrix.
Parameters
- num_components (int) – Number of components per variable
- location_initializer (Optional[Initializer]) – Initializer for location variable
- location_trainable (bool) – Boolean that indicates whether location is trainable
- scale_initializer (Optional[Initializer]) – Initializer for scale variable
- scale_trainable (bool) – Boolean that indicates whether scale is trainable
- **kwargs – kwargs to pass on to the keras.Layer super class
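A minimal pure-Python sketch of what NormalLeaf computes, assuming an output of shape [batch, num_vars, num_components]: one log density per component, stacked along the final axis. The function name `normal_leaf` and the explicit `loc`/`scale` lists are illustrative assumptions; the real layer holds these as Keras variables created from the initializers above.

```python
import math

def normal_leaf(x, loc, scale):
    """Hypothetical stand-in for NormalLeaf's forward pass.

    x:     [batch][num_vars] observations
    loc:   [num_vars][num_components] component means
    scale: [num_vars][num_components] component standard deviations
    Returns [batch][num_vars][num_components] log densities.
    """
    out = []
    for sample in x:
        per_var = []
        for v, value in enumerate(sample):
            # Each variable is scored independently (diagonal covariance).
            per_var.append([
                -0.5 * ((value - mu) / sigma) ** 2
                - math.log(sigma)
                - 0.5 * math.log(2.0 * math.pi)
                for mu, sigma in zip(loc[v], scale[v])
            ])
        out.append(per_var)
    return out

x = [[0.0, 1.0], [2.0, -1.0]]             # batch of 2 samples, 2 variables
loc = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]  # 3 components per variable
scale = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
log_probs = normal_leaf(x, loc, scale)
print(len(log_probs), len(log_probs[0]), len(log_probs[0][0]))  # 2 2 3
```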
class libspn_keras.layers.CauchyLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, accumulator_initializer=None, use_accumulators=False, scale_trainable=False, **kwargs)
Computes the log probability of multiple components per variable along the final axis.
Each component is modelled as a Cauchy distribution with a diagonal scale matrix.
Parameters
- num_components (int) – Number of components per variable
- location_initializer (Optional[Initializer]) – Initializer for location variable
- location_trainable (bool) – Boolean that indicates whether location is trainable
- scale_initializer (Optional[Initializer]) – Initializer for scale variable
- scale_trainable (bool) – Boolean that indicates whether scale is trainable
- **kwargs – kwargs to pass on to the keras.Layer super class
class libspn_keras.layers.LaplaceLeaf(num_components, location_initializer=None, location_trainable=True, scale_initializer=None, accumulator_initializer=None, use_accumulators=False, **kwargs)
Computes the log probability of multiple components per variable along the final axis.
Each component is modelled as a Laplace distribution with a diagonal scale matrix.
Parameters
- num_components (int) – Number of components per variable
- location_initializer (Optional[Initializer]) – Initializer for location variable
- location_trainable (bool) – Boolean that indicates whether location is trainable
- scale_initializer (Optional[Initializer]) – Initializer for scale variable
- **kwargs – kwargs to pass on to the keras.Layer super class
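For reference, the sketch below writes out the univariate log densities that CauchyLeaf and LaplaceLeaf evaluate per variable and component, using the textbook location/scale parametrizations. That the layers use exactly these parametrizations is an assumption; the formulas themselves are standard. Both distributions have heavier tails than the normal distribution, which can make them more forgiving towards outliers.

```python
import math

def cauchy_log_pdf(x, loc, scale):
    # p(x) = 1 / (pi * scale * (1 + ((x - loc) / scale) ** 2))
    z = (x - loc) / scale
    return -math.log(math.pi * scale) - math.log1p(z * z)

def laplace_log_pdf(x, loc, scale):
    # p(x) = exp(-|x - loc| / scale) / (2 * scale)
    return -abs(x - loc) / scale - math.log(2.0 * scale)

# Far from the location, both assign much more density than a standard
# normal would (its log density at x = 5 is about -13.4).
for x in (0.0, 5.0):
    print(x, cauchy_log_pdf(x, 0.0, 1.0), laplace_log_pdf(x, 0.0, 1.0))
```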
Discrete leaf layers
class libspn_keras.layers.IndicatorLeaf(num_components, dtype=tf.int32, **kwargs)
Indicator leaf distribution taking integer inputs and producing a discrete indicator representation.
This effectively comes down to computing a one-hot representation along the final axis.
Parameters
- num_components (int) – Number of components, or indicators in this context
- dtype (DType) – Dtype of input
- **kwargs – Kwargs to pass on to the keras.layers.Layer superclass.
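The one-hot view can be sketched as follows. Since all layers propagate log probabilities, this sketch assumes the indicator of the observed value becomes log(1) = 0 and all other indicators become log(0) = -inf; the function name `indicator_leaf` is hypothetical. Setting every indicator of a variable to 0 then marginalizes it, in line with the marginalization rule for leaf layers.

```python
import math

def indicator_leaf(x, num_components):
    """Hypothetical sketch: integer inputs [batch][num_vars] ->
    log one-hot [batch][num_vars][num_components]."""
    return [
        [
            [0.0 if c == value else -math.inf for c in range(num_components)]
            for value in sample
        ]
        for sample in x
    ]

out = indicator_leaf([[1, 0], [2, 2]], num_components=3)
print(out[0][0])  # [-inf, 0.0, -inf]
```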
Region layers
Region layers assume the tensors that are passed between them are of the shape [num_scopes, num_decomps, num_batch, num_nodes]. One region is given by the scope index and the decomposition index (so it is indexed on the first two axes). This shape is chosen so that the matmul operations done in DenseSum layers don’t always require transposing first.
class libspn_keras.layers.FlatToRegions(num_decomps, **kwargs)
Converts a flat representation to a dense representation.
Reshapes a flat input of shape [batch, num_vars[, var_dimensionality]] to [batch, num_vars == scopes, decomp, var_dimensionality]. If var_dimensionality is 1, the input shape can also be [batch, num_vars].
Parameters
- num_decomps (int) – Number of decompositions
- **kwargs – Keyword arguments to pass on to the keras.Layer super class
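A shape-level sketch of this reshape in pure Python, for the case var_dimensionality == 1. It assumes that the new decomp axis is filled by repeating each variable's value num_decomps times, so that every decomposition sees the same inputs; the function name `flat_to_regions` is hypothetical.

```python
def flat_to_regions(x, num_decomps):
    """[batch][num_vars] -> [batch][num_vars][num_decomps][1] (assumed tiling)."""
    return [
        [[[value] for _ in range(num_decomps)] for value in sample]
        for sample in x
    ]

regions = flat_to_regions([[0.1, 0.2, 0.3]], num_decomps=2)
# batch=1, num_vars (scopes)=3, decomps=2, var_dimensionality=1
print(len(regions), len(regions[0]), len(regions[0][0]), len(regions[0][0][0]))  # 1 3 2 1
```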
Permutation layers
These layers permute variables so that the permutation only has to be done once, at the bottom of the network. Two permutation layers are provided: libspn_keras.layers.PermuteAndPadScopes and libspn_keras.layers.PermuteAndPadScopesRandom.
Sum Layers
class libspn_keras.layers.DenseSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)
Computes densely connected sums per scope and decomposition.
Expects the incoming Tensor to be of shape [num_scopes, num_decomps, num_batch, num_nodes]. If your input is passed through a FlatToRegions layer, this is already taken care of.
Parameters
- num_sums (int) – Number of sums per scope
- logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space, which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default), it will be set to True for SumOpGradBackprop and False otherwise.
- accumulator_initializer (Optional[Initializer]) – Initializer for accumulator. Will automatically be converted to log-space values if logspace_accumulators is enabled.
- accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulator (experimental)
- linear_accumulator_constraint (Optional[Constraint]) – Constraint for accumulator. Defaults to a constraint that ensures a small positive constant at minimum. Will be ignored if logspace_accumulators is set to True.
- sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
- **kwargs – kwargs to pass on to the keras.Layer super class
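The forward pass of a dense sum can be sketched in pure Python for a single scope and decomposition; the real layer applies the same computation independently to every [scope, decomp] pair, which is why those axes lead. In linear space this is a matrix product of the inputs with the normalized weights; in log-space it becomes a log-sum-exp. Weights are obtained by normalizing linear-space accumulators per sum, as described above. The function names and the [num_nodes][num_sums] accumulator layout are assumptions for illustration.

```python
import math

def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def dense_sum(x, accumulators):
    """x: [num_batch][num_nodes] log probabilities for one (scope, decomp).
    accumulators: [num_nodes][num_sums] positive linear-space accumulators.
    Returns [num_batch][num_sums] log probabilities."""
    num_nodes, num_sums = len(accumulators), len(accumulators[0])
    # Normalize per sum (per column) so every sum's weights add up to 1.
    totals = [sum(accumulators[n][s] for n in range(num_nodes)) for s in range(num_sums)]
    log_w = [[math.log(accumulators[n][s] / totals[s]) for s in range(num_sums)]
             for n in range(num_nodes)]
    return [
        [logsumexp([row[n] + log_w[n][s] for n in range(num_nodes)]) for s in range(num_sums)]
        for row in x
    ]

x = [[math.log(0.5), math.log(0.1)]]  # one sample, two input nodes
acc = [[1.0, 3.0], [1.0, 1.0]]        # two sums
out = dense_sum(x, acc)
print([math.exp(v) for v in out[0]])  # approximately [0.3, 0.4]
```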
class libspn_keras.layers.RootSum(return_weighted_child_logits=True, logspace_accumulators=None, accumulator_initializer=None, trainable=True, logspace_accumulator_constraint=None, accumulator_regularizer=None, linear_accumulator_constraint=None, sum_op=None, **kwargs)
Final sum of an SPN. Expects input to be in log-space and produces log-space output.
Parameters
- return_weighted_child_logits (bool) – If True, returns the weighted child log probabilities, which can be used for e.g. (Sparse)CategoricalCrossentropy losses. If False, computes the weighted sum of the input, which effectively is the log probability of the distribution defined by the SPN.
- logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space, which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default), it will be set to True for SumOpGradBackprop and False otherwise.
- accumulator_initializer (Optional[Initializer]) – Initializer for accumulator. If None, defaults to initializers.Constant(1.0)
- accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulator.
- linear_accumulator_constraint (Optional[Constraint]) – Constraint for linear accumulators. Defaults to a constraint that ensures a minimum of a small positive constant. If logspace_accumulators is set to True, this constraint will be ignored.
- sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
- **kwargs – kwargs to pass on to the keras.Layer super class
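The two settings of return_weighted_child_logits can be contrasted with a short sketch; the function `root_sum` is hypothetical and uses only the standard library. With True, there is one value per child, log w_i + log p_i; a softmax over these values (as a cross-entropy loss that takes logits applies) yields the normalized posterior over children. With False, the same terms are combined by a log-sum-exp into a single log probability.

```python
import math

def root_sum(x, accumulators, return_weighted_child_logits=True):
    """x: [num_batch][num_children] log probabilities.
    accumulators: [num_children] positive linear-space accumulators."""
    total = sum(accumulators)
    log_w = [math.log(a / total) for a in accumulators]  # normalized log weights
    weighted = [[lp + lw for lp, lw in zip(row, log_w)] for row in x]
    if return_weighted_child_logits:
        return weighted  # [num_batch][num_children]: log(w_i * p_i)
    out = []
    for row in weighted:
        m = max(row)
        out.append(m if m == -math.inf else m + math.log(sum(math.exp(v - m) for v in row)))
    return out  # [num_batch]: log(sum_i w_i * p_i)

x = [[math.log(0.2), math.log(0.6)]]
acc = [1.0, 1.0]  # equal accumulators, as with a Constant(1.0) initializer
logits = root_sum(x, acc, return_weighted_child_logits=True)
log_prob = root_sum(x, acc, return_weighted_child_logits=False)

# A softmax over the weighted child logits gives the posterior over children.
m = max(logits[0])
exps = [math.exp(v - m) for v in logits[0]]
posterior = [e / sum(exps) for e in exps]
print(math.exp(log_prob[0]), posterior)  # approximately 0.4 and [0.25, 0.75]
```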
Product Layers
class libspn_keras.layers.DenseProduct(num_factors, **kwargs)
Computes products per decomposition and scope by an ‘n-order’ outer product.
Assumes the incoming tensor is of shape [num_scopes, num_decomps, num_batch, num_nodes] and produces an output of shape [num_scopes // num_factors, num_decomps, num_batch, num_nodes ** num_factors]. It can be considered a dense product as it computes all possible products given the scopes it has to merge.
Parameters
- num_factors (int) – Number of factors per product
- **kwargs – kwargs to pass on to the keras.Layer super class
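For a single decomposition and sample, the ‘n-order’ outer product can be sketched as follows: groups of num_factors scopes are merged, and every combination of one node per scope yields one product, i.e. a sum of log probabilities. Grouping consecutive scopes and the ordering of combinations (itertools.product order) are assumptions of this sketch; the output sizes match the shapes stated above. The function name is hypothetical.

```python
import itertools

def dense_product(scopes, num_factors):
    """scopes: [num_scopes][num_nodes] log probabilities (one decomp, one sample).
    Returns [num_scopes // num_factors][num_nodes ** num_factors]."""
    out = []
    for start in range(0, len(scopes) - num_factors + 1, num_factors):
        group = scopes[start:start + num_factors]
        # Every combination of one node per scope; a product is a sum in log-space.
        out.append([sum(combo) for combo in itertools.product(*group)])
    return out

scopes = [[0.0, -1.0], [-2.0, -3.0], [-0.5, -0.5], [-1.0, 0.0]]
merged = dense_product(scopes, num_factors=2)
print(len(merged), len(merged[0]))  # 2 4
print(merged[0])                    # [-2.0, -3.0, -3.0, -4.0]
```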
class libspn_keras.layers.ReduceProduct(num_factors, **kwargs)
Computes products per decomposition and scope by reduction.
Assumes the incoming tensor is of shape [num_batch, num_scopes, num_decomps, num_nodes] and produces an output of shape [num_batch, num_scopes // num_factors, num_decomps, num_nodes].
Parameters
- num_factors (int) – Number of factors per product
- **kwargs – kwargs to pass on to the keras.Layer super class.
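Unlike DenseProduct, ReduceProduct keeps the number of nodes unchanged: node i of an output scope is the product of node i of each of the num_factors merged scopes. A sketch for one sample and decomposition, assuming consecutive scopes are grouped (the function name is hypothetical):

```python
def reduce_product(scopes, num_factors):
    """scopes: [num_scopes][num_nodes] log probabilities.
    Returns [num_scopes // num_factors][num_nodes]."""
    out = []
    for start in range(0, len(scopes) - num_factors + 1, num_factors):
        group = scopes[start:start + num_factors]
        # Element-wise product across the group, i.e. an element-wise sum in log-space.
        out.append([sum(node_values) for node_values in zip(*group)])
    return out

scopes = [[0.0, -1.0], [-2.0, -3.0], [-0.5, -0.5], [-1.0, 0.0]]
print(reduce_product(scopes, num_factors=2))  # [[-2.0, -4.0], [-1.5, -0.5]]
```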
Spatial layers
Spatial layers are layers needed for building DGC-SPNs. The final layer of an SPN should still be a RootSum. Use SpatialToRegions to convert the output from a spatial SPN to a region SPN.
class libspn_keras.layers.Local2DSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)
Computes a spatial local sum, i.e. all cells will have unique weights.
In other words, there is no weight sharing across the spatial axes.
Parameters
- num_sums (int) – Number of sums per spatial cell. Corresponds to the number of channels in the output
- logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space, which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default), it will be set to True for SumOpGradBackprop and False otherwise.
- accumulator_initializer (Optional[Initializer]) – Initializer for accumulator
- accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulators
- linear_accumulator_constraint (Optional[Constraint]) – Constraint for accumulators (only applied if logspace_accumulators is False)
- sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
- **kwargs – kwargs to pass on to the keras.Layer super class
class libspn_keras.layers.Conv2DSum(num_sums, logspace_accumulators=None, accumulator_initializer=None, sum_op=None, accumulator_regularizer=None, logspace_accumulator_constraint=None, linear_accumulator_constraint=None, **kwargs)
Computes a convolutional sum, i.e. weights are shared across the spatial axes.
Parameters
- num_sums (int) – Number of sums per spatial cell. Corresponds to the number of channels in the output
- logspace_accumulators (Optional[bool]) – If True, accumulators will be represented in log-space, which is typically used with SumOpGradBackprop. If False, accumulators will be represented in linear space. Weights are computed by normalizing the accumulators per sum, so that we always end up with a normalized SPN. If None (default), it will be set to True for SumOpGradBackprop and False otherwise.
- accumulator_initializer (Optional[Initializer]) – Initializer for accumulator
- sum_op (SumOpBase) – SumOpBase instance which determines how to compute the forward and backward pass of the weighted sums
- accumulator_regularizer (Optional[Regularizer]) – Regularizer for accumulators
- linear_accumulator_constraint (Optional[Constraint]) – Constraint for accumulators (only applied if logspace_accumulators is False)
- **kwargs – kwargs to pass on to the keras.Layer super class
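Local2DSum and Conv2DSum differ only in how their weights are indexed. The sketch below assumes both layers act per spatial cell, mixing the input channels of that cell into num_sums output channels; the function `spatial_sum` and the weight layouts are illustrative assumptions. The Conv2DSum-like path reuses one [channels_in][num_sums] weight table for every cell, while the Local2DSum-like path looks up a separate table per (row, column), which multiplies the parameter count by height * width.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def spatial_sum(x, log_weights, shared):
    """Per-cell weighted sum over input channels, in log-space.

    x: [height][width][channels_in] log probabilities (one sample).
    log_weights: [channels_in][num_sums] if shared (Conv2DSum-like), else
                 [height][width][channels_in][num_sums] (Local2DSum-like).
    Returns [height][width][num_sums].
    """
    out = []
    for i, row in enumerate(x):
        out_row = []
        for j, cell in enumerate(row):
            # Shared: same table for every cell. Local: table indexed by (i, j).
            w = log_weights if shared else log_weights[i][j]
            out_row.append([
                logsumexp([cell[c] + w[c][s] for c in range(len(cell))])
                for s in range(len(w[0]))
            ])
        out.append(out_row)
    return out

height, width, channels_in, num_sums = 4, 4, 3, 2
# Uniform normalized weights: every sum averages its input channels.
shared_w = [[math.log(1.0 / channels_in)] * num_sums for _ in range(channels_in)]
local_w = [[shared_w for _ in range(width)] for _ in range(height)]
x = [[[math.log(0.5)] * channels_in for _ in range(width)] for _ in range(height)]

y_conv = spatial_sum(x, shared_w, shared=True)
y_local = spatial_sum(x, local_w, shared=False)

# Parameter counts: one shared table vs one table per spatial cell.
print(channels_in * num_sums, height * width * channels_in * num_sums)  # 6 96
```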
class libspn_keras.layers.Conv2DProduct(strides, dilations, kernel_size, num_channels=None, padding='valid', depthwise=False, **kwargs)
Convolutional product as described in (Van de Wolfshaar and Pronobis, 2019).
Expects log-space inputs and produces log-space outputs.
Parameters
- strides (List[int]) – A tuple or list of strides
- dilations (List[int]) – A tuple or list of dilations
- kernel_size (List[int]) – A tuple or list of kernel sizes
- num_channels (Optional[int]) – Number of channels. If None, will be set to num_in_channels ** prod(kernel_sizes). This can quickly become a source of out-of-memory (OOM) problems.
- padding (str) – Can be either ‘full’, ‘valid’ or ‘final’. Use ‘final’ for the top Conv2DProduct of a DGC-SPN. The other choices have the standard interpretation. Valid padding usually requires non-overlapping patches, whilst full padding is used with overlapping patches and exponentially increasing dilation rates, see also [Van de Wolfshaar, Pronobis (2019)].
- depthwise (bool) – Whether to use depthwise convolutions. If True, the value of num_channels will be ignored
- **kwargs – Keyword arguments to pass on to the keras.Layer superclass.
References
Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations, Van de Wolfshaar, Pronobis (2019)
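Two points from the parameter description can be made concrete. First, without depthwise, the default num_channels = num_in_channels ** prod(kernel_sizes) grows very quickly, which is where the OOM risk comes from. Second, with depthwise=True, this sketch assumes each output value is the product of a single channel over a kernel patch, i.e. a sum of log probabilities over the patch. The helper names are hypothetical, and only 'valid' padding with strides equal to the kernel size (non-overlapping patches) and dilation 1 is covered.

```python
import math

def default_num_channels(num_in_channels, kernel_size):
    # num_in_channels ** prod(kernel_sizes), as stated in the parameter description.
    return num_in_channels ** math.prod(kernel_size)

def depthwise_product_valid(x, kernel_size):
    """x: [height][width][channels] log probabilities (one sample).
    Non-overlapping patches (strides == kernel_size), dilation 1, 'valid' padding.
    Returns [height // kh][width // kw][channels]."""
    kh, kw = kernel_size
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    out = []
    for i in range(0, height - kh + 1, kh):
        out_row = []
        for j in range(0, width - kw + 1, kw):
            # Product over the patch, per channel: a sum of log probabilities.
            out_row.append([
                sum(x[i + di][j + dj][c] for di in range(kh) for dj in range(kw))
                for c in range(channels)
            ])
        out.append(out_row)
    return out

print(default_num_channels(4, [2, 2]))  # 256
x = [[[-1.0, -2.0] for _ in range(4)] for _ in range(4)]
y = depthwise_product_valid(x, [2, 2])
print(len(y), len(y[0]), y[0][0])  # 2 2 [-4.0, -8.0]
```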
class libspn_keras.layers.SpatialToRegions(*args, **kwargs)
Reshapes the output of a spatial SPN layer to a dense representation.
The dense output has leading dimensions for scopes and decomps (which will be [1, 1]).
Dynamic SPN layers
For reusing SPN structures along the temporal dimension, one can implement dynamic SPNs. These rely on template SPNs, top SPNs and an interface. The interface of the previous timestep and the template at the current timestep can be combined through TemporalDenseProduct.
class libspn_keras.layers.TemporalDenseProduct(*args, **kwargs)
Computes ‘temporal’ dense products.
This is used to connect an interface stack at \(t - 1\) of a dynamic SPN with a template SPN at \(t\). Computes a product of all possible combinations of nodes along the last axis of the two incoming layers.
Parameters
- **kwargs – kwargs to pass on to the keras.Layer super class.
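The combination rule can be sketched for one sample: given the interface nodes at \(t - 1\) and the template nodes at \(t\), both as log probabilities, every pair yields one product, so the output has len(a) * len(b) nodes. The node ordering chosen here is an assumption, and the function name is hypothetical.

```python
import itertools

def temporal_dense_product(interface_prev, template_curr):
    """Both inputs: [num_nodes] log probabilities for one sample.
    Returns the log of every pairwise product (a sum in log-space)."""
    return [a + b for a, b in itertools.product(interface_prev, template_curr)]

print(temporal_dense_product([-1.0, -2.0], [-0.5, -1.5, -2.5]))
# [-1.5, -2.5, -3.5, -2.5, -3.5, -4.5]
```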
Regularization layers
class libspn_keras.layers.LogDropout(rate, noise_shape=None, seed=None, axis_at_least_one=None, **kwargs)
Log dropout layer.
Applies dropout in log-space. It should not precede product layers in an SPN, since the probability of their scope then potentially becomes -inf, resulting in NaN values during training.
Parameters
- rate (float) – Rate at which to randomly drop out inputs.
- noise_shape (Optional[Tuple[int, ...]]) – Shape of dropout noise tensor
- seed (Optional[int]) – Random seed
- **kwargs – kwargs to pass on to the keras.Layer super class
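Dropping an input in log-space means replacing it with log(0) = -inf. The sketch below uses a hypothetical helper built on the standard `random` module and assumes no rescaling of the kept values. It shows why this is harmless before a sum, which merely loses one child, but problematic before a product: a single dropped factor makes the whole product -inf, and such values readily turn into NaN during training (for instance, -inf - (-inf) is NaN).

```python
import math
import random

def log_dropout(x, rate, seed=None):
    """Replace each log probability with -inf (log 0) with probability `rate`."""
    rng = random.Random(seed)
    return [-math.inf if rng.random() < rate else v for v in x]

def log_sum(x):
    """Equal-weight sum node in log-space."""
    m = max(x)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in x) / len(x))

def log_product(x):
    """Product node in log-space."""
    return sum(x)

children = [math.log(0.5), math.log(0.25)]
dropped_one = [children[0], -math.inf]  # as if the second child was dropped
print(log_sum(dropped_one))      # finite: the sum still sees the first child
print(log_product(dropped_one))  # -inf: the whole product vanishes
print(-math.inf - (-math.inf))   # nan
```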
Normalization
Normalize axes
class libspn_keras.layers.NormalizeAxes(value)
Enum for normalization axes.
Enumerates possible choices for normalization axes. SAMPLE_WISE corresponds to normalizing each sample. VARIABLE_WISE is for normalizing the values for each variable using statistics across all samples, while GLOBAL corresponds to statistics gathered from all input values (no specific axes excluded from reduction).
- SAMPLE_WISE = 'sample-wise' – Normalize each sample
- VARIABLE_WISE = 'variable-wise' – Normalize each variable
- GLOBAL = 'global' – Normalize using all variables across all samples
Normalize layers
class libspn_keras.layers.NormalizeStandardScore(normalization_epsilon=1e-08, axes=<NormalizeAxes.SAMPLE_WISE: 'sample-wise'>, **kwargs)
Normalizes samples to a standard score.
In other words, the output is the input minus its mean, divided by its standard deviation. This can be used to achieve the same kind of normalization as used in (Poon and Domingos, 2011).
Parameters
- normalization_epsilon (float) – Small positive constant to prevent division by zero, but could also be used as a ‘smoothing’ factor.
- axes (NormalizeAxes) – If NormalizeAxes.SAMPLE_WISE, will compute z-scores where statistics are computed sample-wise; otherwise, computes z-scores through computing cross-sample statistics. To use cross-sample statistics, call NormalizeStandardScore.adapt(train_ds) where train_ds is an instance of tf.data.Dataset.
- **kwargs – kwargs to pass on to the keras.Layer super class
References
Sum-Product Networks: A New Deep Architecture, Poon, Domingos (2011)
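The three NormalizeAxes choices differ only in which values the mean and standard deviation are computed over. A pure-Python sketch on a [num_samples][num_vars] batch, using the enum's string values; it assumes the epsilon is added to the standard deviation, whereas the layer may apply it elsewhere (for example to the variance). The function name `standard_score` is hypothetical.

```python
import math

def _stats(values):
    """Population mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

def standard_score(x, axes, eps=1e-8):
    """x: [num_samples][num_vars]; axes: 'sample-wise', 'variable-wise' or 'global'."""
    if axes == "sample-wise":
        out = []
        for row in x:  # statistics per sample
            mean, std = _stats(row)
            out.append([(v - mean) / (std + eps) for v in row])
        return out
    if axes == "variable-wise":
        stats = [_stats(list(col)) for col in zip(*x)]  # statistics per variable
        return [[(v - m) / (s + eps) for v, (m, s) in zip(row, stats)] for row in x]
    if axes == "global":
        mean, std = _stats([v for row in x for v in row])  # one set of statistics
        return [[(v - mean) / (std + eps) for v in row] for row in x]
    raise ValueError(f"unknown axes: {axes}")

x = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]
print(standard_score(x, "sample-wise"))
print(standard_score(x, "variable-wise"))
```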
adapt(ds)
Compute cross-sample statistics. Assumes that ds is a batched dataset.
Parameters
- ds (DatasetV2) – Instance of tf.data.Dataset containing train data to compute statistics from.
Raises
- RuntimeError – Raised when axes are set to SAMPLE_WISE, in which case there is no point in calling adapt.
Return type
None
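Because ds is batched, cross-sample statistics have to be accumulated over batches. One standard way to do this, shown here for VARIABLE_WISE statistics, is to keep a running count, sum and sum of squares per variable. Whether adapt uses this exact scheme is an assumption, and a plain list of batches stands in for tf.data.Dataset; the function name is hypothetical.

```python
import math

def adapt_variable_wise(batches):
    """batches: iterable of [batch_size][num_vars] lists (stand-in for a batched dataset).
    Returns per-variable (means, stds) over all samples seen."""
    count, sums, sq_sums = 0, None, None
    for batch in batches:
        for row in batch:
            if sums is None:
                sums, sq_sums = [0.0] * len(row), [0.0] * len(row)
            count += 1
            for i, v in enumerate(row):
                sums[i] += v
                sq_sums[i] += v * v
    if count == 0:
        raise ValueError("dataset is empty")
    means = [s / count for s in sums]
    # Var = E[x^2] - E[x]^2, clipped at 0 to absorb rounding error.
    stds = [math.sqrt(max(q / count - m * m, 0.0)) for q, m in zip(sq_sums, means)]
    return means, stds

batches = [[[1.0, 10.0], [3.0, 10.0]], [[5.0, 10.0]]]  # two batches of unequal size
means, stds = adapt_variable_wise(batches)
print(means, stds)
```

Accumulating sums rather than averaging per-batch means keeps the result correct when batches have different sizes, as with the last, smaller batch above.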