Feature

In bmmltools.operations.feature all the feature extraction methods are collected.

Binarizer

Binarizer is used to produce a binary image with values suitable for the operations applied after it. The scope of this layer is mainly technical: it should be used even on images that have already been binarized, so that the two values of a binary mask are always 0 and 1, independently of how the input data has been produced.

Transfer function

Given an input array with entries \(I[k,j,i]\), the layer outputs

\[\begin{split}O[k,j,i] = \begin{cases} 1 &\mbox{ if } I[k,j,i] \geq TH,\\ 0 &\mbox{ otherwise,} \end{cases}\end{split}\]

where \(TH\) is a given threshold.
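
As an illustration only (this is not the bmmltools implementation), the transfer function above can be sketched in numpy; the array name and threshold value are arbitrary assumptions.

    import numpy as np

    # example 3d input volume with arbitrary gray values
    I = np.random.rand(64, 64, 64)

    TH = 0.5                             # threshold
    O = (I >= TH).astype(np.uint8)       # 1 where I >= TH, 0 otherwise

    # the resulting binary mask only contains the values 0 and 1
    assert set(np.unique(O)) <= {0, 1}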

Initialization and parameters

In the layer initialization one has to specify:

  • the trace on which this operation acts.

The layer parameters of the apply() method are:

  • threshold: (positive float) value of the threshold below which the output is set to zero (default value is 0.5).

Inputs and outputs

The operation has the following inputs:

input 0

description: Input 3d dataset.
data type: 3d numpy array.
data shape: \((N_z,N_y,N_x)\), where \(N_i\) is the number of voxels along the i-th dimension.

The operation has the following outputs:

output 0

description: Binarized 3d dataset.
data type: 3d numpy array.
data shape: \((N_z,N_y,N_x)\), where \(N_i\) is the number of voxels along the i-th dimension of the operation input.

Patch Transform 3D

PatchTransform3D is used to apply a function patch-wise to a 3d input dataset. The function can be specified by the user.

Transfer function

Given a patch shape \((P_z,P_y,P_x)\) and an input \(I\) having shape \((N_z,N_y,N_x)\), the layer's output consists of the collection \(\lbrace h_l[k,j,i] \rbrace_{l \in [0,1,\cdots,N_{patches}-1]}\)

\[h_l[k,j,i] = f(C_l[I])[k,j,i]\]

where \(f\) is the patch transformation function, \(C_l\) is the function returning the l-th patch of the input, and \(k \in [0,1,\cdots,P_z-1]\), \(j \in [0,1,\cdots,P_y-1]\), and \(i \in [0,1,\cdots,P_x-1]\). The number of patches is selected by the user when random sampling of patches is used, while when the patch transform is computed along the zyx-grid the number of patches is equal to

\[N_{patches} = (N_z//P_z) \cdot (N_y//P_y) \cdot (N_x//P_x)\]

where \(a//b\) denotes the integer division between \(a\) and \(b\).
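
A minimal numpy sketch of the grid case described above (a simplification, not the actual bmmltools code): non-overlapping patches are taken sequentially along the zyx-grid and a user-supplied function f is applied to each of them.

    import numpy as np

    def patch_transform_grid(I, patch_shape, f):
        """Apply f to every non-overlapping patch of I taken along the zyx-grid."""
        Pz, Py, Px = patch_shape
        Nz, Ny, Nx = I.shape
        patches, coords = [], []
        for z in range(Nz // Pz):
            for y in range(Ny // Py):
                for x in range(Nx // Px):
                    patch = I[z*Pz:(z+1)*Pz, y*Py:(y+1)*Py, x*Px:(x+1)*Px]
                    patches.append(f(patch))
                    coords.append((z, y, x))   # patch-space coordinate (Z, Y, X)
        return np.array(patches), coords

    I = np.random.rand(60, 60, 60)
    h, coords = patch_transform_grid(I, (20, 20, 20), lambda p: p.mean(axis=0))
    print(h.shape)   # (27, 20, 20): N_patches = 3*3*3, f outputs shape (20, 20)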

Initialization and parameters

In the layer initialization one has to specify:

  • the trace on which this operation acts;

  • the patch transformation function \(f\), which is a function taking the patch as input and returning the transformed patch as output.

The layer parameters of the apply() method are:

  • patch_shape: (tuple[int]) shape of the input patch used.

  • transform_name: (None or str) optional, name given to the dataset containing the transformed patch.

  • random_patches: (bool) optional, if True patches are sampled randomly from the regions of the input dataset having non-null volume, otherwise the patches are taken sequentially along the zyx-grid.

  • n_random_patches: (int) optional, number of patches sampled when the previous field is set to True.

Inputs and outputs

The operation has the following inputs:

input 0

description: Binarized input dataset.
data type: 3d numpy array.
data shape: \((N_z,N_y,N_x)\), where \(N_i\) is the number of voxels along the i-th dimension.

The operation has the following outputs:

output 0

Dictionary with keys:
transformed_patch (this name may change according to the user setting)

description: dataset of transformed patches.
data type: numpy array.
data shape: \((N_{patches}, x)\) where \(x\) is the output shape of the transformation function.
patch_space_coordinate

description: dataframe containing the coordinate of each transformed patch in the patch space.
data type: pandas dataframe.
dataframe shape: \(N_{patches} \times 3\).
columns names: Z, Y, X.
columns description: z/y/x coordinate in patch space of the transformed patches contained in the transformed_patch array. The correspondence between the three coordinates and the transformed patches has to be understood row-by-row, i.e. the i-th row of the dataframe corresponds to the i-th element along the 0 axis of the transformed_patch array.
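
The row-by-row correspondence described above can be illustrated with a small hypothetical example (the coordinate values below are made up):

    import pandas as pd

    # hypothetical patch-space coordinates, one row per transformed patch
    coords = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
    patch_space_coordinate = pd.DataFrame(coords, columns=['Z', 'Y', 'X'])

    # row i of this dataframe refers to the i-th element along axis 0
    # of the transformed_patch array
    print(patch_space_coordinate.iloc[2])   # patch-space position of transformed_patch[2]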

Patch Discrete Fourier Transform 3D

PatchDiscreteFourierTransform3D is used to apply the 3d discrete Fourier Transform (DFT) patch-wise on a 3d input data.

Transfer function

Given a patch shape \((P_z,P_y,P_x)\) and an input \(I\) having shape \((N_z,N_y,N_x)\), the layer's output consists of the collection \(\lbrace h_l[k,j,i] \rbrace_{l \in [0,1,\cdots,N_{patches}-1]}\)

\[h_l[k,j,i] = DFT3d(C_l[I])[k,j,i]\]

where \(DFT3d\) is the 3d DFT, \(C_l\) is the function returning the l-th patch of the input, and \(k \in [0,1,\cdots,P_z-1]\), \(j \in [0,1,\cdots,P_y-1]\), and \(i \in [0,1,\cdots,P_x-1]\). The number of patches is selected by the user when random sampling of patches is used, while when the patch transform is computed along the zyx-grid the number of patches is equal to

\[N_{patches} = (N_z//P_z) \cdot (N_y//P_y) \cdot (N_x//P_x)\]

where \(a//b\) denotes the integer division between \(a\) and \(b\).
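
The per-patch computation can be sketched in numpy as follows (an illustration of the transfer function only, not the library code; the periodic-smooth decomposition option is omitted):

    import numpy as np

    patch = np.random.rand(16, 16, 16)   # a single patch C_l[I]

    F = np.fft.fftn(patch)               # 3d DFT of the patch

    # 'module_phase' representation
    module, phase = np.abs(F), np.angle(F)

    # 'real_imaginary' representation
    real, imaginary = F.real, F.imag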

Initialization and parameters

In the layer initialization one has to specify:

  • the trace on which this operation acts.

The layer parameters of the apply() method are:

  • patch_shape: (tuple[int]) shape of the input patch used.

  • representation: (str) optional, specify here the way the output of the DFT is represented: it can be 'module_phase', to compute the module and the phase of the DFT coefficients, or 'real_imaginary', to compute the real and imaginary parts of the DFT coefficients.

  • random_patches: (bool) optional, if True patches are sampled randomly from the regions of the input dataset having non-null volume, otherwise the patches are taken sequentially along the zyx-grid.

  • n_random_patches: (int) optional, number of patches sampled when the previous field is set to True.

  • use_periodic_smooth_decomposition: (bool) optional, if True the periodic component of the periodic-smooth decomposition of each patch is used when computing the 3d DFT, in order to reduce boundary-related artifacts in the DFT result.

Inputs and outputs

The operation has the following inputs:

input 0

description: Binarized input dataset.
data type: 3d numpy array.
data shape: \((N_z,N_y,N_x)\), where \(N_i\) is the number of voxels along the i-th dimension.

The operation has the following outputs:

output 0

Dictionary with keys:
module (real) (this name depends on the user setting)

description: module (or real part, depending on the chosen representation) of the 3d DFT of the patches.
data type: numpy array.
data shape: \((N_{patches}, x)\) where \(x\) is the patch shape.
phase (imaginary) (this name depends on the user setting)

description: phase (or imaginary part, depending on the chosen representation) of the 3d DFT of the patches.
data type: numpy array.
data shape: \((N_{patches}, x)\) where \(x\) is the patch shape.
patch_space_coordinate

description: dataframe containing the coordinate of each patch DFT in the patch space.
data type: pandas dataframe.
dataframe shape: \(N_{patches} \times 3\).
column names: Z, Y, X.
column description: z/y/x coordinate in patch space of the module (real part) and phase (imaginary part) of the patch DFTs contained in the module (real) and phase (imaginary) arrays. The correspondence between the three coordinates and these quantities has to be understood row-by-row, i.e. the i-th row of the dataframe corresponds to the i-th element along the 0 axis of the module (real) and phase (imaginary) arrays.

Dimensional reduction

DimensionalReduction is used to apply a dimensional reduction technique to the input data. This operation should be compatible with all the sklearn matrix decomposition techniques (see here). Versions of this operation pre-initialized with PCA and NMF are available in bmmltools via the classes DimensionalReduction_PCA and DimensionalReduction_NMF.

Transfer function

The input is assumed to be a collection of \(N\) n-dimensional objects, \(\lbrace i_l \rbrace_{l \in [0,1,\cdots,N-1]}\). This operation produces as output the sequence \(\lbrace o_l \rbrace_{l \in [0,1,\cdots,N-1]}\), defined as

\[o_l = DR(\mbox{vec }(i_l))\]

where \(DR(\cdot)\) is the function performing the dimensional reduction and \(vec(\cdot)\) performs the vectorization of the n-dimensional object (i.e. the object is “flattened” into a 1-d vector). The dimensional reduction algorithm can be trained on the same input collection \(\lbrace i_l \rbrace_{l \in [0,1,\cdots,N-1]}\), or on a different sequence and then applied to the input collection.
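
A minimal sketch of this transfer function using scikit-learn PCA (an illustrative assumption, not the bmmltools operation itself):

    import numpy as np
    from sklearn.decomposition import PCA

    # hypothetical collection of N = 100 three-dimensional objects of shape (8, 8, 8)
    patches = np.random.rand(100, 8, 8, 8)

    X = patches.reshape(len(patches), -1)   # vec(i_l): flatten each object to a 1d vector

    dr = PCA(n_components=10)               # DR(.) keeping n_components components
    O = dr.fit_transform(X)                 # train on the same collection and project it
    print(O.shape)                          # (100, 10)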

Initialization and parameters

In the layer initialization one has to specify:

  • the trace on which this operation acts;

  • the dimensional reduction class, i.e. a scikit-learn compatible class for dimensional reduction implementing all the standard methods (e.g. fit and transform).

The layer parameters of the apply() method are:

  • inference_key: (str) optional, if all the inputs except the last are dictionaries, this is the name of the key of the dictionary where the inference dataset is located.

  • training_key: (str) optional, if the last input is a dictionary, this is the name of the key of the dictionary where the training dataset is located.

  • n_components: (int) optional, number of components to keep for dimensional reduction. If the dimensional reduction algorithm does not have this attribute, this parameter can be ignored.

  • p: (dict) optional, dictionary containing the parameters for the initialization of the dimensional reduction class (if needed).

  • save_model: (bool) optional, if True the dimensional reduction model is saved using joblib.

  • trained_model_path: (str) optional, path to a dimensional reduction model saved using joblib. When this field is not None, this operation automatically assumes that the loaded model is already trained: therefore no training is performed.
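
The save_model and trained_model_path options rely on joblib serialization; the underlying mechanism can be sketched as below (file name and model are illustrative assumptions, not what bmmltools writes internally).

    import joblib
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 512)                 # hypothetical training data
    model = PCA(n_components=10).fit(X)

    joblib.dump(model, 'pca_model.joblib')       # what save_model = True would do

    trained = joblib.load('pca_model.joblib')    # what trained_model_path points to;
    Y = trained.transform(X)                     # the loaded model is applied without retraining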

Inputs and outputs

Assuming the operation has N inputs, the inputs are organized as follows:

input 0 ... input N-2

description: inference datasets; the input is also used as training dataset when just a single input is given. When inference_key is given, this operation assumes the inference dataset to be located at the specified key of a dictionary stored on the trace. When the second input is not given and inference_key is given, one also needs to specify the training_key.
data type: numpy array.
data shape: \((N,x)\), where \(N\) is the number of examples in the dataset while \(x\) is the shape of the data point.
input N-1

description: (optional) training dataset. When training_key is given, this operation assumes the training dataset to be located at the specified key of a dictionary stored on the trace.
data type: numpy array.
data shape: \((N,x)\), where \(N\) is the number of examples in the dataset while \(x\) is the shape of the data point.

The operation has the following outputs:

output 0 ... output N-2

description: projected datasets (i.e. datasets after the dimensional reduction).
data type: numpy array.
data shape: \((N,x)\), where \(N\) is the number of examples in the inference dataset specified in the input while \(x\) is the shape of the projected data point. If the dimensional reduction class has the attribute n_components, \(x\) is equal to that number.

Data standardization

DataStandardization is used to standardize in various ways the dataset given as input.

Transfer function

Given input data \(x\) organized in an array with shape \((N_1,N_2,\cdots)\), i.e. \([x_{a_1,a_2,\cdots}]_{a_1 \in [0,1,\cdots,N_1-1], a_2 \in [0,1,\cdots,N_2-1],\cdots}\), for a given axis \(i\) selected by the user, this operation computes

\[\begin{split}\begin{align} m_i &= \frac{1}{N_i} \sum_{a_i = 0}^{N_i-1}x_{a_1,a_2,\cdots,a_i,\cdots} \\ s_i &= \sqrt{\frac{1}{N_i} \sum_{a_i = 0}^{N_i-1}(x_{a_1,a_2,\cdots,a_i,\cdots}-m_i)^2}. \end{align}\end{split}\]

The output is an array \(y\) having the same shape and the same number of elements as the input data, given by

\[y_{a_1,a_2,\cdots} = \frac{x_{a_1,a_2,\cdots} - m_i}{s_i}.\]

When more than one axis is specified, the normalization above is applied in sequence to each axis according to the order specified by the user.
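
The standardization along a single axis can be sketched in numpy as follows (illustrative only; the actual operation also handles saving and loading the parameters):

    import numpy as np

    x = np.random.rand(100, 32)             # hypothetical dataset with shape (N_1, N_2)

    axis = 0                                # standardize along the first axis
    m = x.mean(axis=axis, keepdims=True)    # m_i
    s = x.std(axis=axis, keepdims=True)     # s_i
    y = (x - m) / s                         # standardized output, same shape as x

    # when more than one axis is specified, the same step is repeated
    # for each axis in the given order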

Initialization and parameters

In the layer initialization one has to specify:

  • the trace on which this operation acts.

The layer parameters of the apply() method are:

  • axis: (int or tuple[int]) axis along which the standardization takes place.

  • save_parameters: (bool) optional, if True the parameters \(m_i\) and \(s_i\) used for the standardization are saved.

  • load_parameters: (bool) optional, if True the parameters are loaded and not computed from the input training dataset.

  • parameters_path: (str) optional, path to the precomputed parameters to use when load_parameters is True.

  • inference_key: (str) optional, if not None this operation assumes the inputs to be dictionaries and this field specifies the key where the inference dataset can be found in each of these dictionaries.

  • training_key: (str) optional, if not None this operation assumes the last input to be a dictionary and this field specifies the key where the training dataset can be found.

Inputs and outputs

Assuming the operation has N inputs, the inputs are organized as follows:

input 0 ... input N-2

description: input inference datasets, i.e. the datasets on which the standardization is applied; the input is also used as training dataset when just a single input is given. When inference_key is given, this operation assumes the inference dataset to be located at the specified key of a dictionary stored on the trace. When the second input is not given and inference_key is given, one also needs to specify the training_key.
data type: numpy array.
data shape: arbitrary, but the same for all the inputs.
input N-1

description: (optional) training dataset, i.e. the dataset on which the standardization parameters are computed. When training_key is given, this operation assumes the training dataset to be located at the specified key of a dictionary stored on the trace.
data type: numpy array.
data shape: the same as all the other inputs.

The operation has the following outputs:

output 0 ... output N-2

description: standardized datasets.
data type: numpy array.
data shape: equal to the shape of the input datasets.