transform#

SurfLabels#

class neurotools.transform.rois.SurfLabels(labels, background_label=0, mask=None, strategy='mean', vectorize=True)#

Extract signals from non-overlapping labels for surface data.

This class functions similarly to NiftiLabelsMasker, except that it is designed to work with surface data.

Parameters
  • labels (str or array-like) – An array of the same size as the data dimension, acting as a mask with a unique integer value for each ROI. You can also pass the str location of a file from which to load this array. Anything accepted by load is acceptable here.

  • background_label (int, array-like of int, optional) –

    This parameter determines which label, if any, in the passed labels should be treated as ‘background’, so that no ROI is calculated for that value or values. You may pass either a single integer value or an array-like of integer values.

    If no background label is desired, just pass a label which doesn’t exist in any of the data, e.g., -2.

    default = 0
    

  • mask (None, str or array-like, optional) –

    This parameter allows you to optionally pass a mask of values for which ROI values should not be calculated. This can be passed as a str or array-like of values (just like labels), and should be a boolean array (or 1’s and 0’s), where a value of 1 means that position will be ignored (set to the background label), and a value of 0 means that position will be kept. This array should have the same length as the passed labels.

    default = None
    

  • strategy (specific str, custom_func, optional) –

    This parameter sets the function applied to each ROI of the data individually, e.g., mean to calculate the mean within each ROI.

    If a str is passed, it must correspond to one of the below preset options:

    • ’mean’

      Calculate the mean with numpy.mean()

    • ’sum’

      Calculate the sum with numpy.sum()

    • ’min’ or ‘minimum’

      Calculate the min value with numpy.ndarray.min()

    • ’max’ or ‘maximum’

      Calculate the max value with numpy.ndarray.max()

    • ’std’ or ‘standard_deviation’

      Calculate the standard deviation with numpy.std()

    • ’var’ or ‘variance’

      Calculate the variance with numpy.var()

    If a custom function is passed, it must accept two arguments, custom_func(X_i, axis=data_dim). X_i is a subject’s data array restricted to the positions where labels == some class i, and can be either a 1D or a 2D array. The axis argument specifies which axis is the data dimension: for a time-series of shape [n_timepoints, data_dim], data_dim = 1; for stacked contrasts of shape [data_dim, n_contrasts], data_dim = 0; and for a 1D array, data_dim is also 0.

    default = 'mean'
    

  • vectorize (bool, optional) –

    Whether the returned array should be flattened to 1D. E.g., if this is the last step in a set of loader steps this should be True; if it comes before a different step, it may make sense to set it to False.

    default = True
    

See also

SurfMaps

For extracting non-static / probabilistic parcellations.

nilearn.input_data.NiftiLabelsMasker

For working with volumetric data.
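As a rough illustration of what SurfLabels computes, the per-label reduction described in the parameters above can be sketched in plain NumPy. This is not the library's implementation: extract_by_label and value_range are hypothetical names introduced here, the mask parameter is left out, and the assumption is that the data dimension of the input is replaced by one entry per non-background label.

```python
import numpy as np


def extract_by_label(data, labels, background_label=0, strategy=np.mean, data_dim=0):
    """Hypothetical helper: apply `strategy` within each non-background label."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    background = set(np.atleast_1d(background_label).tolist())

    # Every unique label that is not background defines one ROI
    roi_ids = [i for i in np.unique(labels).tolist() if i not in background]

    # Keep only the entries of ROI i along the data dimension, then reduce them
    rois = [strategy(np.compress(labels == i, data, axis=data_dim), axis=data_dim)
            for i in roi_ids]

    # Assumption: the data dimension is replaced by one entry per ROI
    return np.stack(rois, axis=data_dim)


def value_range(X_i, axis=0):
    """Example custom strategy with the custom_func(X_i, axis=data_dim) signature."""
    return np.max(X_i, axis=axis) - np.min(X_i, axis=axis)


labels = np.array([1, 1, 2, 2, 0])            # 0 is treated as background
data = np.array([1.0, 3.0, 10.0, 20.0, 7.0])

roi_means = extract_by_label(data, labels)                         # 1D input
roi_ranges = extract_by_label(data, labels, strategy=value_range)  # custom function

# Time-series input of shape [n_timepoints, data_dim], so data_dim = 1
series = np.vstack([data, 2 * data])
series_means = extract_by_label(series, labels, data_dim=1)
```

Here roi_means holds one mean per ROI (labels 1 and 2), and series_means has shape (2, 2): one row per timepoint and one column per ROI.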

SurfMaps#

class neurotools.transform.rois.SurfMaps(maps, strategy='auto', mask=None, vectorize=True)#

Extract signals from overlapping labels for surface data.

This class functions similarly to NiftiMapsMasker, except that it is designed to work with surface projected data, instead of 3D/4D Nifti files.

Parameters
  • maps (str or array-like) –

    This parameter represents the maps to apply to each surface. The shape of the passed maps should be (# of features, # of maps): the first dimension is the size of the data array, and the second is the number of maps (i.e., the number of ROIs output after fit).

    You may pass maps as either an array-like, or as the str location of a numpy file or other valid surface file format from which to load the array. Anything accepted by load is acceptable here.

  • strategy ({'auto', 'ls', 'average'}, optional) –

    The strategy by which the maps are used to extract signal. If ‘ls’ is selected, which stands for least squares, the least-squares solution will be used for each region.

    Alternatively, if ‘average’ is passed, then the weighted average value for each map will be computed.

    By default ‘auto’ will be selected, which will use ‘average’ if the passed maps contain only positive weights, and ‘ls’ in the case that there are any negative values in the passed maps.

    Otherwise, you can set a specific strategy. When deciding which method to use, consider an example. Suppose the data X and the maps are:

    data = np.array([1, 1, 5, 5])
    maps = np.array([[0, 0],
                     [0, 0],
                     [1, -1],
                     [1, -1]])
    

    In this case, the ‘ls’ method would yield region signals [2.5, -2.5], whereas the weighted ‘average’ method would yield [5, 5], notably ignoring the negative weights. This highlights an important limitation of the weighted average method: it does not handle negative values well.

    On the other hand, consider changing the maps weights to

    data = np.array([1, 1, 5, 5])
    maps = np.array([[0, 1],
                     [0, 2],
                     [1, 0],
                     [1, 0]])
    
    ls_sol = [5. , 0.6]
    average_sol = [5, 1]
    

    In this case, we can see that the weighted average gives an arguably more intuitive summary of the regions. In general, the choice depends on what signal you are trying to summarize, and how you are trying to summarize it.

  • mask (None, str or array-like, optional) –

    This parameter allows you to optionally pass a mask of values for which ROI values should not be calculated. This can be passed as a str or array-like of values (just like maps), and should be a boolean array (or 1’s and 0’s), where a value of 1 means that position will be ignored (set to 0), and a value of 0 means that position will be kept. This array should have the same length as the passed maps. Specifically, where the shape of maps is (size, n_maps), the shape of mask should be (size).

    default = None
    

  • vectorize (bool, optional) –

    Whether the returned array should be flattened to 1D. E.g., if this is the last step in a set of loader steps this should be True. Also note, if the surface data it is being applied to is 1D, then the output will be 1D regardless of this parameter.

    default = True
    

See also

SurfLabels

For extracting static / non-probabilistic parcellations.

nilearn.input_data.NiftiMapsMasker

For volumetric nifti data.
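The two worked examples given under the strategy parameter can be checked with plain NumPy. The sketch below assumes that ‘ls’ corresponds to the minimum-norm least-squares solution of maps @ signals = data, and that ‘average’ corresponds to one weighted average per map; ls_signals and average_signals are hypothetical helper names, not part of the library.

```python
import numpy as np


def ls_signals(data, maps):
    # Minimum-norm least-squares solution of maps @ signals = data
    return np.linalg.lstsq(maps, data, rcond=None)[0]


def average_signals(data, maps):
    # One weighted average of the data per map (i.e., per column of maps)
    return np.array([np.average(data, weights=w) for w in maps.T])


data = np.array([1.0, 1.0, 5.0, 5.0])

# First example: the second map has negative weights
maps_neg = np.array([[0, 0],
                     [0, 0],
                     [1, -1],
                     [1, -1]], dtype=float)
ls_neg = ls_signals(data, maps_neg)
avg_neg = average_signals(data, maps_neg)

# Second example: non-negative weights only
maps_pos = np.array([[0, 1],
                     [0, 2],
                     [1, 0],
                     [1, 0]], dtype=float)
ls_pos = ls_signals(data, maps_pos)
avg_pos = average_signals(data, maps_pos)
```

Under these assumptions the results agree with the values quoted above: [2.5, -2.5] versus [5, 5] for the first pair of maps, and [5, 0.6] versus [5, 1] for the second.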

Examples

First let’s define an example set of probabilistic maps. We will assume there are just 5 features in our data, and we will define 6 total maps.

In [1]: import numpy as np

In [2]: from neurotools.transform import SurfMaps

# This should have shape number of features x number of maps!
In [3]: prob_maps = np.array([[3, 1, 1, 1, 1, 1],
   ...:                       [1, 3, 1, 1, 1, 1],
   ...:                       [1, 1, 3, 1, 1, 1],
   ...:                       [1, 1, 1, 3, 1, 1],
   ...:                       [1, 1, 1, 1, 3, 1]])
   ...: 

In [4]: prob_maps.shape
Out[4]: (5, 6)

Next we can define some input data to use with these maps.

In [5]: data1 = np.arange(5, dtype='float')

In [6]: data1
Out[6]: array([0., 1., 2., 3., 4.])

In [7]: data2 = np.ones(5, dtype='float')

In [8]: data2
Out[8]: array([1., 1., 1., 1., 1.])

Now let’s define the actual object and use it to transform the data.

In [9]: sm = SurfMaps(maps=prob_maps)

In [10]: sm.fit_transform(data1)
Out[10]: 
array([1.42857143, 1.71428571, 2.        , 2.28571429, 2.57142857,
       2.        ])

In [11]: sm.fit_transform(data2)
Out[11]: array([1., 1., 1., 1., 1., 1.])

Okay so what is going on when we transform this data? Basically we are just taking weighted averages for each one of the defined maps. We could also explicitly change the strategy from ‘auto’ to ‘ls’ which would take the least squares solution instead.

In [12]: sm = SurfMaps(maps=prob_maps, strategy='ls')

In [13]: data_trans = sm.fit_transform(data1)

In [14]: data_trans
Out[14]: 
array([-0.74074074, -0.24074074,  0.25925926,  0.75925926,  1.25925926,
        0.18518519])

While a little less intuitive, the least squares solution allows us to reverse the feature transformation (although not always exactly):

In [15]: sm.inverse_transform(data_trans)
Out[15]: 
array([-1.11022302e-15,  1.00000000e+00,  2.00000000e+00,  3.00000000e+00,
        4.00000000e+00])

This can be useful, for example, when converting downstream calculated feature importances back to the original data space.
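The outputs in this example can be reproduced with plain NumPy, which may help clarify what each step does. The sketch assumes that the ‘average’ strategy is a per-map weighted average, that ‘ls’ is the minimum-norm least-squares solution, and that the inverse transform amounts to multiplying the maps by the region signals; these are assumptions about the behavior, not the library's code. The second part (maps_b, data_b) reuses the maps from the strategy discussion to show a case where the reconstruction is not exact.

```python
import numpy as np

prob_maps = np.array([[3, 1, 1, 1, 1, 1],
                      [1, 3, 1, 1, 1, 1],
                      [1, 1, 3, 1, 1, 1],
                      [1, 1, 1, 3, 1, 1],
                      [1, 1, 1, 1, 3, 1]], dtype=float)
data1 = np.arange(5, dtype=float)

# Weighted average of the data under each map (one value per column)
avg = np.array([np.average(data1, weights=w) for w in prob_maps.T])

# Minimum-norm least-squares solution of prob_maps @ ls = data1
ls = np.linalg.lstsq(prob_maps, data1, rcond=None)[0]

# Assumed inverse: project the region signals back through the maps
recon = prob_maps @ ls

# A case where the inverse is only approximate: the data do not lie in the
# column space of the maps, so least squares leaves a residual
maps_b = np.array([[0, 1],
                   [0, 2],
                   [1, 0],
                   [1, 0]], dtype=float)
data_b = np.array([1.0, 1.0, 5.0, 5.0])
ls_b = np.linalg.lstsq(maps_b, data_b, rcond=None)[0]
recon_b = maps_b @ ls_b
```

Here recon matches data1 up to floating point error, while recon_b comes out as [0.6, 1.2, 5, 5] rather than the original [1, 1, 5, 5].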

gen_indv_roi_network#

neurotools.transform.network.gen_indv_roi_network(data, labels, metric='jsd', vectorize=False, discard_diagonal=False)#

This function generates a network in which each connection is 1 - a distance function computed between the data of a pair of ROIs.

Specifically, this method calls the function scipy.stats.ks_2samp() or scipy.spatial.distance.jensenshannon() to calculate the distances between each collection of data points in each pair of ROIs, calculating the distance between the two distributions. Values are then subtracted from 1, such that exact distribution matches have value 1, i.e., these are the strongest connections.

Parameters
  • data (1D numpy array) –

    Data at this stage must be a single dimensional numpy array representing the underlying data to which the labelled regions correspond.

    For example, this is typically a subject’s neuroimaging data along with corresponding label file in some ROI space.

  • labels (1D numpy array) – Labels in the form of a 1D numpy array with the same shape / len as data. This array should contain integer labels indicating which ROI each vertex or data element belongs to.

  • metric ({'jsd', 'ks', 'ks_demean', 'ks_normalize'}, optional) –

    The type of distance to compute between points.

    • ’jsd’ : Use the 1 - Jensen–Shannon distance between each ROI’s KDE-estimated distributions.

    • ’ks’ : Use the 1 - Kolmogorov–Smirnov distance between distributions.

    • ’ks_demean’ : Same as ‘ks’, but with each ROI’s points de-meaned.

    • ’ks_normalize’ : Same as ‘ks’, but with each ROI’s points normalized.

    default = 'jsd'
    

  • vectorize (bool, optional) –

    If True, matrices are reshaped into 1D arrays and only their flattened lower triangular parts are returned.

    default = False
    

  • discard_diagonal (bool, optional) –

    If True, the diagonal elements of the calculated matrix are discarded.

    default = False
    

Returns

matrix – Base behavior is a two dimensional array containing the distances / statistics between every combination of ROI. If vectorize is True, then a 1D array is returned instead, and if discard_diagonal is set, then the diagonal will be removed as well.

Return type

1/2D numpy array
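The description above can be sketched with the two SciPy functions it names. This is an illustration rather than the library's code: roi_network is a hypothetical name, every unique label is treated as an ROI, the KDE evaluation grid and base=2 for the Jensen–Shannon distance are assumptions made here, and discard_diagonal is only applied when vectorizing.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import gaussian_kde, ks_2samp


def roi_network(data, labels, metric='ks', vectorize=False, discard_diagonal=False):
    """Hypothetical sketch: 1 - distance between the values of each pair of ROIs."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    roi_ids = np.unique(labels)
    n = len(roi_ids)

    # Shared grid on which the KDE-estimated densities are compared (assumption)
    grid = np.linspace(data.min(), data.max(), 200)

    matrix = np.ones((n, n))  # identical distributions have value 1
    for a in range(n):
        for b in range(a + 1, n):
            x, y = data[labels == roi_ids[a]], data[labels == roi_ids[b]]
            if metric == 'ks':
                dist = ks_2samp(x, y).statistic
            else:
                # 'jsd'; base=2 keeps the distance within [0, 1] (assumption)
                dist = jensenshannon(gaussian_kde(x)(grid), gaussian_kde(y)(grid), base=2)
            matrix[a, b] = matrix[b, a] = 1.0 - dist

    if vectorize:
        # Flattened lower triangle, with or without the diagonal
        k = -1 if discard_diagonal else 0
        return matrix[np.tril_indices(n, k=k)]
    return matrix


# ROIs 1 and 2 hold identical values; ROI 3 is far from both
data = np.array([0, 1, 2, 3, 0, 1, 2, 3, 10, 11, 12, 13], dtype=float)
labels = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

ks_net = roi_network(data, labels, metric='ks')
ks_vec = roi_network(data, labels, metric='ks', vectorize=True, discard_diagonal=True)
jsd_net = roi_network(data, labels, metric='jsd')
```

With the ‘ks’ metric, the identical pair of ROIs gets a connection of 1 and the fully separated pairs get 0, which matches the stated convention that exact distribution matches are the strongest connections.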

gen_fs_subj_vertex_network#

neurotools.transform.network.gen_fs_subj_vertex_network(subj_dr, modality='thickness', parc='aparc.a2009s.annot', metric='jsd', vectorize=False, discard_diagonal=False)#

This function is a helper function for calling gen_indv_roi_network() on data organized in the style of a freesurfer individual subject directory.

Parameters
  • subj_dr (str) – The str location of the subject’s freesurfer directory for which to generate the ROI network.

  • modality (str, optional) –

    The name of the modality (e.g., thickness, sulc, area, …) in which to generate the network. Should be saved in subdirectory surf with names: lh.{modality} and rh.{modality}

    default = 'thickness'
    

  • parc (str, optional) –

    The name of the label / parcellation file to use, as found in subdirectory label with names: lh.{parc} and rh.{parc}.

    These are concatenated internally with function merge_parc_hemis().

    The default value is to use the Destrieux parcellation.

    default = 'aparc.a2009s.annot'
    
  • metric ({'jsd', 'ks', 'ks_demean', 'ks_normalize'}, optional) –

    The type of distance to compute between points.

    • ’jsd’ : Use the 1 - Jensen–Shannon distance between each ROI’s KDE-estimated distributions.

    • ’ks’ : Use the 1 - Kolmogorov–Smirnov distance between distributions.

    • ’ks_demean’ : Same as ‘ks’, but with each ROI’s points de-meaned.

    • ’ks_normalize’ : Same as ‘ks’, but with each ROI’s points normalized.

    default = 'jsd'
    

  • vectorize (bool, optional) –

    If True, matrices are reshaped into 1D arrays and only their flattened lower triangular parts are returned.

    default = False
    

  • discard_diagonal (bool, optional) –

    If True, the diagonal elements of the calculated matrix are discarded.

    default = False
    

Returns

matrix – Two dimensional array containing the 1 - distance values (e.g., the 1 - Kolmogorov–Smirnov statistic, depending on metric) between every combination of ROI.

If vectorize is True, then a 1D array is returned.

Return type

1/2D numpy array
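The directory layout described by the modality and parc parameters can be made concrete with a small path-building sketch. fs_subject_paths is a hypothetical helper introduced only for illustration, and the subject directory used below is a made-up path; how the files are actually loaded and merged across hemispheres is not shown here.

```python
from pathlib import Path


def fs_subject_paths(subj_dr, modality='thickness', parc='aparc.a2009s.annot'):
    """Hypothetical helper: list the per-hemisphere files the parameters refer to."""
    subj_dr = Path(subj_dr)
    hemis = ('lh', 'rh')
    return {
        # Data files live in the surf subdirectory as lh.{modality} / rh.{modality}
        'data': [subj_dr / 'surf' / f'{hemi}.{modality}' for hemi in hemis],
        # Parcellation files live in the label subdirectory as lh.{parc} / rh.{parc}
        'labels': [subj_dr / 'label' / f'{hemi}.{parc}' for hemi in hemis],
    }


# Made-up subject directory, used only to show the resulting file names
paths = fs_subject_paths('subjects/sub-01')
```

With the default arguments, paths['data'] points to surf/lh.thickness and surf/rh.thickness, and paths['labels'] to label/lh.aparc.a2009s.annot and label/rh.aparc.a2009s.annot, all inside the subject directory.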