stats

get_resid

neurotools.stats.get_resid(covars, data)

Compute the simple linear residualized version of a set of data according to a passed set of covariates. This fits a linear regression from the covariates to the data and then returns the difference between the true values and the predicted values.

In the case of missing data within the data portion of the input, this method will operate only on the subset of subjects with non-missing data per feature; any NaN data will be propagated to the returned residualized data.

Note that the intercept information from the linear model is re-added to the residuals.
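
A minimal sketch of this behavior in plain numpy (illustrative only; the helper name and exact internals are assumptions, not the library's implementation):

    import numpy as np

    def residualize_sketch(covars, data):
        # Hypothetical re-implementation for illustration.
        resid = np.full(data.shape, np.nan)
        X = np.column_stack([np.ones(len(covars)), covars])  # intercept column

        for j in range(data.shape[1]):
            y = data[:, j]
            mask = ~np.isnan(y)  # fit only on subjects with non-missing data
            beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
            # Residuals (true - predicted), with the fitted intercept re-added
            resid[mask, j] = y[mask] - X[mask] @ beta + beta[0]

        return resid  # NaNs propagate where the input data was missing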

Parameters
  • covars (numpy array) – The covariates with which to residualize the passed data, applied separately per feature. This input must have shape number of subjects x number of covariates.

  • data (numpy array) – The data to residualize. This input must have shape number of subjects x number of data features.

Returns

resid_data – A residualized version of the passed data, with the same shape, is returned. Any NaN values in the original input data are preserved.

Return type

numpy array
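
For example, a hedged usage sketch (shapes are arbitrary; the import path follows the signature above):

    import numpy as np
    from neurotools.stats import get_resid

    covars = np.random.randn(100, 2)   # 100 subjects x 2 covariates
    data = np.random.randn(100, 5)     # 100 subjects x 5 features
    data[0, 0] = np.nan                # missing value, preserved in output

    resid_data = get_resid(covars, data)
    print(resid_data.shape)            # (100, 5)
    print(np.isnan(resid_data[0, 0]))  # True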

get_cohens

neurotools.stats.get_cohens(data)

Get the Cohen’s d value per feature, with or without missing values present in the data.

Parameters

data (numpy array) – The data for which to compute the Cohen’s d metric. Input must have shape number of subjects x number of data features.

Returns

cohens – A 1D array with length equal to the number of features (axis=1) in data, containing the calculated Cohen’s d per feature.

Return type

numpy array
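
Since the docstring does not spell out the exact formula, here is a minimal sketch assuming the one-sample form of Cohen's d (mean over standard deviation, per feature, ignoring NaNs); the library's exact choice of ddof and missing-data handling may differ:

    import numpy as np

    def cohens_sketch(data):
        # One-sample Cohen's d per feature, ignoring missing values.
        return np.nanmean(data, axis=0) / np.nanstd(data, axis=0, ddof=1)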

permutations.permuted_v

neurotools.stats.permutations.permuted_v(tested_vars, target_vars, confounding_vars=None, permutation_structure=None, within_grp=True, n_perm=100, variance_groups=None, two_sided_test=True, demean_targets=True, demean_confounds=True, model_intercept=False, min_vg_size=None, dtype=None, use_tf=False, use_z=False, random_state=None, n_jobs=1, verbose=0)

This function performs a permutation-based statistical test on data with an underlying exchangeability-block structure.

In the case that no permutation structure is passed, i.e., a value of None, then this function’s own permutation routine will NOT be used. Instead, nilearn.mass_univariate.permuted_ols() will be called, and t-statistics calculated.

In the case that the passed tested_vars are a single group only, e.g., all 1’s, then any passed permutation_structure will only be used to generate variance groups: instead of permutations based on swapping data, permutations will be performed through random sign flips of the data.
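
A toy illustration of the sign-flip idea (not the library's implementation): for an intercept-only test, each permutation multiplies the data by random +/-1 signs per subject rather than shuffling rows, and the max statistic across features builds the null distribution:

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.standard_normal((50, 3)) + 0.4   # 50 subjects x 3 features

    stat = np.abs(y.mean(axis=0))            # original per-feature statistic
    null_max = np.empty(1000)
    for i in range(1000):
        signs = rng.choice([-1.0, 1.0], size=(50, 1))
        null_max[i] = np.abs((signs * y).mean(axis=0)).max()

    # Family-wise corrected p-values via the max-stat null distribution
    pvals = (1 + (null_max[:, None] >= stat).sum(axis=0)) / (1 + 1000)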

This code is based to a large degree upon the MATLAB code of the program PALM, and is also influenced by the permutation function in the python library nilearn.

Parameters
  • tested_vars (numpy array or pandas DataFrame) –

    Pass as an array or DataFrame, either 1D or 2D with shape subjects x 1, containing the values of interest for which to calculate the v-statistic against the passed target_vars, corrected for any passed confounding_vars.

    Note: The subject / data point order and length should match exactly between the first dimensions of tested_vars, target_vars, confounding_vars and permutation_structure.

  • target_vars (numpy array) –

    A 2D numpy array with shape subjects x features, containing the (typically imaging) features per subject for which to univariately calculate v-stats.

    Note: The subject / data point order and length should match exactly between the first dimensions of tested_vars, target_vars, confounding_vars and permutation_structure.

  • confounding_vars (numpy array, pandas DataFrame or None, optional) –

    Confounding variates / covariates, passed as a 2D numpy array with shape subjects x number of confounding variables, or as None. If None, no variates are added, except perhaps a constant column according to the value of parameter model_intercept. Otherwise, the passed variables’ influence will be removed from tested_vars before calculating the relationship between tested_vars and target_vars.

    Note: The subject / data point order and length should match exactly between the first dimensions of tested_vars, target_vars, confounding_vars and permutation_structure.

    default = None
    

  • permutation_structure (numpy array, pandas DataFrame or None, optional) –

    This parameter represents the underlying exchangeability-block structure of the data passed. It is also used to automatically determine the underlying variance structure of the passed data.

    See PALM’s documentation for an introduction on how to format ExchangeabilityBlocks: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM/ExchangeabilityBlocks

    This parameter accepts the same style input as PALM, except it is passed here as an array or DataFrame instead of as a file. The main requirement is that the shape of the structure match the number of subjects / data points in the first dimension (see the sketch after this parameter list).

    In the case that no permutation structure is passed, i.e., a value of None, then this function’s own permutation routine will NOT be used. Instead, nilearn.mass_univariate.permuted_ols() will be called, and t-statistics calculated.

    Note: The subject / data point order and length should match exactly between the first dimensions of tested_vars, target_vars, confounding_vars and permutation_structure.

    default = None
    

  • within_grp (bool, optional) –

    This parameter is only relevant when a permutation structure is passed; in that case, it describes how the left-most exchangeability / permutation structure column should act. Specifically, if True, the left-most column is treated as defining groups within which swaps may occur. If False, the left-most column groups may only swap at the group level with other groups of the same size.

    default = True
    

  • n_perm (int, optional) –

    The number of permutations to perform. Permutations are costly, but the more that are performed, the greater the precision of the p-value estimates.

    Passing 0 is a valid option, and can be used to calculate just the original scores.

    default = 100
    

  • variance_groups (None or numpy array / pandas DataFrame, optional) –

    By default, these are automatically computed from the passed permutation structure. Alternatively, this parameter can be overridden by passing a custom set of variance groups.

    default = None
    

  • two_sided_test (bool, optional) –

    If True, performs an unsigned v-test, where both positive and negative effects are considered. If False, only positive effects are considered as relevant.

    default = True
    

  • demean_targets (bool, optional) –

    If True, then the passed target_vars are demeaned across passed subjects (i.e., each single feature scaled to have mean 0 across subjects).

    Note: If performing an intercept based test (i.e., the tested vars are all 1’s) then this parameter should be left True, as sign flips to data that are not de-meaned might cause strange issues.

    default = True
    

  • demean_confounds (bool, optional) –

    If True, then the passed confounding_vars are demeaned across passed subjects (i.e., each single variable / column scaled to have mean 0 across subjects).

    default = True
    

  • model_intercept (bool, optional) –

    If True, a constant column is added to the confounding variates unless the tested variate is already the intercept.

    default = False
    

  • min_vg_size (int or None, optional) –

    If None, this parameter is ignored. Otherwise, if passed as an int, this defines the smallest allowed size of a unique variance group. Specifically, variance groups are calculated automatically from the passed permutation structure, and likewise some statistics are calculated separately per variance group, so it can be a good idea to set a filter here that will drop the data of any subjects in variance groups below that threshold.

    default = None
    

  • dtype (str or None, optional) –

    If left as default of None, then the original datatypes for all passed data will be used. Alternatively, you may specify either ‘float32’ or ‘float64’ and all data and calculations will be cast to that datatype.

    It can be very beneficial in practice to specify ‘float32’ and perform less precise calculations. This can greatly reduce memory usage and will also provide a speed-up (a significant one when use_tf is set, otherwise a modest one).

    default = None
    

  • use_tf (bool, optional) –

    This flag specifies whether permutations should be run on a special optimized version of the code, written in TensorFlow, designed to use a GPU.

    Note: If True, this parameter requires that you have TensorFlow installed and working, ideally set up with a GPU, as using TensorFlow on a CPU will not provide any benefit relative to the base numpy version of the code.

    default = False
    

  • use_z (bool, optional) –

    v-statistics can optionally be converted into z-statistics. If passed True, then the returned original_scores will be z-statistics instead of v-statistics, and likewise the permutation test will be performed by comparing max z-stats instead of v-stats.

    Note: This option cannot be used with use_tf.

    default = False
    

  • random_state (int or None, optional) –

    The seed for the random number generator. Pass an int to obtain the same permutations across runs, or pass None to use a random seed.

    default = None
    

  • n_jobs (int, optional) –

    Number of parallel workers. If 0 is provided, all CPUs are used. A negative number indicates that all the CPUs except (abs(n_jobs) - 1) ones will be used.

    default = 1
    

  • verbose (int, optional) –

    Verbosity level (0 means no messages).

    default = 0
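
As referenced in the permutation_structure description above, a small hypothetical example of a PALM-style structure: a single column of family IDs, so that (with within_grp=True) subjects may only swap within their own family:

    import numpy as np

    # 6 subjects: three families of two siblings each. One column of
    # family IDs restricts swaps to within-family exchanges.
    permutation_structure = np.array([1, 1, 2, 2, 3, 3])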
    

Returns

  • pvals (numpy array) – The family-wise corrected p-values associated with the significance test of the explanatory variates against the target variates.

  • original_scores (numpy array) – Either v (default), z or t statistics associated with the significance test of the explanatory variates against the target variates. The ranks of the scores within the h0 distribution correspond to the p-values.

  • h0_vmax (numpy array) – Distribution of the (max) v/z/t-statistic under the null hypothesis (obtained from the permutations). Array is sorted.
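
Putting it together, a hedged end-to-end sketch using synthetic data (the import path and argument names follow the signature above):

    import numpy as np
    from neurotools.stats.permutations import permuted_v

    rng = np.random.default_rng(42)
    n_subjects = 20

    tested_vars = rng.standard_normal((n_subjects, 1))       # variable of interest
    target_vars = rng.standard_normal((n_subjects, 100))     # e.g., imaging features
    confounding_vars = rng.standard_normal((n_subjects, 2))  # covariates to remove
    perm_structure = np.repeat(np.arange(10), 2)             # 10 pairs, swap within pair

    pvals, original_scores, h0_vmax = permuted_v(
        tested_vars, target_vars,
        confounding_vars=confounding_vars,
        permutation_structure=perm_structure,
        n_perm=100,
        random_state=0,
    )
    print(pvals.shape)  # (100,) -- one family-wise corrected p-value per feature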