Loading Fake Timeseries Surface Data

This notebook is designed to explore some of the functionality for loading DataFiles and using Loaders.

This example requires some extra optional libraries, namely nibabel and nilearn. Note: while nilearn is never imported directly here, importing SingleConnectivityMeasure will raise an ImportError if nilearn is not installed.

We will also use fake data for this example, so no special datasets are required!

[1]:
import BPt as bp
import nibabel as nib
import numpy as np
import pandas as pd
import os
[2]:
def save_fake_timeseries_data():
    '''Save fake timeseries and fake surface data.'''

    # 20 subjects, each with 100 timepoints x 10242 vertices (one hemisphere)
    X = np.random.random(size=(20, 100, 10242))
    os.makedirs('fake_time_data', exist_ok=True)

    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_lh', X[x])
    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_rh', X[x])

save_fake_timeseries_data()
[3]:
# Init a Dataset
data = bp.Dataset()

Next, we are interested in loading the files into the dataset as data files. There are a few different ways to do this, but we will use the method add_data_files. We will try to load the timeseries data first.

First, we need a dictionary mapping each desired column name to file locations or a file glob (the glob is easier, so let's use that).

[4]:
# The *'s just mean wildcard
files = {'timeseries_lh': 'fake_time_data/*_lh*',
         'timeseries_rh': 'fake_time_data/*_rh*'}

# Now let's try loading with 'auto' as the file to subject function
data.add_data_files(files, 'auto')
[4]:

Data

timeseries_lh timeseries_rh
13_lh Loc(0) nan
9_lh Loc(1) nan
8_lh Loc(2) nan
2_lh Loc(3) nan
16_lh Loc(4) nan
11_lh Loc(5) nan
6_lh Loc(6) nan
7_lh Loc(7) nan
1_lh Loc(8) nan
17_lh Loc(9) nan
19_lh Loc(10) nan
15_lh Loc(11) nan
10_lh Loc(12) nan
3_lh Loc(13) nan
14_lh Loc(14) nan
0_lh Loc(15) nan
18_lh Loc(16) nan
5_lh Loc(17) nan
4_lh Loc(18) nan
12_lh Loc(19) nan
11_rh nan Loc(20)
10_rh nan Loc(21)
12_rh nan Loc(22)
3_rh nan Loc(23)
0_rh nan Loc(24)
18_rh nan Loc(25)
1_rh nan Loc(26)
9_rh nan Loc(27)
14_rh nan Loc(28)
6_rh nan Loc(29)
15_rh nan Loc(30)
7_rh nan Loc(31)
4_rh nan Loc(32)
19_rh nan Loc(33)
5_rh nan Loc(34)
2_rh nan Loc(35)
13_rh nan Loc(36)
8_rh nan Loc(37)
16_rh nan Loc(38)
17_rh nan Loc(39)

We can see 'auto' doesn't work for us here: each file ended up as its own subject (e.g., '13_lh' and '13_rh' instead of just '13'), so we can try writing our own file-to-subject function instead.

[5]:
def file_to_subj(loc):
    return loc.split('/')[-1].split('_')[0]

# Actually load it this time
data = data.add_data_files(files, file_to_subj)
data
[5]:

Data

timeseries_lh timeseries_rh
13 Loc(0) Loc(36)
9 Loc(1) Loc(27)
8 Loc(2) Loc(37)
2 Loc(3) Loc(35)
16 Loc(4) Loc(38)
11 Loc(5) Loc(20)
6 Loc(6) Loc(29)
7 Loc(7) Loc(31)
1 Loc(8) Loc(26)
17 Loc(9) Loc(39)
19 Loc(10) Loc(33)
15 Loc(11) Loc(30)
10 Loc(12) Loc(21)
3 Loc(13) Loc(23)
14 Loc(14) Loc(28)
0 Loc(15) Loc(24)
18 Loc(16) Loc(25)
5 Loc(17) Loc(34)
4 Loc(18) Loc(32)
12 Loc(19) Loc(22)

What's this though? Why are the files showing up as Loc(int)? What's going on is that the data files are really stored internally as just integers, see:

[6]:
data['timeseries_lh']
[6]:
13     0.0
9      1.0
8      2.0
2      3.0
16     4.0
11     5.0
6      6.0
7      7.0
1      8.0
17     9.0
19    10.0
15    11.0
10    12.0
3     13.0
14    14.0
0     15.0
18    16.0
5     17.0
4     18.0
12    19.0
Name: timeseries_lh, dtype: float64

These integers correspond to locations in a stored file mapping (note: you don't need to worry about any of this most of the time).

[7]:
data.file_mapping[0], data.file_mapping[1], data.file_mapping[2]
[7]:
(DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/13_lh.npy'),
 DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/9_lh.npy'),
 DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/8_lh.npy'))
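As a small illustrative sketch (not something you normally need to do yourself), the stored integer can be used to index into data.file_mapping, and the resulting DataFile can then load its underlying array:

idx = int(data['timeseries_lh'].iloc[0])  # the first subject's stored integer, here 0
data_file = data.file_mapping[idx]        # the corresponding DataFile
data_file.load().shape                    # the saved array, of shape (100, 10242)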

Let's add a fake target to our dataset now.

[8]:
data['t'] = np.random.random(len(data))
data.set_target('t', inplace=True)
data
[8]:

Data

timeseries_lh timeseries_rh
13 Loc(0) Loc(36)
9 Loc(1) Loc(27)
8 Loc(2) Loc(37)
2 Loc(3) Loc(35)
16 Loc(4) Loc(38)
11 Loc(5) Loc(20)
6 Loc(6) Loc(29)
7 Loc(7) Loc(31)
1 Loc(8) Loc(26)
17 Loc(9) Loc(39)
19 Loc(10) Loc(33)
15 Loc(11) Loc(30)
10 Loc(12) Loc(21)
3 Loc(13) Loc(23)
14 Loc(14) Loc(28)
0 Loc(15) Loc(24)
18 Loc(16) Loc(25)
5 Loc(17) Loc(34)
4 Loc(18) Loc(32)
12 Loc(19) Loc(22)

Target

t
13 0.656648
9 0.298354
8 0.495359
2 0.414660
16 0.606687
11 0.453163
6 0.853856
7 0.044329
1 0.916036
17 0.865733
19 0.015055
15 0.082130
10 0.731628
3 0.074572
14 0.589903
0 0.768409
18 0.536750
5 0.401537
4 0.580557
12 0.508457

Next, we will build up a Loader that applies a parcellation and then extracts a measure of connectivity.

[9]:
from BPt.extensions import SurfLabels

lh_parc = SurfLabels(labels='data/lh.aparc.annot', vectorize=False)
rh_parc = SurfLabels(labels='data/rh.aparc.annot', vectorize=False)
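Since nibabel is available, we could also peek at the label file itself. This is just an optional aside, assuming the annot file exists at the path used above:

from nibabel.freesurfer import read_annot

labels, ctab, names = read_annot('data/lh.aparc.annot')
labels.shape   # one integer label per vertex, e.g. (10242,) for fsaverage5-resolution surfaces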

We can see how this object works on example data first.

[10]:
ex_lh = data.file_mapping[0].load()
ex_lh.shape
[10]:
(100, 10242)
[11]:
trans = lh_parc.fit_transform(ex_lh)
trans.shape
[11]:
(100, 35)

We essentially get a reduction from 10242 features (one per vertex) to 35 (one per parcel label).
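Conceptually, the reduction groups vertices by their parcel label and summarizes each group. A rough NumPy sketch of the idea (assuming a mean summary per label, and using a hypothetical fake_labels array in place of the real .annot labels):

def mean_per_parcel(data, labels, background=0):
    # data: (timepoints, vertices), labels: (vertices,) of integer parcel ids
    parcels = [p for p in np.unique(labels) if p != background]
    return np.stack([data[:, labels == p].mean(axis=1) for p in parcels], axis=1)

fake_labels = np.random.randint(0, 36, size=10242)   # 35 parcels plus a background label
mean_per_parcel(ex_lh, fake_labels).shape             # (100, 35)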

Next, we want to transform this matrix into a connectivity matrix (here, a covariance matrix).

[12]:
from BPt.extensions import SingleConnectivityMeasure
scm = SingleConnectivityMeasure(kind='covariance', discard_diagonal=True, vectorize=True)
[13]:
scm.fit_transform(trans).shape
[13]:
(595,)

SingleConnectivityMeasure is just a wrapper designed to let nilearn's ConnectivityMeasure work on a single subject's data at a time.
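As a rough illustration of that relationship (a sketch only, assuming the default covariance estimators match), wrapping the single subject's array in a one-element list and calling nilearn directly should give an equivalent result:

from nilearn.connectome import ConnectivityMeasure

cm = ConnectivityMeasure(kind='covariance', vectorize=True, discard_diagonal=True)
cm.fit_transform([trans])[0].shape   # also (595,), from the one subject's (100, 35) array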

Next, let's use the special Pipe input wrapper to compose these two objects into their own pipeline.

[14]:
lh_loader = bp.Loader(bp.Pipe([lh_parc, scm]), scope='_lh')
rh_loader = bp.Loader(bp.Pipe([rh_parc, scm]), scope='_rh')

The scope arguments ensure each loader is applied only to the columns matching its scope ('_lh' or '_rh'), i.e., the corresponding hemisphere. Next, we define a simple pipeline with just our loader steps and a linear model, then evaluate it with mostly default settings.

[15]:
pipeline = bp.Pipeline([lh_loader, rh_loader, bp.Model('linear')])

results = bp.evaluate(pipeline, data)
results
[15]:
BPtEvaluator
------------
mean_scores = {'explained_variance': -0.3492082271322736, 'neg_mean_squared_error': -0.08532586202634963}
std_scores = {'explained_variance': 0.37944917198666483, 'neg_mean_squared_error': 0.025409784568717956}

Saved Attributes: ['estimators', 'preds', 'timing', 'train_subjects', 'val_subjects', 'feat_names', 'ps', 'mean_scores', 'std_scores', 'weighted_mean_scores', 'scores', 'fis_', 'coef_']

Available Methods: ['get_preds_dfs', 'get_fis', 'get_coef_', 'permutation_importance']

Evaluated with:
ProblemSpec(problem_type='regression',
            scorer={'explained_variance': make_scorer(explained_variance_score),
                    'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False)},
            subjects='all', target='t')

Don't be discouraged that this didn't work well; we are, after all, trying to predict random noise from random noise.

[16]:
# These are the steps of the pipeline
fold0_pipeline = results.estimators[0]
for step in fold0_pipeline.steps:
    print(step[0])
loader_pipe0
loader_pipe1
linear regressor

We can investigate individual pieces, or use special helper functions like get_X_transform_df to view the data as transformed by the loaders in a given fold:

[17]:
results.get_X_transform_df(data, fold=0)
[17]:
timeseries_rh_0 timeseries_rh_1 timeseries_rh_2 timeseries_rh_3 timeseries_rh_4 timeseries_rh_5 timeseries_rh_6 timeseries_rh_7 timeseries_rh_8 timeseries_rh_9 ... timeseries_lh_585 timeseries_lh_586 timeseries_lh_587 timeseries_lh_588 timeseries_lh_589 timeseries_lh_590 timeseries_lh_591 timeseries_lh_592 timeseries_lh_593 timeseries_lh_594
0 -0.000165 0.000046 -0.000077 -0.000075 0.000074 -0.000011 -0.000049 0.000047 -0.000024 -0.000024 ... -8.290498e-06 -0.000006 -0.000023 1.610693e-06 0.000015 -0.000006 4.867083e-06 -1.215231e-04 -0.000140 -0.000048
1 0.000051 0.000027 -0.000011 -0.000003 0.000022 0.000033 0.000049 0.000072 0.000010 -0.000014 ... 9.147214e-06 -0.000033 -0.000015 4.817195e-06 0.000001 0.000009 -3.010718e-05 5.807162e-05 -0.000070 0.000016
2 -0.000019 -0.000024 -0.000004 0.000027 -0.000054 0.000013 0.000064 -0.000118 -0.000065 0.000063 ... -8.021237e-06 -0.000059 0.000004 -1.018778e-05 -0.000026 -0.000003 1.120659e-05 -3.874970e-05 0.000057 -0.000008
3 0.000037 0.000027 0.000050 0.000080 0.000038 0.000009 -0.000094 -0.000117 0.000056 -0.000005 ... 2.637188e-07 -0.000015 -0.000011 -6.939784e-06 0.000022 0.000005 -2.519195e-05 1.219129e-04 0.000021 0.000074
4 -0.000030 0.000013 -0.000048 -0.000002 0.000043 -0.000021 -0.000021 0.000045 0.000015 -0.000008 ... -4.193627e-05 -0.000005 -0.000038 -1.579288e-05 -0.000010 0.000007 -2.074608e-05 1.288912e-04 0.000048 0.000015
5 -0.000027 0.000012 0.000049 -0.000040 0.000137 -0.000020 0.000023 0.000057 0.000020 0.000018 ... -2.317345e-05 0.000047 -0.000021 -3.256373e-06 0.000013 0.000006 -2.017995e-05 3.174790e-05 -0.000044 -0.000050
6 -0.000003 0.000011 0.000037 -0.000007 0.000026 0.000034 0.000007 -0.000071 -0.000019 -0.000004 ... 1.230251e-05 0.000065 0.000008 8.041033e-07 0.000001 -0.000026 -1.401379e-05 2.662647e-05 -0.000020 0.000032
7 0.000038 0.000019 0.000006 0.000017 -0.000173 0.000027 -0.000058 0.000120 0.000028 -0.000029 ... -2.762708e-05 0.000019 0.000015 -5.296039e-06 -0.000021 0.000017 -3.512035e-06 -1.743649e-04 0.000015 0.000002
8 -0.000009 0.000007 0.000034 -0.000002 0.000032 -0.000011 -0.000021 -0.000113 0.000040 0.000024 ... -1.286571e-06 -0.000022 -0.000027 2.031265e-05 -0.000008 0.000035 -5.331094e-06 -5.483645e-05 0.000103 -0.000014
9 0.000062 -0.000022 0.000060 0.000010 -0.000017 0.000012 -0.000019 0.000093 -0.000002 0.000028 ... -1.272615e-05 0.000027 -0.000015 -1.022682e-05 -0.000044 -0.000006 4.879025e-06 3.508208e-07 -0.000069 -0.000002
10 0.000019 0.000110 0.000062 -0.000019 0.000011 -0.000007 -0.000059 -0.000056 0.000022 -0.000041 ... -1.971200e-05 0.000055 0.000020 -5.049802e-06 0.000014 0.000014 -4.576251e-07 -3.902154e-05 0.000023 -0.000025
11 0.000013 -0.000036 -0.000063 -0.000026 -0.000008 -0.000007 0.000029 -0.000117 0.000052 0.000013 ... 4.998446e-07 -0.000018 -0.000016 -1.614390e-05 0.000006 -0.000006 1.069373e-05 -6.800519e-06 0.000029 -0.000103
12 -0.000033 -0.000027 0.000066 0.000013 0.000021 -0.000012 0.000061 0.000105 0.000020 0.000022 ... -3.358210e-06 -0.000003 -0.000018 2.135645e-05 0.000009 0.000002 -1.748675e-05 2.181139e-04 0.000018 -0.000078
13 0.000080 -0.000046 -0.000040 0.000033 -0.000092 0.000013 -0.000005 -0.000085 0.000020 0.000096 ... -3.432920e-06 0.000038 0.000048 5.295833e-06 0.000013 0.000030 5.164307e-06 -9.442774e-05 -0.000010 -0.000014
14 0.000077 -0.000009 -0.000118 0.000056 -0.000049 0.000021 -0.000036 0.000130 -0.000081 0.000017 ... -9.383758e-06 -0.000027 -0.000019 -2.622800e-06 0.000005 0.000009 -1.135353e-05 1.509882e-05 -0.000070 -0.000058
15 -0.000113 -0.000045 0.000040 0.000020 -0.000040 -0.000010 -0.000081 0.000031 -0.000066 0.000002 ... 1.523565e-05 -0.000071 0.000031 -6.086060e-06 -0.000013 0.000003 1.540947e-06 1.604218e-04 0.000140 0.000034

16 rows × 1190 columns
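Finally, the fake data directory created at the start can be removed if desired, for example:

import shutil
shutil.rmtree('fake_time_data')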