Loading Fake Timeseries Surface Data
This notebook explores some of BPt's functionality for loading DataFiles and using Loaders.
This example requires a few extra optional libraries, namely nibabel and nilearn. Note: although nilearn is never imported directly below, importing SingleConnectivityMeasure will raise an ImportError if nilearn is not installed.
We will also use fake data for this example, so no special datasets are required!
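For instance, a minimal import guard (just a sketch; it assumes only that nilearn may be missing):

# Sketch: fail early with a clearer message if nilearn is missing
try:
    from BPt.extensions import SingleConnectivityMeasure
except ImportError:
    raise ImportError('SingleConnectivityMeasure requires nilearn; '
                      'install it with `pip install nilearn`.')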
[1]:
import BPt as bp
import nibabel as nib
import numpy as np
import pandas as pd
import os
[2]:
def save_fake_timeseries_data():
    '''Save fake timeseries surface data for both hemispheres.'''

    # 20 subjects, each with 100 timepoints x 10242 surface vertices
    X = np.random.random(size=(20, 100, 10242))
    os.makedirs('fake_time_data', exist_ok=True)

    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_lh', X[x])
    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_rh', X[x])

save_fake_timeseries_data()
[3]:
# Init a Dataset
data = bp.Dataset()
Next, we are interested in loading these files into the dataset as data files. There are a few different ways to do this, but we will use the method add_data_files, starting with the timeseries data.
First, we need a dictionary mapping each desired column name to either a list of file locations or a file glob (the glob is easier, so let’s use that).
[4]:
# The *'s are wildcards
files = {'timeseries_lh': 'fake_time_data/*_lh*',
         'timeseries_rh': 'fake_time_data/*_rh*'}
# Now let's try loading with 'auto' as the file to subject function
data.add_data_files(files, 'auto')
[4]:
Data

|       | timeseries_lh | timeseries_rh |
|-------|---------------|---------------|
| 13_lh | Loc(0)  | nan |
| 9_lh  | Loc(1)  | nan |
| 8_lh  | Loc(2)  | nan |
| 2_lh  | Loc(3)  | nan |
| 16_lh | Loc(4)  | nan |
| 11_lh | Loc(5)  | nan |
| 6_lh  | Loc(6)  | nan |
| 7_lh  | Loc(7)  | nan |
| 1_lh  | Loc(8)  | nan |
| 17_lh | Loc(9)  | nan |
| 19_lh | Loc(10) | nan |
| 15_lh | Loc(11) | nan |
| 10_lh | Loc(12) | nan |
| 3_lh  | Loc(13) | nan |
| 14_lh | Loc(14) | nan |
| 0_lh  | Loc(15) | nan |
| 18_lh | Loc(16) | nan |
| 5_lh  | Loc(17) | nan |
| 4_lh  | Loc(18) | nan |
| 12_lh | Loc(19) | nan |
| 11_rh | nan | Loc(20) |
| 10_rh | nan | Loc(21) |
| 12_rh | nan | Loc(22) |
| 3_rh  | nan | Loc(23) |
| 0_rh  | nan | Loc(24) |
| 18_rh | nan | Loc(25) |
| 1_rh  | nan | Loc(26) |
| 9_rh  | nan | Loc(27) |
| 14_rh | nan | Loc(28) |
| 6_rh  | nan | Loc(29) |
| 15_rh | nan | Loc(30) |
| 7_rh  | nan | Loc(31) |
| 4_rh  | nan | Loc(32) |
| 19_rh | nan | Loc(33) |
| 5_rh  | nan | Loc(34) |
| 2_rh  | nan | Loc(35) |
| 13_rh | nan | Loc(36) |
| 8_rh  | nan | Loc(37) |
| 16_rh | nan | Loc(38) |
| 17_rh | nan | Loc(39) |
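It can also help to expand the glob yourself and inspect the matched file names (a quick sketch using Python's built-in glob module):

from glob import glob

# Look at a few of the files the pattern matches
print(sorted(glob('fake_time_data/*_lh*'))[:3])
# e.g. ['fake_time_data/0_lh.npy', 'fake_time_data/10_lh.npy', 'fake_time_data/11_lh.npy']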
We can see ‘auto’ doesn’t work for us here: it treats the full file stem (e.g. ‘13_lh’) as the subject ID, so each hemisphere file is loaded as a separate subject. Let’s write our own file-to-subject function instead.
[5]:
def file_to_subj(loc):
    # Extract the subject ID from the file name,
    # e.g. 'fake_time_data/13_lh.npy' -> '13'
    return loc.split('/')[-1].split('_')[0]

# Actually load it this time
data = data.add_data_files(files, file_to_subj)
data
[5]:
Data

|    | timeseries_lh | timeseries_rh |
|----|---------------|---------------|
| 13 | Loc(0)  | Loc(36) |
| 9  | Loc(1)  | Loc(27) |
| 8  | Loc(2)  | Loc(37) |
| 2  | Loc(3)  | Loc(35) |
| 16 | Loc(4)  | Loc(38) |
| 11 | Loc(5)  | Loc(20) |
| 6  | Loc(6)  | Loc(29) |
| 7  | Loc(7)  | Loc(31) |
| 1  | Loc(8)  | Loc(26) |
| 17 | Loc(9)  | Loc(39) |
| 19 | Loc(10) | Loc(33) |
| 15 | Loc(11) | Loc(30) |
| 10 | Loc(12) | Loc(21) |
| 3  | Loc(13) | Loc(23) |
| 14 | Loc(14) | Loc(28) |
| 0  | Loc(15) | Loc(24) |
| 18 | Loc(16) | Loc(25) |
| 5  | Loc(17) | Loc(34) |
| 4  | Loc(18) | Loc(32) |
| 12 | Loc(19) | Loc(22) |
What’s this though? Why are the files showing up as Loc(int)? What’s going on is that, under the hood, the data files are stored as just integers; see:
[6]:
data['timeseries_lh']
[6]:
13 0.0
9 1.0
8 2.0
2 3.0
16 4.0
11 5.0
6 6.0
7 7.0
1 8.0
17 9.0
19 10.0
15 11.0
10 12.0
3 13.0
14 14.0
0 15.0
18 16.0
5 17.0
4 18.0
12 19.0
Name: timeseries_lh, dtype: float64
These integers correspond to locations in a stored file mapping (note: you don’t need to worry about any of this most of the time).
[7]:
data.file_mapping[0], data.file_mapping[1], data.file_mapping[2]
[7]:
(DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/13_lh.npy'),
DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/9_lh.npy'),
DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/8_lh.npy'))
Let’s now add a fake target to our dataset.
[8]:
data['t'] = np.random.random(len(data))
data.set_target('t', inplace=True)
data
[8]:
Data

|    | timeseries_lh | timeseries_rh |
|----|---------------|---------------|
| 13 | Loc(0)  | Loc(36) |
| 9  | Loc(1)  | Loc(27) |
| 8  | Loc(2)  | Loc(37) |
| 2  | Loc(3)  | Loc(35) |
| 16 | Loc(4)  | Loc(38) |
| 11 | Loc(5)  | Loc(20) |
| 6  | Loc(6)  | Loc(29) |
| 7  | Loc(7)  | Loc(31) |
| 1  | Loc(8)  | Loc(26) |
| 17 | Loc(9)  | Loc(39) |
| 19 | Loc(10) | Loc(33) |
| 15 | Loc(11) | Loc(30) |
| 10 | Loc(12) | Loc(21) |
| 3  | Loc(13) | Loc(23) |
| 14 | Loc(14) | Loc(28) |
| 0  | Loc(15) | Loc(24) |
| 18 | Loc(16) | Loc(25) |
| 5  | Loc(17) | Loc(34) |
| 4  | Loc(18) | Loc(32) |
| 12 | Loc(19) | Loc(22) |

Target

|    | t        |
|----|----------|
| 13 | 0.656648 |
| 9  | 0.298354 |
| 8  | 0.495359 |
| 2  | 0.414660 |
| 16 | 0.606687 |
| 11 | 0.453163 |
| 6  | 0.853856 |
| 7  | 0.044329 |
| 1  | 0.916036 |
| 17 | 0.865733 |
| 19 | 0.015055 |
| 15 | 0.082130 |
| 10 | 0.731628 |
| 3  | 0.074572 |
| 14 | 0.589903 |
| 0  | 0.768409 |
| 18 | 0.536750 |
| 5  | 0.401537 |
| 4  | 0.580557 |
| 12 | 0.508457 |
Next, we will define a Loader to apply a parcellation, and then extract a measure of connectivity from the parcellated timeseries.
[9]:
from BPt.extensions import SurfLabels
lh_parc = SurfLabels(labels='data/lh.aparc.annot', vectorize=False)
rh_parc = SurfLabels(labels='data/rh.aparc.annot', vectorize=False)
Let’s first see how this object works on some example data.
[10]:
ex_lh = data.file_mapping[0].load()
ex_lh.shape
[10]:
(100, 10242)
[11]:
trans = lh_parc.fit_transform(ex_lh)
trans.shape
[11]:
(100, 35)
We essentially get a reduction from 10242 features (surface vertices) to 35 (one value per parcel, at each of the 100 timepoints).
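As a sanity check, the number of output features should line up with the number of labels in the annotation file. A sketch using nibabel (note the raw count may be off by one if the annot file includes a background/unknown label):

# Count the distinct parcel labels in the left hemisphere annotation
labels, ctab, names = nib.freesurfer.read_annot('data/lh.aparc.annot')
print(len(np.unique(labels)))  # expect ~35, plus possibly a background label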
Next, we want to summarize each parcellated timeseries as a connectivity matrix; here we’ll use covariance.
[12]:
from BPt.extensions import SingleConnectivityMeasure
scm = SingleConnectivityMeasure(kind='covariance', discard_diagonal=True, vectorize=True)
[13]:
scm.fit_transform(trans).shape
[13]:
(595,)
The single connectivity measure is just a wrapper designed to let the ConnectivityMeasure from nilearn work with a single subject’s data at a time.
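The output length also checks out: vectorizing a symmetric 35×35 matrix with the diagonal discarded keeps only the 35·34/2 = 595 unique off-diagonal values. As a rough sketch of what the wrapper saves us from doing by hand (assuming nilearn is installed; its ConnectivityMeasure expects a list of subjects):

from nilearn.connectome import ConnectivityMeasure

cm = ConnectivityMeasure(kind='covariance',
                         discard_diagonal=True, vectorize=True)

# Wrapping one subject's (timepoints x ROIs) matrix in a list mimics
# what SingleConnectivityMeasure handles for us internally
single = cm.fit_transform([trans])[0]
print(single.shape)  # (595,) == 35 * 34 / 2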
Next, let’s use the special input wrapper Pipe to compose these two objects into their own mini pipeline.
[14]:
lh_loader = bp.Loader(bp.Pipe([lh_parc, scm]), scope='_lh')
rh_loader = bp.Loader(bp.Pipe([rh_parc, scm]), scope='_rh')
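Conceptually, during loading each Loader applies its Pipe to every data file in its scope, which amounts to the same per-file computation we just ran by hand (illustrative only, not BPt’s actual internals):

# Roughly the per-file computation the lh loader performs (illustrative)
arr = data.file_mapping[0].load()                    # (100, 10242) timeseries
vec = scm.fit_transform(lh_parc.fit_transform(arr))  # (595,) connectivity vector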
Next, we define a simple pipeline with just our loader steps and a linear model, then evaluate it with mostly default settings.
[15]:
pipeline = bp.Pipeline([lh_loader, rh_loader, bp.Model('linear')])
results = bp.evaluate(pipeline, data)
results
[15]:
BPtEvaluator
------------
mean_scores = {'explained_variance': -0.3492082271322736, 'neg_mean_squared_error': -0.08532586202634963}
std_scores = {'explained_variance': 0.37944917198666483, 'neg_mean_squared_error': 0.025409784568717956}
Saved Attributes: ['estimators', 'preds', 'timing', 'train_subjects', 'val_subjects', 'feat_names', 'ps', 'mean_scores', 'std_scores', 'weighted_mean_scores', 'scores', 'fis_', 'coef_']
Available Methods: ['get_preds_dfs', 'get_fis', 'get_coef_', 'permutation_importance']
Evaluated with:
ProblemSpec(problem_type='regression',
scorer={'explained_variance': make_scorer(explained_variance_score),
'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False)},
subjects='all', target='t')
Don’t be discouraged that this didn’t predict well; we are, after all, trying to predict random noise from random noise …
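The returned BPtEvaluator also exposes helpers for digging into the results, e.g. the methods listed under ‘Available Methods’ above:

# Per-fold predictions and feature importances, via methods listed
# under 'Available Methods' in the output above
preds_dfs = results.get_preds_dfs()
fis = results.get_fis()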
[16]:
# These are the steps of the pipeline
fold0_pipeline = results.estimators[0]
for step in fold0_pipeline.steps:
    print(step[0])
loader_pipe0
loader_pipe1
linear regressor
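Individual fitted steps can be pulled out by name with plain Python, since steps is a list of (name, estimator) tuples (a small sketch):

# Grab the fitted left-hemisphere loader by its step name
steps = dict(fold0_pipeline.steps)
lh_loader_fitted = steps['loader_pipe0']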
Beyond inspecting pieces by hand, we can use helper methods like get_X_transform_df to view the data as transformed by the fitted loaders:
[17]:
results.get_X_transform_df(data, fold=0)
[17]:
|    | timeseries_rh_0 | timeseries_rh_1 | timeseries_rh_2 | timeseries_rh_3 | timeseries_rh_4 | timeseries_rh_5 | timeseries_rh_6 | timeseries_rh_7 | timeseries_rh_8 | timeseries_rh_9 | ... | timeseries_lh_585 | timeseries_lh_586 | timeseries_lh_587 | timeseries_lh_588 | timeseries_lh_589 | timeseries_lh_590 | timeseries_lh_591 | timeseries_lh_592 | timeseries_lh_593 | timeseries_lh_594 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.000165 | 0.000046 | -0.000077 | -0.000075 | 0.000074 | -0.000011 | -0.000049 | 0.000047 | -0.000024 | -0.000024 | ... | -8.290498e-06 | -0.000006 | -0.000023 | 1.610693e-06 | 0.000015 | -0.000006 | 4.867083e-06 | -1.215231e-04 | -0.000140 | -0.000048 |
| 1 | 0.000051 | 0.000027 | -0.000011 | -0.000003 | 0.000022 | 0.000033 | 0.000049 | 0.000072 | 0.000010 | -0.000014 | ... | 9.147214e-06 | -0.000033 | -0.000015 | 4.817195e-06 | 0.000001 | 0.000009 | -3.010718e-05 | 5.807162e-05 | -0.000070 | 0.000016 |
| 2 | -0.000019 | -0.000024 | -0.000004 | 0.000027 | -0.000054 | 0.000013 | 0.000064 | -0.000118 | -0.000065 | 0.000063 | ... | -8.021237e-06 | -0.000059 | 0.000004 | -1.018778e-05 | -0.000026 | -0.000003 | 1.120659e-05 | -3.874970e-05 | 0.000057 | -0.000008 |
| 3 | 0.000037 | 0.000027 | 0.000050 | 0.000080 | 0.000038 | 0.000009 | -0.000094 | -0.000117 | 0.000056 | -0.000005 | ... | 2.637188e-07 | -0.000015 | -0.000011 | -6.939784e-06 | 0.000022 | 0.000005 | -2.519195e-05 | 1.219129e-04 | 0.000021 | 0.000074 |
| 4 | -0.000030 | 0.000013 | -0.000048 | -0.000002 | 0.000043 | -0.000021 | -0.000021 | 0.000045 | 0.000015 | -0.000008 | ... | -4.193627e-05 | -0.000005 | -0.000038 | -1.579288e-05 | -0.000010 | 0.000007 | -2.074608e-05 | 1.288912e-04 | 0.000048 | 0.000015 |
| 5 | -0.000027 | 0.000012 | 0.000049 | -0.000040 | 0.000137 | -0.000020 | 0.000023 | 0.000057 | 0.000020 | 0.000018 | ... | -2.317345e-05 | 0.000047 | -0.000021 | -3.256373e-06 | 0.000013 | 0.000006 | -2.017995e-05 | 3.174790e-05 | -0.000044 | -0.000050 |
| 6 | -0.000003 | 0.000011 | 0.000037 | -0.000007 | 0.000026 | 0.000034 | 0.000007 | -0.000071 | -0.000019 | -0.000004 | ... | 1.230251e-05 | 0.000065 | 0.000008 | 8.041033e-07 | 0.000001 | -0.000026 | -1.401379e-05 | 2.662647e-05 | -0.000020 | 0.000032 |
| 7 | 0.000038 | 0.000019 | 0.000006 | 0.000017 | -0.000173 | 0.000027 | -0.000058 | 0.000120 | 0.000028 | -0.000029 | ... | -2.762708e-05 | 0.000019 | 0.000015 | -5.296039e-06 | -0.000021 | 0.000017 | -3.512035e-06 | -1.743649e-04 | 0.000015 | 0.000002 |
| 8 | -0.000009 | 0.000007 | 0.000034 | -0.000002 | 0.000032 | -0.000011 | -0.000021 | -0.000113 | 0.000040 | 0.000024 | ... | -1.286571e-06 | -0.000022 | -0.000027 | 2.031265e-05 | -0.000008 | 0.000035 | -5.331094e-06 | -5.483645e-05 | 0.000103 | -0.000014 |
| 9 | 0.000062 | -0.000022 | 0.000060 | 0.000010 | -0.000017 | 0.000012 | -0.000019 | 0.000093 | -0.000002 | 0.000028 | ... | -1.272615e-05 | 0.000027 | -0.000015 | -1.022682e-05 | -0.000044 | -0.000006 | 4.879025e-06 | 3.508208e-07 | -0.000069 | -0.000002 |
| 10 | 0.000019 | 0.000110 | 0.000062 | -0.000019 | 0.000011 | -0.000007 | -0.000059 | -0.000056 | 0.000022 | -0.000041 | ... | -1.971200e-05 | 0.000055 | 0.000020 | -5.049802e-06 | 0.000014 | 0.000014 | -4.576251e-07 | -3.902154e-05 | 0.000023 | -0.000025 |
| 11 | 0.000013 | -0.000036 | -0.000063 | -0.000026 | -0.000008 | -0.000007 | 0.000029 | -0.000117 | 0.000052 | 0.000013 | ... | 4.998446e-07 | -0.000018 | -0.000016 | -1.614390e-05 | 0.000006 | -0.000006 | 1.069373e-05 | -6.800519e-06 | 0.000029 | -0.000103 |
| 12 | -0.000033 | -0.000027 | 0.000066 | 0.000013 | 0.000021 | -0.000012 | 0.000061 | 0.000105 | 0.000020 | 0.000022 | ... | -3.358210e-06 | -0.000003 | -0.000018 | 2.135645e-05 | 0.000009 | 0.000002 | -1.748675e-05 | 2.181139e-04 | 0.000018 | -0.000078 |
| 13 | 0.000080 | -0.000046 | -0.000040 | 0.000033 | -0.000092 | 0.000013 | -0.000005 | -0.000085 | 0.000020 | 0.000096 | ... | -3.432920e-06 | 0.000038 | 0.000048 | 5.295833e-06 | 0.000013 | 0.000030 | 5.164307e-06 | -9.442774e-05 | -0.000010 | -0.000014 |
| 14 | 0.000077 | -0.000009 | -0.000118 | 0.000056 | -0.000049 | 0.000021 | -0.000036 | 0.000130 | -0.000081 | 0.000017 | ... | -9.383758e-06 | -0.000027 | -0.000019 | -2.622800e-06 | 0.000005 | 0.000009 | -1.135353e-05 | 1.509882e-05 | -0.000070 | -0.000058 |
| 15 | -0.000113 | -0.000045 | 0.000040 | 0.000020 | -0.000040 | -0.000010 | -0.000081 | 0.000031 | -0.000066 | 0.000002 | ... | 1.523565e-05 | -0.000071 | 0.000031 | -6.086060e-06 | -0.000013 | 0.000003 | 1.540947e-06 | 1.604218e-04 | 0.000140 | 0.000034 |

16 rows × 1190 columns