Loading Fake Timeseries Surface Data
This notebook explores some of BPt's functionality for loading DataFiles and using Loaders.
This example requires a few extra optional libraries, namely nibabel and nilearn. Note: although nilearn is never imported directly below, importing SingleConnectivityMeasure will raise an ImportError if nilearn is not installed.
We will also use fake data for this example, so no special datasets are required!
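For instance, a minimal import guard (just a sketch; it assumes only that nilearn may be missing):

# Sketch: fail early with a clearer message if nilearn is missing
try:
    from BPt.extensions import SingleConnectivityMeasure
except ImportError:
    raise ImportError('SingleConnectivityMeasure requires nilearn; '
                      'install it with `pip install nilearn`.')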
[1]:
import BPt as bp
import nibabel as nib
import numpy as np
import pandas as pd
import os
[2]:
def save_fake_timeseries_data():
    '''Save fake timeseries surface data for both hemispheres.'''

    # 20 subjects, each with 100 timepoints x 10242 surface vertices
    X = np.random.random(size=(20, 100, 10242))
    os.makedirs('fake_time_data', exist_ok=True)

    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_lh', X[x])
    for x in range(len(X)):
        np.save('fake_time_data/' + str(x) + '_rh', X[x])

save_fake_timeseries_data()
[3]:
# Init a Dataset
data = bp.Dataset()
Next, we are interested in loading these files into the dataset as data files. There are a few different ways to do this, but we will use the method add_data_files, starting with the timeseries data.
First, we need a dictionary mapping each desired column name to either a list of file locations or a file glob (the glob is easier, so let’s use that).
[4]:
# The *'s are wildcards
files = {'timeseries_lh': 'fake_time_data/*_lh*',
         'timeseries_rh': 'fake_time_data/*_rh*'}
# Now let's try loading with 'auto' as the file to subject function
data.add_data_files(files, 'auto')
[4]:
Data

|       | timeseries_lh | timeseries_rh |
|-------|---------------|---------------|
| 13_lh | Loc(0)  | nan |
| 9_lh  | Loc(1)  | nan |
| 8_lh  | Loc(2)  | nan |
| 2_lh  | Loc(3)  | nan |
| 16_lh | Loc(4)  | nan |
| 11_lh | Loc(5)  | nan |
| 6_lh  | Loc(6)  | nan |
| 7_lh  | Loc(7)  | nan |
| 1_lh  | Loc(8)  | nan |
| 17_lh | Loc(9)  | nan |
| 19_lh | Loc(10) | nan |
| 15_lh | Loc(11) | nan |
| 10_lh | Loc(12) | nan |
| 3_lh  | Loc(13) | nan |
| 14_lh | Loc(14) | nan |
| 0_lh  | Loc(15) | nan |
| 18_lh | Loc(16) | nan |
| 5_lh  | Loc(17) | nan |
| 4_lh  | Loc(18) | nan |
| 12_lh | Loc(19) | nan |
| 11_rh | nan | Loc(20) |
| 10_rh | nan | Loc(21) |
| 12_rh | nan | Loc(22) |
| 3_rh  | nan | Loc(23) |
| 0_rh  | nan | Loc(24) |
| 18_rh | nan | Loc(25) |
| 1_rh  | nan | Loc(26) |
| 9_rh  | nan | Loc(27) |
| 14_rh | nan | Loc(28) |
| 6_rh  | nan | Loc(29) |
| 15_rh | nan | Loc(30) |
| 7_rh  | nan | Loc(31) |
| 4_rh  | nan | Loc(32) |
| 19_rh | nan | Loc(33) |
| 5_rh  | nan | Loc(34) |
| 2_rh  | nan | Loc(35) |
| 13_rh | nan | Loc(36) |
| 8_rh  | nan | Loc(37) |
| 16_rh | nan | Loc(38) |
| 17_rh | nan | Loc(39) |
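It can also help to expand the glob yourself and inspect the matched file names (a quick sketch using Python's built-in glob module):

from glob import glob

# Look at a few of the files the pattern matches
print(sorted(glob('fake_time_data/*_lh*'))[:3])
# e.g. ['fake_time_data/0_lh.npy', 'fake_time_data/10_lh.npy', 'fake_time_data/11_lh.npy']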
We can see ‘auto’ doesn’t work for us here: it treats the full file stem (e.g. ‘13_lh’) as the subject ID, so each hemisphere file is loaded as a separate subject. Let’s write our own file-to-subject function instead.
[5]:
def file_to_subj(loc):
    # Extract the subject ID from the file name,
    # e.g. 'fake_time_data/13_lh.npy' -> '13'
    return loc.split('/')[-1].split('_')[0]

# Actually load it this time
data = data.add_data_files(files, file_to_subj)
data
[5]:
Data

|    | timeseries_lh | timeseries_rh |
|----|---------------|---------------|
| 13 | Loc(0)  | Loc(36) |
| 9  | Loc(1)  | Loc(27) |
| 8  | Loc(2)  | Loc(37) |
| 2  | Loc(3)  | Loc(35) |
| 16 | Loc(4)  | Loc(38) |
| 11 | Loc(5)  | Loc(20) |
| 6  | Loc(6)  | Loc(29) |
| 7  | Loc(7)  | Loc(31) |
| 1  | Loc(8)  | Loc(26) |
| 17 | Loc(9)  | Loc(39) |
| 19 | Loc(10) | Loc(33) |
| 15 | Loc(11) | Loc(30) |
| 10 | Loc(12) | Loc(21) |
| 3  | Loc(13) | Loc(23) |
| 14 | Loc(14) | Loc(28) |
| 0  | Loc(15) | Loc(24) |
| 18 | Loc(16) | Loc(25) |
| 5  | Loc(17) | Loc(34) |
| 4  | Loc(18) | Loc(32) |
| 12 | Loc(19) | Loc(22) |
What’s this though? Why are the files showing up as Loc(int)? What’s going on is that, under the hood, the data files are stored as just integers; see:
[6]:
data['timeseries_lh']
[6]:
13 0.0
9 1.0
8 2.0
2 3.0
16 4.0
11 5.0
6 6.0
7 7.0
1 8.0
17 9.0
19 10.0
15 11.0
10 12.0
3 13.0
14 14.0
0 15.0
18 16.0
5 17.0
4 18.0
12 19.0
Name: timeseries_lh, dtype: float64
These integers correspond to locations in a stored file mapping (note: you don’t need to worry about any of this most of the time).
[7]:
data.file_mapping[0], data.file_mapping[1], data.file_mapping[2]
[7]:
(DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/13_lh.npy'),
DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/9_lh.npy'),
DataFile(loc='/home/sage/BPt/Examples/Short_Examples/fake_time_data/8_lh.npy'))
Let’s now add a fake target to our dataset.
[8]:
data['t'] = np.random.random(len(data))
data.set_target('t', inplace=True)
data
[8]:
Data

|    | timeseries_lh | timeseries_rh |
|----|---------------|---------------|
| 13 | Loc(0)  | Loc(36) |
| 9  | Loc(1)  | Loc(27) |
| 8  | Loc(2)  | Loc(37) |
| 2  | Loc(3)  | Loc(35) |
| 16 | Loc(4)  | Loc(38) |
| 11 | Loc(5)  | Loc(20) |
| 6  | Loc(6)  | Loc(29) |
| 7  | Loc(7)  | Loc(31) |
| 1  | Loc(8)  | Loc(26) |
| 17 | Loc(9)  | Loc(39) |
| 19 | Loc(10) | Loc(33) |
| 15 | Loc(11) | Loc(30) |
| 10 | Loc(12) | Loc(21) |
| 3  | Loc(13) | Loc(23) |
| 14 | Loc(14) | Loc(28) |
| 0  | Loc(15) | Loc(24) |
| 18 | Loc(16) | Loc(25) |
| 5  | Loc(17) | Loc(34) |
| 4  | Loc(18) | Loc(32) |
| 12 | Loc(19) | Loc(22) |

Target

|    | t        |
|----|----------|
| 13 | 0.656648 |
| 9  | 0.298354 |
| 8  | 0.495359 |
| 2  | 0.414660 |
| 16 | 0.606687 |
| 11 | 0.453163 |
| 6  | 0.853856 |
| 7  | 0.044329 |
| 1  | 0.916036 |
| 17 | 0.865733 |
| 19 | 0.015055 |
| 15 | 0.082130 |
| 10 | 0.731628 |
| 3  | 0.074572 |
| 14 | 0.589903 |
| 0  | 0.768409 |
| 18 | 0.536750 |
| 5  | 0.401537 |
| 4  | 0.580557 |
| 12 | 0.508457 |
Next, we will define a Loader to apply a parcellation, and then extract a measure of connectivity from the parcellated timeseries.
[9]:
from BPt.extensions import SurfLabels
lh_parc = SurfLabels(labels='data/lh.aparc.annot', vectorize=False)
rh_parc = SurfLabels(labels='data/rh.aparc.annot', vectorize=False)
Let’s first see how this object works on some example data.
[10]:
ex_lh = data.file_mapping[0].load()
ex_lh.shape
[10]:
(100, 10242)
[11]:
trans = lh_parc.fit_transform(ex_lh)
trans.shape
[11]:
(100, 35)
We essentially get a reduction from 10242 features (surface vertices) to 35 (one value per parcel, at each of the 100 timepoints).
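As a sanity check, the number of output features should line up with the number of labels in the annotation file. A sketch using nibabel (note the raw count may be off by one if the annot file includes a background/unknown label):

# Count the distinct parcel labels in the left hemisphere annotation
labels, ctab, names = nib.freesurfer.read_annot('data/lh.aparc.annot')
print(len(np.unique(labels)))  # expect ~35, plus possibly a background label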
Next, we want to summarize each parcellated timeseries as a connectivity matrix; here we’ll use covariance.
[12]:
from BPt.extensions import SingleConnectivityMeasure
scm = SingleConnectivityMeasure(kind='covariance', discard_diagonal=True, vectorize=True)
[13]:
scm.fit_transform(trans).shape
[13]:
(595,)
The single connectivity measure is just a wrapper designed to let the ConnectivityMeasure from nilearn work with a single subject’s data at a time.
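The output length also checks out: vectorizing a symmetric 35×35 matrix with the diagonal discarded keeps only the 35·34/2 = 595 unique off-diagonal values. As a rough sketch of what the wrapper saves us from doing by hand (assuming nilearn is installed; its ConnectivityMeasure expects a list of subjects):

from nilearn.connectome import ConnectivityMeasure

cm = ConnectivityMeasure(kind='covariance',
                         discard_diagonal=True, vectorize=True)

# Wrapping one subject's (timepoints x ROIs) matrix in a list mimics
# what SingleConnectivityMeasure handles for us internally
single = cm.fit_transform([trans])[0]
print(single.shape)  # (595,) == 35 * 34 / 2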
Next, let’s use the special input wrapper Pipe to compose these two objects into their own mini pipeline.
[14]:
lh_loader = bp.Loader(bp.Pipe([lh_parc, scm]), scope='_lh')
rh_loader = bp.Loader(bp.Pipe([rh_parc, scm]), scope='_rh')
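Conceptually, during loading each Loader applies its Pipe to every data file in its scope, which amounts to the same per-file computation we just ran by hand (illustrative only, not BPt’s actual internals):

# Roughly the per-file computation the lh loader performs (illustrative)
arr = data.file_mapping[0].load()                    # (100, 10242) timeseries
vec = scm.fit_transform(lh_parc.fit_transform(arr))  # (595,) connectivity vector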
Next, we define a simple pipeline with just our loader steps and a linear model, then evaluate it with mostly default settings.
[15]:
pipeline = bp.Pipeline([lh_loader, rh_loader, bp.Model('linear')])
results = bp.evaluate(pipeline, data)
results
[15]:
BPtEvaluator
------------
mean_scores = {'explained_variance': -0.3492082271322736, 'neg_mean_squared_error': -0.08532586202634963}
std_scores = {'explained_variance': 0.37944917198666483, 'neg_mean_squared_error': 0.025409784568717956}
Saved Attributes: ['estimators', 'preds', 'timing', 'train_subjects', 'val_subjects', 'feat_names', 'ps', 'mean_scores', 'std_scores', 'weighted_mean_scores', 'scores', 'fis_', 'coef_']
Available Methods: ['get_preds_dfs', 'get_fis', 'get_coef_', 'permutation_importance']
Evaluated with:
ProblemSpec(problem_type='regression',
scorer={'explained_variance': make_scorer(explained_variance_score),
'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False)},
subjects='all', target='t')
Don’t be discouraged that this didn’t predict well; we are, after all, trying to predict random noise from random noise …
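The returned BPtEvaluator also exposes helpers for digging into the results, e.g. the methods listed under ‘Available Methods’ above:

# Per-fold predictions and feature importances, via methods listed
# under 'Available Methods' in the output above
preds_dfs = results.get_preds_dfs()
fis = results.get_fis()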
[16]:
# These are the steps of the pipeline
fold0_pipeline = results.estimators[0]
for step in fold0_pipeline.steps:
    print(step[0])
loader_pipe0
loader_pipe1
linear regressor
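Individual fitted steps can be pulled out by name with plain Python, since steps is a list of (name, estimator) tuples (a small sketch):

# Grab the fitted left-hemisphere loader by its step name
steps = dict(fold0_pipeline.steps)
lh_loader_fitted = steps['loader_pipe0']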
Beyond inspecting pieces by hand, we can use helper methods like get_X_transform_df to view the data as transformed by the fitted loaders:
[17]:
results.get_X_transform_df(data, fold=0)
[17]:
|    | timeseries_rh_0 | timeseries_rh_1 | timeseries_rh_2 | timeseries_rh_3 | timeseries_rh_4 | timeseries_rh_5 | timeseries_rh_6 | timeseries_rh_7 | timeseries_rh_8 | timeseries_rh_9 | ... | timeseries_lh_585 | timeseries_lh_586 | timeseries_lh_587 | timeseries_lh_588 | timeseries_lh_589 | timeseries_lh_590 | timeseries_lh_591 | timeseries_lh_592 | timeseries_lh_593 | timeseries_lh_594 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.000165 | 0.000046 | -0.000077 | -0.000075 | 0.000074 | -0.000011 | -0.000049 | 0.000047 | -0.000024 | -0.000024 | ... | -8.290498e-06 | -0.000006 | -0.000023 | 1.610693e-06 | 0.000015 | -0.000006 | 4.867083e-06 | -1.215231e-04 | -0.000140 | -0.000048 |
| 1 | 0.000051 | 0.000027 | -0.000011 | -0.000003 | 0.000022 | 0.000033 | 0.000049 | 0.000072 | 0.000010 | -0.000014 | ... | 9.147214e-06 | -0.000033 | -0.000015 | 4.817195e-06 | 0.000001 | 0.000009 | -3.010718e-05 | 5.807162e-05 | -0.000070 | 0.000016 |
| 2 | -0.000019 | -0.000024 | -0.000004 | 0.000027 | -0.000054 | 0.000013 | 0.000064 | -0.000118 | -0.000065 | 0.000063 | ... | -8.021237e-06 | -0.000059 | 0.000004 | -1.018778e-05 | -0.000026 | -0.000003 | 1.120659e-05 | -3.874970e-05 | 0.000057 | -0.000008 |
| 3 | 0.000037 | 0.000027 | 0.000050 | 0.000080 | 0.000038 | 0.000009 | -0.000094 | -0.000117 | 0.000056 | -0.000005 | ... | 2.637188e-07 | -0.000015 | -0.000011 | -6.939784e-06 | 0.000022 | 0.000005 | -2.519195e-05 | 1.219129e-04 | 0.000021 | 0.000074 |
| 4 | -0.000030 | 0.000013 | -0.000048 | -0.000002 | 0.000043 | -0.000021 | -0.000021 | 0.000045 | 0.000015 | -0.000008 | ... | -4.193627e-05 | -0.000005 | -0.000038 | -1.579288e-05 | -0.000010 | 0.000007 | -2.074608e-05 | 1.288912e-04 | 0.000048 | 0.000015 |
| 5 | -0.000027 | 0.000012 | 0.000049 | -0.000040 | 0.000137 | -0.000020 | 0.000023 | 0.000057 | 0.000020 | 0.000018 | ... | -2.317345e-05 | 0.000047 | -0.000021 | -3.256373e-06 | 0.000013 | 0.000006 | -2.017995e-05 | 3.174790e-05 | -0.000044 | -0.000050 |
| 6 | -0.000003 | 0.000011 | 0.000037 | -0.000007 | 0.000026 | 0.000034 | 0.000007 | -0.000071 | -0.000019 | -0.000004 | ... | 1.230251e-05 | 0.000065 | 0.000008 | 8.041033e-07 | 0.000001 | -0.000026 | -1.401379e-05 | 2.662647e-05 | -0.000020 | 0.000032 |
| 7 | 0.000038 | 0.000019 | 0.000006 | 0.000017 | -0.000173 | 0.000027 | -0.000058 | 0.000120 | 0.000028 | -0.000029 | ... | -2.762708e-05 | 0.000019 | 0.000015 | -5.296039e-06 | -0.000021 | 0.000017 | -3.512035e-06 | -1.743649e-04 | 0.000015 | 0.000002 |
| 8 | -0.000009 | 0.000007 | 0.000034 | -0.000002 | 0.000032 | -0.000011 | -0.000021 | -0.000113 | 0.000040 | 0.000024 | ... | -1.286571e-06 | -0.000022 | -0.000027 | 2.031265e-05 | -0.000008 | 0.000035 | -5.331094e-06 | -5.483645e-05 | 0.000103 | -0.000014 |
| 9 | 0.000062 | -0.000022 | 0.000060 | 0.000010 | -0.000017 | 0.000012 | -0.000019 | 0.000093 | -0.000002 | 0.000028 | ... | -1.272615e-05 | 0.000027 | -0.000015 | -1.022682e-05 | -0.000044 | -0.000006 | 4.879025e-06 | 3.508208e-07 | -0.000069 | -0.000002 |
| 10 | 0.000019 | 0.000110 | 0.000062 | -0.000019 | 0.000011 | -0.000007 | -0.000059 | -0.000056 | 0.000022 | -0.000041 | ... | -1.971200e-05 | 0.000055 | 0.000020 | -5.049802e-06 | 0.000014 | 0.000014 | -4.576251e-07 | -3.902154e-05 | 0.000023 | -0.000025 |
| 11 | 0.000013 | -0.000036 | -0.000063 | -0.000026 | -0.000008 | -0.000007 | 0.000029 | -0.000117 | 0.000052 | 0.000013 | ... | 4.998446e-07 | -0.000018 | -0.000016 | -1.614390e-05 | 0.000006 | -0.000006 | 1.069373e-05 | -6.800519e-06 | 0.000029 | -0.000103 |
| 12 | -0.000033 | -0.000027 | 0.000066 | 0.000013 | 0.000021 | -0.000012 | 0.000061 | 0.000105 | 0.000020 | 0.000022 | ... | -3.358210e-06 | -0.000003 | -0.000018 | 2.135645e-05 | 0.000009 | 0.000002 | -1.748675e-05 | 2.181139e-04 | 0.000018 | -0.000078 |
| 13 | 0.000080 | -0.000046 | -0.000040 | 0.000033 | -0.000092 | 0.000013 | -0.000005 | -0.000085 | 0.000020 | 0.000096 | ... | -3.432920e-06 | 0.000038 | 0.000048 | 5.295833e-06 | 0.000013 | 0.000030 | 5.164307e-06 | -9.442774e-05 | -0.000010 | -0.000014 |
| 14 | 0.000077 | -0.000009 | -0.000118 | 0.000056 | -0.000049 | 0.000021 | -0.000036 | 0.000130 | -0.000081 | 0.000017 | ... | -9.383758e-06 | -0.000027 | -0.000019 | -2.622800e-06 | 0.000005 | 0.000009 | -1.135353e-05 | 1.509882e-05 | -0.000070 | -0.000058 |
| 15 | -0.000113 | -0.000045 | 0.000040 | 0.000020 | -0.000040 | -0.000010 | -0.000081 | 0.000031 | -0.000066 | 0.000002 | ... | 1.523565e-05 | -0.000071 | 0.000031 | -6.086060e-06 | -0.000013 | 0.000003 | 1.540947e-06 | 1.604218e-04 | 0.000140 | 0.000034 |

16 rows × 1190 columns