{ "cells": [ { "cell_type": "markdown", "id": "understanding-messaging", "metadata": {}, "source": [ "# Predict BMI\n", "\n", "This notebook shows a real-world example using BPt to study the relationship between BMI and the brain. The data used in this notebook cannot be made public, as it is from the ABCD Study, which requires a data use agreement.\n", "\n", "This notebook covers a number of different topics:\n", "\n", "- Preparing Data\n", "- Evaluating a single pipeline\n", "- Considering different options for how to use a test set\n", "- Introducing and using the Evaluate input option" ] }, { "cell_type": "code", "execution_count": 1, "id": "illegal-account", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import BPt as bp\n", "import numpy as np\n", "\n", "# Don't show sklearn convergence warnings\n", "from warnings import simplefilter\n", "from sklearn.exceptions import ConvergenceWarning\n", "simplefilter(\"ignore\", category=ConvergenceWarning)\n", "\n", "# Display tables up to five decimals\n", "pd.options.display.float_format = \"{:,.5f}\".format" ] }, { "cell_type": "markdown", "id": "approximate-swift", "metadata": {}, "source": [ "## Preparing Data" ] }, { "cell_type": "markdown", "id": "swiss-external", "metadata": {}, "source": [ "We will first load the underlying dataset for this project, which has been saved as a CSV file. It contains multi-modal change in ROI data from two timepoints of the ABCD Study (the difference between follow-up and baseline).\n", "\n", "This saved dataset doesn't include the real family ids, but a notable feature of the ABCD Study data is that a number of subjects come from the same family. We will handle that in this example (granted, with a fake family structure, which we will generate below) by ensuring that for any cross-validation split, members of the same family stay in the same training or testing fold." 
] }, { "cell_type": "code", "execution_count": 2, "id": "smart-defensive", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Unnamed: 0',\n", " 'src_subject_id',\n", " 'b_averaged_puberty',\n", " 'b_agemos',\n", " 'b_sex',\n", " 'b_race_ethnicity_categories',\n", " 'b_demo_highest_education_categories',\n", " 'b_site_id_l',\n", " 'b_subjects_no_missing_data',\n", " 'bmi_keep']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = pd.read_csv('data/structure_base.csv')\n", "list(data)[:10]" ] }, { "cell_type": "markdown", "id": "chinese-thirty", "metadata": {}, "source": [ "This dataset contains a number of columns we don't need. We will use the next cell to group the variables of interest together, and then select only the relevant columns to keep." ] }, { "cell_type": "code", "execution_count": 3, "id": "chubby-length", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(9724, 668)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Our target variable\n", "targets = ['b_bmi']\n", "\n", "# Columns with different traditional 'co-variates'.\n", "covars = ['b_sex', 'b_demo_highest_education_categories','b_race_ethnicity_categories',\n", " 'b_agemos', 'b_mri_info_deviceserialnumber']\n", "\n", "# Let's also note which of these are categorical\n", "cat_covars = ['b_mri_info_deviceserialnumber',\n", " 'b_demo_highest_education_categories',\n", " 'b_race_ethnicity_categories',\n", " 'b_sex']\n", "\n", "# These variables are any which we might want to use\n", "# but not directly as input features! E.g., we\n", "# might want to use them to inform choice of cross-validation.\n", "non_input = ['b_rel_family_id']\n", "\n", "# The different imaging features\n", "thick = [d for d in list(data) if 'thick' in d]\n", "area = [d for d in list(data) if 'smri_area_cort' in d]\n", "subcort = [d for d in list(data) if 'smri_vol' in d]\n", "dti_fa = [d for d in list(data) if 'dmri_dti_full_fa_' in d]\n", "dti_md = [d for d in list(data) if 'dmri_dti_full_md_' in d]\n", "brain = thick + area + subcort + dti_fa + dti_md\n", "\n", "# All to keep\n", "to_keep = brain + targets + covars + non_input\n", "\n", "data = data[to_keep]\n", "data.shape" ] }, { "cell_type": "markdown", "id": "editorial-vehicle", "metadata": {}, "source": [ "Now let's convert from a pandas DataFrame to a BPt Dataset." ] }, { "cell_type": "code", "execution_count": 4, "id": "mobile-berlin", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(9724, 668)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = bp.Dataset(data)\n", "\n", "# This is optional, to print some extra statements.\n", "data.verbose = 1\n", "data.shape" ] }, { "cell_type": "markdown", "id": "macro-matrix", "metadata": {}, "source": [ "Next, we perform some actions specific to the Dataset class. These include specifying which columns are 'target' and 'non input', with any column we don't assign to one of these roles treated as the default role, 'data'." ] }, { "cell_type": "code", "execution_count": 5, "id": "random-commissioner", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting NaN threshold to: 0.5\n", "Dropped 8 Rows\n" ] }, { "data": { "text/html": [ "
\n", "

Data

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_agemosb_demo_highest_education_categoriesb_mri_info_deviceserialnumberb_race_ethnicity_categoriesb_sexdmri_dti_full_fa_subcort_aseg_accumbens_area_lhdmri_dti_full_fa_subcort_aseg_accumbens_area_rhdmri_dti_full_fa_subcort_aseg_amygdala_lhdmri_dti_full_fa_subcort_aseg_amygdala_rhdmri_dti_full_fa_subcort_aseg_caudate_lh...smri_vol_scs_subcorticalgvsmri_vol_scs_suprateialvsmri_vol_scs_tplhsmri_vol_scs_tprhsmri_vol_scs_vedclhsmri_vol_scs_vedcrhsmri_vol_scs_wholebsmri_vol_scs_wmhintsmri_vol_scs_wmhintlhsmri_vol_scs_wmhintrh
01242HASHe4f6957a210.171950.173890.199230.166540.15518...52,440.00000935,475.835146,220.900005,787.400003,554.800003,427.900001,045,923.63514478.400000.000000.00000
11225HASH1314a204120.168970.164770.200000.177130.15848...62,550.000001,078,644.862578,222.600007,571.400003,872.500003,837.400001,197,394.06258769.700000.000000.00000
21143HASH69f406fa110.228810.252500.232210.222980.22545...60,695.000001,133,471.504607,040.500006,752.900004,588.800004,631.500001,291,126.704601,114.700000.000000.00000
31305HASH1314a204120.167370.188400.151350.162150.18848...65,614.000001,055,580.451878,365.700007,656.400004,600.100004,920.600001,189,841.051871,788.500000.000000.00000
41155HASHc3bf3d9c120.189020.213750.239450.205810.20100...59,174.000001,010,567.332706,577.900006,612.700003,434.300003,942.600001,144,069.732701,036.300000.000000.00000
\n", "

5 rows × 666 columns

\n", "
\n", "
\n", "

Target

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_bmi
015.17507
116.45090
224.43703
317.38701
417.59670
\n", "
\n", "
\n", "

Non Input

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_rel_family_id
02257
111328
27607
311324
47608
\n", "
\n" ], "text/plain": [ " smri_thick_cort_destrieux_g_and_s_frontomargin_lh \\\n", "0 2.34300 \n", "1 2.74300 \n", "2 2.45400 \n", "3 2.96600 \n", "4 2.48500 \n", "\n", " smri_thick_cort_destrieux_g_and_s_occipital_inf_lh \\\n", "0 2.43700 \n", "1 2.31000 \n", "2 2.80100 \n", "3 2.60500 \n", "4 2.68100 \n", "\n", " smri_thick_cort_destrieux_g_and_s_paracentral_lh \\\n", "0 2.41200 \n", "1 2.86600 \n", "2 2.70300 \n", "3 3.19700 \n", "4 2.65500 \n", "\n", " smri_thick_cort_destrieux_g_and_s_subcentral_lh \\\n", "0 2.81300 \n", "1 3.33100 \n", "2 2.45700 \n", "3 3.09100 \n", "4 2.60800 \n", "\n", " smri_thick_cort_destrieux_g_and_s_transv_frontopol_lh \\\n", "0 2.69700 \n", "1 3.07600 \n", "2 2.62100 \n", "3 3.25200 \n", "4 3.05000 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_ant_lh \\\n", "0 2.92300 \n", "1 3.24900 \n", "2 3.17200 \n", "3 3.12600 \n", "4 3.23100 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_mid_ant_lh \\\n", "0 2.91300 \n", "1 3.23000 \n", "2 3.13800 \n", "3 3.24100 \n", "4 2.95600 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_mid_post_lh \\\n", "0 2.75700 \n", "1 3.08100 \n", "2 2.84600 \n", "3 3.05700 \n", "4 2.84400 \n", "\n", " smri_thick_cort_destrieux_g_cingul_post_dorsal_lh \\\n", "0 3.18000 \n", "1 3.49000 \n", "2 3.43000 \n", "3 3.44800 \n", "4 3.15200 \n", "\n", " smri_thick_cort_destrieux_g_cingul_post_ventral_lh ... \\\n", "0 2.40700 ... \n", "1 2.38600 ... \n", "2 2.90700 ... \n", "3 3.21400 ... \n", "4 2.41400 ... 
\n", "\n", " dmri_dti_full_md_subcort_aseg_amygdala_rh \\\n", "0 0.67408 \n", "1 0.66106 \n", "2 0.73185 \n", "3 0.67879 \n", "4 0.71735 \n", "\n", " dmri_dti_full_md_subcort_aseg_accumbens_area_rh \\\n", "0 0.64020 \n", "1 0.63493 \n", "2 0.66539 \n", "3 0.61966 \n", "4 0.72282 \n", "\n", " dmri_dti_full_md_subcort_aseg_ventraldc_rh b_bmi b_sex \\\n", "0 0.48485 15.17507 1 \n", "1 0.51481 16.45090 2 \n", "2 0.55961 24.43703 1 \n", "3 0.50990 17.38701 2 \n", "4 0.56224 17.59670 2 \n", "\n", " b_demo_highest_education_categories b_race_ethnicity_categories b_agemos \\\n", "0 2 2 124 \n", "1 5 1 122 \n", "2 3 1 114 \n", "3 5 1 130 \n", "4 5 1 115 \n", "\n", " b_mri_info_deviceserialnumber b_rel_family_id \n", "0 HASHe4f6957a 2257 \n", "1 HASH1314a204 11328 \n", "2 HASH69f406fa 7607 \n", "3 HASH1314a204 11324 \n", "4 HASHc3bf3d9c 7608 \n", "\n", "[5 rows x 668 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set the target columns to the 'target' role\n", "data = data.set_role(targets, 'target')\n", "\n", "# Set the non-input columns to the 'non input' role\n", "data = data.set_role(non_input, 'non input')\n", "\n", "# Drop any subjects with missing values in the target variable\n", "data = data.drop_subjects_by_nan(scope='target')\n", "\n", "# We can optionally add the groups of columns we made as scopes!\n", "data.add_scope(covars, 'covars', inplace=True)\n", "data.add_scope(cat_covars, 'cat covars', inplace=True)\n", "data.add_scope(thick, 'thick', inplace=True)\n", "data.add_scope(area, 'area', inplace=True)\n", "data.add_scope(subcort, 'subcort', inplace=True)\n", "data.add_scope(dti_fa, 'dti_fa', inplace=True)\n", "data.add_scope(dti_md, 'dti_md', inplace=True)\n", "data.add_scope(brain, 'brain', inplace=True)\n", "\n", "# Drop any rows with NaN in any column.\n", "# Though BPt can generally handle NaN data fine,\n", "# this makes certain pieces easier for this example, as we don't have to worry\n", "# about imputation.\n", "data = data.dropna()\n", "\n", "# Just show the first few rows\n", "data.head()" ] }, { "cell_type": "markdown", "id": "civilian-display", "metadata": {}, "source": [ "The scopes we defined are useful, as they let us check columns or compose different scopes together. For example,\n", "we can check the 'cat covars' scope composed with another column as:" ] }, { "cell_type": "code", "execution_count": 6, "id": "unusual-rental", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['b_demo_highest_education_categories',\n", " 'b_mri_info_deviceserialnumber',\n", " 'b_race_ethnicity_categories',\n", " 'b_rel_family_id',\n", " 'b_sex']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.get_cols(['cat covars', 'b_rel_family_id'])" ] }, { "cell_type": "markdown", "id": "renewable-correction", "metadata": {}, "source": [ "These are notably the columns we want to make sure are categorical, so let's ordinalize them, then plot." ] }, { "cell_type": "code", "execution_count": 7, "id": "homeless-bahamas", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b_demo_highest_education_categories: 9478 rows\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "b_mri_info_deviceserialnumber: 9478 rows\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "b_race_ethnicity_categories: 9478 rows\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "b_rel_family_id: 9478 rows\n", "b_sex: 9478 rows\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/sage/anaconda3/envs/bpt/lib/python3.9/site-packages/BPt/dataset/helpers.py:167: UserWarning: Skipping plot: b_rel_family_id as >= categories!\n", " warnings.warn(as_str)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAF+CAYAAACidPAUAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVpUlEQVR4nO3deZRtZX3m8e/DLFzCqMikTBpDDCEM7UTSjoEQNZ20MbAwaIwDatIh2q2gBoMmsdW00SzSC8EYY8QxxqXBARFwjAHvJaAMIYxGAUGggQshhuHXf+y3yKGounUu1Lmn3qrvZ62zap9377P376117nPfes8+e6eqkCT1Y6NpFyBJWj8GtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwa2HJck1SZ497ToeiiRvTPL+RdzfHUn2assfTPJHi7jvk5P8wWLtT30zuLUsJflKkn9PsjbJ7UnWJDkuyeYz21TVn1TVy8bc14LbVdWqqrpqEWp/SZJvzNr3MVX1toe7by0PBreWs9+pqq2BnYHXAUcAn0+SxTxIkk0Wc3/SQgxuLYaDk1yS5P8l+askW6xr4yQ7Jjk9ya1Jbkny9SQbtXW7JPlUkh8luTrJ/2jt2yf5QZLnteerklyR5OiFiquqO6vqK8DzgacAv9z28YdJPtyWt0jy4SQ3t7q+nWSnJH8M/DxwUpsKOaltX0lek+Ry4PKRtn1GDr1jkjPbqP+rSR7bttujbXt/4M+M6pP8FHAy8JR2vFvb+gdMvSR5eev/LUk+m2SXkXWV5Jgkl7e+/MVi/2el6TK4tRiOAg4F9gYeD7x5ge1fB/wAeCSwE/BGoFp4/z1wIbAr8Czg2CSHVtUtwEuBU5M8Cvgz4IKq+tC4RVbVvwKrGYJ4thcD2wC7AzsAxwB3VdWbgK8zjN5XVdXvjLzmvwFPAvad55BHAW8DdgQuAE4bo8ZL27G/1Y637extkjwTeDvwQoa/Jr4HfGzWZs8FDgb2a9sdutCx1Q+DW4vhpKr6fgvXPwaOXGD7uxkC57FVdXdVfb2Gq50dDDyyqt5aVf/R5otPZZjioKq+BHwSOAs4HHjlQ6j1OmD7eWraAdinqu6tqjVVdfsC+3p7Vd1SVXfNs/5zVfW1qvox8CaGUfTuD6Hm2Y4CPlBV57d9H9/2vcfINv+7qm5t/1mdA+y/CMfVEmFwazF8f2T5e8Au823YvAu4AvhSkquSHNfaHwvs0v68v7VNE7yRYVQ+4xTgicAHq+rmh1DrrsAtc7T/DXAG8LEk1yV5Z5JNF9jX98ddX1V3tOMu9LsZxy4Mv+fRfd/M0LcZPxxZ/jdg1SIcV0uEwa3FMDqKfAzDqHZeVbW2ql5XVXsxzDu/NsmzGILu6qraduSxdVUdDpBkY4bg/hDw6lnzyQtqo90DGaY+Ztd0d1WdWFX7Ak9lmGqYmT+f79rHC10T+f7fS5JVDCP964A7W/OWI9s+ej32ex3Df3Iz+96K4a+Faxd4nZYJg1uL4TVJdkuyPcOUwMfXtXGS5ybZp31gdhtw
L3AfcB6wNskbkjwiycZJnpjk4PbSNzKE2ksZRu0famG+Tkm2TPJfgc+0Y3x+jm2ekeRn2v5uZ5g6ua+tvgHYa6HjzOHwJIck2Yxhrvsf25TSjxhC9kWtjy9l+Hxgxg3Abu11c/ko8FtJ9m+nN/4JcG5VXfMQalSHDG4tho8AXwKuAq4EFvriyeOALwN3AN8C/m9VnVNV9zKMdPcHrgZuAt4PbJPkQOC1wNFtu3cwhPhxD979/U5KspYhCN8DfAo4rKrum2PbRwN/yxDalwJfZZg+AXgv8IJ21syfL9C3UR8B3sIwRXIg8KKRdS8H/hfDFMdPA/8wsu5s4GLgh0lumr3Tqvoy8AetP9czhP4R61GXOhfvgCNJfXHELUmdMbg1ERmuA3LHHI8vTLs2qXdOlUhSZ5bUNRYOO+yw+uIXvzjtMiRpKZj3MgVLaqrkppse9AG6JGmWJRXckqSFGdyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdWVI3Utjq0XvWE37zxGmXIUmLYs27jn44L+/jetySpIUZ3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqzMSCO8kHktyY5KJJHUOSVqJJjrg/CBw2wf1L0oo0seCuqq8Bt0xq/5K0Uk19jjvJK5KsTrL6nn9bO+1yJGnJm3pwV9UpVXVQVR20yZZbT7scSVryph7ckqT1Y3BLUmcmeTrgR4FvAT+Z5AdJfntSx5KklWSTSe24qo6c1L4laSVzqkSSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1Jn1ju4k2yXZL9JFCNJWthYwZ3kK0l+Isn2wPnAqUnePdnSJElzGXfEvU1V3Q78GvChqnoS8OzJlSVJms+4wb1Jkp2BFwKnT7AeSdICxg3utwJnAFdW1beT7AVcPrmyJEnz2WScjarqk8AnR55fBfz3SRUlSZrfuB9OPj7JWUkuas/3S/LmyZYmSZrLuFMlpwLHA3cDVNV3gCMmVZQkaX5jTZUAW1bVeUlG2+5Z7GJ+arcdWP2uoxd7t5K0rIw74r4pyd5AASR5AXD9xKqSJM1r3BH3a4BTgCckuRa4GnjRxKqSJM1r3LNKrgKenWQrYKOqWjvZsiRJ81lncCd5UVV9OMlrZ7UDUFV+7V2SNrCFRtxbtZ9bT7oQSdJ41hncVfW+JBsDt1fVn22gmiRJ67DgWSVVdS9w5AaoRZI0hnHPKvlmkpOAjwN3zjRW1fkTqUqSNK9xg3v/9vOtI20FPHNRq5EkLWjc0wGfMelCJEnjGfciU9skeXeS1e3xf5JsM+niJEkPNu5X3j8ArGW4kcILgduBv5pUUZKk+Y07x713VY1ef/vEJBdM
oB5J0gLGHXHfleSQmSdJngbcNZmSJEnrMu6I+1XAX7d57QC3AC+ZVFGSpPmNe1bJBcDPJvmJ9vz2SRYlSZrfWME9z0WmbgPWtFCXJG0g485xHwQcA+zaHq8EDgNOTfL6CdUmSZrDuHPcuwEHVNUdAEneAnwO+AVgDfDOyZQnSZpt3BH3o4Afjzy/G9ipqu6a1S5JmrBxR9ynAecm+Ux7/jzgI+2OOJdMpDJJ0pzGPavkbUm+ADytNR1TVavb8lETqUySNKdxp0oAtmC4ocJ7ge8l2XNCNUmS1mHci0y9BXgDcHxr2hT48KSKkiTNb9wR968Cz6fdRKGqrsP7UErSVIwb3P9RVcVw8wTah5KSpCkYN7g/keR9wLZJXg58GXj/5MqSJM1n3LNK/jTJcxiuw/2TwAlVdeZEK5MkzWnca5W8o6reAJw5R5skaQMad6rkOXO0/dJiFiJJGs86R9xJXgW8GtgryXdGVm0NfHOShUmS5pbhZJF5Vg43TtgOeDtw3MiqtVV1y2IXs9+uj6jTX7nPYu9WkgB4zAnfnXYJ6yPzrVjniLuqbmO47vaRAEkexfANylVJVlXVvy5mlZKkhY37zcnnJbkcuBr4KnAN8IUJ1iVJmse4H07+EfBk4F+qak/gWcA/TqwqSdK8xg3uu6vqZmCjJBtV1TkMd8WRJG1g416P+9Ykq4CvAacluZF23RJJ0oa10OmA+wA7Ab8C3AX8PsP1tx8L/O7Eq5MkPchCUyXvYbgG951VdV9V3VNVfw18GvjDSRcnSXqwhYJ7p6p60ImPrW2PiVQkSVqnhYJ723Wse8Qi1iFJGtNCwb26Xcb1AZK8DFgzmZIkSeuy0FklxwKfTnIU/xnUBwGbMdwVR5K0gS30lfcbgKcmeQbwxNb8uao6e+KVSZLmNO6NFM4BzplwLZKkMYz7zUlJ0hJhcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM5MLLiT7J7knCSXJLk4ye9N6liStJJsMsF93wO8rqrOT7I1sCbJmVV1yQSPKUnL3sRG3FV1fVWd35bXApcCu07qeJK0UmyQOe4kewA/B5w7x7pXJFmdZPUtd967IcqRpK5NPLiTrAI+BRxbVbfPXl9Vp1TVQVV10PZbbTzpciSpexMN7iSbMoT2aVX1d5M8liStFJM8qyTAXwKXVtW7J3UcSVppJjnifhrwm8Azk1zQHodP8HiStCJM7HTAqvoGkEntX5JWKr85KUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOGNyS1BmDW5I6Y3BLUmcMbknqjMEtSZ0xuCWpMwa3JHXG4JakzhjcktQZg1uSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdWaTaRcwarOdf5rHnLB62mVI0pLmiFuSOmNwS1JnDG5J6ozBLUmdMbglqTMGtyR1xuCWpM4Y3JLUGYNbkjpjcEtSZwxuSeqMwS1JnTG4JakzBrckdcbglqTOpKqmXcP9kqwFLpt2HVOwI3DTtIuYgpXa
b1i5fbff47upqg6ba8WSupECcFlVHTTtIja0JKvt98qyUvtuvxeHUyWS1BmDW5I6s9SC+5RpFzAl9nvlWal9t9+LYEl9OClJWthSG3FLkhZgcEtSZ5ZEcCc5LMllSa5Icty063m4knwgyY1JLhpp2z7JmUkubz+3a+1J8uet799JcsDIa17ctr88yYun0Zf1kWT3JOckuSTJxUl+r7WvhL5vkeS8JBe2vp/Y2vdMcm7r48eTbNbaN2/Pr2jr9xjZ1/Gt/bIkh06pS+slycZJ/inJ6e35Sun3NUm+m+SCJKtb2+Tf71U11QewMXAlsBewGXAhsO+063qYffoF4ADgopG2dwLHteXjgHe05cOBLwABngyc29q3B65qP7dry9tNu28L9Htn4IC2vDXwL8C+K6TvAVa15U2Bc1ufPgEc0dpPBl7Vll8NnNyWjwA+3pb3bf8GNgf2bP82Np52/8bo/2uBjwCnt+crpd/XADvOapv4+30pdPwpwBkjz48Hjp92XYvQrz1mBfdlwM5teWeGLxsBvA84cvZ2wJHA+0baH7BdDw/gM8BzVlrfgS2B84EnMXxbbpPWfv97HTgDeEpb3qRtl9nv/9HtluoD2A04C3gmcHrrx7Lvd6tzruCe+Pt9KUyV7Ap8f+T5D1rbcrNTVV3fln8I7NSW5+t/17+X9ifwzzGMPFdE39t0wQXAjcCZDKPGW6vqnrbJaD/u72NbfxuwA332/T3A64H72vMdWBn9BijgS0nWJHlFa5v4+32pfeV9RaiqSrJsz8NMsgr4FHBsVd2e5P51y7nvVXUvsH+SbYFPA0+YbkWTl+S5wI1VtSbJ06dczjQcUlXXJnkUcGaSfx5dOan3+1IYcV8L7D7yfLfWttzckGRngPbzxtY+X/+7/L0k2ZQhtE+rqr9rzSui7zOq6lbgHIYpgm2TzAyQRvtxfx/b+m2Am+mv708Dnp/kGuBjDNMl72X59xuAqrq2/byR4T/r/8IGeL8vheD+NvC49in0ZgwfWHx2yjVNwmeBmU+LX8ww/zvTfnT7xPnJwG3tz6wzgF9Msl37VPoXW9uSlWFo/ZfApVX17pFVK6Hvj2wjbZI8gmFu/1KGAH9B22x232d+Jy8Azq5hgvOzwBHt7Is9gccB522QTjwEVXV8Ve1WVXsw/Ns9u6qOYpn3GyDJVkm2nllmeJ9exIZ4v097cr9Nxh/OcAbClcCbpl3PIvTno8D1wN0M81W/zTCPdxZwOfBlYPu2bYC/aH3/LnDQyH5eClzRHr817X6N0e9DGOb8vgNc0B6Hr5C+7wf8U+v7RcAJrX0vhgC6AvgksHlr36I9v6Kt32tkX29qv5PLgF+adt/W43fwdP7zrJJl3+/Wxwvb4+KZ7NoQ73e/8i5JnVkKUyWSpPVgcEtSZwxuSeqMwS1JnTG4JakzBreWnSSPTvKxJFe2ryJ/PsnjF3H/T0/y1MXan7S+DG4tK+1LQJ8GvlJVe1fVgQwXMNpp3a9cL08HDG5NjcGt5eYZwN1VdfJMQ1VdCHwjybuSXNSun/wbcP/o+fSZbZOclOQlbfmaJCcmOb+95gnt4lnHAL/frsH880l+ve33wiRf25Cd1crkRaa03DwRWDNH+68B+wM/C+wIfHvMkL2pqg5I8mrgf1bVy5KcDNxRVX8KkOS7wKE1XGxo28XohLQujri1UhwCfLSq7q2qG4CvAgeP8bqZC2WtYbjG+ly+CXwwycsZbgwiTZTBreXmYuDA9dj+Hh7472CLWet/3H7eyzx/oVbVMcCbGa7wtibJDutxfGm9Gdxabs4GNh+5qD1J9gNuBX6j3ezgkQy3lzsP+B6wb7sq3bbAs8Y4xlqGW7PN7H/vqjq3qk4AfsQDL9EpLTrnuLWsVFUl+VXgPUneAPw7w+2ljgVWMVzJrYDXV9UPAZJ8guGKflczXOFvIX8P/G2SXwF+l+GDyscxXP3trHYMaWK8OqAkdcapEknqjMEtSZ0xuCWpMwa3JHXG
4JakzhjcktQZg1uSOvP/AZv7Fi6pQmBZAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data = data.ordinalize([cat_covars, 'b_rel_family_id'])\n", "\n", "# Then plot just the categorical variables\n", "data.plot(scope='category', subjects='all', decode_values=True)" ] }, { "cell_type": "markdown", "id": "municipal-worcester", "metadata": {}, "source": [ "Let's plot the target variable as well." ] }, { "cell_type": "code", "execution_count": 8, "id": "thousand-nickname", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b_bmi: 9478 rows\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data.plot('target')" ] }, { "cell_type": "markdown", "id": "jewish-short", "metadata": {}, "source": [ "Okay, we note that there are some extreme outliers. Let's drop any target values more than 10 standard deviations from the mean." ] }, { "cell_type": "code", "execution_count": 9, "id": "numeric-helping", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 1 Rows\n" ] }, { "data": { "text/plain": [ "b_bmi 53.90845\n", "dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = data.filter_outliers_by_std(n_std=10, scope='target')\n", "data['target'].max()" ] }, { "cell_type": "code", "execution_count": 10, "id": "afraid-specialist", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b_bmi: 9477 rows\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data.plot('target')" ] }, { "cell_type": "markdown", "id": "comparable-market", "metadata": {}, "source": [ "Okay, this maximum seems much more reasonable. Let's also assume that there may be some extreme values present in the input data as well, and that these represent corrupted data that we therefore want to drop." ] }, { "cell_type": "code", "execution_count": 11, "id": "public-office", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 62 Rows\n", "Dropped 14 Rows\n" ] } ], "source": [ "# Apply the filter twice, to deal with outliers at multiple scales\n", "data = data.filter_outliers_by_std(n_std=10, scope='float')\n", "data = data.filter_outliers_by_std(n_std=10, scope='float')" ] }, { "cell_type": "markdown", "id": "verified-spray", "metadata": {}, "source": [ "Next, we consider splitting up our data with a global train and test split. This can be useful in some instances. Note that we also define a CV strategy, which says to perform the train-test split while keeping members of the same family in the same fold." ] }, { "cell_type": "code", "execution_count": 12, "id": "urban-edinburgh", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Performing test split on: 9401 subjects.\n", "random_state: 5\n", "Test split size: 0.2\n", "\n", "Performed train/test split\n", "Train size: 7481\n", "Test size: 1920\n" ] }, { "data": { "text/html": [ "
\n", "

Data

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_agemosb_demo_highest_education_categoriesb_mri_info_deviceserialnumberb_race_ethnicity_categoriesb_sexdmri_dti_full_fa_subcort_aseg_accumbens_area_lhdmri_dti_full_fa_subcort_aseg_accumbens_area_rhdmri_dti_full_fa_subcort_aseg_amygdala_lhdmri_dti_full_fa_subcort_aseg_amygdala_rhdmri_dti_full_fa_subcort_aseg_caudate_lh...smri_vol_scs_subcorticalgvsmri_vol_scs_suprateialvsmri_vol_scs_tplhsmri_vol_scs_tprhsmri_vol_scs_vedclhsmri_vol_scs_vedcrhsmri_vol_scs_wholebsmri_vol_scs_wmhintsmri_vol_scs_wmhintlhsmri_vol_scs_wmhintrh
0124126100.171950.173890.199230.166540.15518...52,440.00000935,475.835146,220.900005,787.400003,554.800003,427.900001,045,923.63514478.400000.000000.00000
112242010.168970.164770.200000.177130.15848...62,550.000001,078,644.862578,222.600007,571.400003,872.500003,837.400001,197,394.06258769.700000.000000.00000
2114213000.228810.252500.232210.222980.22545...60,695.000001,133,471.504607,040.500006,752.900004,588.800004,631.500001,291,126.704601,114.700000.000000.00000
313042010.167370.188400.151350.162150.18848...65,614.000001,055,580.451878,365.700007,656.400004,600.100004,920.600001,189,841.051871,788.500000.000000.00000
4115420010.189020.213750.239450.205810.20100...59,174.000001,010,567.332706,577.900006,612.700003,434.300003,942.600001,144,069.732701,036.300000.000000.00000
..................................................................
971711912210.163920.172220.199060.178290.15375...47,422.00000893,537.356066,066.300005,342.000002,808.400003,469.60000999,770.356061,152.400000.000000.00000
972012947010.234690.253380.259710.263210.22641...59,348.000001,070,071.008127,036.600006,669.100003,817.400004,103.400001,198,435.10812794.900000.000000.00000
972110820100.154690.202230.171450.153000.15900...63,328.000001,093,051.508978,123.200007,346.200003,992.800004,219.800001,213,030.80897805.500000.000000.00000
9722110217010.165300.224840.195240.160840.16671...57,037.00000997,648.582736,692.500006,437.900003,833.400003,887.700001,127,133.482731,477.900000.000000.00000
9723113426010.150500.160660.169820.164840.15355...61,090.00000989,701.562667,113.500006,835.300004,029.600003,826.000001,134,202.762662,304.800000.000000.00000
\n", "

9401 rows × 666 columns

\n", "

7481 rows × 666 columns - Train Set

1920 rows × 666 columns - Test Set

\n", "
\n", "

Target

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_bmi
015.17507
116.45090
224.43703
317.38701
417.59670
......
971716.26895
972031.95732
972114.19771
972224.42694
972318.25486
\n", "

9401 rows × 1 columns

\n", "

7481 rows × 1 columns - Train Set

1920 rows × 1 columns - Test Set

\n", "
\n", "

Non Input

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
b_rel_family_id
01589
17867
25174
37864
45175
......
97178070
97201814
97215719
97226412
97231740
\n", "

9401 rows × 1 columns

\n", "

7481 rows × 1 columns - Train Set

1920 rows × 1 columns - Test Set

\n" ], "text/plain": [ " smri_thick_cort_destrieux_g_and_s_frontomargin_lh \\\n", "0 2.34300 \n", "1 2.74300 \n", "2 2.45400 \n", "3 2.96600 \n", "4 2.48500 \n", "... ... \n", "9717 2.71000 \n", "9720 2.42100 \n", "9721 2.69000 \n", "9722 2.56400 \n", "9723 2.66500 \n", "\n", " smri_thick_cort_destrieux_g_and_s_occipital_inf_lh \\\n", "0 2.43700 \n", "1 2.31000 \n", "2 2.80100 \n", "3 2.60500 \n", "4 2.68100 \n", "... ... \n", "9717 2.45900 \n", "9720 2.63900 \n", "9721 2.71200 \n", "9722 2.52600 \n", "9723 2.91500 \n", "\n", " smri_thick_cort_destrieux_g_and_s_paracentral_lh \\\n", "0 2.41200 \n", "1 2.86600 \n", "2 2.70300 \n", "3 3.19700 \n", "4 2.65500 \n", "... ... \n", "9717 2.70200 \n", "9720 2.50500 \n", "9721 2.61900 \n", "9722 2.80000 \n", "9723 2.66100 \n", "\n", " smri_thick_cort_destrieux_g_and_s_subcentral_lh \\\n", "0 2.81300 \n", "1 3.33100 \n", "2 2.45700 \n", "3 3.09100 \n", "4 2.60800 \n", "... ... \n", "9717 2.79600 \n", "9720 3.20100 \n", "9721 2.86100 \n", "9722 3.10400 \n", "9723 3.11400 \n", "\n", " smri_thick_cort_destrieux_g_and_s_transv_frontopol_lh \\\n", "0 2.69700 \n", "1 3.07600 \n", "2 2.62100 \n", "3 3.25200 \n", "4 3.05000 \n", "... ... \n", "9717 2.56900 \n", "9720 2.69200 \n", "9721 2.96000 \n", "9722 2.89300 \n", "9723 2.96800 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_ant_lh \\\n", "0 2.92300 \n", "1 3.24900 \n", "2 3.17200 \n", "3 3.12600 \n", "4 3.23100 \n", "... ... \n", "9717 2.94600 \n", "9720 3.16600 \n", "9721 2.92900 \n", "9722 3.08100 \n", "9723 3.16700 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_mid_ant_lh \\\n", "0 2.91300 \n", "1 3.23000 \n", "2 3.13800 \n", "3 3.24100 \n", "4 2.95600 \n", "... ... \n", "9717 2.88900 \n", "9720 2.87400 \n", "9721 3.00500 \n", "9722 3.07900 \n", "9723 3.05800 \n", "\n", " smri_thick_cort_destrieux_g_and_s_cingul_mid_post_lh \\\n", "0 2.75700 \n", "1 3.08100 \n", "2 2.84600 \n", "3 3.05700 \n", "4 2.84400 \n", "... ... 
\n", "9717 2.72600 \n", "9720 2.79800 \n", "9721 2.88300 \n", "9722 2.86700 \n", "9723 2.97600 \n", "\n", " smri_thick_cort_destrieux_g_cingul_post_dorsal_lh \\\n", "0 3.18000 \n", "1 3.49000 \n", "2 3.43000 \n", "3 3.44800 \n", "4 3.15200 \n", "... ... \n", "9717 3.26200 \n", "9720 3.36600 \n", "9721 3.17500 \n", "9722 3.42400 \n", "9723 3.35500 \n", "\n", " smri_thick_cort_destrieux_g_cingul_post_ventral_lh ... \\\n", "0 2.40700 ... \n", "1 2.38600 ... \n", "2 2.90700 ... \n", "3 3.21400 ... \n", "4 2.41400 ... \n", "... ... ... \n", "9717 2.62300 ... \n", "9720 2.70900 ... \n", "9721 3.05500 ... \n", "9722 2.43600 ... \n", "9723 2.16800 ... \n", "\n", " dmri_dti_full_md_subcort_aseg_amygdala_rh \\\n", "0 0.67408 \n", "1 0.66106 \n", "2 0.73185 \n", "3 0.67879 \n", "4 0.71735 \n", "... ... \n", "9717 0.68056 \n", "9720 0.73108 \n", "9721 0.66988 \n", "9722 0.67194 \n", "9723 0.69006 \n", "\n", " dmri_dti_full_md_subcort_aseg_accumbens_area_rh \\\n", "0 0.64020 \n", "1 0.63493 \n", "2 0.66539 \n", "3 0.61966 \n", "4 0.72282 \n", "... ... \n", "9717 0.63070 \n", "9720 0.68669 \n", "9721 0.63930 \n", "9722 0.64405 \n", "9723 0.66166 \n", "\n", " dmri_dti_full_md_subcort_aseg_ventraldc_rh b_bmi b_sex \\\n", "0 0.48485 15.17507 0 \n", "1 0.51481 16.45090 1 \n", "2 0.55961 24.43703 0 \n", "3 0.50990 17.38701 1 \n", "4 0.56224 17.59670 1 \n", "... ... ... ... \n", "9717 0.53454 16.26895 1 \n", "9720 0.53526 31.95732 1 \n", "9721 0.51814 14.19771 0 \n", "9722 0.52716 24.42694 1 \n", "9723 0.50890 18.25486 1 \n", "\n", " b_demo_highest_education_categories b_race_ethnicity_categories \\\n", "0 1 1 \n", "1 4 0 \n", "2 2 0 \n", "3 4 0 \n", "4 4 0 \n", "... ... ... \n", "9717 1 2 \n", "9720 4 0 \n", "9721 2 1 \n", "9722 2 0 \n", "9723 4 0 \n", "\n", " b_agemos b_mri_info_deviceserialnumber b_rel_family_id \n", "0 124 26 1589 \n", "1 122 2 7867 \n", "2 114 13 5174 \n", "3 130 2 7864 \n", "4 115 20 5175 \n", "... ... ... ... 
\n", "9717 119 2 8070 \n", "9720 129 7 1814 \n", "9721 108 0 5719 \n", "9722 110 17 6412 \n", "9723 113 26 1740 \n", "\n", "[9401 rows x 668 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's say we want to keep family members in the same train or test split\n", "cv_strategy = bp.CVStrategy(groups='b_rel_family_id')\n", "\n", "# Test split\n", "data = data.set_test_split(.2, random_state=5, cv_strategy=cv_strategy)\n", "data" ] }, { "cell_type": "markdown", "id": "enormous-construction", "metadata": {}, "source": [ "## Single Training Set Evaluation" ] }, { "cell_type": "markdown", "id": "composed-inquiry", "metadata": {}, "source": [ "Our Dataset is now fully prepared. We can now define and evaluate a machine learning pipeline.\n", "We will start by considering a pipeline with a few steps:\n", "\n", "1. Winsorize just the brain data.\n", "2. Perform standard scaling on any 'float' type columns, 'brain' or 'covars'\n", "3. One Hot Encode any categorical features\n", "4. Fit an Elastic-Net Regression with nested random hyper-parameter search\n", "\n", "Let's also use one other feature of the toolbox: a custom cross-validation strategy. This is the same idea that we used when defining the train-test split, but now we want it to apply both during the evaluation and during the splits made when evaluating hyper-parameters." 
] }, { "cell_type": "code", "execution_count": 13, "id": "collect-tribe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Pipeline(steps=[Scaler(extra_params={'quantile_range': (2, 98)},\n", " obj='winsorize', scope='brain'),\n", " Scaler(obj='standard'),\n", " Transformer(obj='one hot encoder', scope='category'),\n", " Model(obj='elastic',\n", " param_search=ParamSearch(cv=CV(cv_strategy=CVStrategy(groups='b_rel_family_id')),\n", " n_iter=60),\n", " params=1)])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#1\n", "w_scaler = bp.Scaler('winsorize', quantile_range=(2, 98), scope='brain')\n", "\n", "#2\n", "s_scaler = bp.Scaler('standard', scope='float')\n", "\n", "#3\n", "ohe = bp.Transformer('one hot encoder', scope='category')\n", "\n", "#4\n", "param_search=bp.ParamSearch('RandomSearch', n_iter=60,\n", " cv=bp.CV(splits=3, cv_strategy=cv_strategy))\n", "elastic = bp.Model('elastic', params=1,\n", " param_search=param_search)\n", "\n", "# Now we can actually define the pipeline\n", "pipe = bp.Pipeline(steps=[w_scaler, s_scaler, ohe, elastic])\n", "pipe" ] }, { "cell_type": "markdown", "id": "ancient-ancient", "metadata": {}, "source": [ "Let's say we want to use a 5-fold CV to evaluate this model on just the training set. We first define the CV object the same way as for the param_search above, but this time with splits=5." ] }, { "cell_type": "code", "execution_count": 14, "id": "executive-infrastructure", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CV(cv_strategy=CVStrategy(groups='b_rel_family_id'), splits=5)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cv=bp.CV(splits=5, n_repeats=1, cv_strategy=cv_strategy)\n", "cv" ] }, { "cell_type": "markdown", "id": "elect-necklace", "metadata": {}, "source": [ "And we will make a problem_spec to store some common params. 
In this case the random state for reproducibility of results and the number of jobs to use." ] }, { "cell_type": "code", "execution_count": 15, "id": "cultural-logic", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ProblemSpec(n_jobs=8, random_state=10)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use problem spec just to store n jobs and random state\n", "ps = bp.ProblemSpec(n_jobs=8, random_state=10)\n", "ps" ] }, { "cell_type": "markdown", "id": "wired-charleston", "metadata": {}, "source": [ "Now we are ready to evaluate this pipeline, let's check an example using the function evaluate.\n", "\n", "We will set just one specific parameter to start, which is a scope of 'brain' to say\n", "we just want to run a model with just the brain features (i.e., not the co-variates)\n" ] }, { "cell_type": "code", "execution_count": 16, "id": "pleased-panic", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "779b2929beec46d4852596afd5e13ddd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Folds: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| scope | mean_scores_explained_variance | mean_scores_neg_mean_squared_error | std_scores_explained_variance | std_scores_neg_mean_squared_error | mean_timing_fit | mean_timing_score |
|---|---|---|---|---|---|---|
| covars | 0.11317 | -14.51761 | 0.01192 | 1.45243 | 4.73712 | 0.03311 |
| brain | 0.25656 | -12.15231 | 0.02394 | 1.10996 | 15.58119 | 0.05231 |
| all | 0.27908 | -11.78751 | 0.02294 | 1.10537 | 15.45347 | 0.07103 |
| thick | 0.10118 | -14.69564 | 0.02711 | 1.38971 | 8.06942 | 0.01529 |
| area | 0.02774 | -15.89653 | 0.00991 | 1.46288 | 7.71240 | 0.01417 |
| subcort | 0.05189 | -15.50615 | 0.00831 | 1.39405 | 4.38906 | 0.02439 |
| dti_fa | 0.10302 | -14.66791 | 0.00659 | 1.39366 | 8.24234 | 0.01827 |
| dti_md | 0.14541 | -13.96297 | 0.01127 | 1.21864 | 8.31585 | 0.01754 |
\n", "" ], "text/plain": [ " mean_scores_explained_variance mean_scores_neg_mean_squared_error \\\n", "scope \n", "covars 0.11317 -14.51761 \n", "brain 0.25656 -12.15231 \n", "all 0.27908 -11.78751 \n", "thick 0.10118 -14.69564 \n", "area 0.02774 -15.89653 \n", "subcort 0.05189 -15.50615 \n", "dti_fa 0.10302 -14.66791 \n", "dti_md 0.14541 -13.96297 \n", "\n", " std_scores_explained_variance std_scores_neg_mean_squared_error \\\n", "scope \n", "covars 0.01192 1.45243 \n", "brain 0.02394 1.10996 \n", "all 0.02294 1.10537 \n", "thick 0.02711 1.38971 \n", "area 0.00991 1.46288 \n", "subcort 0.00831 1.39405 \n", "dti_fa 0.00659 1.39366 \n", "dti_md 0.01127 1.21864 \n", "\n", " mean_timing_fit mean_timing_score \n", "scope \n", "covars 4.73712 0.03311 \n", "brain 15.58119 0.05231 \n", "all 15.45347 0.07103 \n", "thick 8.06942 0.01529 \n", "area 7.71240 0.01417 \n", "subcort 4.38906 0.02439 \n", "dti_fa 8.24234 0.01827 \n", "dti_md 8.31585 0.01754 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compare_scopes = bp.Compare(['covars', 'brain', 'all',\n", " 'thick', 'area', 'subcort',\n", " 'dti_fa', 'dti_md'])\n", "\n", "evaluators = bp.evaluate(pipeline=pipe,\n", " scope=compare_scopes,\n", " dataset=data,\n", " problem_spec=ps,\n", " cv=cv)\n", "\n", "# Look at a summary of the results. Note this option is only available after\n", "# a call to evaluate with a Compare object has been made!\n", "evaluators.summary()" ] }, { "cell_type": "markdown", "id": "whole-mason", "metadata": {}, "source": [ "We can also try another available method, which runs a pairwise t-test comparison between all of the compared options." ] }, { "cell_type": "code", "execution_count": 24, "id": "entertaining-beaver", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
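| | scope (1) | scope (2) | t_stat | p_val |
|---|---|---|---|---|
| 0 | covars | brain | -11.36825 | 0.00478 |
| 1 | covars | all | -14.32368 | 0.00193 |
| 2 | covars | thick | 0.86450 | 1.00000 |
| 3 | covars | area | 7.66506 | 0.02180 |
| 4 | covars | subcort | 9.07301 | 0.01145 |
| 5 | covars | dti_fa | 1.97369 | 1.00000 |
| 6 | covars | dti_md | -4.56856 | 0.14381 |
| 7 | brain | all | -15.67812 | 0.00135 |
| 8 | brain | thick | 30.46195 | 0.00010 |
| 9 | brain | area | 17.32117 | 0.00091 |
| 10 | brain | subcort | 17.40024 | 0.00090 |
| 11 | brain | dti_fa | 12.94696 | 0.00287 |
| 12 | brain | dti_md | 11.98864 | 0.00388 |
| 13 | all | thick | 31.58027 | 0.00008 |
| 14 | all | area | 19.16281 | 0.00061 |
| 15 | all | subcort | 20.49440 | 0.00047 |
| 16 | all | dti_fa | 15.58172 | 0.00139 |
| 17 | all | dti_md | 15.37322 | 0.00146 |
| 18 | thick | area | 5.10927 | 0.09713 |
| 19 | thick | subcort | 3.32372 | 0.40987 |
| 20 | thick | dti_fa | -0.14599 | 1.00000 |
| 21 | thick | dti_md | -3.56252 | 0.32946 |
| 22 | area | subcort | -2.76953 | 0.70498 |
| 23 | area | dti_fa | -11.38268 | 0.00476 |
| 24 | area | dti_md | -12.59938 | 0.00320 |
| 25 | subcort | dti_fa | -7.99938 | 0.01854 |
| 26 | subcort | dti_md | -31.20696 | 0.00009 |
| 27 | dti_fa | dti_md | -6.47580 | 0.04102 |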
" ], "text/plain": [ " scope (1) scope (2) t_stat p_val\n", "0 covars brain -11.36825 0.00478\n", "1 covars all -14.32368 0.00193\n", "2 covars thick 0.86450 1.00000\n", "3 covars area 7.66506 0.02180\n", "4 covars subcort 9.07301 0.01145\n", "5 covars dti_fa 1.97369 1.00000\n", "6 covars dti_md -4.56856 0.14381\n", "7 brain all -15.67812 0.00135\n", "8 brain thick 30.46195 0.00010\n", "9 brain area 17.32117 0.00091\n", "10 brain subcort 17.40024 0.00090\n", "11 brain dti_fa 12.94696 0.00287\n", "12 brain dti_md 11.98864 0.00388\n", "13 all thick 31.58027 0.00008\n", "14 all area 19.16281 0.00061\n", "15 all subcort 20.49440 0.00047\n", "16 all dti_fa 15.58172 0.00139\n", "17 all dti_md 15.37322 0.00146\n", "18 thick area 5.10927 0.09713\n", "19 thick subcort 3.32372 0.40987\n", "20 thick dti_fa -0.14599 1.00000\n", "21 thick dti_md -3.56252 0.32946\n", "22 area subcort -2.76953 0.70498\n", "23 area dti_fa -11.38268 0.00476\n", "24 area dti_md -12.59938 0.00320\n", "25 subcort dti_fa -7.99938 0.01854\n", "26 subcort dti_md -31.20696 0.00009\n", "27 dti_fa dti_md -6.47580 0.04102" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluators.pairwise_t_stats(metric='explained_variance')" ] }, { "cell_type": "markdown", "id": "announced-dodge", "metadata": {}, "source": [ "We can also explicitly compare two evaluators" ] }, { "cell_type": "code", "execution_count": 25, "id": "peaceful-florence", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " This method is designed to perform a statistical comparison\n", " between the results from the evaluation stored in this object\n", " and another instance of :class:`BPtEvaluator`. The statistics\n", " produced here are explained in:\n", " https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_stats.html\n", "\n", " .. 
note::\n", " In the case that the sizes of the training and validation sets\n", " at each fold vary dramatically, it is unclear if this\n", " statistics are still valid.\n", " In that case, the mean train size and mean validation sizes\n", " are employed when computing statistics.\n", "\n", " Parameters\n", " ------------\n", " other : :class:`BPtEvaluator`\n", " Another instance of :class:`BPtEvaluator` in which\n", " to compare which. The cross-validation used\n", " should be the same in both instances, otherwise\n", " statistics will not be generated.\n", "\n", " rope_interval : list or dict of\n", " | This parameter allows for passing in a custom\n", " region of practical equivalence interval (or rope interval)\n", " a concept from bayesian statistics. If passed as\n", " a list, this should be a list with two elements, describing\n", " the difference in score which should be treated as two\n", " models or runs being practically equivalent.\n", "\n", " | Alternatively, in the case of multiple underlying\n", " scorers / metrics. A dictionary, where keys correspond\n", " to scorer / metric names can be passed with a separate\n", " rope_interval for each. For example:\n", "\n", " ::\n", "\n", " rope_interval = {'explained_variance': [-0.01, 0.01],\n", " 'neg_mean_squared_error': [-1, 1]}\n", "\n", " This example would define separate rope regions depending\n", " on the metric.\n", "\n", " ::\n", "\n", " default = [-0.01, 0.01]\n", "\n", " Returns\n", " -------\n", " compare_df : pandas DataFrame\n", " | The returned DataFrame will generate separate rows\n", " for all overlapping metrics / scorers between the\n", " evaluators being compared. 
Further, columns with\n", " statistics of interest will be generated:\n", "\n", " - 'mean_diff'\n", " The mean score minus other's mean score\n", "\n", " - 'std_diff'\n", " The std minus other's std\n", "\n", " | Further, only in the case that the cross-validation\n", " folds are identical between the comparisons,\n", " the following additional columns will be generated:\n", "\n", " - 't_stat'\n", " Corrected paired ttest statistic.\n", "\n", " - 'p_val'\n", " The p value for the corrected paired ttest statistic.\n", "\n", " - 'better_prob'\n", " The probability that this evaluated option is better than\n", " the other evaluated option under a bayesian framework and\n", " the passed value of rope_interval. See sklearn example\n", " for more details.\n", "\n", " - 'worse_prob'\n", " The probability that this evaluated option is worse than\n", " the other evaluated option under a bayesian framework and\n", " the passed value of rope_interval. See sklearn example\n", " for more details.\n", "\n", " - 'rope_prob'\n", " The probability that this evaluated option is equivalent to\n", " the other evaluated option under a bayesian framework and\n", " the passed value of rope_interval. See sklearn example\n", " for more details.\n", "\n", " \n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
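| | mean_diff | std_diff | t_stat | p_val | better_prob | worse_prob | rope_prob |
|---|---|---|---|---|---|---|---|
| explained_variance | -0.14339 | -0.01202 | -11.36825 | 0.00017 | 0.00013 | 0.99977 | 0.00009 |
| neg_mean_squared_error | -2.36530 | 0.34247 | -8.09697 | 0.00063 | 0.00062 | 0.99936 | 0.00002 |
\n", "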
" ], "text/plain": [ " mean_diff std_diff t_stat p_val better_prob \\\n", "explained_variance -0.14339 -0.01202 -11.36825 0.00017 0.00013 \n", "neg_mean_squared_error -2.36530 0.34247 -8.09697 0.00063 0.00062 \n", "\n", " worse_prob rope_prob \n", "explained_variance 0.99977 0.00009 \n", "neg_mean_squared_error 0.99936 0.00002 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e1 = evaluators['scope=covars']\n", "e2 = evaluators['scope=brain']\n", "\n", "# Look at function docstring\n", "print(' ', e1.compare.__doc__)\n", "print()\n", "\n", "e1.compare(e2)" ] }, { "cell_type": "markdown", "id": "sufficient-poker", "metadata": {}, "source": [ "Okay so it looks like it ran all the different combinations, but how do we look at each indivudal evaluator / the full results?? There are few different ways to index the object storing the different evaluators, but its essentially a dictionary. We can index one run as follows:" ] }, { "cell_type": "code", "execution_count": 26, "id": "increasing-throw", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BPtEvaluator\n", "------------\n", "mean_scores = {'explained_variance': 0.11316930054010477, 'neg_mean_squared_error': -14.517611278919068}\n", "std_scores = {'explained_variance': 0.011916342728361404, 'neg_mean_squared_error': 1.4524277905958483}\n", "\n", "Saved Attributes: ['estimators', 'preds', 'timing', 'train_subjects', 'val_subjects', 'feat_names', 'ps', 'mean_scores', 'std_scores', 'weighted_mean_scores', 'scores', 'fis_', 'coef_']\n", "\n", "Available Methods: ['get_preds_dfs', 'get_fis', 'get_coef_', 'permutation_importance']\n", "\n", "Evaluated with:\n", "ProblemSpec(n_jobs=8, problem_type='regression', random_state=10,\n", " scope='covars',\n", " scorer={'explained_variance': make_scorer(explained_variance_score),\n", " 'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False)},\n", " subjects='train', target='b_bmi')" ] }, 
"execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluators['scope=covars']" ] }, { "cell_type": "code", "execution_count": 27, "id": "informational-honolulu", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b_demo_highest_education_categories=1 0.59780\n", "b_demo_highest_education_categories=2 0.62281\n", "b_demo_highest_education_categories=4 -0.79331\n", "b_demo_highest_education_categories=5 -1.12268\n", "b_mri_info_deviceserialnumber='HASH03db707f' 0.17417\n", "b_mri_info_deviceserialnumber='HASH11ad4ed5' -0.60833\n", "b_mri_info_deviceserialnumber='HASH1314a204' -0.70919\n", "b_mri_info_deviceserialnumber='HASH311170b9' -0.14871\n", "b_mri_info_deviceserialnumber='HASH3935c89e' -0.42419\n", "b_mri_info_deviceserialnumber='HASH4b0b8b05' 0.02187\n", "b_mri_info_deviceserialnumber='HASH5ac2b20b' -0.00810\n", "b_mri_info_deviceserialnumber='HASH5b0cf1bb' 0.02770\n", "b_mri_info_deviceserialnumber='HASH5b2fcf80' -0.32682\n", "b_mri_info_deviceserialnumber='HASH65b39280' 0.44976\n", "b_mri_info_deviceserialnumber='HASH69f406fa' 0.00030\n", "b_mri_info_deviceserialnumber='HASH6b4422a7' 0.09136\n", "b_mri_info_deviceserialnumber='HASH7911780b' 0.73788\n", "b_mri_info_deviceserialnumber='HASH7f91147d' -0.05698\n", "b_mri_info_deviceserialnumber='HASH96a0c182' -0.01863\n", "b_mri_info_deviceserialnumber='HASHa3e45734' -0.09858\n", "b_mri_info_deviceserialnumber='HASHb640a1b8' 0.56248\n", "b_mri_info_deviceserialnumber='HASHc3bf3d9c' 0.10516\n", "b_mri_info_deviceserialnumber='HASHd422be27' 0.00579\n", "b_mri_info_deviceserialnumber='HASHd7cb4c6d' 0.01132\n", "b_mri_info_deviceserialnumber='HASHdb2589d4' 0.00784\n", "b_mri_info_deviceserialnumber='HASHe4f6957a' -0.19978\n", "b_mri_info_deviceserialnumber='HASHfeb7e81a' 0.03217\n", "b_race_ethnicity_categories=1 -0.73047\n", "b_race_ethnicity_categories=2 0.87523\n", "b_race_ethnicity_categories=3 0.74217\n", "b_race_ethnicity_categories=4 -0.32284\n", "b_sex=1 
-0.08126\n", "b_sex=2 0.07723\n", "b_agemos 0.39502\n", "dtype: float64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note: with get_fis, if mean=True it will return the mean,\n", "# and only the non-null, non-zero features.\n", "evaluators['scope=covars'].get_fis(mean=True)" ] }, { "cell_type": "markdown", "id": "suburban-corner", "metadata": {}, "source": [ "Warning: This doesn't mean that all of these features were selected by every model! Remember this is the average over the 5 models trained, so a feature showing up here means it was selected in at least one of those models." ] }, { "cell_type": "code", "execution_count": 28, "id": "sexual-biodiversity", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys([Options(scope=covars), Options(scope=brain), Options(scope=all), Options(scope=thick), Options(scope=area), Options(scope=subcort), Options(scope=dti_fa), Options(scope=dti_md)])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# These are the different options\n", "evaluators.keys()" ] }, { "cell_type": "code", "execution_count": null, "id": "residential-signature", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "african-thanksgiving", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.1 64-bit ('bpt': conda)", "language": "python", "name": "python39164bitbptconda7805b3f5d58e4b658b79cb94739371e6" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 5 }