{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predict Waist Circumference with Diffusion Weighted Imaging\n", "\n", "This notebook uses diffusion weighted imaging data and subjects' waist circumference (in cm) from the ABCD Study.\n", "As input features, we will use restriction spectrum imaging (RSI) measures derived from diffusion weighted images. This notebook\n", "covers data loading as well as evaluation across a large number of different ML Pipelines. It may be useful\n", "for people looking for more examples of different Pipelines to try." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import BPt as bp\n", "import pandas as pd\n", "import os\n", "\n", "from warnings import simplefilter\n", "from sklearn.exceptions import ConvergenceWarning\n", "simplefilter(\"ignore\", category=ConvergenceWarning)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data needed\n", "\n", "Data is loaded from a large CSV file with all of the features from release 2 of the ABCD Study."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def load_from_rds(names, eventname='baseline_year_1_arm_1'):\n", "\n", "    # Load only the requested columns, treating the 777 / 999 codes as missing\n", "    data = pd.read_csv('data/nda_rds_201.csv',\n", "                       usecols=['src_subject_id', 'eventname'] + names,\n", "                       na_values=['777', 999, '999', 777])\n", "\n", "    # Keep only rows from the requested event\n", "    data = data.loc[data['eventname'] == eventname]\n", "    data = data.set_index('src_subject_id')\n", "    data = data.drop('eventname', axis=1)\n", "\n", "    # Obfuscate subject IDs for the public example\n", "    data.index = list(range(len(data)))\n", "\n", "    # Return the pandas DataFrame cast to a BPt Dataset\n", "    return bp.Dataset(data)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['subjectid',\n", " 'src_subject_id',\n", " 'eventname',\n", " 'anthro_1_height_in',\n", " 'anthro_2_height_in',\n", " 'anthro_3_height_in',\n", " 'anthro_height_calc',\n", " 'anthro_weight_cast',\n", " 'anthro_weight_a_location',\n", " 'anthro_weight1_lb']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read only the header row so we can look at all available columns\n", "all_cols = list(pd.read_csv('data/nda_rds_201.csv', nrows=0))\n", "all_cols[:10]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "294" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The target variable\n", "target_cols = ['anthro_waist_cm']\n", "\n", "# Non-input features, i.e., those that inform the analysis but are not model inputs\n", "non_input_cols = ['sex', 'rel_family_id']\n", "\n", "# We will use the RSI '_fiber.at' measures\n", "dti_cols = [c for c in all_cols if '_fiber.at' in c and 'rsi.' 
in c]\n", "len(dti_cols)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use the helper function defined at the start to load these features in as a Dataset." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(11875, 297)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = load_from_rds(target_cols + non_input_cols + dti_cols)\n", "data.shape" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# This is optional, but prints extra information when using the Dataset operations\n", "data.verbose = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to tell the Dataset which role each column has. See: https://sahahn.github.io/BPt/user_guide/role.html" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 2 Rows\n", "Dropped 6 Rows\n" ] }, { "data": { "text/html": [ "
\n", "<div>\n", "<p><strong>Data</strong></p>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>dmri_rsi.n0_fiber.at_allfib.lh</th><th>dmri_rsi.n0_fiber.at_allfib.rh</th><th>dmri_rsi.n0_fiber.at_allfibers</th><th>dmri_rsi.n0_fiber.at_allfibnocc.lh</th><th>dmri_rsi.n0_fiber.at_allfibnocc.rh</th><th>dmri_rsi.n0_fiber.at_atr.lh</th><th>dmri_rsi.n0_fiber.at_atr.rh</th><th>dmri_rsi.n0_fiber.at_cc</th><th>dmri_rsi.n0_fiber.at_cgc.lh</th><th>dmri_rsi.n0_fiber.at_cgc.rh</th><th>...</th><th>dmri_rsi.vol_fiber.at_scs.lh</th><th>dmri_rsi.vol_fiber.at_scs.rh</th><th>dmri_rsi.vol_fiber.at_sifc.lh</th><th>dmri_rsi.vol_fiber.at_sifc.rh</th><th>dmri_rsi.vol_fiber.at_slf.lh</th><th>dmri_rsi.vol_fiber.at_slf.rh</th><th>dmri_rsi.vol_fiber.at_tslf.lh</th><th>dmri_rsi.vol_fiber.at_tslf.rh</th><th>dmri_rsi.vol_fiber.at_unc.lh</th><th>dmri_rsi.vol_fiber.at_unc.rh</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>0.327623</td><td>0.323420</td><td>0.325957</td><td>0.340559</td><td>0.332364</td><td>0.347837</td><td>0.336072</td><td>0.306803</td><td>0.311347</td><td>0.304854</td><td>...</td><td>23672.0</td><td>13056.0</td><td>9648.0</td><td>9528.0</td><td>10152.0</td><td>11504.0</td><td>8384.0</td><td>8024.0</td><td>4968.0</td><td>7176.0</td></tr>\n",
"<tr><th>1</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>2</th><td>0.325374</td><td>0.311465</td><td>0.319027</td><td>0.341213</td><td>0.326334</td><td>0.346651</td><td>0.335362</td><td>0.288124</td><td>0.326416</td><td>0.300990</td><td>...</td><td>33112.0</td><td>19256.0</td><td>11928.0</td><td>8688.0</td><td>13144.0</td><td>15344.0</td><td>10488.0</td><td>10936.0</td><td>6904.0</td><td>9480.0</td></tr>\n",
"<tr><th>3</th><td>0.305095</td><td>0.304357</td><td>0.305170</td><td>0.315477</td><td>0.312866</td><td>0.313972</td><td>0.316729</td><td>0.288742</td><td>0.289166</td><td>0.290347</td><td>...</td><td>28480.0</td><td>16016.0</td><td>13024.0</td><td>11960.0</td><td>13600.0</td><td>14880.0</td><td>11416.0</td><td>10592.0</td><td>6952.0</td><td>8736.0</td></tr>\n",
"<tr><th>4</th><td>0.316860</td><td>0.315238</td><td>0.316399</td><td>0.328251</td><td>0.327259</td><td>0.333998</td><td>0.318162</td><td>0.294008</td><td>0.297800</td><td>0.299230</td><td>...</td><td>29904.0</td><td>17968.0</td><td>12720.0</td><td>11336.0</td><td>13528.0</td><td>15672.0</td><td>11096.0</td><td>11816.0</td><td>5912.0</td><td>7336.0</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>11870</th><td>0.335741</td><td>0.336048</td><td>0.336372</td><td>0.349806</td><td>0.347732</td><td>0.349966</td><td>0.345692</td><td>0.312056</td><td>0.324054</td><td>0.336676</td><td>...</td><td>28328.0</td><td>15400.0</td><td>9656.0</td><td>10080.0</td><td>11312.0</td><td>13496.0</td><td>8728.0</td><td>9176.0</td><td>4960.0</td><td>7392.0</td></tr>\n",
"<tr><th>11871</th><td>0.320563</td><td>0.317525</td><td>0.319429</td><td>0.327302</td><td>0.322161</td><td>0.333086</td><td>0.315482</td><td>0.308554</td><td>0.299938</td><td>0.298093</td><td>...</td><td>23792.0</td><td>13632.0</td><td>9928.0</td><td>8912.0</td><td>9152.0</td><td>12288.0</td><td>7128.0</td><td>8912.0</td><td>5744.0</td><td>7376.0</td></tr>\n",
"<tr><th>11872</th><td>0.327051</td><td>0.325386</td><td>0.326522</td><td>0.340918</td><td>0.334854</td><td>0.345435</td><td>0.335610</td><td>0.305720</td><td>0.308630</td><td>0.330612</td><td>...</td><td>28640.0</td><td>16384.0</td><td>9496.0</td><td>11216.0</td><td>12168.0</td><td>12312.0</td><td>9520.0</td><td>8952.0</td><td>4568.0</td><td>9056.0</td></tr>\n",
"<tr><th>11873</th><td>0.323579</td><td>0.319377</td><td>0.321805</td><td>0.334945</td><td>0.329433</td><td>0.332200</td><td>0.334017</td><td>0.304399</td><td>0.303831</td><td>0.307037</td><td>...</td><td>26216.0</td><td>14672.0</td><td>9408.0</td><td>8872.0</td><td>10960.0</td><td>12584.0</td><td>8880.0</td><td>9176.0</td><td>3696.0</td><td>6168.0</td></tr>\n",
"<tr><th>11874</th><td>0.383537</td><td>0.371483</td><td>0.377822</td><td>0.394413</td><td>0.373486</td><td>0.404684</td><td>0.374435</td><td>0.366270</td><td>0.415342</td><td>0.418280</td><td>...</td><td>26544.0</td><td>15624.0</td><td>9904.0</td><td>10360.0</td><td>9904.0</td><td>12216.0</td><td>7712.0</td><td>8000.0</td><td>5208.0</td><td>8816.0</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>11867 rows × 294 columns</p>\n", "</div>\n",
"<div>\n", "<p><strong>Target</strong></p>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>anthro_waist_cm</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>31.00</td></tr>\n",
"<tr><th>1</th><td>30.50</td></tr>\n",
"<tr><th>2</th><td>26.75</td></tr>\n",
"<tr><th>3</th><td>23.50</td></tr>\n",
"<tr><th>4</th><td>30.00</td></tr>\n",
"<tr><th>...</th><td>...</td></tr>\n",
"<tr><th>11870</th><td>26.00</td></tr>\n",
"<tr><th>11871</th><td>30.00</td></tr>\n",
"<tr><th>11872</th><td>19.00</td></tr>\n",
"<tr><th>11873</th><td>25.00</td></tr>\n",
"<tr><th>11874</th><td>32.00</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>11867 rows × 1 columns</p>\n", "</div>\n",
"<div>\n", "<p><strong>Non Input</strong></p>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>rel_family_id</th><th>sex</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>8780.0</td><td>F</td></tr>\n",
"<tr><th>1</th><td>10207.0</td><td>F</td></tr>\n",
"<tr><th>2</th><td>4720.0</td><td>M</td></tr>\n",
"<tr><th>3</th><td>3804.0</td><td>M</td></tr>\n",
"<tr><th>4</th><td>5358.0</td><td>M</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td></tr>\n",
"<tr><th>11870</th><td>3791.0</td><td>M</td></tr>\n",
"<tr><th>11871</th><td>2441.0</td><td>F</td></tr>\n",
"<tr><th>11872</th><td>7036.0</td><td>F</td></tr>\n",
"<tr><th>11873</th><td>6681.0</td><td>F</td></tr>\n",
"<tr><th>11874</th><td>7588.0</td><td>F</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>11867 rows × 2 columns</p>\n", "</div>\n", "
\n" ], "text/plain": [ " anthro_waist_cm dmri_rsi.n0_fiber.at_fx.rh \\\n", "0 31.00 0.246540 \n", "1 30.50 NaN \n", "2 26.75 0.146416 \n", "3 23.50 0.229894 \n", "4 30.00 0.192228 \n", "... ... ... \n", "11870 26.00 0.236385 \n", "11871 30.00 0.247628 \n", "11872 19.00 0.224581 \n", "11873 25.00 0.212500 \n", "11874 32.00 0.237271 \n", "\n", " dmri_rsi.n0_fiber.at_fx.lh dmri_rsi.n0_fiber.at_cgc.rh \\\n", "0 0.240964 0.304854 \n", "1 NaN NaN \n", "2 0.241515 0.300990 \n", "3 0.225981 0.290347 \n", "4 0.201559 0.299230 \n", "... ... ... \n", "11870 0.233723 0.336676 \n", "11871 0.244926 0.298093 \n", "11872 0.181725 0.330612 \n", "11873 0.225045 0.307037 \n", "11874 0.232131 0.418280 \n", "\n", " dmri_rsi.n0_fiber.at_cgc.lh dmri_rsi.n0_fiber.at_cgh.rh \\\n", "0 0.311347 0.255081 \n", "1 NaN NaN \n", "2 0.326416 0.230479 \n", "3 0.289166 0.204329 \n", "4 0.297800 0.249119 \n", "... ... ... \n", "11870 0.324054 0.245577 \n", "11871 0.299938 0.201506 \n", "11872 0.308630 0.251494 \n", "11873 0.303831 0.254441 \n", "11874 0.415342 0.241162 \n", "\n", " dmri_rsi.n0_fiber.at_cgh.lh dmri_rsi.n0_fiber.at_cst.rh \\\n", "0 0.244332 0.378148 \n", "1 NaN NaN \n", "2 0.232169 0.386119 \n", "3 0.200217 0.364850 \n", "4 0.248674 0.365131 \n", "... ... ... \n", "11870 0.259280 0.388121 \n", "11871 0.192919 0.386154 \n", "11872 0.265312 0.382624 \n", "11873 0.257420 0.380283 \n", "11874 0.263211 0.394936 \n", "\n", " dmri_rsi.n0_fiber.at_cst.lh dmri_rsi.n0_fiber.at_atr.rh ... \\\n", "0 0.388728 0.336072 ... \n", "1 NaN NaN ... \n", "2 0.400546 0.335362 ... \n", "3 0.365397 0.316729 ... \n", "4 0.359856 0.318162 ... \n", "... ... ... ... \n", "11870 0.388715 0.345692 ... \n", "11871 0.390666 0.315482 ... \n", "11872 0.379022 0.335610 ... \n", "11873 0.377752 0.334017 ... \n", "11874 0.414786 0.374435 ... 
\n", "\n", " dmri_rsi.vol_fiber.at_ifsfc.lh dmri_rsi.vol_fiber.at_fxcut.rh \\\n", "0 13224.0 2720.0 \n", "1 NaN NaN \n", "2 18080.0 2728.0 \n", "3 18768.0 2776.0 \n", "4 18112.0 2528.0 \n", "... ... ... \n", "11870 15848.0 2760.0 \n", "11871 14368.0 2728.0 \n", "11872 17616.0 2928.0 \n", "11873 16024.0 2728.0 \n", "11874 17096.0 1976.0 \n", "\n", " dmri_rsi.vol_fiber.at_fxcut.lh dmri_rsi.vol_fiber.at_allfibers \\\n", "0 1928.0 264000.0 \n", "1 NaN NaN \n", "2 2264.0 339840.0 \n", "3 1784.0 331024.0 \n", "4 2408.0 327192.0 \n", "... ... ... \n", "11870 2192.0 292304.0 \n", "11871 2072.0 271624.0 \n", "11872 2072.0 317816.0 \n", "11873 2032.0 286832.0 \n", "11874 1776.0 298280.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfibnocc.rh \\\n", "0 91368.0 \n", "1 NaN \n", "2 119280.0 \n", "3 114912.0 \n", "4 115360.0 \n", "... ... \n", "11870 101352.0 \n", "11871 98328.0 \n", "11872 113016.0 \n", "11873 97896.0 \n", "11874 105096.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfibnocc.lh dmri_rsi.vol_fiber.at_allfib.rh \\\n", "0 92144.0 133816.0 \n", "1 NaN NaN \n", "2 123784.0 166808.0 \n", "3 115472.0 167400.0 \n", "4 114592.0 165224.0 \n", "... ... ... \n", "11870 103848.0 145272.0 \n", "11871 94352.0 138504.0 \n", "11872 106280.0 162352.0 \n", "11873 97496.0 143704.0 \n", "11874 101088.0 152848.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfib.lh sex rel_family_id \n", "0 131856.0 F 8780.0 \n", "1 NaN F 10207.0 \n", "2 174480.0 M 4720.0 \n", "3 165336.0 M 3804.0 \n", "4 164176.0 M 5358.0 \n", "... ... ... ... 
\n", "11870 148776.0 M 3791.0 \n", "11871 134832.0 F 2441.0 \n", "11872 157344.0 F 7036.0 \n", "11873 144768.0 F 6681.0 \n", "11874 147488.0 F 7588.0 \n", "\n", "[11867 rows x 297 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = data.set_target(target_cols) # Note: we are doing data = data.func()\n", "data = data.set_non_input(non_input_cols)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few things to note right off the bat.\n", "\n", "1. The verbose output printed two statements about dropping rows. Columns with role 'non input' cannot contain any NaN / missing data, so those lines say that 2 rows with NaNs were dropped when setting the first non input column and 6 when setting the next.\n", "\n", "2. The values for sex are still 'F' and 'M'; we will handle that next.\n", "\n", "3. Some columns with role 'data' have missing values. We will handle that as well." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<div>\n", "<p><strong>Non Input</strong></p>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>rel_family_id</th><th>sex</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>7321</td><td>0</td></tr>\n",
"<tr><th>1</th><td>8634</td><td>0</td></tr>\n",
"<tr><th>2</th><td>3971</td><td>1</td></tr>\n",
"<tr><th>3</th><td>3139</td><td>1</td></tr>\n",
"<tr><th>4</th><td>4543</td><td>1</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td></tr>\n",
"<tr><th>11870</th><td>3128</td><td>1</td></tr>\n",
"<tr><th>11871</th><td>2111</td><td>0</td></tr>\n",
"<tr><th>11872</th><td>5907</td><td>0</td></tr>\n",
"<tr><th>11873</th><td>5594</td><td>0</td></tr>\n",
"<tr><th>11874</th><td>6238</td><td>0</td></tr>\n",
"</tbody>\n", "</table>\n", "<p>11867 rows × 2 columns</p>\n", "</div>\n", "
\n" ], "text/plain": [ " rel_family_id sex\n", "0 7321 0\n", "1 8634 0\n", "2 3971 1\n", "3 3139 1\n", "4 4543 1\n", "... ... ..\n", "11870 3128 1\n", "11871 2111 0\n", "11872 5907 0\n", "11873 5594 0\n", "11874 6238 0\n", "\n", "[11867 rows x 2 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We explicitly say this variable should be binary\n", "data.to_binary('sex', inplace=True)\n", "\n", "# We will ordinalize rel_family_id too\n", "data = data.ordinalize(scope='rel_family_id')\n", "\n", "data['non input']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next let's look into that NaN problem we saw before." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loaded NaN Info:\n", "There are: 332348 total missing values\n", "180 columns found with 1131 missing values (column name overlap: ['dmri_rsi.n', '_fiber.at_'])\n", "66 columns found with 1130 missing values (column name overlap: ['dmri_rsi.n', '_fiber.at_'])\n", "42 columns found with 1128 missing values (column name overlap: ['dmri_rsi.vol_fiber.at_'])\n", "6 columns found with 1133 missing values (column name overlap: ['_fiber.at_cgh.lh', 'dmri_rsi.n'])\n", "\n" ] } ], "source": [ "data.nan_info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that subjects with missing data are missing it across nearly all columns: every column has roughly the same number of missing values (1128 to 1133). If the info above had found columns with only a few missing values, we might want to handle them differently, but here, when data is missing, it is missing for essentially all columns.\n", "\n", "Below, we simply drop any subject with any NaN across the target variable and the data." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 1145 Rows\n" ] } ], "source": [ "data = data.drop_nan_subjects(scope='all')" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Another thing we need to worry about with data like this is corrupted data, i.e., data with values that don't make sense due to a failure in the automatic processing pipeline. Let's look at the target variable first, then the data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "anthro_waist_cm: 10722 rows\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAuIklEQVR4nO3deZwcV33v/c93dm0zkkajffWK5Q3bQpjrsITFmABWuBiw2QlcQoJJ2G4eh5twwUAe4EkgJDiExYBZgm2cAA4YDNhgwNjG8iZbXmVZuyyNllk00uy/54+qlluj1kzPqHu6e+b7fr3mNd1V1dWne2r62+ecqnMUEZiZmQ1VVeoCmJlZeXJAmJlZTg4IMzPLyQFhZmY5OSDMzCwnB4SZmeXkgDAzs5wcEFY0kkLSSaUuRz4kHZB0QqnLUQiSlqavp7pA+/t3SX+f3n6RpG2F2G+6v+dLeqxQ+7PCckBYQUj6taR3lbocYxUR0yNi43DbFPrDcSwkvV3SQBoAByQ9Jekbkk7JbBMRW9LXM5DHvn430nNGxHsi4hMFKv8RXxoi4rcRcWoh9m2F54CwsiCpptRlqCB3RMR0oAl4KXAIuEfSGYV+okLVQqwyOSDsMElXSHpSUqekhyW9Jmvd2yX9TtI/StqffnN9RbruU8DzgS+m32q/mLXbl0p6QlKbpKskKWt/t0v6vKS9wMckNUn6lqRWSZsl/Z2kYY/RdLvz0ttvSr+hnp7ef6ekH6a3V0u6Iy3HTklflFSXtZ/D32wl/Un6+jslbZf0YUnTgJ8CC7O+vS8cplzVkj6S9X7eI2lJ1nP9Zfq+dEr6hKQTJf1eUoek67PLdiwRMRART0bEXwK3AR9L9788fY6arPd6Y/pcT6Xv02nAvwPPS19LW7rtNyV9SdJNkrqAP06XfXLI6/uIpD2SNkl6U9byI2qS2bUUSb9JFz+QPucbhtbKJJ2W7qNN0npJF2et+2Z6DP0kfS13STpxpPfJjkNE+Mc/RATA64CFJF8c3gB0AQvSdW8H+oD/BVQDfwHsAJSu/zXwriH7C+DHwExgKdAKXJS1v37gfUANMAX4FvAjYAawHHgceOcIZf4W8KH09leAJ4G/yFr3gfT2ecD56XMtBx4B3j+krCelt3cCz09vzwLOTW+/CNiW53v5v4EHgVMBAWcDzVnP9SOgETgd6AFuAU4gqRU8DLztGPt9O/C7HMv/DNiV3l6ePkcNMA3oAE5N1y0ATj/WvoBvAu3ABelx0JAu+2TWe9APfA6oB16YHieZ/R9xHAx9juz3eeh7CtQCG4CPAHXAi4HOrH1/E9gLrE5f23eBa0v9fzORf1yDsMMi4vsRsSMiBiPiOuAJkn/GjM0R8dVI2ravIfmwmTfCbj8dEW0RsQX4FfDsrHU7IuJfI6If6AUuBf42IjojYhPwT8BbRtj/bSQfUpDUYv7frPsvTNcTEfdExJ0R0Z/u+8tZ2w3VB6yU1BgR+yPi3hHKkMu7gL+LiMci8UBE7M1a/9mI6IiI9cBDwM8jYmNEtJPUVM4Z
5fPtAGYfY90gcIakKRGxM33O4fwoIm5Pj4PuY2zz9xHRExG3AT8BXj/K8uZyPjCd5JjpjYhbSb5gXJa1zQ8i4g/pMfNdjjyerMAcEHaYpLdKuj+t3rcBZwBzsjZ5OnMjIg6mN6ePsNuns24fHLL91qzbc0i+QW7OWrYZWDTC/m8Dni9pAUnN5nrgAknLSb6N3w8g6RRJP5b0tKQO4B848rVley3wJ8BmSbdJet4IZchlCUlt5lh2Zd0+lOP+SO/rUIuAfUMXRkQXSW3wPcDOtHnmWSPsa+sI6/en+83YTFLzPF4Lga0RMThk39nHwHDHkxWYA8IAkLQM+CpwOUlTyEySb7bKcxdjGTc++zF7SL65L8tathTYPuwOIjaQfFC8D/hNRHSQfIi8m6RpI/Nh8yXgUeDkiGgkacbI+doi4u6IWAPMBX5IEjpDyzuSrcB4to+/BvhtrhURcXNEvIykxvcoyd8Zjv16Rnqds9I+mYylJDUYSJqbpmatmz/CvrLtAJYM6Xca8Riw4nFAWMY0kg+GVgBJ7yCpQeRrF0kb+pikzVbXA5+SNCMNrA8C38nj4beRBNtt6f1fD7kPSb9GB3Ag/Qb9F7l2JKku7cRtioi+9DGZkNkFNEtqyqNMXwM+IelkJc6S1JzH4/KWdoSvkPSvJG35H8+xzTxJa9IP9B7gAEe+nsX5dIjn8PH0vXo+8Crg++ny+4H/KWlq2un/ziGPG+44uYsk7P9GUq2kFwGvBq4dQ/msABwQBkBEPEzS5n8HyT/xmcDto9jFF4BLlJzh9C9jLMb7SL6BbgR+B/wH8PU8HncbSQD85hj3AT4MvJGk0/OrwHXD7O8twKa0Keo9wJsAIuJR4HvAxrQZbrhmlc+RBN7PSULmapKO+EJ4nqQD6X5/TdLZ/ZyIeDDHtlUkQbuDpAnqhTwTjrcC64GnJe0ZxfM/DexP9/ld4D3pewPweZL+pF0k/VTfHfLYjwHXpO/fEf0WEdFLEgivIKlR/hvw1qx92zjLnIFiZmZ2BNcgzMwsJweElT0lYwEdyPHz7yUu10+PUa6PlLJcZoXiJiYzM8tpwox/M2fOnFi+fHmpi2FmVlHuueeePRHRkmvdhAmI5cuXs3bt2lIXw8ysokjafKx17oMwM7OcHBBmZpaTA8LMzHJyQJiZWU4OCDMzy8kBYWZmOTkgzMwsJweEFZSvzDebOCbMhXJWOhHBZ29+jJ+s28nAYPCrD7+Iuhp/9zCrdP4vtuO2s72bL/36SYJge9sh1m1rK3WRzKwAHBB23B7a3g7Axy8+HQl+/+TeEpfIzAqhqAEh6SJJj0naIOmKHOtfIOleSf2SLsla/mxJd0haL2mdpDcUs5x2fB7a3k6V4HknzOG0+Y3c4YAwmxCKFhCSqoGrSKYPXAlcJmnlkM22AG8nmVoy20GSqQZPBy4C/lnSzGKV1Y7PQzs6OGnudKbUVfO8E5u5Z8t+uvsGSl0sMztOxaxBrAY2RMTGdK7Za4E12RtExKaIWMczk6hnlj8eEU+kt3cAu4Gcw9Fa6T24vZ0zFjUB8LwTmuntH+S+LW2lLZSZHbdiBsQiYGvW/W3pslGRtBqoA57Mse7dktZKWtva2jrmgtrY7e7oprWzhzMWJgGx+oTZVAnuesrNTGaVrqw7qSUtAL4NvCMiBoeuj4ivRMSqiFjV0uIKRik8tCPpoD5zcRIQjQ21LJ41lSdbu0pZLDMrgGIGxHZgSdb9xemyvEhqBH4C/J+IuLPAZbMCeWh7BxKsXNB4eNmy5qls3uuAMKt0xQyIu4GTJa2QVAdcCtyYzwPT7X8AfCsibihiGe04bdl3kPmNDUyrf+aayyQgDpawVGZWCEULiIjoBy4HbgYeAa6PiPWSrpR0MYCk50jaBrwO+LKk9enDXw+8AHi7pPvTn2cXq6w2drs6upnb2HDEsuXN02g/1Efbwd4SlcrMCqGoQ21ExE3ATUOWfTTr9t0kTU9DH/cd4DvFLJsVxu6O
HpY1Tz1i2dLZyf3New8yc2pdKYplZgVQ1p3UVv52dXYzb2gNYs40ADa5H8KsojkgbMy6+wZoO9jHvMb6I5ZnahBb3A9hVtEcEDZmrZ09AEf1QTTUVjO/sYFNDgiziuaAsDHb1dENcFQTE/hUV7OJwAFhY7arI61BzKg/at2y5qls3ucahFklc0DYmA1fg5hGa2cPB3v7x7tYZlYgDggbs12d3dRWi1lTa49at2jmFCCZTMjMKpMDwsZsd0cPc2c0IOmodZlaxdMOCLOK5YCwMdvV0X3UKa4Z85scEGaVzgFhY5YExNH9DwDzMzWIDgeEWaVyQNiY7e7oOWZATKmrpmlKrWsQZhXMAWFjcrC3n86efuYeo4kJklqEaxBmlcsBYWOy90AyUuucacMERFPD4VNhzazyOCBsTNoP9QEwM8cprhnzGxt8mqtZBXNA2JjsT+d6GG4473lNDew50EPfwFGzxZpZBXBA2Ji0HRy5BrGgqYGIZwb1M7PK4oCwMWk7XIMYvokJfDW1WaVyQNiYHK5BTBmmiSkNCHdUm1UmB4SNSduhPqbVVVNXc+xDaIGvpjaraA4IG5P9B3tHnG965tRa6mqqfC2EWYVyQNiYtB/so2nKsfsfACQxr7HeTUxmFcoBYWPSdqiPWdOGDwiAuTMa2N3hs5jMKpEDwsZk/8HeYTuoM+Y11rO70zUIs0rkgLAxaT/YN+wprhmuQZhVLgeEjVpE0HYov4BomVFPZ08/h3oHxqFkZlZIRQ0ISRdJekzSBklX5Fj/Akn3SuqXdMmQdW+T9ET687ZiltNGp7Onn4HByKuJae6MZDA/NzOZVZ6iBYSkauAq4BXASuAySSuHbLYFeDvwH0MeOxv4v8BzgdXA/5U0q1hltdFpz2OYjYy56cVyuz3chlnFKWYNYjWwISI2RkQvcC2wJnuDiNgUEeuAoaO5vRz4RUTsi4j9wC+Ai4pYVhuFfAbqyzhcg3A/hFnFKWZALAK2Zt3fli4r2GMlvVvSWklrW1tbx1xQG53MMBuz8qhBzDtcg3ATk1mlqehO6oj4SkSsiohVLS0tpS7OpNGWx1wQGbOm1lJbLTcxmVWgYgbEdmBJ1v3F6bJiP9aKLDOSa1MendSSaJnuq6nNKlExA+Ju4GRJKyTVAZcCN+b52JuBCyXNSjunL0yXWRnIZy6IbC2NDZ4TwqwCFS0gIqIfuJzkg/0R4PqIWC/pSkkXA0h6jqRtwOuAL0tanz52H/AJkpC5G7gyXWZloO1gH9Pra6itzu/wmTuj3p3UZhWoppg7j4ibgJuGLPto1u27SZqPcj3268DXi1k+G5u2Q70jDtSXbe6MetZucr6bVZqK7qS20ujs7mdGQ/7fLebOaGD/wT56+z03tVklcUDYqHV299HYkH8NYl5jci1E6wE3M5lVEgeEjdqoaxBpQPhMJrPK4oCwUevo7ht1ExP4amqzSuOAsFFLahCj66QGaPXV1GYVxQFhoxIRo25iap5eT5U8YJ9ZpXFA2Kgc6htgYDBoHMVprtVVonm6r4UwqzQOCBuVzu5+gFHVIMBTj5pVIgeEjUpndzLMxmj6ICCdetRNTGYVxQFho9IxxhrE3Bn17HITk1lFcUDYqHSkQ303jiEg9nb10D/gq6nNKoUDwkblmT6I0TUxtTQ2EAF7u3qLUSwzKwIHhI1KJiBGM9QGeOpRs0rkgLBReaaTevRNTOCpR80qiQPCRqWzu5/qKjG1rnpUj3tmbmrXIMwqhQPCRqWzO5ksSNKoHjdnugfsM6s0DggbldEOs5FRV1PF7Gl1rkGYVRAHhI1KMpLr6DqoM+bOqGdXu2sQZpXCAWGj0jHGGgTAwplTeNpNTGYVwwFho9LZ3T/qU1wz5jc1sNM1CLOK4YCwUUmmGx1jDaKpgX1dvXT3DRS4VGZWDA4IG5WxdlIDzG+aAsDTrkWYVQQHhOUtIjjQM7rZ5LItbEquhXAzk1llcEBY3g72JpMFjb0GkQmIQ4Us
lpkVSVEDQtJFkh6TtEHSFTnW10u6Ll1/l6Tl6fJaSddIelDSI5L+tpjltPx0jHEuiIwFaROTaxBmlaFoASGpGrgKeAWwErhM0sohm70T2B8RJwGfBz6TLn8dUB8RZwLnAX+eCQ8rnbHOJpcxpa6amVNrXYMwqxDFrEGsBjZExMaI6AWuBdYM2WYNcE16+wbgJUrGcAhgmqQaYArQC3QUsayWh8xAfaOZj3qoBU1T3EltViGKGRCLgK1Z97ely3JuExH9QDvQTBIWXcBOYAvwjxGxb+gTSHq3pLWS1ra2thb+FdgRxjqbXLYFTQ3saHNAmFWCcu2kXg0MAAuBFcCHJJ0wdKOI+EpErIqIVS0tLeNdxknnmbkgji8gfDW1WWUoZkBsB5Zk3V+cLsu5Tdqc1ATsBd4I/Cwi+iJiN3A7sKqIZbU8dB5nJzUkAeGL5cwqQzED4m7gZEkrJNUBlwI3DtnmRuBt6e1LgFsjIkialV4MIGkacD7waBHLank43k5q8JlMZpWkaAGR9ilcDtwMPAJcHxHrJV0p6eJ0s6uBZkkbgA8CmVNhrwKmS1pPEjTfiIh1xSqr5afjUB/VVWJK7egmC8q2cGYSEDvafCaTWbkb+1fBPETETcBNQ5Z9NOt2N8kprUMfdyDXciutzDAbo50sKNuS2UlAbN13sFDFMrMiKddOaitDyUB9Y+9/AJjf2EB1ldi63wFhVu4cEJa34xmoL6OmuoqFMxvYtt9NTGblzgFheStEQAAsmTXVTUxmFcABYXk7nulGsy2eNYWtrkGYlb28AkLSf0l6pSQHyiRWyBpEa2ePr4UwK3P5fuD/G8nFa09I+rSkU4tYJitTHQXopAZYMnsqgPshzMpcXgEREb+MiDcB5wKbgF9K+r2kd0g6/k8MK3uDg5nJgo6/BrF4Vnqqq89kMitreTcZSWoG3g68C7gP+AJJYPyiKCWzstLV208ErkGYTSJ5fR2U9APgVODbwKsjYme66jpJa4tVOCsfhRhmI6Nlej11NVVs85lMZmUt3//2r6ZXRR8mqT4ieiLCg+hNAs8ExPHXIKqqxOKZU9zEZFbm8m1i+mSOZXcUsiBW3p4ZybUwo7MsbZ7K5r0OCLNyNux/u6T5JJP6TJF0DpAZhKcRmFrkslkZKWQTE8Dy5mms3bSfiDiusZ3MrHhG+m9/OUnH9GLgc1nLO4GPFKlMVoY6CjAXRLZlzVM50NPP3q5e5kyvL8g+zaywhg2IiLgGuEbSayPiP8epTFaGOgowm1y25XOmAbBpT5cDwqxMjdTE9OaI+A6wXNIHh66PiM/leJhNQJk+iMYphalBLG9OA2LvQVYtn12QfZpZYY30dXBa+nt6sQti5a2zu5/aalFfU5jRVhbPmkJ1ldi8t6sg+zOzwhupienL6e+Pj09xrFx1pgP1FapDuba6isWzpvDUHgeEWbnKd7C+z0pqlFQr6RZJrZLeXOzCWfko1EB92ZY1T/OprmZlLN/2ggsjogN4FclYTCcB/7tYhbLy03Gor+ABsbx5Kpv2dhERBd2vmRVGvgGR+WR4JfD9iGgvUnmsTHV29zOjvrDjMi5rnkZndz/7unoLul8zK4x8A+LHkh4FzgNukdQCdBevWFZuOrv7aZxS2BrEijnJtZab3FFtVpbyHe77CuB/AKsiog/oAtYUs2BWXjoLNJtcthVzkpPjNrY6IMzK0Wi+Ej6L5HqI7Md8q8DlsTJVjE7qJbOmUFstnnRAmJWlfIf7/jZwInA/kJknMnBATAqDg8GB3v6C1yBqqqtY3jyNJ1sPFHS/ZlYY+X4lXAWsDJ9uMikdODxZUGFrEAAntExjw24HhFk5yreT+iFg/mh3LukiSY9J2iDpihzr6yVdl66/S9LyrHVnSbpD0npJD0pqGO3zW2F0HCrsUN/ZTmyZzua9B+kbGCz4vs3s+OT7Hz8HeFjSH4CezMKIuPhYD5BUDVwFvAzYBtwt6caIeDhrs3cC+yPiJEmXAp8B
3pD2c3wHeEtEPJBOd9o3mhdmhVPIyYKGOrFlOv2DwZZ9BzmxxSO6mJWTfAPiY2PY92pgQ0RsBJB0LcmZT9kBsSZr3zcAX1QylsOFwLqIeAAgIvaO4fmtQDoPj+Ra+IA4oSUZ7mtja5cDwqzM5Hua620kV1DXprfvBu4d4WGLgK1Z97ely3JuExH9QDvQDJwChKSbJd0r6W9yPYGkd0taK2lta2trPi/FxqDQs8llOyENBXdUm5WffMdi+l8k3/C/nC5aBPywSGWCpGbzR8Cb0t+vkfSSoRtFxFciYlVErGppaSlicSa3Qs8ml61pSi0tM+p50h3VZmUn307q9wIXAB0AEfEEMHeEx2wHlmTdX5wuy7lN2u/QBOwlqW38JiL2RMRB4Cbg3DzLagXWWeDZ5IY6sWUaG1yDMCs7+QZET0QcHjAn/TAf6ZTXu4GTJa2QVAdcCtw4ZJsbgbelty8Bbk1Ppb0ZOFPS1PS5XsiRfRc2jjqKWIMAOGXeDJ7YdcCD9pmVmXwD4jZJHwGmSHoZ8H3gv4d7QNqncDnJh/0jwPURsV7SlZIyZz9dDTRL2gB8ELgifex+kjmw7ya5OO/eiPjJqF6ZFUxHdx911VU01FYXZf+nzJvBgZ5+drR7eC+zcpLvV8IrSE5JfRD4c5Imn6+N9KCIuCndNnvZR7NudwOvO8Zjv0NyqquVWDGG2ch26vwZADz+dCeLZk4p2vOY2ejk9V8fEYOSfgj8MCJ8utAkk4zkWpz+B0hqEACPPt3JHz9rpK4tMxsvwzYxKfExSXuAx4DH0tnkPjrc42xiSUZyLV4NomlKLQuaGnh8V2fRnsPMRm+kPogPkJy99JyImB0Rs4HnAhdI+kDRS2dlodhNTJDUIh572gFhVk5GCoi3AJdFxFOZBemV0W8G3lrMgln56OzuK/hsckOdOn8GG1oP0O8xmczKxkgBURsRe4YuTPshivuJYWVjPGoQp86bQW//IJv2Hizq85hZ/kYKiOEmC/ZEwpNEx6HCzyY3VOZMpkef7ijq85hZ/kYKiLMldeT46QTOHI8CWmkNDAZdvQNFr0GcPG86NVVi/Q4HhFm5GPa/PiKKc2WUVYwDmZFci3iaK0B9TTUnz5vhgDArI/leSW2TVEcRR3Id6oyFjazf3u4hN8zKhAPChvXMXBDFD4jTFzayt6uXXR09I29sZkXngLBhtafTjRZjsqChzljUBMBD29uL/lxmNjIHhA3rcEAUuQ8C4LQFjUi4H8KsTDggbFgdaUA0jUNATKuvYcWcaTy0wzUIs3LggLBhZTqpm6aOz3WRZyxs4sFtDgizcuCAsGG1H+qjSjC9rvid1ADnLJ3J0x3d7Gw/NC7PZ2bH5oCwYbUf6qNxSi1VVRqX5zt36SwA7tvSNi7PZ2bH5oCwYbUf6huX/oeM0xY0UldTxX1b9o/bc5pZbg4IG9Z4B0RdTRVnLmpyDcKsDDggbFjth/rG5RqIbOcuncm67e309nvob7NSckDYsMa7BgFwztJZ9PYP8shOXw9hVkoOCBtWR9pJPZ7OWToTwP0QZiXmgLBjioiS1CAWNE1hQVMD97ofwqykHBB2TIf6BugbiHEPCEhqEfdtdQ3CrJQcEHZM7eM4zMZQ5yyZxdZ9h2jt9MiuZqVS1ICQdJGkxyRtkHRFjvX1kq5L198lafmQ9UslHZD04WKW03IrZUCcu2wmAPdvbRv35zazRNECQlI1cBXwCmAlcJmklUM2eyewPyJOAj4PfGbI+s8BPy1WGW147QdLFxCnL2yitlrc645qs5IpZg1iNbAhIjZGRC9wLbBmyDZrgGvS2zcAL5EkAEl/CjwFrC9iGW0YpaxBNNRWs3JBo89kMiuhYgbEImBr1v1t6bKc20REP9AONEuaDvw/wMeLWD4bQUc6m1wpAgLg3GWzuH9rmy+YMyuRcu2k/hjw+Yg4MNxGkt4taa2kta2treNTskmklDUIgPNPaKa7b5AHtrWV5PnNJrtiBsR2
YEnW/cXpspzbSKoBmoC9wHOBz0raBLwf+Iiky4c+QUR8JSJWRcSqlpaWgr+Aya79UB8SzBiH+ahzee6K2Uhwx5N7S/L8ZpNdMQPibuBkSSsk1QGXAjcO2eZG4G3p7UuAWyPx/IhYHhHLgX8G/iEivljEsloOHYf6mF5fM25DfQ81c2odp81v5M6NDgizUihaQKR9CpcDNwOPANdHxHpJV0q6ON3sapI+hw3AB4GjToW10inFVdRDPe/EZu7ZvJ+e/oGSlsNsMipq20FE3ATcNGTZR7NudwOvG2EfHytK4WxE5RAQ55/QzNW/e4r7trRx/gnNJS2L2WRTrp3UVgb2H+xl1tS6kpZh9YrZVFeJ3zzukxDMxpsDwo5pf1cvs6aVNiCaptTynOWzuOWR3SUth9lk5ICwY9rX1UtziQMC4KWnzeOxXZ1s3Xew1EUxm1QcEJZT38AgHd39JW9iAnjZynkA/PKRXSUuidnk4oCwnNrScZhmTyttJzXAsuZpnDx3ugPCbJw5ICyn/Qd7AUreB5Fx4enzuHPjPnZ3dpe6KGaThgPCctp7IAmI2WXQxATwP89dzMBg8IN7h16Mb2bF4oCwnMqtBnFiy3RWLZvFdWu3EhGlLo7ZpOCAsJz2daU1iDIJCIDXP2cJG1u7PEeE2ThxQFhO+9OAmDm19J3UGa88cwHT6qr57p1bSl0Us0nBAWE57TvYy/T6GuprqktdlMOm1dfw2vMW8+N1O9l7wHNVmxWbA8Jy2t/VW1bNSxlvfd4yegcGufburSNvbGbHxQFhOe072Fc2HdTZTpo7gwtOaua7d25mYNCd1WbF5ICwnPZ19TC7jPofsr1x9TJ2tHfz+yf3lLooZhOaA8Jy2t9VnjUIgJecNpcZDTW+JsKsyBwQltO+rt6yuUhuqIbaal511gJ+tv5pDvb2l7o4ZhOWA8KOcqh3gEN9A2VbgwB4zTmLOdg7wM3rny51UcwmLAeEHSVzFXU5nsWUsWrZLBY2NfCTdQ4Is2JxQNhRMldRl8NQ38dSVSUuPH0+v9vQyqFez1dtVgwOCDvK3jQgmqeXb0BAMk9Ed98gv3nC05GaFYMDwo6yuyMZUnvejIYSl2R4q1fMprGhhl887HkizIrBAWFH2d2ZDGMxt7G+xCUZXm11FS9+1lxueWQX/QODpS6O2YTjgLCj7OropmlKLQ215TMO07FcePp89h/s457NHuHVrNAcEHaUXR3dzCvz2kPGC05poa66ip+7mcms4BwQdpRdHT3Mayzv/oeM6fU1XHBSM794eJcnEjIrsKIGhKSLJD0maYOkK3Ksr5d0Xbr+LknL0+Uvk3SPpAfT3y8uZjntSLs7uplb5h3U2V62cj5b9h3ksV2dpS6K2YRStICQVA1cBbwCWAlcJmnlkM3eCeyPiJOAzwOfSZfvAV4dEWcCbwO+Xaxy2pEGB4PdnT0V08QE8NKVc5Hg5+vdzGRWSMWsQawGNkTExojoBa4F1gzZZg1wTXr7BuAlkhQR90XEjnT5emCKpMr5xKpg+w720j8YFdPEBDB3RgPPWTabH9633c1MZgVUzIBYBGTP6rItXZZzm4joB9qB5iHbvBa4NyKOmkJM0rslrZW0trXVF0sVwq7MNRAVVIMAuOS8xWzc08W9W9pKXRSzCaOsO6klnU7S7PTnudZHxFciYlVErGppaRnfwk1Quzsy10BUTg0C4E/OWsCU2mpuuGdbqYtiNmEUMyC2A0uy7i9Ol+XcRlIN0ATsTe8vBn4AvDUinixiOS1LpgYxd0Zl1SCm19fwijPn8+MHdngIcLMCKWZA3A2cLGmFpDrgUuDGIdvcSNIJDXAJcGtEhKSZwE+AKyLi9iKW0YbYldYgWiosIADefP4yOnv6+cbtm0pdFLMJoWgBkfYpXA7cDDwCXB8R6yVdKenidLOrgWZJG4APAplTYS8HTgI+Kun+9Gduscpqz9jV2c3saXXU15T/VdRDnbt0Fi89bS7/ftuTtKVD
lpvZ2BW1DyIiboqIUyLixIj4VLrsoxFxY3q7OyJeFxEnRcTqiNiYLv9kREyLiGdn/ewuZlktkVwDUXm1h4wPv/xUDvT08y+3bCh1UcwqXll3Utv429HWzfymyuqgzvas+Y28cfVSvn77U/zWw4CbHRcHhB0WEWze28Xy5mmlLspx+btXruTkudP5wHUP0Np51NnRZpYnB4Qd1nqgh67eAZY3Ty11UY7LlLpq/vWN59DZ3ceHvv8Ag4O+eM5sLBwQdthTrV0ALJ9T2TUISJqa/u5VK/nN46187XcbS10cs4rkgLDDNu1NAmLFBAgIgDc/dykvP30en/3ZYzywta3UxTGrOA4IO+ypPQeprRaLZk4pdVEKQhKfee1ZzJ1Rz/u+dx8HenwBndloOCDssE17ulgyeyo11RPnsJg5tY4vXHYOW/cf5J9+/lipi2NWUSbOJ4Edt017u1hR4Wcw5fKc5bN583OXcc3vN/HQ9vZSF8esYjggDEjmgXhqT9eE6KDO5cMvP5Xm6fV85AcPMuCzmszy4oAwAJ7u6Kanf3DCdFAP1TSllr9/1UrWbWvnu3dtLnVxzCqCA8KApP8BJs4ZTLm8+qwFPP/kOfx/P3vs8Ki1ZnZsDggD4KEdSdv8KfNmlLgkxSOJT6w5g77BQf7mhnWefc5sBA4IA+CezftZ1jy1Iof5Ho3lc6bxf165ktseb+Vbd7ipyWw4DggjIrhncxvnLZ1V6qKMizc/dyl/fGoLn7rpEdZtayt1cczKlgPC2LrvEHsO9HDusskREJL4p9c/m5bp9fzFd+5lX5fnjjDLxQFh3LNlHwDnTZKAAJg9rY4vvflcWjt7+Otr7/Opr2Y5OCCMezbvZ3p9zYTuoM7lrMUzuXLN6fz2iT18/hePl7o4ZmXHAWH84al9nL2kieoqlboo4+7S1Ut5w6olfPFXG/jFw7tKXRyzsuKAmOQe2dnB47sO8NLT5pW6KCXz8TWnc+aiJj543f082Xqg1MUxKxsOiEnuP+/ZRm21WPPsRaUuSsk01Fbzb286l9qaKt569R/Y0Xao1EUyKwsOiEmsf2CQH96/gxc/ay6zp9WVujgltWT2VL71Z6vpONTHZV+9k0d2dpS6SGYl54CYxH7x8C72HOjhkvOWlLooZeGMRU1c887VHOod4E+vup0v3voEXZ5DwiYxB8QkdaCnn0/8+GFOnjudF53aUurilI1zl87ipr9+Pi86tYV//PnjXPCZW7nyvx/m8V2dpS6a2birKXUBbPxFBJ/+6SPs7Ojmhvf8D2on0ARBhTBnej1ffssq7tuyn6t/9xTfvnMTX7/9Kc5e3MQLT53LBSc2c87SWdTV+H2ziU0TZcCyVatWxdq1a0tdjLI3OBh84icP843bN/FnF6zgo69eWeoilb19Xb38173b+O91O3lwWxuDAQ21VZy+sImzF8/k7CXJ72XNU5Em36nCVtkk3RMRq3KuK2ZASLoI+AJQDXwtIj49ZH098C3gPGAv8IaI2JSu+1vgncAA8FcRcfNwz+WAGF5P/wC/erSVf7nlCR7e2cE7LljO379yJVWT8NqH49F+qI+7Nu7lzo37WLetjYd2tNPdNwgkc06ctTgTGjM5e3ETLTPqHRpW1koSEJKqgceBlwHbgLuByyLi4axt/hI4KyLeI+lS4DUR8QZJK4HvAauBhcAvgVMiYuBYzzfWgIgIevoH0/KAUPo7GbNHmeXj9E8eEURAZG5Dej9ZzpD7me0GAw729nOgu5/Onn46u/vZe6CHTXsP8tD2du5+ah+dPf0snT2VD114ChefvdAfXAXQPzDI47sO8MC2NtZta+P+re08vqvz8NAdM+prWD5nGsvnTGPejHrmzKhnzvR6mqfXMXNKLbXVVdRWV1FdBQODMDAYDEYgQZWU/nD4OBiMoLpK1FZXUVdTRV118qO0tSvzF838bbP/wkcc3+n+RfIboHdgkJ6+QXoGBujtH6Snf3DI72R5b/8gvQODVFcp
ef6aKqbX1zC9oYZpdTXUVCf7O3y8ZpUhc3xD8lpyHus5jvvB9EF11VXU11TTUJv8rqkWkb2v9H8h532SGnRk3c+oEoff0/qaZ97b7P+RgcGgb2CQgcGgfyAYiORvNTgYSKKmSlRXJ7+r0vuSDv9NM78Hk4+bw9tWVz2zbT6yX9Ng1vvZUFud1+OHGi4gitkHsRrYEBEb00JcC6wBHs7aZg3wsfT2DcAXlbxLa4BrI6IHeErShnR/dxS6kPu6ejnvk7/Me/vM3/CoACFJlez70tEf5uT4cM/8MxSDBCe1TOdVZy/gwpXz+aOT57jPoYBqqqtYubCRlQsbuWz1UgAO9Q6wfkc767a189SeLjbt7eL+rftp7ew5XNuwylBXUwUBfYODRfsfzagS1FRVHR7RYPDwZ8ORoZfLs5fM5IfvvaDgZSpmQCwCtmbd3wY891jbRES/pHagOV1+55DHHnUll6R3A+9O7x6Q9NhxlHcOsOc4Hl+2NpFUwT49wnZZJux7MUZ+P47k9+MZZfFebAZ0+ZgfvuxYKyr6LKaI+ArwlULsS9LaY1WzJhu/F0fy+3Ekvx/PmOjvRTHbGrYD2VdgLU6X5dxGUg3QRNJZnc9jzcysiIoZEHcDJ0taIakOuBS4ccg2NwJvS29fAtwaSa/5jcClkuolrQBOBv5QxLKamdkQRWtiSvsULgduJjnN9esRsV7SlcDaiLgRuBr4dtoJvY8kREi3u56kQ7sfeO9wZzAVSEGaqiYIvxdH8vtxJL8fz5jQ78WEuVDOzMwKy+c7mplZTg4IMzPLadIHhKSLJD0maYOkK0pdnvEmaYmkX0l6WNJ6SX+dLp8t6ReSnkh/zyp1WceLpGpJ90n6cXp/haS70mPkuvSki0lB0kxJN0h6VNIjkp43yY+ND6T/Jw9J+p6khol8fEzqgEiHA7kKeAWwErgsHeZjMukHPhQRK4Hzgfem78EVwC0RcTJwS3p/svhr4JGs+58BPh8RJwH7ScYImyy+APwsIp4FnE3yvkzKY0PSIuCvgFURcQbJyTeXMoGPj0kdEGQNBxIRvUBmOJBJIyJ2RsS96e1Okg+ARSTvwzXpZtcAf1qSAo4zSYuBVwJfS+8LeDHJUDAwud6LJuAFJGcbEhG9EdHGJD02UjXAlPS6ranATibw8THZAyLXcCCTdnJmScuBc4C7gHkRsTNd9TQwr1TlGmf/DPwNkBk0qRloi4jM1HKT6RhZAbQC30ib3L4maRqT9NiIiO3APwJbSIKhHbiHCXx8TPaAsJSk6cB/Au+PiCMmZE4vXpzw50NLehWwOyLuKXVZykQNcC7wpYg4B+hiSHPSZDk2ANK+ljUkwbkQmAZcVNJCFdlkDwgP6QFIqiUJh+9GxH+li3dJWpCuXwDsLlX5xtEFwMWSNpE0N76YpA1+ZtqkAJPrGNkGbIuIu9L7N5AExmQ8NgBeCjwVEa0R0Qf8F8kxM2GPj8keEPkMBzKhpW3sVwOPRMTnslZlD4PyNuBH41228RYRfxsRiyNiOcmxcGtEvAn4FclQMDBJ3guAiHga2Crp1HTRS0hGN5h0x0ZqC3C+pKnp/03m/Ziwx8ekv5Ja0p+QtDtnhgP5VGlLNL4k/RHwW+BBnml3/whJP8T1wFKS0YRfHxH7SlLIEpD0IuDDEfEqSSeQ1ChmA/cBb07nKpnwJD2bpMO+DtgIvIPki+WkPDYkfRx4A8nZf/cB7yLpc5iQx8ekDwgzM8ttsjcxmZnZMTggzMwsJweEmZnl5IAwM7OcHBBmZpaTA8JsBOloty8fsuz9kr50jO1/LWnCTmRvk4cDwmxk3yOdDjfLpelyswnLAWE2shuAV2bG+U8HNVxIMjz82nR+gI/neqCkA1m3L5H0zfR2i6T/lHR3+nNBuvyFku5Pf+6TNKPIr83smGpG3sRscouIfZL+QDJvyI9Iag/XA/+QrqsGbpF0VkSsy3O3XyCZQ+B3kpYCNwOnAR8G3hsRt6cDKHYX/AWZ
5ck1CLP8ZDczZZqXXi/pXpLhFU4nmXQqXy8FvijpfpKxjRrTQLgd+JykvwJmZg0jbTbuHBBm+fkR8BJJ55JMFLOP5Nv+SyLiLOAnQEOOx2WPZZO9vgo4PyKenf4siogDEfFpkvF9pgC3S3pWMV6MWT4cEGZ5iIgDJKN2fp2k9tBIMj9Cu6R5JM1PueySdJqkKuA1Wct/DrwvcycdFA9JJ0bEgxHxGZLRhh0QVjIOCLP8fY9kXubvRcQDJE1LjwL/QdI0lMsVwI+B35PMQpbxV8AqSeskPQy8J13+fkkPSVoH9AE/LfzLMMuPR3M1M7OcXIMwM7OcHBBmZpaTA8LMzHJyQJiZWU4OCDMzy8kBYWZmOTkgzMwsp/8ff8pbZn/1zv4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "(anthro_waist_cm 90.0\n", " dtype: float64,\n", " anthro_waist_cm 0.0\n", " dtype: float64)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.plot('target')\n", "data['target'].max(), data['target'].min()," ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yeah I don't know about that waist cm of 0 ...\n", "The below code can be used to try different values of outliers to drop, since it is not by default applied in place." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 33 Rows\n", "anthro_waist_cm: 10689 rows\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAA09ElEQVR4nO3dd3gc5bXA4d9RlyVLsoq7ZLniAsbYxhTTW4AAJrl0QklICLmQ3MAlCeEmhJDeINyEFAgJkEYvvmDiUIKpAReMu7FsS7JcVG1Vq+65f8ysWYuVtLZ3NCvteZ9nH03fs6PdPfuV+UZUFWOMMaa7BL8DMMYYE5ssQRhjjAnLEoQxxpiwLEEYY4wJyxKEMcaYsCxBGGOMCcsShDHGmLAsQRjPiIiKyCS/44iEiDSJyAS/44gGESlyX09ilI73OxH5tjt9iohUROO47vFOFJGN0TqeiS5LECYqROQ1Efm833EcLFXNVNUtvW0T7S/HgyEi14pIl5sAmkRkq4j8SUSmBLdR1XL39XRFcKw3+3pOVb1BVb8Xpfj3+9Ggqm+o6mHROLaJPksQJiaISJLfMQwg76hqJpANnAHsBZaLyOHRfqJolULMwGQJwuwjIreJyGYRaRSRdSLyqZB114rImyLycxHZ7f5yPcdd9wPgRODX7q/aX4cc9gwR2SQie0TkPhGRkOO9JSL3iEgtcKeIZIvIIyJSLSJlIvItEen1PepuN8edvtL9hTrDnb9ORJ51p+eJyDtuHDtF5NcikhJynH2/bEXkXPf1N4rIdhG5VUQygBeB0SG/3kf3EleiiNwecj6Xi0hhyHP9p3teGkXkeyIyUUTeFpEGEXk8NLaeqGqXqm5W1f8ElgB3uscvdp8jKeRcb3Gfa6t7nqYBvwOOc1/LHnfbh0TktyKySESagVPdZd/v9vpuF5EaESkVkStDlu9XkgwtpYjI6+7iD9znvLR7qUxEprnH2CMia0XkgpB1D7nvoRfc1/KuiEzs6zyZQ6Cq9rAHqgpwMTAa54fDpUAzMMpddy3QAXwBSAS+BOwAxF3/GvD5bsdT4HkgBygCqoGzQ47XCXwZSALSgUeA54ChQDHwIXBdHzE/Avy3O30/sBn4Usi6m93pOcCx7nMVA+uBr3aLdZI7vRM40Z0eBsx2p08BKiI8l18DVgOHAQIcCeS
FPNdzQBYwA2gDXgEm4JQK1gHX9HDca4E3wyz/HFDpThe7z5EEZAANwGHuulHAjJ6OBTwE1APz3fdBmrvs+yHnoBO4G0gFTnbfJ8Hj7/c+6P4coee5+zkFkoES4HYgBTgNaAw59kNALTDPfW1/BR71+3MzmB9WgjD7qOoTqrpDVQOq+hiwCefDGFSmqg+oU7f9MM6XzYg+DvtjVd2jquXAv4BZIet2qOqvVLUTaAcuA76pqo2qWgr8Ariqj+MvwfmSAqcU86OQ+ZPd9ajqclX9t6p2usf+fch23XUA00UkS1V3q+qKPmII5/PAt1R1ozo+UNXakPU/VdUGVV0LrAH+qapbVLUep6Ry1AE+3w4gt4d1AeBwEUlX1Z3uc/bmOVV9y30ftPawzbdVtU1VlwAvAJccYLzhHAtk4rxn2lX1VZwfGJeHbPOMqr7nvmf+yv7vJxNlliDMPiJytYisdIv3e4DDgfyQTXYFJ1S1xZ3M7OOwu0KmW7ptvy1kOh/nF2RZyLIyYEwfx18CnCgio3BKNo8D80WkGOfX+EoAEZkiIs+LyC4RaQB+yP6vLdR/AOcCZSKyRESO6yOGcApxSjM9qQyZ3htmvq/z2t0YoK77QlVtxikN3gDsdKtnpvZxrG19rN/tHjeoDKfkeahGA9tUNdDt2KHvgd7eTybKLEEYAERkHPAAcBNOVUgOzi9bifAQBzNufOg+NTi/3MeFLCsCtvd6ANUSnC+KLwOvq2oDzpfI9ThVG8Evm98CG4DJqpqFU40R9rWp6lJVXQAMB57FSTrd4+3LNqA/68c/BbwRboWqLlbVM3FKfBtw/s/Q8+vp63UOc9tkgopwSjDgVDcNCVk3so9jhdoBFHZrd+rzPWC8YwnCBGXgfDFUA4jIZ3FKEJGqxKlDPyhutdXjwA9EZKibsG4B/hLB7ktwEtsSd/61bvPgtGs0AE3uL+gvhTuQiKS4jbjZqtrh7hNMMpVAnohkRxDTH4DvichkccwUkbwI9ouY2xA+XkR+hVOX/90w24wQkQXuF3ob0MT+r2dsJA3iYXzXPVcnAucBT7jLVwKfFpEhbqP/dd326+198i5Osv+6iCSLyCnA+cCjBxGfiQJLEAYAVV2HU+f/Ds6H+AjgrQM4xL3AReL0cPrfgwzjyzi/QLcAbwJ/A/4YwX5LcBLA6z3MA9wKXIHT6PkA8Fgvx7sKKHWrom4ArgRQ1Q3A34EtbjVcb9Uqd+MkvH/iJJkHcRrio+E4EWlyj/saTmP30aq6Osy2CTiJdgdOFdTJfJQcXwXWArtEpOYAnn8XsNs95l+BG9xzA3APTntSJU471V+77Xsn8LB7/vZrt1DVdpyEcA5OifI3wNUhxzb9LNgDxRhjjNmPlSCMMcaEZQnCxDxxxgJqCvP4nc9xvdhDXLf7GZcx0WJVTMYYY8IaNOPf5Ofna3Fxsd9hGGPMgLJ8+fIaVS0It27QJIji4mKWLVvmdxjGGDOgiEhZT+usDcIYY0xYliCMMcaEZQnCGGNMWJYgjDHGhGUJwhhjTFiWIIwxxoRlCcIYY0xYliCMMcaENWgulDMDTyCgPLp0Gw+8sYXWji4uPbqQL540kfSURL9DM8ZgJQjjo98u2cztz6wmKy2JKSOG8suXN3HzYysJBGx8MGNigSUI44sPtu3hnpc+5LyZo3j2xvk8/Ll5fPu86fxj7S5+8dJGv8MzxmAJwvhAVfnm06sZPjSVH1x4BCLOraE/N7+Yi+eM5XdLtrC5usnnKI0xliBMv3tvax3rdjbwldMnkz0ked9yEeEb50wlLSmBny+2UoQxfrMEYfrdw++Ukp2ezIJZYz62Lj8zlS+cNIEX1+xiVcWe/g/OGLOPpwlCRM4WkY0iUiIit4VZf5KIrBCRThG5KGT5LBF5R0TWisgqEbnUyzhN/9lZv5fFayu57OjCHnsrff7ECWSmJvHQ26X9G5wxZj+eJQgRSQTuA84BpgOXi8j0bpuVA9cCf+u2vAW4WlV
nAGcDvxSRHK9iNf3n+Q920hVQrjimqMdtMlOTuPCo0bywaif1LR39GJ0xJpSXJYh5QImqblHVduBRYEHoBqpaqqqrgEC35R+q6iZ3egdQBYS945EZWF5aV8m0UVmMy8vodbvL5xXR1hng6fcr+ikyY0x3XiaIMcC2kPkKd9kBEZF5QAqwOcy660VkmYgsq66uPuhATf+oa25nWVkdZ04b3ue2M0Znc+TYbB59b1uf2xpjvBHTjdQiMgr4M/BZVQ10X6+q96vqXFWdW1BgBYxY9+qGKgIKZ04fGdH2n549lo2VjZRUWZdXY/zgZYLYDhSGzI91l0VERLKAF4D/UdV/Rzk244OX11UyMiuNw8dkRbT9J2Y4ieQfa3Z6GZYxpgdeJoilwGQRGS8iKcBlwMJIdnS3fwZ4RFWf9DBG00+6AspbJTWcOrVg34VxfRmZncbsohxeXLPL4+iMMeF4liBUtRO4CVgMrAceV9W1InKXiFwAICJHi0gFcDHwexFZ6+5+CXAScK2IrHQfs7yK1Xhv/c4GGts6OXZC3gHtd/bhI1m7o4Hy2haPIjPG9MTTNghVXaSqU1R1oqr+wF12h6oudKeXqupYVc1Q1Ty3Wyuq+hdVTVbVWSGPlV7Garz13tY6AI4uzj2g/c45fBQA/1xnpQhj+ltMN1KbweO9rXUU5qYzOif9gPYrzB3CpOGZLPnQeqkZ098sQRjPqSrvldYxr/jAqpeCTppcwHtb62jt6IpyZMaY3liCMJ4rqWqirrmdY8YfWPVS0ElT8mnrDPCuW01ljOkfliCM55aV7Qbg6INMEMdOyCM1KYElG62ayZj+ZAnCeG5VRT3Z6ckU5w05qP3TkhOZNz6X1zdZgjCmP1mCMJ5bvX0PR4zJjvj6h3BOnJxPSVUTVY2tUYzMGNMbSxDGU22dXWzc1cjhY7IP6TjzxjsN3Eu37o5GWMaYCFiCMJ7auKuRji5l5thDSxAzRmcxJCWR97bWRikyY0xfLEEYT62qqAfgiEMsQSQnJjBn3DDryWRMP7IEYTy1Zns9OUOSGTvswC6QC+eY8bls2NXInpb2KERmjOmLJQjjqVUV9YfcQB20rx2i1NohjOkPliCMZzq6AmyqamT66MiG9+7LkYXZpCQl8O4Wa4cwpj9YgjCe2VrTTEeXMnXk0KgcLzUpkaMKc3iv1NohjOkPliCMZzbuagTgsBHRKUGA0w6xZns9TW2dUTumMSY8SxDGMx9WNpKYIEwcnhG1Y84bn0dAYZmVIozxnCUI45kNuxoZn59BalJi1I45e1wOSQmy7/4SxhjvWIIwnvmwspHDRkSn/SFoSEoSR4zNtgRhTD+wBGE80dLeSXldC1OinCAA5o3P5YOKPXZ/CGM8ZgnCeGJTZROqcFiUejCFmjsul44uZfX2+qgf2xjzEUsQxhMbK90eTB4kiNlFOQAsL7ML5ozxkiUI44nNVU2kJCVQlHtw94DoTV5mKuPzMyxBGOMxSxDGE5urmxifl0FiwqEPsRHO7KJhvF++G1X15PjGGEsQxiObq5ujev1Dd7PH5VDT1E55XYtnz2FMvLMEYaKuvTNAeV0LEwsyPXuOOeOGAdYOYYyXPE0QInK2iGwUkRIRuS3M+pNEZIWIdIrIRd3WXSMim9zHNV7GaaKrvK6ZroAyocC7EsTk4UMZmppkCcIYD3mWIEQkEbgPOAeYDlwuItO7bVYOXAv8rdu+ucB3gGOAecB3RGSYV7Ga6CqpagbwtASRmCDMKsqxBGGMh7wsQcwDSlR1i6q2A48CC0I3UNVSVV0FBLrt+wngJVWtU9XdwEvA2R7GaqJoc3UTABM8TBDgNFR/WNlIY2uHp89jTLzyMkGMAbaFzFe4y7ze1/hsS3UzI7JSyUxN8vR55owbRkDhg212wZwxXhjQjdQicr2ILBORZdXV1X6HY1ybq5s8rV4KmlWUg4g1VBvjFS8TxHagMGR+rLssavuq6v2qOldV5xYUFBx0oCZ6VJX
N1U2eNlAHZaUlc9iIoSwvtwRhjBe8TBBLgckiMl5EUoDLgIUR7rsYOEtEhrmN02e5y0yM29PSQWNrJ8V53icIgNnjnAvmAgG7YM6YaPMsQahqJ3ATzhf7euBxVV0rIneJyAUAInK0iFQAFwO/F5G17r51wPdwksxS4C53mYlxZe6Fa+P6K0EUDaOxtZMSt2HcGBM9nrYiquoiYFG3ZXeETC/FqT4Kt+8fgT96GZ+JvrJap4vruLzoj8EUTugFc14MLW5MPBvQjdQm9pTXOiWIwmH9kyCK84aQm5FiDdXGeMAShImq8roWhg9NJT0lercZ7Y2IMLtoGCusodqYqLMEYaKqrK6l36qXgmaPy2FLdTN1ze39+rzGDHaWIExUlde2UJTbPw3UQXOKnHaI960UYUxUWYIwUdPa0cWuhlZPbhLUm5ljc0hKEGuHMCbKLEGYqKnYHezi2r8JIj0lkRmjsyxBGBNlliBM1JS5PZiK+jlBgHPB3KqKejq6uo/7aIw5WJYgTNQEE8S4fq5iAueCub0dXWzY2djvz23MYGUJwkRNeV0LGSmJ5Gak9Ptzf3TBnF1wb0y0WIIwUVNe10JRXgYi0u/PPTonnVHZaSwv39Pvz23MYGUJwkRNWW2zL9VLQbPHDWOFNVQbEzWWIExUBALKtt17+70HU6jZRcPYvmcvu+pbfYvBmMHEEoSJil0NrbR3Bij0sQQRbIewYTeMiQ5LECYqyuv8uQYi1PRRWaQmJdj1EMZEiSUIExXl+7q49u8wG6FSkhI4cmyOlSCMiRJLECYqyuqaSUwQRuek+RrHUeNyWLO9ntaOLl/jMGYwsARhoqKstoUxOekkJfr7lppTNIyOLmXN9npf4zBmMLAEYaJimw/DfIczO+QOc8aYQ2MJwkRFWV1Lv4/iGk5+ZirFeUNYWmpXVBtzqCxBmENWv7eDPS0dMVGCADhuYj7vbqmj0wbuM+aQWIIwhyzYgykWShAA8yfl0djWySprhzDmkFiCMIcseA1Ef99JrifHT8wH4O2SGp8jMWZgswRhDllZXTPgz30gwsnNSGH6qCzeKqn1OxRjBjRLEOaQlde2kJ+ZQmZqkt+h7DN/Uh7Ly3azt92uhzDmYHmaIETkbBHZKCIlInJbmPWpIvKYu/5dESl2lyeLyMMislpE1ovIN72M0xya8roWX8dgCmf+pHzauwIss/tDGHPQPEsQIpII3AecA0wHLheR6d02uw7YraqTgHuAn7jLLwZSVfUIYA7wxWDyMLGnrLbF12G+w5k3PpfkROFNa4cw5qB5WYKYB5So6hZVbQceBRZ022YB8LA7/SRwujh3m1EgQ0SSgHSgHWjwMFZzkNo7A+ys30tRXmw0UAcNSUniqKJhvG3tEMYcNC8TxBhgW8h8hbss7Daq2gnUA3k4yaIZ2AmUAz9XVasriEEVu1sIqD/3oe7L/In5rNlRz56Wdr9DMWZAitVG6nlAFzAaGA/8t4hM6L6RiFwvIstEZFl1dXV/x2gI6eIaIz2YQp0wOQ9VeGezlSKMORheJojtQGHI/Fh3Wdht3OqkbKAWuAL4h6p2qGoV8BYwt/sTqOr9qjpXVecWFBR48BJMX/bdByIGSxAzx+aQkZJo7RDGHCQvE8RSYLKIjBeRFOAyYGG3bRYC17jTFwGvqqriVCudBiAiGcCxwAYPYzUHqay2hfTkRAqGpvodysckJyZw3MR8XttYjfO2MsYcCM8ShNumcBOwGFgPPK6qa0XkLhG5wN3sQSBPREqAW4BgV9j7gEwRWYuTaP6kqqu8itUcvLJaZ5A+p29B7Dlj2nC279nLxspGv0MxZsDx9MomVV0ELOq27I6Q6VacLq3d92sKt9zEnm0xeA1EqNOmDgfg5XWVTB2Z5XM0xgwssdpIbQYAVaU8Ru4D0ZPhWWkcWZjDy+ur/A7FmAHHEoQ5aNWNbezt6IrpBAFwxtThrNy2h6rGVr9DMWZAiShBiMjTIvJJEbGEYvYpq4u
tYb57cvq0EQD8a4OVIow5EJF+4f8Gp+vpJhH5sYgc5mFMZoCItftA9GTaqKGMyUnnpXWWIIw5EBElCFV9WVWvBGYDpcDLIvK2iHxWRJK9DNDErrK6FhIExg6L7QQhIpw+bThvllTT2mGjuxoTqYirjEQkD7gW+DzwPnAvTsJ4yZPITMwrr21mVHY6KUmxX/N4+rQRtHYEeMsumjMmYpG2QTwDvAEMAc5X1QtU9TFV/TKQ6WWAJnaVxXgPplDHTsglIyWRl9dX+h2KMQNGpD/9HlDV6ar6I1XdCc69HABU9WNDYJj4sK2uJebbH4JSkxI5+bACXlpXRVfArqo2JhKRJojvh1n2TjQDMQNLU1snNU3tMTlIX0/OOXwUNU1tvLfVBgY2JhK9XkktIiNxhuROF5GjgOB4Clk41U0mTgV7MI3Lja37QPTmtKnDSU1K4MU1OzluYp7f4RgT8/oaauMTOA3TY4G7Q5Y3Ard7FJMZAMrrmgEGTBsEQEZqEqceNpwX1+ziO+fPIDEhNsePMiZW9JogVPVh4GER+Q9VfaqfYjIDQHCY71gehymcc2eO4h9rd7GstI5jJlgpwpje9FXF9BlV/QtQLCK3dF+vqneH2c3EgbLaFnKGJJOdPrAugzndrWZatHqnJQhj+tBXI3WwgjkTGBrmYeJUeV1LTN4kqC8ZqUmcclgBL67ZRcB6MxnTq76qmH7v/v1u/4RjBoqy2haOLMzxO4yDcu4Ro1i8tpJlZbuZNz7X73CMiVmRXij3UxHJEpFkEXlFRKpF5DNeB2diU2dXgO179lKUm+53KAfl9GkjSHGrmYwxPYv0OoizVLUBOA9nLKZJwNe8CsrEth17WukK6IDq4hoqMzWJUw8rYNHqnXbRnDG9iDRBBKuiPgk8oar1HsVjBoAyt4vrQLpIrrvzjxxNVWMb726t9TsUY2JWpAnieRHZAMwBXhGRAsDuvhKnyoIXyQ3gBHH61BFkpCSycOUOv0MxJmZFOtz3bcDxwFxV7QCagQVeBmZi17a6FlKSEhgxNM3vUA5aekoiZ80YyaLVO2nrtCHAjQnnQMZpngpcKiJXAxcBZ3kTkol1ZbUtFA5LJ2GAX4l8wazRNLR28vqHNgS4MeH0NdQGACLyZ2AisBII/txS4BFvwjKxzBnme2A2UIc6YVI+uRkpPLdyO2dOH+F3OMbEnIgSBDAXmK6q1uUjzqkq5bXNHDMIrh9ITkzg3CNG8uTyCprbOslIjfTjYEx8iLSKaQ0w0stAzMBQ19xOc3vXgLkPRF8WzBpDa0eAl9bZjYSM6S7SBJEPrBORxSKyMPjoaycROVtENopIiYjcFmZ9qog85q5/V0SKQ9bNFJF3RGStiKwWkYHbIjqIlNUN/B5MoeYUDWN0dhrPrdzudyjGxJxIy9R3HuiBRSQRuA84E6gAlorIQlVdF7LZdcBuVZ0kIpcBP8FpCE8C/gJcpaofuPfD7jjQGEz0lQ+CLq6hEhKEC2aN4YE3tlDV2MrwAdwzy5hoi7Sb6xKcK6iT3emlwIo+dpsHlKjqFlVtBx7l411jFwAPu9NPAqeLiOD0kFqlqh+4z1+rqtYXMQYEr4EYO2xwJAiAi+aMpSugPPu+lSKMCRXpWExfwPkC/727aAzwbB+7jQG2hcxXuMvCbqOqnUA9kAdMAdSt0lohIl+PJE7jvfK6FkZmpZGWnOh3KFEzaXgmRxXl8MSyCqwfhjEfibQN4kZgPtAAoKqbgOFeBYVT9XUCcKX791Micnr3jUTkehFZJiLLqqurPQzHBJXXNQ/oITZ6csncQjZVNfFBhY0iY0xQpAmiza0mAsBtI+jrp9Z2oDBkfqy7LOw27jGzgVqc0sbrqlqjqi3AImB29ydQ1ftVda6qzi0oKIjwpZhDUVY7MO8D0ZfzZo4iLTmBJ5Zt63tjY+JEpAliiYjcDqSLyJnAE8D/9bHPUmCyiIwXkRTgMqB7z6eFwDXu9EXAq+61FouBI0RkiJs4Tgb
WYXy1t72Lqsa2QdNAHWpoWjLnHD6KhR/soLXDmruMgcgTxG1ANbAa+CLOL/pv9baD26ZwE86X/XrgcVVdKyJ3icgF7mYPAnkiUgLc4j4PqrobuBsnyawEVqjqCwfwuowHBup9qCN18ZyxNLZ2snjtLr9DMSYmRNTNVVUDIvIs8KyqRlzZr6qLcJJJ6LI7QqZbgYt72PcvOF1dTYzYWuMM8z0+f+APsxHOsRPyGDssnSeWVbBgVvf+FMbEn15LEOK4U0RqgI3ARvducnf0tp8ZnEprnQRRPEgTREKCcMncQt4sqdmXDI2JZ31VMd2M03vpaFXNVdVc4Bhgvojc7Hl0JqaU1jSTl5FCVlqy36F45rJ5hSQnCo+8U+p3KMb4rq8EcRVwuapuDS5Q1S3AZ4CrvQzMxJ6tNc2DtvQQNHxoGuceMYonlzkD+BkTz/pKEMmq+rHB8t12iMH7M9KEVVrbTPEgGOa7L9ccX0xjWydPr6jwOxRjfNVXgmg/yHVmkGlp76SyoY3x+YOzB1OoowpzOHJsNg+9XWpXVpu41leCOFJEGsI8GoEj+iNAExtKa5wuroO9iglARLjm+GI2VzfzZondbc7Er14ThKomqmpWmMdQVbUqpjiyrwdTHFQxAXxy5ijyM1N4+O1Sv0MxxjcHck9qE8eC3T7joQQBkJqUyBXzinhlQxVbqpv8DscYX1iCMBEprWmmYGgqmXF0W86rjy8mJTGB+1/f4ncoxvjCEoSJSGltM+PjpHopKD8zlUuPLuSpFRXsqm/1Oxxj+p0lCBORrTUtFMdBD6buvnDiBAIKD75ppQgTfyxBmD41tnZQ09QWN+0PoQpzh3D+zFH87d1y9rRYz24TXyxBmD4FbzMab1VMQTecMpHm9i7+/E6Z36EY068sQZg+xVsPpu6mjszitKnD+dPbpbS02/AbJn5YgjB9Kq2Jr2sgwrnx1EnUNbdbKcLEFUsQpk9ba5sZmZVGekqi36H4Zs64YZw8pYDfLdlMkw3iZ+KEJQjTp9Ka5rjswdTdzWdOYXdLh11dbeKGJQjTp9LaFsbnZ/odhu9mFeZw+tTh3P/6FhpbO/wOxxjPWYIwvarf20Fdc3tcjOIaiZvPnEL93g7+9Fap36EY4zlLEKZXW62Bej+Hj8nmrOkjeOCNLdS3WCnCDG6WIEyvSqqcgeomDbcqpqBbzppCU1snv1lS4ncoxnjKEoTpVUlVEymJCRTlWhVT0NSRWXxq1hgeequUnfV7/Q7HGM9YgjC9Kqlqojh/CEmJ9lYJdfOZU1CFX760ye9QjPGMfepNr0qqGpk8fKjfYcScwtwhXHXcOJ5Yvo1NlY1+h2OMJyxBmB61dnRRXtfCRGt/COvGUyeRkZLETxdv9DsUYzzhaYIQkbNFZKOIlIjIbWHWp4rIY+76d0WkuNv6IhFpEpFbvYzThLe1ppmAWgN1T3IzUvjiyRN4aV0ly0rr/A7HmKjzLEGISCJwH3AOMB24XESmd9vsOmC3qk4C7gF+0m393cCLXsVoehfswTTZEkSPPnfCeAqGpvLjFzegqn6HY0xUeVmCmAeUqOoWVW0HHgUWdNtmAfCwO/0kcLqICICIXAhsBdZ6GKPpRUlVEwkC4+N0FNdIDElJ4qtnTGZZ2W5eXl/ldzjGRJWXCWIMsC1kvsJdFnYbVe0E6oE8EckEvgF8t7cnEJHrRWSZiCyrrq6OWuDGUVLVRGHuENKS43eQvkhcMreQCfkZ/OjF9bR3BvwOx5ioidVG6juBe1S1qbeNVPV+VZ2rqnMLCgr6J7I4UlLVZNVLEUhOTOBb501jS3UzD7291e9wjIkaLxPEdqAwZH6suyzsNiKSBGQDtcAxwE9FpBT4KnC7iNzkYaymm86uAFtqmqwHU4ROmzqC06YO596XN1HV0Op3OMZEhZcJYikwWUTGi0gKcBmwsNs2C4Fr3OmLgFfVcaKqFqtqMfBL4Ieq+msPYzXdlNe10NGlTCqwBBGpO86bTke
X8uMXN/gdijFR4VmCcNsUbgIWA+uBx1V1rYjcJSIXuJs9iNPmUALcAnysK6zxx6ZgD6YRdpFcpIrzM/j8ieN5+v3tLC+zbq9m4Evy8uCqughY1G3ZHSHTrcDFfRzjTk+CM70KdnGdWGA9mA7EjadO4ukV2/nWs2tZeNN8km2IEjOA2bvXhLW5qomRWWkMTUv2O5QBJSM1ibsWzGD9zgbu+5eN9moGNksQJqxNVU1MHmHtDwfjrBkjuXDWaH79aglrttf7HY4xB80ShPmYQEDZXN3ERGugPmjfOX8G+Zmp3Pi3FXZ7UjNgWYIwH7Ojfi8t7V1WgjgEwzJS+NUVR1Gxey/feGoVgYANw2EGHksQ5mOCPZisi+uhObo4l2+cfRiLVu/inpc/9DscYw6Yp72YzMC0Yadzf4OpI7N8jmTg+8KJE9hc1cyvXi1heFYaVx07zu+QjImYJQjzMRt2NTA6O43sIdaD6VCJCN//1OHUNLXx7WfXkCBw5TGWJMzAYFVM5mM27Gxk6igrPURLcmICv/nMbE6fOpz/eWYNv3plkw0NbgYESxBmP22dXWyubmLqSLuCOppSkxL57Wfm8OmjxvCLlz7k28+tocsark2Msyoms5+SqiY6A8o0K0FEXUpSAr+45EiGZ6XxuyWbqWpo438vP8qGUzcxy0oQZj/BBuppo6wE4QUR4bZzpnLn+dN5aX0lV/7hXfa0tPsdljFhWYIw+9mwq4GUpASK82wMJi9dO388910xm9UV9Vz9x/fsYjoTkyxBmP1s2NXIlBGZJNkgc54794hR/O6q2azb0cB1Dy2jtaPL75CM2Y99C5h9VJU12+uZbu0P/ea0qSO4+9JZvFdax+3PrLbeTSamWCO12WdHfSu7Wzo4Yky236HElQuOHM2W6iZ++fImZozO5roTxvsdkjGAlSBMiNUVzsijMyxB9LuvnDaZM6aN4CcvbmDtDhsB1sQGSxBmnzXb60lMEKti8kFCgvDTi2aSMySZ/3p0JXvbrT3C+M8ShNlnzY56Jg/PtH75PsnNSOEXlxxJSVUTP1y03u9wjLEEYRzBBurDrXrJVydOLuDzJ4znz/8u45X1lX6HY+KcJQgDwK6GVmqa2q2BOgZ87ezDmDYqi288tYrapja/wzFxzBKEAT5qoLYShP9SkxK5+5IjadjbybeeXWNdX41vLEEYAFaU7yE5UZgx2hqoY8G0UVncfOYUXlyzi+dW7vA7HBOnLEEYAFaU72b66GxroI4h1580gTnjhvHt59aws36v3+GYOGQJwtDRFWBVxR5mF+X4HYoJkZgg/OLiI+nsUr7+5CqrajL9ztMEISJni8hGESkRkdvCrE8Vkcfc9e+KSLG7/EwRWS4iq92/p3kZZ7xbv7OB1o4As4uG+R2K6aY4P4P/+eQ03thUw1/+XeZ3OCbOeJYgRCQRuA84B5gOXC4i07ttdh2wW1UnAfcAP3GX1wDnq+oRwDXAn72K08CKst0AzB5nCSIWXXlMESdNKeCHizawqbLR73BMHPGyBDEPKFHVLaraDjwKLOi2zQLgYXf6SeB0ERFVfV9Vgy1za4F0EUn1MNa4tqJ8DyOyUhmdneZ3KCYMEeFnF80kIzWRL/55OQ02NLjpJ14miDHAtpD5CndZ2G1UtROoB/K6bfMfwApV/ViHcBG5XkSWiciy6urqqAUeT1SV5WW7mTNuGCLidzimByOy0rjvitmU17Xw5b+9T0dXwO+QTByI6UZqEZmBU+30xXDrVfV+VZ2rqnMLCgr6N7hBoryuhe179nLchO552cSaYybk8f0LD2fJh9Xc+sQHBOye1sZjXg73vR0oDJkf6y4Lt02FiCQB2UAtgIiMBZ4BrlbVzR7GGdfe3lwLwHET832OxETisnlF1Da387PFG+kMKHdfciSpSdY12XjDywSxFJgsIuNxEsFlwBXdtlmI0wj9DnAR8KqqqojkAC8At6nqWx7GGPfe3lzL8KGpTCywW4wOFDeeOom
kBOFHL26guqGN/738KEZa+5HxgGdVTG6bwk3AYmA98LiqrhWRu0TkAnezB4E8ESkBbgGCXWFvAiYBd4jISvcx3KtY45Wq8s7mGo6fmGftDwPMF0+eyL2XzWLNjnrO/d83eG7ldrtOwkSdDJY31dy5c3XZsmV+hzGgbNzVyCd++To/vWgml8wt7HsHE3NKqpq45fGVrKqoZ974XG4+YwrHTbT2JBM5EVmuqnPDrYvpRmrjrTc2OT2/jrcvlAFr0vBMnvnP+XzvwsPZWtPM5Q/8m0t//w5vb66xEoU5ZJYg4tjL6yuZOnIoY4cN8TsUcwgSE4Srjh3HG18/le+cP52tNc1c8cC7XPr7f/PmJksU5uBZgohTe1raWVq6mzOmjfA7FBMlacmJfHb+eF7/+ql894IZlNe18JkH3+Xi372zbzh3Yw6EJYg49drGaroCyhnTLUEMNmnJiVxzfDFLvn4K37vwcEprm7ngvje5/ZnV7G5u9zs8M4BYgohTL62vpGBoKjPtBkGDVmpSIlcdO45Xbz2Fzx4/nseWbuO0X7zGP9bs9Ds0M0BYgohDe9u7eG1DFWdMG05CgnVvHeyy0pK54/zpLPrKiRTmDuGGv6zg1ic+oNHGdDJ9sAQRh/65bhfN7V0smNV9aCwzmB02cihPfel4vnLaJJ5eUcE5977B8rI6v8MyMcwSRBx6asV2xuSkM6841+9QTD9LTkzglrMO44kbjidBhIt/9w73vPQhnTb4nwnDEkScqWxo5c1N1Xx69hirXopjc8YN44WvnMCFs8Zw7yubuPT+f7OtrsXvsEyMsQQRZ55cXkFA4VNHWfVSvBualszdl87i3stm8eGuRs699w2eXF5h102YfSxBxJH2zgAPv13KiZPzmVCQ6Xc4JkYsmDWGRf91IlNHDeXWJz7g8gf+TUmV3bnOWIKIK8+v2kFVYxvXnTDe71BMjCnMHcJj1x/Hjz59BOt3NnLOvW/woxfXU7/XejrFM0sQcSIQUB54YyuTh2dy8hS7uZL5uIQE4fJ5Rbzy3yezYNYY7n99Cyf/7F/84Y0ttHV2+R2e8YEliDjxf6t2sH5nAzecPNGG9ja9ys9M5ecXH8kLXz6RmWNz+P4L6znt50t4ekWF3cUuzliCiANtnV38bPFGpo/K4kJrnDYRmj46i0c+N4+/fv4YhmUkc8vjH/DJX73Jkg+rrSE7TliCiAN/eGMrFbv3cvu500i0rq3mAM2flM/CG0/g3stm0dTWwTV/fI/PPPiuDQAYByxBDHLrdzZw78ubOOfwkZww2e47bQ5OQoKwYNYYXr7lZO44bzrrdjRw/q/f5Ct/f5/yWrt+YrCyO8oNYnvbu/jUb96ipqmdxV89kbzMVL9DMoNEQ2sHv1+ymQff3EpXQLnymHF8+bRJ9h4bgOyOcnEoEFBufmwlGysb+dnFM+2Da6IqKy2Zr31iKq/deir/MXssj7xTyok//Rd3/d86tu/Z63d4JkqsBDEIBQLKt59bw1/fLefb50236x6M5zZVNvKb1zaz8IMdCHDOEaP45BGjOGlKPkNSkvwOz/SitxKEJYhBprWji68/uYqFH+zghpMn8o2zD7NurabfVOxu4Y9vlvLUigrq93aQmpTA8RPzOKpoGDPHZnPEmGwrzcYYSxBxYt2Ohn3VSt84eypfOmWi3yGZONXRFWDp1jr+ua6S1zdVs6W6ed+6/MwUJg3PZMqIoUwenslk968lDn/0liCs7DcIbKtr4bdLNvPoe+XkZqTw8Ofm2dXSxlfJiQkcPymf4yc5PecaWztYs72BtTvq2VTZxKaqRp5ZsZ3Gts59++RlOIlj8ohMRmWnMyIrjeFDU/f9zRmSbKXhfmYJYoCqbmzj5fWV/HPtLpZ8WE2CCFcfV8xXz5hMzpAUv8MzZj9D05I5bmIex03M27dMValsaOPDykY2VTWxyf37/Kqd7Gn5+BhQKYkJFAxNZXhWKvmZqeRnppCfmUpeRgr5Q1PJy0i
lYGgKeRmpZKcn23D2UeBpghCRs4F7gUTgD6r6427rU4FHgDlALXCpqpa6674JXAd0AV9R1cVexhqrugJKZUMr2+pa+LCykTXbG1i9vZ71uxpQhcLcdL548kSuPm4co7LT/Q7XmIiJCCOz0xiZncZJ3Uq8rR1dVDW0UdnY6vxtaKWysZXqhjaqGtvYVtfC++V7qGtuI9zoH0kJQm6Gm0AyUyhw/zrzHyWX/MxUhmUkk5qU2E+vemDxLEGISCJwH3AmUAEsFZGFqrouZLPrgN2qOklELgN+AlwqItOBy4AZwGjgZRGZoqpRHzFMVWlp70JD5oGQ+eBEyD7uTHBdT/t2BZT2zgDtXQHaOwN0uH9DlwX/NrZ2Utfczu6Wdna3dFDT2Mb2PXvZsWcvnSGfgGFDkjl8TDZfPX0KZ80YwdSRQ63YbQadtOREivKGUJQ3pNftugLKnpZ2apraqW1qo7qpjdqmdmpC/tY0t7OlupmapjbaOsPfOS81KYGs9GSy0pLISk9maNpH0+nJiaQlJ5CWlEhqcgJpyYmkJgX/ustC/qYlJ5CSlICqE19nQAmo0hX46KFAgkCCiPNIgEQRRJzEKTh/EwQEZ3lXQOlSRVXpCjjzAXUe6cmJTB4xNOr/By9LEPOAElXdAiAijwILgNAEsQC4051+Evi1ON92C4BHVbUN2CoiJe7x3ol2kHXN7cz5/svRPuxBy05PJjcjhdyMFGYV5nDezFGMHTaEscPSmVCQwZicdEsIxrgSE4Q8t1QAvX9BqirN7V3UNrU5icNNILub22ls7aShtYOGvc7f+pZ2KupaqN/bwd6OLlo7usKWVGLFrMIcnr1xftSP62WCGANsC5mvAI7paRtV7RSReiDPXf7vbvt+bJQ5EbkeuN6dbRKRjdEJPerygRq/g+hFLMcXy7FBbMcXy7GBxXco9outDJCbDvpY43paMaAbqVX1fuB+v+Poi4gs66kbWSyI5fhiOTaI7fhiOTaw+A5Ff8Xm5VAb24HCkPmx7rKw24hIEpCN01gdyb7GGGM85GWCWApMFpHxIpKC0+i8sNs2C4Fr3OmLgFfVaeldCFwmIqkiMh6YDLznYazGGGO68ayKyW1TuAlYjNPN9Y+qulZE7gKWqepC4EHgz24jdB1OEsHd7nGcBu1O4EYvejD1o1ivBovl+GI5Nojt+GI5NrD4DkW/xDZohtowxhgTXTbctzHGmLAsQRhjjAnLEkSUicgfRaRKRNaELLtTRLaLyEr3ca5PsRWKyL9EZJ2IrBWR/3KX54rISyKyyf07LMbi8/38iUiaiLwnIh+4sX3XXT5eRN4VkRIRecztkNHveonvIRHZGnLuZvkRnxtLooi8LyLPu/Mxce56iS+Wzl2piKx241jmLvP8c2sJIvoeAs4Os/weVZ3lPhb1c0xBncB/q+p04FjgRndYk9uAV1R1MvCKOx9L8YH/568NOE1VjwRmAWeLyLE4w8Pco6qTgN04w8f4oaf4AL4Wcu5W+hQfwH8B60PmY+XcBXWPD2Ln3AGc6sYRvP7B88+tJYgoU9XXcXpkxRxV3amqK9zpRpwPwxicoU0edjd7GLgwxuLznTqa3Nlk96HAaTjDxIC/566n+GKCiIwFPgn8wZ0XYuTcufHsF98A4fnn1hJE/7lJRFa5VVC+VOGEEpFi4CjgXWCEqu50V+0CRvgVV1C3+CAGzp9bBbESqAJeAjYDe1Q1eFODsEPC9Jfu8alq8Nz9wD1394gzgrIffgl8HQiOlpdHDJ07Ph5fUCycO3CS/T9FZLk7xBD0w+fWEkT/+C0wEafovxP4hZ/BiEgm8BTwVVVtCF3nXqjo6y/PMPHFxPlT1S5VnYVzZf88YKofcfSke3wicjjwTZw4jwZygW/0d1wich5QparL+/u5I9FLfL6fuxAnqOps4BycqteTQld69bm1BNEPVLXS/fAGgAdwvlx8ISLJOF++f1XVp93FlSIyyl0/CucXaMzEF0vnz41nD/A
v4Dggxx0mBmJkSJiQ+M52q+3UHRn5T/hz7uYDF4hIKfAoTtXSvcTOuftYfCLylxg5dwCo6nb3bxXwjBuL559bSxD9IPhPdH0KWNPTth7HIThXr69X1btDVoUOeXIN8Fx/xwY9xxcL509ECkQkx51Ox7nPyXqcL+KL3M38PHfh4tsQ8gUiOHXU/X7uVPWbqjpWVYtxRkt4VVWvJEbOXQ/xfSYWzp37/BkiMjQ4DZzlxuL553ZAj+Yai0Tk78ApQL6IVADfAU5xu8gpUAp80afw5gNXAavdumqA24EfA4+LyHU4Iwdf4k94PcZ3eQycv1HAw+LcCCsBeFxVnxeRdcCjIvJ94H2cBOeHnuJ7VUQKAAFWAjf4FF843yA2zl1P/hoj524E8IyTp0gC/qaq/xCRpXj8ubWhNowxxoRlVUzGGGPCsgRhjDEmLEsQxhhjwrIEYYwxJixLEMYYY8KyBGFMH8QZYfYT3ZZ9VUR+28P2r4lITN7s3pgDYQnCmL79Hfd2uCEuc5cbM2hZgjCmb08Cnwzer8AdSHA0zgV8y0Lvv9CdiDSFTF8kIg+50wUi8pSILHUf893lJ4fcf+D94BW0xvjBrqQ2pg+qWici7+EMlPYcTunhceCH7rpE4BURmamqqyI87L0490J4U0SKgMXANOBW4EZVfcsdtLA16i/ImAhZCcKYyIRWMwWrly4RkRU4w0TMAKb3sG84ZwC/docUWQhkuQnhLeBuEfkKkBMyHLYx/c4ShDGReQ44XURmA0Nwbgp1K3C6qs4EXgDSwuwXOpZN6PoE4NiQu5WNUdUmVf0x8HkgHXhLRGJqSHETXyxBGBMB925t/wL+iFN6yAKagXoRGYFT/RROpYhME5EEnJFog/4JfDk44w5GiIhMVNXVqvoTYCkxds8JE18sQRgTub8DRwJ/V9UPcKqWNgB/w6kaCuc24HngbZybHQV9BZjr3q1sHR+NFPpVEVkjIquADuDF6L8MYyJjo7kaY4wJy0oQxhhjwrIEYYwxJixLEMYYY8KyBGGMMSYsSxDGGGPCsgRhjDEmLEsQxhhjwvp/TtAOAOmc9U4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data.filter_outliers_by_std(scope='target', n_std=5).plot('target')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5 std seems okay, so let's actually apply it." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 33 Rows\n" ] } ], "source": [ "data.filter_outliers_by_std(scope='target', n_std=5, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the distribution of skew values for the dti data" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dmri_rsi.nd_fiber.at_fmaj -2.420074\n", "dmri_rsi.nds2_fiber.at_fmaj -2.287048\n", "dmri_rsi.nd_fiber.at_cst.lh -2.213900\n", "dmri_rsi.nt_fiber.at_cst.lh -2.198591\n", "dmri_rsi.nds2_fiber.at_fmin -2.101969\n", " ... \n", "dmri_rsi.nts2_fiber.at_cst.rh 0.992159\n", "dmri_rsi.n0s2_fiber.at_fmin 1.005677\n", "dmri_rsi.nts2_fiber.at_fmin 1.006378\n", "dmri_rsi.n0s2_fiber.at_cst.lh 1.072836\n", "dmri_rsi.nts2_fiber.at_cst.lh 1.090904\n", "Length: 294, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['data'].skew().sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks okay, let's choose the variable with the most extreme skew to plot." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dmri_rsi.nd_fiber.at_fmaj: 10689 rows\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEWCAYAAABhffzLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAs5ElEQVR4nO3deXxcd33v/9dnRpKt1ZIteZMtO17i2IYkDiakzVKWBGj2FGgTCCUtNLf8gFBKuaVc2tL+oDftpRRa4JZA2WkCCRACJGyBkNgkIU7sJI7t2PK+S7Ita7G1zXzuH+fImSiSNZZm5szyfj4e89DMmTPf8/meI330ne/5nu8xd0dEREpHLOoAREQkt5T4RURKjBK/iEiJUeIXESkxSvwiIiVGiV9EpMQo8ecpM/uqmX08g+V9xMy+lKnywjIfMLN3ZKish8zsXWms93Ez6zCzQ2bWYmY9ZhY/kzKiZGbLzGyDmXWb2W0ZLvttZvazSZbxnJm9OhvxmJmb2ZJMlB2W12NmizJVXikpizoAyQ13/6cslPn7mS7zdMysBfggsMDd28LFNbmMYTRm9lVgn7t/NI3V/yfwK3c/P9NxuPu3gG+N9p6ZLQR2Ar3hol7gCeAz7v7zlDJWjredlLLK3X1oIvGcKTN7CPimu59qvLh75Me+UKnFXwLMbEL/4Cf6uSxqAY6kJP2ssEC2/jYWAM9lqex01IcJ8zzg58D3zeyWTG8kD393JJW765EHD2AV8BTQDXwbuAv4ePjeq4F9BK3FNuAgcD1wJbAVOAp8JKWsjwH3AN8EuoB3hcu+OU4MCwEH3gnsAR4GpoblHAE6CVqJs8L1HwLeNUZZXwU+B/w4rNPjwOKU968AtgDHgc8Cvx6rrHD9y4GTQBLoCcsfjrcsJZ7/Dfw2rPcPgOkpZVwE/Casx9PAq1Peewj4BLA23M6SEdu/GzgUxvswsDJcfiswCAyEcf3wNHX4JZAA+sJ1zw7r8XnggXDZWmA28GngWLiPVqWU8WFge7hPNwE3pLx3C7BmnGNbNmL5XwGHgVj4ehdwefj8QmBduC8PA58Kl+8Jy+oJH78Tbnst8G/h78rHR8YTfuY2YAfQAfyflO1+jJTfz9R4w+OSut8+m1LekvD5NODrQDuwG/hoStm3AGuAT4b7dCfw+1H/zUeab6IOQA8HqAh/WT8AlANvDpNJauIfAv4ufP/Pwl/w/wZqgZUEyeqscP2PhZ+/nuBbXeXIP6wx4hj+Y/s6UB1+7n8APwSqgDjwCqAuXP8hTp/4j4TJo4zgK/9d4XuNBInrzWF9PhDWb8zEn7If9o0Sb2ri3w+8LIz/u8N1BprDeK4M98kV4eumlM/uCfdlGUE3Ruq2/zTc11MIkvKGEXX9eJrH+kX7LPxsR7hfpxL8c9gJ/HG4vz9O0DU0vP5bgLlhHf6IoMtmTvjeLZx54l8ULl8evt7FC4n/UeDt4fMa4KKxygq3PQS8L9x/lSPjCT/zK2A6wbe3rcP7gtMk/rF+13hx4v86wT/62vCzW4F3psQ2SPB3EwfeDRwALOq//age6urJDxcRJMBPu/ugu99D0LJONQh8wt0HCb4NNBL0z3a7+3MErb/zUtZ/1N3vdfeku588w3g+5u694ecGgRkEf2AJd3/S3bvSLOf77v5bD/qBvwWcHy6/EnjO3e8J6/NpgtZ0JnzD3Te6ey/wt8Afhid/bwbud/f7w33yc4LW7JUpn/2quz/n7kNhXKe4+5fDfd1PkKTOM7NpGYr5++F+7QO+D/S5+9fdPUHw7W9VShx3
u/uBsA7fBrYR/HOdqAPhz+mjvDcILDGzRnfvcffHxivL3f8j3H9j/c79s7sfdfc9BMf9pomF/YLw+N4I/E14jHYB/wq8PWW13e7+xXCffg2YA8ya7LYLlRJ/fpgL7PeweRLaPWKdI+EvLQStewi+fpOyLPVk195JxJP62W8APwXuMrMDZvYvZlaeZjmpyfxESnxzU7cR1nsy8aZKLWc3wT/URoK+9beYWefwA7iEIAGM9tlTzCxuZreb2XYz6yJoFROWmwkjj+OYx9XM/jgcFTRch5dNMo7m8OfRUd57J0F31BYze8LMrh6nrHSO4cjjMzeNz4ynkeA4p/7N7OaFukHK76K7nwifluzJYSX+/HAQaDYzS1nWMskyJzPt6qnPht9A/sHdVwC/C1xN0A0xGQeB+cMvwnrPH3v1M5JaTgtBq7WDIOF8w93rUx7V7n57yvpj7bO3AtcRnGeYRtCVADB8vHIyxa2ZLQC+CLwXmOHu9cDGlDgm4gaC80bPj3zD3be5+03ATOCfgXvMrJqx65vOfhh5fIa/cfQSdCcOm30GZXcQHOcFI8ren0Y8JUmJPz88StA/epuZlZvZHzC5r+8ZY2avMbOXh1+nuwj+wJKTLPbHwEoz+4Nw9MdtvPQPfaJuNrMVZlYF/CNwT/hN6ZvANWb2hrAFP9XMXm1m89IosxboJzgnUAWMHBp7mKCvPNuGk247gJn9CUGL/4yZ2Swzey/w9wRdJC85pmZ2s5k1he91houT4faTTKzOHzKzBjObD7yfoCsLYANwWXhtxjTgb0Z8bsx9HB7f7wCfMLPa8B/kXxIccxmFEn8ecPcB4A8ITkIdJThp970oY0oxm2CEUBewmWD0zTdGrmRml5pZTzoFunsHwUnK2wmS6VKCESGZ8A2CE6aHCE6W3hZucy9Bq/0jBIlrL/AhxvgbCC9O+0j48usEXQf7Cc6ljOzr/i9gRdj9cm+G6vES7r6JoO/6UYJE+HLOfL91mlkv8CzB+Y23uPuXx1j3jcBz4XH9DHCju58Mu0o+AawN63zRGWz/B8CTBIn+xwT7jvCcy7eBZ8L3fzTic58B3mxmx8zs30cp930E3xp2EIzg+W9grHqVPHtxt7KIFCoz+1PgZnd/bdSxSH5Ti1+keKwkGAoqclpK/CUmnD+lZ5RHlFeTnmJm/zlGfP8ZdWzpSJk/aLTHZE/Yn2679xJ0zfxrtrYhxUNdPSIiJUYtfhGRElMQEyk1Njb6woULow5DRKSgPPnkkx3u3jRyeUEk/oULF7Ju3bqowxARKShmNnIGAEBdPSIiJUeJX0SkxCjxi4iUGCV+EZESo8QvIlJilPhFREqMEr+ISIlR4heRkqJpagrkAi4RkUz4/vp9/O29zzGtspxrzpvLX79xGS++8V1pUItfRErClx7ZwQe+/TTLZtdy9qwa/vPX2/n2E5m61XNhUYtfRIrelkNd3P7AFl6/YhaffesFlMWMt3/5cf7hh5u48KzpLGoqrfuuq8UvIkUtkXT++rvPUldZzu1vOpeKshixmPGpPzwfx/mvNaV37xolfhEpat99ah9P7+3k769ZwfTqilPLZ9VN5cqXz+EHGw5wYmAowghzT4lfRIrWwFCSz/xiG+fOm8a15819yfs3vrKFnv4hfvzMwQiii44Sv4gUrW8/sYf9nSf54OtHH73zyoUNLGqq5q4SO8mrxC8iRalvMMF//LKVVy5s4LKljaOuY2bccH4zT+4+Rnt3f44jjI4Sv4gUpW8+tpu27v4xW/vDXr1sJgBrWttzFVrklPhFpOj09g/x+Ye2c+nSRi5aNOO0666cW8eM6goe3tqRo+iip8QvIkXnzt/u4WjvAB+44uxx143FjEuXNvLw1naSydKYziFrid/MvmxmbWa2MWXZdDP7uZltC382ZGv7IlKaBhNJvrxmJxeeNZ0LWtJLMZed3cSR3gE2HezKcnT5IZst/q8Cbxyx7MPAg+6+FHgwfC0ikjH3P3uQA8f7
uPXSRWl/5tKlTQCsaS2N7p6sJX53fxg4OmLxdcDXwudfA67P1vZFpDR9Ze0uFjVV89pzZqb9mabaKZzVWM1Tu49lMbL8kes+/lnuPnylxCFg1lgrmtmtZrbOzNa1t5fO2XYRmbjt7T1s2NvJWy9sIRY7s1k3V82v56k9nSUxbXNkJ3c92Ltj7mF3v8PdV7v76qamphxGJiKF6gfr9xMzuGaUq3THs2pBAx09/ew7djILkeWXXCf+w2Y2ByD82Zbj7YtIkXJ3vr9hPxcvaWRW3dQz/vyq+fUAPLWn+Lt7cp347wPeET5/B/CDHG9fRIrUU3uOsffoSa4/v3lCnz9ndi2V5XHW7+nMbGB5KJvDOe8EHgWWmdk+M3sncDtwhZltAy4PX4uITNqDm9uIx4wrVo556vC0yuIxzp03jfUl0OLP2o1Y3P2mMd56Xba2KSKl65FtHVzQUk/d1PIJl3F+Sz1fXrOTgaEkFWXFe31r8dZMRErG0d4BNh44fmo8/kStnDuNwYTT2taTocjykxK/iBS8ta0duMOlY8zCma7ls2sB2FzkV/Aq8YtIwXtkWzt1U8s4d179pMo5q7GairKYEr+ISL5b23qEi5c0Ej/Di7ZGKovHWDarls2HlPhFRPJWW1cf+ztP8ooFmZnzcfmcWjYf7C7qK3iV+EWkoK3f2wnAqjRn4hzP8jl1HO0dKOo7cinxi0hBW7+nk/K4sXJuXUbKWz4nKKeYp2hW4heRgrZh7zGWz6ljank8I+UNJ/7NB7szUl4+UuIXkYKVSDrP7DvO+eE8O5kwrbKcWXVTinosvxK/iBSsrYe7OTGQYFVLfUbLXTKzhu3tSvwiInnn6fDE7nmTHL8/0uKmIPEX68geJX4RKVibDnZRXRFn4YzqjJa7uKmG7r4h2nuKc2SPEr+IFKwtB7s5Z07dGd9tazyLm2oAirafX4lfRAqSu7P5UBfL59RmvOzFM4NvENvbezNedj5Q4heRgrTv2Em6+4ZODb/MpNl1U6muiLNdLX4RkfwxPJHaObMzn/jNjMVFPLJHiV9ECtKWQ92YBbdMzIbFTTVq8YuI5JPNB7tYML2K6inZuZHgosZqDhzv48TAUFbKj5ISv4gUpM0Hu7LSvz9sYWNwgnfP0RNZ20ZUlPhFpOCcHEiw++gJlmWpmwc4dW3Arg4lfhGRyAVX1cLZs7KX+FtmVAGw52jxDelU4heRgjN8YdXSmTVZ28a0ynIaqsrZdUQtfhGRyG093E1ZzFiQ4akaRmqZUc3uI2rxi4hEbltbz6kbo2fTwhlV7FaLX0Qkeq1tPSydlb1unmELpldxoPMkA0PJrG8rl5T4RaSg9A0m2H2klyUzs3did9iCGdUkHfYdK65WvxK/iBSUHe29JB3OzkGLf2FjMLKn2Lp7lPhFpKBsawvuhbs0By3+lunhWP4iO8GrxC8iBaW1rYd4zE61xrOpsaaC6oq4WvwiIlHa3t5Dy/QqppTFs74tM2P+9Cr2HTuZ9W3lUiSJ38w+YGbPmdlGM7vTzKZGEYeIFJ7tbb0sbsru+P1U8xoqdXJ3ssysGbgNWO3uLwPiwI25jkNECk8i6ezs6D11a8RcmNdQxf5jJ4vqxutRdfWUAZVmVgZUAQciikNECsi+YycYSCRznPgr6e4foutk8UzPnPPE7+77gU8Ce4CDwHF3/9nI9czsVjNbZ2br2tvbcx2miOSh4TtiDd8TNxfmNVQCsLeIunui6OppAK4DzgLmAtVmdvPI9dz9Dndf7e6rm5qach2miOSh7W3BsMpFjbnt6gGK6gRvFF09lwM73b3d3QeB7wG/G0EcIlJgtrf3MKO6gobqipxtc7jFX0wneKNI/HuAi8ysyswMeB2wOYI4RKTAbG/vyWn/PgTTM9dMKVOLfzLc/XHgHuAp4NkwhjtyHYeIFJ7t7b057d+HYCx/sQ3pzM5disfh7n8P/H0U2xaRwnS0d4CjvQM5b/FD0M9fTIlfV+6KSEHYMTyi
J5LEX8m+IhrLr8QvIgVhe8SJv6d/iOMnB3O+7WxQ4heRgrC9vZeKshjN4SibXCq2IZ1K/CJSELa39bCosZp4zHK+7WIb0qnELyIFIYqhnMPmq8UvIpJb/UMJ9hw9kdNZOVPVVZZRW0Rj+ZX4RSTv7T5ygqTD4pnRtPjNjOYiGsuvxC8ieW97W3QjeoYFY/nV4hcRyYnhoZxnNUbT1QPBCd69R08UxVh+JX4RyXutbT0011dSPSWSyQYAmD+9it6BBJ0nCn8svxK/iOS91vaeyPr3h70wpLPwu3uU+EUkryWTTmtbD0si7N+H4hrLr8QvInltf+dJ+gaTLJ0VdeIvnrH8SvwiktdawxO7SyLu6plWWU7t1LKiuAWjEr+I5LXWw2Hij7irB6C5vpL9avGLiGRXa1vub7c4lnkNlezvVOIXEcmq1vaeyLt5hjXXK/GLiGSVu7PtcHf+JP6GSrr7hujqK+yx/Er8IpK32nv66eobypvEP7c+GNJZ6P38Svwikrdawzl6ls6sjTiSQLMSv4hIdg0n/nxp8Q/f/evAcSV+EZGsaG3roWZKGbPqpkQdCgCN1VOoKIupxS8iki2tbcEcPWa5v93iaGIxo7m+kn0FPrJHiV9E8ta2th6W5kk3z7C59VNLo8VvZt8zs6vMTP8oRCQnjp8cpL27P2/694cVw1j+dBP554G3AtvM7HYzW5bFmEREXjixmwdTNaRqrq+ivbufvsFE1KFMWFqJ391/4e5vAy4AdgG/MLPfmNmfmFl5NgMUkdLU2tYNEPmsnCMNj+w5dLwv4kgmLu2uGzObAdwCvAtYD3yG4B/Bz7MSmYiUtNa2HirKYqemQ84Xp8byF3B3T1r3MTOz7wPLgG8A17j7wfCtb5vZumwFJyKla8uhbpbOrCEey48RPcOK4SKudG9g+UV3vz91gZlNcfd+d1+dhbhEpMQ9f6ibS5Y2Rh3GS8yeNhUzCnpIZ7pdPR8fZdmjE92omdWb2T1mtsXMNpvZ70y0LBEpPkd7B2jr7mf57LqoQ3mJirIYs2oLe0jnaVv8ZjYbaAYqzWwVMPydqw6YTMfbZ4CfuPubzaxikmWJSJHZcqgLgGWz82OOnpGaGyo5UMAt/vG6et5AcEJ3HvCplOXdwEcmskEzmwZcFpaLuw8AAxMpS0SK05aDwYiec+bkaeKvr2TD3s6ow5iw0yZ+d/8a8DUze5O7fzdD2zwLaAe+YmbnAU8C73f33tSVzOxW4FaAlpaWDG1aRArB84e6mV5dQVNNfszRM9Lc+koe2HiQZNKJ5dnJ53Scto/fzG4Ony40s78c+ZjgNssIhoH+X3dfBfQCHx65krvf4e6r3X11U1PTBDclIoVoy6Euzpldmzdz9IzU3FDJYMJp6+6POpQJGe/kbnX4swaoHeUxEfuAfe7+ePj6HoJ/BCIiJJLO1sM9edu/DzDv1Fj+ExFHMjHjdfV8Ifz5D5naoLsfMrO9ZrbM3Z8HXgdsylT5IlLYdh/p5eRgIi9H9Awbvnp3f2cfr1gQcTATkO4kbf9iZnVmVm5mD5pZe0o30ES8D/iWmT0DnA/80yTKEpEisvFAMKJnZXP+Jv5CvwVjuuP4X+/uXcDVBHP1LAE+NNGNuvuGsP/+XHe/3t2PTbQsESkuG/cfpyIey5vbLY6mZkoZ0yrLC7arJ93EP9wldBVwt7sfz1I8IlLiNu4/zjlzaqkoy+9Z4JvrK4u+xf8jM9sCvAJ40MyagMKdmk5E8pK7s3H/cVbOnRZ1KONqbijcefnTnZb5w8DvAqvdfZBgCOZ12QxMRErP3qMn6eob4uXNBZD4wxa/u0cdyhlLd5I2gHMIxvOnfubrGY5HRErYxgNBL/LL8vjE7rB5DZX0DiToOjnEtKrCui1JutMyfwNYDGwAhm874yjxi0gGPbv/OGUxy+sx/MOGR/bs6zzBtKr8/4aSKt0W/2pghRfidxoRKRjP7Otk2exappTFow5lXKnz8hfCOYlU6Z7c
3QjMzmYgIlLaEklnw55OLmhpiDqUtLxwEVfhneBNt8XfCGwys98CpyancPdrsxKViJScrYe76R1IcMGC+qhDScuM6gqmlscKcnrmdBP/x7IZhIjIU3uC6zgLpcVvZsytL8whnWklfnf/tZktAJa6+y/MrArI/044ESkY6/d0Mr26gpbphXNfpkK9iCvduXr+jGAWzS+Ei5qBe7MUk4iUoKf2HOOClvq8nYp5NM0F2uJP9+Tue4CLgS4Ad98GzMxWUCJSWjpPDLCjvZdVBdLNM6y5vpKOngH6BhPjr5xH0k38/eEtEgEIL+LS0E4RyYjHdx4F4JULp0ccyZkZHtlTaCd40038vzazjxDcdP0K4G7gh9kLS0RKyZptHVRVxDl/fn3UoZyRU2P5izTxf5jgPrnPAv8DuB/4aLaCEpHSsra1g4sWzcj7GTlHOjWWv8BO8KY7qidpZvcC97p7e3ZDEpFSsr/zJDs6ennbRYV3K6tZdVOJWZG1+C3wMTPrAJ4Hng/vvvV3uQlPRIrd2m0dAFy6tDHiSM5ceTzG7LqpBdfiH+971QcIRvO80t2nu/t04FXAxWb2gaxHJyJF75HWDmbWTmHpzJqoQ5mQ5oZK9hVTix94O3CTu+8cXuDuO4CbgT/OZmAiUvySSec3rR1csqSxoMbvp2quryy6UT3l7t4xcmHYz19YE1CLSN7ZfKiLI70DXLyk8Lp5hjU3VHLoeB+JZOGMcB8v8Q9M8D0RkXGtbQ3alZcUYP/+sLn1lQwlncNdhXM32vFG9ZxnZl2jLDdgahbiEZES8si2DpbOrGFWXeGmk9Sx/MM3Z8l3p23xu3vc3etGedS6u7p6RGTC+gYTPLHraEF380BwC0YorLH8hXW1hIgUjad2H6NvMFmQwzhTzS3Aq3eV+EUkEmtaOyiLGa9aNCPqUCalqqKM6dUVSvwiIuNZ09rBqpZ6aqakez+o/DW3vrAu4lLiF5Gc6zwxwLP7jxd8//6wQpuXX4lfRHLuN9uP4F6Y0zSMprm+iv3HTuJeGGP5lfhFJOfWtHZQM6WMc+fVRx1KRjQ3VHJyMEHnicGoQ0lLZInfzOJmtt7MfhRVDCISjeFpmMvjxdH2LLR5+aPc6+8HNke4fRGJwN6jJ9h95ASXLCns0TyphhP/vgI5wRtJ4jezecBVwJei2L6IRGfNqWkamiKOJHNO3ZBFLf7T+jTwP4HkWCuY2a1mts7M1rW3694vIsVizbYOZtdNZXFTddShZExDVTmV5fGCGdKZ88RvZlcDbe7+5OnWc/c73H21u69uaiqeloFIKUsmnbXbO7hkaeFOwzwaM6O5oXCmZ46ixX8xcK2Z7QLuAl5rZt+MIA4RybHnDnTReWKQS4pk/H6qQhrLn/PE7+5/4+7z3H0hcCPwS3e/OddxiEjuDffvF8uFW6nmKvGLiLzUmtZ2zpldS1PtlKhDybh5DZUc7R3gxMBQ1KGMK9LE7+4PufvVUcYgIrkRTMN8rCi7eeCFIZ2F0M+vFr+I5MQTu44yMJTk4iKZpmGkF4Z05v+duJT4RSQn1rR2UB43XnXW9KhDyYpTV+8WwJBOJX4RyYk12zq4oKWBqorCn4Z5NDNrpxCPGfs7T0QdyriU+EUk69q7+3nuQBeXnV281+SUxWPMriuMefmV+EUk69a0BlffX1ZE0zSMprmhMIZ0KvGLSNY9srWD6dUVrJxbF3UoWTWvvpIDOrkrIqUumXQe3tbBJUsaicWKZ5qG0TQ3VHKoq4+hxJjTkOUFJX4RyarNh7ro6Okv6v79YXPrK0kknUNd+d3qV+IXkax6eGswTcNlRTp+P1WhDOlU4heRrHpkWzBNw8y6qVGHknWFMi+/Er+IZM2JgSHW7TrG75VANw+oxS8iwmM7jjCQSHJpkQ/jHDa1PE5jTQUHjivxi0iJenhrB1PLY6xe2BB1KDkzt74y7++9q8QvIlnz8NZ2Llo0g6nl8ahDyZl5DUr8IlKitrf3sKOjl9ee
MzPqUHJqwYxq9h07kddj+ZX4RSQrfrHpMACXL58VcSS5tXBGFYMJ5+Dx/B3Lr8QvIlnx802HWTm3jrnhSJdS0TK9GoBdR3ojjmRsSvwiknFHevp5cs+xkmvtAyxsrAJg95H8nZ5ZiV9EMu7BzW24wxUrSi/xz6qdypSyGLvV4heRUvKjZw/SMr2q6GfjHE0sZrRMr1KLX0RKx5Gefta2dnD1uXMwK+7ZOMeyYEa1Er+IlI6fPHeIRNK5+ty5UYcSmYUzqth9tBd3jzqUUSnxi0hG/fDpAyxuqmb5nNqoQ4nMghlV9A0maevujzqUUSnxi0jG7Dt2gsd3HuWa8+aWbDcPBF09ALs68vMErxK/iGTMd5/cD8CbXzEv4kiidVZjkPh3KvGLSDFLJp27n9zLxYsbmddQFXU4kZpbX8mUshjb23uiDmVUSvwikhGP7jjCvmMnecvq0m7tA8RjxlmN1WxvV4tfRIrY1x/dRX1VOW9YOTvqUPLC4pk1tLapxS8iRWrPkRP8bNNh3vaqlpKagvl0ljTVsPfYCfoGE1GH8hJK/CIyaV/5zU7iZrz9ooVRh5I3Fs+swT0/J2vLeeI3s/lm9isz22Rmz5nZ+3Mdg4hkTlffIN95Yi9XnzuH2dOK/4bq6VrcFIzsycfunrIItjkEfNDdnzKzWuBJM/u5u2+KIBYRmaTvPLGX3oEE77xkUdSh5JVFjTWYwfY2tfhx94Pu/lT4vBvYDDTnOg4RmbyhRJKvrN3FhQun8/J506IOJ69UVsRprq+kNQ+HdEbax29mC4FVwOOjvHerma0zs3Xt7e05j01ExvezTYfZ33mSP71kYdSh5KXFTfk5sieyxG9mNcB3gb9w966R77v7He6+2t1XNzU15T5AETktd+ezv2xlwYwqrlihIZyjWTa7lu1tPXl3/91IEr+ZlRMk/W+5+/eiiEFEJuenzx1m08Eu3vfapcRjpTsvz+ksn1PLQCLJjjybuiGKUT0G/Bew2d0/levti8jkJZPOp3+xlbMaq7n+/NKdfnk8y+cEN6LZfPAlnRqRiqLFfzHwduC1ZrYhfFwZQRwiMkE/fe4QWw51c9vrllAW1+VAY1ncVENFPMamPEv8OR/O6e5rAH0vFClQQWt/G4uaqrn2PA3IO53yeIwlM2vYfLA76lBeRP+qReSMPLDxEM8f7ub9r1PffjqWz6lTV4+IFK7+oQT/8tMtLJ1ZU9K3VjwTy+fU0t7dT0dP/tyNS4lfRNL2lbW72H3kBH93zQq19tO0Ig9P8Crxi0haDh3v47O/bOXy5bO4dKmurUnXirlB4n9m3/GII3mBEr+IjMvd+ei9zzKUTPK3Vy+POpyCUl9VwaLGajbs7Yw6lFOU+EVkXD985iC/2NzGB69YdupG4pK+VS0NrN9zDHePOhRAiV9ExrH36An+1/ef5bz59fzJxQujDqcgrWqpp6NngH3HTkYdCqDELyKnMTCU5L13rgeH/7hxlS7WmqBVLfUAPLXnWLSBhHQURWRMn/zZ8zy9t5Pb33QuLTOqog6nYC2bVUtleZz1ezqjDgVQ4heRMfxqSxt3PLyDmy9q4apz50QdTkEri8c4d9401qvFLyL56uDxk/zldzawfE4dH71qRdThFIXVCxvYeKCL7r7BqENR4heRFxtKJHn/XRvoH0ryubeuYmp5POqQisIlS5pIJJ3HdhyNOhQlfhF5sU/+bCu/3XmUT9zwMhY11UQdTtG4YEE9leVx1myL/o6CSvwicspPNh7iP3+9nbe9qoUbVs2LOpyiMqUszkWLpvPIto6oQ1HiF5HAjvYe/urupzlvfj1/d4369bPhkqVN7OjoZX9ntOP5lfhFhN7+If78m09SHjc+/7YLmFKmfv1suHRpIwAPb422u0eJX6TEDSaS/H/feorWth7+/aZVNNdXRh1S0Vo6s4aW6VXc/+zBSONQ4hcpYUOJJB+6+2l+vbWdT9zwcs26mWVmxtXnzuE3249wJML5+ZX4
RUrUyYEEf/7Np7h3wwE+9IZl3HRhS9QhlYRrzptLIuk8sPFQZDEo8YuUoK2Hu7nuc2t4cMth/vG6lbznNUuiDqlknDO7liUza/jh0wcii0GJX6SEdPcN8r/v38xV//4IR3sH+NqfXMgf/87CqMMqKWbGdefN5fGdR9nR3hNJDEr8IiUgmXTuXreX13zy13zh4R1cd34zD7z/Mi47W336UbjxwhYq4jG+snZXJNsvi2SrIpIT7sEUAbf/ZAtP7+3k/Pn1fOkdqzl/fn3UoZW0ptopXHf+XO5+ci8ffP3Z1FdV5HT7avGLFCF3Z822Dv7oC49x0xcf42DnSf71LefxvXf/rpJ+nnjnpWfRN5iMpNWvFr9IEdnZ0ct9Gw5w39P72d7ey+y6qXzsmhXceGGLJlvLM+fMruPKl8/mCw9v549eOZ+5Obx+QolfpMC1d/fzgw37+cGGAzy7/zhmcOHC6fzZpYu44YJmXYWbxz5y5XIe3NzGJ+7fzOfeekHOtqvEL1KABhNJfrWljbuf3Mcvt7SRSDrnzpvGR69azlXnzmHONF19WwjmNVTx7lcv5tO/2MYVy/dz/armnGxXiV+kQLg7G/d38d2n9nHf0wc42jtAU+0U3nXpWbzlFfNZMlNTKBei97xmCb/ZfoQPf+8Zzp5Vy4q5dVnfprl71jcyWatXr/Z169ZFHYZIznX3DbK29QgPPd/GQ8+3c6irj4p4jCtWzOIPLmjm985u0g3Qi0Bbdx/X/sdaBhJJvv6nF/Ky5mkZKdfMnnT31S9ZrsQvkh96+4do6+5n88EuNuztZP2eY6zf08lQ0qmdUsYlSxt5zbKZvGHlbKZVlUcdrmTYzo5ebv7S43SdHOQfr1/J9ec3Y2aTKjOvEr+ZvRH4DBAHvuTut59ufSV+iZq70z+UpLd/iJ7+Ibr7gp+9/UOcGEjQP5SkfyhB/2Dy1POTgwlODiTo7U9wYmCI3oEEJ/qHODmYIJF0hpJOIukMJpJ0nhikp3/o1PYqymK8bG4dF541g9csa+KCBQ2Uq2Vf9A50nuS2O9ezbvcxLl3ayF9cfjavWNAw4fLyJvGbWRzYClwB7AOeAG5y901jfWaiiX/r4W6O9g6k/JElGUo48ZhRFo9RFjPiMaM8bsRjweuyuFEWi1ERjwXP40Z5LEY8brgDDkn38AGO4+GyMX8CL+zmYPnwSw/LSCZfXG7SnWRy9OeJ8HUiGayfSAZlGEbMgkvCzSBmhgGxGCSTkHDHw/WHy7Bw/bgZ8dgLz2Ox4PPxmBGz4GEGFhxDAOIxTu23eMxSfgb7K3V5WTxGPCxjNIOJJIMJZyiRZCARHKfB0zwfXn8wXDYQLht+/qJjnnSGEh4uS4bJ1ukbTNA3+EKCPjmYpG8wwcDQ8LZe2MZgIknyDP9UKuIxqqfEqaooe9HPqWXxU79nw/unrrKcWXVTmVk7hSUza1g+p46KMiX6UpRIOl9Zu5PPP7Sdo70D3Pueiyd87cVYiT+Kk7sXAq3uvgPAzO4CrgPGTPwT9U/3b+ah56O/v6VEZ+Q/pfL4C8l2anmcqeVxKiuCpDy9OnheEY9RURYk5vJ4jPKy4J9/ZUWcmillwWNq2annlRVBMp9SHmNKWYwpZXEqyoLtiJypeMx416WLeOurWnjg2UOcNy8z/f2pokj8zcDelNf7gFeNXMnMbgVuDV/2m9nGHMSWK41A9DfezKxiq1Ox1QdUp0KQ6fosGG1h3g7ndPc7gDsAzGzdaF9XClWx1QeKr07FVh9QnQpBruoTRSfifmB+yut54TIREcmBKBL/E8BSMzvLzCqAG4H7IohDRKQk5byrx92HzOy9wE8JhnN+2d2fG+djd2Q/spwqtvpA8dWp2OoDqlMhyEl9CuICLhERyRwNFBYRKTFK/CIiJSavEr+ZvdHMnjezVjP78Cjv/6WZbTKzZ8zsQTMbdYxqvkijPn9uZs+a2QYzW2NmK6KI80yMV6eU9d5kZm5meT3ULo1jdIuZ
tYfHaIOZvSuKOM9EOsfIzP4w/Ft6zsz+O9cxnok0jtG/pRyfrWbWGUGYZySNOrWY2a/MbH2Y767MaAAeXsYf9YPgRO92YBFQATwNrBixzmuAqvD5u4FvRx33JOtTl/L8WuAnUcc92TqF69UCDwOPAaujjnuSx+gW4LNRx5rhOi0F1gMN4euZUcc9mfqMWP99BANGIo99ksfoDuDd4fMVwK5MxpBPLf5TUzm4+wAwPJXDKe7+K3c/Eb58jOAagHyVTn26Ul5W88IUPvlq3DqF/n/gn4G+XAY3AenWp5CkU6c/Az7n7scA3L0txzGeiTM9RjcBd+YksolLp04ODE/MPw04kMkA8inxjzaVw+luR/NO4IGsRjQ5adXHzN5jZtuBfwFuy1FsEzVunczsAmC+u/84l4FNULq/c28Kv27fY2bzR3k/n6RTp7OBs81srZk9Fs6Wm6/Szgth1+9ZwC9zENdkpFOnjwE3m9k+4H6CbzIZk0+JP21mdjOwGvg/UccyWe7+OXdfDPw18NGo45kMM4sBnwI+GHUsGfRDYKG7nwv8HPhaxPFkQhlBd8+rCVrIXzSz+igDypAbgXvcPRF1IBlwE/BVd58HXAl8I/z7yoh8SvxpTeVgZpcD/wu41t37cxTbRJzp1BR3AddnM6AMGK9OtcDLgIfMbBdwEXBfHp/gHfcYufuRlN+zLwGvyFFsE5XO790+4D53H3T3nQTTpC/NUXxn6kz+jm4k/7t5IL06vRP4DoC7PwpMJZjALTOiPtGRcjKjDNhB8FVt+ITHyhHrrCI4KbI06ngzVJ+lKc+vAdZFHfdk6zRi/YfI75O76RyjOSnPbwAeizruDNTpjcDXwueNBN0OM6KOfaL1Cdc7B9hFeFFqPj/SPEYPALeEz5cT9PFnrG55MzunjzGVg5n9I0FCvI+ga6cGuDu8Gcged782sqBPI836vDf8BjMIHAPeEV3E40uzTgUjzfrcZmbXAkPAUYJRPnkrzTr9FHi9mW0CEsCH3P1IdFGP7Qx+524E7vIwU+azNOv0QYIuuA8QnOi9JZN105QNIiIlJp/6+EVEJAeU+EVESowSv4hIiVHiFxEpMUr8IiIlRolfSlo4A+IbRiz7CzP7v2Os/1AeX5AmkhYlfil1dxKMAU9VKFeAikyIEr+UunuAq8ysAsDMFgJzgZvMbF04X/0/jPZBM+tJef5mM/tq+LzJzL5rZk+Ej4vD5b+XMm/8ejOrzXLdREaVN1fuikTB3Y+a2W+B3wd+QNDa/w7wT+F7ceBBMzvX3Z9Js9jPAP/m7mvMrIXgCs3lwF8B73H3tWZWQ/5PWy1FSi1+kRd39wx38/yhmT1FcMOSlQQ3w0jX5cBnzWwDcB9QFyb6tcCnzOw2oN7dhzIUv8gZUeIXCVr6rwvvJVBFMCfPXwGv82A65h8TzI44Uup8J6nvx4CL3P388NHs7j3ufjvwLqASWGtm52SjMiLjUeKXkufuPcCvgC8TtPbrgF7guJnNIugGGs1hM1sezpN+Q8ryn5Fy4wwzOz/8udjdn3X3fwaeIJhRUiTnlPhFAncC5wF3uvvTBF08W4D/JuiiGc2HgR8BvwEOpiy/DVgd3rVrE/Dn4fK/MLONZvYMwYys+XwHOSlimp1TRKTEqMUvIlJilPhFREqMEr+ISIlR4hcRKTFK/CIiJUaJX0SkxCjxi4iUmP8H5gTHjjvCKn4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data.plot(scope='dmri_rsi.nd_fiber.at_fmaj')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How about we apply just a strict criteria of say 10 std." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 26 Rows\n" ] } ], "source": [ "data.filter_outliers_by_std(scope='data', n_std=10, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define a Test set. \n", "\n", "In this example project we are going to test a bunch of different Machine Learning Pipeline's. In order to avoid meta-issues of overfitting onto our dataset, we will therefore define a train-test split. The train set we will use to try different pipelines, then only with the best final pipeline will we use the test set. \n", "\n", "We will impose one extra constraint when applying the test split, namely that members of the same family, i.e., those with the same family id, stay in the same training or testing fold." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Performing test split on: 10663 subjects.\n", "random_state: 6\n", "Test split size: 0.2\n", "\n", "Performed train/test split\n", "Train size: 8562\n", "Test size: 2101\n" ] }, { "data": { "text/html": [ "
Data: 10663 rows × 294 columns (8562 rows - Train Set, 2101 rows - Test Set); columns dmri_rsi.n0_fiber.at_allfib.lh ... dmri_rsi.vol_fiber.at_unc.rh
Target: anthro_waist_cm; 10663 rows × 1 columns (8562 rows - Train Set, 2101 rows - Test Set)
Non Input: rel_family_id, sex; 10663 rows × 2 columns (8562 rows - Train Set, 2101 rows - Test Set)
\n" ], "text/plain": [ " anthro_waist_cm dmri_rsi.n0_fiber.at_fx.rh \\\n", "0 31.00 0.246540 \n", "2 26.75 0.146416 \n", "3 23.50 0.229894 \n", "4 30.00 0.192228 \n", "5 28.00 0.223994 \n", "... ... ... \n", "11870 26.00 0.236385 \n", "11871 30.00 0.247628 \n", "11872 19.00 0.224581 \n", "11873 25.00 0.212500 \n", "11874 32.00 0.237271 \n", "\n", " dmri_rsi.n0_fiber.at_fx.lh dmri_rsi.n0_fiber.at_cgc.rh \\\n", "0 0.240964 0.304854 \n", "2 0.241515 0.300990 \n", "3 0.225981 0.290347 \n", "4 0.201559 0.299230 \n", "5 0.230152 0.315721 \n", "... ... ... \n", "11870 0.233723 0.336676 \n", "11871 0.244926 0.298093 \n", "11872 0.181725 0.330612 \n", "11873 0.225045 0.307037 \n", "11874 0.232131 0.418280 \n", "\n", " dmri_rsi.n0_fiber.at_cgc.lh dmri_rsi.n0_fiber.at_cgh.rh \\\n", "0 0.311347 0.255081 \n", "2 0.326416 0.230479 \n", "3 0.289166 0.204329 \n", "4 0.297800 0.249119 \n", "5 0.311843 0.210526 \n", "... ... ... \n", "11870 0.324054 0.245577 \n", "11871 0.299938 0.201506 \n", "11872 0.308630 0.251494 \n", "11873 0.303831 0.254441 \n", "11874 0.415342 0.241162 \n", "\n", " dmri_rsi.n0_fiber.at_cgh.lh dmri_rsi.n0_fiber.at_cst.rh \\\n", "0 0.244332 0.378148 \n", "2 0.232169 0.386119 \n", "3 0.200217 0.364850 \n", "4 0.248674 0.365131 \n", "5 0.197801 0.400478 \n", "... ... ... \n", "11870 0.259280 0.388121 \n", "11871 0.192919 0.386154 \n", "11872 0.265312 0.382624 \n", "11873 0.257420 0.380283 \n", "11874 0.263211 0.394936 \n", "\n", " dmri_rsi.n0_fiber.at_cst.lh dmri_rsi.n0_fiber.at_atr.rh ... \\\n", "0 0.388728 0.336072 ... \n", "2 0.400546 0.335362 ... \n", "3 0.365397 0.316729 ... \n", "4 0.359856 0.318162 ... \n", "5 0.395570 0.337367 ... \n", "... ... ... ... \n", "11870 0.388715 0.345692 ... \n", "11871 0.390666 0.315482 ... \n", "11872 0.379022 0.335610 ... \n", "11873 0.377752 0.334017 ... \n", "11874 0.414786 0.374435 ... 
\n", "\n", " dmri_rsi.vol_fiber.at_ifsfc.lh dmri_rsi.vol_fiber.at_fxcut.rh \\\n", "0 13224.0 2720.0 \n", "2 18080.0 2728.0 \n", "3 18768.0 2776.0 \n", "4 18112.0 2528.0 \n", "5 14224.0 2216.0 \n", "... ... ... \n", "11870 15848.0 2760.0 \n", "11871 14368.0 2728.0 \n", "11872 17616.0 2928.0 \n", "11873 16024.0 2728.0 \n", "11874 17096.0 1976.0 \n", "\n", " dmri_rsi.vol_fiber.at_fxcut.lh dmri_rsi.vol_fiber.at_allfibers \\\n", "0 1928.0 264000.0 \n", "2 2264.0 339840.0 \n", "3 1784.0 331024.0 \n", "4 2408.0 327192.0 \n", "5 1536.0 259936.0 \n", "... ... ... \n", "11870 2192.0 292304.0 \n", "11871 2072.0 271624.0 \n", "11872 2072.0 317816.0 \n", "11873 2032.0 286832.0 \n", "11874 1776.0 298280.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfibnocc.rh \\\n", "0 91368.0 \n", "2 119280.0 \n", "3 114912.0 \n", "4 115360.0 \n", "5 91944.0 \n", "... ... \n", "11870 101352.0 \n", "11871 98328.0 \n", "11872 113016.0 \n", "11873 97896.0 \n", "11874 105096.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfibnocc.lh dmri_rsi.vol_fiber.at_allfib.rh \\\n", "0 92144.0 133816.0 \n", "2 123784.0 166808.0 \n", "3 115472.0 167400.0 \n", "4 114592.0 165224.0 \n", "5 92208.0 130792.0 \n", "... ... ... \n", "11870 103848.0 145272.0 \n", "11871 94352.0 138504.0 \n", "11872 106280.0 162352.0 \n", "11873 97496.0 143704.0 \n", "11874 101088.0 152848.0 \n", "\n", " dmri_rsi.vol_fiber.at_allfib.lh sex rel_family_id \n", "0 131856.0 0 7321 \n", "2 174480.0 1 3971 \n", "3 165336.0 1 3139 \n", "4 164176.0 1 4543 \n", "5 130960.0 1 1933 \n", "... ... ... ... 
\n", "11870 148776.0 1 3128 \n", "11871 134832.0 0 2111 \n", "11872 157344.0 0 5907 \n", "11873 144768.0 0 5594 \n", "11874 147488.0 0 6238 \n", "\n", "[10663 rows x 297 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We use this to say we want to preserve families\n", "preserve_family = bp.CVStrategy(groups='rel_family_id')\n", "\n", "# Apply the test split\n", "data = data.set_test_split(size=.2,\n", " cv_strategy=preserve_family,\n", " random_state=6)\n", "\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Different Pipelines\n", "\n", "First, let's save some commonly used parameters in an object called the ProblemSpec. We will use all defaults except for the number of jobs, which we set to n_jobs=8." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "ps = bp.ProblemSpec(n_jobs=8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function we will use to evaluate different pipelines is bp.evaluate. Let's start with an example using just a linear regression model."
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicting target = anthro_waist_cm\n", "Using problem_type = regression\n", "Using scope = all defining an initial total of 294 features.\n", "Evaluating 8562 total data points.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "662b37ffb59e4f7fb9573c17e5829b25", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Folds: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Feature importances (beta weights), one row per fold: 5 rows × 294 columns; columns dmri_rsi.n0_fiber.at_allfib.lh ... dmri_rsi.vol_fiber.at_unc.rh
\n", "" ], "text/plain": [ " dmri_rsi.n0_fiber.at_allfib.lh dmri_rsi.n0_fiber.at_allfib.rh \\\n", "0 13.658717 10.182917 \n", "1 13.386769 -7.807120 \n", "2 14.601005 6.447403 \n", "3 12.660464 16.193197 \n", "4 16.212349 11.518905 \n", "\n", " dmri_rsi.n0_fiber.at_allfibers dmri_rsi.n0_fiber.at_allfibnocc.lh \\\n", "0 11.113703 2.741103 \n", "1 5.590872 2.074826 \n", "2 6.566593 5.105803 \n", "3 -0.193520 2.184492 \n", "4 7.905156 6.085897 \n", "\n", " dmri_rsi.n0_fiber.at_allfibnocc.rh dmri_rsi.n0_fiber.at_atr.lh \\\n", "0 2.096137 0.652331 \n", "1 1.611921 -5.200155 \n", "2 2.144557 -0.249971 \n", "3 3.968235 -7.643031 \n", "4 1.294112 -8.058172 \n", "\n", " dmri_rsi.n0_fiber.at_atr.rh dmri_rsi.n0_fiber.at_cc \\\n", "0 -19.689333 18.133593 \n", "1 -17.350840 18.954327 \n", "2 -17.505516 19.815592 \n", "3 -16.103781 15.539372 \n", "4 -11.786942 13.227197 \n", "\n", " dmri_rsi.n0_fiber.at_cgc.lh dmri_rsi.n0_fiber.at_cgc.rh ... \\\n", "0 6.170156 18.584436 ... \n", "1 4.418044 18.556810 ... \n", "2 10.121972 15.690188 ... \n", "3 9.867133 14.523899 ... \n", "4 10.812229 12.374775 ... 
\n", "\n", " dmri_rsi.vol_fiber.at_scs.lh dmri_rsi.vol_fiber.at_scs.rh \\\n", "0 0.000343 -0.000589 \n", "1 0.000263 -0.000505 \n", "2 0.000319 -0.000600 \n", "3 0.000341 -0.000441 \n", "4 0.000080 -0.000515 \n", "\n", " dmri_rsi.vol_fiber.at_sifc.lh dmri_rsi.vol_fiber.at_sifc.rh \\\n", "0 0.000551 0.000348 \n", "1 0.000523 0.000272 \n", "2 0.000455 0.000446 \n", "3 0.000587 0.000453 \n", "4 0.000614 0.000411 \n", "\n", " dmri_rsi.vol_fiber.at_slf.lh dmri_rsi.vol_fiber.at_slf.rh \\\n", "0 -0.000216 0.000278 \n", "1 -0.000371 0.000474 \n", "2 -0.000303 0.000618 \n", "3 -0.000284 0.000608 \n", "4 -0.000342 0.000784 \n", "\n", " dmri_rsi.vol_fiber.at_tslf.lh dmri_rsi.vol_fiber.at_tslf.rh \\\n", "0 1.902580e-04 -0.000544 \n", "1 1.721382e-04 -0.000582 \n", "2 1.192093e-07 -0.000615 \n", "3 1.287460e-04 -0.000701 \n", "4 2.737045e-04 -0.000738 \n", "\n", " dmri_rsi.vol_fiber.at_unc.lh dmri_rsi.vol_fiber.at_unc.rh \n", "0 0.000643 0.000572 \n", "1 0.000530 0.000525 \n", "2 0.000378 0.000662 \n", "3 0.000359 0.000616 \n", "4 0.000555 0.000540 \n", "\n", "[5 rows x 294 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Beta weights\n", "results.get_fis()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Raw predictions for the first fold: columns predict, y_true; 1713 rows × 2 columns
" ], "text/plain": [ " predict y_true\n", "28 24.345831 21.25\n", "33 24.806902 24.50\n", "36 26.968584 23.00\n", "40 27.805948 30.80\n", "47 25.691679 20.00\n", "... ... ...\n", "11848 28.006598 25.50\n", "11851 23.925535 24.00\n", "11855 30.389437 38.80\n", "11861 25.687609 26.00\n", "11868 29.838160 35.00\n", "\n", "[1713 rows x 2 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Raw predictions made from each fold\n", "results.get_preds_dfs()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All options are listed under: 'Saved Attributes' and 'Available Methods'.\n", "\n", "Anyways, let's continue trying different models. We will use a ridge regression model. Let's also use the fact that the jupyter notebook is defining variables in global scope to clean up the evaluation code a bit so we don't have to keep copy and pasting it." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def eval_pipe(pipeline, **kwargs):\n", " return bp.evaluate(pipeline=pipeline,\n", " dataset=data,\n", " problem_spec=ps,\n", " **kwargs)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0a633cae7c9145159bb4f267f1b13b7b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Folds: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
pipeline | mean_scores_explained_variance | mean_scores_neg_mean_squared_error | std_scores_explained_variance | std_scores_neg_mean_squared_error | mean_timing_fit | mean_timing_score
sgd | 0.226867 | -13.632093 | 0.015547 | 0.287249 | 16.182405 | 0.007756
ridge | 0.244241 | -13.105543 | 0.014348 | 0.335205 | 11.024628 | 0.461483
elastic | 0.214935 | -13.612086 | 0.017457 | 0.304521 | 8.584299 | 0.036844
lgbm | 0.159819 | -14.628175 | 0.008612 | 0.317664 | 30.417882 | 0.063121
\n", "" ], "text/plain": [ " mean_scores_explained_variance mean_scores_neg_mean_squared_error \\\n", "pipeline \n", "sgd 0.226867 -13.632093 \n", "ridge 0.244241 -13.105543 \n", "elastic 0.214935 -13.612086 \n", "lgbm 0.159819 -14.628175 \n", "\n", " std_scores_explained_variance std_scores_neg_mean_squared_error \\\n", "pipeline \n", "sgd 0.015547 0.287249 \n", "ridge 0.014348 0.335205 \n", "elastic 0.017457 0.304521 \n", "lgbm 0.008612 0.317664 \n", "\n", " mean_timing_fit mean_timing_score \n", "pipeline \n", "sgd 16.182405 0.007756 \n", "ridge 11.024628 0.461483 \n", "elastic 8.584299 0.036844 \n", "lgbm 30.417882 0.063121 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define a set of bp.Options as wrapped in bp.Compare\n", "compare_pipes = bp.Compare([bp.Option(sgd_pipe, name='sgd'),\n", " bp.Option(ridge_search_pipe, name='ridge'),\n", " bp.Option(elastic_pipe, name='elastic'),\n", " bp.Option(lgbm_pipe, name='lgbm')])\n", "\n", "# Pass as before as if a pipeline\n", "results = eval_pipe(compare_pipes)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Applying the Test Set\n", "\n", "So far we have only been running internal 5-fold CV on the training set. Suppose we are done with exploration and now want to confirm that the best model found through internal CV on the training set generalizes to unseen data. To do this, we re-train the best model on the full training set and evaluate it on the test set. In BPt this is done by just passing cv='test' to evaluate. 
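
As a rough illustration of what this final step amounts to, the sketch below fits a model once on all of the training data and then scores it on held-out test data, using the two metrics reported in the summary table above (explained variance and negative mean squared error). It is a pure-Python stand-in and not BPt's implementation: the one-feature least-squares model, the helper names (fit_line, explained_variance, neg_mean_squared_error) and the toy numbers are all assumptions made for this example.

```python
from statistics import mean, pvariance


def fit_line(xs, ys):
    # Ordinary least squares with one feature: y ~ slope * x + intercept
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def explained_variance(y_true, y_pred):
    # 1 - Var(residuals) / Var(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)


def neg_mean_squared_error(y_true, y_pred):
    # Negated so that higher is better, matching the summary table
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Hypothetical toy values standing in for the train and test sets
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
test_x, test_y = [5.0, 6.0], [10.0, 13.0]

# Fit once, on the full training set only
slope, intercept = fit_line(train_x, train_y)

# Predict the held-out points; the test set is never used for fitting
test_pred = [slope * x + intercept for x in test_x]

test_ev = explained_variance(test_y, test_pred)
test_neg_mse = neg_mean_squared_error(test_y, test_pred)
print(test_ev, test_neg_mse)
```

The point of the design is that the test scores are computed exactly once, after all model selection has finished, so they are not influenced by the choices made during internal CV.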
" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicting target = anthro_waist_cm\n", "Using problem_type = regression\n", "Using scope = all defining an initial total of 294 features.\n", "Evaluating 10663 total data points.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "59efb8760a3e4133bdf31f90c8102051", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Folds: 0%| | 0/1 [00:00