Target Variables
A collection of 45 target phenotypic variables (23 binary and 22 continuous), used to gauge predictive performance, was sourced from the second ABCD Study release. Variables were sourced directly from the rds file made available by the DAIRC (specifically, from a version of the rds file saved as a csv; see: github link and data repository). All collected variables, both target and brain, are from the baseline time point of the study. Best efforts were made to source a list of representative, diverse and predictive variables. Extra pre-processing beyond that done by the DEAP team, along with the creation of targets.csv, is conducted in the script setup/process_targets.py.
All target variables used in the final project are listed below with clickable links to a more detailed description of each measure. See also distribution info for each target.
Is Predictive
To establish whether potential target variables were predictive, we conducted a front-end test on a subset of the data. First, a larger list of possible representative variables was sourced from Recalibrating expectations about effect size: A multi-method survey of effect sizes in the ABCD study. A subset of around 2000 participants was then identified, consisting of participants with no missing values across all possible target variables. Next, the Destrieux FreeSurfer extracted ROIs were used as input features within a 5-fold cross-validation framework to try to predict each potential variable out of sample. The predictive ML pipeline was a ridge regression model with a nested random search over 32 values of regularization, preceded by front-end robust input scaling (implemented and evaluated with BPt). The metric of interest was R2 for continuous variables (regression models), ROC AUC for binary variables and the Matthews correlation coefficient for categorical variables (these types were auto-detected by BPt). Within this framework, we established a variable as ‘predictive’ only if its performance metric was greater than the null value for that metric plus the standard deviation across the five folds (e.g., because the null R2 is 0, R2 must exceed the R2 standard deviation, whereas ROC AUC must exceed 0.5 plus the ROC AUC standard deviation).
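The decision rule described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name `is_predictive`, the `NULL_SCORE` table and the fold scores are hypothetical, and two details are assumptions, since the text does not state them: that the compared value is the mean score across folds, and that the population standard deviation is used.

```python
from statistics import mean, pstdev

# Null (chance-level) value for each metric, following the text:
# 0 for R2, 0.5 for ROC AUC. The value 0 for Matthews is the usual
# chance level for that coefficient.
NULL_SCORE = {"r2": 0.0, "roc_auc": 0.5, "matthews": 0.0}


def is_predictive(fold_scores, metric):
    """Return True if the mean fold score exceeds null + std across folds.

    Assumptions (not stated in the source): the mean across folds is the
    compared value, and the population standard deviation is used.
    """
    threshold = NULL_SCORE[metric] + pstdev(fold_scores)
    return mean(fold_scores) > threshold


# Made-up fold scores, for illustration only.
r2_folds = [0.04, 0.06, 0.05, 0.03, 0.07]
auc_folds = [0.52, 0.49, 0.51, 0.50, 0.53]

print(is_predictive(r2_folds, "r2"))        # True: 0.05 > 0 + ~0.014
print(is_predictive(auc_folds, "roc_auc"))  # False: 0.51 < 0.5 + ~0.014
```

Adding the fold standard deviation to the null makes the criterion stricter for variables whose scores fluctuate heavily between folds, so a variable is not kept on the strength of one or two lucky folds.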
Why Threshold
A number of the binary target variables were not originally binary, and were instead converted to binary variables through a static threshold. This is often considered poor statistical practice, so why did we do it in this context? First, thresholding variables in this way, while perhaps not best practice, does happen frequently in the literature, and we wanted to mimic practices that are actually in use. Second, we wanted to try to ensure that the numbers of continuous and binary variables were roughly equal. Lastly, in a number of cases the continuous version of the variable was not at all predictive, but the binarized version was (likely due to the highly skewed nature of these variables' true underlying distributions).
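As a minimal sketch of what static thresholding looks like, the snippet below maps a skewed count variable to 0/1. The function name `binarize`, the example values and the threshold of 0 are all hypothetical; the actual thresholds and the handling of missing values are defined in setup/process_targets.py and may differ from what is shown here.

```python
def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0.

    Missing values (None) are passed through unchanged. This is an
    assumption for illustration, not the project's documented behavior.
    """
    return [None if v is None else int(v > threshold) for v in values]


# Made-up, highly skewed counts: most entries are 0.
raw_counts = [0, 0, 3, None, 0, 14]

print(binarize(raw_counts, threshold=0))
# [0, 0, 1, None, 0, 1]
```

With a distribution like this, where most values sit at zero and a few are large, the binarized version reduces to a "none versus any" contrast.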
Targets Stats
Continuous
target | count | mean ± std | nan count | min | max |
---|---|---|---|---|---|
Standing Height (inches) | 9427 | 55.268 ± 3.325 | 5 | 0.000 | 81.000 |
Waist Circumference (inches) | 9420 | 26.513 ± 4.737 | 12 | 0.000 | 90.000 |
Measured Weight (lbs) | 9425 | 82.175 ± 23.242 | 7 | 0.000 | 272.000 |
CBCL RuleBreak Syndrome Scale | 9425 | 1.156 ± 1.821 | 7 | 0.000 | 18.000 |
Parent Age (yrs) | 9364 | 40.083 ± 6.717 | 68 | 23.000 | 76.000 |
Motor Development | 9340 | 2.366 ± 0.798 | 92 | 0.000 | 4.000 |
Birth Weight (lbs) | 9032 | 6.569 ± 1.477 | 400 | 2.000 | 14.000 |
Age (months) | 9432 | 119.173 ± 7.448 | 0 | 108.000 | 131.000 |
Little Man Test Score | 9163 | 0.593 ± 0.170 | 269 | 0.000 | 1.000 |
MACVS Religion Subscale | 9429 | 3.322 ± 1.420 | 3 | 1.000 | 5.000 |
Neighborhood Safety | 9426 | 3.928 ± 0.951 | 6 | 1.000 | 5.000 |
NeuroCog PCA1 (general ability) | 8791 | 0.043 ± 0.755 | 641 | -3.171 | 3.038 |
NeuroCog PCA2 (executive function) | 8791 | 0.015 ± 0.761 | 641 | -3.227 | 2.608 |
NeuroCog PCA3 (learning / memory) | 8791 | 0.026 ± 0.699 | 641 | -2.231 | 2.220 |
NIH Card Sort Test | 9312 | 92.888 ± 9.270 | 120 | 50.000 | 120.000 |
NIH List Sorting Working Memory Test | 9282 | 97.146 ± 11.723 | 150 | 36.000 | 136.000 |
NIH Comparison Processing Speed Test | 9297 | 88.395 ± 14.482 | 135 | 30.000 | 140.000 |
NIH Picture Vocabulary Test | 9315 | 84.876 ± 7.931 | 117 | 36.000 | 119.000 |
NIH Oral Reading Recognition Test | 9305 | 91.082 ± 6.715 | 127 | 63.000 | 119.000 |
WISC Matrix Reasoning Score | 9234 | 18.031 ± 3.774 | 198 | 0.000 | 32.000 |
Summed Performance Sports Activity | 9432 | 0.988 ± 1.034 | 0 | 0.000 | 4.000 |
Summed Team Sports Activity | 9432 | 1.172 ± 1.180 | 0 | 0.000 | 7.000 |
Binary
target | count | freq | nan count |
---|---|---|---|
Speaks Non-English Language | 9429 | 1.000 | 3 |
'No' | 6399 | 0.679 | 0 |
'Yes' | 3030 | 0.321 | 0 |
Thought Problems ASR Syndrome Scale | 9432 | 1.000 | 0 |
0 | 7585 | 0.804 | 0 |
1 | 1847 | 0.196 | 0 |
CBCL Aggressive Syndrome Scale | 9432 | 1.000 | 0 |
0 | 7032 | 0.746 | 0 |
1 | 2400 | 0.254 | 0 |
Born Premature | 9333 | 1.000 | 99 |
'No' | 7541 | 0.808 | 0 |
'Yes' | 1792 | 0.192 | 0 |
Incubator Days | 9432 | 1.000 | 0 |
0 | 8203 | 0.870 | 0 |
1 | 1229 | 0.130 | 0 |
Months Breast Feds | 9432 | 1.000 | 0 |
0 | 6497 | 0.689 | 0 |
1 | 2935 | 0.311 | 0 |
Has Twin | 9401 | 1.000 | 31 |
'No' | 7507 | 0.799 | 0 |
'Yes' | 1894 | 0.201 | 0 |
Planned Pregnancy | 9236 | 1.000 | 196 |
'No' | 3511 | 0.380 | 0 |
'Yes' | 5725 | 0.620 | 0 |
Distress At Birth | 9432 | 1.000 | 0 |
0 | 6853 | 0.727 | 0 |
1 | 2579 | 0.273 | 0 |
Mother Pregnancy Problems | 9432 | 1.000 | 0 |
0 | 5492 | 0.582 | 0 |
1 | 3940 | 0.418 | 0 |
Any Alcohol During Pregnancy | 9432 | 1.000 | 0 |
0 | 7470 | 0.792 | 0 |
1 | 1962 | 0.208 | 0 |
Any Marijuana During Pregnancy | 9432 | 1.000 | 0 |
0 | 9102 | 0.965 | 0 |
1 | 330 | 0.035 | 0 |
KSADS OCD Composite | 9432 | 1.000 | 0 |
0 | 8457 | 0.897 | 0 |
1 | 975 | 0.103 | 0 |
KSADS ADHD Composite | 9432 | 1.000 | 0 |
0 | 7486 | 0.794 | 0 |
1 | 1946 | 0.206 | 0 |
Detentions / Suspensions | 9248 | 1.000 | 184 |
'No' | 8786 | 0.950 | 0 |
'Yes' | 462 | 0.050 | 0 |
Mental Health Services | 9376 | 1.000 | 56 |
0.0 | 7923 | 0.845 | 0 |
1.0 | 1453 | 0.155 | 0 |
KSADS Bipolar Composite | 9432 | 1.000 | 0 |
0 | 8056 | 0.854 | 0 |
1 | 1376 | 0.146 | 0 |
Parents Married | 9361 | 1.000 | 71 |
'no' | 2856 | 0.305 | 0 |
'yes' | 6505 | 0.695 | 0 |
Prodromal Psychosis Score | 9432 | 1.000 | 0 |
0 | 7689 | 0.815 | 0 |
1 | 1743 | 0.185 | 0 |
Screen Time Week | 9432 | 1.000 | 0 |
0 | 8265 | 0.876 | 0 |
1 | 1167 | 0.124 | 0 |
Screen Time Weekend | 9432 | 1.000 | 0 |
0 | 7430 | 0.788 | 0 |
1 | 2002 | 0.212 | 0 |
Sex at Birth | 9427 | 1.000 | 5 |
'F' | 4532 | 0.481 | 0 |
'M' | 4895 | 0.519 | 0 |
Sleep Disturbance Scale | 9432 | 1.000 | 0 |
0 | 5295 | 0.561 | 0 |
1 | 4137 | 0.439 | 0 |