Target Variables

A collection of 45 target phenotypic variables (23 binary and 22 continuous), used to gauge predictive performance, was sourced from the second ABCD Study release. Variables were sourced directly from the rds file made available by the DAIRC (specifically, from a version of the rds file saved as a csv; see the github link and data repository). All collected variables, both target and brain, are from the baseline time point of the study. Best efforts were made to source a list of representative, diverse and predictive variables. Extra pre-processing beyond that done by the DEAP team, along with the creation of targets.csv, is conducted in the script setup/process_targets.py.

All target variables used in the final project are listed below with clickable links to a more detailed description of each measure. See also distribution info for each target.

| Continuous Variables | Binary Variables |
| --- | --- |
| Standing Height (inches) | Speaks Non-English Language |
| Waist Circumference (inches) | Thought Problems ASR Syndrome Scale |
| Measured Weight (lbs) | CBCL Aggressive Syndrome Scale |
| CBCL RuleBreak Syndrome Scale | Incubator Days |
| Parent Age (yrs) | Born Premature |
| Motor Development | Has Twin |
| Birth Weight (lbs) | Planned Pregnancy |
| Age (months) | Distress At Birth |
| Little Man Test Score | Mother Pregnancy Problems |
| MACVS Religion Subscale | Any Alcohol During Pregnancy |
| Neighborhood Safety | Any Marijuana During Pregnancy |
| NeuroCog PCA1 (general ability) | KSADS OCD Composite |
| NeuroCog PCA2 (executive function) | KSADS ADHD Composite |
| NeuroCog PCA3 (learning / memory) | KSADS Bipolar Composite |
| NIH Card Sort Test | Mental Health Services |
| NIH List Sorting Working Memory Test | Detentions / Suspensions |
| NIH Comparison Processing Speed Test | Parents Married |
| NIH Picture Vocabulary Test | Prodromal Psychosis Score |
| NIH Oral Reading Recognition Test | Screen Time Week (hrs) |
| WISC Matrix Reasoning Score | Screen Time Weekend (hrs) |
| Summed Performance Sports Activity | Sex at Birth |
| Summed Team Sports Activity | Sleep Disturbance Scale |
| | Months Breast Fed |

Is Predictive

To establish whether potential target variables were predictive, we conducted a front-end test on a subset of the data. First, a larger list of possible representative variables was sourced from Recalibrating expectations about effect size: A multi-method survey of effect sizes in the ABCD study. A subset of around 2000 participants was then identified, consisting of participants with no missing values across all possible target variables. Next, the Destrieux FreeSurfer-extracted ROIs were used as input features within a 5-fold cross-validation framework to try to predict each potential variable out of sample. The predictive ML pipeline (implemented and evaluated with BPt) was a ridge regression model with a nested random search over 32 values of regularization, along with front-end robust input scaling. The metric of interest was R2 for continuous variables, ROC AUC for binary variables and the Matthews correlation coefficient for categorical variables (these types were auto-detected by BPt). Within this framework, we established a variable as ‘predictive’ only if its performance metric exceeded the null value for that metric plus the standard deviation across the five folds (e.g., R2 must exceed 0 + the R2 std, and ROC AUC must exceed 0.5 + the ROC AUC std).
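The screening rule described above can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the BPt pipeline used in the project: the function name `is_predictive`, the search range for the regularization strength, the 3-fold inner loop and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler


def is_predictive(X, y, null_score=0.0, scoring="r2", seed=0):
    """Apply the 'mean score > null + std across folds' rule.

    Returns (flag, mean_score, std_score) over 5 outer folds.
    """
    pipeline = Pipeline([("scale", RobustScaler()), ("model", Ridge())])
    # Nested random search over 32 regularization values.
    # The log-uniform range and the 3 inner folds are assumptions.
    search = RandomizedSearchCV(
        pipeline,
        param_distributions={"model__alpha": loguniform(1e-3, 1e5)},
        n_iter=32,
        cv=3,
        scoring=scoring,
        random_state=seed,
    )
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(search, X, y, cv=outer, scoring=scoring)
    mean, std = float(scores.mean()), float(scores.std())
    return mean > null_score + std, mean, std


# Synthetic stand-in for ROI features and one continuous target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
weights = rng.normal(size=20)
y = X @ weights + rng.normal(scale=0.5, size=300)

flag, mean_r2, std_r2 = is_predictive(X, y)
print(f"predictive={flag}, R2={mean_r2:.3f} ± {std_r2:.3f}")
```

For a binary target, the same rule would use `null_score=0.5` with a ROC AUC scorer and a classifier in place of `Ridge`; that variant is not shown here.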

Why Threshold

A number of the binary variables listed above were not originally binary, and were instead converted to binary variables through a static threshold. This is often considered poor statistical practice, so why did we do it in this context? First, thresholding variables in this way, while perhaps not best practice, does happen frequently in the literature, and we wanted to mimic practices that are actually used. Second, we wanted to ensure that the numbers of continuous and binary variables were roughly equal. Lastly, in a number of cases the continuous version of the variable was not at all predictive, but the binarized version was (likely due to the highly skewed nature of these variables' true underlying distributions).
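As a minimal sketch of what a static threshold looks like in practice, the snippet below binarizes a skewed count variable. The helper `binarize`, the example values and the threshold of 0 are hypothetical; the thresholds actually used are defined in setup/process_targets.py and are not reproduced here. Keeping missing values as NaN is likewise an assumption of this sketch.

```python
import numpy as np


def binarize(values, threshold):
    """Map values above `threshold` to 1 and the rest to 0, keeping NaN as NaN."""
    values = np.asarray(values, dtype=float)
    out = np.where(values > threshold, 1.0, 0.0)
    out[np.isnan(values)] = np.nan  # missing stays missing (assumption)
    return out


# Hypothetical skewed variable: most participants have a value of 0.
days = [0, 0, 0, 3, 0, 14, np.nan]
binary_days = binarize(days, threshold=0)
print(binary_days)
```

With a heavily skewed variable like this one, the binarized version reduces to a question of "any versus none", which is one way such a target can become easier to predict than its continuous counterpart.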

Targets Stats

Continuous

| target | count | mean ± std | nan count | min | max |
| --- | --- | --- | --- | --- | --- |
| Standing Height (inches) | 9427 | 55.268 ± 3.325 | 5 | 0.000 | 81.000 |
| Waist Circumference (inches) | 9420 | 26.513 ± 4.737 | 12 | 0.000 | 90.000 |
| Measured Weight (lbs) | 9425 | 82.175 ± 23.242 | 7 | 0.000 | 272.000 |
| CBCL RuleBreak Syndrome Scale | 9425 | 1.156 ± 1.821 | 7 | 0.000 | 18.000 |
| Parent Age (yrs) | 9364 | 40.083 ± 6.717 | 68 | 23.000 | 76.000 |
| Motor Development | 9340 | 2.366 ± 0.798 | 92 | 0.000 | 4.000 |
| Birth Weight (lbs) | 9032 | 6.569 ± 1.477 | 400 | 2.000 | 14.000 |
| Age (months) | 9432 | 119.173 ± 7.448 | 0 | 108.000 | 131.000 |
| Little Man Test Score | 9163 | 0.593 ± 0.17 | 269 | 0.000 | 1.000 |
| MACVS Religion Subscale | 9429 | 3.322 ± 1.42 | 3 | 1.000 | 5.000 |
| Neighborhood Safety | 9426 | 3.928 ± 0.951 | 6 | 1.000 | 5.000 |
| NeuroCog PCA1 (general ability) | 8791 | 0.043 ± 0.755 | 641 | -3.171 | 3.038 |
| NeuroCog PCA2 (executive function) | 8791 | 0.015 ± 0.761 | 641 | -3.227 | 2.608 |
| NeuroCog PCA3 (learning / memory) | 8791 | 0.026 ± 0.699 | 641 | -2.231 | 2.220 |
| NIH Card Sort Test | 9312 | 92.888 ± 9.27 | 120 | 50.000 | 120.000 |
| NIH List Sorting Working Memory Test | 9282 | 97.146 ± 11.723 | 150 | 36.000 | 136.000 |
| NIH Comparison Processing Speed Test | 9297 | 88.395 ± 14.482 | 135 | 30.000 | 140.000 |
| NIH Picture Vocabulary Test | 9315 | 84.876 ± 7.931 | 117 | 36.000 | 119.000 |
| NIH Oral Reading Recognition Test | 9305 | 91.082 ± 6.715 | 127 | 63.000 | 119.000 |
| WISC Matrix Reasoning Score | 9234 | 18.031 ± 3.774 | 198 | 0.000 | 32.000 |
| Summed Performance Sports Activity | 9432 | 0.988 ± 1.034 | 0 | 0.000 | 4.000 |
| Summed Team Sports Activity | 9432 | 1.172 ± 1.18 | 0 | 0.000 | 7.000 |
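For readers who want to reproduce a summary of this shape from their own copy of the targets, a rough pandas sketch follows. The helper `summarize_continuous` and the toy data are hypothetical, and the use of the sample standard deviation (pandas' default, ddof=1) is an assumption; the project's own table may have been computed differently.

```python
import numpy as np
import pandas as pd


def summarize_continuous(df):
    """Build one summary row per column: count, mean ± std, nan count, min, max."""
    rows = []
    for name in df.columns:
        col = df[name]
        rows.append({
            "target": name,
            "count": int(col.count()),  # non-missing values only
            "mean ± std": f"{col.mean():.3f} ± {col.std():.3f}",
            "nan count": int(col.isna().sum()),
            "min": float(col.min()),
            "max": float(col.max()),
        })
    return pd.DataFrame(rows).set_index("target")


# Toy data standing in for columns of targets.csv.
toy = pd.DataFrame({"Example Target": [1.0, 2.0, 3.0, np.nan]})
continuous_summary = summarize_continuous(toy)
print(continuous_summary)
```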

Binary

| target | count | freq | nan count |
| --- | --- | --- | --- |
| Speaks Non-English Language | 9429 | 1.000 | 3 |
| 'No' | 6399 | 0.679 | 0 |
| 'Yes' | 3030 | 0.321 | 0 |
| Thought Problems ASR Syndrome Scale | 9432 | 1.000 | 0 |
| 0 | 7585 | 0.804 | 0 |
| 1 | 1847 | 0.196 | 0 |
| CBCL Aggressive Syndrome Scale | 9432 | 1.000 | 0 |
| 0 | 7032 | 0.746 | 0 |
| 1 | 2400 | 0.254 | 0 |
| Born Premature | 9333 | 1.000 | 99 |
| 'No' | 7541 | 0.808 | 0 |
| 'Yes' | 1792 | 0.192 | 0 |
| Incubator Days | 9432 | 1.000 | 0 |
| 0 | 8203 | 0.870 | 0 |
| 1 | 1229 | 0.130 | 0 |
| Months Breast Fed | 9432 | 1.000 | 0 |
| 0 | 6497 | 0.689 | 0 |
| 1 | 2935 | 0.311 | 0 |
| Has Twin | 9401 | 1.000 | 31 |
| 'No' | 7507 | 0.799 | 0 |
| 'Yes' | 1894 | 0.201 | 0 |
| Planned Pregnancy | 9236 | 1.000 | 196 |
| 'No' | 3511 | 0.380 | 0 |
| 'Yes' | 5725 | 0.620 | 0 |
| Distress At Birth | 9432 | 1.000 | 0 |
| 0 | 6853 | 0.727 | 0 |
| 1 | 2579 | 0.273 | 0 |
| Mother Pregnancy Problems | 9432 | 1.000 | 0 |
| 0 | 5492 | 0.582 | 0 |
| 1 | 3940 | 0.418 | 0 |
| Any Alcohol During Pregnancy | 9432 | 1.000 | 0 |
| 0 | 7470 | 0.792 | 0 |
| 1 | 1962 | 0.208 | 0 |
| Any Marijuana During Pregnancy | 9432 | 1.000 | 0 |
| 0 | 9102 | 0.965 | 0 |
| 1 | 330 | 0.035 | 0 |
| KSADS OCD Composite | 9432 | 1.000 | 0 |
| 0 | 8457 | 0.897 | 0 |
| 1 | 975 | 0.103 | 0 |
| KSADS ADHD Composite | 9432 | 1.000 | 0 |
| 0 | 7486 | 0.794 | 0 |
| 1 | 1946 | 0.206 | 0 |
| Detentions / Suspensions | 9248 | 1.000 | 184 |
| 'No' | 8786 | 0.950 | 0 |
| 'Yes' | 462 | 0.050 | 0 |
| Mental Health Services | 9376 | 1.000 | 56 |
| 0.0 | 7923 | 0.845 | 0 |
| 1.0 | 1453 | 0.155 | 0 |
| KSADS Bipolar Composite | 9432 | 1.000 | 0 |
| 0 | 8056 | 0.854 | 0 |
| 1 | 1376 | 0.146 | 0 |
| Parents Married | 9361 | 1.000 | 71 |
| 'no' | 2856 | 0.305 | 0 |
| 'yes' | 6505 | 0.695 | 0 |
| Prodromal Psychosis Score | 9432 | 1.000 | 0 |
| 0 | 7689 | 0.815 | 0 |
| 1 | 1743 | 0.185 | 0 |
| Screen Time Week | 9432 | 1.000 | 0 |
| 0 | 8265 | 0.876 | 0 |
| 1 | 1167 | 0.124 | 0 |
| Screen Time Weekend | 9432 | 1.000 | 0 |
| 0 | 7430 | 0.788 | 0 |
| 1 | 2002 | 0.212 | 0 |
| Sex at Birth | 9427 | 1.000 | 5 |
| 'F' | 4532 | 0.481 | 0 |
| 'M' | 4895 | 0.519 | 0 |
| Sleep Disturbance Scale | 9432 | 1.000 | 0 |
| 0 | 5295 | 0.561 | 0 |
| 1 | 4137 | 0.439 | 0 |
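The binary summary has one header row per target (total non-missing count and number of missing values) followed by one row per class with its count and its frequency among non-missing values. A rough pandas sketch of that layout is given below; the helper `summarize_binary` and the toy series are hypothetical and are not taken from the project's scripts.

```python
import pandas as pd


def summarize_binary(col):
    """Return a header row for the target plus one count/frequency row per class."""
    n = int(col.count())  # non-missing values only
    rows = [{
        "target": col.name,
        "count": n,
        "freq": 1.0,
        "nan count": int(col.isna().sum()),
    }]
    # value_counts() drops missing values by default.
    for label, count in col.value_counts().sort_index().items():
        rows.append({
            "target": str(label),
            "count": int(count),
            "freq": round(int(count) / n, 3),
            "nan count": 0,
        })
    return pd.DataFrame(rows)


# Toy series standing in for one binary column of targets.csv.
toy = pd.Series(["No", "Yes", "No", None], name="Example Binary Target")
binary_summary = summarize_binary(toy)
print(binary_summary)
```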