How to perform back projection of feature weights?

One potential issue with employing high dimensional parcellations is that they can be difficult to interpret. While it would be great if there were existing high dimensional, easy to interpret parcellations, in practice this is not the case. Instead, we propose that feature importances generated from parcellations that do not lend themselves to easy discussion (e.g., randomly generated parcellations, or parcellations with thousands of regions) be back projected onto their original surface representation. Once represented at the vertex / surface level, researchers should be able to interpret their findings, as there is an extensive literature of results presented and interpreted in standard space. One could even re-parcellate results into a familiar anatomical atlas if desired.

This example covers the back projection of feature weights to native surface space for a number of different pipeline / parcellation pairs explored in the main project.

We use the already saved dataset, but with one tweak to make our lives a little easier. Instead of using the consolidated data files, we load each modality separately so that the data are easier to plot and keep track of.
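For reference, here is a minimal sketch of what that separate loading might look like. The modality names, file layout, and use of add_data_files are illustrative assumptions, not the project's exact code:

```python
from glob import glob

import BPt as bp

# Hypothetical modality names and file layout, stand-ins for the real saved data
modalities = ['thickness', 'sulc', 'myelin', 'curv']

data = bp.Dataset()
for modality in modalities:
    # One saved surface file per subject, kept as its own data file column
    files = {modality: glob(f'data/{modality}/*.npy')}
    data = data.add_data_files(files, file_to_subject='auto')
```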

Next, we set up some common variables, wrap the evaluate function in a helper function, and define the code used to plot the average inverse transformed feature weights for each modality.
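A rough sketch of that setup is below. The helper names (run_eval, plot_avg_inverse_fis), the exact bp.evaluate arguments, and the use of nilearn's fsaverage surfaces for plotting are all illustrative assumptions; the project's actual surface space and plotting code may differ:

```python
import numpy as np
import pandas as pd
import BPt as bp
from nilearn import datasets, plotting

# Common settings shared by the examples below
target = 'anthro_waist_cm'   # regression target; a binary target also works
n_folds = 5
fsaverage = datasets.fetch_surf_fsaverage()

def run_eval(pipeline, dataset, target=target, cv=n_folds):
    '''Thin wrapper around bp.evaluate with the shared settings.'''
    return bp.evaluate(pipeline=pipeline, dataset=dataset,
                       target=target, cv=cv)

def plot_avg_inverse_fis(results, modality=0, surf_mesh=fsaverage['infl_left']):
    '''Average the back projected feature weights across folds for one
    modality and plot them on a surface (illustrative only).'''
    inv_fis = results.get_inverse_fis()
    avg = pd.concat([fold[modality] for fold in inv_fis], axis=1).mean(axis=1)
    plotting.plot_surf_stat_map(surf_mesh, np.asarray(avg), cmap='cold_hot')
```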

In this example, we will be predicting the regression target variable 'anthro_waist_cm'. That said, all of the above code is designed to work with binary targets as well (if running the code yourself, you can try changing the variable).

Base Elastic-Net

The first example simply back projects the weights from an elastic-net based pipeline fit on one of the randomly generated parcellations. Since this regressor generates beta weights, it is relatively straightforward to back project these values according to the parcellation's region assignments.
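A sketch of what such a pipeline could look like is below. The parcellation file name is a placeholder, and the specific option strings ('robust', 'elastic') and the use of SurfLabels from BPt.extensions are assumptions that may differ from the project's pipeline definitions:

```python
import BPt as bp
from BPt.extensions import SurfLabels

# Loader that averages vertex values within each region of a (randomly
# generated) parcellation - the file name here is a placeholder
parcel_loader = bp.Loader(SurfLabels(labels='parcels/random_100.npy'))

elastic_pipe = bp.Pipeline([parcel_loader,
                            bp.Scaler('robust'),
                            bp.Model('elastic')])

elastic_results = run_eval(elastic_pipe, data)
```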

The back projected feature importances can be obtained with the special BPt function get_inverse_fis.
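For example, with the evaluator returned by the elastic-net pipeline above:

```python
# Back project each fold's beta weights to vertex space
inv_fis = elastic_results.get_inverse_fis()
```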

Looking a little closer, we see that this returns a list of pandas Series.

We can further see that the first fold, first modality has the correct shape / number of values per vertex.
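A quick check of that structure (assuming, as the output above suggests, the list is nested fold-first, then by modality):

```python
print(type(inv_fis), len(inv_fis))   # list with one entry per fold (5 here)
print(inv_fis[0][0].shape)           # values per vertex for fold 0, modality 0
```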

Now, if we are interested in plotting, we can generate average values across all 5 folds to plot as such:
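A minimal version of that averaging, under the same nesting assumption:

```python
import pandas as pd

# Mean back projected weight per vertex across the 5 folds, first modality
avg_weights = pd.concat([fold_fis[0] for fold_fis in inv_fis],
                        axis=1).mean(axis=1)
```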

This logic is already wrapped up in the plotting function; let's try it here.
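Using the hypothetical helper sketched earlier:

```python
plot_avg_inverse_fis(elastic_results)
```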

Base - LGBM

As our next example, we simply use an LGBM based pipeline instead of the elastic-net one. In this case we repeat the same steps, except this time we plot LGBM's automatically computed feature importances instead of beta weights.
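The pipeline change itself might be as simple as swapping the model piece (again a sketch; 'lgbm' as the option string is an assumption):

```python
# Same loader / scaler as before, just an LGBM model in place of the elastic-net
lgbm_pipe = bp.Pipeline([parcel_loader,
                         bp.Scaler('robust'),
                         bp.Model('lgbm')])

lgbm_results = run_eval(lgbm_pipe, data)
plot_avg_inverse_fis(lgbm_results)
```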

What about a voting ensemble of Elastic-Net?

In this case, what goes on behind the scenes is that the coef_ from each of the base models are averaged. In more detail, when get_inverse_fis is called, the coef_ from each base estimator are first back projected to the original space and then averaged. Even though each base estimator has 100 coef_, it would be wrong to average them in that space, since each set of coef_ refers to a different parcellation; it is therefore necessary to average only after back projection. This is taken care of internally, since the nested estimators of the voting ensemble have Loader objects. Let's try it below.
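First, though, a toy numpy sketch (not the BPt internals) of why the order of operations matters:

```python
import numpy as np

# Toy example: two base models, each fit on a different random 100-region
# parcellation of the same 1000-vertex surface. Averaging the raw coef_
# would mix regions from different parcellations, so we back project first.
rng = np.random.default_rng(0)
n_vertices = 1000
parcels = [rng.integers(0, 100, n_vertices) for _ in range(2)]   # vertex -> region
coefs = [rng.normal(size=100) for _ in range(2)]                 # one beta per region

# Back project: each vertex takes the weight of the region it belongs to,
# then the two vertex-wise maps can be meaningfully averaged
vertex_maps = [c[p] for c, p in zip(coefs, parcels)]
avg_map = np.mean(vertex_maps, axis=0)
```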

This method is also dynamic enough to support averaging across parcellations of different sizes, or with LGBM feature importances instead of coef_.

SVM / Permutation Feature Importance

The Elastic-Net and LGBM pipelines both have default, already computed feature importances which we can back project and plot right away. For the SVM based pipelines, though, this isn't possible. Instead we need to calculate feature importances another way; in this example we will use permutation based feature importances.
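As a generic, self-contained illustration of the idea (plain scikit-learn on synthetic data, rather than the project's wrapped version): permute each feature in the validation set and measure how much the score drops.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Small synthetic stand-in for one fold's transformed features / target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] + rng.normal(scale=.5, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = SVR().fit(X_tr, y_tr)

# Shuffle each feature in turn and record the resulting drop in score
perm = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
importances = perm.importances_mean
```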

SVM based voting ensemble

With the voting ensemble before, we set up the feature importances by taking the average of each set once back projected. Now what about when we have an ensemble of SVM classifiers? In this case we need to compute the permutation feature importances again, but we want to ensure that the features being permuted are the fully transformed features. Let's get an example going before getting into more details...

What happens internally when we call permutation_importance is that the following internal function is used to set up the proper X_val and locate the proper sub estimator for each fold. In this case, we want just_model and nested_model to both be True.
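As a rough, hypothetical sketch of that idea (not BPt's actual internal code): push the raw validation data through every step except the final model, and optionally unwrap the nested estimator inside that final step, so permutation importance is run on fully transformed features.

```python
def get_fold_model_and_X_val(fitted_pipeline, X_val_raw, nested_model=True):
    '''Hypothetical helper illustrating the just_model / nested_model idea.'''
    # Transform with everything except the last (model) step
    X_val = X_val_raw
    for _, step in fitted_pipeline.steps[:-1]:
        X_val = step.transform(X_val)

    # Grab the final model, optionally digging out a wrapped / nested estimator
    model = fitted_pipeline.steps[-1][1]
    if nested_model and hasattr(model, 'estimator_'):
        model = model.estimator_

    return model, X_val
```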

Looking at X_val, we notice the shape might look a little funny - but we can confirm that it lines up with the feat names.

So what's going on here? First, note that we have a random ensemble with two random parcellations, each of size 100, and we also have 4 modalities. That means that, after being loaded and transformed, each parcellation will yield 400 features, and the two sub SVM models with front-end feature selection will each be fed 400 features. So why do we have 598 features here? We can get a better idea of what these features represent if we look at the feat names.

Two things are going on here. The first you'll note is that some features are missing. That is because these feat names and X_val represent the transformed data, after any features that the feature selection step chose to remove have already been dropped. The other piece you'll note is the '0_' prefix on the features above and the '1_' prefix on the last feature we printed. That is to say, the feat names and X_val represent the concatenated, fully transformed output from each of the two sub-SVM models.

This is exactly what we want, as we need all available features to already be present (i.e., not waiting to be transformed) when we pass them to the function responsible for calculating the permutation based feature importances. The last step that happens internally, which we don't need to worry about, is that the 'predict' function of BPtVotingEstimator is designed to automatically detect this alternative output format; we can confirm that here:

Now that we know what's going on behind the scenes, with BPt we can just do the exact same thing as before, and it will take care of the details!

Stacked Ensembles - Elastic-Net

Okay, now what about a stacking based ensemble? We can do something similar to the voting case, but instead of just taking the mean of the existing feature importances, we take a weighted average according to the feature importances of the stacking regressor itself. Note that in order for this to work we have to make a simplification: we throw away information on sign, and instead consider only the average of the absolute values, weighted by the absolute weights of the stacker.
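A toy numpy sketch of that weighted averaging (again not the BPt internals, just the idea):

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 1000

# Two base elastic-net models, each fit on its own random 100-region parcellation
parcels = [rng.integers(0, 100, n_vertices) for _ in range(2)]
coefs = [rng.normal(size=100) for _ in range(2)]

# The stacking regressor assigns one weight per base model's predictions
stacker_weights = np.array([.7, -.3])

# Back project each base model's coef_, drop sign by taking absolute values,
# then average weighted by the absolute stacker weights
vertex_maps = np.stack([np.abs(c[p]) for c, p in zip(coefs, parcels)])
avg_map = np.average(vertex_maps, axis=0, weights=np.abs(stacker_weights))
```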

We can also, in the same manner as before, consider generating and plotting permutation based feature importances, e.g., in the case of using an SVM based classifier instead of the elastic-net.