0. Background

Welcome! This document serves 2 purposes:

  1. Provide a walkthrough for anyone potentially interested in conducting multiverse analyses (or specification curves), especially when considering Bayesian models or fMRI data.
  2. Serve as documentation for analysis code for Bloom et al., 2021 with code that can be run using simulated data (i.e. fake data), since we cannot share the data publicly. Note: the simulated data & corresponding code walkthrough are specifically for amygdala reactivity analyses for the fear > baseline contrast.

Want to try the code out yourself? You can clone the Github repo, or download the simulated data and the into_the_bayesian_multiverse.Rmd file to run & experiment with the code.

What is a multiverse anyway? What is a specification curve?

A multiverse analysis is an analytical tool for understanding whether study results hinge on decisions made during the analysis process. Briefly, in a multiverse analysis one identifies a set of analysis methods that are all theoretically justified, then conducts all of these analysis “specifications” in parallel. A specification curve can then be used to visualize all of the different analysis specifications at once, as well as for statistical inference.

The specification curve analyses here were particularly inspired by work from Dani Cosme and Amy Orben.

See more background on the statistics here and here.

A fun way to think about multiverse analyses…

If you’ve ever seen Spider Man: Into The Spiderverse, first of all – it’s a great movie. But more specifically, there are many “parallel universes” with different versions of “spider people” in each one. The “spidey-essence” is the same in all universes, but each spider-person (or pig…) is a little different based on the specifics of that universe. So, we can think of the raw data as the “spidey-essence” in a multiverse analysis, and each model specification is a different “parallel universe” of how the results of the analysis could come out when combined with a certain set of analytical decisions.


I’m not an expert, but “the multiverse” is now part of the canon of the Marvel Cinematic Universe

In what situation should I run a multiverse analysis?

Multiverse analyses are particularly useful when there are decisions we have to make during the data preparation and analysis process where we aren’t sure what the “right” answer is. If we can think of different methods for cleaning, preprocessing, excluding, transforming, or modeling the data that are theoretically justifiable (but we don’t have a certain answer for which is ‘best’) multiverse analysis allows us to run all of these theoretically justifiable analyses at the same time. This allows us to look at 2 main questions:

  1. Do the results from differing, theoretically justifiable, versions of the analyses converge on a consistent finding?
  2. How do different choices we make as analysts of the data influence the results? Which decisions are most influential?

These analyses have been particularly useful with neuroimaging data where there are MANY decisions to be made where we don’t have consensus for an “optimal” method. However, multiverse analyses and specification curves are equally useful in any topic of research where there are multiple theoretically justifiable ways an analysis can be done.

Specification curve analysis steps

Simonsohn et al. (2020) describe 3 steps for specification curve analyses:

  1. define the set of reasonable specifications to estimate
  2. estimate all specifications and report the results in a descriptive specification curve
  3. conduct joint statistical tests using an inferential specification curve.

Because the analyses presented here are considered exploratory, we aren’t going to cover the inferential procedures in step #3. For more info on this step, including applying bootstrapping & permutation testing to spec curves, see this great tutorial from Dani Cosme.

Note: simulated data here only!

Note: these data are fake! Under our Institutional Review Board protocol, and to keep participant identities private, we cannot share our actual data publicly. However, we’ve simulated an amygdala reactivity dataset for the purposes of creating a multiverse analysis where the code can actually be run! You can find the simulated data here.

  • Briefly, to create simulated data that decently approximates the real data without risk of identifying participants, we created a multivariate regression model using brms, then drew samples from the model’s posterior predictive distribution for fear > baseline amygdala reactivity estimates for each study time point for each real participant (a rough sketch of this posterior predictive approach follows this list).
  • We also scrambled the order of participant IDs and added noise to ages.
  • So, the data here should do a reasonable job of mimicking the structure of some of the data analyzed in Bloom et al. (2021) without compromising participant data privacy and security. Some of the relationships among variables may differ somewhat from the real data, however. For more on data synthesis for these purposes, check out the synthpop R package.
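For anyone curious what that process might look like in code, here is a rough, hypothetical sketch of the general idea (NOT the exact code used for the paper): real_data stands in for the private raw dataset, the model is simplified to a single outcome (the actual synthesis used a multivariate brms model), and the age noise scale is just illustrative.

# hypothetical sketch: fit a simple model to the (private) real data
synth_model = brms::brm(ho_right_amyg_beta ~ age + (1 | id), data = real_data, cores = 4)
# draw from the posterior predictive distribution; rows = posterior draws, columns = observations
pp_draws = brms::posterior_predict(synth_model)
simulated = real_data
simulated$ho_right_amyg_beta = as.numeric(pp_draws[1, ])  # use one draw as the 'fake' outcome
# relabel participant IDs with a shuffled mapping and jitter ages (noise scale is illustrative)
id_map = setNames(sample(unique(simulated$id)), unique(simulated$id))
simulated$id = unname(id_map[as.character(simulated$id)])
simulated$age = simulated$age + rnorm(nrow(simulated), mean = 0, sd = 0.25)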

1. Read in the data

library(tidyverse)
library(specr)
library(brms)
library(cowplot)

Here’s what is in each column:

  • id - participant ID, identifies a participant across time points
  • wave - the study time point (either 1, 2, or 3)
  • age - participant age at the given time point, in years
  • block - the temporal position of the task run relative to other tasks in the scanner (1 = first, 2 = second, 3 = third)
  • motion - head motion (mean framewise displacement), which has been z-scored here
  • scanner - whether the data were collected on a first MRI scanner (1= time points 1 & 2) or a second (2 = time point 3). Both were Siemens Tim Trio
  • prev_studied - whether this scan was previously analyzed in similar work by Gee et al. (2013). 1 indicates a scan was previously studied

All of the rest of the columns are measurements of amygdala reactivity to fear faces > baseline for each scan, labeled such that:

  • columns with the ho prefix are from amygdala ROIs defined by the Harvard-Oxford subcortical atlas, while columns with the native prefix are from native-space amygdala ROIs defined through Freesurfer
  • columns with right in the name are for the right amygdala, and left for the left
  • columns with the beta suffix denote raw beta estimates of amygdala reactivity magnitude, while the tstat suffix denotes t-statistic measurements of amygdala reactivity scaled by estimation uncertainty (the standard error)
fake_data = readr::read_csv('simulated_amygdala_reactivity.csv')
## Parsed with column specification:
## cols(
##   id = col_double(),
##   wave = col_double(),
##   age = col_double(),
##   block = col_double(),
##   motion = col_double(),
##   scanner = col_double(),
##   prev_studied = col_double(),
##   ho_right_amyg_beta = col_double(),
##   ho_left_amyg_beta = col_double(),
##   ho_right_amyg_tstat = col_double(),
##   ho_left_amyg_tstat = col_double(),
##   native_right_amyg_beta = col_double(),
##   native_left_amyg_beta = col_double(),
##   native_right_amyg_tstat = col_double(),
##   native_left_amyg_tstat = col_double()
## )
head(fake_data[,1:8])
## # A tibble: 6 x 8
##      id  wave   age block motion scanner prev_studied ho_right_amyg_beta
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>        <dbl>              <dbl>
## 1     1     3 16.1      1  0.293       2           NA            0.785  
## 2     2     1 15.2      3  0.08        1            1            0.375  
## 3     2     2 17.3      1 -1.32        1           NA           -0.168  
## 4     2     3 19.4      1  0.063       2           NA           -0.230  
## 5     3     1 22.7      2 -0.986       1           NA            0.00908
## 6     7     1  5.83     1  1.18        1           NA           -0.817

1A. Accelerated Longitudinal Study Design

Here’s a plot displaying when each (fake) participant came in for visits. The x-axis indicates the age of the participants at each visit, and participants (on the y-axis) are ordered by the age at their first study visit (min_age). Horizontal lines connecting dots indicate the multiple study visits over time for the same participant (although some participants only completed 1 visit)

designPlot = dplyr::mutate(fake_data, wave = as.factor(wave)) %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(min_age = min(age)) %>%
  ggplot(aes(x = age, y = forcats::fct_reorder(as.factor(id), min_age))) +
    geom_line(aes(group = id)) + 
    geom_point(aes(color = wave)) +
    theme_bw() + theme(axis.text.y = element_blank(), panel.grid.major = element_blank()) +
    labs(y = 'Participants', x = 'Age at visit') +
    guides(color = guide_legend(title = 'Study Wave')) +
    theme(text = element_text(face = 'bold'),
          axis.ticks.y=element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank())

designPlot

2. Specification curve: specr version

specr

specr is a great package with a lot of nice functionality for running specification curve analyses. This wasn’t the software we used for the main analyses (because it hadn’t been released yet), but it’s probably the most straightforward way to go for regression-based specification curve analyses.

2A. Define all ‘reasonable’ specifications

First, we’ll need to define what different ‘reasonable’ and theoretically justifiable ways there are for doing this analysis of age-related change in amygdala reactivity to fear faces > baseline. For the sake of this project, it isn’t computationally feasible to try ALL reasonable ways of doing this analysis, so we can think of this approach as attempting to sample from the space of possible analysis specifications.

  • Covariates: it might make sense to adjust for factors like scanner and block in our group-level model, or different subsets of these.
  • Amygdala ROI: there are several justifiable ways of defining an ‘amygdala’ brain region, either using a “native space” mask generated through Freesurfer, or by using Harvard-Oxford amygdala estimates in standard space. We could also use right, left, or bilateral amygdala estimates.
  • Estimate type: Here we pull both from the beta images and tstat images for 2 different amygdala reactivity estimates. The beta is the contrast parameter estimate and represents the point estimate (beta estimate) of the regression coefficient. Thus, the beta is the magnitude of the estimated relationship between the presence of the task stimuli and the BOLD signal. The tstat represents the estimate scaled by the uncertainty (i.e. by the standard error of the estimate), and so it is a standardized effect size measure of the relationship between the presence of the task stimuli and the BOLD signal
  • Data inclusion: Some of the scans studied here were previously analyzed for age-related change in amygdala reactivity by Gee et al. (2013). So, to get an estimate more independent from the previous work, we might want to exclude these. However, we might have higher statistical power if we include all the available data.

First, a little data wrangling before modeling.

  • We’ll create a variable called grp to index whether scans were previously studied for age-related changes in amygdala reactivity in Gee et al. (2013). The not_studied group here can be considered a more independent sample from the previous study.
  • We’ll also create a vector of the outcomes (different measures of amygdala reactivity)
fake_data = dplyr::mutate(fake_data, grp = ifelse(is.na(prev_studied), 'not_studied', 'prev_studied'))
outcomes = names(fake_data)[grepl('amyg', names(fake_data))]
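
As a quick sanity check on the size of the specification space we just described, we can enumerate the combinations before fitting anything. This is a sketch for counting only: the covariate and subset labels are placeholders, and specr will build the actual combinations for us below.

# count the specification space: 8 outcomes x 4 covariate sets x 2 subsets = 64
decision_grid = expand.grid(outcome = outcomes,
                            covariates = c('none', 'block', 'scanner', 'block + scanner'),
                            subset = c('all', 'not_studied'))
nrow(decision_grid)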

2B. Specify the custom model

Specr has a nice feature of allowing a custom model function. For more detailed info on this, see here.

Here, we set up a multilevel model using the lme4 package’s lmer() function to take advantage of varying intercepts for each participant (as denoted by the (1|id) syntax in the model formula). We also include a covariate for motion in the custom formula itself (rather than later on in the spec curve), because we want all models to include this covariate for head motion during the scan. We also need to load the lme4 and broom.mixed packages inside the custom model function so that specr can use them to run the model and parse the results.

lmer_model_with_motion <- function(formula, data,...) {
  require(lme4)
  require(broom.mixed)
  # set up the model base formula (basically specr will paste all other model info in here)
  formula <- paste(formula, "+ motion + (1|id)")
  lme4::lmer(formula, data)
}

2C. Run all model specifications with run_specs()

How does this run_specs() function work?

  • df=fake_data: specifies the data frame to fit the model to
  • x = 'age': the ‘x’ variable, or predictor of interest in the model. You can have multiple x variables if you want by specifying a vector with multiple values here
  • y = outcomes: here we specify a vector of multiple outcome variables, the 8 different measures of amygdala reactivity
  • controls = c('block', 'scanner'): covariates to include in the model.
  • model = 'lmer_model_with_motion': specifies the custom model. If you don’t need a custom model, there are also defaults, like "lm" to run OLS regression
  • subsets = list(grp = 'not_studied'): subsets of the data to run the model on, expressed as a list. Here, we want to run the model specifically on the not_studied group, in addition to the full dataset.
  • Because we have all.comb=TRUE, this will run specifications with all possible combinations of covariates. Otherwise, it will run models with no covariates, each single covariate, and then all together. Note: random slopes are not included for these covariates by default unless we add this to the custom model

Here, 8 outcomes X 4 possible covariate specifications X 2 subsets = 64 total specifications in our multiverse

specs = specr::run_specs(df = fake_data,
                         x = 'age', y = outcomes, 
                         controls = c('block', 'scanner'),
                         model = 'lmer_model_with_motion',
                         subsets = list(grp = 'not_studied'), 
                         all.comb = TRUE)

2D. Check out results summaries across all models

The output of run_specs() is a data frame with each row representing one specification from the multiverse analysis we have just run. The x, y, model, controls and subsets columns give us information on the setup of each model, and then we get a variety of statistical outputs from the fitted models. Because we have 64 specifications here, the dataframe has 64 rows.

  • estimate: the beta estimate in the regression model for the given x term of interest (or the ‘slope’ for a continuous predictor). Here, these beta estimates represent estimates for age-related change in amygdala reactivity, such that negative estimates indicate age-related decreases in amygdala reactivity.
  • std.error: The standard error about the beta estimate
  • statistic: The t-statistic for the beta estimate
  • conf.low: The lower bound of the 95% confidence interval for the beta estimate
  • conf.high: The upper bound of the 95% confidence interval for the beta estimate
  • all of the fit_ columns provide information on the goodness-of-fit of the model more generally (i.e. log likelihood, AIC, BIC), as well as the residual degrees of freedom.
specs
## # A tibble: 64 x 18
##    x     y     model controls effect group estimate std.error statistic conf.low
##    <chr> <chr> <chr> <chr>    <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
##  1 age   ho_r… lmer… block    fixed  <NA>   -0.0416    0.0145     -2.87  -0.0700
##  2 age   ho_l… lmer… block    fixed  <NA>   -0.0390    0.0146     -2.67  -0.0677
##  3 age   ho_r… lmer… block    fixed  <NA>   -0.0423    0.0142     -2.97  -0.0702
##  4 age   ho_l… lmer… block    fixed  <NA>   -0.0298    0.0135     -2.21  -0.0562
##  5 age   nati… lmer… block    fixed  <NA>   -0.0313    0.0159     -1.97  -0.0624
##  6 age   nati… lmer… block    fixed  <NA>   -0.0303    0.0156     -1.94  -0.0609
##  7 age   nati… lmer… block    fixed  <NA>   -0.0318    0.0157     -2.03  -0.0624
##  8 age   nati… lmer… block    fixed  <NA>   -0.0269    0.0142     -1.89  -0.0548
##  9 age   ho_r… lmer… scanner  fixed  <NA>   -0.0405    0.0146     -2.78  -0.0690
## 10 age   ho_l… lmer… scanner  fixed  <NA>   -0.0386    0.0147     -2.63  -0.0674
## # … with 54 more rows, and 8 more variables: conf.high <dbl>, fit_sigma <dbl>,
## #   fit_logLik <dbl>, fit_AIC <dbl>, fit_BIC <dbl>, fit_REMLcrit <dbl>,
## #   fit_df.residual <int>, subsets <chr>

2E. Plot the specification curve

specr::plot_specs(specs)

This type of plot can be overwhelming at first because it shows us a lot of information, but it is really helpful for looking at several different things!

  • Panel A shows us the regression beta estimates for our x predictor (plus 95% confidence intervals) for all of our specifications. Specifications are ordered by estimate from smallest to largest, and colored such that red indicates a significant negative association, gray indicates a non-significant association, and blue (not shown on this plot) indicates a significant positive association. Here, the majority of the specifications indicate a significant negative estimate (under \(\alpha=0.05\), such that the 95% confidence interval excludes 0), suggesting a negative relationship between age and amygdala reactivity. We can also see that the point estimates of all the models are negative, indicating that all the models agreed on the direction of age-related change, if not the significance (a quick tabulation of these proportions is sketched after this list).
  • Panel B shows us the details for each specification, keeping the coloring schema from Panel A based on estimate direction/significance. So, looking vertically across both panels tells us specifics about a single model specification. On any given line, a vertical mark (|) indicates that the choice marked on the left was made, while a blank space indicates it wasn’t made.
    • The top section age indicates that the x variable (the term being examined) in all models was age.
    • In the y section below that, dashes on a given line indicate what y variable for amygdala reactivity was used in a given specification (note that different choices for y are mutually exclusive).
    • The model section indicates that the lmer_model_with_motion was the model used in all specifications here.
    • The controls section shows which covariates were used in each specification, although we should remember here that ALL models had a covariate for head motion built into them (so no covariates really means no covariates other than head motion)
    • The bottom section subsets indicates which subset of the data the model was run on, either all, or just the not_studied group.
  • So, what is Panel B telling us practically? We can see how estimates for the term of interest differ as a function of different model specification decision points! In general, specifications that show up towards the left side of the graph resulted in more negative beta estimates (more negative age-related change in amygdala reactivity here), while specifications towards the right side resulted in more positive beta estimates. Specifically, we can see that here in the simulated data:
    • Specifications with ho_right_amyg_tstat or ho_right_amyg_beta as the y variable tended to result in the most strongly negative estimates compared to the others (they show up towards the left side)
    • Specifications with native_left_amyg_tstat as the y variable tended to result in the most strongly positive estimates compared to the others (they show up towards the right side)
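
If we want the numbers behind Panel A rather than just the colors, here’s a quick tabulation sketch (using the estimate and confidence interval columns from the specs output shown above):

specs %>%
  dplyr::summarise(n_specs = dplyr::n(),
                   prop_negative_estimate = mean(estimate < 0),
                   prop_ci_excludes_0 = mean(conf.high < 0 | conf.low > 0))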

2F. View the multiverse decision tree

specr also gives us a nice plot_decisiontree() function for summarizing what our specifications are, and how many of them there are

specr::plot_decisiontree(specs, legend = TRUE)

3. A Bayesian specification curve with specr and brms

For many analyses, we might want to use Bayesian estimation for models in our specification curve, particularly if we are getting convergence issues or singular fits with lmer. In these cases, lmer may fail to fit a fully specified longitudinal model with varying terms (slopes) for age for each participant because of the high number of model parameters relative to data points. However, the brms R package can fit these models with nearly identical syntax to lmer, using Stan for fully Bayesian estimation under the hood (note: the high number of parameters relative to the number of observations is definitely still a problem, but with Bayesian estimation we can at least fit the model and quantify uncertainty for each parameter).

While they have many advantages, the main drawback to Bayesian models is that they take longer to fit and require more RAM than frequentist models do.

  • When fitting these yourself, you may want to run 1 model first to get an approximate timing estimate before you run a full Bayesian specification curve (a minimal timing sketch follows this list).
  • If available, servers or computing clusters (especially if you can parallelize across specifications) can vastly speed up Bayesian specification curve analyses.
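
For example, here is a minimal timing sketch under those assumptions (the formula mirrors the kind of model fit below); multiplying one model’s elapsed time by the number of specifications gives a rough lower bound on the total run time if the models are fit one after another.

# time one Bayesian model of roughly the same size as those in the curve
one_model_time = system.time(
  brms::brm(ho_right_amyg_beta ~ age + motion + (age | id), data = fake_data, cores = 4)
)
# rough total run time in minutes for 64 sequential models of this size
unname(one_model_time['elapsed']) * 64 / 60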

3A. Define a custom brms model with participant-varying age terms

The brm() function from the brms package fits models using Bayesian estimation, with mostly the same syntax as lme4 models. Here, we set up a custom model function, making sure to include (age|id) to allow age terms to vary for each participant. Also, inside the call to fit the model, we’ll add cores=4 to parallelize each model across 4 cores (one core per chain). Most laptops have 4-8 cores, so this should work on most computers, and should speed up each model by a factor of about 4.

brms_model_with_motion <- function(formula, data,...) {
  require(brms)
  require(broom.mixed)
  # set up the model base formula (basically specr will paste all other model info in here)
  formula <- paste(formula, "+ motion + (age|id)")
  brms::brm(formula, data, cores = 4)
}

3B. Run all Bayesian model specifications with run_specs()

We’ll run the equivalent specifications to the previous curve, only now with model = 'brms_model_with_motion' for our custom brms model

brms_specs = specr::run_specs(df = fake_data,
                         x = 'age', y = outcomes, 
                         controls = c('block', 'scanner'),
                         model = 'brms_model_with_motion',
                         subsets = list(grp = 'not_studied'), 
                         all.comb = TRUE)

3C. Check out Bayesian model summaries

The summary here is quite similar to when we used the lme4 model, only now the estimate is the median estimate for the x term (age) across posterior samples, and conf.low and conf.high represent the bounds of the 95% posterior interval.

brms_specs
## # A tibble: 64 x 15
##    x     y     model controls effect component group estimate std.error conf.low
##    <chr> <chr> <chr> <chr>    <chr>  <chr>     <chr>    <dbl>     <dbl>    <dbl>
##  1 age   ho_r… brms… block    fixed  cond      <NA>   -0.0386    0.0156  -0.0681
##  2 age   ho_l… brms… block    fixed  cond      <NA>   -0.0383    0.0154  -0.0677
##  3 age   ho_r… brms… block    fixed  cond      <NA>   -0.0405    0.0152  -0.0704
##  4 age   ho_l… brms… block    fixed  cond      <NA>   -0.0287    0.0147  -0.0568
##  5 age   nati… brms… block    fixed  cond      <NA>   -0.0264    0.0175  -0.0601
##  6 age   nati… brms… block    fixed  cond      <NA>   -0.0284    0.0165  -0.0601
##  7 age   nati… brms… block    fixed  cond      <NA>   -0.0294    0.0171  -0.0622
##  8 age   nati… brms… block    fixed  cond      <NA>   -0.0261    0.0153  -0.0563
##  9 age   ho_r… brms… scanner  fixed  cond      <NA>   -0.0385    0.0156  -0.0696
## 10 age   ho_l… brms… scanner  fixed  cond      <NA>   -0.0374    0.0157  -0.0677
## # … with 54 more rows, and 5 more variables: conf.high <dbl>,
## #   fit_algorithm <chr>, fit_pss <dbl>, fit_nobs <int>, subsets <chr>

3D. Plot Bayesian specification curve

With the Bayesian models including participant-varying age terms, results are largely similar to before. The main difference is that a slightly larger proportion of the 95% posterior intervals overlap with 0 than with the previous models without participant-varying age terms.

specr::plot_specs(brms_specs)
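
To put a number on that difference rather than eyeballing the two curves, we can compare the proportion of intervals excluding 0 in the lme4 and brms specification sets (a quick sketch; dplyr::bind_rows() fills the columns the two data frames don’t share with NAs):

dplyr::bind_rows(lmer = specs, brms = brms_specs, .id = 'estimation') %>%
  dplyr::group_by(estimation) %>%
  dplyr::summarise(prop_interval_excludes_0 = mean(conf.high < 0 | conf.low > 0))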

3E. Plot specification curve with brms and lme4 models together

We can also bind together the two output data frames with all the specifications we’ve already fit, then plot a curve across all of them to more directly compare the models. Now, in the model section of Panel B in the plot, we can tell whether the model used for each specification was the lme4 one (lmer_model_with_motion) or the brms one (brms_model_with_motion).

# columns are slightly different depending on the model type, so use rbind.fill() to deal with this
all_specs = plyr::rbind.fill(specs, brms_specs)
specr::plot_specs(all_specs)

4. Bayesian specification curves using custom code

specr is a super useful tool, but when we were first running specification curve analyses, the package hadn’t been released yet. For the analyses in the manuscript, we used mostly functions from the tidyverse, especially the purrr package for dealing with nested data, for estimation of many Bayesian models within multiverse analyses (in fact, specr is built using purrr). For more background on extreme tidyverse usage and purrr, check out this walkthrough from Monica Thieu.

While the methods we used required writing more lines of code, one very helpful thing that specr doesn’t do automatically is to save the fitted model object for each specification. Storing the fitted model objects will allow us to do a few useful things:

  1. Extract fitted model predictions (i.e. ‘predicting the regression line’) for each model within a specification curve. In this case, we might want to be able to plot each model’s predictions for amygdala reactivity as a function of age.
  2. Make specification curves for different parameters within the models, without having to rerun them all. For example, here we’ll look at the regression intercept term in a specification curve
  3. Evaluate models more generally, including using posterior predictive checks (http://paul-buerkner.github.io/brms/reference/pp_check.brmsfit.html), or getting cross-validated model performance metrics for individual models

4A. Convert the data to long format

First, we’ll create a grand mean-centered version of our age variable called age_center. Now, a 0 for age_center represents the grand mean age in the dataset, or an age of 12.251. We’ll use this centered age variable for modeling.

Then, we use tidyr::pivot_longer() to reshape the data into long form, such that all amygdala reactivity estimates are in 1 column called amygdala_reactivity, with a variable called amygdala_measure indicating which amygdala reactivity measure it is.

fake_data_long = fake_data %>%
  dplyr::mutate(age_center = age - mean(age)) %>%
  tidyr::pivot_longer(contains('amyg'), names_to = 'amygdala_measure', values_to = 'amygdala_reactivity') 

head(fake_data_long)
## # A tibble: 6 x 11
##      id  wave   age block motion scanner prev_studied grp         age_center
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>        <dbl> <chr>            <dbl>
## 1     1     3  16.1     1  0.293       2           NA not_studied       3.87
## 2     1     3  16.1     1  0.293       2           NA not_studied       3.87
## 3     1     3  16.1     1  0.293       2           NA not_studied       3.87
## 4     1     3  16.1     1  0.293       2           NA not_studied       3.87
## 5     1     3  16.1     1  0.293       2           NA not_studied       3.87
## 6     1     3  16.1     1  0.293       2           NA not_studied       3.87
## # … with 2 more variables: amygdala_measure <chr>, amygdala_reactivity <dbl>

4B. Nest the data within each amygdala measure

In this step, we first group the data by amygdala_measure, then use tidyr::nest() to collapse the data within each amygdala measure into a single cell containing a tibble object (basically a fancy dataframe). Now, our dataset has only 8 rows, with the data column containing an entire dataset for each amygdala measure in each cell. We also add an index variable to keep track of the data shape here.

fake_data_nested = fake_data_long %>%
  dplyr::group_by(amygdala_measure) %>%
  tidyr::nest() %>%
  dplyr::ungroup() %>%
  dplyr::mutate(., index = 1:nrow(.))

fake_data_nested
## # A tibble: 8 x 3
##   amygdala_measure        data                index
##   <chr>                   <list>              <int>
## 1 ho_right_amyg_beta      <tibble [178 × 10]>     1
## 2 ho_left_amyg_beta       <tibble [178 × 10]>     2
## 3 ho_right_amyg_tstat     <tibble [178 × 10]>     3
## 4 ho_left_amyg_tstat      <tibble [178 × 10]>     4
## 5 native_right_amyg_beta  <tibble [178 × 10]>     5
## 6 native_left_amyg_beta   <tibble [178 × 10]>     6
## 7 native_right_amyg_tstat <tibble [178 × 10]>     7
## 8 native_left_amyg_tstat  <tibble [178 × 10]>     8

4C. Run all model specifications!

To keep things a little more concise for the sake of this walkthrough, we’ll pare down to a total of 16 specifications for this analysis: 8 amygdala measures (right vs. left, native space vs. Harvard-Oxford, beta vs. tstat) X 2 model types (with participant-varying slopes + intercepts vs. only participant-varying intercepts)

Now, we’ll use the purrr::map() function inside dplyr::mutate(). This is a lot at once, but it allows us to make new columns with brms model objects stored in each cell. We use data as the first argument to the purrr::map() function to tell brms::brm() to fit the model to the nested dataset stored in each cell of the data column.

Since we are making 2 new columns with dplyr::mutate(), one for each of the two model formulations (model_varying_slopes and model_varying_intercepts), we’ll use tidyr::pivot_longer() again to put all model objects into one model_object column, with the model_type column now indicating whether there are varying slopes or just intercepts. Note: this step takes about 15 min (on a 2019 Macbook Pro) because we are running 16 Bayesian models in a row!

nested_model_frame = fake_data_nested %>%
    dplyr::group_by(amygdala_measure) %>%
    dplyr::mutate(., model_varying_slopes = purrr::map(data, ~brms::brm(amygdala_reactivity ~ age_center + motion + (age|id), 
                                                  data = ., cores = 4, chains = 4)),
                    model_varying_intercepts = purrr::map(data, ~brms::brm(amygdala_reactivity ~ age_center + motion + (1|id), 
                                                  data = ., cores = 4, chains = 4))) %>%
    tidyr::pivot_longer(contains('model'), names_to = 'model_type', values_to = 'model_object')

4D. Pull coefficients for each model using broom.mixed::tidy()

Now that we’ve run lots of models, we can use broom.mixed::tidy() to pull out all the coefficients for each one. This function is just like the broom::tidy() function, except that it allows us to work with multilevel models (lme4, brms, or rstanarm, for example).

So, we’ll make a new dataframe called purrr_specs where we pull the coefficients for each model (again using purrr::map() for storing the output in individual cells). Finally, we’ll use tidyr::unnest() to ‘flatten’ the coefficient output for each model back into regular dataframe columns. In other words, we’re taking all of the coefficient information, and making 1 column for each piece of information.

purrr_specs = nested_model_frame %>%
  mutate(., coefs = purrr::map(model_object, ~broom.mixed::tidy(.))) %>%
  dplyr::select(., -data, -model_object) %>%
  tidyr::unnest(coefs)

The output looks very similar to what we previously were working with from specr EXCEPT that we have multiple rows for each specification here. In fact, we have 1 row for each of the coefficients in each model pulled by broom.mixed::tidy() (we are pulling the Intercept, age_center, and motion fixed-effect terms from the model, as well as parameter estimates for some of the variances/covariances of the participant-varying terms).

purrr_specs
## # A tibble: 96 x 11
## # Groups:   amygdala_measure [8]
##    amygdala_measure  index model_type    effect component group term    estimate
##    <chr>             <int> <chr>         <chr>  <chr>     <chr> <chr>      <dbl>
##  1 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  (Inter…   0.597 
##  2 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  age_ce…  -0.0412
##  3 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  motion   -0.298 
##  4 ho_right_amyg_be…     1 model_varyin… ran_p… cond      id    sd__(I…   0.168 
##  5 ho_right_amyg_be…     1 model_varyin… ran_p… cond      id    sd__age   0.0127
##  6 ho_right_amyg_be…     1 model_varyin… ran_p… cond      id    cor__(…  -0.330 
##  7 ho_right_amyg_be…     1 model_varyin… ran_p… cond      Resi… sd__Ob…   0.741 
##  8 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  (Inter…   0.593 
##  9 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  age_ce…  -0.0419
## 10 ho_right_amyg_be…     1 model_varyin… fixed  cond      <NA>  motion   -0.299 
## # … with 86 more rows, and 3 more variables: std.error <dbl>, conf.low <dbl>,
## #   conf.high <dbl>

If we want to make this look almost like the specr output, we can filter for the model term we’re looking for. In this case, age_center is the parameter of primary interest for looking at age-related change in amygdala reactivity for the fear > baseline contrast.

purrr_specs_age = dplyr::filter(purrr_specs, term == 'age_center')
purrr_specs_age
## # A tibble: 16 x 11
## # Groups:   amygdala_measure [8]
##    amygdala_measure   index model_type    effect component group term   estimate
##    <chr>              <int> <chr>         <chr>  <chr>     <chr> <chr>     <dbl>
##  1 ho_right_amyg_beta     1 model_varyin… fixed  cond      <NA>  age_c…  -0.0412
##  2 ho_right_amyg_beta     1 model_varyin… fixed  cond      <NA>  age_c…  -0.0419
##  3 ho_left_amyg_beta      2 model_varyin… fixed  cond      <NA>  age_c…  -0.0417
##  4 ho_left_amyg_beta      2 model_varyin… fixed  cond      <NA>  age_c…  -0.0423
##  5 ho_right_amyg_tst…     3 model_varyin… fixed  cond      <NA>  age_c…  -0.0422
##  6 ho_right_amyg_tst…     3 model_varyin… fixed  cond      <NA>  age_c…  -0.0431
##  7 ho_left_amyg_tstat     4 model_varyin… fixed  cond      <NA>  age_c…  -0.0323
##  8 ho_left_amyg_tstat     4 model_varyin… fixed  cond      <NA>  age_c…  -0.0321
##  9 native_right_amyg…     5 model_varyin… fixed  cond      <NA>  age_c…  -0.0297
## 10 native_right_amyg…     5 model_varyin… fixed  cond      <NA>  age_c…  -0.0311
## 11 native_left_amyg_…     6 model_varyin… fixed  cond      <NA>  age_c…  -0.0301
## 12 native_left_amyg_…     6 model_varyin… fixed  cond      <NA>  age_c…  -0.0311
## 13 native_right_amyg…     7 model_varyin… fixed  cond      <NA>  age_c…  -0.0304
## 14 native_right_amyg…     7 model_varyin… fixed  cond      <NA>  age_c…  -0.0319
## 15 native_left_amyg_…     8 model_varyin… fixed  cond      <NA>  age_c…  -0.0260
## 16 native_left_amyg_…     8 model_varyin… fixed  cond      <NA>  age_c…  -0.0275
## # … with 3 more variables: std.error <dbl>, conf.low <dbl>, conf.high <dbl>

4E. Custom SCA Plot Function

Now, we want to make our descriptive specification curve plot! We’ve created a function for this study that is very heavily inspired by Dani Cosme’s spec curve R code.

It’s a chunky function with a lot of code, but overall what it does is this:

  • Make sca_decision_frame, a dataframe with 1 column for each design point coded as a binary decision, such that there is a | to indicate a ‘yes’ and NA to indicate a ‘no’ (this is important for the bottom panel). This dataframe also creates a rank variable to rank specifications by their point estimates, and an overlap_0 variable to indicate the sign of the point estimate, plus whether the 95% posterior interval overlaps 0
  • Convert sca_decision_frame to long format, then group decision points into categories for visualization
  • For the top panel, plot sca_decision_frame using ggplot2 with the rank for each specification on the x-axis and each specification’s point estimate + 95% posterior interval on the y axis. We also color each specification according to the sign of the point estimate and whether the 95% posterior interval overlaps 0. Also, add a horizontal dashed line and an additional point/error bar in black to indicate the median estimate across specifications.
  • For the bottom panel, plot sca_decision_frame_long using ggplot2 as well, with specification rank again on the x-axis. This time, the y axis is categorical, with each level representing a different decision point. Then, we use geom_text(label=choice), where choice is the variable coded as either | or NA for each binarized decision point. This will give us a little vertical tick on the plot for each row every time the choice was ‘yes’, and a blank space every time the choice was ‘no’. So, we’ll get 1 row each on the y-axis for each binarized choice, then vertical ticks marking if that choice was made or not. For each choice, we’ll also plot a black point + horizontal interval representing the median rank and 25%-75% quantiles of the rank when the choice was ‘yes’.
  • Lastly, we use cowplot::plot_grid() to vertically align the two panels and label them. Then, we return the data frames & plot objects in a named list.

Note: this is a slightly shortened version of the function intended specifically for amygdala reactivity analyses in the current study. It will have to be somewhat modified to work more generally in other contexts.

make_sca_plot = function(specs, fork_list, plot_title, y_label, term_choice){
  # compile sca decision frame
  sca_decision_frame = specs %>%
    ungroup %>% 
    dplyr::filter(term == term_choice) %>%
    dplyr::arrange(estimate) %>%
    mutate(., rank = 1:nrow(.),
           tstat = ifelse(grepl('tstat', amygdala_measure), '|', NA),
           random_slopes = ifelse(grepl('intercepts', model_type), '|', NA),
           amyg_right = ifelse(grepl('right', tolower(amygdala_measure)), '|', NA),
           amyg_left = ifelse(grepl('left', tolower(amygdala_measure)), '|', NA),
           native_space = ifelse(grepl('native', amygdala_measure), '|', NA),
           overlap_0 = case_when(
             conf.low < 0 & conf.high < 0 ~ 'neg_y',
             conf.low < 0 & estimate < 0 & conf.high > 0 ~ 'neg_n',
             conf.low < 0 & estimate > 0 & conf.high > 0 ~ 'pos_n',
             conf.low  > 0 & conf.high > 0 ~ 'pos_y',
           ))

  # median model
  median_model_frame = sca_decision_frame %>%
    summarise(estimate = median(estimate), 
              conf.low = median(conf.low), conf.high = median(conf.high), 
              rank= median(rank))
  
  # order factor levels for overlap_0 variable
  sca_decision_frame$overlap_0 = factor(sca_decision_frame$overlap_0, 
                                        levels = c("neg_y", "neg_n", "pos_n", "pos_y"))
  
  # convert to long, assign decision types
  sca_decision_frame_long = sca_decision_frame %>%
    tidyr::pivot_longer(names_to = 'fork', values_to = 'choice', all_of(fork_list)) %>%
    mutate(decisiontype = case_when(
      grepl('amyg', fork) ~ 'Amygdala\n Roi',
      fork == 'native_space' ~ 'Amygdala\n Roi',
      fork %in% c('random_slopes', 'model_type') ~ 'Group-Level\nModel',
      fork %in% c('tstat') ~ 'Subject-Level\nModel'
    ))
  
  # get average rank of each decision point (fork) by beta estimate
  sca_decision_frame_long_ranks = sca_decision_frame_long %>%
    dplyr::filter(choice == '|') %>%
    dplyr::group_by(fork) %>%
    summarise(mean_rank = -1*mean(rank))
  
  # join ranks with decision frame
  sca_decision_frame_long = left_join(sca_decision_frame_long, sca_decision_frame_long_ranks)
  
  # rename variables to be human-interpretable
  sca_decision_frame_long$fork = 
    dplyr::recode(sca_decision_frame_long$fork, 
                  'tstat' = 'use tstats (vs. beta estimates)',
                  'native_space' = 'native space (vs. mni space)',
                  'random_slopes' = 'random intercepts only (vs. random slopes)',
                  'amyg_right' = 'right amygdala',
                  'amyg_left' =  'left amygdala')
  
  # reorder forks by mean rank
  sca_decision_frame_long$fork_ordered = reorder(sca_decision_frame_long$fork,  
                                                 sca_decision_frame_long$mean_rank)
  
  # color palette to code the following:
  # red = negative, distinct from 0
  # blue = negative, not distinct from 0
  # green = positive, not distinct from 0
  # purple = positive, distinct from 0
  if('neg_y' %in% sca_decision_frame$overlap_0){
    my_colors <- RColorBrewer::brewer.pal(4, "Set1")[1:4]
  }else if('neg_n' %in% sca_decision_frame$overlap_0){
    my_colors <- RColorBrewer::brewer.pal(4, "Set1")[2:4]
  }else if('pos_n' %in% sca_decision_frame$overlap_0){
    my_colors <- RColorBrewer::brewer.pal(4, "Set1")[3:4]
  }else{
    my_colors <- RColorBrewer::brewer.pal(4, "Set1")[4:4]
  }
  
  # recode overlap 0 markings for informative legend
  sca_decision_frame$overlap_0 = dplyr::recode(sca_decision_frame$overlap_0,
                                               'neg_y' = '-, 95% PI excluding 0',
                                               'neg_n' = '-, 95% PI including 0',
                                               'pos_n' = '+, 95% PI including 0',
                                               'pos_y' = '+, 95% PI excluding 0')
  
  
  # summary for lower plot (median + IQR)
  decision_summary = sca_decision_frame_long %>% 
    group_by(decisiontype, choice, fork, fork_ordered) %>%
    summarise(n = n(), median_rank = median(rank), 
              lwr_rank = quantile(rank, .25), 
              upr_rank = quantile(rank, .75)) %>%
    dplyr::filter(choice =='|')
  
  # top panel
  sca_top = 
    ggplot(sca_decision_frame, aes(x = rank, y = estimate, color = overlap_0)) +
    geom_hline(yintercept = 0, lty = 1, color = 'black') +
    geom_hline(yintercept = median(sca_decision_frame$estimate), color = 'black', lty = 2) +
    geom_point(alpha = .5) + 
    geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0, lwd = .15, alpha = .5) +
    geom_point(data = median_model_frame,aes(x = rank, y = estimate), 
               color = 'black') +
    geom_errorbar(data = median_model_frame, 
                  aes(x = rank, y = estimate, ymin = conf.low, ymax = conf.high), 
                  color = 'black', width = 0) +
    labs(x = '', y = y_label, title = plot_title) +
    theme_classic() +
    theme(legend.position = 'top', legend.title = element_blank()) +
    scale_color_manual(values = my_colors)
  
  # lower panel
  sca_bottom = 
    ggplot(sca_decision_frame_long, aes(x = rank, y = fork_ordered, color = overlap_0)) +
    geom_text(aes(label = choice), alpha = .4) +
    facet_grid(rows = vars(decisiontype), drop = TRUE, 
               scales = 'free_y', space = 'free_y') +
    geom_point(data = decision_summary, aes(x = median_rank, y = fork_ordered), 
               color = 'black') +
    geom_errorbarh(data = decision_summary, 
                   aes(x = median_rank, y = fork_ordered, xmin = lwr_rank, xmax = upr_rank), 
                   color = 'black', height = 0) +
    labs(x = "Analysis specifications ranked by beta estimates", y = "Decision Points") + 
    theme_bw() + 
    theme(legend.title = element_text(size = 10),
          legend.text = element_text(size = 8),
          axis.text = element_text(color = "black", size = 8),
          legend.position = "none",
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.background = element_blank(),
          strip.text.y = element_text(size = 8)) +
    scale_color_manual(values = my_colors) 
    
  
  # put together using cowplot
  sca_panel = cowplot::plot_grid(sca_top, sca_bottom, ncol = 1, align = "v", axis = 'lr', 
                                 labels = c('A', 'B'))
  
  # return a list including formatted data, as well as plots
  return(list('sca_decision_frame' = sca_decision_frame, 
              'sca_decision_frame_long' = sca_decision_frame_long,
              'sca_top' = sca_top,
              'sca_bottom' = sca_bottom,
              'sca_panel' = sca_panel))
}

4F. Plot custom brms spec curve

First, we run make_sca_plot() to generate the plot object

age_center_spec_curve = make_sca_plot(specs = purrr_specs, 
                  fork_list = c('tstat', 'random_slopes', 'amyg_right', 'amyg_left', 'native_space'), 
                  y_label = 'Age-related change in amyg reactivity\nFear > Baseline', 
                  term_choice = 'age_center', 
                  plot_title = 'Custom plot for simulated data spec curve')

Then display the sca_panel. Although this is a slightly different set of specifications to the previous curves, we can see that the results are pretty similar – most specifications show negative age-related change estimates.

age_center_spec_curve$sca_panel

4G. A spec curve for the model intercept

Here, we can also plot a specification curve for the regression intercept term. In this case, the intercept represents the average estimated amygdala reactivity to fear faces > baseline for a participant of age_center=0 (age ~12.3 years) and motion=0 (average head motion amounts). We can do this by setting term_choice = '(Intercept)' and re-running make_sca_plot()

intercept_spec = make_sca_plot(specs = purrr_specs, 
                    fork_list = c('tstat', 'random_slopes', 'amyg_right', 'amyg_left', 'native_space'), 
                    y_label = 'Estimated Amygdala Reactivity at Age 12.3\nFear > Baseline', 
                    term_choice = '(Intercept)', 
                    plot_title = 'Custom plot for the spec curve for the model *intercepts*')

This plot shows that all specifications estimate positive average amygdala reactivity for a participant at age 12.3 with mean head motion levels, with 95% posterior intervals excluding 0.

intercept_spec$sca_panel

5. Get fitted model predictions from the custom brms multiverse

Because we’ve saved the fitted model objects into a nested data frame for each specification in our custom multiverse, we can also extract fitted predictions for each model. Here, we’ll do this to visualize each model’s predictions for average amygdala reactivity (for fear faces > baseline) as a function of age.

First, we’ll define a grid of predictor values for which to make predictions. Here, we’ll set motion=0 (average head motion across the dataset), and age_center=-8:10 to be able to generate a prediction for a range of ages

pred_grid = expand.grid(age_center = -8:10, motion = 0) %>%
  dplyr::mutate(age = age_center + mean(fake_data$age))

head(pred_grid)
##   age_center motion      age
## 1         -8      0 4.251169
## 2         -7      0 5.251169
## 3         -6      0 6.251169
## 4         -5      0 7.251169
## 5         -4      0 8.251169
## 6         -3      0 9.251169

Now, we can use purrr::map() again with stats::fitted() to generate fitted values (estimates of the linear predictor) for each specification.

  • We’ll use the arguments newdata = pred_grid to specify which values of the predictors to generate predictions for, and re_formula = NA to exclude the participant-varying effects, so that we get population-level (group-average) predictions
  • Just like when we pulled the model coefficients with broom.mixed::tidy(), the predictions for each specification are nested into cells of the model_preds column. We use tidyr::unnest(model_preds) to flatten them back out and get a data frame (in long form) of predictions for each model
purrr_specs_preds = nested_model_frame %>%
  dplyr::mutate(model_preds = purrr::map(model_object, ~stats::fitted(., newdata = pred_grid, re_formula = NA) %>%
               cbind(pred_grid, .))) %>%
  dplyr::select(-data, -model_object) %>%
  tidyr::unnest(model_preds)

For each specification, and for each level of age_center, we get an Estimate (by default, the mean across posterior draws), as well as the upper/lower bounds of the 95% posterior interval.

head(purrr_specs_preds)
## # A tibble: 6 x 10
## # Groups:   amygdala_measure [1]
##   amygdala_measure  index model_type  age_center motion   age Estimate Est.Error
##   <chr>             <int> <chr>            <int>  <dbl> <dbl>    <dbl>     <dbl>
## 1 ho_right_amyg_be…     1 model_vary…         -8      0  4.25    0.926    0.123 
## 2 ho_right_amyg_be…     1 model_vary…         -7      0  5.25    0.885    0.111 
## 3 ho_right_amyg_be…     1 model_vary…         -6      0  6.25    0.844    0.0987
## 4 ho_right_amyg_be…     1 model_vary…         -5      0  7.25    0.803    0.0875
## 5 ho_right_amyg_be…     1 model_vary…         -4      0  8.25    0.762    0.0774
## 6 ho_right_amyg_be…     1 model_vary…         -3      0  9.25    0.720    0.0688
## # … with 2 more variables: Q2.5 <dbl>, Q97.5 <dbl>

5A. Spaghetti plot of model predictions across specifications

Now that we’ve extracted the fitted predictions for each model, we can plot the group-average ‘regression line’ for each one.

  • Here we’ll color the specifications by whether they included participant-varying slopes, or just intercepts.
  • Here, each line represents the model predictions for 1 specification.
  • This can be a good way to summarize fitted model predictions across a multiverse, even if we can’t see the uncertainty about the regression line for any given specification.
ggplot(data = purrr_specs_preds, aes(x = age, y = Estimate, 
                                     group = interaction(amygdala_measure, model_type), 
                                     color = model_type)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_line() +
  theme_bw() +
  labs(y = 'Estimated Amygdala Reactivity\nFear > Baseline')

5B. Panel plot of fitted predictions

We can also plot each specification here as a separate panel, and include the 95% posterior interval for the fitted model predictions at each level of age. This is just about possible with 16 specifications, but many more than that and we probably won’t have room on the screen to view them all.

ggplot(data = purrr_specs_preds, aes(x = age, y = Estimate)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_line() +
  geom_ribbon(aes(ymin = `Q2.5`, ymax = `Q97.5`), alpha = .3) + 
  theme_bw() +
  labs(y = 'Estimated Amygdala Reactivity\nFear > Baseline', 
       title = 'Panel plot with fitted model predictions for each specification') +
  facet_grid(rows = vars(model_type), cols = vars(amygdala_measure)) +
  theme(strip.text = element_text(size = 6))

5C. Plot an individual model’s fitted predictions with raw data

As a check of a model’s fit, it is often helpful to look at the fitted predictions overlaid on the raw data points. We can do that by filtering our predictions to just look at one single specification (amygdala_measure == 'ho_right_amyg_beta', model_type == 'model_varying_slopes'), then plotting with ggplot() with age on the x-axis and estimated amygdala reactivity on the y-axis, along with the raw data used by that specification.

one_spec_preds = dplyr::filter(purrr_specs_preds, amygdala_measure == 'ho_right_amyg_beta', model_type == 'model_varying_slopes')

ggplot(fake_data, aes(x = age, y = ho_right_amyg_beta)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_point() + 
  geom_line(aes(group = id), alpha = .1) +
  geom_line(data = one_spec_preds, aes(x = age, y = Estimate)) +
  geom_ribbon(data = one_spec_preds, aes(x = age, y = Estimate, ymin = `Q2.5`, ymax = `Q97.5`), alpha = .3) +
  theme_bw() +
  labs(y = 'Estimated Amygdala Reactivity\nFear > Baseline', 
       title = 'Model predictions & raw data for one specification', 
       subtitle = 'Harvard-Oxford right amygdala betas, brms model with varying slopes')

This is a nice check of our model fits! However, the more specifications we have, the more difficult it is to look at so many plots for individual specifications like this. To that end, for this study we have made an interactive multiverse explorer Shiny app for browsing model fits across specifications.

6. Multiverse model checks

6A. Graphical posterior predictive checks

In addition to looking at the coefficients and predictions from our different specifications, we may also want to perform posterior predictive checks for each one.

What is a posterior predictive check?

  • We draw a “simulated dataset” from the model’s posterior distribution, then compare the overall distribution to the real observed data (see Gabry et al., 2019).
  • Basically, a posterior predictive check is helpful in answering the question “is the model giving us a distribution of predictions that matches the real data?” If the distributions match, this doesn’t mean the model is good per se (the individual predictions could still be very bad), but if the distributions don’t match, then we know that the model is definitely not doing a good job at describing the data generation process.

To get these graphical summaries of the posterior predictive check, we use the pp_check() function for brms models (I know…). Here, we’ll first combine this with purrr::map() to draw 10 samples from the posterior predictive distribution for all observations in the real dataset.

pp_check_grid = nested_model_frame %>%
  dplyr::mutate(chex = purrr::map(model_object, ~brms::pp_check(., nsamples = 10)))

# Make a new column to label all specifications
nested_model_frame = mutate(nested_model_frame,
                            spec = paste0(amygdala_measure, ', ', model_type))

Now, we can use cowplot::plot_grid() again to make a panel of all 16 graphical posterior predictive checks, one for each model in our specification curve. We can use the labels argument to label each plot for each specification. Here, \(y\) represents the distribution of actual observations, and each \(y_{rep}\) distribution is one set of the posterior draws.

cowplot::plot_grid(plotlist = pp_check_grid$chex, 
                   labels = as.vector(nested_model_frame$spec), 
                   label_size = 4, label_x = 0, hjust = 0)

Again, this is a more descriptive analysis – it allows us to visually compare which specifications result in a model posterior distribution that most closely matches the observed data.

6B. Approximate leave-one-out (LOO) cross-validation

We might often want to get a specification curve for the relative model ‘goodness-of-fit’, or an approximation of predictive performance, for all specifications. We can use brms::loo() to perform approximate leave-one-out (LOO) cross-validation for each model. So, here, we’ll use purrr::map() again with brms::loo() to get these estimates

  • This function will give us a cross-validated estimate for the expected log pointwise predictive density, or ELPD, as a metric for the “expected predictive performance” for a model. ELPD as estimated by LOO cross-validation is similar in purpose to the Akaike Information Criterion (AIC), although it takes model priors into account and does not make distributional assumptions about the posterior (see Vehtari, Gelman, & Gabry, 2017)
  • This function uses Pareto smoothed importance sampling (Vehtari et al., 2015) for cross-validation here, which means that unless results are very bad, we won’t have to repeatedly refit the Bayesian models (which would take a while)
loo_cv_grid = nested_model_frame %>%
  dplyr::mutate(loo_cv = purrr::map(model_object, ~brms::loo(.)$estimates))

Now, we can unnest our ELPD estimates as well as their standard errors, then plot them for each specification.

# unnest and wrangle elpd estimates into the data frame 
loo_unnest = unnest(loo_cv_grid, loo_cv)
loo_cv_grid$elpd_loo = as.numeric(loo_unnest$loo_cv[,1][row.names(loo_unnest$loo_cv)=='elpd_loo'])
loo_cv_grid$elpd_loo_se = as.numeric(loo_unnest$loo_cv[,2][row.names(loo_unnest$loo_cv)=='elpd_loo'])

The raw value of ELPD by itself doesn’t mean much intrinsically, but since higher (or less negative) ELPD is better, we can use this to compare expected model predictive performance. So, we can see here that model predictive performance is mostly impacted here by the amygdala measure used, although for most measures, the models with varying intercepts have slightly higher (better) ELPD than models with varying slopes.

ggplot(loo_cv_grid, aes(x = fct_reorder(spec, elpd_loo), y = elpd_loo)) +
  geom_point() +
  geom_errorbar(aes(ymin = elpd_loo - elpd_loo_se, ymax = elpd_loo + elpd_loo_se)) +
  coord_flip() +
  theme_bw() +
  labs(x = 'Specification', y = 'elpd_loo')

While the values of elpd_loo don’t tell us so much intrinsically, the relative ranking of the values tells us which models (i.e. the ones with larger elpd_loo, towards the right of the plot) have better expected predictive performance.
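
If we want a formal pairwise comparison rather than just this ranking, ELPD differences (and their standard errors) for two models fit to the same outcome can be obtained with loo_compare(). Here is a minimal sketch comparing the varying-slopes and varying-intercepts models for the first amygdala measure in nested_model_frame (note that ELPD comparisons are only meaningful between models fit to the same data):

# compare the two model types fit to the same amygdala measure
loo_slopes = brms::loo(nested_model_frame$model_object[[1]])
loo_intercepts = brms::loo(nested_model_frame$model_object[[2]])
loo::loo_compare(loo_slopes, loo_intercepts)  # elpd_diff and its standard error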

7. Recap

Though we can’t share the real data used in the current study, hopefully these walkthroughs of multiverse analyses using publicly available simulated data have been helpful! Overall, we find that specification curves can be super useful as a way to examine the robustness of results when we’re faced with many analysis decision points. For anyone who is interested, good luck traveling the multiverse!

Questions? Feel free to reach out to

Resources

Readings on specification curves & multiverse analyses

Resources for conducting specification curve analyses with R

R session info

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_1.0.0      brms_2.12.0        Rcpp_1.0.7         specr_0.2.2       
##  [5] forcats_0.4.0      stringr_1.4.0      dplyr_1.0.3        purrr_0.3.3       
##  [9] readr_1.3.1        tidyr_1.0.2        tibble_3.1.2       ggplot2_3.3.3.9000
## [13] tidyverse_1.3.0   
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.1.5      plyr_1.8.5          
##   [4] igraph_1.2.4.2       splines_3.6.2        crosstalk_1.0.0     
##   [7] rstantools_2.0.0     inline_0.3.15        digest_0.6.27       
##  [10] htmltools_0.4.0      viridis_0.5.1        rsconnect_0.8.16    
##  [13] fansi_0.5.0          magrittr_2.0.1       graphlayouts_0.7.1  
##  [16] modelr_0.1.5         RcppParallel_4.4.4   matrixStats_0.55.0  
##  [19] xts_0.12-0           prettyunits_1.1.1    colorspace_2.0-1    
##  [22] rvest_0.3.5          ggrepel_0.8.1        haven_2.2.0         
##  [25] xfun_0.19            callr_3.4.2          crayon_1.4.1        
##  [28] jsonlite_1.6.1       lme4_1.1-21          zoo_1.8-7           
##  [31] glue_1.4.2           polyclip_1.10-0      gtable_0.3.0        
##  [34] V8_3.0.1             pkgbuild_1.0.6       rstan_2.21.1        
##  [37] abind_1.4-5          scales_1.1.1         mvtnorm_1.1-0       
##  [40] DBI_1.1.0            miniUI_0.1.1.1       viridisLite_0.4.0   
##  [43] xtable_1.8-4         stats4_3.6.2         StanHeaders_2.21.0-1
##  [46] DT_0.11              htmlwidgets_1.5.1    httr_1.4.1          
##  [49] threejs_0.3.3        RColorBrewer_1.1-2   ellipsis_0.3.2      
##  [52] pkgconfig_2.0.3      loo_2.2.0            farver_2.1.0        
##  [55] dbplyr_1.4.2         utf8_1.2.1           tidyselect_1.1.0    
##  [58] labeling_0.4.2       rlang_0.4.11         reshape2_1.4.3      
##  [61] later_1.0.0          munsell_0.5.0        cellranger_1.1.0    
##  [64] tools_3.6.2          cli_2.5.0            generics_0.0.2      
##  [67] broom_0.7.4          ggridges_0.5.2       evaluate_0.14       
##  [70] fastmap_1.0.1        yaml_2.2.1           processx_3.4.2      
##  [73] knitr_1.30           fs_1.3.1             tidygraph_1.2.0     
##  [76] ggraph_2.0.4         nlme_3.1-142         mime_0.9            
##  [79] xml2_1.2.2           compiler_3.6.2       bayesplot_1.7.1     
##  [82] shinythemes_1.2.0    rstudioapi_0.10      curl_4.3            
##  [85] reprex_0.3.0         tweenr_1.0.1         stringi_1.4.6       
##  [88] ps_1.3.2             Brobdingnag_1.2-6    lattice_0.20-38     
##  [91] Matrix_1.2-18        nloptr_1.2.1         markdown_1.1        
##  [94] shinyjs_1.1          vctrs_0.3.8          pillar_1.6.1        
##  [97] lifecycle_1.0.0      bridgesampling_1.0-0 httpuv_1.5.2        
## [100] R6_2.5.0             promises_1.1.0       gridExtra_2.3       
## [103] codetools_0.2-16     boot_1.3-23          colourpicker_1.0    
## [106] MASS_7.3-51.4        gtools_3.8.1         assertthat_0.2.1    
## [109] withr_2.4.2          shinystan_2.5.0      parallel_3.6.2      
## [112] hms_0.5.3            grid_3.6.2           coda_0.19-3         
## [115] minqa_1.2.4          rmarkdown_2.1        ggforce_0.3.1       
## [118] shiny_1.4.0          lubridate_1.7.4      base64enc_0.1-3     
## [121] dygraphs_1.1.1.6