Mediation (statistics)
In statistics, a mediation model seeks to identify and explain the mechanism that underlies an observed relationship between an independent variable and a dependent variable by including a third explanatory variable, known as a mediator variable. Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediation model hypothesizes that the independent variable causes the mediator variable, which in turn causes the dependent variable. The mediator variable thus clarifies the nature of the relationship between the independent and dependent variables. While the concept of mediation as defined within psychology is theoretically appealing, the methods used to study mediation empirically have been challenged by statisticians and epidemiologists, and the underlying concepts have since been given formal interpretations.
Direct versus indirect effects
In the diagram shown above, assuming linear relationships, the indirect effect is the product of the path coefficients A and B along the mediated route, while the direct effect is the coefficient C of the path that bypasses the mediator. The total effect measures the extent to which the dependent variable changes when the independent variable increases by one unit. In contrast, the indirect effect (sometimes called the mediated effect) measures the extent to which the dependent variable changes when the independent variable is held fixed and the mediator variable changes to the level it would have attained had the independent variable increased by one unit. In linear systems, the total effect equals the sum of the direct and indirect effects (C + AB in the model above). In nonlinear models, the total effect is generally not equal to the sum of the direct and indirect effects, but to a modified combination of the two.
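The decomposition C + AB can be checked numerically. The sketch below is a minimal simulation, assuming a linear model with independent Gaussian errors, hypothetical path values, and that A labels the X → M path and B the M → Y path (an assumption about the diagram's labeling; the product AB is the same either way). The helper `ols` is introduced here for illustration. With ordinary least squares and an intercept, the slope from regressing Y on X alone equals the direct coefficient plus the product of the two mediated-path coefficients exactly in-sample.

```python
import numpy as np

def ols(y, *cols):
    """OLS coefficients [intercept, coef_1, coef_2, ...] for y on the given columns."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical path values, chosen for illustration only.
A_true, B_true, C_true = 0.5, 0.8, 0.3
x = rng.normal(size=n)
m = A_true * x + rng.normal(size=n)               # X -> M (path A)
y = C_true * x + B_true * m + rng.normal(size=n)  # X -> Y (path C) and M -> Y (path B)

total = ols(y, x)[1]            # Y regressed on X alone: total effect
a_hat = ols(m, x)[1]            # M regressed on X: path A
_, c_hat, b_hat = ols(y, x, m)  # Y regressed on X and M: paths C and B

indirect = a_hat * b_hat
print(f"total={total:.3f}  direct={c_hat:.3f}  indirect={indirect:.3f}")
```

Because the identity total = direct + indirect is an algebraic property of least squares in linear models, it holds to rounding error regardless of the seed; the estimates themselves only approximate the hypothetical true values.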
Complete versus partial mediation
When the effect of the independent variable on the dependent variable is zero once the mediator variable is fixed, the mediation is said to be complete (C = 0 in the diagram above). If, however, the effect changes upon fixing the mediator but remains significantly different from zero, the mediation is said to be partial. In all cases, the operation of "fixing a variable" must be distinguished from that of "controlling for a variable," which has often been used inappropriately in the literature. The former means physically fixing the variable (setting it by intervention), while the latter means conditioning on it, adjusting for it, or adding it to the regression model. The two notions coincide only when all error terms (not shown in the diagram) are statistically uncorrelated. When errors are correlated, adjustments must be made to neutralize those correlations before embarking on mediation analysis (see Bayesian networks).
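The difference between fixing and controlling can be illustrated with a small simulation. The sketch below assumes a hypothetical linear model in which an unobserved variable U affects both M and Y, so the error terms of M and Y are correlated; the coefficients and function names are illustrative only. Setting M by intervention recovers the true direct effect c = 0.5, and so does regression adjustment when U is absent. With U present, adjusting for M pushes the estimated direct effect toward zero (for these values, toward c - a/2 = 0), which would wrongly suggest complete mediation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
a, b, c = 1.0, 1.0, 0.5  # hypothetical structural coefficients; true direct effect is c

def adjusted_direct_effect(u_strength):
    """'Controlling for' M: coefficient on X in an OLS regression of Y on [1, X, M]."""
    x = rng.normal(size=n)
    u = rng.normal(size=n)  # unobserved common cause of M and Y
    m = a * x + u_strength * u + rng.normal(size=n)
    y = c * x + b * m + u_strength * u + rng.normal(size=n)
    design = np.column_stack([np.ones(n), x, m])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

def fixed_direct_effect(u_strength, m_fixed=0.0):
    """'Fixing' M: M is set to m_fixed by intervention, so X reaches Y only directly."""
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    y = c * x + b * m_fixed + u_strength * u + rng.normal(size=n)
    return np.polyfit(x, y, 1)[0]  # slope of Y on X

no_conf = adjusted_direct_effect(0.0)  # close to 0.5
conf = adjusted_direct_effect(1.0)     # close to 0.0, not 0.5
fixed = fixed_direct_effect(1.0)       # close to 0.5
print(round(no_conf, 2), round(conf, 2), round(fixed, 2))
```

Simulating "fixing" requires knowing the structural equations, which is exactly what observational data do not provide; the sketch only shows why the two operations can disagree.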
To establish either partial or complete mediation, the reduction in variance explained by the independent variable must be significant as determined by one of several tests, such as the Sobel test (Sobel, 1982). The effect of an independent variable on the dependent variable can become nonsignificant when the mediator is introduced even though the mediator accounts for only a trivial reduction in explained variance (i.e., not true mediation). Thus, it is imperative to show a significant reduction in variance explained by the independent variable before asserting either partial or complete mediation. Hayes (2009) shows that it is possible to have statistically significant indirect effects in the absence of a total effect. This can happen when several mediating paths cancel each other out; the cancelled effects become noticeable when one of the cancelling mediators is controlled for. Consequently, the terms 'complete' and 'partial' mediation should always be interpreted relative to the set of variables present in the model.
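The Sobel test compares the estimated indirect effect ab with a first-order (delta-method) standard error, sqrt(b²·se_a² + a²·se_b²). A minimal sketch follows, assuming the path estimates and their standard errors come from two ordinary regressions (M on X, and Y on X and M); the input numbers are hypothetical. Because it relies on a normal approximation for ab, it is the kind of distributional assumption that resampling methods avoid.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """First-order Sobel test for the indirect effect a*b.

    a, se_a: coefficient and standard error of X in the regression of M on X.
    b, se_b: coefficient and standard error of M in the regression of Y on X and M.
    Returns (z, two-sided p-value) under a standard normal approximation.
    """
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical estimates for illustration.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"z = {z:.3f}, p = {p:.4f}")
```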
Suppression is defined as "a variable which increases the predictive validity of another variable (or set of variables) by its inclusion into a regression equation". For instance, when examining the effect of a treatment (e.g., medication) on an outcome (e.g., healing from a disease), suppression means that, instead of the direct effect of the treatment on the outcome dropping when the mediator is included, the opposite happens: including the suppressor variable in the equation increases, rather than decreases, the relation between the treatment and the outcome. This, too, can be explained by cancellation; disabling one mediating path may disturb the balance between otherwise cancelling paths.
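Cancellation between mediating paths, and the suppression it can produce, can be seen in a hypothetical simulation with two mediators whose effects on Y have opposite signs. In the sketch below (coefficients chosen for illustration), X raises both M1 and M2, while M1 raises Y and M2 lowers it by the same amount, so the total effect of X on Y is about zero even though each indirect path is substantial. Adding M1 to the regression removes that path's contribution and leaves X with a coefficient near -1: the relation between X and Y grows in magnitude rather than shrinking.

```python
import numpy as np

def slope_on_x(y, *cols):
    """Coefficient on the first supplied column (X) from an OLS fit with intercept."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
m1 = x + rng.normal(size=n)        # mediator 1: X raises M1
m2 = x + rng.normal(size=n)        # mediator 2: X raises M2
y = m1 - m2 + rng.normal(size=n)   # M1 raises Y, M2 lowers it: the paths cancel

total = slope_on_x(y, x)           # about 0: no visible total effect
with_m1 = slope_on_x(y, x, m1)     # about -1: the X -> M2 -> Y path is unmasked
print(round(total, 3), round(with_m1, 3))
```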
Mediation and moderation can co-occur in statistical models. It is possible to mediate moderation and moderate mediation.
Moderated mediation occurs when the effect of the treatment A on the mediator B, and/or the partial effect of B on the outcome C, depends on the level of another variable (D). (Here A, B, and C denote the treatment, mediator, and outcome variables, not path coefficients.) This definition has been outlined by Muller, Judd, and Yzerbyt (2005) and Preacher, Rucker, and Hayes (2007).
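For the case where D changes the A → B slope, one common linear specification gives the conditional indirect effect at D = d as (a1 + a3·d)·b1, where a1 and a3 are the coefficients of A and A×D in the regression of B on A, D, and A×D, and b1 is the coefficient of B in the regression of C on A, B, and D. The sketch below is a simulation with hypothetical coefficients (true conditional indirect effects 0.06, 0.30, and 0.54 at d = -1, 0, 1), not a full inferential procedure; confidence intervals, for example, are omitted.

```python
import numpy as np

def ols(y, *cols):
    """OLS coefficients [intercept, coef_1, coef_2, ...] for y on the given columns."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 50_000
treatment = rng.normal(size=n)  # A
moderator = rng.normal(size=n)  # D
# Hypothetical first-stage moderation: the A -> B slope is 0.5 + 0.4 * D.
mediator = (0.5 + 0.4 * moderator) * treatment + 0.3 * moderator + rng.normal(size=n)  # B
outcome = 0.2 * treatment + 0.6 * mediator + rng.normal(size=n)                        # C

_, a1, _, a3 = ols(mediator, treatment, moderator, treatment * moderator)
_, _, b1, _ = ols(outcome, treatment, mediator, moderator)

def conditional_indirect(d):
    """Estimated indirect effect of A on C through B when D = d."""
    return (a1 + a3 * d) * b1

for d_value in (-1.0, 0.0, 1.0):
    print(d_value, round(conditional_indirect(d_value), 3))
```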
Mediated moderation is a variant of both moderation and mediation. Here there is initially an overall moderation effect, and this moderating effect on the outcome is then mediated, either at the A → B path or at the B → C path. The main difference between mediated moderation and moderated mediation is that in the former there is initial moderation and this effect is mediated, whereas in the latter there is no overall moderation, but either the effect of the treatment (A) on the mediator (B) or the effect of the mediator (B) on the outcome (C) is moderated.
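Mediated moderation can be sketched in the same linear setting. In the hypothetical simulation below, the moderator changes the treatment → mediator slope, and the outcome depends on the treatment only through the mediator. The treatment × moderator interaction is therefore clearly present in the regression of the outcome on treatment, moderator, and their product (about 0.6 × 0.4 = 0.24), but it shrinks to about zero once the mediator is added, so the moderation is mediated at the A → B path. This is a simplified regression check under stated assumptions; fuller specifications also include a mediator × moderator term, omitted here for brevity.

```python
import numpy as np

def ols(y, *cols):
    """OLS coefficients [intercept, coef_1, coef_2, ...] for y on the given columns."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 50_000
treatment = rng.normal(size=n)
moderator = rng.normal(size=n)
# Hypothetical model: the moderator changes the treatment -> mediator slope,
mediator = (0.5 + 0.4 * moderator) * treatment + rng.normal(size=n)
# and the outcome depends on the treatment only through the mediator.
outcome = 0.6 * mediator + rng.normal(size=n)

interaction = treatment * moderator
# Index 3 is the interaction coefficient: [intercept, treatment, moderator, interaction, ...].
overall = ols(outcome, treatment, moderator, interaction)[3]             # about 0.24
residual = ols(outcome, treatment, moderator, interaction, mediator)[3]  # about 0
print(round(overall, 3), round(residual, 3))
```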
A mediator variable (or mediating variable, or intervening variable) in statistics is a variable that describes how rather than when effects will occur by accounting for the relationship between the independent and dependent variables. A mediating relationship is one in which the path relating A to C is mediated by a third variable (B).
For example, a mediating variable can explain the actual relationship between the following variables. Most people will agree that older drivers (up to a certain point) are better drivers. Thus:
- aging → better driving
But what is missing from this relationship is a mediating variable that is actually causing the improvement in driving: experience. The mediated relationship would look like the following:
- aging → increased experience driving a car → better driving
Mediating variables are often contrasted with moderating variables, which pinpoint the conditions under which an independent variable exerts its effects on a dependent variable. A moderating relationship can be thought of as an interaction. It occurs when the relationship between variables A and B depends on the level of C.
Significance of mediation
Bootstrapping is becoming the most popular method of testing mediation because it does not require the sampling distribution of the indirect effect to be normal, and because it can be used effectively with smaller sample sizes (N < 25). However, mediation is still most frequently tested using (1) the logic of Baron and Kenny (1986) or (2) the Sobel test. This is changing, and it is becoming increasingly difficult to publish tests of mediation based purely on the Baron and Kenny method or on tests that make distributional assumptions, such as the Sobel test. See Hayes (2009) for a discussion.
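A percentile bootstrap for the indirect effect resamples cases with replacement, re-estimates ab in each resample, and takes empirical quantiles of those estimates as the confidence interval, so no normality assumption on ab is needed. The sketch below is a minimal version under stated assumptions: a single mediator, OLS path estimates, hypothetical simulated data (a = b = 0.5, direct effect 0.2), and 2,000 resamples; refinements such as bias-corrected intervals are not included. Mediation is typically inferred when the interval excludes zero.

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b: a from M ~ 1 + X, b from Y ~ 1 + X + M (ordinary least squares)."""
    ones = np.ones_like(x)
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lo, hi = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Usage on hypothetical simulated data.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.2 * x + 0.5 * m + rng.normal(size=n)
point = indirect_effect(x, m, y)
lo, hi = percentile_bootstrap_ci(x, m, y)
print(f"indirect = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```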
The Mediation Formula
Baron and Kenny's method of evaluating the degree to which an effect is mediated by a given path is applicable only to linear systems. In nonlinear models, especially those involving categorical variables and strong interactions, direct and indirect effects cannot be defined in terms of adding the putative mediator variable to a regression model. Instead, the following counterfactual definitions must be invoked:
The direct effect DE measures the expected change in the dependent variable (Y) when the independent variable (X) is increased by one unit, say from x to x+1, while the mediator variable (M) is held fixed at the level it would have attained before the change.
The indirect effect IE measures the expected change in the dependent variable (Y) when the independent variable (X) is held fixed, and the mediator variable (M) changes to the level it would have attained had the independent variable increased by one unit, say from x to x+1.
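For the case of error independence (that is, no confounding), Pearl derived closed-form expressions for both DE and IE, called the Mediation Formulas:
$$\mathrm{DE} = \sum_m \left[ E(Y \mid X=x+1, M=m) - E(Y \mid X=x, M=m) \right] P(M=m \mid X=x)$$
$$\mathrm{IE} = \sum_m E(Y \mid X=x, M=m) \left[ P(M=m \mid X=x+1) - P(M=m \mid X=x) \right]$$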
where m ranges over the values that the mediator variable can take.
DE gives the effect that remains after disabling the M-mediated path, while IE gives the effect that remains after disabling the direct path from X to Y. If TE = E(Y | X=x+1) - E(Y | X=x) denotes the total effect (under the same no-confounding assumption), then 1 - DE/TE measures the fraction of the response owed to mediation, while IE/TE measures the fraction explained by mediation. When the outcome (Y) is binary, 1 - DE/TE measures the percentage of responding units for which mediation was necessary, while IE/TE measures the percentage for which mediation was sufficient.
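The Mediation Formulas, together with the total effect, can be evaluated directly when the conditional expectations and mediator probabilities are known. The sketch below assumes a binary X and M and a hypothetical model with an X×M interaction, E(Y | x, m) = 0.1 + 0.2x + 0.3m + 0.4xm and P(M = 1 | x) = 0.2 + 0.5x; the function name `mediation_formula` is illustrative. Because of the interaction, TE (0.63) exceeds DE + IE (0.28 + 0.15), and the fraction for which mediation was necessary, 1 - DE/TE ≈ 0.556, exceeds the fraction for which it was sufficient, IE/TE ≈ 0.238.

```python
def mediation_formula(e_y, p_m, x0=0, x1=1, m_values=(0, 1)):
    """Natural direct (DE), indirect (IE), and total (TE) effects of moving X from x0 to x1.

    e_y(x, m) returns E(Y | X=x, M=m); p_m(m, x) returns P(M=m | X=x).
    Valid under the no-confounding (error independence) assumption.
    """
    de = sum((e_y(x1, m) - e_y(x0, m)) * p_m(m, x0) for m in m_values)
    ie = sum(e_y(x0, m) * (p_m(m, x1) - p_m(m, x0)) for m in m_values)
    te = sum(e_y(x1, m) * p_m(m, x1) - e_y(x0, m) * p_m(m, x0) for m in m_values)
    return de, ie, te

# Hypothetical model with an X*M interaction.
def e_y(x, m):
    return 0.1 + 0.2 * x + 0.3 * m + 0.4 * x * m

def p_m(m, x):
    p1 = 0.2 + 0.5 * x  # P(M=1 | X=x)
    return p1 if m == 1 else 1.0 - p1

de, ie, te = mediation_formula(e_y, p_m)
print(f"DE={de:.2f} IE={ie:.2f} TE={te:.2f}")
print(f"1-DE/TE={1 - de / te:.3f}  IE/TE={ie / te:.3f}")
```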
The Mediation Formulas are applicable to all distributions and to all types of variables, and they enable analysts to estimate direct and indirect effects efficiently, using either parametric or nonparametric regression.
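To illustrate nonparametric estimation, the sketch below simulates binary data from a hypothetical model, E(Y | x, m) = 0.1 + 0.2x + 0.3m + 0.4xm and P(M = 1 | x) = 0.2 + 0.5x, whose population values are DE = 0.28, IE = 0.15, and TE = 0.63. M and Y share no confounders by construction. It estimates E(Y | x, m) by cell means and P(M = m | x) by sample proportions, then plugs these into the two formulas. The sample size, seed, and data-generating values are assumptions for illustration; with real data, empty or sparse cells would need attention.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
# Hypothetical binary data-generating process (no confounding between M and Y).
x = rng.integers(0, 2, size=n)
m = (rng.random(n) < 0.2 + 0.5 * x).astype(int)
y = (rng.random(n) < 0.1 + 0.2 * x + 0.3 * m + 0.4 * x * m).astype(int)

# Nonparametric plug-in estimates, indexed [x][m].
ey = np.array([[y[(x == xv) & (m == mv)].mean() for mv in (0, 1)] for xv in (0, 1)])
pm = np.array([[np.mean(m[x == xv] == mv) for mv in (0, 1)] for xv in (0, 1)])

de = np.sum((ey[1] - ey[0]) * pm[0])  # sum over m of [E(Y|1,m) - E(Y|0,m)] P(m|0)
ie = np.sum(ey[0] * (pm[1] - pm[0]))  # sum over m of E(Y|0,m) [P(m|1) - P(m|0)]
te = y[x == 1].mean() - y[x == 0].mean()
print(f"DE={de:.3f} IE={ie:.3f} TE={te:.3f}")
```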
Due to nonlinearities, the total effect may be nonzero even in the absence of direct and indirect effects. This would occur, for example, when Y requires the presence of both M = 1 and X = 1, and M = X; neither the direct nor the indirect path alone can trigger a response, while the two paths combined can.
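The example in the preceding paragraph can be checked by evaluating the counterfactual definitions directly on the structural equations M = X and Y = 1 if and only if X = 1 and M = 1. The sketch below is deterministic, so each expectation reduces to a single evaluation; the function names `f_m` and `f_y` are hypothetical.

```python
# Hypothetical structural model from the text: M = X, and Y = 1 only if X = 1 and M = 1.
def f_m(x):
    return x

def f_y(x, m):
    return int(x == 1 and m == 1)

x0, x1 = 0, 1
de = f_y(x1, f_m(x0)) - f_y(x0, f_m(x0))  # change X, keep M at its X = x0 level
ie = f_y(x0, f_m(x1)) - f_y(x0, f_m(x0))  # keep X at x0, move M to its X = x1 level
te = f_y(x1, f_m(x1)) - f_y(x0, f_m(x0))  # change X and let M follow
print(de, ie, te)  # prints: 0 0 1
```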
Mediation analyses often involve more than one level of analysis (i.e., multilevel modeling). For example, schools with the resources to hire many teachers may make students feel less socially isolated, which may in turn improve their individual grades and performance in school. Students are nested within schools, creating a multilevel data structure. In this example, a level-2 variable (schools’ resources) is hypothesized to cause a level-1 variable (students’ grades and performance), and this relationship is mediated by another level-1 variable (students’ perceptions of social isolation); this is a 2-1-1 multilevel mediation model. Adding higher levels of analysis introduces additional sources of variance, which require statistical models that account for them. For example, there may be variance between schools (schools differ in the amount of resources they have) as well as variance within schools (students differ in how socially isolated they feel and in the grades they obtain).
Preacher, Zyphur, and Zhang (2010) suggest using structural equation modeling techniques to estimate a wide variety of multilevel mediation models that account for variance components at different levels of analysis. They offer four suggestions for conducting multilevel mediation analyses: (1) identify the mediation hypothesis to be tested, in order to determine the type of multilevel mediation model to be estimated (e.g., 2-1-1, 2-2-1, and so on); (2) ensure that there is enough between-cluster variability to support multilevel structural equation modeling; (3) fit the within-cluster model; and (4) fit the between- and within-cluster models simultaneously. With this full hypothesized model, the indirect effect can be estimated to test the mediation hypotheses.
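For step (2), one common way to gauge between-cluster variability is the intraclass correlation ICC(1), the share of a variable's variance that lies between clusters; the criteria used by Preacher, Zyphur, and Zhang may differ, so treat this as an illustrative diagnostic. The sketch below computes the one-way ANOVA estimator (MSB - MSW) / (MSB + (k - 1)·MSW) for balanced clusters of size k, using hypothetical school data with equal between- and within-school variances, so the population ICC is 0.5.

```python
import numpy as np

def icc1(values, groups):
    """One-way ANOVA estimate of ICC(1), assuming balanced clusters.

    values: 1-D array of level-1 observations; groups: cluster label for each observation.
    """
    labels = np.unique(groups)
    k = len(values) // len(labels)  # common cluster size (balanced design assumed)
    means = np.array([values[groups == g].mean() for g in labels])
    grand = values.mean()
    ss_between = k * np.sum((means - grand) ** 2)
    ss_within = sum(np.sum((values[groups == g] - mu) ** 2) for g, mu in zip(labels, means))
    ms_between = ss_between / (len(labels) - 1)
    ms_within = ss_within / (len(values) - len(labels))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: 500 schools with 20 students each.
rng = np.random.default_rng(6)
n_schools, n_students = 500, 20
school = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0.0, 1.0, size=n_schools)[school]      # between-school variance 1
grades = school_effect + rng.normal(0.0, 1.0, size=school.size)   # within-school variance 1
print(round(icc1(grades, school), 2))
```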
- MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. New York: Erlbaum.
- Bullock, J. G., Green, D. P., Ha, S. E. (2010). Yes, but what's the mechanism? (Don't expect an easy answer). Journal of Personality and Social Psychology, 98(4), 550–558.
- Kaufman, J. S., MacLehose, R. F., Kaufman, S. (2004). A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation. Epidemiologic Perspectives & Innovations, 1:4.
- Robins, J. M., Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155.
- Pearl, J. (2001). Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 411–420.
- Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge University Press. 2nd edition (2009).
- MacKinnon, D. P., Krull, J. L., Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1(4), 173–181.
- Shrout, P. E., Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4), 422–445.
- Muller, D., Judd, C. M., Yzerbyt, V. Y. (2005). When moderation is mediated and mediation is moderated. Journal of Personality and Social Psychology, 89(6), 852–863.
- Preacher, K. J., Rucker, D. D., Hayes, A. F. (2007). Assessing moderated mediation hypotheses: Strategies, methods, and prescriptions. Multivariate Behavioral Research, 42, 185–227.
- Pearl, J. (2010). The Mediation Formula: A guide to the assessment of causal pathways in non-linear models. UCLA Computer Science Department, Technical Report R-363, January 2011. To appear in C. Berzuini, P. Dawid, and L. Bernardinelli (Eds.), Causality: Statistical Perspectives and Applications. Forthcoming, 2011.
- Imai, K., Keele, L., Yamamoto, T. (2010). Identification, inference, and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71.
- Preacher, K. J., Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, and Computers, 36(4), 717–731. doi:10.3758/BF03206553. http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html
- Preacher, K. J., Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879–891. doi:10.3758/BRM.40.3.879. PMID 18697684. http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html
- Preacher, K. J., Zyphur, M. J., Zhang, Z. (2010). A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods, 15(3), 209–233. doi:10.1037/a0020141. PMID 20822249.
- Baron, R. M., Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Academic Press.
- Preacher, K. J., Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2), 93–115.
- Rucker, D. D., Preacher, K. J., Tormala, Z. L., Petty, R. E. (2011). Mediation analysis in social psychology: Current practices and new recommendations. Social and Personality Psychology Compass, 5(6), 359–371.
- Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290–312.
- Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication Monographs, 76, 408–420.