14 Artifacts and Bias in Effect Sizes
14.1 Resources
Effect size estimates such as correlation coefficients and Cohen’s \(d\) values can be severely biased by statistical artifacts such as measurement error and selection effects (e.g., range restriction). Methods developed to correct effect sizes for this bias are called “artifact corrections”. Artifact correction formulas can be complex, so readers are referred to the resources listed below:
Jané (2023) : An open-access textbook that contains equations and R code for various types of artifact corrections. In beta version.
Hunter and Schmidt (1990) : Classic textbook on the topic of artifact corrections. Hunter and Schmidt pioneered the methodology for artifact correction style meta-analyses.
Wiernik and Dahlke (2020) : A paper that serves as a condensed version of Hunter and Schmidt’s book. It contains most of the equations necessary to correct effect sizes.
Dahlke and Wiernik (2019) : An R package called psychmeta that allows meta-analysts to conduct artifact correction meta-analyses. Contains all the functions one would need to correct effect sizes for artifacts in R.
14.2 Correcting for Measurement Error
If we have reliability estimates of the variables of interest, we can correct a Pearson correlation or a standardized mean difference (Cohen’s \(d\)) for measurement error. Non-differential measurement error attenuates both Pearson correlations and Cohen’s \(d\) (that is, it biases them toward zero), so we can apply correction factors to adjust for this bias. For a Pearson correlation, we can use the correction for attenuation first developed by Spearman (1904),
\[ r_c = \frac{r_\text{obs}}{\sqrt{r_{xx'}r_{yy'}}} \tag{14.1}\] where \(r_c\) is the corrected correlation, \(r_\text{obs}\) is the observed correlation, \(r_{xx'}\) is the reliability of \(x\), and \(r_{yy'}\) is the reliability of \(y.\) Reliability coefficients can be estimated in a number of different ways; two of the most common estimators are Cronbach’s alpha and test-retest reliability. Alpha measures the internal consistency of a set of sub-component measurements (e.g., item responses on a questionnaire), while test-retest reliability measures stability over time.
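As a minimal sketch of Equation 14.1, the snippet below applies the correction for attenuation to a single correlation. The function name and the numeric inputs are hypothetical and chosen only for illustration; they do not come from any study.

```python
import math


def correct_r_for_attenuation(r_obs, rxx, ryy):
    """Apply Equation 14.1: divide the observed correlation by
    the square root of the product of the two reliabilities."""
    # Reliabilities are assumed to lie in (0, 1]; this check is an
    # illustrative safeguard, not part of the formula itself.
    if not (0 < rxx <= 1 and 0 < ryy <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_obs / math.sqrt(rxx * ryy)


# Hypothetical inputs: observed r = .30, reliabilities of .80 and .70
r_c = correct_r_for_attenuation(r_obs=0.30, rxx=0.80, ryy=0.70)
print(round(r_c, 3))
```

Because both reliabilities are at most 1, the corrected correlation is never smaller in magnitude than the observed one. With very low reliabilities the corrected value can exceed 1, which usually signals a poor reliability estimate or sampling error.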
A Cohen’s \(d\) can be corrected similarly to a correlation coefficient. However, since \(d\) reflects the difference in a continuous variable (\(y\)) between two groups, we only need to correct for unreliability in the continuous variable,
\[ d_c = \frac{d_\text{obs}}{\sqrt{r_{yy'}}}. \] However, in the case of Cohen’s \(d\), it is important that \(r_{yy'}\) is the pooled within-group reliability (calculate the pooled reliability the same way you calculate the pooled standard deviation for the denominator of Cohen’s \(d\)). If all you have is the total-sample reliability (more commonly reported), you can follow this three-step process (Wiernik and Dahlke 2020),
- Convert the \(d\) value to a point-biserial correlation (see section on conversions)
- Correct the point-biserial correlation using Equation 14.1 (setting \(r_{xx'}=1\))
- Convert it back to a Cohen’s \(d\)
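The three steps above can be sketched as follows. The conversion formulas are an assumption: they are the standard conversions between \(d\) and the point-biserial correlation, written in terms of \(p\), the proportion of the sample in one group, and they may differ in form from those in the section on conversions. All function names and input values are hypothetical.

```python
import math


def d_to_r_pb(d, p=0.5):
    """Convert d to a point-biserial r (assumed standard formula);
    p is the proportion of the sample in one of the two groups."""
    return d / math.sqrt(d**2 + 1 / (p * (1 - p)))


def r_pb_to_d(r, p=0.5):
    """Convert a point-biserial r back to d (inverse of d_to_r_pb)."""
    return r / math.sqrt(p * (1 - p) * (1 - r**2))


def correct_d_total_reliability(d_obs, ryy_total, p=0.5):
    """Correct d using the total-sample reliability of y."""
    r_obs = d_to_r_pb(d_obs, p)           # step 1: d -> point-biserial r
    r_c = r_obs / math.sqrt(ryy_total)    # step 2: Equation 14.1 with rxx' = 1
    # Step 3 raises ValueError if |r_c| >= 1 (the square root is undefined).
    return r_pb_to_d(r_c, p)              # step 3: corrected r -> d


# Hypothetical inputs: d = 0.50, total-sample reliability = .80, equal groups
d_c = correct_d_total_reliability(d_obs=0.50, ryy_total=0.80, p=0.5)
print(round(d_c, 3))
```

With these inputs the three-step result differs slightly from simply dividing \(d_\text{obs}\) by \(\sqrt{r_{yy'}}\). This is expected, because the total-sample reliability is not the same quantity as the pooled within-group reliability.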
Note that confidence intervals for \(r_c\) and \(d_c\) must also be corrected such that, \[ CI_{r_c} = \left[\frac{r_\text{lower-bound}}{\sqrt{r_{xx'}r_{yy'}}},\frac{r_\text{upper-bound}}{\sqrt{r_{xx'}r_{yy'}}}\right] \] and
\[ CI_{d_c} = \left[\frac{d_\text{lower-bound}}{\sqrt{r_{yy'}}},\frac{d_\text{upper-bound}}{\sqrt{r_{yy'}}}\right]. \]
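A brief sketch of the interval corrections above: each bound is divided by the same factor used to correct the point estimate. The helper name and all numeric values are hypothetical.

```python
import math


def correct_ci(lower, upper, attenuation_factor):
    """Divide both confidence bounds by the attenuation factor,
    mirroring the correction applied to the point estimate."""
    return (lower / attenuation_factor, upper / attenuation_factor)


# Correlation: the factor is sqrt(rxx' * ryy'), with hypothetical reliabilities
a_r = math.sqrt(0.80 * 0.70)
ci_r_c = correct_ci(0.10, 0.48, a_r)

# Cohen's d: the factor is sqrt(ryy'), with a hypothetical pooled reliability
a_d = math.sqrt(0.80)
ci_d_c = correct_ci(0.10, 0.90, a_d)

print(ci_r_c, ci_d_c)
```

Since the factor is at most 1, the corrected interval is at least as wide as the observed one. The correction removes bias, but it also increases sampling variability.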
14.3 Correcting for Range Restriction
Range restriction corrections can be quite complex, depending on the underlying selection process. The process for correcting Pearson correlations and Cohen’s \(d\) for range restriction is laid out in Table 3 of Wiernik and Dahlke (2020).