T-tests are the most commonly used statistical tests for examining differences between group means, or for examining a group mean against a constant. Calculating effect sizes for t-tests is fairly straightforward. Nonetheless, there are cases where crucial figures for the calculation are missing (which happens quite often in older articles), and therefore we document methods that make use of partial information (e.g., only the M and the SD, or only the t-statistic and df). There are multiple types of effect sizes used to calculate standardized mean differences (i.e., Cohen’s \(d\)), yet researchers very often do not identify which type of \(d\) value they are reporting (see Lakens 2013). Here we document the equations and code necessary for calculating each type of \(d\) value, compiled across multiple sources (Becker 1988; Cohen 1988; Lakens 2013; Caldwell 2022; Glass, McGaw, and Smith 1981). A \(d\) value calculated from a sample will also contain sampling error, so we will also show the equations for calculating the standard error. The standard error then allows us to calculate the confidence interval. For each formulation in the sections below, the confidence interval can be calculated in the same way, that is,
\[
CI_d = d \pm 1.96\times SE
\tag{7.1}\]
Lastly, we will supply example R code so you can apply these calculations to your own data.
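For instance, given any \(d\) value and its standard error, Equation 7.1 can be applied directly in base R (the values below are hypothetical):

# hypothetical effect size and standard error
d <- 0.47
SE <- 0.18

# 95% confidence interval (Equation 7.1)
c(dlow = d - 1.96*SE, dhigh = d + 1.96*SE)
#>   dlow  dhigh 
#> 0.1172 0.8228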
Here is a summary of the standardized mean difference effect sizes discussed in this chapter:

\(d_p\) - Pooled standard deviation
Uses the average within-group standard deviation to standardize the mean difference. Can be calculated directly from an independent samples t-test. Assumes homogeneity of variance between groups.

\(d_{\Delta}\) - Control group standard deviation
Uses the standard deviation of the control group to standardize the mean difference (often referred to as Glass’s Delta). Does not assume homogeneity of variance between the treatment/intervention and control group.

\(d_z\) - Difference score standard deviation
Uses the standard deviation of difference scores (also known as change scores) to standardize the within-person mean difference (i.e., pre/post change).

\(d_{rm}\) - Within-person standard deviation
Uses a within-person standard deviation that applies a correction to \(d_z\) to reduce the impact of the pre/post correlation on the effect size. Assumes homogeneity of variance between conditions.

\(d_{av}\) - Average variance
Uses the average variance across conditions (pre/post test). Does not use the correlation between conditions. Assumes homogeneity of variance between conditions.

\(d_b\) - Pre-test standard deviation (Becker’s \(d\))
Uses the pre-test standard deviation to standardize the pre/post mean difference. Does not assume homogeneity of variance between pre-test and post-test.

\(d_{PPC1}\) - Separate pre-test standard deviations
Defined as the difference between Becker’s \(d\) values for the treatment and control group; that is, the mean pre/post change in each group is standardized by that group’s pre-test standard deviation.

\(d_{PPC2}\) - Pooled pre-test standard deviation
Standardizes the difference in mean changes between the treatment and control group by the pooled pre-test standard deviation. Assumes homogeneity of variance between the pre-tests of the control and treatment conditions.

\(d_{PPC3}\) - Pooled pre-test and post-test standard deviation
Pools the standard deviation across pre-test and post-test in both the treatment and control conditions. Assumes homogeneity of variance between pre/post-test scores and between treatment and control conditions. Confidence intervals are not easy to compute.
Whatever effect size and CI you choose to report, you can report it alongside the t-test statistics (i.e., the t-value and p-value). For example,
The treatment group had a significantly higher mean than the control group (t = 2.76, p = .009, n = 35, d = 0.47 [0.11, 0.81]).
7.2 Single Group Designs
For a single group design, we have one group and we want to compare the mean of that group to some constant, \(C\) (i.e., a target value). The standardized mean difference for a single group can be calculated by (equation 2.3.3, Cohen 1988),
\[
d_s = \frac{M-C}{S_1}
\tag{7.2}\]
A positive \(d_s\) value indicates that the mean is larger than the target value, \(C\). This formulation assumes that the sample is drawn from a normal distribution. The standardizer (i.e., the denominator) is the sample standard deviation, \(S_1\). The corresponding standard error for \(d_s\) is (see documentation for Caldwell 2022),

\[
SE_{d_s} = \sqrt{\frac{1}{n} + \frac{d_s^2}{2n}}
\tag{7.3}\]
In R, we can use the d.single.t function from the MOTE package to calculate the single group standardized mean difference.
# Install packages if not already installed:
# install.packages('MOTE')

# Cohen's d for one group
# For example:
# Sample Mean = 30.4, SD = 22.53, N = 96
# Target Value, C = 15

library(MOTE)

stats <- d.single.t(m = 30.4,
                    u = 15,
                    sd = 22.53,
                    n = 96)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 0.684 0.460 0.904
As you can see, the output shows that the effect size is \(d_s\) = 0.68, 95% CI [0.46, 0.90]. Note the apa function in MOTE takes a value and returns an APA formatted effect size value (i.e., leading zero and three decimal places).
7.3 Two Independent Groups Design
7.3.1 Standardize by Pooled Standard Deviation (\(d_p\))
For a two group design (i.e., between-groups design), we want to compare the means of two groups (group 1 and group 2). The standardized mean difference between two groups can be calculated by (equation 5.1, Glass, McGaw, and Smith 1981),
\[
d_p = \frac{M_1-M_2}{S_p}.
\tag{7.4}\]
A positive \(d_p\) value would indicate that the mean of group 1 is larger than the mean of group 2. Dividing the mean difference by the pooled standard deviation, \(S_p\), is the classic formulation of Cohen’s \(d\). The pooled standard deviation, \(S_p\), can be calculated as the square root of the average variance (weighted by the degrees of freedom, \(df=n-1\)) of group 1 and group 2 (p. 108, Glass, McGaw, and Smith 1981):

\[
S_p = \sqrt{\frac{(n_1-1)S^2_1 + (n_2-1)S^2_2}{n_1+n_2-2}}
\tag{7.5}\]
Note that the term variance refers to the square of the standard deviation (\(S^2\)). Cohen’s \(d_p\) is directly related to the t-statistic from an independent samples t-test. In fact, we can calculate the \(d_p\) value from the \(t\)-statistic with the following formula (equation 5.3, Glass, McGaw, and Smith 1981):
\[
d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.
\tag{7.6}\]
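As a quick check, Equation 7.6 can be applied in base R. The t-statistic below is hypothetical, chosen so that it approximately recovers the \(d_p\) value in the MOTE example that follows:

# hypothetical t-statistic from an independent samples t-test
t <- 2.95
n1 <- 96
n2 <- 96

# convert t to Cohen's d (Equation 7.6)
t * sqrt(1/n1 + 1/n2)
#> [1] 0.4257958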
In R, we can use the d.ind.t function from the MOTE package to calculate the two group standardized mean difference. Since we have already loaded the MOTE package, we do not need to load it again.
# Cohen's d for two independent groups
# given means and SDs
# For example:
# Group 1 Mean = 30.4, SD = 22.53, N = 96
# Group 2 Mean = 21.4, SD = 19.59, N = 96

stats <- d.ind.t(m1 = 30.4,
                 m2 = 21.4,
                 sd1 = 22.53,
                 sd2 = 19.59,
                 n1 = 96,
                 n2 = 96,
                 a = 0.05)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 0.426 0.140 0.712
The output shows that the effect size is \(d_p\) = 0.43, 95% CI [0.14, 0.71].
7.3.2 Standardize by Control Group Standard Deviation (\(d_{\Delta}\))
When two groups differ substantially in their standard deviations, we can instead standardize by the control group standard deviation (\(S_C\)), such that,
\[
d_{\Delta} = \frac{M_T-M_C}{S_C}.
\tag{7.8}\]
Where the subscripts \(T\) and \(C\) denote the treatment group and control group, respectively. This formulation is commonly referred to as Glass’s \(\Delta\) (Glass, McGaw, and Smith 1981). The standard error for \(d_{\Delta}\) can be defined as,

\[
SE_{d_\Delta} = \sqrt{\frac{n_T+n_C}{n_T n_C} + \frac{d_\Delta^2}{2(n_C-1)}}
\tag{7.9}\]
Notice that when we standardize only by the standard deviation of the control group (rather than pooling), we have fewer degrees of freedom (\(df=n_C-1\)) and therefore more sampling error than when we divide by the pooled standard deviation (\(df= n_T + n_C - 2\)). In R, we can use the delta.ind.t function from the MOTE package to calculate \(d_\Delta\).
# Glass's delta for two independent groups
# given means and SDs
# For example:
# Group 1 (control) Mean = 30.4, SD = 22.53, N = 96
# Group 2 (treatment) Mean = 21.4, SD = 19.59, N = 96

stats <- delta.ind.t(m1 = 30.4,
                     m2 = 21.4,
                     sd1 = 22.53,
                     sd2 = 19.59,
                     n1 = 96,
                     n2 = 96,
                     a = 0.05)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 0.399 0.140 0.712
7.4 Repeated Measures Designs
In a repeated-measures design, the same subjects (or items, etc.) are measured on two or more separate occasions, or in multiple conditions within a single session, and we want to know the mean difference between those occasions or conditions (Baayen, Davidson, and Bates 2008; Barr et al. 2013). An example of this would be in a pre/post comparison where subjects are tested before and after undergoing some treatment (see Figure 7.1 for a visualization). A standardized mean difference in a repeated-measures design can take on a few different forms that we define below.
7.4.1 Difference Score \(d\) (\(d_z\))
Instead of comparing the means of two sets of scores, a within-subject design allows us to subtract the scores obtained in condition 1 from the scores in condition 2. These difference scores (\(X_{\text{diff}}=X_2-X_1\)) can be used similarly to the single group design (with a target value of zero, i.e., \(C=0\)) such that (equation 2.3.5, Cohen 1988),

\[
d_z = \frac{M_{\text{diff}}}{S_{\text{diff}}}
\tag{7.10}\]

Here \(M_{\text{diff}}\) and \(S_{\text{diff}}\) are the mean and standard deviation of the difference scores.
The only difference between this formulation and the single group design is the nature of the scores (difference scores rather than raw scores). The convenient thing about \(d_z\) is that it has a straightforward relationship with the \(t\)-statistic, \(d_z=\frac{t}{\sqrt{n}}\), which makes it very useful for power analyses. If the standard deviation of the difference scores is not reported, it can be calculated from the standard deviation of condition 1 (\(S_1\)), the standard deviation of condition 2 (\(S_2\)), and the correlation between conditions (\(r\)) (equation 2.3.6, Cohen 1988):
\[
S_{\text{diff}}=\sqrt{S^2_1 + S^2_2 - 2 r S_1 S_2}
\tag{7.11}\]
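For instance, Equation 7.11 and the relationship between \(d_z\) and \(t\) can be applied in base R (the t-statistic below is hypothetical, chosen to roughly match the MOTE example that follows):

# condition SDs and the correlation between conditions
S1 <- 22.53
S2 <- 19.59
r <- .40

# standard deviation of the difference scores (Equation 7.11)
sqrt(S1^2 + S2^2 - 2*r*S1*S2)
#> [1] 23.20084

# dz from a paired t-test statistic
t <- 10.70
n <- 96
t / sqrt(n)
#> [1] 1.092064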
It is important to note that when the correlation between conditions is large, \(S_{\text{diff}}\) will be small and the \(d_z\) value will therefore be larger, whereas a small correlation will return a smaller \(d_z\) value. The standard error of \(d_z\) can be calculated similarly to the single group design such that,

\[
SE_{d_z} = \sqrt{\frac{1}{n} + \frac{d_z^2}{2n}}
\tag{7.12}\]
In R, we can use the d.dep.t.diff function from the MOTE package to calculate \(d_z\).
# Cohen's dz for difference scores
# given difference score mean and SD
# For example:
# Difference Score Mean = 21.4, SD = 19.59, N = 96

stats <- d.dep.t.diff(m = 21.4,
                      sd = 19.59,
                      n = 96,
                      a = 0.05)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 1.092 0.837 1.344
The output shows that the effect size is \(d_z\) = 1.09, 95% CI [0.84, 1.34].
7.4.2 Repeated Measures \(d\) (\(d_{rm}\))
For a within-group design, we want to compare the means of scores obtained from condition 1 and condition 2. The repeated measures standardized mean difference between the two conditions can be calculated by (equation 9, Lakens 2013),
\[
d_{rm} = \frac{M_2-M_1}{S_w}.
\tag{7.13}\]
A positive \(d_{rm}\) value would indicate that the mean of condition 2 is larger than the mean of condition 1. The standardizer here is the within-subject standard deviation, \(S_w\). The within-subject standard deviation can be defined as,
\[
S_{w}=\sqrt{\frac{S^2_1 + S^2_2 - 2 r S_1 S_2}{2(1-r)}}.
\tag{7.14}\]
We can also express \(S_w\) in terms of the standard deviation of difference scores (\(S_{\text{diff}}\)),

\[
S_{w}=\frac{S_{\text{diff}}}{\sqrt{2(1-r)}}.
\tag{7.15}\]
Ultimately, \(d_{rm}\) is more appropriate as an effect size estimate for use in meta-analysis, whereas \(d_z\) is more appropriate for power analysis (Lakens 2013). The standard error for \(d_{rm}\) can be computed as,

\[
SE_{d_{rm}} = \sqrt{\left(\frac{1}{n}+\frac{d_{rm}^2}{2n}\right)\times 2(1-r)}
\tag{7.16}\]
In R, we can use the d.dep.t.rm function from the MOTE package to calculate the repeated measures standardized mean difference (\(d_{rm}\)).
# Cohen's d for repeated measures
# given means, SDs, and the correlation
# For example:
# Condition 1 Mean = 30.4, SD = 22.53, N = 96
# Condition 2 Mean = 21.4, SD = 19.59, N = 96
# correlation between conditions: r = .40

stats <- d.dep.t.rm(m1 = 30.4,
                    m2 = 21.4,
                    sd1 = 22.53,
                    sd2 = 19.59,
                    r = .40,
                    n = 96,
                    a = 0.05)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 0.425 0.215 0.633
The output shows that the effect size is \(d_{rm}\) = 0.42, 95% CI [0.21, 0.63].
7.4.3 Average Variance \(d\) (\(d_{av}\))
The problem with \(d_{z}\) and \(d_{rm}\) is that they require the correlation between conditions, which in practice is frequently not reported. An alternative estimator of Cohen’s \(d\) in a repeated measures design is to simply use the classic variant of Cohen’s \(d\) (i.e., a pooled standard deviation). In a repeated measures design, the sample size does not change between conditions, so weighting the variance of condition 1 and condition 2 by their respective degrees of freedom (i.e., \(df=n-1\)) is unnecessary. Instead, we can standardize by the square root of the average of the variances of conditions 1 and 2 (see equation 5, Algina and Keselman 2003):

\[
d_{av} = \frac{M_2 - M_1}{\sqrt{\frac{S_1^2+S_2^2}{2}}}
\tag{7.17}\]
This formulation is especially convenient when the correlation is not reported; however, without the correlation it fails to take into account the consistency of change between conditions. The standard error of \(d_{av}\) can be approximated as (see Algina and Keselman 2003),

\[
SE_{d_{av}} = \sqrt{\frac{1}{n} + \frac{d_{av}^2}{2n}}
\tag{7.18}\]
In R, we can use the d.dep.t.avg function from the MOTE package to calculate the average variance standardized mean difference (\(d_{av}\)).
# Cohen's d for repeated measures (average variance)
# given means and SDs
# For example:
# Condition 1 Mean = 30.4, SD = 22.53, N = 96
# Condition 2 Mean = 21.4, SD = 19.59, N = 96

stats <- d.dep.t.avg(m1 = 30.4,
                     m2 = 21.4,
                     sd1 = 22.53,
                     sd2 = 19.59,
                     n = 96,
                     a = 0.05)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d),
           dlow = apa(stats$dlow),
           dhigh = apa(stats$dhigh))
d dlow dhigh
1 0.427 0.217 0.635
The output shows that the effect size is \(d_{av}\) = 0.43, 95% CI [0.22, 0.64].
7.4.4 Becker’s \(d\) (\(d_b\))
An even simpler variant of the repeated measures \(d\) value comes from Becker (1988). Becker’s \(d\) standardizes simply by the pre-test standard deviation when the comparison is a pre/post design,

\[
d_b = \frac{M_{\text{post}}-M_{\text{pre}}}{S_{\text{pre}}}
\tag{7.19}\]
The convenient interpretation of “change in baseline standard deviations” can be quite useful. We can also obtain the standard error with (equation 13, Becker 1988),

\[
SE_{d_b} = \sqrt{\frac{2(1-r)}{n} + \frac{d_b^2}{2n}}
\tag{7.20}\]
Notice that even though the formula for calculating \(d_b\) did not include the correlation coefficient, the standard error does.
In base R, we can calculate Becker’s formulation of the standardized mean difference using the equations above.
# Cohen's d for repeated measures (Becker's d)
# given means, the pre-test SD, and the correlation
# For example:
# Pre-test Mean = 21.4, SD = 19.59, N = 96
# Post-test Mean = 30.4, N = 96
# Correlation between conditions: r = .40

Mpre <- 21.4
Mpost <- 30.4
Spre <- 19.59
r <- .40
n <- 96

d <- (Mpost - Mpre) / Spre
SE <- sqrt( 2*(1-r)/n + d^2/(2*n) )

# print just the d value and confidence intervals
data.frame(d = apa(d),
           dlow = apa(d - 1.96*SE),
           dhigh = apa(d + 1.96*SE))
d dlow dhigh
1 0.459 0.231 0.688
The output shows that the effect size is \(d_{b}\) = 0.46, 95% CI [0.23, 0.69].
7.4.5 Comparing Repeated Measures \(d\) values
Figure 7.2 shows repeated measures designs with a high (\(r=\) .95) and a low (\(r=\) .05) correlation between conditions. Let us fix the means and standard deviations in both scenarios and vary only the correlation. We can then compare the repeated measures estimators across the two scenarios shown in Figure 7.2:
High correlation:
\(d_z=1.24\)
\(d_{rm}=0.39\)
\(d_{av}=0.43\)
\(d_{b}=0.40\)
Low correlation:
\(d_z=0.31\)
\(d_{rm}=0.43\)
\(d_{av}=0.43\)
\(d_{b}=0.40\)
Notice that the correlation influences \(d_z\) far more than any other estimator. The \(d_{rm}\) value changes very little, whereas \(d_{av}\) and \(d_{b}\) do not take the correlation into account at all.
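We can reproduce these values in base R. The summary statistics below are assumptions (the same condition SDs and mean difference used in the earlier repeated measures examples, with the pre-test SD set to 22.53); only the correlation differs between the two scenarios:

# condition SDs and mean difference (assumed, matching earlier examples)
S1 <- 22.53    # condition 1 (pre) SD
S2 <- 19.59    # condition 2 (post) SD
M_diff <- 9    # mean difference between conditions

compare <- function(r) {
  S_diff <- sqrt(S1^2 + S2^2 - 2*r*S1*S2)   # SD of difference scores
  S_w    <- S_diff / sqrt(2*(1 - r))        # within-subject SD
  S_av   <- sqrt((S1^2 + S2^2)/2)           # average-variance SD
  round(c(dz  = M_diff / S_diff,
          drm = M_diff / S_w,
          dav = M_diff / S_av,
          db  = M_diff / S1), 2)
}

compare(r = .95)
#>   dz  drm  dav   db 
#> 1.24 0.39 0.43 0.40
compare(r = .05)
#>   dz  drm  dav   db 
#> 0.31 0.43 0.43 0.40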
7.5 Pretest-Posttest-Control Group Designs
In many areas of research, both between- and within-group factors are incorporated. For example, research examining the effects of an intervention often randomizes a sample into two separate groups (intervention and control) and then measures the outcome of interest both before (pre-test) and after (post-test) the intervention/control period. In these types of 2x2 (group x time) study designs, it is usually the difference between the standardized mean change for the intervention/treatment (\(T\)) and control (\(C\)) groups that is of interest. For a visualization of a pretest-posttest-control group design, see Figure 7.3.
Morris (2008) details three effect sizes for this pretest-posttest-control (PPC) design.
7.5.1 PPC1 - separate pre-test standard deviations
The separate pre-test (i.e., baseline) standard deviations are used to standardize the pre/post mean difference in the intervention group and the control group, respectively (see equation 4, Morris 2008),

\[
d_{T} = \frac{M_{T,\text{post}}-M_{T,\text{pre}}}{S_{T,\text{pre}}}, \qquad d_{C} = \frac{M_{C,\text{post}}-M_{C,\text{pre}}}{S_{C,\text{pre}}}
\tag{7.21}\]
Note that these effect sizes are identical to Becker’s \(d\) formulation of the SMD (see Section 7.4.4). Therefore, the pretest-posttest-control group effect size is simply the difference between the intervention and control pre/post SMDs (equation 15, Becker 1988),
\[
d_{PPC1} = d_T - d_C
\tag{7.24}\]
The asymptotic standard error of \(d_{PPC1}\) was first derived by Becker (1988) and can be expressed as the square root of the sum of the sampling variances (equation 16, Becker 1988),

\[
SE_{d_{PPC1}} = \sqrt{\frac{2(1-r_T)}{n_T} + \frac{d_T^2}{2n_T} + \frac{2(1-r_C)}{n_C} + \frac{d_C^2}{2n_C}}
\tag{7.25}\]
We can calculate \(d_{PPC1}\) and its confidence intervals using base R:
# Example:
# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# calculate the standardized mean change within each group
dT <- (M_Tpost - M_Tpre) / SD_Tpre
dC <- (M_Cpost - M_Cpre) / SD_Cpre
dPPC1 <- dT - dC

# calculate the standard error (Equation 7.25)
SE <- sqrt( 2*(1-rT)/nT + dT^2/(2*nT) +
            2*(1-rC)/nC + dC^2/(2*nC) )

# print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC1),
           dlow = MOTE::apa(dPPC1 - 1.96*SE),
           dhigh = MOTE::apa(dPPC1 + 1.96*SE))
d dlow dhigh
1 0.567 0.190 0.944
The output shows a pre-post intervention effect of \(d_{PPC1}\) = 0.57 [0.19, 0.94].
7.5.2 PPC2 - pooled pre-test standard deviations
The pooled pre-test (i.e., baseline) standard deviations can be used to standardize the difference in pre/post change between the intervention and control groups such that (equation 8, Morris 2008),

\[
d_{PPC2} = \frac{(M_{T,\text{post}}-M_{T,\text{pre}})-(M_{C,\text{post}}-M_{C,\text{pre}})}{\sqrt{\frac{(n_T-1)S^2_{T,\text{pre}} + (n_C-1)S^2_{C,\text{pre}}}{n_T+n_C-2}}}
\tag{7.26}\]
Note that the original equation shown in the paper by Morris (2008) uses the population pre/post correlation \(\rho\); in the equation above we replace \(\rho\) with the sample size weighted average of the Pearson correlations computed in the treatment and control groups (i.e., \(\rho \approx \frac{n_T r_T + n_C r_C}{n_T + n_C}\)). The standard error of \(d_{PPC2}\) can be calculated as (Morris 2008),

\[
SE_{d_{PPC2}} = \sqrt{2(1-\rho)\left(\frac{n_T+n_C}{n_T n_C}\right)\left(\frac{n_T+n_C-2}{n_T+n_C-4}\right)\left(1+\frac{d_{PPC2}^2}{2(1-\rho)\frac{n_T+n_C}{n_T n_C}}\right)-d_{PPC2}^2}
\tag{7.27}\]
We can use base R to obtain \(d_{PPC2}\) and confidence intervals:
# Example:
# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# calculate the observed standardized mean difference
dPPC2 <- ((M_Tpost - M_Tpre) - (M_Cpost - M_Cpre)) /
  sqrt( ( (nT - 1)*SD_Tpre^2 + (nC - 1)*SD_Cpre^2 ) / (nT + nC - 2) )

# sample size weighted average of the pre/post correlations
rho <- (nT*rT + nC*rC) / (nT + nC)

# calculate the standard error (Equation 7.27)
A <- 2*(1 - rho) * (nT + nC) / (nT*nC)
SE <- sqrt( A * ((nT + nC - 2)/(nT + nC - 4)) * (1 + dPPC2^2/A) - dPPC2^2 )

# print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC2),
           dlow = MOTE::apa(dPPC2 - 1.96*SE),
           dhigh = MOTE::apa(dPPC2 + 1.96*SE))
d dlow dhigh
1 0.362 0.059 0.666
The output shows a pre-post intervention effect of \(d_{PPC2}\) = 0.36 [0.06, 0.67].
7.5.3 PPC3 - pooled pre- and post-test
The two previous effect sizes only use the pre-test standard deviation. But if we are happy to assume that pre-test and post-test variances are homogeneous¹, the pooled pre-test and post-test standard deviations can be used to standardize the difference in pre/post change between the intervention and control groups such that (Morris 2008),

\[
d_{PPC3} = \frac{(M_{T,\text{post}}-M_{T,\text{pre}})-(M_{C,\text{post}}-M_{C,\text{pre}})}{\sqrt{\frac{(n_T-1)(S^2_{T,\text{pre}}+S^2_{T,\text{post}}) + (n_C-1)(S^2_{C,\text{pre}}+S^2_{C,\text{post}})}{n_T+n_C-2}}}
\tag{7.28}\]
The standard error for \(d_{PPC3}\) is currently unknown. One option for estimating it is a non-parametric or parametric bootstrap: repeatedly resample the raw data, or if the raw data are not available, resample simulated data based on the summary statistics. We can do this in base R by simulating pre/post data using the mvrnorm() function from the MASS package (Venables and Ripley 2002):
# Install the package below if not done so already
# install.packages("MASS")

# Example:
# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# simulate data
set.seed(1) # set seed for reproducibility
boot_dPPC3 <- c()
for (i in 1:1000) {
  # simulate control group pre-post data
  data_C <- MASS::mvrnorm(n = nC,
                          # input observed means
                          mu = c(M_Cpre, M_Cpost),
                          # input observed covariance matrix
                          Sigma = matrix(c(SD_Cpre^2, rC*SD_Cpre*SD_Cpost,
                                           rC*SD_Cpre*SD_Cpost, SD_Cpost^2),
                                         nrow = 2))
  # simulate intervention group pre-post data
  data_T <- MASS::mvrnorm(n = nT,
                          # input observed means
                          mu = c(M_Tpre, M_Tpost),
                          # input observed covariance matrix
                          Sigma = matrix(c(SD_Tpre^2, rT*SD_Tpre*SD_Tpost,
                                           rT*SD_Tpre*SD_Tpost, SD_Tpost^2),
                                         nrow = 2))
  # calculate the mean difference in pre/post change (the numerator)
  MeanDiff <- (mean(data_T[,2]) - mean(data_T[,1])) -
              (mean(data_C[,2]) - mean(data_C[,1]))
  # calculate the pooled pre-post standard deviation (the denominator)
  S_Pprepost <- sqrt( ( (nT - 1)*(sd(data_T[,1])^2 + sd(data_T[,2])^2) +
                        (nC - 1)*(sd(data_C[,1])^2 + sd(data_C[,2])^2) ) /
                      (nT + nC - 2) )
  # calculate the standardized mean difference for each bootstrap iteration
  boot_dPPC3[i] <- MeanDiff / S_Pprepost
}

# calculate bootstrapped standard error
SE <- sd(boot_dPPC3)

# calculate the observed standardized mean difference
dPPC3 <- ((M_Tpost - M_Tpre) - (M_Cpost - M_Cpre)) /
  sqrt( ( (nT - 1)*(SD_Tpre^2 + SD_Tpost^2) +
          (nC - 1)*(SD_Cpre^2 + SD_Cpost^2) ) / (nT + nC - 2) )

# print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC3),
           dlow = MOTE::apa(dPPC3 - 1.96*SE),
           dhigh = MOTE::apa(dPPC3 + 1.96*SE))
d dlow dhigh
1 0.214 0.002 0.427
The output shows a pre-post intervention effect of \(d_{PPC3}\) = 0.21 [0.002, 0.43].
7.6 Small Sample Bias in \(d\) values
All of the estimators of \(d\) listed above are biased estimates of the population \(d\) value; specifically, they all over-estimate the population value in small sample sizes. To adjust for this bias, we can apply a correction factor based on the degrees of freedom, which largely depend on the estimator used. The degrees of freedom for each estimator are listed below:
Single Group design (\(d_s\)): \(df = n-1\)
Between Groups - Pooled Standard Deviation (\(d_p\)): \(df = n_1+n_2-2\)
Between Groups - Control Group Standard Deviation (\(d_\Delta\)): \(df = n_C-1\)
Pretest-Posttest-Control Separate Standard Deviation (\(d_{PPC1}\)): \(df=n_C−1\)
Pretest-Posttest-Control Pooled Pretest Standard Deviation (\(d_{PPC2}\)): \(df=n_T+n_C−2\)
Pretest-Posttest-Control Pooled Pretest and Posttest Standard Deviation (\(d_{PPC3}\)): \(df=2(n_T+n_C−2)\)
With the appropriate degrees of freedom, we can use the following correction factor, \(CF\), to obtain an unbiased estimate of the population standardized mean difference:

\[
CF = \frac{\Gamma\left(\frac{df}{2}\right)}{\Gamma\left(\frac{df-1}{2}\right)\sqrt{\frac{df}{2}}}
\tag{7.31}\]
Where \(\Gamma(\cdot)\) is the gamma function. An approximation of this complex formula given by Hedges (1981) can be written as \(CF\approx 1-\frac{3}{4\cdot df -1}\). In R, the exact correction factor can be calculated using,
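(A minimal sketch: the degrees of freedom below, \(df = 36\), are an assumed example value, and the resulting \(CF\) carries into the corrected example further down.)

# degrees of freedom (e.g., a two-group design with n1 = n2 = 19)
df <- 36

# exact correction factor (Equation 7.31)
CF <- gamma(df/2) / ( gamma((df - 1)/2) * sqrt(df/2) )
round(CF, 3)
#> [1] 0.979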
This correction factor can then be applied to any of the estimators mentioned above,
\[
d^* = d\times CF
\tag{7.32}\]
The corrected \(d\) value, \(d^*\), is commonly referred to as Hedges’ \(g\) or just \(g\). To avoid notational confusion, we will simply add an asterisk to \(d\) to denote the correction. We also need to correct the standard error for \(d^*\):
\[
SE_{d^*} = SE_{d} \times CF
\tag{7.33}\]
These standard errors can then be used to calculate the confidence interval of the corrected \(d\) value,
# Example:
# Cohen's d = .50, SE = .10
d <- .50
SE <- .10

# correct the d value and CIs for small sample bias
d_corrected <- d * CF
SE_corrected <- SE * CF
dlow_corrected <- d_corrected - 1.96*SE_corrected
dhigh_corrected <- d_corrected + 1.96*SE_corrected

# print just the corrected d value and confidence intervals
data.frame(d = apa(d_corrected),
           dlow = apa(dlow_corrected),
           dhigh = apa(dhigh_corrected))
d dlow dhigh
1 0.489 0.298 0.681
The output shows that the corrected effect size is \(d^*\) = 0.49, 95% CI [0.30, 0.68].
7.7 Ratios of Means
Another common approach, particularly within the fields of ecology and evolution, is to take the natural logarithm of the ratio between two means: the so-called Response Ratio (\(lnRR\)). This is sometimes more favorable because standardized mean differences, by construction, use the standard deviation as a denominator and are therefore affected by the sampling error in that estimate, and studies are often less powered for estimating variances than for estimating mean magnitudes (Yang et al. 2022). For the \(lnRR\), in contrast, the standard deviation only impacts the variance estimation and not the point estimate. A limitation of the \(lnRR\), however, is that it can only be used with data observed on a ratio scale (i.e., data with an absolute zero, so that both means are positive).
Although strictly speaking the \(lnRR\) is not a difference in means in the additive sense that the standardized mean difference effect sizes above are, it can be considered to reflect the difference in means on the multiplicative scale. In fact, after calculation it is often transformed to reflect the percentage difference or change between means: \(100\times (\exp(lnRR)-1)\). However, this can introduce transformation-induced bias, because a non-linear transformation of a mean value is not generally equal to the mean of the transformed values. In the context of a meta-analysis combining \(lnRR\) estimates across studies, a correction factor can be applied: \(100\times (\exp(lnRR+0.5 S^2_\text{total})-1)\), where \(S^2_\text{total}\) is the variance of all \(lnRR\) values.
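For instance, a minimal sketch of both transformations in base R (the \(lnRR\) and variance values below are hypothetical):

# hypothetical lnRR and total variance across studies
lnRR <- 0.35
S2_total <- 0.02

# naive percentage difference between means
100 * (exp(lnRR) - 1)
#> [1] 41.90675

# bias-corrected percentage difference (for meta-analytic lnRR)
100 * (exp(lnRR + 0.5 * S2_total) - 1)
#> [1] 43.33294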
Similarly to the various standardized mean differences, there are varied calculations for the \(lnRR\) depending upon the study design (see Senior, Viechtbauer, and Nakagawa 2020).
7.7.1 lnRR for Independent Groups (\(lnRR_\text{ind}\))
The \(lnRR\) can be calculated when groups are independent as follows,

\[
lnRR_\text{ind} = \ln\left(\frac{M_T}{M_C}\right) + CF
\tag{7.34}\]
Where \(M_T\) and \(M_C\) are the means for the treatment and control group, respectively, and \(CF\) is the small sample correction factor calculated as,

\[
CF = \frac{S_T^2}{2 n_T M_T^2} - \frac{S_C^2}{2 n_C M_C^2}
\tag{7.35}\]
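As a check on these equations, here is a base R sketch using the same summary statistics as the escalc() example below. Note that escalc(measure = "ROM") returns the uncorrected log ratio, so the small sample correction is applied manually here:

# treatment and control summary statistics
MT <- 30.4; SDT <- 22.53; nT <- 96
MC <- 21.4; SDC <- 19.59; nC <- 96

# small sample correction factor (Equation 7.35)
CF <- SDT^2/(2*nT*MT^2) - SDC^2/(2*nC*MC^2)

# lnRR for independent groups (Equation 7.34)
round(log(MT/MC) + CF, 4)
#> [1] 0.3495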
Using R we can easily calculate this effect size using the escalc() function in the metafor package (Viechtbauer 2010):
# lnRR for two independent groups
# given means and SDs
# For example:
# Group 1 Mean = 30.4, Standard deviation = 22.53, Sample size = 96
# Group 2 Mean = 21.4, Standard deviation = 19.59, Sample size = 96

library(metafor)

# prepare the data
M1 <- 30.4
M2 <- 21.4
SD1 <- 22.53
SD2 <- 19.59
N1 <- 96
N2 <- 96

# calculate lnRRind and standard error
lnRRind <- escalc(measure = "ROM",
                  m1i = M1,
                  m2i = M2,
                  sd1i = SD1,
                  sd2i = SD2,
                  n1i = N1,
                  n2i = N2)
lnRRind$SE <- sqrt(lnRRind$vi)

# calculate confidence interval
lnRRind$CIlow <- lnRRind$yi - 1.96*lnRRind$SE
lnRRind$CIhigh <- lnRRind$yi + 1.96*lnRRind$SE

# print the lnRR value and confidence intervals
data.frame(lnRRind = MOTE::apa(lnRRind$yi),
           lnRRind_low = MOTE::apa(lnRRind$CIlow),
           lnRRind_high = MOTE::apa(lnRRind$CIhigh))
lnRRind lnRRind_low lnRRind_high
1 0.351 0.115 0.587

The example shows a natural log response ratio of \(lnRR_\text{ind}\) = 0.35 [0.12, 0.59].
Algina, James, and H. J. Keselman. 2003. “Approximate Confidence Intervals for Effect Sizes.”Educational and Psychological Measurement 63 (4): 537–53. https://doi.org/10.1177/0013164403256358.
Baayen, R Harald, Douglas J Davidson, and Douglas M Bates. 2008. “Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items.”Journal of Memory and Language 59 (4): 390–412.
Barr, Dale J, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.”Journal of Memory and Language 68 (3): 255–78.
Becker, Betsy J. 1988. “Synthesizing Standardized Mean-Change Measures.” British Journal of Mathematical and Statistical Psychology 41 (2): 257–78. https://doi.org/10.1111/j.2044-8317.1988.tb00901.x.
Hedges, Larry V. 1981. “Distribution Theory for Glass’s Estimator of Effect Size and Related Estimators.”Journal of Educational Statistics 6 (2): 107–28. https://doi.org/10.3102/10769986006002107.
Morris, Scott B. 2008. “Estimating Effect Sizes From Pretest-Posttest-Control Group Designs.”Organizational Research Methods 11 (2): 364–86. https://doi.org/10.1177/1094428106291059.
Senior, Alistair M., Wolfgang Viechtbauer, and Shinichi Nakagawa. 2020. “Revisiting and Expanding the Meta-Analysis of Variation: The Log Coefficient of Variation Ratio.”Research Synthesis Methods 11 (4): 553–67. https://doi.org/10.1002/jrsm.1423.
Viechtbauer, Wolfgang. 2010. “Conducting Meta-Analyses in R with the metafor Package.”Journal of Statistical Software 36 (3): 1–48. https://doi.org/10.18637/jss.v036.i03.
Yang, Yefeng, Helmut Hillebrand, Malgorzata Lagisz, Ian Cleasby, and Shinichi Nakagawa. 2022. “Low Statistical Power and Overestimated Anthropogenic Impacts, Exacerbated by Publication Bias, Dominate Field Studies in Global Change Biology.”Global Change Biology 28 (3): 969–89. https://doi.org/10.1111/gcb.15972.
¹ Note, this may not be the case, especially where there is a mean-variance relationship and one group (usually the intervention) has a higher posttest mean score.