7  Mean Differences

T-tests are the most commonly used statistical tests for examining differences between group means, or for examining a group mean against a constant. Calculating effect sizes for t-tests is fairly straightforward. Nonetheless, there are cases where crucial figures for the calculation are missing (which happens quite often in older articles), and therefore we document methods that make use of partial information (e.g., only the mean and standard deviation, or only the t-statistic and degrees of freedom). There are multiple types of effect sizes used to calculate standardized mean differences (i.e., Cohen’s \(d\)), yet researchers very often do not identify which type of \(d\) value they are reporting (see Lakens 2013). Here we document the equations and code necessary for calculating each type of \(d\) value, compiled across multiple sources (Becker 1988; Cohen 1988; Lakens 2013; Caldwell 2022; Glass, McGaw, and Smith 1981). A \(d\) value calculated from a sample will also contain sampling error, so we also show the equations for calculating the standard error. The standard error then allows us to calculate the confidence interval. For each formulation in the sections below, the confidence interval can be calculated in the same way, that is,

\[ CI_d = d \pm 1.96\times SE \tag{7.1}\]

Lastly, we supply example R code so you can apply these methods to your own data.

Here is a summary of every effect size discussed in this chapter:

Single Group Design (Section 7.2)

  • \(d_s\) - Single Group: Standardized mean difference comparing a single group to some constant. (Section 7.2)

Two Independent Groups Design (Section 7.3)

  • \(d_p\) - Pooled Standard Deviation: Uses the average within-group standard deviation to standardize the mean difference. Can be calculated directly from an independent samples t-test. Assumes homogeneity of variance between groups. (Section 7.3.1)
  • \(d_\Delta\) - Control Group Standard Deviation: Uses the standard deviation of the control group to standardize the mean difference (often referred to as Glass’s Delta). Does not assume homogeneity of variance between the treatment/intervention and control groups. (Section 7.3.2)

Repeated Measures (Paired Groups) Design (Section 7.4)

  • \(d_z\) - Difference score standard deviation: Uses the standard deviation of difference scores (also known as change scores) to standardize the within-person mean difference (i.e., pre/post change). (Section 7.4.1)
  • \(d_{rm}\) - Repeated measures: Uses the within-person standard deviation, applying a correction to \(d_z\) that reduces the impact of the pre/post correlation on the effect size. Assumes homogeneity of variance between conditions. (Section 7.4.2)
  • \(d_{av}\) - Average variance: Uses the average of the variances across conditions (pre/post test). Does not use the correlation between conditions. Assumes homogeneity of variance between conditions. (Section 7.4.3)
  • \(d_{b}\) - Becker’s d: Uses the pre-test standard deviation to standardize the pre/post mean difference. Does not assume homogeneity of variance between pre-test and post-test. (Section 7.4.4)

Pre-Post-Control Design (Section 7.5)

  • \(d_{PPC1}\) - Separate pre-test standard deviations: Defined as the difference between Becker’s d in the treatment group and in the control group; that is, the mean pre/post change in each group is standardized by that group’s pre-test standard deviation. (Section 7.5.1)
  • \(d_{PPC2}\) - Pooled pre-test standard deviation: Standardizes the difference in mean changes between the treatment and control groups. Assumes homogeneity of variance between the pre-tests of the control and treatment conditions. (Section 7.5.2)
  • \(d_{PPC3}\) - Pooled pre-test and post-test standard deviation: Pools the standard deviation across pre-test and post-test in the treatment and control conditions. Assumes homogeneity of variance between pre/post-test scores and between treatment and control conditions. Confidence intervals are not easy to compute. (Section 7.5.3)

Mean Ratios (Section 7.7)

  • \(lnRR_\text{ind}\) - Response ratio between independent groups: The log of the ratio of the means of two independent groups. Does not use the standard deviation in the effect size formula. (Section 7.7.1)
  • \(lnRR_\text{dep}\) - Response ratio between dependent groups: The log of the ratio of the means of two conditions (i.e., repeated measures). Does not use the standard deviation in the effect size formula. (Section 7.7.2)

7.1 Reporting a t-test with effect size and CI

Whatever effect size and CI you choose to report, you can report it alongside the t-test statistics (i.e., the t-value and the p-value). For example,

The treatment group had a significantly higher mean than the control group (t = 2.76, p = .009, n = 35, d = 0.47 [0.11, 0.81]).

7.2 Single Group Designs

For a single group design, we have one group and we want to compare the mean of that group to some constant, \(C\) (i.e., a target value). The standardized mean difference for a single group can be calculated by (equation 2.3.3, Cohen 1988),

\[ d_s = \frac{M-C}{S_1} \tag{7.2}\]

A positive \(d_s\) value would indicate that the mean is larger than the target value, \(C\). This formulation assumes that the sample is drawn from a normal distribution. The standardizer (i.e., the denominator) is the sample standard deviation. The corresponding standard error for \(d_s\) is (see documentation for Caldwell 2022),

\[ SE_{d_s} = \sqrt{\frac{1}{n}+\frac{d_s^2}{2n}}. \tag{7.3}\]

In R, we can use the d.single.t function from the MOTE package to calculate the single group standardized mean difference.

# Install packages if not already installed:
# install.packages('MOTE')
# Cohen's d for one group

# For example:
# Sample Mean = 30.4, SD = 22.53, N = 96
# Target Value, C = 15

library(MOTE)

stats <- d.single.t(
  m = 30.4,
  u = 15,
  sd = 22.53,
  n = 96
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 0.684 0.460 0.904

As you can see, the output shows that the effect size is \(d_s\) = 0.68, 95% CI [0.46, 0.90]. Note the apa function in MOTE takes a value and returns an APA formatted effect size value (i.e., leading zero and three decimal places).
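If you prefer base R, \(d_s\) and its standard error can also be computed directly from Equation 7.2 and Equation 7.3. A minimal sketch (the normal-approximation interval below is close to, but not identical to, the MOTE output above):

# Cohen's d for one group, computed manually
M <- 30.4   # sample mean
C <- 15     # target value
SD <- 22.53 # sample standard deviation
n <- 96

d_s <- (M - C) / SD
SE <- sqrt(1/n + d_s^2/(2*n))

data.frame(d = apa(d_s), 
           dlow = apa(d_s - 1.96*SE), 
           dhigh = apa(d_s + 1.96*SE))
      d  dlow dhigh
1 0.684 0.461 0.906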

7.3 Two Independent Groups Design

7.3.1 Standardize by Pooled Standard Deviation (\(d_p\))

For a two group design (i.e., between-groups design), we want to compare the means of two groups (group 1 and group 2). The standardized mean difference between two groups can be calculated by (equation 5.1, Glass, McGaw, and Smith 1981),

\[ d_p = \frac{M_1-M_2}{S_p}. \tag{7.4}\]

A positive \(d_p\) value would indicate that the mean of group 1 is larger than the mean of group 2. Dividing the mean difference by the pooled standard deviation, \(S_p\), is the classic formulation of Cohen’s \(d\). The pooled standard deviation, \(S_p\), can be calculated as the square root of the average variance (weighted by the degrees of freedom, \(df=n-1\)) of group 1 and group 2 (pp. 108, Glass, McGaw, and Smith 1981):

\[ S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}} \tag{7.5}\]

Note that the term variance refers to the square of the standard deviation (\(S^2\)). Cohen’s \(d_p\) is directly related to the t-statistic from an independent samples t-test. In fact, we can calculate the \(d_p\) value from the \(t\)-statistic with the following formula (equation 5.3, Glass, McGaw, and Smith 1981):

\[ d_p = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}. \tag{7.6}\]
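For example, a minimal base R sketch of this conversion, using the t-value implied by the two-group example below (t ≈ 2.95 with 96 participants per group):

# convert an independent samples t-statistic to d
t_stat <- 2.95  # named t_stat to avoid masking base R's t()
n1 <- 96
n2 <- 96

d_p <- t_stat * sqrt(1/n1 + 1/n2)
d_p
[1] 0.4257958

This recovers, up to rounding of the t-value, the \(d_p\) = 0.426 computed from the means and standard deviations below.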

The corresponding standard error of \(d_p\) is,

\[ SE_{d_p} = \sqrt{\frac{n_1+n_2}{n_1 n_2}+\frac{d_p^2}{2(n_1+n_2)}}. \tag{7.7}\]

In R, we can use the d.ind.t function from the MOTE package to calculate the two group standardized mean difference. Since we have already loaded the MOTE package, we do not need to load it again.

# Cohen's d for two independent groups
# given means and SDs

# For example:
# Group 1 Mean = 30.4, SD = 22.53, N = 96
# Group 2 Mean = 21.4, SD = 19.59, N = 96

stats <- d.ind.t(
  m1 = 30.4,
  m2 = 21.4,
  sd1 = 22.53,
  sd2 = 19.59,
  n1 = 96,
  n2 = 96,
  a = 0.05
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 0.426 0.140 0.712

The output shows that the effect size is \(d_p\) = 0.43, 95% CI [0.14, 0.71].

7.3.2 Standardize by Control Group Standard Deviation (\(d_{\Delta}\))

When two groups differ substantially in their standard deviations, we can instead standardize by the control group standard deviation (\(S_C\)), such that,

\[ d_{\Delta} = \frac{M_T-M_C}{S_C}. \tag{7.8}\]

Where the subscripts \(T\) and \(C\) denote the treatment group and control group, respectively. This formulation is commonly referred to as Glass’s \(\Delta\) (Glass, McGaw, and Smith 1981). The standard error for \(d_{\Delta}\) can be defined as,

\[ SE_{d_{\Delta}} = \sqrt{\frac{n_T+n_C}{n_T n_C} + \frac{d_\Delta^2}{n_C+1} } \tag{7.9}\]

Notice that when we standardize only by the standard deviation of the control group (rather than pooling), we have fewer degrees of freedom (\(df=n_C-1\)) and therefore more sampling error than when we divide by the pooled standard deviation (\(df= n_T + n_C - 2\)). In R, we can use the delta.ind.t function from the MOTE package to calculate \(d_\Delta\).

# Glass's delta for two independent groups
# given means and SDs

# For example:
# Control group Mean = 30.4, SD = 22.53, N = 96
# Treatment group Mean = 21.4, SD = 19.59, N = 96

stats <- delta.ind.t(
  m1 = 30.4,
  m2 = 21.4,
  sd1 = 22.53,
  sd2 = 19.59,
  n1 = 96,
  n2 = 96,
  a = 0.05
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 0.399 0.140 0.712

The output shows that the effect size is \(d_\Delta\) = 0.40, 95% CI [0.14, 0.71].

7.4 Repeated Measures Designs

In a repeated-measures design, the same subjects (or items, etc.) are measured on two or more separate occasions, or in multiple conditions within a single session, and we want to know the mean difference between those occasions or conditions (Baayen, Davidson, and Bates 2008; Barr et al. 2013). An example of this would be in a pre/post comparison where subjects are tested before and after undergoing some treatment (see Figure 7.1 for a visualization). A standardized mean difference in a repeated-measures design can take on a few different forms that we define below.

Figure 7.1: Figure displaying simulated data of a repeated measures design, the x-axis shows the condition (e.g., pre-test and post-test) and y-axis is the scores. Lines indicate within person pre/post change.

7.4.1 Difference Score \(d\) (\(d_z\))

Instead of comparing the means of two sets of scores, a within subject design allows us to subtract the scores obtained in condition 1 from the scores in condition 2. These difference scores (\(X_{\text{diff}}=X_2-X_1\)) can be used similarly to the single group design (if the target value was zero, i.e., \(C=0\)) such that (equation 2.3.5, Cohen 1988),

\[ d_z = \frac{M_{\text{diff}}}{S_{\text{diff}}} \tag{7.10}\]

Where the difference between this formulation and the single group design is the nature of the scores (difference scores rather than raw scores). The convenient thing about \(d_z\) is that it has a straightforward relationship with the \(t\)-statistic, \(d_z=\frac{t}{\sqrt{n}}\), which makes it very useful for power analyses. If the standard deviation of the difference scores is not available, it can be calculated from the standard deviation of condition 1 (\(S_1\)), the standard deviation of condition 2 (\(S_2\)), and the correlation between conditions (\(r\)) (equation 2.3.6, Cohen 1988):

\[ S_{\text{diff}}=\sqrt{S^2_1 + S^2_2 - 2 r S_1 S_2} \tag{7.11}\]
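For example, a minimal base R sketch reconstructing \(S_{\text{diff}}\) from the condition-level summary statistics used in the \(d_{rm}\) example later in this chapter (M1 = 30.4, SD1 = 22.53; M2 = 21.4, SD2 = 19.59; r = .40):

# reconstruct the SD of difference scores from condition SDs and r
s1 <- 22.53
s2 <- 19.59
r  <- .40

s_diff <- sqrt(s1^2 + s2^2 - 2*r*s1*s2)
d_z <- (30.4 - 21.4) / s_diff  # mean difference over SD of differences

round(c(s_diff = s_diff, d_z = d_z), 3)
s_diff    d_z 
23.201  0.388 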

It is important to note that when the correlation between conditions is large, the \(d_z\) value will also be larger, whereas a small correlation will return a smaller \(d_z\) value. The standard error of \(d_z\) can be calculated similarly to the single group design such that,

\[ SE_{d_z} = \sqrt{\frac{1}{n}+\frac{d_z^2}{2n}} \tag{7.12}\]

In R, we can use the d.dep.t.diff function from the MOTE package to calculate \(d_z\).

# Cohen's dz for difference scores
# given difference score means and SDs

# For example:
# Difference Score Mean = 21.4, SD = 19.59, N = 96

library(MOTE)

stats <- d.dep.t.diff(
  m = 21.4,
  sd = 19.59,
  n = 96,
  a = 0.05
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 1.092 0.837 1.344

The output shows that the effect size is \(d_z\) = 1.09, 95% CI [0.84, 1.34].

7.4.2 Repeated Measures \(d\) (\(d_{rm}\))

For a within-group design, we want to compare the means of scores obtained from condition 1 and condition 2. The repeated measures standardized mean difference between the two conditions can be calculated by (equation 9, Lakens 2013),

\[ d_{rm} = \frac{M_2-M_1}{S_w}. \tag{7.13}\]

A positive \(d_{rm}\) value would indicate that the mean of condition 2 is larger than the mean of condition 1. The standardizer here is the within-subject standard deviation, \(S_w\). The within-subject standard deviation can be defined as,

\[ S_{w}=\sqrt{\frac{S^2_1 + S^2_2 - 2 r S_1 S_2}{2(1-r)}}. \tag{7.14}\]

We can also express \(S_w\) in terms of the standard deviation of difference scores (\(S_{\text{diff}}\)),

\[ S_w = \frac{S_{\text{diff}}}{ \sqrt{2(1-r)} }. \tag{7.15}\]

Furthermore, we can even express \(d_{rm}\) in terms of the difference score standardized mean difference (\(d_z\)),

\[ d_{rm} = d_z \times \sqrt{2(1-r)}. \tag{7.16}\]
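Continuing the sketch above, converting \(d_z\) = 0.388 with \(r\) = .40 recovers the \(d_{rm}\) value that MOTE reports in the example below:

# convert dz to drm using the pre/post correlation
d_rm <- 0.387917 * sqrt(2*(1 - .40))
round(d_rm, 3)
[1] 0.425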

Ultimately the \(d_{rm}\) is more appropriate as an effect size estimate for use in meta-analysis whereas \(d_z\) is more appropriate for power analysis (Lakens 2013). The standard error for \(d_{rm}\) can be computed as,

\[ SE_{d_{rm}} = \sqrt{\left(\frac{1}{n} + \frac{d^2_{rm}}{2n}\right) \times 2(1-r)} \tag{7.17}\]

In R, we can use the d.dep.t.rm function from the MOTE package to calculate the repeated measures standardized mean difference (\(d_{rm}\)).

# Cohen's d for repeated measures
# given means and SDs and correlation

# For example:
# Condition 1 Mean = 30.4, SD = 22.53, N = 96
# Condition 2 Mean = 21.4, SD = 19.59, N = 96
# correlation between conditions: r = .40

stats <- d.dep.t.rm(
  m1 = 30.4,
  m2 = 21.4,
  sd1 = 22.53,
  sd2 = 19.59,
  r = .40,
  n = 96,
  a = 0.05
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 0.425 0.215 0.633

The output shows that the effect size is \(d_{rm}\) = 0.42, 95% CI [0.21, 0.63].

7.4.3 Average Variance \(d\) (\(d_{av}\))

The problem with \(d_{z}\) and \(d_{rm}\) is that they require the correlation between conditions, which in practice is frequently not reported. An alternative estimator of Cohen’s \(d\) in a repeated measures design is to simply use the classic variant of Cohen’s \(d\) (i.e., the pooled standard deviation). In a repeated measures design, the sample size does not change between conditions, so weighting the variances of condition 1 and condition 2 by their respective degrees of freedom (i.e., \(df=n-1\)) is an unnecessary step. Instead, we can standardize by the square root of the average of the variances of conditions 1 and 2 (see equation 5, Algina and Keselman 2003):

\[ d_{av} = \frac{M_2 - M_1}{\sqrt{\frac{S_1^2 + S_2^2}{2}}} \tag{7.18}\]

This formulation is convenient especially when the correlation is not reported; however, without the correlation it fails to account for the consistency of change between conditions. The standard error of \(d_{av}\) can be expressed as (equation 9, Algina and Keselman 2003),

\[ SE_{d_{av}}= \sqrt{\frac{2(S^2_1 + S^2_2 - 2rS_1S_2)}{n(S_1^2+S_2^2)}} \tag{7.19}\]

In R, we can use the d.dep.t.avg function from the MOTE package to calculate the average variance standardized mean difference (\(d_{av}\)).

# Cohen's d for repeated measures (average variance)
# given means and SDs 

# For example:
# Condition 1 Mean = 30.4, SD = 22.53, N = 96
# Condition 2 Mean = 21.4, SD = 19.59, N = 96

stats <- d.dep.t.avg(
  m1 = 30.4,
  m2 = 21.4,
  sd1 = 22.53,
  sd2 = 19.59,
  n = 96,
  a = 0.05
)

# print just the d value and confidence intervals
data.frame(d = apa(stats$d), 
           dlow = apa(stats$dlow), 
           dhigh = apa(stats$dhigh))
      d  dlow dhigh
1 0.427 0.217 0.635

The output shows that the effect size is \(d_{av}\) = 0.43, 95% CI [0.22, 0.64].

7.4.4 Becker’s \(d\) (\(d_b\))

An even simpler variant of the repeated measures \(d\) value comes from Becker (1988). Becker’s \(d\) standardizes simply by the pre-test standard deviation when the comparison is a pre/post design,

\[ d_b = \frac{M_{\text{post}}-M_{\text{pre}}}{S_{\text{pre}}}. \tag{7.20}\]

Its interpretation as the “change in baseline standard deviations” can be quite useful. We can also obtain the standard error with (equation 13, Becker 1988),

\[ SE_{d_b} = \sqrt{\frac{2(1-r)}{n}+\frac{d_b^2}{2n}} \tag{7.21}\]

Notice that even though the formula for calculating \(d_b\) did not include the correlation coefficient, the standard error does.

In base R, we can calculate Becker’s formulation of standardized mean difference using the equations above.

# Becker's d for repeated measures (pre/post design)
# given means, the pre-test SD, and the correlation

# For example:
# Pre-test Mean = 21.4, SD = 19.59, N = 96
# Post-test Mean = 30.4, N = 96
# Correlation between conditions: r = .40

Mpre <- 21.4
Mpost <- 30.4
Spre <- 19.59
r <- .40
n <- 96
a <- 0.05

d <- (Mpost - Mpre) / Spre

SE <- sqrt( 2*(1-r)/n + d^2/(2*n) )

# print just the d value and confidence intervals
data.frame(d = apa(d), 
           dlow = apa(d - 1.96*SE), 
           dhigh = apa(d + 1.96*SE))
      d  dlow dhigh
1 0.459 0.231 0.688

The output shows that the effect size is \(d_b\) = 0.46, 95% CI [0.23, 0.69].

7.4.5 Comparing Repeated Measures \(d\) values

Figure 7.2 shows repeated measures designs with a high (\(r=\) .95) and a low (\(r=\) .05) correlation between conditions. Let us fix the means and standard deviations in both scenarios and vary only the correlation. Now we can compare the repeated measures estimators across the two conditions shown in Figure 7.2:

  • High correlation:
    • \(d_z=1.24\)
    • \(d_{rm}=0.39\)
    • \(d_{av}=0.43\)
    • \(d_{b}=0.40\)
  • Low correlation:
    • \(d_z=0.31\)
    • \(d_{rm}=0.43\)
    • \(d_{av}=0.43\)
    • \(d_{b}=0.40\)

Notice that the correlation influences \(d_z\) far more than any other estimator: \(d_{rm}\) changes very little, whereas \(d_{av}\) and \(d_{b}\) do not take the correlation into account at all. A rough illustration of this pattern is sketched below.
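Here is a minimal base R sketch that computes all four estimators from the same (hypothetical, not Figure 7.2’s) summary statistics, varying only the correlation:

# compare repeated measures d estimators at high vs. low correlation
# hypothetical summary statistics: M1 = 20, M2 = 24, SD1 = SD2 = 10
compare_d <- function(m1, m2, s1, s2, r){
  s_diff <- sqrt(s1^2 + s2^2 - 2*r*s1*s2) # SD of difference scores
  d_z  <- (m2 - m1) / s_diff
  d_rm <- d_z * sqrt(2*(1 - r))
  d_av <- (m2 - m1) / sqrt((s1^2 + s2^2)/2)
  d_b  <- (m2 - m1) / s1
  round(c(dz = d_z, drm = d_rm, dav = d_av, db = d_b), 2)
}

compare_d(m1 = 20, m2 = 24, s1 = 10, s2 = 10, r = .95)
  dz  drm  dav   db 
1.26 0.40 0.40 0.40 
compare_d(m1 = 20, m2 = 24, s1 = 10, s2 = 10, r = .05)
  dz  drm  dav   db 
0.29 0.40 0.40 0.40 

With equal condition standard deviations, \(d_{rm}\), \(d_{av}\), and \(d_b\) coincide exactly; in Figure 7.2 they differ slightly because the simulated standard deviations differ between conditions.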

Figure 7.2: Figure displaying simulated data of a repeated measures design, the x-axis shows the condition (e.g., pre-test and post-test) and y-axis is the scores. Left panel shows a high pre/post correlation (\(r\) = .95) and right panel shows a low correlation condition (\(r\) = .05). Lines indicate within person pre/post change.

7.5 Pretest-Posttest-Control Group Designs

In many areas of research, both between and within group factors are incorporated. For example, in research examining the effects of an intervention, a sample is often randomized into two separate groups (intervention and control) and then measured on the outcome of interest both before (pretest) and after (posttest) the intervention/control period. In these types of 2x2 (group x time) study designs, it is usually the difference between the standardized mean changes in the intervention/treatment (\(T\)) and control (\(C\)) groups that is of interest. For a visualization of a pretest-posttest-control group design see Figure 7.3.

Morris (2008) details three effect sizes for this pretest-posttest-control (PPC) design.

Figure 7.3: Illustration of a pre-post control design. Left panel shows the pre-post difference in the control group and right panel shows the pre-post difference in the intervention/treatment group. Lines indicate within person pre/post change.

7.5.1 PPC1 - separate pre-test standard deviations

The separate pre-test (i.e., baseline) standard deviations are used to standardize the pre/post mean difference in the intervention group and the control group respectively (see equation 4, Morris 2008),

\[ d_T = \frac{M_{T,\text{post}} - M_{T,\text{pre}}}{S_{T,\text{pre}}} \tag{7.22}\]

\[ d_C = \frac{M_{C,\text{post}} - M_{C,\text{pre}}}{S_{C,\text{pre}}} \tag{7.23}\]

Note that these effect sizes are identical to Becker’s \(d\) formulation of the SMD (see Section 7.4.4). Therefore the pretest-posttest-control group effect size is simply the difference between the intervention and control pre/post SMDs (equation 15, Becker 1988),

\[ d_{PPC1} = d_T - d_C \tag{7.24}\]

The asymptotic standard error of \(d_{PPC1}\) was first derived by Becker (1988) and can be expressed as the square root of the sum of the sampling variances (equation 16, Becker 1988),

\[ SE_{d_{PPC1}} = \sqrt{\left[\frac{2(1-r_T)}{n_T} + \frac{d_T^2}{2n_T}\right] + \left[\frac{2(1-r_C)}{n_C} + \frac{d_C^2}{2n_C}\right]} \tag{7.25}\]

We can calculate \(d_{PPC1}\) and its confidence intervals using base R:

# Example:

# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# calculate the observed standardized mean difference
dT <- (M_Tpost- M_Tpre) / SD_Tpre
dC <- (M_Cpost - M_Cpre) / SD_Cpre
dPPC1 <- dT - dC

# calculate the standard error
SE <- sqrt( 2*(1-rT)/nT + dT^2/(2*nT) + 2*(1-rC)/nC + dC^2/(2*nC) )

# print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC1),
           dlow = MOTE::apa(dPPC1 - 1.96*SE),
           dhigh = MOTE::apa(dPPC1 + 1.96*SE))
      d  dlow dhigh
1 0.567 0.190 0.944

The output shows a pre-post intervention effect of \(d_{PPC1}\) = 0.57 [0.19, 0.94].

7.5.2 PPC2 - pooled pre-test standard deviations

The pooled pre-test (i.e., baseline) standard deviation can be used to standardize the difference in pre/post change between the intervention and control groups such that (equation 8, Morris 2008),

\[ d_{PPC2} = \frac{(M_{T,\text{post}} - M_{T,\text{pre}}) - (M_{C,\text{post}} - M_{C,\text{pre}})}{S_{p,\text{pre}}} \tag{7.26}\]

where

\[ S_{p,\text{pre}} = \sqrt{\frac{(n_T-1)S^2_{T,\text{pre}} + (n_C - 1)S^2_{C,\text{pre}}}{n_T + n_C - 2}}. \tag{7.27}\]

The sampling distribution of \(d_{PPC2}\) was described by Morris (2008), and the standard error can be expressed as (adapted from equation 16, Morris 2008),

\[ \small{SE_{d_{PPC2}} = \sqrt{2\left(1-\frac{n_T r_T + n_C r_C}{n_T + n_C}\right)\left(\frac{n_T + n_C}{n_T n_C}\right)\left[1 + \frac{d^2_{PPC2}}{2\left(1-\frac{n_T r_T + n_C r_C}{n_T + n_C}\right)\left(\frac{n_T + n_C}{n_T n_C}\right)}\right] - d^2_{PPC2}}} \tag{7.28}\]

Note that the original equation shown in the paper by Morris (2008) uses the population pre/post correlation \(\rho\); in the equation above we replace \(\rho\) with the sample size weighted average of the Pearson correlations computed in the treatment and control groups (i.e., \(\rho \approx \frac{n_T r_T + n_C r_C}{n_T + n_C}\)).

We can use base R to obtain \(d_{PPC2}\) and confidence intervals:

# Example:

# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# calculate the observed standardized mean difference
dPPC2 <- ((M_Tpost- M_Tpre) - (M_Cpost - M_Cpre)) / sqrt( ( (nT - 1)*(SD_Tpre^2) + (nC - 1)*(SD_Cpre^2) ) / (nT + nC - 2) )

# calculate the standard error
SE <- sqrt( 2*(1 - (nT*rT + nC*rC)/(nT + nC)) * ((nT + nC)/(nT*nC)) * (1 + dPPC2^2 / (2*(1 - (nT*rT + nC*rC)/(nT + nC)) * ((nT + nC)/(nT*nC)))) - dPPC2^2 )

# print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC2),
           dlow = MOTE::apa(dPPC2 - 1.96*SE),
           dhigh = MOTE::apa(dPPC2 + 1.96*SE))
      d  dlow dhigh
1 0.362 0.070 0.654

The output shows a pre-post intervention effect of \(d_{PPC2}\) = 0.36 [0.07, 0.65].

7.5.3 PPC3 - pooled pre- and post-test

The two previous effect sizes only use the pretest standard deviations. But if we are happy to assume that the pretest and posttest variances are homogeneous¹, the pooled pre-test and post-test standard deviation can be used to standardize the difference in pre/post change between the intervention and control groups such that (equation 8, Morris 2008),

\[ d_{PPC3} = \frac{(M_{T,\text{post}} - M_{T,\text{pre}}) - (M_{C,\text{post}} - M_{C,\text{pre}})}{S_{p,\text{pre-post}}}, \tag{7.29}\]

where,

\[ S_{p,\text{pre-post}} = \sqrt{\frac{(n_T-1)\left(S^2_{T,\text{pre}} + S^2_{T,\text{post}}\right) + (n_C - 1)\left(S^2_{C,\text{pre}} + S^2_{C,\text{post}}\right)}{2(n_T + n_C - 2)}}. \tag{7.30}\]

The standard error for \(d_{PPC3}\) is currently unknown. One option for estimating it is a non-parametric or parametric bootstrap: repeatedly resample the raw data or, if the raw data are not available, repeatedly simulate data from the summary statistics. We can do this in base R by simulating pre/post data using the mvrnorm() function from the MASS package (Venables and Ripley 2002):

# Install the package below if not done so already
# install.packages('MASS')

# Example:

# Control Group (N = 90)
## Pre-test Mean = 20, SD = 6
## Post-test Mean = 25, SD = 7
## Pre/post correlation = .50
M_Cpre <- 20
M_Cpost <- 25
SD_Cpre <- 6
SD_Cpost <- 7
rC <- .50
nC <- 90

# Intervention Group (N = 90)
## Pre-test Mean = 20, SD = 5
## Post-test Mean = 27, SD = 8
## Pre/post correlation = .50
M_Tpre <- 20
M_Tpost <- 27
SD_Tpre <- 5
SD_Tpost <- 8
rT <- .50
nT <- 90

# simulate data
set.seed(1) # set seed for reproducibility
boot_dPPC3 <- c()
for(i in 1:1000){
  # simulate control group pre-post data
  data_C <- MASS::mvrnorm(n = nC,
                          # input observed means
                          mu = c(M_Cpre,M_Cpost),
                          # input observed covariance matrix
                          Sigma = data.frame(pre = c(SD_Cpre^2, rC*SD_Cpre*SD_Cpost), 
                                             post = c(rC*SD_Cpre*SD_Cpost,SD_Cpost^2)))
  # simulate intervention group pre-post data
  data_T <- MASS::mvrnorm(n = nT,
                          # input observed means
                          mu = c(M_Tpre,M_Tpost),
                          # input observed covariance matrix
                          Sigma = data.frame(pre = c(SD_Tpre^2, rT*SD_Tpre*SD_Tpost), 
                                             post = c(rT*SD_Tpre*SD_Tpost,SD_Tpost^2)))
  
  # calculate the mean difference in pre/post change (the numerator)
  MeanDiff <- (mean(data_T[,2]) - mean(data_T[,1])) - (mean(data_C[,2]) - mean(data_C[,1]))
  
  # calculate the pooled pre-post standard deviation (the denominator)
  S_Pprepost <-  sqrt( ( (nT - 1)*(sd(data_T[,1])^2+sd(data_T[,2])^2) + (nC - 1)*(sd(data_C[,1])^2+sd(data_C[,2])^2) ) / (2*(nT + nC - 2)) )
  
  # calculate the standardized mean difference for each bootstrap iteration
  boot_dPPC3[i] <- MeanDiff / S_Pprepost
}

# calculate bootstrapped standard error
SE <- sd(boot_dPPC3)

# calculate the observed standardized mean difference
dPPC3 <- ((M_Tpost - M_Tpre) - (M_Cpost - M_Cpre)) / sqrt( ( (nT - 1)*(SD_Tpre^2+SD_Tpost^2) + (nC - 1)*(SD_Cpre^2+SD_Cpost^2) ) / (2*(nT + nC - 2)) )

#print the d value and confidence intervals
data.frame(d = MOTE::apa(dPPC3),
           dlow = MOTE::apa(dPPC3 - 1.96*SE),
           dhigh = MOTE::apa(dPPC3 + 1.96*SE))
      d  dlow dhigh
1 0.303 0.003 0.604

The output shows a pre-post intervention effect of \(d_{PPC3}\) = 0.30 [0.00, 0.60].

7.6 Small Sample Bias in \(d\) values

All the estimators of \(d\) listed above are biased estimates of the population \(d\) value; specifically, they all overestimate the population value in small samples. To adjust for this bias, we can apply a correction factor based on the degrees of freedom, which depend on the estimator used. The degrees of freedom for each estimator are listed below:

  • Single Group design (\(d_s\)): \(df = n-1\)
  • Between Groups - Pooled Standard Deviation (\(d_p\)): \(df = n_1+n_2-2\)
  • Between Groups - Control Group Standard Deviation (\(d_\Delta\)): \(df = n_C-1\)
  • Repeated Measures - all types (\(d_z\), \(d_{rm}\), \(d_{av}\), \(d_{b}\)): \(df = n-1\)
  • Pretest-Posttest-Control Separate Standard Deviation (\(d_{PPC1}\)): \(df=n_C−1\)
  • Pretest-Posttest-Control Pooled Pretest Standard Deviation (\(d_{PPC2}\)): \(df=n_T+n_C−2\)
  • Pretest-Posttest-Control Pooled Pretest and Posttest Standard Deviation (\(d_{PPC3}\)): \(df=2(n_T+n_C−2)\)

With the appropriate degrees of freedom, we can use the following correction factor, \(CF\), to obtain an unbiased estimate of the population standardized mean difference:

\[ CF = \frac{\Gamma\left(\frac{df}{2}\right)}{\Gamma\left(\frac{df-1}{2}\right)\sqrt{\frac{df}{2}}} \tag{7.31}\]

Where \(\Gamma(\cdot)\) is the gamma function. An approximation of this complex formula given by Hedges (1981) can be written as \(CF\approx 1-\frac{3}{4\cdot df -1}\). In R, this can be calculated using,

# Example:
# Group 1 sample size = 20
# Group 2 sample size = 18

n1 <- 20
n2 <- 18

df <- n1 + n2 - 2

CF <- gamma(df/2) / ( sqrt(df/2) * gamma((df-1)/2) )

CF
[1] 0.9789964
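For comparison, Hedges’ approximation gives nearly the same value:

# Hedges' approximation to the correction factor
CF_approx <- 1 - 3/(4*df - 1)

CF_approx
[1] 0.979021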

This correction factor can then be applied to any of the estimators mentioned above,

\[ d^* = d\times CF \tag{7.32}\]

The corrected \(d\) value, \(d^*\), is commonly referred to as Hedges’ \(g\) or just \(g\). To avoid notation confusion we will just add an asterisk to \(d\) to denote the correction. We also need to correct the standard error for \(d^*\)

\[ SE_{d^*} = SE_{d} \times CF \tag{7.33}\]

These standard errors can then be used to calculate the confidence interval of the corrected \(d\) value,

\[ CI_{d*} = d^* \pm 1.96\times SE_{d^*} \tag{7.34}\]

# Example:
# Cohen's d = .50, SE = .10

d = .50
SE = .10

# correct d value and CIs small sample bias
d_corrected <- d * CF
SE_corrected <- SE * CF
dlow_corrected <- d_corrected - 1.96*SE_corrected
dhigh_corrected <- d_corrected + 1.96*SE_corrected

# print the corrected d value and confidence intervals
data.frame(d = apa(d_corrected), 
           dlow = apa(dlow_corrected), 
           dhigh = apa(dhigh_corrected))
      d  dlow dhigh
1 0.489 0.298 0.681

The output shows that the corrected effect size is \(d^*\) = 0.49, 95% CI [0.30, 0.68].

7.7 Ratios of Means

Another common approach, particularly within the fields of ecology and evolution, is to take the natural logarithm of the ratio between two means: the so-called Response Ratio (\(lnRR\)). This is sometimes preferable because the various standardized mean differences use the standard deviation as a denominator, so they are affected by the precision of that estimate, and studies are often less well powered for estimating standard deviations than for estimating mean magnitudes (Yang et al. 2022). For the \(lnRR\), the standard deviation only impacts the variance estimate, not the point estimate. A limitation of the \(lnRR\), however, is that it only applies to data observed on a ratio scale (i.e., with an absolute zero, so that both means are positive).

Although strictly speaking the \(lnRR\) is not a difference in means in the additive sense that the standardized mean difference effect sizes above are, it can be considered to reflect the difference in means on the multiplicative scale. In fact, after calculation it is often transformed to reflect the percentage difference or change between means: \(100\times(\exp(lnRR)-1)\). However, this can introduce transformation-induced bias, because a non-linear transformation of a mean value is not generally equal to the mean of the transformed values. In the context of a meta-analysis combining \(lnRR\) estimates across studies, a correction factor can be applied: \(100\times(\exp(lnRR+0.5 S^2_\text{total})-1)\), where \(S^2_\text{total}\) is the variance of all \(lnRR\) values.
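For example, a minimal sketch of the percentage conversion, using the \(lnRR_\text{ind}\) = 0.351 value from the example in Section 7.7.1 (the \(S^2_\text{total}\) value below is purely hypothetical):

lnRR <- 0.351

# percentage difference between means
100*(exp(lnRR) - 1)
[1] 42.0487

# with the meta-analytic transformation-bias correction
S2_total <- 0.04  # hypothetical variance of all lnRR values
100*(exp(lnRR + 0.5*S2_total) - 1)
[1] 44.91832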

Similarly to the various standardized mean differences, there are varied calculations for the lnRR dependent upon the study design being used (see Senior, Viechtbauer, and Nakagawa 2020).

7.7.1 lnRR for Independent Groups (\(lnRR_\text{ind}\))

The lnRR can be calculated when groups are independent as follows,

\[ lnRR_\text{ind}=\ln\left(\frac{M_T}{M_C}\right)+CF \tag{7.35}\]

Where \(M_T\) and \(M_C\) are the means for the treatment and control group respectively and \(CF\) is the small sample correction factor calculated as,

\[ CF = \frac{S^2_T}{2n_TM_T^2} - \frac{S^2_C}{2n_CM_C^2} \tag{7.36}\]

The standard error can be calculated as,

\[ SE_{lnRR_\text{ind}} = \sqrt{ \frac{S^2_T}{n_T M_T^2} + \frac{S^2_C}{n_C M_C^2} +\frac{S^4_T}{2n^2_T M_T^4} + \frac{S^4_C}{2n^2_C M_C^4}} \tag{7.37}\]

Using R we can easily calculate this effect size using the escalc() function in the metafor package (Viechtbauer 2010):

# lnRR for two independent groups
# given means and SDs

# For example:
# Group 1 Mean = 30.4, Standard deviation = 22.53, Sample size = 96
# Group 2 Mean = 21.4, Standard deviation = 19.59, Sample size = 96

library(metafor)


# prepare the data
M1 <- 30.4
M2 <- 21.4
SD1 <- 22.53
SD2 <- 19.59
N1 <- 96
N2 <- 96

# calculate lnRRind and standard error
lnRRind <- escalc(measure = "ROM", 
               m1i = M1,
               m2i = M2,
               sd1i = SD1,
               sd2i = SD2,
               n1i = N1,
               n2i = N2)

lnRRind$SE <- sqrt(lnRRind$vi)

# calculate confidence interval
lnRRind$CIlow <- lnRRind$yi - 1.96*lnRRind$SE
lnRRind$CIhigh <-  lnRRind$yi + 1.96*lnRRind$SE

# print the lnRR value and confidence intervals
data.frame(lnRRind = MOTE::apa(lnRRind$yi),
           lnRRind_low = MOTE::apa(lnRRind$CIlow),
           lnRRind_high = MOTE::apa(lnRRind$CIhigh))
  lnRRind lnRRind_low lnRRind_high
1   0.351       0.115        0.587

The example shows a natural log response ratio of \(lnRR_\text{ind}\) = 0.35 [0.12, 0.59].

7.7.2 lnRR for dependent groups (\(lnRR_\text{dep}\))

The lnRR can be calculated when groups are dependent (i.e., same subjects in both conditions), for example a pre-post comparison, as follows,

\[ lnRR_\text{dep} = \ln\left(\frac{M_2}{M_1}\right) + CF \tag{7.38}\]

Where \(CF\) is the small sample correction factor calculated as,

\[ CF = \frac{S^2_2}{2nM^2_2} - \frac{S^2_1}{2nM^2_1} \tag{7.39}\]

The standard error can then be calculated as,

\[ \small{SE_{lnRR_\text{dep}} = \sqrt{ \frac{S^2_1}{n M_1^2} + \frac{S^2_2}{n M_2^2} + \frac{S^4_1}{2n^2M^4_1} + \frac{S^4_2}{2n^2M^4_2} - \frac{2rS_1 S_2}{n M_1 M_2} + \frac{r^2S^2_1 S^2_2 (M_1^4 + M_2^4)}{2n^2 M_1^4 M_2^4}}} \tag{7.40}\]

Using R we can easily calculate this effect size using the escalc() function from the metafor package as follows:

# lnRR for two dependent groups
# given means and SDs


# For example:
# Mean 1 = 30.4, Standard deviation 1 = 22.53
# Mean 2 = 21.4, Standard deviation 2 = 19.59
# Sample size = 96
# Correlation = 0.4

library(metafor)


# prepare the data
M1 <- 30.4
M2 <- 21.4
SD1 <- 22.53
SD2 <- 19.59
N <- 96
R <- 0.4


# calculate lnRR and standard error
lnRRdep <- escalc(measure = "ROMC", 
               m1i = M1,
               m2i = M2,
               sd1i = SD1,
               sd2i = SD2,
               ni = N,
               ri = R)

# obtain standard error from sqrt of sampling variance
lnRRdep$SE <- sqrt(lnRRdep$vi)


# calculate confidence interval
lnRRdep$CIlow <- lnRRdep$yi - 1.96*lnRRdep$SE
lnRRdep$CIhigh <-  lnRRdep$yi + 1.96*lnRRdep$SE




# print the lnRR value and confidence intervals
data.frame(lnRRdep = MOTE::apa(lnRRdep$yi),
           lnRRdep_low = MOTE::apa(lnRRdep$CIlow),
           lnRRdep_high = MOTE::apa(lnRRdep$CIhigh))
  lnRRdep lnRRdep_low lnRRdep_high
1   0.351       0.167        0.535

The example shows a natural log response ratio of \(lnRR_\text{dep}\) = 0.35 [0.17, 0.54].


1. Note, this may not be the case, especially where there is a mean-variance relationship and one group (usually the intervention) has a higher posttest mean score.