# Case-only approach applied in environmental epidemiology: 2 examples of interaction effect using the US National Health and Nutrition Examination Survey (NHANES) datasets | BMC Medical Research Methodology

### Basic assumption: the joint and ICR on the multiplicative scale

Statistical interactions between the effects of susceptibility factors and those of environmental factors can be assessed as departures from multiplicativity of effects or as departures from additivity of effects. Table 2 shows an example of a study with cases and non-cases. With the unexposed, non-susceptible (S-E-) group set as the reference group, we can calculate the relative risk (RR) and odds ratio (OR) for each of the other 3 groups.

The joint RR for the susceptibility factor and environmental exposure (RR_{se}) can be compared with the RR for environmental exposure alone (RR_{e}) or with the RR for susceptibility factor alone (RR_{s}). The joint OR for the susceptibility factor and environmental exposure (OR_{se}) can be compared with the OR for environmental exposure alone (OR_{e}) or with the OR for susceptibility factor alone (OR_{s}). In the joint RR model with the additive scale, the ICR (ICR_{c/nc}) indicates the departures from the sum of individual RRs minus one (ICR_{c/nc} = RR_{se}-(RR_{s} + RR_{e}-1)). This equation is called ‘relative excess risk due to interaction (RERI)’ in epidemiologic literature [15]. In the joint OR model with the additive scale, the ICR (ICR_{c/nc}) indicates the departures from the sum of individual ORs minus one (ICR_{c/nc} = OR_{se}-(OR_{s} + OR_{e}-1)). In the joint RR model with the multiplicative scale, the ICR (ICR_{c/nc}) indicates the departures from the product of individual RRs (ICR_{c/nc} = RR_{se}/(RR_{s} × RR_{e})). In the joint OR model with the multiplicative scale, the ICR (ICR_{c/nc}) indicates the departures from the product of individual ORs (ICR_{c/nc} = OR_{se}/(OR_{s} × OR_{e})). In this article, we used only the joint RR or the joint OR model with the multiplicative scale to estimate the ICR_{c/nc}.
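The two scales defined above can be sketched numerically. This is a minimal illustration only: the function names and the RR values are hypothetical and are not taken from the NHANES analyses.

```python
def icr_additive(rr_se: float, rr_s: float, rr_e: float) -> float:
    """Departure from additivity (RERI): RR_se - (RR_s + RR_e - 1)."""
    return rr_se - (rr_s + rr_e - 1.0)


def icr_multiplicative(rr_se: float, rr_s: float, rr_e: float) -> float:
    """Departure from multiplicativity: RR_se / (RR_s * RR_e)."""
    return rr_se / (rr_s * rr_e)


# Hypothetical relative risks, chosen only for illustration.
rr_s, rr_e, rr_se = 2.0, 3.0, 6.0

print(icr_additive(rr_se, rr_s, rr_e))        # 2.0: more than additive
print(icr_multiplicative(rr_se, rr_s, rr_e))  # 1.0: exactly multiplicative
```

The same joint effect can therefore show an interaction on one scale and none on the other. The same two functions apply unchanged when ORs are supplied in place of RRs.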

### The ICR in a case-only study and the ICR in a study with cases and non-cases

Table 2 illustrates the composition of a study with cases and non-cases. To generate case-only data from the above source population, we extracted only the ‘case’ column in Table 3.

The ICR in a case-only study will be as follows:

$$\mathrm{ICR}_{\mathrm{co}}=\frac{\left[\frac{\left\{\frac{\mathrm{a}}{\mathrm{a}+\mathrm{e}}\right\}}{\left\{\frac{\mathrm{e}}{\mathrm{a}+\mathrm{e}}\right\}}\right]}{\left[\frac{\left\{\frac{\mathrm{c}}{\mathrm{c}+\mathrm{g}}\right\}}{\left\{\frac{\mathrm{g}}{\mathrm{c}+\mathrm{g}}\right\}}\right]}=\left(\frac{\mathrm{ag}}{\mathrm{ce}}\right)$$

(1)

The ICR in a study with cases and non-cases will be as follows:

$$\mathrm{ICR}_{\mathrm{c/nc}}=\frac{\mathrm{RR}_{\mathrm{se}}}{\mathrm{RR}_{\mathrm{s}}\,\mathrm{RR}_{\mathrm{e}}}=\left(\frac{\mathrm{ag}}{\mathrm{ce}}\right)\left(\frac{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}\right)=\left(\mathrm{ICR}_{\mathrm{co}}\right)\left(\frac{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}\right)$$

(2)

In Eq. (2), (ag/ce) is replaced by the ICR_{co} obtained in the case-only study, and ICR_{c/nc} is the ICR calculated in a study with cases and non-cases. From Eq. (2), the ICR acquired from a study with cases and non-cases equals the ICR acquired from the case-only study when the following requirement is met:

$$\left(\frac{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}\right)=\text{S-E OR}_{\mathrm{c/nc}}=1$$

(3)

Equation (3) means that the ICR acquired from a study with cases and non-cases equals the ICR acquired from the case-only study only if the environmental exposure and the susceptibility factor are independent in the study with cases and non-cases. Note from Eqs. (2) and (3) that this equality does not require a rare disease assumption (a low prevalence of the disease).
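Equations (1) to (3) can be checked on a small table. The sketch below uses hypothetical counts (not data from this study) laid out with the cell labels of Eqs. (1) and (2): lowercase letters are cases and uppercase letters are non-cases.

```python
from math import isclose

# Hypothetical 2x4 table:
#   S+E+: a, B    S+E-: c, D    S-E+: e, F    S-E-: g, H
a, c, e, g = 150, 20, 30, 10
B, D, F, H = 850, 980, 970, 990


def risk(cases: float, non_cases: float) -> float:
    """Proportion of a cell that are cases."""
    return cases / (cases + non_cases)


# Case-only estimate, Eq. (1).
icr_co = (a * g) / (c * e)

# Full-population estimate on the multiplicative RR scale, Eq. (2).
rr_se = risk(a, B) / risk(g, H)
rr_s = risk(c, D) / risk(g, H)
rr_e = risk(e, F) / risk(g, H)
icr_c_nc = rr_se / (rr_s * rr_e)

# Factor that must equal 1 for the two estimates to agree, Eq. (3).
factor = ((c + D) * (e + F)) / ((a + B) * (g + H))

print(icr_co, icr_c_nc, factor)
assert isclose(icr_c_nc, icr_co * factor)
```

In this illustrative table every S-E cell holds 1,000 people, so the factor in Eq. (3) is 1 and the two estimates agree, even though the disease risk in the S+E+ cell is 15% and the disease is therefore not rare.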

The above equations in this subsection can be understood from the context of a logistic model, with other covariates adjusted. The following equations indicate a conventional logistic regression model for a case-only study:

$$\mathrm{logit}\ \mathrm{P}\left(\mathrm{S}=1\right)=\gamma_0+\gamma_1\mathrm{E}$$

(4)

$$\mathrm{ICR}_{\mathrm{co}}=\exp\left(\gamma_1\right)$$

(5)

When E is a categorical or continuous variable for environmental exposure status, a case-only estimate for the interaction effect can be obtained using Eq. (5).
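For a binary E with no other covariates, the model in Eq. (4) can be fitted with a few Newton-Raphson steps, and exp(γ_1) then reproduces the cross-product ag/ce of Eq. (1). The sketch below is a hand-rolled fit for illustration only; the function name and the counts are hypothetical, and in practice a standard logistic regression routine would be used.

```python
from math import exp, isclose


def fit_case_only_logit(s1_e0, n_e0, s1_e1, n_e1, iterations=50):
    """Fit logit P(S=1) = g0 + g1*E among cases, for binary E.

    s1_e0 / n_e0: susceptible cases / all cases among the unexposed (E=0).
    s1_e1 / n_e1: susceptible cases / all cases among the exposed (E=1).
    """
    g0 = g1 = 0.0
    for _ in range(iterations):
        p0 = 1.0 / (1.0 + exp(-g0))
        p1 = 1.0 / (1.0 + exp(-(g0 + g1)))
        # Score vector (gradient of the log-likelihood).
        u0 = (s1_e0 - n_e0 * p0) + (s1_e1 - n_e1 * p1)
        u1 = s1_e1 - n_e1 * p1
        # Fisher information matrix of the 2-parameter model.
        w0 = n_e0 * p0 * (1.0 - p0)
        w1 = n_e1 * p1 * (1.0 - p1)
        i00, i01, i11 = w0 + w1, w1, w1
        det = i00 * i11 - i01 * i01
        # Newton step: parameters += inverse(information) @ score.
        g0 += (i11 * u0 - i01 * u1) / det
        g1 += (-i01 * u0 + i00 * u1) / det
    return g0, g1


# Hypothetical case counts: a (S+E+), c (S+E-), e (S-E+), g (S-E-).
a, c, e, g = 150, 20, 30, 10
gamma0, gamma1 = fit_case_only_logit(s1_e0=c, n_e0=c + g, s1_e1=a, n_e1=a + e)

icr_co = exp(gamma1)  # Eq. (5)
assert isclose(icr_co, (a * g) / (c * e))
```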

We can also assess the independence between an environmental factor and a susceptibility factor in a study with cases and non-cases from the context of a logistic model using the following equations:

$$\mathrm{logit}\ \mathrm{P}\left(\mathrm{S}=1\right)=\eta_0+\eta_1\mathrm{E}$$

(6)

$$\text{S-E OR}_{\mathrm{c/nc}}=\exp\left(\eta_1\right)$$

(7)

According to the independence assumption in Eq. (3), the ICR obtained in the population with cases and non-cases equals the ICR obtained in the case-only study only if the environmental exposure and the susceptibility factor are independent in the population with cases and non-cases. In the context of a logistic model, this means that the confidence interval for the estimate in Eq. (7) must include 1 and that its point estimate must be close to 1.

We can also calculate the ICR obtained in the population with cases and non-cases from the context of a logistic model, using the following equation:

$$\mathrm{logit}\ \mathrm{P}\left(\mathrm{D}=1\right)=\beta_0+\beta_1\mathrm{S}+\beta_2\mathrm{E}+\beta_3\mathrm{SE}$$

(8)

$$\mathrm{ICR}_{\mathrm{c/nc}}=\exp\left(\beta_3\right)$$

(9)

### The ICR in a case-control study

We can define the susceptibility-environment ICR acquired from a case-control study in the model with the multiplicative scale as follows:

$$\mathrm{ICR}_{\mathrm{cc}}=\mathrm{OR}_{\mathrm{se}}/\left(\mathrm{OR}_{\mathrm{s}}\times\mathrm{OR}_{\mathrm{e}}\right)$$

(10)

ICR_{cc}: the ICR calculated in a case-control study.

ICR_{cc} > 1: The joint OR is larger than the product of each individual OR.

ICR_{cc} < 1: The joint OR is smaller than the product of each individual OR.

ICR_{cc} = 1: The joint OR is the same as the product of each individual OR.

If the joint OR is larger than the product of each individual OR, the ICR_{cc} will be larger than 1. If the joint OR is smaller than the product of each individual OR, the ICR_{cc} will be smaller than 1. If the joint OR is the same as the product of each individual OR, the ICR_{cc} will be 1.

### The ICR in a case-only study and the ICR in a case-control study

For the generation of the case-control study data, a fraction (p) of controls in each group was selected from the population with cases and non-cases in Table 4.

The ICR in a case-control study can be calculated as follows:

$$\mathrm{ICR}_{\mathrm{cc}}=\frac{\mathrm{OR}_{\mathrm{se}}}{\mathrm{OR}_{\mathrm{s}}\,\mathrm{OR}_{\mathrm{e}}}=\left(\frac{\mathrm{ag}}{\mathrm{ce}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)=\left(\mathrm{ICR}_{\mathrm{co}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)$$

(11)

In Eq. (11), the requirement for equality between ICR_{cc} and ICR_{co} is as follows:

$$\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)=\frac{\mathrm{df}}{\mathrm{bh}}=\text{S-E OR}_{\mathrm{control}}=1$$

(12)

Equation (12) means that for the equality between ICR_{cc} and ICR_{co}, the susceptibility factor and environmental exposure must be independent in the control population. A rare disease assumption is also not required for this equality.
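The relationship in Eqs. (11) and (12) can be sketched as follows. The counts and the sampling fractions are hypothetical, and the control counts are expected values (p times each non-case cell) rather than a random sample.

```python
from math import isclose

# Hypothetical source population: cases a, c, e, g and non-cases B, D, F, H.
a, c, e, g = 150, 20, 30, 10
B, D, F, H = 850, 980, 970, 990


def icr_case_control(p: float) -> float:
    """ICR_cc when a fraction p of non-cases is sampled as controls."""
    b, d, f, h = p * B, p * D, p * F, p * H  # expected control counts
    or_se = (a * h) / (b * g)
    or_s = (c * h) / (d * g)
    or_e = (e * h) / (f * g)
    return or_se / (or_s * or_e)


icr_co = (a * g) / (c * e)
control_factor = (D * F) / (B * H)  # DF/BH in Eqs. (11) and (12)

for p in (0.05, 0.2, 1.0):
    # The sampling fraction cancels, so ICR_cc does not depend on p.
    assert isclose(icr_case_control(p), icr_co * control_factor)

print(icr_co, control_factor, icr_case_control(0.2))
```

With these illustrative counts DF/BH is about 1.13, so ICR_{cc} differs from ICR_{co} whatever sampling fraction is used; the two coincide only when DF/BH equals 1.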

We can also calculate the ICR in a case-control study from the context of a logistic model, using the following equation:

$$\mathrm{logit}\ \mathrm{P}\left(\mathrm{D}=1\right)=\beta_0+\beta_1\mathrm{S}+\beta_2\mathrm{E}+\beta_3\mathrm{SE}$$

(13)

$$\mathrm{ICR}_{\mathrm{cc}}=\exp\left(\beta_3\right)$$

(14)

### The ICR in a study with cases and non-cases and the ICR in a case-control study

The equality between ICR_{cc} and ICR_{co} does not guarantee that these 2 estimates are unbiased for the ICR acquired from the population with cases and non-cases (ICR_{c/nc}). Based on Eqs. (2) and (11), we can get the following equation:

$$\mathrm{ICR}_{\mathrm{cc}}=\mathrm{ICR}_{\mathrm{c/nc}}\,\frac{\left(\mathrm{DF}\right)}{\left(\mathrm{BH}\right)}\left(\frac{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}\right)$$

(15)

In Eq. (15), for the equality between ICR_{cc} and ICR_{c/nc}, the following equation or at least 1 of 2 conditions suggested below should be met:

$$\begin{array}{c}\frac{\left(\mathrm{DF}\right)}{\left(\mathrm{BH}\right)}\left(\frac{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}\right)=1\\ \left[\text{S-E OR}_{\mathrm{control}}=\text{S-E OR}_{\mathrm{c/nc}}=1\right]\ \text{or}\ \left[\text{the disease is rare}\right]\end{array}$$

(16)

Equation (16) means that for the equality between ICR_{cc} and ICR_{c/nc}, the susceptibility factor and the environmental exposure must be independent both in the population with cases and non-cases and in the controls. Alternatively, if the disease is rare, Eq. (16) will be satisfied. In this case, the rare disease assumption must be examined in the population with cases and non-cases.
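The rare disease condition can be illustrated by evaluating the multiplier in Eq. (15) at several baseline risks. The sketch below assumes a hypothetical population with 1,000 people in each S-E cell and hypothetical relative risks; the helper names are illustrative.

```python
def eq15_factor(a, c, e, g, B, D, F, H):
    """Multiplier linking ICR_cc to ICR_c/nc in Eq. (15)."""
    return (D * F) / (B * H) * ((a + B) * (g + H)) / ((c + D) * (e + F))


def table_from_risks(n_per_cell, p0, rr_s, rr_e, rr_se):
    """Expected counts (a, c, e, g, B, D, F, H) for equal-sized S-E cells."""
    risks = {"se": p0 * rr_se, "s": p0 * rr_s, "e": p0 * rr_e, "ref": p0}
    cases = {key: n_per_cell * r for key, r in risks.items()}
    non_cases = {key: n_per_cell - n for key, n in cases.items()}
    return (cases["se"], cases["s"], cases["e"], cases["ref"],
            non_cases["se"], non_cases["s"], non_cases["e"], non_cases["ref"])


# Hypothetical effects: RR_s = 2, RR_e = 3, RR_se = 15 (multiplicative ICR 2.5).
for p0 in (0.05, 0.01, 0.001):
    factor = eq15_factor(*table_from_risks(1000, p0, 2.0, 3.0, 15.0))
    print(f"baseline risk {p0:.3f}: Eq. (15) factor = {factor:.4f}")
```

Under these assumptions the multiplier moves towards 1 as the baseline risk falls, which is the sense in which a rare disease satisfies Eq. (16).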

### S-E independence in the population with cases and non-cases and S-E independence in the controls: one cannot replace the other

If we evaluate Eq. (16) in detail, we can find an important relationship. The S-E independence in the controls is a totally different concept from the S-E independence in the population with cases and non-cases: one cannot replace the other.

$$\mathrm{ICR}_{\mathrm{cc}}=\mathrm{ICR}_{\mathrm{co}}=\mathrm{ICR}_{\mathrm{c/nc}}$$

(17)

For the first equal sign, S-E OR_{control} = 1 is required according to Eq. (11).

For the second equal sign, S-E OR_{c/nc} = 1 is required according to Eq. (2).

If the disease is rare, \(\mathrm{ICR}_{\mathrm{cc}}=\left(\mathrm{ICR}_{\mathrm{co}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)\) according to Eq. (11), and \(\mathrm{ICR}_{\mathrm{c/nc}}\approx\left(\mathrm{ICR}_{\mathrm{co}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)\) according to Eq. (2), because each cell total is then dominated by its non-cases (for example, a + B ≈ B).

$$\text{Therefore, }\mathrm{ICR}_{\mathrm{cc}}=\mathrm{ICR}_{\mathrm{c/nc}}\ne\mathrm{ICR}_{\mathrm{co}}$$

(18)

If a researcher checks whether S-E OR_{control} equals 1, rather than whether S-E OR_{c/nc} equals 1, to judge whether ICR_{co} can validly replace ICR_{c/nc}, the result can be either the mistaken rejection of a valid ICR_{co} or the mistaken acceptance of an invalid one.

In Supplementary material B, 2 examples from Gatto et al. [8] are provided for this problem. In the first example, S and E are independent in the population including cases and non-cases (S-E OR_{c/nc} = 1). The interaction estimate in this population (i.e., ICR_{c/nc}) is 2.5, and the ICR_{co} is also 2.5. In this situation, the S-E OR_{control} of 0.7 is not a reliable estimate of the S-E OR_{c/nc} of 1.0. In the second example, the S-E OR_{c/nc} is 2.0, showing that S and E are not independent. The ICR_{c/nc} is 1.0, but the ICR_{co} is 2.0. In this situation, the S-E OR_{control} of 1.0 is not a reliable estimate of the S-E OR_{c/nc} of 2.0.

### The rare disease assumption: for ICR_{cc} = ICR_{c/nc} and S-E OR_{control} = S-E OR_{c/nc}

The rare disease assumption provides 2 implications in this discussion of the case-only approach. The first implication is provided in Eq. (18). The second implication is the following:

\(\left(\frac{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}\right)=\) S-E OR_{c/nc} from Eq. (3), and \(\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)=\frac{\mathrm{df}}{\mathrm{bh}}=\) S-E OR_{control} from Eq. (12).

$$\text{If the disease is rare, }\text{S-E OR}_{\mathrm{c/nc}}=\text{S-E OR}_{\mathrm{control}}=\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)$$

(19)

In this subsection, we will deal with the second implication. Equation (20) indicates the relationship between S-E OR_{control} and S-E OR_{c/nc} [8].

$$\text{S-E OR}_{\mathrm{control}}=\text{S-E OR}_{\mathrm{c/nc}}\times\left(\frac{\left(\frac{1}{\mathrm{p}\left(\mathrm{D}\mid\mathrm{S{-}E{-}}\right)}-1\right)\times\left(\frac{1}{\mathrm{p}\left(\mathrm{D}\mid\mathrm{S{-}E{-}}\right)}-\mathrm{RR}_{\mathrm{SE}}\right)}{\left(\frac{1}{\mathrm{p}\left(\mathrm{D}\mid\mathrm{S{-}E{-}}\right)}-\mathrm{RR}_{\mathrm{S}}\right)\times\left(\frac{1}{\mathrm{p}\left(\mathrm{D}\mid\mathrm{S{-}E{-}}\right)}-\mathrm{RR}_{\mathrm{E}}\right)}\right)$$

(20)

Gatto et al. [8] used Eq. (20) to conduct a sensitivity analysis (Supplementary material C). They assessed the impact of the baseline risk of disease in the population (p(D|S-E-)) and the independent effect of S (RR_{S}) on the S-E OR_{control} when the S-E OR_{c/nc} is 1.0. In Supplementary material C, the baseline risk of disease ranges from 0.1% to 6%. As illustrated there, the S-E OR_{control} is similar to the S-E OR_{c/nc} of 1.0 when the baseline risk of disease (p(D|S-E-)) is under 1% and the independent effect of S is relatively low (RR_{S} < 2.5). However, as the baseline risk of disease approaches 3%, the S-E OR_{control} begins to diverge from the S-E OR_{c/nc} of 1.0. The divergence grows as the independent effect of the susceptibility factor increases.
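A sensitivity analysis of this kind can be sketched by evaluating Eq. (20) over a grid. The grid values, RR_{E} = 2 and a multiplicative ICR of 1.5 below are assumptions made for illustration; they do not reproduce the scenarios in Supplementary material C.

```python
def se_or_control(p0, rr_s, rr_e, rr_se, se_or_c_nc=1.0):
    """S-E OR among controls from Eq. (20).

    p0 is the baseline risk p(D | S-, E-); rr_s, rr_e and rr_se are the
    relative risks for S alone, E alone and both, versus the S-E- group.
    """
    inv = 1.0 / p0
    ratio = ((inv - 1.0) * (inv - rr_se)) / ((inv - rr_s) * (inv - rr_e))
    return se_or_c_nc * ratio


# Hypothetical scenario: S and E independent in the whole population
# (S-E OR_c/nc = 1), RR_e = 2 and RR_se = 1.5 * RR_s * RR_e.
rr_e, icr = 2.0, 1.5
for p0 in (0.001, 0.01, 0.03, 0.06):
    for rr_s in (1.5, 2.5, 4.0):
        rr_se = icr * rr_s * rr_e
        value = se_or_control(p0, rr_s, rr_e, rr_se)
        print(f"p0={p0:.3f}  RR_s={rr_s:.1f}  S-E OR_control={value:.3f}")
```

In this hypothetical grid the S-E OR_{control} stays close to 1 at the lowest baseline risk and moves away from 1 as the baseline risk or RR_{S} increases.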

### Violation of independence: confounder and subpopulation dependence

The violation of independence between S and E occurs when an individual alters his or her environmental exposure according to his or her susceptibility factor. This violation is mainly due to 2 factors: (i) a confounder and (ii) subpopulation dependence.

Gatto et al. [8] provide 2 examples of confounders. In the first example of Supplementary material D, family history functions as a confounder, and in the second example, the adverse reaction to alcohol functions as a mediator between the susceptibility factor and the environmental exposure. In both examples, a positive multiplicative interaction (ICR_{co} > 1) will be biased towards the null (ICR_{co} ≈ 1) because the covariate (C) induces an overall negative association between S and E.

If these covariates can be adjusted, the independence between S and E can be restored.

$$\mathrm{logit}\ \mathrm{P}\left(\mathrm{S}=1\right)=\gamma_0^{\prime}+\gamma_1^{\prime}\mathrm{E}+\gamma_2^{\prime}\mathrm{C}$$

(21)

$$\text{adjusted }\mathrm{ICR}_{\mathrm{co}}\ \left(\text{adjusted for covariate C}\right)=\exp\left(\gamma_1^{\prime}\right)$$

(22)

However, caution is required because adjusting for covariates that are unrelated to the S-E dependence costs degrees of freedom and reduces the precision of the ICR_{co} [8].

Another source of the violation of independence is a hidden dependence on a subpopulation. Wang et al. [9] provide a unique solution for this problem, using the following equations:

$$\mathrm{CIR}=\mathrm{r}_{\mathrm{SE}}\times\mathrm{CV}_{\mathrm{S}}\times\mathrm{CV}_{\mathrm{E}}+1$$

(23)

CIR: Confounding Interaction Ratio. r_{SE}: the correlation coefficient between S and E. CV_{S}: variation in susceptibility factor prevalence odds. CV_{E}: variation in environmental exposure prevalence odds.

$$\mathrm{CIR}_{\mathrm{U}}=\frac{\sqrt{\upsilon_{\mathrm{S}}\upsilon_{\mathrm{E}}}\times\left(\sqrt{\upsilon_{\mathrm{S}}\upsilon_{\mathrm{E}}}+1\right)^2}{\left(\sqrt{\upsilon_{\mathrm{S}}\upsilon_{\mathrm{E}}}+\upsilon_{\mathrm{S}}\right)\left(\sqrt{\upsilon_{\mathrm{S}}\upsilon_{\mathrm{E}}}+\upsilon_{\mathrm{E}}\right)}\ge 1,\quad\mathrm{CIR}_{\mathrm{L}}=\frac{1}{\mathrm{CIR}_{\mathrm{U}}}\le 1$$

(24)

CIR_{U}: the upper bound of CIR, CIR_{L}: the lower bound of CIR, *υ*_{S}(*υ*_{S} ≥ 1): the ratio of the largest and the smallest susceptibility frequency odds across all strata. *υ*_{E}(*υ*_{E} ≥ 1): the ratio of the largest and the smallest exposure frequency odds across all strata.

In Eq. (23), CIR is the ratio of the crude ICR_{c/nc} without stratification to the ICR_{c/nc} with stratification. According to Eq. (23), there would be no population stratification bias (CIR = 1) if (i) the exposure prevalence odds and the susceptibility frequency odds are uncorrelated across all strata (r_{SE} = 0), (ii) no variation exists in the exposure prevalence odds (CV_{E} = 0), or (iii) no variation exists in the susceptibility frequency odds (CV_{S} = 0).

In Eq. (24), *υ*_{S}(*υ*_{S} ≥ 1) denotes the ratio of the largest to the smallest susceptibility frequency odds, and *υ*_{E}(*υ*_{E} ≥ 1) denotes the ratio of the largest to the smallest exposure prevalence odds across all the strata in the population. If there is no variation in either the susceptibility frequency odds (*υ*_{S} = 1) or the exposure prevalence odds (*υ*_{E} = 1), there would be no bias (CIR_{U} = CIR_{L} = 1) according to Eq. (24). If we can calculate CIR for a population, we can obtain the ICR_{c/nc} with stratification by dividing the crude ICR_{c/nc} by CIR.
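Equations (23) and (24) are simple enough to evaluate directly. The following sketch uses hypothetical inputs, and the function names are illustrative rather than taken from Wang et al. [9].

```python
from math import isclose, sqrt


def cir(r_se: float, cv_s: float, cv_e: float) -> float:
    """Confounding interaction ratio, Eq. (23)."""
    return r_se * cv_s * cv_e + 1.0


def cir_bounds(v_s: float, v_e: float):
    """(lower, upper) bounds of CIR from Eq. (24); requires v_s, v_e >= 1."""
    root = sqrt(v_s * v_e)
    upper = root * (root + 1.0) ** 2 / ((root + v_s) * (root + v_e))
    return 1.0 / upper, upper


# No correlation between the S and E odds across strata: no bias.
assert cir(r_se=0.0, cv_s=0.5, cv_e=0.5) == 1.0

# No variation in the susceptibility odds (v_s = 1): both bounds equal 1.
lower, upper = cir_bounds(1.0, 4.0)
assert isclose(lower, 1.0) and isclose(upper, 1.0)

# Hypothetical 4-fold variation in both odds across strata.
lower, upper = cir_bounds(4.0, 4.0)
print(lower, upper)

# Stratified ICR recovered from a hypothetical crude ICR and CIR.
crude_icr = 2.4
print(crude_icr / cir(r_se=0.5, cv_s=0.4, cv_e=0.5))
```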

When S-E independence is violated, researchers usually try to identify a potential confounder based on their subject-matter knowledge. For subpopulation dependence, however, attention should be paid to the whole study population and its strata rather than to finding a confounder. Researchers using a case-only approach should keep this important difference in mind.

### The efficiency gained from the case-only approach

The case-only approach can yield a more precise interaction effect estimate (i.e., one with a narrower confidence interval) than a study design with cases and non-cases, such as a cohort or case-control study [16].

Using the notation of Eqs. (8) and (9) and Table 2, the asymptotic variance of \(\hat{\beta}_3\) in a population with cases and non-cases is as follows:

$$\mathrm{Var}\left(\hat{\beta}_3\right)=\frac{1}{a}+\frac{1}{B}+\frac{1}{c}+\frac{1}{D}+\frac{1}{e}+\frac{1}{F}+\frac{1}{g}+\frac{1}{H}$$

(25)

Using the notation of Eqs. (13) and (14) and Table 4, the asymptotic variance of \(\overline{\overline{\beta}}_3\) in a case-control study is as follows:

$$\mathrm{Var}\left(\overline{\overline{\beta}}_3\right)=\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}+\frac{1}{e}+\frac{1}{f}+\frac{1}{g}+\frac{1}{h}$$

(26)

Using the notation of Eqs. (4) and (5) and Table 3, the asymptotic variance of \(\hat{\gamma}_1\) in a case-only study is as follows:

$$\mathrm{Var}\left(\hat{\gamma}_1\right)=\frac{1}{a}+\frac{1}{c}+\frac{1}{e}+\frac{1}{g}$$

(27)

Comparing Eq. (27) with Eqs. (25) and (26), the case-only design provides an estimate with a narrower confidence interval than either the case-control or the cohort design (study designs with cases and non-cases), because Eq. (27) omits the terms contributed by the non-cases. This efficiency gain comes from the independence assumption between the susceptibility factor and the environmental exposure (S-E OR_{c/nc} = 1).
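The comparison of Eqs. (25) to (27) can be made concrete with a short calculation. The counts and the control sampling fraction below are hypothetical, and the interval is a standard Wald interval on the log scale, which is an assumption of this sketch rather than a method stated above.

```python
from math import exp, log, sqrt

a, c, e, g = 150, 20, 30, 10      # cases
B, D, F, H = 850, 980, 970, 990   # non-cases in the full population
p = 0.1                           # hypothetical control sampling fraction
b, d, f, h = p * B, p * D, p * F, p * H


def var_log_icr(*cells: float) -> float:
    """Sum of reciprocal cell counts, as in Eqs. (25)-(27)."""
    return sum(1.0 / n for n in cells)


var_full = var_log_icr(a, B, c, D, e, F, g, H)  # Eq. (25)
var_cc = var_log_icr(a, b, c, d, e, f, g, h)    # Eq. (26)
var_co = var_log_icr(a, c, e, g)                # Eq. (27)


def ci95(estimate: float, variance: float):
    """Wald 95% confidence interval computed on the log scale."""
    half_width = 1.96 * sqrt(variance)
    return exp(log(estimate) - half_width), exp(log(estimate) + half_width)


icr_co = (a * g) / (c * e)
print("case-only variance:", var_co, "CI:", ci95(icr_co, var_co))
print("full-population variance:", var_full)
print("case-control variance:", var_cc)
assert var_co < var_full < var_cc
```

Because the case-only variance contains only the four case terms, it is smaller than the other two for any positive non-case counts.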

### Methodological issues to be considered

Several issues must be considered when applying the case-only approach to estimating the interaction effect between a susceptibility factor and an environmental exposure. Firstly, cases must be selected according to the same rules that apply to case selection in a case-control study. Secondly, before substituting the ICR_{co} calculated in a case-only design for the ICR_{c/nc} calculated in a population with cases and non-cases, researchers must verify independence between the susceptibility trait and the environmental exposure in the population with cases and non-cases (according to Eqs. (2) and (3)). If evidence of an association between the susceptibility factor and the environmental exposure exists, the ICR_{co} must be corrected by multiplying it by the calculated S-E OR_{c/nc}, as provided in Eq. (2). Thirdly, the independence assumption might seem reasonable for various susceptibility factors and environmental exposures. However, some susceptibility factors can modify the likelihood of environmental exposure. Such a hidden association must be discovered before a case-only approach is applied. Finally, the interaction effect estimate (ICR_{co}) obtained from the case-only approach can only be interpreted as a departure from the multiplicative effect and not from the additive effect. According to previous epidemiologic literature, additive interaction corresponds more closely to mechanistic biologic interaction than to merely statistical interaction [17, 18]. Even so, researchers often use the multiplicative scale to estimate interaction effects for several practical reasons [18]. This limitation should be considered when the results of this study are applied.

### Summary

In summary, the case-only approach can be applied successfully in environmental epidemiology when a susceptibility factor and an environmental exposure are independent in a population with cases and non-cases. Under this condition, the approach yields a more precise interaction effect estimate.