The relevance of case-control studies when working with cohort data
Can we use them in different settings?
When a researcher has data from a defined cohort of individuals, such as data from an ongoing multicenter study or a disease registry, when does it make sense to conduct a case-control study? We’re taught in our classes that a case-control study is an efficient analog of a cohort study that requires less time and data, and that it is sensible to use when the outcome is “rare”. That is a reasonable justification, as it could be unrealistic to wait 4 or 5 years for a few individuals to develop the outcome in order to conduct a study that addresses an urgent research gap. But what if we have the full cohort (e.g., all individuals with cystic fibrosis or a particular cancer) at our disposal? Could we leverage this cohort with a case-control design to validly answer some of our pressing questions?
Here are a few scenarios where a case-control study probably makes more sense than the very widely used retrospective or prospective cohort study.
First, a quick definition of a case-control study: the hallmark of a case-control study is how you select the people included in your study. In cohort studies, we select people based on whether they are exposed or non-exposed (e.g., did they get the treatment or not), and then we follow these patients for a period of time and measure the incidence of disease, or some other outcome such as a continuous measure (e.g., lung function), in each of these groups.
In case-control studies, we go backward. We select people based on whether they have the disease or not and then compare the two groups on one or more exposures (or risk factors or characteristics). This is why the case-control study is considered more efficient: it avoids the need for follow-up time and focuses our efforts where all the money is, on the “cases”. However, the validity of the design and its associated inferences depends on how the controls are selected, which is a formidable task. For this post, I am assuming that the controls are well chosen from the same source population that gave rise to the cases, so that the study mimics a cohort design.
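To make the "compare the two groups on an exposure" step concrete, here is a minimal sketch of the usual summary for an unmatched case-control comparison, the exposure odds ratio from a 2x2 table. The function name and the counts are made up for illustration and do not come from any real study.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from a 2x2 table (hypothetical helper).

    Odds of exposure among cases divided by odds of exposure among controls.
    """
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Illustrative counts only: 100 cases (30 exposed) and 100 controls (15 exposed).
estimate = odds_ratio(30, 70, 15, 85)
print(round(estimate, 2))
```

An odds ratio above 1 means the exposure is more common among cases than among controls. A real analysis would also report a confidence interval and adjust for confounders.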
In many instances, we already have cohort data available, meaning we do not really need to conduct a prospective study and wait for the outcomes to develop. This raises the question: why don’t we simply analyze the entire cohort rather than limiting our analysis to the cases and dealing with selection bias issues related to the choice of controls?
It is true that you can simply analyze the entire cohort for some research queries, but what if you need to collect additional data? For example, say we are interested in a particular biomarker and its relationship with diabetes. We do not have biomarker data in our cohort of 1000 patients, but we do have data on who developed diabetes and who did not. It is not feasible, and would be very expensive, to get 1000 people into the hospital for blood samples to obtain biomarker data. Alternatively, what if we are interested in the relationship between age at initiation of treatment and clinical outcomes, but the current cohort is missing some important confounders that we know are needed to estimate the association accurately?
This is exactly the setting in which a case-control study design can be of great assistance to researchers. Rather than collecting blood for the entire cohort or abstracting confounder data from thousands of charts, we can focus on the cases and create a good control group to answer our research question more quickly and efficiently. A task that would take lots of money and time (and would probably not be pursued as a result) can be done quite easily with a case-control study. The correct nomenclature for these variants of the case-control design is the nested case-control or case-cohort design. Dr. Kenneth Rothman also has a nice commentary about the history of case-control studies in the American Journal of Epidemiology (https://pubmed.ncbi.nlm.nih.gov/28535170/) that is a good read.
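As a rough illustration of the savings, here is a minimal sketch that keeps every case from an existing cohort and draws a random sample of non-cases as controls. Everything in it is hypothetical: the cohort is synthetic, the 80 diabetes cases are an invented number, and `sample_case_control` is a made-up helper. It is also deliberately simplified. A proper nested case-control design samples controls from those still at risk at each case's event time, and a case-cohort design samples a subcohort from the full cohort regardless of outcome.

```python
import random


def sample_case_control(cohort, controls_per_case=2, seed=0):
    """Keep all cases and randomly sample non-cases as controls.

    cohort: list of dicts with keys "id" and "case" (bool).
    Simplified sketch: controls are drawn from people who never became
    cases, not from risk sets at each case's event time.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    cases = [p for p in cohort if p["case"]]
    non_cases = [p for p in cohort if not p["case"]]
    n_controls = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, n_controls)  # sampling without replacement
    return cases, controls


# Synthetic cohort of 1000 patients; assume (for illustration) 80 developed diabetes.
cohort = [{"id": i, "case": i < 80} for i in range(1000)]
cases, controls = sample_case_control(cohort, controls_per_case=2)

# Biomarker data would be needed for this many people instead of all 1000.
print(len(cases) + len(controls))
```

With two controls per case, the blood draws drop from 1000 people to 240 in this made-up example. The number of controls per case is a design choice that trades cost against precision.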
Conclusion
Case-control designs are useful when we already have cohort data available and need to collect more data. Rather than using all of our time and money to collect data on a large number of people, we can first conduct a smaller-scale study that combines the methodological soundness of a cohort design, which limits selection bias, with the efficiency of a case-control approach to answer our research questions. I am sure there are other ways to use the case-control design, so please comment below!