Descriptive epidemiologic studies
What are they and are they useful?
One of the first functions of an epidemiologist (someone who studies diseases, NOT a skin doctor!) is to describe disease distribution by characteristics relating to person (e.g., age, gender, race, education, employment, income), place (e.g., census tracts, neighbourhoods, cities, provinces, countries, schools and workplaces, rural or urban settings), or time (e.g., secular trends, temporal patterns, timing of diagnosis). Although it seems simple to analyze, description is a truly under-appreciated aspect of epidemiology and, frankly, of many other fields of inquiry as well.
We first want to answer basic questions about the health problem or disease: who has the disease, where does it occur, and when does it occur? Only then should we delve into the “why” did the disease occur question, an incredibly complicated question that requires additional tools and assumptions and is probably not worth studying until these basic questions are understood.
From a historical perspective, the first known health-related descriptive study was done in 1662 by John Graunt, who was actually a merchant by trade but is considered the first epidemiologist, statistician, and demographer (for my epi friends: John Snow who?). In his landmark book, Natural and Political Observations Made upon the Bills of Mortality, he examined the causes of death in London and calculated the world’s first documented life table, showing how many members of a given cohort survived in each successive decade. This provided useful information to governments (at that time, it was useful for insurance purposes).
It is also important to clarify the goal of our descriptive study so that we do not conflate description with causal effect estimation. A great discussion initiated by Dr. Eleanor Murray on Twitter (@EpiEllie) emphasized this point (link).
For instance, if our goal is to describe disparities in opioid mortality in Canada, we should stratify opioid mortality by the characteristics along which we think disparities may exist: maybe race, gender, location, opioid type, etc. If we want to generate hypotheses about a health problem, we should stratify the problem by potential upstream factors. For example, if we think diabetes in people with cystic fibrosis leads to a higher mortality rate, we would stratify the mortality rate by diabetes status and compare the magnitude of the difference between people with and without diabetes (both relative and absolute).
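To make the relative and absolute comparison concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Stratum` class, the `compare_rates` helper, and the counts are made up and are not real cystic fibrosis data. It computes a crude mortality rate in each stratum, then the rate ratio (relative difference) and the rate difference (absolute difference).

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Deaths and follow-up time for one group (hypothetical inputs)."""
    deaths: int
    person_years: float

    def rate(self, per: float = 1000.0) -> float:
        """Crude mortality rate per `per` person-years."""
        return self.deaths * per / self.person_years


def compare_rates(group_a: Stratum, group_b: Stratum, per: float = 1000.0) -> dict:
    """Describe two strata: their crude rates, the rate ratio, and the rate difference."""
    rate_a = group_a.rate(per)
    rate_b = group_b.rate(per)
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "rate_ratio": rate_a / rate_b,       # relative difference
        "rate_difference": rate_a - rate_b,  # absolute difference
    }


# Made-up numbers for illustration only.
with_diabetes = Stratum(deaths=30, person_years=2000.0)
without_diabetes = Stratum(deaths=40, person_years=8000.0)

result = compare_rates(with_diabetes, without_diabetes)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

These are crude, unadjusted numbers. They describe how mortality differs between the two groups in this made-up dataset and say nothing about whether diabetes is the reason for the difference.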
If we find that people with diabetes tend to have a higher mortality rate than those without, we SHOULD NOT interpret these findings as definitive and claim causality between diabetes and mortality. Why not? Well, because our study was not designed and analyzed to estimate causal effects in the first place! Avoiding causal language in descriptive studies is key to preventing confusion and misinterpretation of study findings and implications. In general, wording around “reducing” diabetes to change mortality rates, or around changing the level of any exposure X to change an outcome Y (“X impacts Y”, “X affects Y”, “X leads to Y”, “X increases Y”, “X improves Y”), implies causality. Dr. Hernán has a great (article) on this topic.
The next study (assuming the data in the current study are too limited for more sophisticated analyses that adjust for confounding factors) would examine this descriptive association between diabetes and mortality in a more nuanced manner, adjusting for potential confounding variables chosen on the basis of theory and substantive knowledge. That opens a whole new can of worms (methods, tools, and assumptions), probably for another blog.
Are descriptive studies useful?
Many people (journals included!) are generally skeptical about the usefulness and importance of descriptive studies. So, why do we need to conduct descriptive studies? And why is it important to describe disease distribution by person, place, and time? This is a deep question worthy of a longer discussion; the answer is probably very contextual, depending on one’s research area, and is influenced by centuries of epidemiologic theory and reasoning. However, I will provide some basic “goals” of descriptive epidemiology from a population health perspective, based on my own experience:
- We can quantify the magnitude of diseases or problems: how “big” is the problem? Maybe the problem or disease isn’t really a problem or, on the other hand, it is a big problem and needs immediate attention.
- It provides a basis for planning, close monitoring, and targeting interventions at groups with specific characteristics in order to achieve better outcomes. For example, if Northern British Columbia is showing a higher rate of disease Y, governments can allocate additional health resources to that region. Patients exhibiting characteristics X, Y, and Z may form a “high-risk” group that doctors can monitor closely and, where appropriate, treat with additional medications.
- It can identify problems for further study (i.e., hypothesis generation), including the identification of risk factors to potentially intervene upon. For example, finding an association between diabetes and mortality may call for a larger-scale study examining this observation in more detail.
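As a rough illustration of the first two goals, the sketch below computes a crude rate per 100,000 for each region and ranks the regions from highest to lowest. The region names, the counts, and the `rates_per_100k` helper are all hypothetical and made up for this example, so treat it as a sketch, not real surveillance data.

```python
def rates_per_100k(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each region to its crude rate per 100,000 from (cases, population)."""
    return {
        region: cases * 100_000 / population
        for region, (cases, population) in counts.items()
    }


# Hypothetical regions with made-up (cases, population) pairs.
regions = {
    "Region A": (120, 300_000),
    "Region B": (45, 250_000),
    "Region C": (200, 1_000_000),
}

rates = rates_per_100k(regions)

# Rank regions from the highest crude rate to the lowest.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for region, rate in ranked:
    print(f"{region}: {rate:.1f} per 100,000")
```

Note that Region C has the most cases but not the highest rate, which is why describing a problem by rates and not raw counts matters when deciding where to direct resources.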
Obviously, there are more applications of descriptive studies, and I would love to hear about them, particularly real-world examples of their use. Comment below!
So, in conclusion: yes, we do need descriptive studies! But we need to be explicit about our goals and objectives, and very careful with how we present and interpret our data, so as not to conflate them with causal inference studies.
UPDATE 03/25/2020: A prime example of “descriptive epi” in action is this unfortunate COVID-19 pandemic. One of the first descriptive observations about the epidemiology of the deadly virus was that very old people were disproportionately represented among those with severe outcomes. This was simply an observation from examining those who tested positive and, more importantly, those who died as a result of the virus. We don’t know why older people are dying at higher rates, and there are many plausible hypotheses, such as a more compromised immune response. In other words, we are unsure of the causal mechanism between age and negative outcomes for those with the virus, but good old-fashioned descriptive epidemiology gave us that information! You will see many studies examining this observation in more detail. How else did this help? It informed screening criteria and other public health measures, such as mitigation and suppression strategies that focused on protecting the elderly, as they were deemed “high risk”. It will continue to inform us as we begin to relax these measures (when it’s the right time).