Chapter 4: Study Design
Learning Objectives:
- Understand the different types of study designs used in health research and their respective strengths and limitations.
- Learn how to match research questions and hypotheses with appropriate study designs.
- Recognize the importance of ethical considerations in study design, including patient-informed and community-engaged approaches.
- Explore the role of observational studies in generating hypotheses for experimental research.
- Examine the iterative process of developing hypotheses and study designs to ensure research questions are answerable and study designs are appropriate.
Key Terms:
- Study Design: The overall strategy or blueprint used to conduct research, determining how data will be collected, analyzed, and interpreted.
- Experimental Study: A study design that involves the manipulation of variables to test causation, often through randomized controlled trials.
- Observational Study: A study design that involves observing and measuring variables without manipulating them, used to identify correlations and associations.
- Randomized Controlled Trials (RCTs): An RCT is a study design where participants are randomly assigned to either the intervention group or the control group to evaluate the effectiveness of an intervention.
- Longitudinal Study: A study design that follows the same participants over an extended period, tracking changes and development.
- Confounding Variable: A factor other than the independent variable that might influence the outcome of a study, potentially leading to biased results.
- Control Variables: Factors that researchers keep constant to isolate the relationship between the independent and dependent variables, thereby addressing confounders in the study.
Introduction
In the realm of health research, the design of a study is the blueprint that guides the entire research process. It determines how data will be collected, analyzed, and interpreted to answer the research question. Study design is not a one-size-fits-all process. Different research questions require different study designs, each with its own strengths and limitations.
In this chapter, we will explore the various types of study designs commonly used in health research, including experimental and observational designs. We will discuss how each design is suited to answering specific types of research questions and the implications of these choices for the validity and reliability of the study’s findings. By the end of this chapter, you will have a solid understanding of how to choose the appropriate study design for your research question and how to implement it effectively to obtain meaningful and trustworthy results.
Overview of Study Design Types
This section provides a brief overview of the landscape of study design types in health research. We focus on experimental and observational study designs, as these are foundational to researching population health.
Figure 5: Study Design Types
Basic Science
Basic science refers to fundamental, theoretical research aimed at understanding the underlying principles and mechanisms of natural phenomena, often without immediate practical applications. Basic science discoveries in fields such as genetics and molecular biology, including animal studies (e.g., with laboratory rats), contribute to our understanding of disease mechanisms and of general biological and health processes, inform public health strategies, and guide the development of new interventions and treatments. Basic science serves as the foundation for applied research, including clinical and translational studies, by providing the necessary knowledge base and conceptual framework.
Experimental vs Observational Research
The basic difference between experimental and observational research is that in experiments researchers directly modify or intervene in the behaviors or environments of participants, whereas in observational studies researchers only collect data on participants without making any changes to their behavior or environments. For example, imagine you’re curious about whether eating candy every day makes kids hyperactive. To find out, you could do two types of investigations:
- Experimental Research: Think of this like a science experiment. You take two groups of kids, give candy to one group and no candy to the other group, and then watch to see which group is more hyperactive. Since you’re controlling who eats the candy, you can be more sure that any difference in hyperactivity is because of the candy.
- Observational Research: Instead of controlling who eats candy, you just go to the playground and watch kids who are already eating candy and those who are not. You see if there’s a difference in how hyper they are. In this case, you’re not changing anything; you’re just observing what’s happening naturally.
Experimental studies involve the manipulation of variables to assess causality, while observational studies involve the collection of data without manipulation, observing natural occurrences. Experimental studies, like randomized controlled trials, are considered the gold standard for testing the efficacy of interventions. Observational studies, such as cohort and case-control studies, are essential for identifying risk factors and associations in large populations.
Experimental Study Designs
Randomized Controlled Trials (RCTs): Think of an RCT like a fair test to see if a new medicine or health treatment really works. Here’s how most RCTs work:
- Randomly Assigned: Study participants get randomly assigned into two groups: one group gets the new health treatment or intervention, and the other group gets a regular treatment or a fake treatment (called a placebo).
- Intervention Group: This is the group that gets the new treatment that we’re testing. There may be more than one type of intervention group depending on the study.
- Control Group: This group doesn’t get the new treatment. Instead, they might get no treatment, a regular treatment, or a fake treatment. The control group allows a comparison to help us see what happens when you don’t get the new treatment.
- Why Randomize?: Participants get randomly assigned to groups so that both groups are similar in every way except for the treatment. This way, we can be more confident that any differences we see are because of the treatment, not because one group was healthier or different to start with.
- Testing Effectiveness: Researchers then watch both groups to see how they do. If the group with the new treatment does better, this provides evidence that the treatment works.
RCTs help us figure out whether new medicines or health treatments, such as vaccines, medications, and lifestyle modifications, are safe and effective. RCTs provide high-quality evidence for public health policies and for the clinical guidelines used by medical professionals such as doctors.
As an example of an RCT, imagine we want to find out if a new type of sunscreen is better at preventing sunburns than a regular sunscreen. We take 100 people who are going to the beach and randomly divide them into two groups: (1) Intervention Group: 50 people get the new sunscreen; (2) Control Group: 50 people get the regular sunscreen. After a day at the beach, we check to see how many people in each group got sunburned. If fewer people in the intervention group got sunburned compared to the control group, it suggests that the new sunscreen might be more effective.
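The random assignment step in the sunscreen example can be sketched in a few lines of Python. This is a minimal illustration rather than a trial protocol: the numbered participant IDs, the `assign_groups` helper, and the fixed seed are all assumptions made for the example.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into two equal-sized groups.

    Hypothetical helper for illustration only; real trials follow
    more careful, pre-registered allocation procedures.
    """
    rng = random.Random(seed)        # local generator so results are reproducible
    shuffled = list(participants)    # copy so the input list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 100 hypothetical beachgoers, identified only by number
participants = list(range(1, 101))
intervention, control = assign_groups(participants, seed=42)

print(len(intervention), len(control))  # 50 50
```

Because the shuffle, not the researcher, decides who lands in each group, nobody can steer particular people toward the new sunscreen.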
Quasi-Experiments: In some situations, randomization of participants is not possible or ethical. Quasi-experimental designs resemble experimental studies but lack random assignment, relying instead on natural or pre-existing groups. Because they lack randomization, quasi-experiments are generally considered less robust than RCTs, meaning their conclusions are somewhat less reliable. Quasi-experiments are often used in population health research to evaluate policy changes, public health interventions, and program implementations in real-world settings.
As an example of a quasi-experiment, imagine that a city introduces a new program to encourage people to walk more by adding more pedestrian zones. Since we cannot randomly assign people to live in different areas, we need to rely on a quasi-experiment with two groups: (1) Intervention Group: People living in areas with the new pedestrian zones; (2) Comparison Group: People living in similar areas without the new pedestrian zones. We measure how much people in both groups walk before and after the pedestrian zones are added. If people in the intervention group start walking more compared to the comparison group, it suggests that the program might be effective in encouraging walking.
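The before-and-after comparison in the pedestrian zone example amounts to simple arithmetic. One common way to express it, named here as an assumption since the chapter does not prescribe a method, is a difference-in-differences: the change in the intervention group minus the change in the comparison group. The walking minutes below are made up purely for illustration.

```python
def difference_in_differences(pre_int, post_int, pre_comp, post_comp):
    """Change in the intervention group minus change in the comparison group."""
    return (post_int - pre_int) - (post_comp - pre_comp)

# Hypothetical average daily walking minutes (made-up numbers for illustration)
effect = difference_in_differences(
    pre_int=30.0, post_int=42.0,    # areas with the new pedestrian zones
    pre_comp=31.0, post_comp=34.0,  # similar areas without them
)
print(effect)  # (42 - 30) - (34 - 31) = 9.0 extra minutes
```

Subtracting the comparison group's change removes trends that affected everyone, such as better weather, but it cannot rule out differences between the areas that were never measured.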
Observational Study Designs
Longitudinal Studies: Longitudinal studies track the same group of individuals over an extended period, observing changes or outcomes over time. Longitudinal studies aim to understand developmental trends, causal relationships, and changes over time. They are inherently long-term and involve repeated measurements or observations. For example, a longitudinal study could follow a group of people from childhood to adulthood to study how their health, behavior, or cognitive abilities evolve. The ABCD study is a longitudinal study, further detailed at the end of the chapter.
Cohort Studies: Cohort studies follow a specific group of individuals (a cohort) who share a defining characteristic or exposure. All cohort studies are longitudinal, but not all longitudinal studies are cohort studies. Cohorts can be prospective (followed forward in time) or retrospective (reconstructed from past records). Cohort studies examine associations between exposures (e.g., risk factors, the independent variables) and outcomes (e.g., disease incidence, the dependent variables). For example, a cohort study might follow a group of smokers and non-smokers over several years to assess lung cancer risk.
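A cohort study's association between exposure and outcome is often summarized as a relative risk: the risk of the outcome among the exposed divided by the risk among the unexposed. The sketch below uses that measure as an illustration; the function name and the counts are hypothetical and do not come from any real study.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 1,000 smokers and 1,000 non-smokers followed for several years
rr = relative_risk(exposed_cases=30, exposed_total=1000,
                   unexposed_cases=3, unexposed_total=1000)
print(round(rr, 1))  # 10.0
```

A value above 1 means the outcome was more common among the exposed; on its own it still describes an association, not proof of causation.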
Case-Control Studies: Case-control studies compare individuals with a specific outcome (cases) to those without it (controls). Case-control studies investigate associations between exposure and disease. They are retrospective (looking backwards in time), starting with identified cases and selecting controls from the same population. Controls represent the baseline population from which cases arose. For example, comparing past dietary habits of individuals with and without heart disease to identify risk factors.
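Because a case-control study starts from a fixed number of cases and controls, it cannot measure how common the disease is. Instead, the association is usually summarized as an odds ratio, which compares the odds of past exposure among cases with the odds among controls. The sketch below is illustrative only: the counts are invented and the "exposure" stands for a hypothetical past dietary habit.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls

# Hypothetical: 100 people with heart disease (cases), 100 without (controls)
or_estimate = odds_ratio(cases_exposed=60, cases_unexposed=40,
                         controls_exposed=30, controls_unexposed=70)
print(round(or_estimate, 2))  # 3.5
```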
Cross-Sectional Studies: A cross-sectional study is a research design where data is collected from many different individuals at a single point in time (in contrast to longitudinal studies which collect data across time). These studies help us understand the current prevalence of a disease or outcome in a specific population subset.
Meta-Analyses
Think of a meta-analysis as a study of studies, in which researchers combine the results of multiple studies into one big study. Researchers investigating a particular research question evaluate all existing studies on the topic to determine the most definitive answer to the question. More technically, a meta-analysis is a statistical technique that combines the results of multiple studies to arrive at a more comprehensive understanding of a research question or hypothesis. The general process of conducting a meta-analysis is as follows:
- Collecting Studies: Researchers gather existing studies on a specific health topic (e.g., effectiveness of a drug or treatment).
- Pooling Data: They take the data from each study and put it together.
- Analyzing Results: By combining all the data, they can calculate an average effect size. This tells us how strong the treatment effect is across all the studies.
- Drawing Conclusions: Meta-analysis helps us see patterns and trends that might not be clear in individual studies. It’s like stepping back to see the whole puzzle instead of just one piece.
In summary, meta-analysis helps researchers make more informed decisions by looking at the bigger picture formed by many smaller studies.
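The "average effect size" in the pooling step is usually a weighted average, so that larger, more precise studies count for more. A minimal sketch follows, assuming the simplest (fixed-effect, inverse-variance) weighting; the chapter does not specify a method, and the three studies and their numbers are invented for illustration.

```python
def pooled_effect(effects, standard_errors):
    """Inverse-variance weighted average of study effect sizes (fixed-effect model)."""
    weights = [1 / se ** 2 for se in standard_errors]   # precise studies get more weight
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = (1 / total_weight) ** 0.5               # uncertainty of the pooled estimate
    return pooled, pooled_se

# Three hypothetical studies of the same treatment (made-up effect sizes)
effects = [0.30, 0.50, 0.40]
standard_errors = [0.10, 0.20, 0.10]

estimate, se = pooled_effect(effects, standard_errors)
print(round(estimate, 3), round(se, 3))  # 0.367 0.067
```

The pooled standard error (0.067) is smaller than that of any single study, which is the numerical counterpart of seeing the whole puzzle instead of one piece.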
Twin Studies
Twin studies are not included in the tree graph in Figure 5, but twin studies are a pillar of health research, and they can appear within both experimental and observational designs. Twin studies are a valuable tool for evaluating the contributions of genetic and environmental factors to various traits and behaviors. By comparing similarities and differences between monozygotic (identical) twins, who share nearly 100% of their genes, and dizygotic (fraternal) twins, who share about 50% of their genes, researchers can estimate the heritability of traits and the extent to which environmental factors play a role. These studies provide insights into the nature vs. nurture debate by helping to determine how much of a trait is influenced by genetics versus environmental factors.
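One classic way to turn the identical-versus-fraternal comparison into a heritability estimate is Falconer's formula, which doubles the gap between the two twin correlations. The chapter does not cover this formula; it is shown here only as an illustration, and it rests on simplifying assumptions (for example, that both kinds of twins share their environments to the same degree). The correlations below are made up.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate: twice the gap between MZ and DZ twin correlations."""
    return 2 * (r_mz - r_dz)

# Hypothetical within-pair correlations for some trait (made-up values)
h2 = falconer_heritability(r_mz=0.80, r_dz=0.50)
print(round(h2, 2))  # 0.6
```

The logic follows from the gene-sharing figures above: identical twins share roughly twice as much genetic material as fraternal twins, so any extra similarity between identical twins is attributed to genetics.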
Strength of Conclusions by Study Design Type
Figure 6: Hierarchy of Strength of Conclusions by Study Design Type
Now that we have an intuition for the most common study design types, how do we know which study design will deliver the best results, or the most trustworthy answers to our research questions? Figure 6 presents the strength of conclusions by study design type, that is, the study design types ranked from more trustworthy findings to less trustworthy findings. This hierarchy of study designs is based on two main considerations: (1) correlation vs causation, and (2) sources of bias.
Correlation vs Causation
Correlation is when two variables are associated with each other, meaning they tend to change together. A positive correlation is when two (or more) variables increase together (e.g., age and income increase together), and a negative correlation is when one variable increases while the other decreases (e.g., money spent vs money saved). However, correlation is not causation. For example, ice cream sales and swimming pool attendance both increase during summer, but one doesn’t cause the other. Causation is when one variable directly causes a change in another. For example, consistent exposure to sunlight without protection can cause sunburn.
In health research we want to understand cause and effect to get at the root cause of a particular disease, condition, or health outcome. If someone has asthma, for example, then we want to understand why. Knowing that a specific behavior or treatment directly causes a health outcome allows for targeted interventions.
Experimental studies, especially RCTs, are better suited than observational studies to testing causation. By randomly assigning participants to treatment and control groups, researchers can isolate the effect of the treatment, providing stronger evidence for causation. In other words, experimental studies (RCTs) are typically better than observational studies at minimizing confounding factors.
Confounding Factors
Confounding factors influence both the independent variable and the dependent variable in a study, potentially leading to a false association between the two. For example, in a study examining the relationship between exercise and heart health, age could be a confounding variable (in this case, we are using the terms confounding factor and confounding variable interchangeably). Older individuals might exercise less and also have poorer heart health. If age is not accounted for, it might appear that exercise directly impacts heart health, when in fact, the observed effect might be partly or entirely due to age.
A confounding variable can create a spurious association between independent and dependent variables being observed in an observational study. This means that the observed relationship might not be causal, but rather due to the influence of the confounding variable. Returning to the above example of a correlation between ice cream sales and swimming pool attendance in the summer, the correlation is described as a spurious association and hot weather is the confounding variable driving the spurious association.
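The ice cream and swimming pool example can be reproduced with a small simulation. In the sketch below, all numbers are invented: temperature drives both ice cream sales and pool visits, and neither affects the other. The two still appear strongly correlated until temperature is taken into account. The helper functions `pearson` and `residuals` are written for this example and are not from any library.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its straight-line relationship with xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)                                          # fixed seed for reproducibility
temperature = [rng.uniform(10, 35) for _ in range(2000)]        # hypothetical daily highs
ice_cream = [2 * t + rng.gauss(0, 5) for t in temperature]      # depends only on temperature
pool_visits = [3 * t + rng.gauss(0, 8) for t in temperature]    # depends only on temperature

raw = pearson(ice_cream, pool_visits)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(pool_visits, temperature))
print(round(raw, 2), round(adjusted, 2))
```

The raw correlation is strong, while the correlation that remains after removing the influence of temperature is close to zero, which is what a spurious association looks like in data.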
However, confounding can also influence experimental designs. Imagine, for example, an RCT where not all participants in the treatment group finished the treatment (i.e., some participants dropped out of the study). These dropouts introduce a confounding variable – the possibility that there are some underlying conditions that influence or shape who finished the treatment and who did not (Vu and Harrington 2021).
Randomization of Treatment and Control Groups
Randomization minimizes measured and unmeasured confounding factors by ensuring that participants in different groups of a study (treatment vs control) are similar in all respects except for the treatment being tested (Lim and In 2019). Here’s how it works:
- Balances Known and Unknown Factors: By randomly assigning participants to treatment and control groups, randomization ensures that both known and unknown confounding variables are evenly distributed across the groups. This means that any differences in outcomes between the groups are more likely to be due to the treatment rather than these confounding factors.
- Prevents Selection Bias: Randomization prevents researchers or participants from consciously or unconsciously selecting certain individuals for one group over another, which could introduce bias. For example, if researchers could choose who receives a new drug, they might unintentionally select healthier individuals, skewing the results.
- Facilitates Statistical Analysis: Because randomization creates comparable groups, it allows for the use of statistical methods to estimate the treatment effect while accounting for the random variation between groups.
By minimizing confounding factors, randomization strengthens the ability to make causal inferences about the relationship between the treatment and the outcome. This is crucial for understanding the true effect of an intervention in experimental research. Observational studies are limited in their ability to establish causation mainly because of the challenge of addressing and minimizing confounding variables. These studies can identify associations, but it’s difficult to determine if the observed relationship is causal or if it’s influenced by confounding factors.
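The balancing effect of randomization can be seen in a short simulation. The sketch below uses invented ages for 1,000 hypothetical participants and compares two ways of forming groups: random assignment, and a deliberately biased assignment in which the youngest half receives the treatment.

```python
import random
from statistics import mean

rng = random.Random(1)                                # fixed seed for reproducibility
ages = [rng.randint(20, 80) for _ in range(1000)]     # hypothetical participant ages

# Random assignment: shuffle, then split in half
shuffled = ages[:]
rng.shuffle(shuffled)
random_treat, random_ctrl = shuffled[:500], shuffled[500:]

# Biased assignment: the 500 youngest people end up in the treatment group
ordered = sorted(ages)
chosen_treat, chosen_ctrl = ordered[:500], ordered[500:]

random_gap = abs(mean(random_treat) - mean(random_ctrl))
chosen_gap = abs(mean(chosen_treat) - mean(chosen_ctrl))
print(round(random_gap, 1), round(chosen_gap, 1))
```

With random assignment the average ages of the two groups differ only slightly, by chance; with biased assignment they differ by decades, so any difference in outcomes could simply reflect age. The same balancing applies to factors nobody thought to measure.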
Sources of Bias
In the context of health research, bias refers to any systematic error in the design, conduct, or analysis of a study that results in a distorted or misleading estimate of the true effect of an exposure on the outcome (Attia 2018). Confounding variables are a form of bias, but there are several other common types of methodological bias:
- Selection Bias: Occurs when the way participants are chosen for a study leads to a non-representative sample (further discussed in Chapter 7), affecting the generalizability of the results.
- Information Bias: Happens when there is a systematic error in measuring the variables or outcomes in a study, leading to inaccurate data.
- Healthy-User Bias: Occurs when individuals who engage in healthy behaviors (e.g., taking vitamins, exercising) are also more likely to engage in other behaviors that contribute to better health outcomes, leading to a biased association between the behavior and the health outcome.
- Reverse-Causality Bias: Happens when the direction of cause and effect is misunderstood. For example, if a study finds that people with a certain disease are more likely to consume a particular food, it might be that the disease causes an increased consumption of that food, rather than the food causing the disease.
Experimental studies, especially RCTs, are considered the gold standard in research because they are designed to minimize all forms of bias, including confounding and the types listed above. Researchers can control many variables in an experimental study, which reduces the risk of outside factors influencing the results. By randomly assigning participants to different groups, experimental studies ensure that any differences between groups are due to chance, not bias. In a double-blind study, neither the participants nor the researchers know who is receiving the treatment and who is in the control group. This helps prevent bias in reporting and interpreting results.
Necessities of Observational Studies
While experimental studies are powerful tools for understanding cause and effect, they are not always practical or ethical. Some experiments would be too difficult or expensive to conduct. For example, testing the long-term effects of a dietary change might require controlling participants’ diets for years, which is rarely feasible. In other cases, it would be unethical to conduct an experiment that exposes participants to harm. For example, you can’t ethically assign people to smoke cigarettes to see if they develop lung cancer. In these cases, observational research becomes crucial. Observational studies allow researchers to study the effects of potentially harmful exposures without deliberately exposing participants to them.
Observational studies can generate hypotheses that can later be tested in experimental studies when feasible and ethical. Observational studies can provide valuable information about how things work in the real world, outside of a controlled experimental setting. This information is used to generate hypotheses about potential causal relationships, which can then be tested in more rigorous experimental studies. For example, if an observational study suggests that a certain diet is associated with a lower risk of heart disease, researchers can design an RCT to test whether the diet directly causes the reduced risk.
Using Control Variables to Account for Confounders in Observational Studies
In the previous chapter, we identified three types of variables: independent, dependent, and demographic variables. In the context of confounding factors in observational studies, control variables are one of the main strategies to address confounders, such as through multivariate statistical methods (Jager et al. 2008; Pourhoseingholi et al. 2012). Control variables are factors that researchers keep constant to isolate the relationship between the independent and dependent variables.
For example, in a study examining the effect of a new medication on blood pressure, age, diet, and physical activity might be controlled to ensure that any observed changes in blood pressure are attributable to the medication rather than these other factors. In this example, age, diet, and physical activity are potential confounders in the relationship between the medication (independent variable) and blood pressure (dependent variable). By explicitly introducing age, diet, and physical activity into the study design, we can attempt to isolate the effect of the medication on blood pressure (or the lack of one).
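What it means to "keep a variable constant" can be illustrated with the earlier exercise and heart health example, in which age was the suspected confounder. One simple approach, used here as an assumption rather than as the chapter's prescribed method, is stratification: compare exercisers with non-exercisers separately within each age group. All counts below are invented so that the arithmetic is easy to follow.

```python
# Hypothetical counts: (age group, exercises regularly?) -> (people, people with healthy hearts)
counts = {
    ("younger", True):  (80, 72),
    ("younger", False): (20, 18),
    ("older",   True):  (20, 12),
    ("older",   False): (80, 48),
}

def healthy_rate(exercises, age_group=None):
    """Share of people with healthy hearts, optionally within one age group."""
    people = healthy = 0
    for (group, ex), (n, h) in counts.items():
        if ex == exercises and (age_group is None or group == age_group):
            people += n
            healthy += h
    return healthy / people

# Ignoring age: exercisers look much healthier (84% vs 66%)
crude_gap = healthy_rate(True) - healthy_rate(False)

# Holding age constant: the gap disappears within each age group
gaps_by_age = {g: healthy_rate(True, g) - healthy_rate(False, g)
               for g in ("younger", "older")}

print(round(crude_gap, 2), gaps_by_age)
```

In this made-up dataset the whole crude difference comes from age: younger people both exercise more and have healthier hearts. Multivariate statistical methods apply the same idea to several control variables at once.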
Multivariate Research Questions to Control for Confounders
These are general steps for modifying research questions from the previous chapters to introduce control variables, helping to make the research findings robust and reliable. A statistical analysis that includes control variables is known as multivariate analysis.
1) Identify Potential Confounders: Start by listing all variables that could potentially affect both the independent and dependent variables. Often, this requires a thorough understanding of the subject area and the specific context of your research. For example, if studying the impact of diet on mental health, potential confounders might include age, physical activity, genetic predispositions, and socioeconomic status.
2) Choose Appropriate Control Variables: From the list of potential confounders, select those that are most relevant to your research question and are measurable. The selection should be based on previous research findings, theoretical considerations, and the availability of data. Continuing with the example above, if previous studies have shown that age and socioeconomic status significantly impact both diet and mental health, these should be included as control variables in your study design.
3) Formulate Multivariate Research Questions: Write research questions that explicitly mention the control variables. A well-formulated multivariate research question not only specifies the relationship between the primary independent and dependent variables but also integrates the control variables. For example, “How does a diet high in fruits and vegetables compared to a diet high in processed foods affect anxiety levels among adults aged 20-30, controlling for socioeconomic status and physical activity?”
After these first three steps, the remaining steps are operationalizing the variables (the topic of Chapter 7) and planning the statistical analysis (the topic of Chapter 9).
Iterative Development of Hypotheses and Study Design
With a clearer idea of the strengths and weaknesses (limitations) of different study design types, we can now return to refining our research questions and ensuring that our chosen study design is well-equipped to provide meaningful answers. The previous chapter detailed the process of developing research questions and hypotheses but cautioned that they are typically further revised during the study design phase, the topic of this chapter. The process of developing hypotheses and study designs is iterative, meaning they inform and refine each other over time (Patten 2016). This dynamic relationship ensures that the research questions are answerable and that the study design is appropriate for testing the hypotheses.
A well-formulated hypothesis guides the choice of study design, ensuring that the research question can be effectively tested. Conversely, the practicalities of a chosen study design may require refining the hypothesis to ensure it is testable within the constraints of the study. This is achieved through matching the research question, hypothesis, and study design. For students at this stage, this is a rough step-by-step guide to help you make the right match:
- Determine whether your question is exploratory or confirmatory (see Chapter 2). This will influence the type of study design you choose. For example, exploratory questions might be best suited to observational studies, while confirmatory questions may require experimental designs.
- Identify the type of data you need to collect and the best way to analyze it.
- Different study designs provide different levels of evidence. For example, randomized controlled trials (RCTs) provide strong evidence for causality, while case studies provide weaker evidence. Match your study design to the level of evidence needed to answer your research question and support your hypothesis.
- Consider the practicality of implementing the study design and any ethical considerations. Some designs, like RCTs, may not be feasible or ethical for certain research questions.
- Based on the above considerations, select the study design that best fits your research question and hypothesis. For example, if you need to establish causality, an experimental design like an RCT might be appropriate. If you’re studying associations, an observational design like a cohort study might be suitable.
- Once you’ve chosen your study design, revisit your hypothesis to ensure it aligns with the design. You may need to make adjustments to ensure that your hypothesis is testable within the constraints of the chosen design.
It is recommended to document any changes made to the hypotheses or study design throughout the research process. This documentation provides a clear record of the research evolution and the rationale behind each decision. Lastly, in many cases, researchers should prioritize patient-informed and/or community-engaged study designs.
Ethical Considerations: Patient-Informed and Community-Engaged Study Designs
Patient-informed study design, or patient-centered research, means designing studies that address patients’ needs, preferences, and values. For example, in research on addiction, HIV, disabilities, and parallel fields, the axiom “nothing about us without us” refers to patient-informed research practices in which researchers should not conduct research about patients without their input (Jürgens 2005). This ensures that the research is relevant and beneficial to those it aims to help. Researchers do this by engaging with patients from the beginning of the study design process to understand their perspectives.
Community-engaged study design, or community-based participatory research, involves researchers directly collaborating with the community that they are studying in all phases of the research (Wallerstein et al. 2020). This ensures that the study addresses community needs and leverages local knowledge. Researchers achieve this by building partnerships with community organizations and leaders, involving community members in planning and conducting the research, and sharing findings and benefits with the community. This often necessitates that researchers share decision-making power and resources with community partners.
Adolescent Brain Cognitive Development Study Design
The ABCD Study employs a longitudinal cohort design, tracking the same group of participants over a decade. ABCD is the largest long-term study of brain development and child health in the United States, involving over 11,000 children aged 9-10 at baseline from diverse backgrounds across the country. This approach allows researchers to observe developmental changes and trajectories, providing invaluable insights into the complex processes of adolescent brain maturation. Participants undergo extensive assessments, including neuroimaging scans to capture brain structure and function, cognitive and psychological tests to evaluate mental processes and emotional well-being, and surveys to gather information on health, lifestyle, and environmental influences. Assessments are conducted annually, with more detailed follow-up visits every two years, to monitor changes and identify factors that impact developmental outcomes. The study integrates data on genetics, social environments, and health behaviors to comprehensively examine their effects on adolescent development.
Table 4: Examples of Matching ABCD Research Questions with ABCD Study Design

| Example Research Area | Study Design Considerations |
| --- | --- |
| Examine effects of physical activity, sleep, screen time, and other activities on brain developmental outcomes. | The ABCD Study collects data on participants’ physical activity levels, sleep patterns, and screen time to investigate their effects on brain development and cognitive function. By analyzing these data over time, researchers can identify patterns and relationships between lifestyle factors and developmental outcomes. |
| Study the onset and progression of mental health disorders and their influencing factors. | The study includes assessments of mental health symptoms and disorders to track their onset and progression during adolescence. Researchers can examine how genetic, environmental, and lifestyle factors contribute to the development and trajectory of mental health conditions over time. |
| Determine how exposure to various levels and patterns of alcohol, nicotine, cannabis, caffeine, and other substances affects developmental outcomes and vice versa. | The ABCD Study collects detailed information on participants’ exposure to substances, including alcohol, nicotine, cannabis, and caffeine. This data allows researchers to explore the impact of substance use on brain development, cognitive function, and behavioral outcomes, as well as how developmental changes might influence substance use behaviors over time. |
The ABCD Twin Subsample
ABCD has a nested twin subsample, which was designed to study the genetic and environmental contributions to mental and physical health outcomes in children, including substance use, brain and behavioral development, and their interrelationship (Iacono, Heath et al. 2018). The subsample consists of 800 pairs of same-sex twins, recruited from four leading centers for twin research in Minnesota, Colorado, Virginia, and Missouri. Each site enrolled 200 twin pairs as well as singletons. The twins were recruited from registries of all twin births in each state during 2006–2008, while singletons were recruited using the same school-based procedures as the rest of the ABCD study. The twin subsample will provide valuable insights into the interplay between genetic and environmental factors and help test hypotheses critical to the aims of the ABCD study.
Summary
In this chapter, we explored the iterative process of study design in health research, focusing on how hypotheses and study designs inform and refine each other. We discussed various study design types, including experimental designs such as randomized controlled trials (RCTs) and quasi-experiments, as well as observational designs like cohort, case-control, and cross-sectional studies. We addressed the importance of matching research questions and hypotheses to appropriate study designs and highlighted the role of randomization in minimizing bias.
We also emphasized the significance of ethical considerations in patient-informed and community-engaged study designs, such as ensuring informed consent, respecting cultural differences, and establishing equitable partnerships. The Adolescent Brain Cognitive Development (ABCD) Study was presented as an exemplary model of a longitudinal cohort study design that addresses key research areas in adolescent development. The next chapter focuses on using literature reviews to justify the chosen research question and study design.