1 The Research Process & Data Ethics
Reading Objectives:
- Understand the Research Process: Gain a comprehensive understanding of the research process, including the formulation of research questions, study design, literature review, sampling, instrumentation, data analysis, and reporting results, using the Adolescent Brain Cognitive Development (ABCD) Study as an example.
- Identify and Define Key Variables: Learn to identify and define key variables in research, including dependent, independent, and demographic variables, and distinguish between exploratory and confirmatory research questions and hypotheses.
- Recognize and Apply Ethical Principles: Recognize the importance of ethical principles in behavioral data science and addiction research, including respect for persons, beneficence, and justice, and understand how to apply these principles to ensure the responsible conduct of research.
Key Terms:
- Variable: A concept or characteristic that can be measured, quantified, or otherwise observed. Quantitative measurement represents a variable numerically; qualitative measurement represents a variable with non-numerical data (e.g., text, images, videos). Dependent variables are the outcomes of focus in the research study (also called response variables). Independent variables are the factors that can influence the dependent variable. Demographic variables are population characteristics of research participants.
- Research question: An inquiry into an unexplored or contested area of science that guides a research project and is formed with variables. Exploratory research questions are open-ended inquiries that aim to investigate a broad topic without a predetermined hypothesis; confirmatory research questions are specific inquiries designed to test a predefined hypothesis or theory.
- Hypothesis: A specific, testable prediction derived from a research question and formed with variables. A falsifiable hypothesis must be structured in a way that allows it to be proven false through empirical observation or experimentation.
- Scientific and peer-reviewed literature: Scientific literature consists of written works that report on original scientific research, theories, or reviews of existing research, serving as a foundation for new studies; peer-reviewed literature consists of studies that have passed review for scientific merit by experts in the field.
- Data Ethics: Principles and practices that guide the responsible collection, analysis, and use of data, ensuring respect for privacy, accuracy, fairness, and transparency in data handling.
- Responsible Conduct of Research: Adherence to ethical and professional standards in research, including honesty, accuracy, efficiency, and objectivity, to ensure the integrity and reliability of scientific investigations.
Introduction
Let’s start with what addiction is not. Addiction is not about “bad choices.” Perhaps the most famous version of this definition of addiction was the “Just Say No” campaign in the 1980s, which promoted the idea that avoiding drug use was a simple matter of personal choice. From this perspective, people who use drugs make poor moral choices, and addiction is the result of personal failure. These arguments fueled the stigmatization of drug users; a stigma is a negative label of “deficiency” that is broadly applied to a group of people, which brings them disgrace and social barriers (APA 2024). This widespread stigma was accompanied by the “war on drugs,” which refers to wide-reaching government policies that focused on punishment rather than treatment and support for those struggling with addiction. This criminalization of drug use led to mass incarceration, disproportionately targeting racially and economically marginalized communities, despite the fact that patterns of drug use among races were roughly the same (Alexander 2010).

In the next module we will look at a scientific definition of addiction, which includes understanding addiction as a chronic, relapsing disorder characterized by compulsive drug-seeking behaviors, despite harmful consequences. However, on what truth does this definition stand? How do we even know which facts are true?
Our established understanding of what counts as true knowledge comes from the scientific research process. This module grounds us in that process and its principles. Module 1 covers:
- The steps of the research process
- The Adolescent Brain Cognitive Development (ABCD) Study as a central example
- The public resources we will need to understand ABCD data and research
- The ethical and responsible conduct of research
Overview of the Research Process & the ABCD Study
Researchers use a research process to produce new knowledge and/or contest existing knowledge. The research process merges theory and methods. A theory is an explanation for how or why things function or happen; research methods are the procedures used to build theory or to test it against evidence.
There are two principal categories of research studies: experimental and observational study designs. An experimental design is a controlled setup in which the researcher changes one thing to see its effect on another. For example, a clinical trial where participants are randomly assigned to receive either a new medication or a placebo to observe the medication’s effectiveness is an experimental design. An observational design involves monitoring subjects without interfering. An example of this is a study where researchers follow a group of people over time to see how their health outcomes relate to various factors, like their diet or lifestyle choices.
Researchers use a similar research process in both experimental and observational studies, described in Table 1. Our focus in this course is the data analysis part of the research process. However, we must contextualize data analysis within the whole research process, because the purpose of data analysis is to address specific research questions within a particular study design.
| Step | Description |
|---|---|
| Research Question | Identifying the research problem and formulating specific questions to guide research, and aligning research questions with the study design. |
| Study Design | Outlining the structure of the experimental or observational study design and ensuring that the study can answer the research question. |
| Literature Review | Reviewing existing scientific literature to contextualize and justify the proposed research question and study design. |
| Sampling | Identifying appropriate quantities and qualities for the sample, a subset of the population for study. |
| Instrumentation | Selecting and validating measurement tools (i.e., instruments) to test the research question. |
| Protocols | Implementing procedures for data collection (i.e., protocols) and ethical considerations in the responsible conduct of research. |
| Data Analysis | Applying appropriate techniques to examine the collected data, and ensuring accurate and responsible interpretation of results. |
| Reporting Results | Considering and reflecting on the research findings in relation to existing scientific literature, and communicating these findings to the scientific community and the public. |
The Adolescent Brain Cognitive Development (ABCD) Study
We will develop and apply our research skills using the ABCD study. The ABCD study is a hallmark study of the National Institutes of Health (NIH). ABCD is an observational study that follows youth participants over 10 years (called a longitudinal study). There are 21 research sites across the United States that collect data on 11,880 youth participants (Jernigan & Brown 2018). ABCD researchers collect a range of data, including brain images, biogenetic materials, neurocognition, physical and mental health, social and emotional functions, and culture and environment. See Figure 2 for an overview of the data collection schedule. The design of the study was guided by a range of research objectives (Jernigan & Brown 2018):
- To develop national standards for normal brain development in youth, by defining the range and pattern of variability in trajectories of brain development observed in children growing up in the U.S.
- To define the factors predictive of variability in individual developmental trajectories (e.g., of cognitive and emotional development, academic progress, etc.).
- To examine the roles of genetic vs. environmental factors on development, as well as interactions (e.g., by analysis of data from 800 twin pairs embedded within the cohort, and through genomic analyses).
- To estimate the effects of health, pubertal changes, physical activity, sleep, as well as sports and other injuries on brain development and other outcomes.
- To further elucidate the onset and progression of mental disorders, factors that influence their course or severity; and the relationship between mental disorders and substance use.
- To determine how exposure to various levels and patterns of alcohol, nicotine, cannabis, caffeine, and other substances affects developmental outcomes, and how earlier developmental differences relate to use patterns.


Research Questions, Hypotheses, & Variables
Data analysis and data science in addiction research occur within the context of a research process, and specifically, they are guided by research questions.
Starting with a Broad Research Topic
Health research often begins with broad discussion topics that debate complex observations. Researchers draw inspiration for these topics from a range of sources, often coming from our personal lives.
Research Questions with Variables
A research question is a fundamental inquiry that guides the entire research process of a particular study. Let’s specify that a scientific research question must include variables. A variable is a concept or characteristic that can be measured, quantified, or otherwise observed. All variables in a research question or hypothesis should be tied to clearly defined measures. Table 2 presents some examples of variables.
| Variable | Measurement |
|---|---|
| Physical fitness | Heart rate variability, VO2 max, exercise frequency |
| Sleep quality | Hours of sleep, sleep latency, sleep quality index scores |
| Dietary habits | Daily calorie intake, frequency of fruit/vegetable consumption, food diary |
| Stress levels | Cortisol levels, perceived stress scale, daily stress logs |
| Social connectivity | Number of close friends, frequency of social interactions, social support scale |
Measurement of variables is done quantitatively or qualitatively. Quantitative measurement involves the process of quantifying variables using numerical data. Qualitative measurement focuses on non-numerical data, such as text, images, audio, and video.
Levels of Measurement
When defining or selecting variables for a research question, it’s also essential to know how the data are measured or categorized. The four commonly recognized levels of measurement are:
- Nominal: Categories without any intrinsic order (e.g., “Type of substance used”: alcohol, cannabis, nicotine).
- Ordinal: Categories that follow a ranked or ordered structure, but the exact distance between levels is not uniform (e.g., “Never / Rarely / Sometimes / Often / Very Often”).
- Interval: Numeric scales where intervals between values are consistent, but there is no true zero point (e.g., temperature in Celsius).
- Ratio: Numeric scales with a meaningful zero, where ratios between numbers are interpretable (e.g., height, weight, number of cigarettes smoked).
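To make the four levels concrete, the short Python sketch below pairs each level with a summary that is meaningful for it. All values and variable names are invented for illustration and are not ABCD data, and the pairing of level with summary statistic follows a common convention rather than anything specific to the ABCD study.

```python
from statistics import mean, median, mode

# Hypothetical example values, invented for illustration only.
substance_type = ["alcohol", "cannabis", "alcohol", "nicotine"]       # nominal
use_frequency = ["Never", "Rarely", "Often", "Rarely", "Sometimes"]   # ordinal
temperature_c = [20.0, 22.5, 19.0]                                    # interval
cigarettes_per_day = [0, 5, 10]                                       # ratio

# Nominal: categories have no order, so the most common category (mode)
# is a meaningful summary, but an "average substance" is not.
most_common_substance = mode(substance_type)

# Ordinal: categories are ranked, so we can map labels to ranks and take
# the median, but distances between ranks are not assumed to be equal.
FREQUENCY_ORDER = ["Never", "Rarely", "Sometimes", "Often", "Very Often"]
ranks = [FREQUENCY_ORDER.index(label) for label in use_frequency]
median_frequency = FREQUENCY_ORDER[int(median(ranks))]

# Interval: equal spacing makes means and differences meaningful,
# but 20 degrees Celsius is not "twice as hot" as 10 degrees.
mean_temperature = mean(temperature_c)

# Ratio: a true zero makes ratios meaningful
# (10 cigarettes is twice as many as 5).
ratio_of_counts = cigarettes_per_day[2] / cigarettes_per_day[1]

print(most_common_substance, median_frequency, mean_temperature, ratio_of_counts)
```

Knowing the level of measurement before analysis helps avoid summaries that look numeric but carry no meaning, such as averaging category codes for a nominal variable.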

Independent vs Dependent Variables
Let’s consider three categories of variables that are necessary for constructing research questions: dependent, independent, and demographic variables.
- Demographic Variables: These refer to descriptive characteristics of the study population, such as age, gender, race, and socioeconomic status, among many others.
- Dependent Variables: Often called response variables, these are the outcomes or effects that we are interested in studying.
- For example, in research examining the effect of dietary habits on depression, depression is the dependent variable because it is the outcome of interest, as we are expecting depression to vary based on dietary intake.
- Independent Variables: In experimental studies, independent variables are the factors that you, as a researcher, manipulate or vary to observe their effect on dependent variables. In observational studies, independent variables are the factors that researchers anticipate will impact dependent variables. Independent variables can also be referred to as explanatory variables.
- Continuing with the example of the effect of dietary habits on depression, the type of diet (e.g., high in processed foods vs. high in unprocessed foods) would be an independent variable that impacts depression, the dependent variable.

Examples of research questions with variables are presented in Table 3.
Exploratory vs Confirmatory Research Questions
Exploratory research questions typically have either dependent or independent variables, in addition to demographic variables. For example, “What factors most influence substance use among urban high school students?” In that example, “substance use” is the dependent variable and there is no defined independent variable, and the demographic variable is “urban high school students.” Exploratory research is particularly valuable in the early stages of research when the relationships among variables are not well understood. It allows researchers to explore large datasets, identify patterns, and generate new, confirmatory research questions based on the observed data.
Confirmatory research questions test the relationships of independent variables and dependent variables, given certain demographics. For example, “What are the effects of peer pressure on substance use among urban high school students?” In this example, we are specifically testing the effects of “peer pressure” as an independent variable on “substance use,” our dependent variable. Confirmatory research is conducted to test theories with data.
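One way to keep this distinction straight is to write a research question down as a small record of its variables. The sketch below is a hypothetical helper, not part of any ABCD tooling: the `ResearchQuestion` class and its `kind` rule (a question that names both a dependent and an independent variable is treated as confirmatory) are simplifying assumptions based on the definitions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchQuestion:
    """Hypothetical record of a research question and its variables."""
    text: str
    population: str                     # demographic variable(s)
    dependent: Optional[str] = None     # outcome of interest
    independent: Optional[str] = None   # factor expected to influence the outcome

    def kind(self) -> str:
        # Simplifying assumption: naming both an independent and a dependent
        # variable signals a relationship to test, i.e. a confirmatory question.
        if self.dependent and self.independent:
            return "confirmatory"
        return "exploratory"


exploratory_rq = ResearchQuestion(
    text="What factors most influence substance use among urban high school students?",
    population="urban high school students",
    dependent="substance use",
)

confirmatory_rq = ResearchQuestion(
    text="What are the effects of peer pressure on substance use among urban high school students?",
    population="urban high school students",
    dependent="substance use",
    independent="peer pressure",
)

print(exploratory_rq.kind())
print(confirmatory_rq.kind())
```

Filling in the fields before starting an analysis makes it obvious when a question has no defined independent variable and is therefore exploratory.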
Hypotheses with Variables
Confirmatory research questions can be reformulated as hypotheses, which are testable predictions about outcomes of confirmatory research questions. For instance, “Increased physical activity (independent variable) will improve cognitive function (dependent variable) in adolescents.” Incorporating variables into a hypothesis adds specificity by clearly identifying what will be measured (dependent variable) and what will influence those measurements (independent variable).
Falsifiable Hypotheses: The Pillar of Empirical Research
A scientific hypothesis must be falsifiable: it must be stated so that empirical observation or experimentation could show it to be false. The purpose of trying to falsify a hypothesis is to build confidence in it by showing that it is not easily disproven. If a hypothesis is tested and not shown to be false, the evidence that it is correct increases. A strong falsifiable hypothesis includes variables that are specific and measurable.
| Broad research topic | Exploratory research question | Confirmatory research question | Falsifiable hypothesis |
|---|---|---|---|
| Characterize individual developmental trajectories (brain, cognitive, emotional, academic) and factors that can affect them | RQ: What are the typical patterns of cognitive development among adolescents? Population: Adolescents; DV: Cognitive development (e.g., cognitive test scores); IV: Not specified | RQ: How does socioeconomic status influence cognitive development in adolescents? Population: Adolescents; DV: Cognitive development (e.g., cognitive test scores); IV: Socioeconomic status | Adolescents from higher socioeconomic backgrounds will show greater improvement in cognitive test scores over a one-year period compared to adolescents from lower socioeconomic backgrounds. |
| Disentangle genetic vs. environmental factors on development | RQ: What patterns of risk-taking behavior are observed among adolescents with indicators of elevated genetic liability for learning difficulties? Population: Adolescents with elevated genetic liability indicators (defined using ABCD genetic measures); DV: Risk-taking behavior; IV: Not specified | RQ: How does parental involvement impact risk-taking behavior among adolescents with elevated genetic liability for learning difficulties? Population: Adolescents with elevated genetic liability for learning difficulties; DV: Risk-taking behavior; IV: Parental involvement | Among adolescents with elevated genetic liability for learning difficulties, higher parental involvement will be associated with lower risk-taking behavior over the next year compared to lower parental involvement. |
| Determine how exposure to alcohol, nicotine, cannabis, caffeine, and other substances affects developmental outcomes (and vice versa) | RQ: What are the prevalence and frequency patterns of cannabis use among adolescents? Population: Adolescents; DV: Cannabis use frequency (e.g., days used in past 30 days); IV: Not specified | RQ: How does weekly cannabis use affect academic performance in adolescents? Population: Adolescents; DV: Academic performance (e.g., grades, standardized scores, school engagement); IV: Weekly cannabis use | Adolescents who use cannabis weekly will show lower academic performance over one school year compared to adolescents who do not use cannabis weekly. |
| Study the onset and progression of mental health disorders and their influencing factors | RQ: What early patterns of anxiety symptoms are observed in adolescents? Population: Adolescents; DV: Anxiety symptoms (or anxiety disorder indicators); IV: Not specified | RQ: How does peer support influence changes in anxiety symptoms over time in adolescents? Population: Adolescents; DV: Anxiety symptoms over time; IV: Peer support | Adolescents with higher peer support will show larger decreases in anxiety symptoms over two years compared to adolescents with lower peer support. |
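A hypothesis such as "adolescents from higher socioeconomic backgrounds will show greater improvement in cognitive test scores over a one-year period" is falsifiable because data could contradict it. The sketch below shows that idea with invented scores (not ABCD data) and a hypothetical `mean_improvement` helper. It only compares group averages; a real analysis would also need an appropriate statistical test and a much larger sample.

```python
from statistics import mean

# Invented (baseline, one-year follow-up) test scores, for illustration only.
higher_ses = [(50, 58), (47, 52), (55, 61)]
lower_ses = [(49, 53), (51, 54), (46, 51)]


def mean_improvement(pairs):
    """Average change from baseline to follow-up across participants."""
    return mean(follow_up - baseline for baseline, follow_up in pairs)


higher_gain = mean_improvement(higher_ses)
lower_gain = mean_improvement(lower_ses)

# The hypothesis predicts higher_gain > lower_gain.
# Any other outcome would count as evidence against it.
hypothesis_contradicted = higher_gain <= lower_gain

print(f"Higher SES gain: {higher_gain:.2f}, lower SES gain: {lower_gain:.2f}")
print("Contradicted by these data:", hypothesis_contradicted)
```

Because both the groups and the outcome are measurable, there is a clear result that would count against the hypothesis, which is what separates it from a vague claim.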
Data Ethics & Responsible Conduct of Research
While data analysis and data science in health research are shaped by an overarching research process, the research process itself is shaped by ethical considerations. Ethics refers to our morals, values, principles, and behavioral norms, such as distinguishing between “right” and “wrong.”
In recent years, the volume of data generated and consumed has reached the scale of hundreds of zettabytes, giving researchers unprecedented access to detailed behavioral profiles and the ability to influence decision-making. This capability, while beneficial for personalized health interventions, also poses significant risks to data privacy and user autonomy.

Ethical Principles in Human Subjects Research
Ethical research involving human subjects is paramount to ensuring the dignity, rights, and welfare of research participants. Three core principles guide ethical human subjects research, as outlined in the 1979 Belmont Report:
- Respect for Persons:
- Informed Consent: Participants must be given comprehensive information about the study, including its purpose, procedures, risks, and benefits. They should be able to comprehend this information and voluntarily decide whether to participate without coercion.
- Additional Protections: Special protections are required for individuals with diminished autonomy, such as children, prisoners, or those with cognitive impairments.
- Beneficence:
- Maximize Benefits and Minimize Harms: Researchers are obligated to maximize the potential benefits of the research while minimizing any potential harm to participants. This involves careful risk assessment and implementation of measures to mitigate identified risks.
- Justice:
- Fair Distribution of Burdens and Benefits: The selection of research subjects should be fair, ensuring that no particular group is unduly burdened or excluded from the potential benefits of research. This principle also involves ensuring equitable access to research participation.
Regulatory Framework and Guidelines
Human subjects research is governed by a robust regulatory framework designed to enforce these ethical principles. Key components include:
- Institutional Review Boards (IRBs): IRBs review and approve research protocols to ensure they are ethically sound and that the rights and welfare of participants are protected.
- The Common Rule: A federal policy in the United States that outlines the ethical standards and regulatory requirements for human subjects research.
- International Guidelines: Documents such as the Declaration of Helsinki provide ethical guidelines for researchers worldwide.
Practical Considerations for Researchers
When conducting research involving human subjects, such as in addiction and related fields, researchers must:
- Obtain Informed Consent: Clearly explain the study’s purpose, procedures, risks, and benefits. Ensure that participants understand this information and voluntarily agree to participate.
- Ensure Confidentiality: Protect participants’ privacy by securely handling and storing data. Anonymize data whenever possible to prevent the identification of individual participants.
- Monitor for Adverse Events: Continuously monitor participants for any adverse effects and be prepared to take appropriate action if any arise.
- Provide Additional Protections for Vulnerable Populations: Implement additional safeguards for populations with diminished autonomy, ensuring their participation is truly voluntary and informed.
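As a small illustration of the confidentiality point, the sketch below drops direct identifiers from a record and replaces the participant ID with a keyed hash. Everything here is hypothetical: the field names, the `pseudonymize` and `deidentify` helpers, and the record itself are invented, and this is not the procedure used by the ABCD Study. Keyed hashing is a form of pseudonymization; on its own it does not guarantee that a participant cannot be re-identified from the remaining fields.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. In practice it would be generated once and stored
# separately from the data, with restricted access.
SECRET_KEY = secrets.token_bytes(32)

# Hypothetical fields that directly identify a participant.
DIRECT_IDENTIFIERS = {"participant_id", "name"}


def pseudonymize(participant_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an ID using a keyed SHA-256 hash."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and attach a pseudonymous ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudo_id"] = pseudonymize(record["participant_id"])
    return cleaned


# Invented record, for illustration only.
raw_record = {"participant_id": "P001", "name": "Example Name", "age": 10, "score": 52}
safe_record = deidentify(raw_record)
print(sorted(safe_record))
```

The same ID always maps to the same pseudonym under one key, so records from different visits can still be linked without exposing the original identifier.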
Applied Ethics: The ABCD Data Use Certificate (DUC)
ABCD Study data are made available to the scientific community in curated and anonymized form through the National Institute of Mental Health (NIMH) Data Archive, now accessed via the NBDC Data Hub. Although participant information is de-identified, the data’s depth and sensitivity require strict protections to safeguard confidentiality.
Access is governed by the NIMH Data Archive Data Use Certificate (DUC), a legal agreement between the researcher, their institution, and the archive. The DUC ensures that researchers:
- Maintain participant privacy and confidentiality.
- Use data only for approved scientific, educational, or scholarly purposes.
- Follow robust security measures, including encryption and restricted access.
- Refrain from attempting re-identification of participants.
- Delete data or securely archive it upon project completion or DUC expiration.
Current Access Process (per NBDC Data Hub Data Access):
- Create an account on the NIMH/NBDC Data Hub.
- Affiliate with your institution (requires institutional sign-off on the DUC).
- Create and submit a project describing your intended analyses.
- Obtain institutional and NIMH Data Archive approval.
- Download and use data only in secure, approved environments.
This protocol operationalizes the “open science” mission of ABCD while upholding the highest ethical standards for human subjects research.
Working with Synthetic Data
In this course, we will work with synthetic data modeled after the real data from the Adolescent Brain Cognitive Development (ABCD) study. Synthetic data is artificially generated data that can retain the statistical properties of the original data without exposing sensitive information. This approach ensures the privacy and confidentiality of the ABCD study participants while allowing you to practice data analysis techniques. We should not draw any real conclusions about ABCD study participants by working with the synthetic data in this course.
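To show what "retaining statistical properties" can mean, the sketch below generates synthetic scores that roughly match the mean and standard deviation of a small original sample. It is a deliberately simple illustration under stated assumptions: the original values are invented, the `make_synthetic` helper is hypothetical, and the data are assumed to be roughly normally distributed. It is not the method used to build the synthetic dataset for this course, and real synthetic-data methods also try to preserve relationships between variables.

```python
import random
from statistics import mean, stdev

# Invented "original" scores, for illustration only (not ABCD data).
original_scores = [48, 52, 50, 55, 45, 51, 49, 53]


def make_synthetic(values, n, seed=0):
    """Draw n values from a normal distribution fitted to the originals."""
    rng = random.Random(seed)   # fixed seed so the example is reproducible
    mu, sigma = mean(values), stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


synthetic_scores = make_synthetic(original_scores, n=1000)

print(f"original:  mean={mean(original_scores):.2f}, sd={stdev(original_scores):.2f}")
print(f"synthetic: mean={mean(synthetic_scores):.2f}, sd={stdev(synthetic_scores):.2f}")
```

None of the synthetic values corresponds to a real person, yet summary statistics computed on them resemble those of the original sample, which is why synthetic data is useful for practice but not for conclusions about actual participants.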
Summary
In Module 1, we established the foundation for conducting ethical, scientifically grounded addiction research. We began by exploring the entire research process—from developing research questions and designing studies to selecting instruments, analyzing data, and reporting results—using the ABCD Study as a central example. Key concepts were introduced, including the definitions of independent, dependent, and demographic variables, as well as the distinctions between exploratory and confirmatory research questions. We also examined the four levels of measurement and discussed how these inform the way variables are defined and interpreted. Finally, ethical principles, including the guidelines of the Belmont Report and the specific requirements of the ABCD Data Use Certificate, were discussed to emphasize the importance of protecting participant confidentiality and ensuring responsible research practices.
Works Cited
Alexander, M. (2010). The New Jim Crow: Mass Incarceration in the Age of Colorblindness. The New Press.
American Psychological Association. (n.d.). Stigma. In APA Dictionary of Psychology. Retrieved December 30, 2024, from https://dictionary.apa.org/stigma
Jernigan, T. L., & Brown, S. A. (2018). Introduction. Developmental Cognitive Neuroscience, 32, 1–3. https://doi.org/10.1016/j.dcn.2018.05.002
Microsoft. (2021). Data Science for Beginners. Retrieved [date], from https://docs.microsoft.com/en-us/learn/modules/data-science-basics/
National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. U.S. Department of Health, Education, and Welfare.
National Institute of Mental Health Data Archive. (n.d.). Data Use Certification. Retrieved from https://nda.nih.gov/about/policies
U.S. Department of Health and Human Services. (2018). Basic HHS Policy for Protection of Human Research Subjects (The Common Rule), 45 C.F.R. 46. Retrieved from https://www.hhs.gov/ohrp/regulations-and-policy/regulations/45-cfr-46/index.html
Smith Collection/Gado/Getty Images. (1985). Department of Education posters featuring McGruff the Crime Dog [Photograph]. Retrieved from Getty Images or Slate (depending on source location).
World Medical Association. (2013). Declaration of Helsinki: Ethical principles for medical research involving human subjects. JAMA, 310(20), 2191–2194. https://doi.org/10.1001/jama.2013.281053