5 Polygenic Risks of Addictions
Reading Objectives
By the end of this module, you should be able to:
Understand Molecular Genetics Fundamentals: Grasp the basic structure and function of DNA, genes, and genomes, and recognize the role of genetic variations such as Single Nucleotide Polymorphisms (SNPs) in influencing behaviors and addiction.
Explain Genome-Wide Association Studies (GWAS): Describe the methodology and purpose of GWAS, including study designs like case-control and cohort studies, and understand how GWAS identify genetic variants associated with addiction.
Comprehend Polygenic Scores (PGS): Understand how PGS are derived from GWAS data, their applications in predicting addiction risk, and the limitations associated with their use, especially concerning population diversity.
Identify and Address Ethical Considerations in Genetic Research: Recognize the importance of data privacy, informed consent, and the responsible use of genetic data to prevent scientific racism and ensure equitable research practices. Implement best practices for using population descriptors and appreciate the significance of genetic diversity in research.
Explore the ABCD Study’s Genetic Data Collection: Learn how the ABCD Study collects, analyzes, and utilizes genetic data to investigate the interplay between genetics, brain development, and addiction.
Key Terms
DNA (Deoxyribonucleic Acid): The molecule that carries genetic information in all living organisms, structured as a double helix composed of four bases: adenine (A), thymine (T), cytosine (C), and guanine (G).
Gene: A specific segment of DNA that encodes instructions for building proteins, serving as the basic units of heredity.
Genome: The complete set of genetic material within an organism, encompassing all of its genes and non-coding regions.
Single Nucleotide Polymorphism (SNP): The most common type of genetic variation among people, involving a change of a single nucleotide in the DNA sequence, occurring approximately once every 300 bases (Genetic Science Learning Center, 2016).
Genome-Wide Association Study (GWAS): A research approach that involves scanning entire genomes of many individuals to identify genetic variants, particularly SNPs, associated with specific traits or diseases, such as addiction (Baurley et al., 2016).
Polygenic Score (PGS): A quantitative index that aggregates the tiny effects of many genetic variants (typically SNPs) identified in GWAS to summarize inherited liability for a trait. A PGS shifts group-level probabilities; it does not determine outcomes for any one person.
I. Introduction
In 2003, the Human Genome Project achieved a near-complete sequence of the human genome after 13 years of relentless effort and a staggering investment of around £2 billion. Just two decades later, advances in sequencing technologies have accelerated this process dramatically. By 2022, the Wellcome Sanger Institute showcased the power of modern sequencing by producing a human genome every 12 minutes, a stark contrast to the painstaking pace of the past (Wellcome Sanger Institute, 2022). Today, entire genomes can be sequenced in less than an hour, revolutionizing our understanding of genetics and opening new frontiers in biology, medicine, and healthcare.
Figure 1. DNA sequencing workflow. Courtesy: National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), 2023. Public domain. Source: genome.gov (DNA Sequencing Fact Sheet).
In the previous module, we developed an intuition for heritability and foundational methodologies in behavioral genetics. We now turn to the molecular structures that underlie heritability. Whereas an atom is the smallest unit of an element (carbon, hydrogen, oxygen, etc.) that keeps its chemical identity, molecules are two or more atoms held together by chemical bonds (water, carbon dioxide, etc.). Our focus is the biomolecules of genetics: DNA, RNA, and proteins.
In this module, we explore molecular genetics, genome-wide association studies (GWAS), and polygenic scores (PGS) to understand how these technologies help scientists uncover complex relationships between genes and behavior, particularly in the context of addiction.
Additionally, we consider ethical challenges and responsible approaches to behavioral genetics in addiction science.
II. Molecular Genetics
Understanding a few molecular genetics basics helps us interpret how DNA variation can relate to behavior, including addiction. This section introduces DNA, genes, chromosomes, the genome, and common genetic variation called single-nucleotide polymorphisms (SNPs).
DNA and Genes
DNA (deoxyribonucleic acid) carries the biological instructions for life. It consists of two complementary strands twisted into a double helix, like a spiral ladder. The sugar-phosphate backbones form the sides, and paired bases form the rungs. DNA uses four bases: A (adenine), T (thymine), C (cytosine), and G (guanine). A pairs with T, and C pairs with G. The order of the bases, or sequence, functions like biological “text” that cells read to build proteins, which perform most life functions.
When a cell needs a protein, it copies a specific stretch of DNA into RNA (ribonucleic acid). RNA is usually single-stranded and uses U (uracil) instead of T. The short-lived copy, messenger RNA (mRNA), is then read in three-base codons to assemble a protein.
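As a rough illustration of the copy-then-read process described above, the following Python sketch transcribes a short DNA sequence into mRNA and reads it three bases at a time. The sequence, the function names, and the deliberately tiny codon table are assumptions made for teaching purposes. The real genetic code has 64 codons, and cells transcribe from the template strand rather than doing a text substitution.

```python
# Partial codon table (mRNA codon -> amino acid); the full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",   # start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": "STOP",
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA that matches the DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> list[str]:
    """Read mRNA in three-base codons until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")  # "???" = not in toy table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

dna = "ATGGCTTGGAAATAA"     # hypothetical coding-strand sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

Changing a single letter of `dna` and rerunning the sketch is a quick way to see how one base can change, or fail to change, the resulting amino acid.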
A gene is a stretch of DNA whose sequence is transcribed into RNA, typically to make a protein. Genes are passed from parents to offspring and influence traits such as eye color and disease susceptibility. A chromosome is a long, tightly packaged DNA molecule containing many genes, wrapped around proteins. Chromosomes are located in the cell nucleus in most cells. Humans have 23 pairs of chromosomes (46 total), with one chromosome in each pair inherited from each parent. The genome is the complete DNA sequence across all chromosomes in a person, totaling about 3 billion base pairs in humans.
Figure 2. Gene–DNA–chromosome relationship and protein production. Illustration by Laura Olivares Boldú / Wellcome Connecting Science (YourGenome), CC BY 4.0.
Gene Expression
Gene expression is the process by which information from a gene is used to create a functional product, usually a protein. Not all genes are active at all times. Gene regulation controls when and how much of a protein is made, ensuring that proteins are produced as needed by the cell.
Genotype vs Phenotype
The genotype is the genetic makeup of an individual, meaning the specific set of genes they carry. The phenotype is the observable traits or characteristics that result from the interaction of the genotype with the environment. For example, a gene may influence a person’s tendency toward impulsivity, which, combined with environmental factors, can affect behavior.
Alleles: Different Versions of a Gene
Genes often exist in slightly different forms known as alleles. Each individual inherits two alleles for every gene, one from each parent. While many alleles have little or no influence on observable traits, some can lead to differences in characteristics, such as eye color, or predispositions to certain behaviors or diseases. For example, a gene that influences eye color may have one allele for blue eyes and another for brown eyes.
The combination of alleles you carry can influence how genes are expressed and, in some cases, how you respond to your environment, medications, and other external factors. Additionally, Single Nucleotide Polymorphisms (SNPs) represent a type of allelic variation where a single nucleotide differs between alleles. Allelic variation is therefore an important concept in understanding genetic diversity and how traits and health outcomes vary between individuals and across populations.
Figure 3. Example of traits mapped to corresponding regions on homologous chromosomes, illustrating that the same loci appear in the same positions on both chromosome copies. Created by the author. Production note: created using generative AI and edited by the author.
Genotype and Phenotype
The term genotype refers to one’s genetic makeup (the specific alleles one carries), whereas phenotype refers to observable characteristics or outcomes (which result from the interaction of genotype with environment). For instance, a person might have a genetic allele that increases impulsivity (genotype), but whether that manifests in behavior (phenotype) could depend on upbringing, stress, and other environmental factors.
It’s crucial to remember that for complex traits like addiction vulnerability, there is no single “addiction gene.” Instead, many genes each make small contributions. In other words, such traits are polygenic (influenced by multiple genes).
Single-Nucleotide Polymorphisms (SNPs)
One common type of genetic variation is the single nucleotide polymorphism (SNP), essentially a single “letter” difference in the DNA sequence between individuals. For example, at a particular position in the genome one person might have an A, while another has a G. Each SNP is like a single-character typo or variation in a very long book.
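The "single-letter difference" idea can be sketched in a few lines of Python. The two sequences and the function name below are hypothetical. The sketch also assumes the sequences are already aligned and of equal length, which real pipelines have to establish first. Strictly speaking, a difference between two people is only called a SNP when the variant is reasonably common in a population, which this toy comparison cannot show.

```python
def find_single_letter_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned and of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)  # 1-based positions
        if a != b
    ]

person_1 = "GATTACAGGCTA"   # hypothetical sequence
person_2 = "GATTGCAGGCTA"   # same stretch with one letter changed
differences = find_single_letter_differences(person_1, person_2)
print(differences)
```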
SNPs occur roughly once in every 300 base pairs on average, meaning there are millions of SNP differences between any two people. Most SNPs have no effect on health, but some can influence how genes function or how proteins are made. As such, SNPs can alter how a person responds to a drug, or their susceptibility to a health condition, including possibly their predisposition to addiction.
Because SNPs are so abundant and spread throughout the genome, they serve as useful markers for researchers. Modern genotyping technologies, such as SNP microarrays, can test hundreds of thousands of SNPs in a person’s DNA quickly and cost-effectively. This allows scientists to create a genome-wide “fingerprint” of genetic variants for each individual in a study. These data feed the analyses you’ll see next.
A recommended quick resource to learn more about SNPs is Making SNPs Make Sense from the Genetic Science Learning Center (2016).
Figure 4. A gene’s DNA sequence encodes a protein; a SNP (single-letter DNA change) can create a variant genotype and may alter the resulting protein. Created by the author. Production note: created using generative AI and edited by the author.
Genome-Wide Association Studies (GWAS)
How do scientists go from raw DNA data to discovering which genetic variants might increase the risk of disease, or show associations with phenotypes like addiction? The answer is often through genome-wide association studies, commonly abbreviated as GWAS (pronounced “GEE-wahs”).
A GWAS is a systematic scan of the entire genome, comparing many people to see whether specific genetic variants (usually SNPs) are associated with a particular trait or disease. For a visual overview, see the cartoon explainer of GWAS developed by the Broad Institute (2017).
In a GWAS, researchers examine hundreds of thousands, or even millions, of SNPs across the genome in a large group of individuals. The goal is to identify SNPs that are statistically more frequent in people with the trait of interest (for example, nicotine addiction) compared to people without the trait.
If a particular SNP is significantly more common in the cases (those with the addiction) than in the controls (those without), that SNP is flagged as potentially associated with the trait.
Figure 5. Genome-wide association study (GWAS) concept: compare SNP frequencies in individuals with and without a disease to identify variants associated with disease risk. Image Credit: NHGRI 2020.
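To make the case-control comparison concrete, here is a minimal Python sketch of a test for one SNP. It uses a Pearson chi-square test on allele counts, which is one simple option. Published GWAS more often use regression models with covariates, so treat this as an illustration rather than the method of any particular study. All counts are invented.

```python
import math

def allele_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_risk, case_other], [control_risk, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_risk + control_risk, case_other + control_other]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical study: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p_value = allele_chi_square(case_risk=700, case_other=1300,
                                  control_risk=600, control_other=1400)
case_freq = 700 / 2000
control_freq = 600 / 2000
print(f"case freq={case_freq:.2f}, control freq={control_freq:.2f}, "
      f"chi2={chi2:.2f}, p={p_value:.2g}")
```

With these made-up counts the risk allele is somewhat more common in cases (35% versus 30%). The resulting p-value is small by everyday standards but would still fall short of the much stricter cutoffs that genome-wide studies apply.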
The Process of GWAS
At the outset, researchers must define the phenotype, or trait, they are studying. In addiction research, a phenotype might be a clinical diagnosis, such as opioid use disorder defined by diagnostic criteria, or a quantitative measure, such as the number of cigarettes smoked per day.
Once the phenotype is defined and DNA is collected, usually through blood or saliva samples, genotyping is performed using SNP arrays or sequencing technologies. This process yields each individual’s genotype at hundreds of thousands or even millions of SNPs. Researchers then conduct a statistical test for each SNP to evaluate whether variation at that location is associated with the trait of interest.
Because so many SNPs are tested, researchers apply a very stringent standard for statistical significance, conventionally p < 5 × 10⁻⁸. This high threshold helps distinguish true associations from results that could arise by random chance. Any SNP that surpasses this cutoff is considered genome-wide significant and treated as a strong candidate for a real association.
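One common way to arrive at such a stringent cutoff is a Bonferroni-style correction: divide the acceptable overall false-positive rate by the number of tests. The short Python calculation below assumes roughly one million independent tests, a widely used approximation for common variants rather than an exact count, and yields the conventional genome-wide threshold of 5 × 10⁻⁸.

```python
family_wise_alpha = 0.05         # acceptable chance of any false positive overall
independent_tests = 1_000_000    # rough convention for common variants (assumption)

# Bonferroni correction: per-test threshold = overall alpha / number of tests.
genome_wide_threshold = family_wise_alpha / independent_tests

# For contrast: false positives expected if the usual p < 0.05 were applied to every test.
expected_false_hits_naive = independent_tests * 0.05

print(f"per-SNP threshold: {genome_wide_threshold:.0e}")
print(f"expected chance hits at p < 0.05: {expected_false_hits_naive:,.0f}")
```

The second number shows why the ordinary 0.05 cutoff is unusable here: with a million tests and no true effects at all, tens of thousands of SNPs would still look "significant" by chance.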
Replication is essential. To be confident that a finding is not a statistical fluke or a quirk of a single dataset, significant SNP–trait associations should be confirmed in an independent sample. Replication strengthens confidence that the association reflects a genuine biological signal.
Interpreting GWAS Results
GWAS results are commonly visualized using a Manhattan plot, named for its resemblance to a city skyline. Each dot represents a SNP, plotted by its genomic position along the x-axis (typically chromosomes 1–22) and by the strength of its association with the trait on the y-axis (often shown as −log10(p-value), where higher values mean stronger statistical evidence).
Most points appear near the bottom of the plot, indicating little or no association. Occasionally, tall “skyscraper” peaks rise above a horizontal threshold line, marking variants (or regions) that reach genome-wide significance. These peaks highlight genomic regions where variants are statistically associated with the trait. Because true associations are rare and effect sizes are usually small, only a limited number of regions typically stand out, even in very large studies.
Figure 6. Manhattan plot showing genome association with microcirculation (GWAS example). Source figure from Ikram et al. (2010), PLOS Genetics; shared via Wikimedia Commons under CC BY 2.5.
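The y-axis transformation is easy to reproduce. The Python sketch below converts a handful of invented p-values to −log10(p) and compares them with the conventional genome-wide threshold of 5 × 10⁻⁸. The SNP identifiers and values are placeholders, not real results.

```python
import math

GENOME_WIDE_P = 5e-8
threshold_height = -math.log10(GENOME_WIDE_P)   # height of the horizontal line

# Hypothetical (SNP id, chromosome, p-value) results; not real data.
results = [
    ("rs_example_1", 1, 0.42),
    ("rs_example_2", 4, 3e-5),
    ("rs_example_3", 4, 2e-9),
    ("rs_example_4", 15, 8e-12),
]

for snp, chrom, p in results:
    height = -math.log10(p)      # smaller p-value -> taller point on the plot
    flag = "genome-wide significant" if p < GENOME_WIDE_P else ""
    print(f"{snp} chr{chrom}: -log10(p) = {height:.2f} {flag}")

significant = [snp for snp, _, p in results if p < GENOME_WIDE_P]
print(f"threshold line at {threshold_height:.2f}; hits: {significant}")
```

Because the scale is logarithmic, each additional unit on the y-axis corresponds to a p-value ten times smaller.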
Interpreting a Manhattan Plot for Externalizing Behaviors
Figure 6 provides a concrete example of how to read a Manhattan plot. Each dot is a single DNA variant. The x-axis shows where that variant sits in the genome, organized by chromosome. The y-axis shows how strong the statistical evidence is that the variant is associated with the phenotype being studied. Higher dots mean stronger evidence.
The dashed horizontal line represents the genome-wide significance threshold. Peaks that cross this line suggest genomic regions where many nearby variants show strong signals, often because they are correlated with each other (they tend to be inherited together). In practice, researchers do not stop at “a tall peak.” They check whether the signal replicates in an independent sample, and they follow up to identify which genes or biological pathways might plausibly connect that genomic region to the phenotype.
Key takeaway (from Figure 6). Manhattan plots typically show a “many dots, few peaks” pattern, reflecting that complex traits are influenced by many variants with small effects, plus a smaller number of variants or regions that clear a very strict significance threshold.
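The correlation between nearby variants that produces these clustered peaks is usually quantified as linkage disequilibrium (LD), often reported as r². The following Python sketch computes r² for two SNPs from a small set of invented haplotypes. It assumes phased data (we know which alleles travel together on the same chromosome copy) and that both SNPs vary in the sample. The function name is hypothetical.

```python
def ld_r_squared(haplotypes, allele_1, allele_2):
    """r^2 between two SNPs, given two-letter haplotypes such as 'AC'."""
    n = len(haplotypes)
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n            # allele freq at SNP 1
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n            # allele freq at SNP 2
    p12 = sum(h == allele_1 + allele_2 for h in haplotypes) / n   # both on one haplotype
    d = p12 - p1 * p2   # D: excess co-occurrence beyond what independence predicts
    return d ** 2 / (p1 * (1 - p1) * p2 * (1 - p2))

# Ten hypothetical haplotypes: 'A' usually travels with 'C', and 'G' with 'T'.
haplotypes = ["AC"] * 4 + ["GT"] * 4 + ["AT", "GC"]
r2 = ld_r_squared(haplotypes, "A", "C")
print(round(r2, 2))
```

An r² near 1 means the two SNPs carry almost the same information, so a signal at one will appear at the other. An r² near 0 means they behave independently.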
Connecting this back to externalizing and addiction
Although Figure 6 is a GWAS example for microcirculation (not a behavioral trait), Manhattan plots for behavioral phenotypes use the same logic and look visually similar. In GWAS of externalizing-related traits (such as impulsivity, rule-breaking, and early substance use), researchers also look for peaks above the genome-wide threshold, interpret them as associated genomic regions, and require replication to confirm the findings.
Externalizing is widely understood as polygenic, meaning there is no single “gene for” addiction or risk-taking. Instead, many small genetic influences are spread across the genome. Individually, each effect is tiny. Together, they can shift the odds at the group level. When these small effects are combined into a polygenic score (PGS), the score can capture some portion of variation in externalizing in large samples, but it does not predict destiny for any single person.
Figure 6 shows how to read the plot; for a domain-specific externalizing example, see Manhattan plots reported in large externalizing GWAS papers (for example, Karlsson Linnér et al., 2021).
Notable Findings from GWAS of Addiction
Recent large-scale GWAS have moved from mixed early results to several replicated discoveries that clarify substance-specific signals and shared vulnerability.
Alcohol Use Disorder (AUD). Meta-analyses repeatedly identify variants in alcohol-metabolizing enzymes, especially ADH1B, and also ADH1C and ALDH2 (Zaso et al., 2019). Certain ADH1B alleles (common in some East Asian groups) are protective because they speed the conversion of alcohol to acetaldehyde, whose buildup produces an aversive flushing response (Cho et al., 2023).
Opioid Use Disorder (OUD). A coding SNP in OPRM1 (the μ-opioid receptor) shows a robust association with opioid dependence and replicates across cohorts (Zhou et al., 2020). This is biologically plausible because OPRM1 is the receptor targeted by opioids like heroin and oxycodone (Gaddis et al., 2022). The finding underscores how receptor biology can shape individual risk.
Shared vulnerability across substances. A 2023 mega-analysis of more than 1 million individuals identified 19 independent markers for a cross-substance addiction risk factor spanning alcohol, nicotine, cannabis, and opioids (Hatoum et al., 2023). Many signals map to genes involved in dopamine signaling, highlighting shared reward circuitry. Higher genetic liability for addiction also correlates with elevated risk for several mental-health and medical conditions, suggesting overlapping genetic architecture.
These examples illustrate that GWAS can point to both specific genes (like OPRM1 or ADH1B) and broader biological systems (like dopamine regulation) as relevant to addiction. Each associated SNP is a clue. For example, it may affect receptor function or drug metabolism, which can influence responses to substances.
Limitations and Challenges of GWAS
GWAS are powerful discovery tools, but their findings come with important caveats that affect interpretation and use.
Small effects and polygenicity. Most associated SNPs shift risk by only a few percent, and complex traits like addiction are influenced by thousands of variants. Even large studies often explain only a small share of total risk, so GWAS signals are best viewed as small pieces of a much larger puzzle.
Population stratification. If cases and controls differ in ancestry, allele-frequency differences can create false associations unrelated to the trait. Statistical corrections (for example, principal components) help, but residual bias can remain. This is one reason diverse, well-matched samples matter.
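“Missing heritability” and LD tagging. The variants identified by GWAS typically account for less heritability than twin and family studies estimate, a gap often called “missing heritability.” In addition, a significant SNP often tags a broader region because neighboring variants are inherited together (linkage disequilibrium, or LD), so the flagged SNP is not necessarily the causal one. Identifying the causal change usually requires fine-mapping, sequencing, and additional study designs.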
Need for functional follow-up. Association does not reveal mechanism. A variant might alter a protein, change gene expression, or affect regulation in specific tissues or developmental windows. Post-GWAS functional genomics is essential to translate hits into biology.
Winner’s curse and replication. First reports tend to overestimate effect sizes, and some early findings fail to replicate. Independent replication and meta-analysis are necessary to confirm signals and obtain more accurate effect estimates.
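The population stratification caveat above can be made concrete with a small worked example in Python. The two ancestry groups, their allele counts, and the helper names are all invented. Within each group the allele is equally common in cases and controls, yet pooling the groups produces an apparent difference, because the group with the higher allele frequency also happens to have more cases.

```python
# Hypothetical allele counts: (risk alleles, total alleles) for cases and controls.
groups = {
    "ancestry_group_A": {"cases": (40, 200), "controls": (360, 1800)},
    "ancestry_group_B": {"cases": (360, 600), "controls": (840, 1400)},
}

def frequency(risk, total):
    """Risk-allele frequency from a (risk, total) pair of allele counts."""
    return risk / total

# Within each group, case and control frequencies are identical (no true association).
within = {
    name: (frequency(*g["cases"]), frequency(*g["controls"]))
    for name, g in groups.items()
}

def pooled(status):
    """Allele frequency after ignoring ancestry and lumping everyone together."""
    risk = sum(g[status][0] for g in groups.values())
    total = sum(g[status][1] for g in groups.values())
    return frequency(risk, total)

pooled_case_freq = pooled("cases")
pooled_control_freq = pooled("controls")
print(within)
print(pooled_case_freq, pooled_control_freq)
```

In this toy data the pooled analysis suggests the allele is more common in cases, although it has no relationship to the trait inside either group. Adjusting for ancestry, for example with principal components, is meant to remove exactly this kind of artifact.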
Polygenic Scores (PGS)
A polygenic score (PGS), also called a polygenic risk score (PRS), collapses information from many tiny genetic effects into a single number that estimates a person’s inherited predisposition to a trait (e.g., addiction risk). In plain terms, it’s a “genetic risk tally,” much like a credit score condenses your financial history into one number that shifts your likelihood of loan approval. A high PGS does not guarantee an outcome, and a low PGS does not prevent it. It changes probabilities.
Constructing a PGS
Select SNPs (which DNA sites to include). Researchers start from GWAS results and decide which single-nucleotide polymorphisms (SNPs) to use. One strategy includes only SNPs that pass a very strict p-value cutoff (i.e., results unlikely to be due to chance). Another includes many more “sub-threshold” SNPs because lots of very small signals can help prediction when combined. Researchers also remove highly correlated SNPs (called LD pruning) so the score does not double-count the same signal.
Assign a weight to each SNP (how strongly it relates to the trait). Each included SNP gets a weight based on its effect size from the GWAS, typically a regression beta or a (log) odds ratio. In plain language, this number tells you how much carrying one copy of the risk allele nudges the trait up or down. Positive weights push risk higher, negative weights push it lower.
Calculate the person’s score (sum the weighted alleles). For each SNP, count how many risk alleles the person has, 0, 1, or 2, and multiply by that SNP’s weight. Then add those products across all selected SNPs: PGS = Σ (allele count × weight). Conceptually, this is just “add up all the tiny nudges.” If the GWAS reported odds ratios, analysts usually take logarithms first so the math becomes simple addition rather than multiplication.
Standardize the score (make it interpretable). After computing raw PGS values for everyone in a dataset, researchers typically standardize them, for example by converting to a z-score (how many standard deviations above or below the group average) or to a percentile. In plain terms, a 90th-percentile PGS means your score is higher than 90% of people in that sample.
Figure 7. Workflow for constructing a polygenic score (PGS) from GWAS effect sizes by selecting SNPs, weighting genotypes, summing across variants, and standardizing the resulting score for interpretation. Created by the author. Production note: created using generative AI and edited by the author.
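The four steps above can be sketched in Python as follows. Every number here is invented: the odds ratios, p-values, inclusion cutoff, and genotypes are placeholders, and the sketch skips LD pruning entirely. It is meant to show the arithmetic, not to reproduce any published scoring pipeline.

```python
import math
import statistics

# Hypothetical GWAS results: SNP -> (odds ratio for the counted allele, p-value).
gwas = {
    "snp_1": (1.10, 1e-9),
    "snp_2": (1.05, 3e-6),
    "snp_3": (0.95, 2e-4),
    "snp_4": (1.02, 0.30),
}
P_CUTOFF = 1e-3   # illustrative inclusion threshold, not a recommendation

# Select SNPs and assign weights: keep SNPs under the cutoff, weight = log(odds ratio).
weights = {
    snp: math.log(odds_ratio)
    for snp, (odds_ratio, p) in gwas.items()
    if p < P_CUTOFF
}

# Allele counts (0, 1, or 2) for four hypothetical people.
genotypes = {
    "person_a": {"snp_1": 0, "snp_2": 1, "snp_3": 2, "snp_4": 1},
    "person_b": {"snp_1": 2, "snp_2": 0, "snp_3": 0, "snp_4": 2},
    "person_c": {"snp_1": 1, "snp_2": 1, "snp_3": 1, "snp_4": 0},
    "person_d": {"snp_1": 1, "snp_2": 2, "snp_3": 0, "snp_4": 1},
}

def polygenic_score(allele_counts):
    """Calculate the score: sum of (allele count x weight) over the selected SNPs."""
    return sum(allele_counts[snp] * w for snp, w in weights.items())

raw = {person: polygenic_score(g) for person, g in genotypes.items()}

# Standardize: convert raw scores to z-scores within this sample.
scores = list(raw.values())
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)   # population SD of this small sample
z = {person: (score - mean) / sd for person, score in raw.items()}

for person in raw:
    print(f"{person}: raw={raw[person]:+.3f}, z={z[person]:+.2f}")
```

Note that `snp_4` is dropped by the cutoff, and that `snp_3` carries a negative weight because its odds ratio is below 1. The z-scores only describe where each person falls relative to the other three people in this toy sample.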
What a PGS Tells Us (and Does Not)
Probabilities, not certainties. A higher PGS means a higher likelihood, on average in large groups, of the trait. It is not a guarantee for any one person. Like a credit score, it shifts odds, and behavior and context still matter.
Context matters. Environment and choices can amplify or buffer genetic liability. For example, a person with a high PGS for alcohol problems who never drinks will not express that risk, while high stress or easy access could increase risk for someone who does drink.
Figure 8. Created by the author using synthetic (simulated) data for educational purposes. Conceptually informed by findings from Karlsson Linnér et al. (2021), Nature Neuroscience, but does not reproduce or depict original study data or figures. Released under a Creative Commons Attribution (CC BY 4.0) license. Production note: created using generative AI and edited by the author.
Figure 8 illustrates this principle using a synthetic (hypothetical) example of substance use outcomes across groups of individuals binned by their PGS. In this illustration, average rates of alcohol use disorder (AUD), illicit drug use, and opioid use increase as we move from the lowest-scoring 20% to the highest 20%. The key point is the shape of the relationship, not the exact numbers: the increases are gradual, not absolute. Many people with high scores never develop problems, and some with lower scores still do.
This matches what real studies typically find, including large-scale analyses of externalizing and substance outcomes (Karlsson Linnér et al., 2021). The takeaway is that polygenic scores shift group averages. They help explain patterns at the population level, while individual outcomes remain shaped by many other genetic, environmental, and behavioral factors.
Figure 9. Normal distribution of polygenic scores in a sample, illustrating relative position using the 10th, 50th (median), and 90th percentiles. Created by the author. Production note: created using generative AI and edited by the author.
Figure 9 reinforces two key points:
Being in a “high” or “low” group is always relative to others in the sample, not an absolute threshold.
The vast majority of people are clustered around the middle of the distribution, where genetic liability is modest.
Taken together, Figures 8 and 9 show that polygenic scores reflect a continuum of probabilities, not destinies. Higher scores may nudge risk upward on average, but there is no sharp dividing line between “affected” and “unaffected.”
Common Research Uses of PGS
Stratifying risk in studies. Researchers can compare outcomes for participants with higher versus lower PGS to see whether trajectories (e.g., earlier initiation or faster escalation) differ.
Testing gene–environment interplay. A PGS provides a single summary of genetic liability that can be combined with environmental measures (e.g., stress, parental monitoring) in interaction models to test whether context changes genetic effects.
Adjusting for inherited propensity. Studies evaluating programs or exposures can include PGS as a covariate to account for baseline genetic differences among participants.
Exploring shared biology. PGS for one trait (e.g., depression) can be tested against another outcome (e.g., substance use) to probe pleiotropy, meaning shared genetic influences across conditions.
Limitations of PGS
Ancestry transferability. Polygenic scores often do not transfer well across populations with different ancestral backgrounds. If the GWAS that supplied the effect sizes was mostly European-ancestry, the same score can predict poorly for people of East Asian, African, or admixed ancestries because allele frequencies and effect sizes can differ. This is why increasing diversity in genetic research is essential so that PGS are equitable and accurate for all groups.
Small variance explained. A PGS usually accounts for only a small slice of the overall differences among people (e.g., “about 5% of variance” in liability). In plain language, most of what makes individuals similar or different on the trait is not captured by the score. Other genes, rare variants, environment, and chance all matter. As a result, prediction for a single person is limited: some people with high scores will not develop problems, and some with low scores will.
Environment still shapes outcomes. PGS captures genetic propensity, not life context. Two people with the same score can have very different outcomes if one grows up in a supportive, low-risk environment and the other faces high stress or easy access to substances. Put simply, the score nudges probabilities, but environments and choices can amplify, mute, or prevent expression of that liability.
Risk of misunderstanding or misuse. Without careful explanation, people may interpret a high PGS as destiny (“it’s in my genes, there’s nothing I can do”) or panic about stigmatizing labels. Institutions could also be tempted to use scores inappropriately, hence the importance of legal protections and strong ethics guidance discussed later in the module. Clear communication should emphasize that PGS shifts odds. It does not define an individual.
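To put the "about 5% of variance" example above in perspective, the short Python calculation below converts it into the implied correlation between the score and the trait. It assumes the simplest case, a continuous trait and a single linear predictor, where variance explained equals the squared correlation. Studies of diagnoses often report variance explained on other scales.

```python
import math

variance_explained = 0.05                      # the "about 5%" example from the text
correlation = math.sqrt(variance_explained)    # implied correlation between PGS and trait
unexplained = 1 - variance_explained           # share of variance the score does not capture

print(f"correlation = {correlation:.2f}, unexplained share = {unexplained:.0%}")
```

A correlation of this size is detectable in large samples, which is why PGS are useful for research, but it leaves nearly all person-to-person variation unaccounted for.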
Despite these caveats, PGS research is valuable because it aggregates many tiny genetic effects into a usable summary, helping scientists study risk patterns and gene–environment interplay. For complex traits like addiction, there is no single “gene for” the outcome. Many small influences add up, and PGS gives us one careful way to quantify that aggregate.
Ethics and Responsible Use of PGS
Why genetics research must be handled responsibly
Scenario to frame our ethics lens: In 2018, media covered clinics exploring embryo screening for polygenic disease risk. Imagine adding “addiction risk” to that list. Who decides what counts as “high risk”? Could that label follow a person for life? This section equips you to spot risks and communicate findings responsibly so the science helps people, not harms them.
1) Data privacy and informed consent
Genetic data are uniquely identifying. Even “de-identified” datasets can sometimes be re-identified when combined with other information. Treat DNA as high-risk personal data: use robust security, least-privilege access, and controlled-access repositories. In ABCD and similar studies, qualified researchers access genetic files through gated systems designed to protect participant confidentiality.
Informed consent must be specific and clear. Participants (and caregivers) should understand what will be measured, how data will be stored and shared, who may access it, whether results might be reused in future studies, and their right to withdraw. Make these elements explicit whenever genomic data are collected or analyzed.
2) Avoiding genetic determinism and stigma
Communicate probabilities, not destinies. GWAS and PGS shift odds at the population level. They do not define any one person. Environment and choices still matter. Scores nudge risk but do not fix outcomes.
Name common misinterpretations up front. High PGS does not equal inevitability. Low PGS does not equal immunity. Warn against institutional misuse and note that legal and ethical guardrails restrict certain uses (e.g., in the U.S., the Genetic Information Nondiscrimination Act, GINA, limits how employers and health insurers may use genetic information).
Adopt precise, respectful language.
Do say:
“Genetic liability increases average risk in some contexts.”
“Heritability is a population statistic.”
“Supports can buffer risk.”
“Results may be age-graded and context-dependent.”
Don’t say:
“Born to be addicted.”
“High heritability means environments don’t matter.”
“50% heritable means half of your behavior is genetic.”
3) Diversity, representation, and avoiding scientific racism
Why representation matters. Many GWAS historically over-sampled European-ancestry participants. As a result, PGS often transfers poorly to other ancestries because allele frequencies and effect sizes can differ, reducing accuracy and fairness. Prioritize diverse cohorts, transparent documentation, and cross-ancestry validation so findings benefit everyone.
Avoid scientific racism and typological thinking. Race and ethnicity are social constructs and not proxies for genetic ancestry. Use precise genetic markers; justify any population descriptors; be transparent about classification methods; and acknowledge limits to generalizability.
Keep between-group and within-group inferences separate. Heritability within a group cannot explain differences between groups living in different contexts. Maintain an equity lens: focus on supports and opportunity structures, not “deficits.”
4) Responsible use of genetic data (policy and practice)
Purpose-bound use and minimum necessary. Collect and analyze only what you need; avoid repurposing data without consent. Use controlled-access workflows, log decisions, and align with IRB and data-use agreements.
Guardrails against misuse. Clearly state that GWAS and PGS are research tools and are unsuitable for labeling, tracking, or penalizing individuals (e.g., in schools, workplaces, or insurance). Note that legal protections (e.g., GINA) and evolving policies aim to reduce discrimination, but vigilance and clear communication remain essential.
Language that reduces harm. Carry forward Module 4’s communication principles: use neutral, action-oriented phrasing; avoid deterministic labels; and foreground modifiable supports (mentoring, routines, coping skills, evidence-based prevention).
Closing note: balanced optimism. Ethical risks are real, but they are manageable. Strong privacy practices, clear consent, diverse study designs, careful communication, and community engagement all improve the science and its impact. Used responsibly, GWAS and PGS can illuminate mechanisms, sharpen study design, and ultimately inform prevention without stigmatizing individuals or communities.
ABCD Study’s Genetic Data and Resources
The ABCD Study collects genomic data to illuminate how inherited differences contribute to brain development and addiction-related behaviors across adolescence. This section describes how ABCD gathers, processes, and shares genetic data, the design features that make it uniquely powerful, and how researchers are using these resources. Throughout, remember that all work with DNA requires careful ethical stewardship.
How is DNA collected?
Saliva (primary method). ABCD obtains DNA primarily from saliva, a non-invasive and youth-friendly method, and then extracts genomic material for downstream analyses.
Biospecimen note. ABCD collects additional biospecimens for other scientific aims (e.g., exposure assays), but genotyping is based on saliva-derived DNA.
How is the genetic information analyzed?
Genotyping platform. ABCD has used the Axiom Smokescreen array, enabling dense coverage of SNPs relevant to neurodevelopment and substance-use research.
What variants are studied? Analyses focus on single-nucleotide polymorphisms (SNPs). Post-GWAS tools (e.g., polygenic scores) summarize many tiny effects into a single index of inherited liability.
Quality control and population structure. Standard QC (e.g., call-rate thresholds, Hardy–Weinberg checks, sex/relatedness concordance) precedes analyses. Ancestry inference (e.g., principal components) helps limit confounding from population structure.
Imputation. Genotypes are imputed against reference panels to expand variant coverage beyond the typed SNPs.
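The analysis steps above can be sketched in a few lines of code. This is a minimal illustration, not ABCD's pipeline: the genotype matrix, the SNP weights, and both QC cutoffs are hypothetical values chosen for the example, and ancestry principal components and imputation are left out. The sketch filters SNPs by call rate and by a Hardy-Weinberg chi-square check, then computes a polygenic score as a weighted sum of effect-allele counts.

```python
# Hypothetical genotype matrix: rows are people, columns are SNPs.
# Each entry counts copies of the effect allele (0, 1, or 2); None is a missing call.
genotypes = [
    [0, 1, 2,    0],
    [1, 1, None, 0],
    [2, 0, 1,    0],
    [1, 2, None, 1],
    [0, 1, 1,    0],
    [1, 0, None, 0],
]
# Hypothetical per-SNP weights, standing in for GWAS effect sizes.
weights = [0.10, -0.05, 0.20, 0.30]

MIN_CALL_RATE = 0.95   # illustrative cutoff, not ABCD's
MAX_HWE_CHI2 = 23.93   # illustrative cutoff (roughly p = 1e-6 at 1 df)

def call_rate(column):
    """Fraction of non-missing genotype calls for one SNP."""
    return sum(g is not None for g in column) / len(column)

def hwe_chi_square(column):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations; larger values mean stronger departure."""
    calls = [g for g in column if g is not None]
    n = len(calls)
    p = sum(calls) / (2 * n)   # effect-allele frequency
    q = 1 - p
    expected = {0: n * q * q, 1: 2 * n * p * q, 2: n * p * p}
    stat = 0.0
    for genotype, exp in expected.items():
        if exp > 0:
            stat += (calls.count(genotype) - exp) ** 2 / exp
    return stat

def polygenic_score(person, weights, keep):
    """Weighted sum of effect-allele counts over SNPs that passed QC.
    Missing calls are skipped here; real pipelines usually impute them."""
    return sum(weights[j] * person[j] for j in keep if person[j] is not None)

# Transpose so that each tuple holds one SNP's calls across people.
columns = list(zip(*genotypes))
keep = [
    j for j, col in enumerate(columns)
    if call_rate(col) >= MIN_CALL_RATE and hwe_chi_square(col) <= MAX_HWE_CHI2
]
scores = [polygenic_score(person, weights, keep) for person in genotypes]

print("SNPs passing QC:", keep)
print("polygenic scores:", [round(s, 2) for s in scores])
```

In this toy data the third SNP (index 2) is missing for half of the sample, so it fails the call-rate filter and does not contribute to anyone's score. The resulting score is a single number per person, which is why it can only index average inherited liability and cannot describe an individual's outcome.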
Data availability and access
Versioned, documented releases. Genetic datasets are distributed with rigorous QC documentation to support reproducibility.
Controlled access. Qualified investigators obtain de-identified data through controlled-access repositories under data-use agreements, balancing scientific openness with participant privacy and consent.
Unique features of ABCD genetics
Population breadth. ABCD enrolls participants from diverse U.S. communities, improving generalizability and enabling tests of portability for GWAS findings and polygenic scores across backgrounds.
Twin/sibling design. A substantial twin and sibling subsample allows researchers to combine behavioral genetic designs (A/C/E, discordant-twin contrasts) with molecular approaches (GWAS/PGS) within the same longitudinal cohort, which is powerful for probing gene–environment interplay and developmental timing.
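To show what an A/C/E decomposition estimates, the sketch below applies Falconer's formulas, a classic shortcut that uses only the trait correlations for identical (MZ) and fraternal (DZ) twin pairs. It is a simplified stand-in for the structural equation models that twin researchers typically fit, and the two correlations are hypothetical numbers, not ABCD results.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate A/C/E variance shares from twin correlations.

    A (additive genetic):     2 * (r_mz - r_dz)
    C (shared environment):   2 * r_dz - r_mz
    E (nonshared environment, including measurement error): 1 - r_mz
    """
    return {
        "A": 2 * (r_mz - r_dz),
        "C": 2 * r_dz - r_mz,
        "E": 1 - r_mz,
    }

# Hypothetical twin correlations for some trait.
estimates = falconer_ace(r_mz=0.60, r_dz=0.40)
for component, share in estimates.items():
    print(f"{component}: {share:.2f}")
```

With these inputs the three shares come out near 0.40, 0.20, and 0.40 and sum to 1. The formulas rest on assumptions such as equal shared environments for MZ and DZ pairs, which is one reason full model fitting is preferred in practice.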
How researchers use the ABCD genetic data
Validating biology of the brain. Linking genomic variation to neuroimaging and cognitive phenotypes to test whether previously reported brain-related loci generalize to youth.
Psychiatric and behavioral liability. Examining how polygenic propensities for psychiatric conditions relate to emerging psychopathology and substance-use trajectories in adolescence (Fan et al., 2023, p. 160).
Cross-trait architecture and G×E. Exploring shared genetic architecture across externalizing, internalizing, and substance phenotypes, and testing how environmental contexts (family, school, neighborhood) tune genetic effects over time.
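One simple way to picture a G×E test is to compare how strongly a polygenic score predicts an outcome in two different contexts. The sketch below uses simulated data only: the two contexts, the sample sizes, and the effect sizes (0.5 and 0.2) are assumptions made for illustration. With a two-level environment, the difference between the two slopes equals the interaction term in a regression that includes the score, the environment, and their product.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (covariance divided by variance of x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_context(n, pgs_effect, rng):
    """Simulate standardized polygenic scores and an outcome in one context."""
    pgs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [pgs_effect * g + rng.gauss(0.0, 1.0) for g in pgs]
    return pgs, outcome

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical contexts: the score matters less where supports are stronger.
pgs_low, y_low = simulate_context(5000, pgs_effect=0.5, rng=rng)
pgs_high, y_high = simulate_context(5000, pgs_effect=0.2, rng=rng)

slope_low_support = ols_slope(pgs_low, y_low)
slope_high_support = ols_slope(pgs_high, y_high)
interaction = slope_high_support - slope_low_support

print(f"PGS slope, lower-support context:  {slope_low_support:.2f}")
print(f"PGS slope, higher-support context: {slope_high_support:.2f}")
print(f"difference in slopes (G×E):        {interaction:.2f}")
```

A negative difference here means the simulated genetic association is weaker in the higher-support context. Real analyses would also adjust for ancestry principal components, family relatedness, and other covariates, and would treat any single interaction estimate cautiously.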
Ethics connection: Access controls, careful population descriptors, and clear consent are integral. See this module’s Ethics and Responsible Use section for guidance on privacy, communication, equity, and responsible interpretation.
Works Cited
Auchter, A. M., Myers, C. E., Mann, J. B., & Ryan, L. A. (2018). A description of the ABCD organizational structure and the development of a large-scale neurodevelopmental study. Developmental Cognitive Neuroscience, 32, 8–15. https://doi.org/10.1016/j.dcn.2018.04.003
Baurley, J. W., Edlund, C. K., Pardamean, C. I., Conti, D. V., & Bergen, A. W. (2016). Smokescreen: A targeted genotyping array for addiction research. Addiction Biology, 21(3), 517–525. https://doi.org/10.1111/adb.12345
Bird, K. A., & Carlson, J. (2021). Typological thinking in human genomics research contributes to the production and prominence of scientific racism. Manuscript in preparation.
Cardenas-Iniguez, C., & Robledo Gonzalez, M. (2023). Recommendations for the responsible use and communication of race and ethnicity in neuroimaging research. Nature Neuroscience, 26(1), 45–60. https://doi.org/10.1038/s41593-022-01234-5
Casey, B. J., Cannonier, T., Conley, M. I., et al. (2018). The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience, 32, 43–54. https://doi.org/10.1016/j.dcn.2018.06.004
Cho, Y., Lin, K., Lee, S. H., et al. (2023). Genetic influences on alcohol flushing in East Asian populations. BMC Genomics, 24, 638. https://doi.org/10.1186/s12864-023-09721-7
Fan, C. C., Loughnan, R., & ABCD Genetic Working Group. (2023). Genotype data and derived genetic instruments of Adolescent Brain Cognitive Development Study® for better understanding of human brain development. Behavior Genetics, 53(1), 31–45. https://doi.org/10.1007/s10519-022-10002-3
Gaddis, N., Mathur, R., Marks, J., et al. (2022). Multi-trait genome-wide association study of opioid addiction: OPRM1 and beyond. Scientific Reports, 12, 16873. https://doi.org/10.1038/s41598-022-21003-y
Gymrek, M., Willems, T., Mandal, S., et al. (2013). Identifying personal genomes by surname inference. Nature Genetics, 45(6), 304–309. https://doi.org/10.1038/ng.2644
Hatoum, A. S., Colbert, S. M. C., Johnson, E. C., et al. (2023). Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders. Nature Mental Health, 1, 210–223. https://doi.org/10.1038/s44220-023-00034-y
Iacono, W. G., Heath, A. C., Hewitt, J. K., Neale, M. C., Banich, M. T., & Luciana, M. (2018). The utility of twins in developmental cognitive neuroscience research: How twins strengthen the ABCD research design. Developmental Cognitive Neuroscience, 32, 30–42. https://doi.org/10.1016/j.dcn.2017.09.001
Karlsson Linnér, R., Mallard, T. T., Barr, P. B., et al. (2021). Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction. Nature Neuroscience, 24, 1367–1376. https://doi.org/10.1038/s41593-021-00908-3
National Academies of Sciences, Engineering, and Medicine. (2018). National Academies Press. https://doi.org/10.17226/[DOI]
National Academies of Sciences, Engineering, and Medicine. (2023). Using Population Descriptors in Genetics and Genomics Research. National Academies Press. https://doi.org/10.17226/[DOI]
Plomin, R., DeFries, J. C., & Fulker, D. W. (1988). Nature and Nurture during Middle Childhood. Blackwell.
Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A., & Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38(8), 904–909. https://doi.org/10.1038/ng1764
Uban, K. A., Horton, M. K., Jacobus, J., Heyser, C., Thompson, W. K., Tapert, S. F., Madden, P. A. F., & Sowell, E. R. (2018). Biospecimens and the ABCD study: Rationale, methods of collection, measurement and early data. Developmental Cognitive Neuroscience, 32, 97–106. https://doi.org/10.1016/j.dcn.2018.06.005
Wellcome Sanger Institute. (2022). Human genome sequencing advancements. Retrieved from https://www.sanger.ac.uk/
Zaso, M. J., Goodhines, P. A., Wall, T. L., & Park, A. (2019). Meta-analysis on associations of alcohol metabolism genes with alcohol use disorder in East Asians. Alcohol and Alcoholism, 54(3), 216–224. https://doi.org/10.1093/alcalc/agz011
Zhou, H., Rentsch, C. T., Cheng, Z., et al. (2020). Association of OPRM1 functional coding variant with opioid use disorder: A genome-wide association study. JAMA Psychiatry, 77(10), 1072–1080. https://doi.org/10.1001/jamapsychiatry.2020.1206