If you have felt you learned something in the past year, please consider becoming a paid contributor to this Substack. It'll always remain paywall-free.
At my day job we're currently in the middle of raising our Series A. This is part of what we're up to. If you're interested, please let me know at charles@traitwell.com.
You can upload your DNA for our totally free pharmacogenomics app here.
Sequence-once, test-many pharmacogenomics promises much but so far has yielded modest gains. The key is to go beyond disease categories to continuous dispositions, and to reorganize medical practice away from exclusive specialization on particular diseases towards comprehensive gene-centered practice. Differential medicine. Necessary, but not necessarily easy. Whence opportunity.
Introduction
It was all so beautiful. When the human genome was first sequenced—most of it, anyway—before the year 2000 was out, the era of personalized medicine beckoned. Francis S. Collins had the public's ear. Not as a trophy, but as a willing receptacle. We're all different, so gloriously different, and once we can read your genes all your peculiar idiosyncrasies (evolution's gift) can be incorporated into a treatment plan personalized just for you. What happened?
Taking Variation Seriously
Not so long ago—as recently as 150 years ago—variation was little understood, the province of astronomers and their 'personal equations'. It seemed to be a nuisance more than anything else: measurement error, perhaps unavoidable, something to be discounted and averaged out, or corrected. Then Charles Darwin and Francis Galton elevated variation from the back seat to a first-class phenomenon. Evolutionary theory depends on it. Some variation is pure noise, to be sure, without any systematic explanation. But much of it is not. As always, some reduction was needed to understand this.
We observe phenotypic variation, in our bodies and even more broadly in all the things we build, one way or another (the extended phenotype). In large part this comes from genetic variation. This was clear to Darwin and his cousin even before we knew what genes are exactly. What we did know is that they are inherited at a biological level—handed down. But not all phenotypic variation is genetic, since there are many other sources of variation, including pure noise and all those other things commonly—all too lazily—ascribed to ‘the environment’. That last category is in truth no more than a remainder bucket: it contains absolutely anything that is not genetic, including chance developmental errors in the long biochemical chain and beyond—between instructions that seek to build things and their eventual products, from bodies to Mozart symphonies.
There are deep-seated reasons for the genetic component of phenotypic variation. Sexual reproduction creates it through recombination of genes. Those genes also mutate randomly, though only rarely, in all sorts of interesting ways: through copying errors and through external insults like radiation and chemical exposure. Most of those mutations are harmful where they have any effect, and are selected against. Sex is useful for a number of reasons, all of which play some part in making it a good reproductive strategy.
There is an arms race between our genes—really the genes of any complex organism—and the diseases which prey on us and them. It has been estimated that bubonic plague wiped out about a third of the entire population of Europe in the 14th century. The disease and its causative agent, Yersinia pestis, persist today, but it claims few victims, perhaps because the most vulnerable genotypes were selected out a long time ago.
If we reproduced by cloning ourselves, as some simpler organisms do, we would all be copies with only some random differences due to odd mutations. Easy prey for an evolved disease, and disease is important to this discussion.
Shuffling our genes has enabled us to keep pace with and even out-sprint disease—so far at least. That's because genes have consequences, and shuffling them has consequences. Different combinations of genes produce different results. If they didn't, shuffling them would be a waste of time. There is a real cost to sexual reproduction, since fully one half of the population (men) give up the opportunity to clone themselves directly. That's a huge deficit. The advantages of sex must be overwhelming to make up for it through increased survival of descendants.
There are other benefits to gene recombination beyond this. If we cloned ourselves, then as mutations crept in randomly over time we would inexorably build up a 'genetic load' of mutations in all our descendants, which would after all just be exact copies of ourselves, and copies of copies, and so on. Sexual reproduction allows bad mutations to be shuffled off into reproductive dead ends, which go no further once natural selection has acted.
The human genome is also redundant, with two copies of most information, and a good backup can be recovered through recombination. Every now and then the mutant proves better than the original; that case works out too. (Without natural selection, genetic load will build up in our sexually reproducing system as well, though much more slowly, and there is some reason to believe it is now doing so.)
It should hardly come as a surprise to find then that when we measure real-world health outcomes, we usually find a substantial association between differences in those outcomes and differences in genes. It is positively to be expected. But the same is true for a much broader range of traits. We will return to this theme shortly.
Genetic variation may be at a gross level (copy number variants, that is insertions or deletions of entire subsequences) or at a finer level (single nucleotide polymorphisms) within the genome. It may also occur in mitochondrial DNA, carried separately and never recombined (inherited intact except for occasional mutations). What can vary and survive selection has likely already done so in practice and will be encountered in a large enough population in some proportion.
Single-Gene Disorders
In the earliest days of genetic epidemiology, before DNA sequencing was invented, knowledge extended only to so-called single-gene or Mendelian disorders, like cystic fibrosis and Huntington's disease. The Online Mendelian Inheritance in Man (OMIM) repository lists around 10,000 of these, and it is no longer feasible to publish it in book form (though once it was). Pedigree analysis revealed that some families carried (hypothesized) genes with a large effect, striking enough to produce obvious physiological disorders, often with increased mortality. Ordinarily such a gene is rare in the broader population.
Once the structure of DNA was identified by Crick and Watson, and knowledge of chromosomes matured, it became possible to map the location of some of these genes by patient detective work. Such techniques exploit linkage, whereby certain genes are not inherited independently but tend to travel together, typically because they lie close together on the same chromosome. Markers like blood groups can also be used to help find such mappings. With a location identified, selective sequencing could later be done to detect variants.
The reason that genes with large effect are rare is interesting and important for this discussion. Typically a mutation is harmful, though very rarely it may be beneficial, since it is much easier to make things worse by randomly changing things than it is to improve them (try that on an old radio set). If it is beneficial, selection will drive it throughout the whole population, the more quickly the larger its benefit (which depends on its so-called 'penetrance' or 'effect size'). Even a small benefit will be driven to 'fixation' given enough time, the point where just about everybody has it. (There are exceptions where the benefit depends on the variant being relatively rare, and other edge cases beyond our scope here.) If it is harmful it will, by exactly the same argument, be relentlessly eliminated by natural selection.
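A minimal sketch of the fixation logic above, under simple assumptions not drawn from the text: a deterministic haploid model with a single beneficial allele of relative fitness 1 + s, ignoring drift, dominance and population structure. The numbers are illustrative only.

```python
# Deterministic spread of a beneficial allele under simple haploid selection.
# The allele has relative fitness 1 + s; drift and dominance are ignored.

def generations_to_near_fixation(s, p0=0.01, threshold=0.99):
    """Generations needed for an allele to rise from frequency p0 to threshold."""
    p, generations = p0, 0
    while p < threshold:
        # standard one-locus selection recursion
        p = p * (1 + s) / (p * (1 + s) + (1 - p))
        generations += 1
    return generations

if __name__ == "__main__":
    for s in (0.001, 0.01, 0.05):
        print(f"s = {s:<6} -> ~{generations_to_near_fixation(s)} generations")
    # Larger benefits spread faster; even s = 0.001 approaches fixation,
    # it just takes on the order of ten thousand generations.
```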
Even supposedly single-gene disorders can still demonstrate variable outcomes (e.g. mortality) in practice, depending on the other gene variants carried. It is likely that many of the so-called Mendelian disorders are more correctly thought of as special cases of polygenic disorders. Cystic fibrosis is a well-documented example of this sort of thing (Khoury, Little, and Burke 2004, 199).
Polygenic Dispositions
Disease categories are discrete. You are supposed to either have the disease or not. In practice it is less clear-cut for many people: they may be on the borderline, or it may simply be unclear. Moreover, you might not have the disease yet, but show a markedly elevated risk of getting it in future (as in the case of 'pre-diabetes'). An alternative is to place people on a continuum, where one end definitely does not meet the level of disease and the other definitely does. We may call this a disposition. All people can be placed somewhere on that continuum, and may be surprised to learn their position. When we are not talking specifically of disease or disorder, we may speak of traits: phenotypic characteristics that are displayed, like cooperativeness or aggression.
It has been clear since the early 1900s that where genes are involved in traits (including diseases), multiple genes are usually involved. Such traits are thus polygenic. The trait is typically normally distributed within the population, taking on (at least approximately) continuous values as one of the above 'dispositions'.
Early attempts to find associations with single genes used poor statistical methods, raising hopes that were dashed when the associated interventions failed and the associations also failed to replicate—unfortunately the former often preceded the latter, squandering time, resources, and goodwill. With the advent of affordable sequencing of large subsets of genes, Genome-Wide Association Studies (GWAS) have been teasing out these links. Under stringent scientific standards the links found have proved replicable, with good predictive power.
Organophosphates
A good illustration of the principles behind differential response to influences, mediated by genes, is given by organophosphates, widely used as pesticides, e.g. chlorpyrifos, a chlorinated version. In sufficient doses, and especially when oxidized, these substances are highly neurotoxic to animals, causing hundreds of thousands of deaths worldwide, aside from prolonged illness requiring costly treatment. Federal regulations in the United States limit permissible exposure to these agents. But those regulations are based on averages and do not consider variation of genotypes and associated phenotypes.
There are people in the population who are significantly more sensitive to organophosphates, due to genetic variation. Doubtful meat to anyone, possible poison to some. This is traceable to polymorphisms of the PON1 gene on the long arm of the 7th chromosome (thus 7q21-7q22). That gene encodes the associated paraoxonase enzyme, which circulates in the blood with HDL lipoproteins. Paraoxonase metabolizes the dangerous oxon metabolites produced by organophosphates.
Standard regulations are not adequate for cases where genes predispose the carriers to super-sensitivity; such carriers would benefit from reduced exposure. There is reason to believe that this sort of variation is present for many toxins. Accounting for variable sensitivity will also lower healthcare costs for all concerned, given the serious consequences of overexposure.
There are two principal directions for pharmacogenomics (rather than "pharmacogenetics", since the whole genome in all its resplendent aspects must now be taken into account, not just the protein-coding genes): pharmacokinetics and pharmacodynamics.
Pharmacokinetics
Pharmacokinetics concerns itself with differences in the activity of drug-metabolizing enzymes, as a matter of degree. This is intimately tied to the optimal dose. Enzymes break things down biochemically and are (unsurprisingly) under genetic control. Their names end in "-ase", leading J. B. S. Haldane to quip that DNA helix strands might be separable using an "untwisterase".
In individuals who metabolize a drug (or rather its products) slowly—which means they do not get rid of it promptly—even a standard dose may produce prolonged exposure, leading to an adverse drug reaction (ADR, in medical argot).
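A minimal, purely illustrative sketch of why slow metabolism raises exposure, using the standard one-compartment relation that total exposure (AUC) for a single intravenous dose equals dose divided by clearance. The clearance and dose figures are assumptions, not clinical guidance.

```python
# One-compartment view of exposure: AUC = Dose / Clearance for a single IV dose,
# so lower clearance (a slow metabolizer) means proportionally higher exposure
# unless the dose is reduced. All numbers are hypothetical.

def auc(dose_mg, clearance_l_per_h):
    """Area under the concentration-time curve (mg·h/L) for a single IV dose."""
    return dose_mg / clearance_l_per_h

normal_cl = 10.0   # hypothetical 'normal metabolizer' clearance, L/h
slow_cl = 2.5      # hypothetical 'slow metabolizer' clearance, L/h
dose = 100.0       # mg

print("Exposure, normal metabolizer:", auc(dose, normal_cl), "mg·h/L")
print("Exposure, slow metabolizer:  ", auc(dose, slow_cl), "mg·h/L")

# Matching the normal metabolizer's exposure means scaling the dose by the
# clearance ratio.
adjusted_dose = dose * slow_cl / normal_cl
print("Dose giving equal exposure:  ", adjusted_dose, "mg")
```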
In 2013, researchers estimated the annual cost of ADRs to run "up to 30.1 billion dollars" (Sultana, Cutroneo, and Trifirò 2013), suggesting that considerable savings might be achieved by reducing them.
Variability in this was observed as early as the 1940s and 1950s for drugs such as primaquine (an anti-malarial), isoniazid (for TB) and succinylcholine (for anesthesia). It was soon linked through family studies to variation in the enzymes N-acetyltransferase, pseudocholinesterase and glucose-6-phosphate dehydrogenase.
Well-known recent examples include 6-mercaptopurine (6-MP), an anti-leukemia drug, and the blood anticoagulant warfarin. In the former case, gene polymorphisms produce variation in the enzyme thiopurine S-methyltransferase (TPMT), which metabolizes the 6-MP drug, inactivating it over time. Failure to inactivate 6-MP after its initial effect leads to toxic overexposure. Similarly, in the case of warfarin, the cytochrome P450 enzyme CYP2C9 metabolizes it, and those who are deficient in CYP2C9 due to their gene polymorphisms may need more monitoring and lower doses to prevent severe bleeding due to prolonged loss of coagulation.
Pharmacodynamics
Pharmacodynamics concerns itself with the response of biochemical receptors and targets to drugs. That varies by individual and even (on average) by group. This makes certain drugs more effective in some people than in others at producing the desired physiological response. Thus pravastatin is much more effective in carriers of certain alleles affecting CETP (cholesteryl ester transfer protein).
Additional variability in pharmacodynamic response has often been observed in the past. This has commonly been ascribed to gene interactions with lifestyle choices and other factors, though with weak evidence. Some of these cases have proven instead to be polymorphisms in enzymes under the control of other genes. More discoveries of this kind should be expected, and they should serve as a caution against facile explanations, especially where those are not part of the original study design. The important subject of enzyme polymorphism leads us back to pharmacokinetics.
Drug discovery
Once a genetic association is found, many possibilities exist for leveraging that information. Finding causal associations between genes and outcomes invites interventions which target the observed products of those genes, perhaps to amplify or attenuate them. Even if an association has a small effect, it may unravel part of the way the body functions physiologically and yield interventions anyway. We will not discuss that drug-discovery path further here, since it is too rich a topic in its own right.
Screening and Set Ways
A disease (or any trait) factor with a small effect size is called 'low-penetrance' in epidemiology. Here penetrance refers to the percentage of carriers of the gene who actually develop the disease or condition. Outside of the very rare single-gene disorders with high penetrance (discussed above), most genes associated with outcomes have modest, even tiny, effect sizes. The effect size may be expressed as an odds ratio—that is, the odds of getting the disease with that gene, divided by the odds of getting it without that gene. (Recall that the odds = p / (1-p) where p is a probability.) This information is gleaned from large case-control studies, and from more sophisticated models using other data sets (e.g. Mendelian randomization, case-only studies, GWAS with logistic regression or machine learning). When a weighted combination of these is used as a predictor, the resulting polygenic score has a substantially larger effect, but it is typically still modest. Weights for this sum may be optimized by cross-validation and similar techniques.
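A minimal sketch of the two quantities just defined, with made-up counts and effect sizes: an odds ratio computed from case-control data, and a polygenic score built as a weighted sum of allele counts (weights taken, as is conventional, to be log odds ratios).

```python
# Odds ratio from case-control counts, and a toy weighted polygenic score.
# All numbers are hypothetical, for illustration only.
import math

def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """OR = (odds of carrying the variant among cases) / (odds among controls)."""
    return (cases_with / cases_without) / (controls_with / controls_without)

# One low-penetrance variant: a modest effect, around 1.23.
print(round(odds_ratio(120, 880, 100, 900), 2))

# Polygenic score: weighted sum of risk-allele counts, weights = log odds ratios.
per_variant_or = (1.23, 1.08, 0.92, 1.15)          # per-variant GWAS estimates
weights = [math.log(or_) for or_ in per_variant_or]
genotype = [2, 1, 0, 1]                            # copies of each risk allele

pgs = sum(w * g for w, g in zip(weights, genotype))
print(round(pgs, 3))   # higher score -> higher predicted risk, still probabilistic
```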
An example may be useful for thinking about effect sizes and gene frequencies. Suppose that a particular gene of interest has a frequency of 0.5% in the general population. For every 200 people tested, on average only one will have it. The cost of testing 200 people, call it 200 · C, will have to be offset by the benefit of finding one case. If the gene has 25% penetrance (pretty high) and the appropriate treatment saves a year of hospitalization from an adverse drug reaction, that saving must be reduced (in the heartless logic of healthcare spending) to financial terms, say to S. Given enough cases we expect to save 0.25 · S per 200 people tested. To ensure 0.25 · S − 200 · C > 0, we may need to reduce C, or find stronger effects than 0.25, or improve our chances of finding the gene (reducing the 200 tests, say to 100), or amortize the cost of the 200 sequences over many simultaneous tests.
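The same arithmetic as a small sketch. The dollar figures for S and C are assumptions inserted purely to make the inequality concrete; only the 0.5% frequency, 25% penetrance, and 200-person cohort come from the example above.

```python
# Expected net benefit of screening a cohort for a variant of given frequency
# and penetrance, where finding a carrier averts a costly adverse outcome.

def expected_net_benefit(n_tested, variant_freq, penetrance,
                         averted_cost, cost_per_test):
    carriers = n_tested * variant_freq                 # expected carriers found
    benefit = carriers * penetrance * averted_cost
    cost = n_tested * cost_per_test
    return benefit - cost

S = 80_000   # assumed value of the averted year of hospitalization
C = 150      # assumed cost per test

# The example in the text: frequency 0.5%, penetrance 25%, 200 people tested.
print(expected_net_benefit(200, 0.005, 0.25, S, C))    # 0.25*S - 200*C = -10000

# Enriching the tested group (frequency 1% in a targeted subpopulation) or
# reusing an existing sequence (C near zero) flips the sign.
print(expected_net_benefit(200, 0.01, 0.25, S, C))     # +10000
```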
It turns out that many (perhaps most) of the genetic polymorphisms involved in pharmacokinetics and pharmacodynamics are not evenly distributed. They stratify by race, for example. Thus for polymorphisms of the CYP1A1 gene—which encodes one of the cytochrome P450 superfamily of monooxygenase enzymes necessary for metabolizing many drugs and their products—marked differences are known. Whereas 89% of Europeans carry the CYP1A1*1 allele, only about 62% of Asians and 66% of Africans do. The CYP1A1*2A allele, by contrast, is found in 6% of Europeans but 15% of Asians and 22% of Africans. This uneven distribution applies to the other alleles too, in varying proportions. Many more examples are known. Thus there are easy methods available for enriching the subpopulation tested.
Effect size here should not be confused with explanatory power. Where something is due to multiple independent causes, using just one of them will explain only a small amount of the observations. Consider deaths and strychnine poisoning. Few deaths observed in any year are due to strychnine poisoning, so a model which used it would explain very few of the incidents. But the effect size of strychnine at a large enough dosage is as high as it can get: death. When considering any particular death, it may explain everything about that death. Assuming that you have no idea in the first place what caused the death you are considering, it is relevant that strychnine is not a promising candidate, since it is relatively rare, but it may be no less promising than any other single factor. If you can search simultaneously for a wide range of poisons, you may effectively consider it.
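A minimal sketch of that distinction using Levin's population attributable fraction, with hypothetical numbers: a rare factor with an enormous effect size explains almost no cases, while a weak but common factor can explain far more of the total.

```python
# Effect size vs explanatory power: population attributable fraction (Levin).
# Exposure prevalence and relative risks below are made up for illustration.

def attributable_fraction(exposure_prevalence, relative_risk):
    """Share of all cases in the population attributable to the exposure."""
    excess = exposure_prevalence * (relative_risk - 1)
    return excess / (1 + excess)

# Something like strychnine: near-certain effect if exposed, but exposure is rare.
print(attributable_fraction(1e-6, 10_000))   # ~1% of cases explained

# A weak but common factor explains more of the total despite a tiny effect size.
print(attributable_fraction(0.30, 1.5))      # ~13% of cases explained
```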
Modest effect sizes are par for the course in general epidemiology. Factors involving lifestyle choices, demographic origins, socioeconomic class, government programs and the like typically have even smaller effect sizes than polygenic scores, especially when genomes are properly controlled for in the first place to moderate confounding. The solution is to combine all these predictors into a general model, rather than throw the genetic baby out with the effect-size bathwater.
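A minimal sketch of such a combined model on simulated data, assuming numpy and scikit-learn are available: a polygenic score sits alongside non-genetic predictors in one logistic regression, rather than either being discarded. Everything here (effect sizes, covariates) is invented for illustration.

```python
# Combined risk model: polygenic score + non-genetic covariates in one
# logistic regression, fitted to simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
pgs = rng.normal(size=n)                        # standardized polygenic score
smoking = rng.binomial(1, 0.3, size=n)          # a lifestyle covariate
age_centered = rng.normal(0, 10, size=n)        # age minus its mean, in years

# Simulated outcome: each predictor contributes a modest effect on the log-odds.
logit = -3.0 + 0.5 * pgs + 0.7 * smoking + 0.03 * age_centered
disease = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([pgs, smoking, age_centered])
model = LogisticRegression(max_iter=1000).fit(X, disease)
print("log-odds coefficients:", model.coef_.round(2))   # roughly recovers the effects
```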
Even if a combined model yields a modest effect size, that is still no reason to scorn its usefulness. It may be argued that the cost of gene sequencing does not repay the number of cases identified, since a weak association may require a very large number of subjects to be tested in order to find a few cases. Analyses like that may be conducted within a formal framework for evaluating costs and utilities, such as quality-adjusted life years (QALYs).
However, the results of incremental cost-benefit analyses are skewed by the current structure of medical practice, where a patient either presents to a general practitioner and is then referred to a specialist on the basis of observation, or goes directly to the specialist. Suppose that this is a cardiac specialist. The cardiac specialist may then evaluate the potential benefit of genetic testing for (usually rare) genetic polymorphisms, typically of the single-gene variety. It is easy to conclude that the exercise is not worth the cost at this stage, unless it can be seen that the patient comes from a background where the genes concerned are much more common, as in a racial group known to carry them at unusually high frequency, making it much more likely that they will be found. But this whole approach is flawed, as will be seen below.
Another argument commonly made here may be called the argument from complexity. It is stated that genes are just one of many factors, that very many genes may be involved, and therefore that the etiology of the disease is ‘too complex’ for ‘simplistic’ approaches. (The astute reader will recognize this sort of argument from many other contexts.) This is fallacious. So-called complexity can actually be easier to deal with using appropriate techniques at a higher level. When Mendelian genetics was first developed, an early practitioner, the biologist William Bateson, famously argued that polygenic effects were so ‘complex’ that they defied analysis, especially among humans.
The solution was already at hand and preceded the discrete Mendelian model, though it was scorned by Bateson, who was classically trained, as an ‘actuarial method’, foreign to his kind of biologist. Biometric methods operate at a higher level, studying the changes within a population by using continuous distributions of traits, which shift or taper in response to selection. RA Fisher and Sewall Wright were able to show that both approaches are valid, and may be reduced to each other by altering the granularity considered. One may study continuous human traits using a Mendelian approach, and it does get messy very quickly, but this “complexity” is resolved neatly just by switching the level of the model.
Sequence-once, Test-repeatedly
Formal cost-benefit analysis may be correct in depreciating genetic sequencing and testing where the sequence must be done on demand, when the disease presents itself to an individual specialist. However, there is a scaling error involved in this. Repeated testing, as the patient proceeds from specialist to specialist, exacerbates it. Here the solution is to sequence once, then test as often as desired for a multiplicity of possibilities. Algorithmically analyzing genes which have already been sequenced is relatively cheap. The sequencing cost is fixed and borne once.
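A minimal sketch of that scaling point, with assumed dollar figures: when one stored sequence is reused, the per-test cost collapses to the marginal cost of analysis, instead of paying for sequencing at every specialist visit.

```python
# Sequence-once vs sequence-on-demand: total cost over a patient's lifetime tests.
# SEQ and ANALYZE are assumed figures, for illustration only.

def total_cost_on_demand(n_tests, sequencing_cost, analysis_cost):
    """Each specialist orders a fresh sequence for their own test."""
    return n_tests * (sequencing_cost + analysis_cost)

def total_cost_sequence_once(n_tests, sequencing_cost, analysis_cost):
    """Sequence once up front, then run each subsequent test as a cheap query."""
    return sequencing_cost + n_tests * analysis_cost

SEQ, ANALYZE = 600.0, 5.0
for n in (1, 5, 20):
    print(n, total_cost_on_demand(n, SEQ, ANALYZE),
          total_cost_sequence_once(n, SEQ, ANALYZE))
# At 20 lifetime tests: 12,100 vs 700 under these assumptions.
```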
Some reconceptualization of medical practice recommends itself. Genetic testing should take place earlier rather than later. Either the genes have already been sequenced for other purposes, or the sequences accumulate as they are done; they should not have to be done repeatedly. DNA is (more or less) forever. In one sweep, multiple possibilities—very many possibilities—can be taken into account. Aggregation at the disease level takes place here. That may be done prospectively, even before the patient presents with a condition, to assist early intervention and provide guidance for lifetime outcomes (consider parents and their children, who may be sequenced even before birth). Doing this reorients the process to be patient-centric, rather than provider-centric.
At the moment practice lags behind theory here, since high-fidelity long-read sequencing of specific regions, suitable for clinical use, is not yet realized in this way. Even so-called whole-genome sequencing falls short (the name is an unfortunate misnomer). However, this technology is changing rapidly, and one can expect adequate reusable sequencing for most clinical purposes in the near future. Even now a great many genetic variants can be identified from the common genotyping offered direct to consumers.
As with all statistical effects, it is not wise to suppose that modest genetic effects cannot be translated into large values. Again this is a matter of scale. From the point of view of an individual practitioner, looking at one case, a small advantage may mean little for a particular decision. But from the point of view of a large organization, aggregating over a very large number of cases—say an entire healthcare system, a multinational corporation, a government body, the military or similar—persistent gains from modest wins add up rapidly to make a large difference not perceptible at the coalface. If that organization incorporates testing into its workflow, and optimizes its effectiveness as much as possible by taking other factors into account, reuses any sequencing it does, and ideally tests earlier rather than later in the process, then it can accrue big gains.
Opportunities going forward
Those beautiful hopes for pharmacogenomics from the year 2000 were disappointed by the fact that merely being able to locate and address genome positions doesn't tell you what they do, alone or together. It has taken time to unravel that. Genome-wide association studies have been key to solving that problem.
Further progress on pharmacogenomics depends on overcoming structural obstacles in the practice of medicine (as discussed above), and discovery of more gene associations.
Discovery so far has depended heavily on scale. To find associations you need very large data sets, because it turns out that polygenic traits are usually affected by individually tiny contributions from a great many genes. Experimentation with human subjects here is not possible, so observational data must be gathered from 'natural experiments' in existing populations. To get large sample sizes you typically need to scrape together data from numerous smaller studies, which are then meta-analyzed. Given these constraints, it is marvellous that we know so much already.
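A minimal sketch of the pooling step just described, with made-up effect estimates: fixed-effect, inverse-variance meta-analysis of per-study estimates for a single variant, the standard way consortia combine many small cohorts.

```python
# Fixed-effect (inverse-variance) meta-analysis of one variant's effect across
# several hypothetical small studies. Effect estimates are log odds ratios.
import math

studies = [(0.10, 0.06), (0.14, 0.09), (0.05, 0.05), (0.12, 0.08)]  # (log-OR, SE)

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * b for (b, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
z = pooled / pooled_se

print(f"pooled log-OR = {pooled:.3f} ± {pooled_se:.3f}, z = {z:.1f}")
# None of the individual studies is decisive on its own, but the pooled estimate
# can be, which is why sample size (and data sharing) dominates discovery.
```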
With much larger collections of data, discovery will accelerate. Much of the data used to date has come from low-cost genotyping chips, which necessarily sample only gene variants that are reasonably common in populations (which is to say, not vanishingly rare). The use of whole-genome sequences, which can describe copy number variations and other structural changes in the genome not limited to SNP mutations, will greatly help, but this comes with an added computing burden. Each such sequence is very large, provoking innovative ways of dealing with it.
Challenges and opportunities come together. The payoff from advances in pharmacogenomics—which means taking variation seriously—will be huge.
References
Collins, Francis S. (1999). "Shattuck lecture—medical and societal consequences of the Human Genome Project". In: New England Journal of Medicine 341.1 (July 1, 1999), pp. 28–37.
Darwin, Charles (1868). Variation of Plants and Animals Under Domestication. First edition. 2 volumes. London: John Murray, 1868.
Dawkins, Richard (1982). The Extended Phenotype. The Gene as the Unit of Selection. London: Oxford, 1982.
Fisher, R. A. (1918). "The correlation between relatives on the supposition of Mendelian inheritance". In: Transactions of the Royal Society of Edinburgh 52 (1918), pp. 399–433.
Galton, Francis (1889). Natural Inheritance. London: Macmillan, 1889.
Khoury, Muin J., Julian Little, and Wylie Burke, eds. (2004). Human Genome Epidemiology. Oxford: Oxford University Press, 2004.
Plomin, Robert (2018). Blueprint. Cambridge, MA: MIT Press, 2018.
Plomin, Robert et al. (2012). Behavioral Genetics. Sixth edition. New York: Worth Publishers, 2012.
Polderman, Tinca J. C. et al. (2015). “Meta-analysis of the heritability of human traits based on fifty years of twin studies”. In: Nature Genetics 47 (May 18, 2015), p. 702.
Ridley, Mark (2008). The Cooperative Gene. New York: Free Press, 2008.
Smith, John Maynard (1978). The Evolution of Sex. Reissue. Cambridge: Cambridge University Press, 1978.
Smith, Moyra (2011). Phenotypic Variation. Exploration and Functional Genomics. Oxford: Oxford University Press, 2011.
Sultana, J., P. Cutroneo, and G. Trifirò (2013). "Clinical and economic burden of adverse drug reactions". In: Journal of Pharmacology & Pharmacotherapeutics 4.Suppl 1 (Dec. 2013), S73–S77.