Traitwell's Future of Genomics Series: Identity

We're continuing our discussion about the future of genomics.

Feb 26, 2023

Well, who are you? (Who are you? Who, who, who, who?) I really wanna know (Who are you? Who, who, who, who?) Tell me, who are you?

We’re continuing our analysis of the future of genomics. For those who are curious, please visit Traitwell.com. Be sure to check out our free apps.

Identity

The simplest commercial applications of genomics make no use of traits or prediction, exploiting only the fact that DNA is unique to an individual. This is true even of ‘identical twins’, who can still be distinguished genetically even though they share almost all their DNA. This has obvious legal value for determining the identity of living subjects (e.g. for border control, inheritance, criminal records) or of the deceased (e.g. forensic traces at murder scenes). Two kinds of situation arise: (a) genetic information has been deliberately recorded and stored in an official registry, against which a sample can be matched directly, and (b) no registry match is obtainable and indirect methods must be used, typically through relatives whose genetic information can be obtained.

Where official registries are kept, often only a small subset of DNA information is stored, large enough to be unique but not otherwise informative. In this way government bodies may choose to relieve themselves of the burden of knowing other facts obtainable from genomics, though others may prefer to know.

A registry can guarantee its ignorance even more thoroughly, and save some storage space, by calculating and storing a signed digest obtained from a one-way function rather than the genetic information itself. As a practical matter it is impossible to invert the digest to recover the original genetic data; the obfuscation is deliberate. To check a match, the same one-way function is applied to the candidate genetic information and the result is compared with the stored digests.
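The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real registry: the marker names, the canonical text encoding, and the use of HMAC-SHA256 as a stand-in for the ‘signed digest’ are all assumptions introduced here.

```python
import hashlib
import hmac


def digest_profile(markers, registry_key):
    """Return a keyed one-way digest of an identity marker profile.

    `markers` maps a marker name to a pair of allele values (hypothetical
    names and values; no real marker panel is implied).
    """
    parts = []
    for name in sorted(markers):
        # Sort the two alleles so (12, 14) and (14, 12) encode identically.
        low, high = sorted(markers[name])
        parts.append(f"{name}={low}/{high}")
    canonical = ";".join(parts).encode("utf-8")
    # Assumption: HMAC-SHA256 with a registry-held key plays the role of
    # the signed digest; only this hex string would be stored.
    return hmac.new(registry_key, canonical, hashlib.sha256).hexdigest()


def is_match(candidate_markers, stored_digest, registry_key):
    """Apply the same one-way function to a candidate and compare digests."""
    candidate_digest = digest_profile(candidate_markers, registry_key)
    return hmac.compare_digest(candidate_digest, stored_digest)


if __name__ == "__main__":
    key = b"example-registry-key"  # hypothetical key, for illustration only
    enrolled = {"marker_a": (12, 14), "marker_b": (9, 9)}
    stored = digest_profile(enrolled, key)

    same_person = {"marker_b": (9, 9), "marker_a": (14, 12)}
    someone_else = {"marker_a": (12, 15), "marker_b": (9, 9)}
    print(is_match(same_person, stored, key))
    print(is_match(someone_else, stored, key))
```

One consequence of this design is that matching is exact: a single miscalled marker changes the digest entirely, so a scheme along these lines would need clean, standardized input.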

There has been little uptake of this to date for border control internationally, to identify foreign nationals coming and going, but that will quickly change in the future, since governments typically grant themselves extensive power to control their borders as they see fit. Similarly, DNA registries for convicted criminals are not common but can be expected to expand rapidly in the future. It is likely that more extensive use of DNA will prove valuable enough that governments will prefer to store more extensive genetic information in each case, rather than limiting themselves to markers that are not otherwise informative.

Forensic use of DNA to identify victims and perpetrators of crimes from (often degraded) samples obtained at, say, crime scenes is already well known and in extensive practice. If official registries can be searched then this is greatly simplified, given the propensity of criminals to re-offend. In practice such official registries are often unavailable and indirect methods have to be used, matching on relatives who have volunteered their genetic data to commercial or non-profit DNA services, e.g. for ancestry research. Most useful for this purpose are just-distant-enough relatives like third cousins, since there are many of these and they are still closely related enough to make a good match. The result may be definitive in some cases, giving a direct match, but more typically it allows a set of suspects identified from other evidence to be narrowed down.
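The trade-off for cousins can be made concrete with a little arithmetic. Full n-th cousins each sit n + 1 generations below a shared ancestral couple, so on average they share 2 × (1/2)^(2(n + 1)) of their autosomal DNA. The short sketch below tabulates that average; the function name is illustrative, and actual sharing between any two real cousins varies considerably around these figures.

```python
def expected_shared_fraction(cousin_degree):
    """Average fraction of autosomal DNA shared by full n-th cousins."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Each cousin is (n + 1) generations below the shared ancestral couple,
    # so 2 * (n + 1) meioses separate the two cousins along each path.
    meioses = 2 * (cousin_degree + 1)
    # Factor of 2: DNA can be shared through either of the two ancestors.
    return 2 * 0.5 ** meioses


if __name__ == "__main__":
    for degree in range(1, 5):
        share = expected_shared_fraction(degree)
        print(f"cousin degree {degree}: about {share:.3%} shared on average")
```

For third cousins this comes to 1/128, a little under 1%, which is small but still detectable, while each further degree of cousinship divides the expected share by four.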

Often there are no DNA matches to be found in databases, but there are emerging strategies for working around this through genotype-to-phenotype inference. It will become increasingly feasible to derive hypotheses about the physical appearance and other traits (the ‘phenotype’) associated with a DNA sample, especially where more complete DNA information is available to work on. Facial appearance is controlled by a relatively manageable set of genes, as family resemblances demonstrate, which makes that inference possible. As with matches of relatives, this will enable filtering of suspect sets and collection of samples from suspects for more definitive testing. The technology is currently in its infancy but will rapidly improve in coming years.

Another capability which will develop is the creation of fully privacy-preserving yet fully computable DNA storage. Recall the use of one-way functions mentioned above. Special one-way transformations will be created which can be run on complete DNA sequences, producing values in a private space where computation is done. Instead of computing (say, distances) using the original DNA, one computes using the transformed DNA only, getting the same results, but without ever being able to obtain the original DNA from the transforms. This is much more powerful than the techniques for merely anonymizing data currently in use, because of the special property that operations are preserved in the ‘transform space’.
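The ‘operations are preserved’ property can be illustrated with a deliberately simple toy. The sketch below applies a secret signed permutation to genotype vectors (assumed here to be allele counts of 0, 1 or 2 per site) and shows that squared distances computed on the transformed vectors equal those computed on the originals. Every name in it is hypothetical, and it is emphatically not a secure scheme: anyone holding the seed can undo it, so it demonstrates only the preserved-computation idea and not the one-way property the paragraph anticipates.

```python
import random


def make_transform(secret_seed, length):
    """Build a toy distance-preserving transform (a secret signed permutation)."""
    rng = random.Random(secret_seed)
    order = list(range(length))
    rng.shuffle(order)  # secret reordering of sites
    signs = [rng.choice((-1, 1)) for _ in range(length)]  # secret sign flips

    def transform(genotype):
        if len(genotype) != length:
            raise ValueError("genotype has the wrong number of sites")
        # Output position i holds the value from site order[i], sign-flipped.
        return [signs[i] * genotype[source] for i, source in enumerate(order)]

    return transform


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


if __name__ == "__main__":
    to_private = make_transform(secret_seed=2023, length=6)
    sample_a = [0, 1, 2, 1, 0, 2]  # made-up allele counts
    sample_b = [2, 1, 0, 0, 1, 2]
    original = squared_distance(sample_a, sample_b)
    private = squared_distance(to_private(sample_a), to_private(sample_b))
    print(original == private)
```

Because the same permutation and signs are applied to both samples, each per-site difference survives with at most a sign change, so the sum of squares is unchanged. A real system would need a transform with this kind of property that also resists inversion, which is a much harder problem.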

Notice that this discussion applies to other species too, including pets and domesticated animals. Registries of these may be kept, say to facilitate identification of lost and stray animals which have been recovered, so as to reunite them with their rightful owners, or to settle disputes. Pet or livestock owners may register their animals with paid services built for this purpose, or supply samples such as hair after the animal goes missing. This is more reliable than other means, and with an inexpensive microarray it may be implemented very cheaply. Veterinarians could serve as an on-ramp to these services. Farmers have long branded their animals for the purpose of identification, showing the widespread need for disambiguation.

The idea of a DNA registry can be generalized to encompass memorials. The practice of erecting memorials to loved ones, and to the great and good, is universal and resonates with deep-seated elements of human nature. A registry containing digital memorial artefacts, including DNA, provides a powerful technological update to this practice. The results may be public-facing or restricted to a limited set of people with an interest in, or right to know about, the subject. Inference to phenotypes and traits may form an extension to more basic genetic data. Incorporation of other artefacts, such as photographs, archived social network activity, certificates and the like may heighten interest. The ability to generate automatically scripted but parametrizable videos from this content, incorporating it into a narrative, is also appealing.

The idea of digital DNA memorials applies to pets no less than to people, since increasingly in modern life pets take the role of proxies for children in childless or empty-nest households. Wealthy pet owners already spend large sums of money on pet cemeteries. This extends the idea to the digital realm. Many owners will wish to have their own DNA linked to their pets in this eternal realm. Moreover, reference storage of physical samples from pets can enable future cloning of those animals (a practice widely considered to be unethical for humans). Indeed, owners may choose to clone other animals from the registry, based on established knowledge of their temperament and behavior, and with the permission of their owners. This will be a stud book based on proven knowledge, with much more predictable results. The same possibilities apply to livestock too, where the importance of breeding is well established and matching is currently accomplished through more cumbersome means.

Charles Johnson's Thoughts and Adventures
