It is by no means a coincidence that an explosion of knowledge about the human genome has occurred simultaneously with huge breakthroughs in computing capability and information technology. Sequencing the genome, after all, depended on being able to digitize the representation of the nucleotides in DNA. The genome’s mechanisms of operation involve intercellular messaging that has intriguing analogues in electronic communication. At a conceptual level, the genome and the computer operating system are by now firmly established as rich and relevant metaphors for each other.
So it was delightful but not altogether surprising to find amid last week’s extraordinary trove of new genomic research the use of computer “subroutines” as a metaphor to explain a body of emerging knowledge about “the complex patterns of dispersed regulation” that now challenge the scientific community’s understanding of what genes are and how they work.
“Given that counting genes in the genome is such a large-scale computational endeavor and that genes fundamentally deal with information processing, the lexicon of computer science naturally has been increasingly applied to describing them,” Yale University’s Mark Gerstein and colleagues write in Genome Research. “Insofar as the nucleotides of the genome are put together into a code that is executed through the process of transcription and translation, the genome can be thought of as an operating system for a living being.”
Metaphors are nice, but they don’t make headlines. And in a sense, the release of twenty-nine papers last week describing the work of the ENCODE consortium wasn’t news, either. Gerstein’s paper, for example, summarized a body of work that has been coming out continuously since the first full genome sequencings were completed in 2001 and even before, and which is already familiar in the field. And perhaps any of us, even without an advanced degree in computational biology, might have suspected intuitively that “junk” DNA would turn out to be a misnomer.
But it is news, or it should be, that the extravagant expectations triggered by the 2001 breakthrough must now officially be tempered with the understanding that the intricacies of the genome will humble before they exalt. “It’s a lot more complicated than we thought,” said Francis Collins, who was on the podium when the sequencing was announced and again last week.
Briefly stated (or not so briefly, in overview articles in Genome Research and Nature), the ENCODE project has found that unannotated regions of the genome previously thought to be more or less inert in fact perform a wide variety of signaling and regulating functions. The activity of these intergenic regions produces permutations and combinations of gene messaging that are far denser and busier than the previous model, based on more discrete genetic form and function, would suggest.
The unmistakable implication, from the point of view of both basic science and medical applications, is that the more we learn, the less we find we actually know. Thomas Gingeras, for example, devotes the better part of an entire article, cited above, to “transcripts of unknown function,” or TUFs. The genome “is an elegant but cryptic store of information,” says an open-access article in Nature authored corporately by the consortium. The authors continue:
“At present, we have an incomplete understanding of the protein-coding portions of the genome, and markedly less understanding of both non-protein-coding transcripts and genomic elements that temporally and spatially regulate gene expression. To understand the human genome, and by extension the biological processes it orchestrates and the ways in which its defects can give rise to disease, we need a more transparent view of the information it encodes.”
So the flood of medical miracles that genome sequencing seemed to presage will have to wait. In fact, the consensus in the ENCODE consortium seems to be that we’re not even sure any more what the definition of a gene really is. In the meantime, whatever else it turns out to be, the genome makes a terrific metaphor. Think of the blogosphere, for example, where products attach, overlap, replicate, and regulate each other, from near and distant regions, with astonishing prolixity. TUF stuff.