Prediction: Gene Expression as a comprehensive diagnostic platform

This is part 2 of my series on the brain, neuroscience, and medicine.

Part 1: Neurobiology, psychology, and the missing link(s)
Part 2: Gene Expression as a comprehensive diagnostic platform
Part 3: Neural resonance + neuroacoustics
Part 4: Location, location, location!

I have seen the future of medical diagnosis: it's elegant, accurate, immediate, mostly doctor-less, comprehensive, and very computationally intensive. I don't know when it'll arrive, but it's racing toward us, and when it hits, it'll change everything.

In short, the future of medical diagnosis is to combine a gene expression panel with known functional and correlative links between gene expression and pathology, and to run thousands of tests in parallel, one for every human illness we know of, whether it's acute, chronic, infectious, mental, or lifestyle-related.

What do you mean? And how would it work?

The basis for using gene expression as a comprehensive diagnostic platform goes something like this:

- Gene expression is a measure of which genes are being transcribed into RNA (and, for protein-coding genes, translated into protein), and to what extent. A gene expression test is much like a traditional genetic test, but it goes beyond merely listing which genes your body has: it shows how heavily your body is using each one, which makes it a much better view of what's actually going on inside you. Genes may be a blueprint of physiological potential, but gene expression is a snapshot of physiological function.

- The vast majority of illnesses leave a significant imprint on a person's gene expression. A failing kidney, an inflamed appendix, obesity, a manic episode: each of these influences which genes are activated, and in very specific ways. It's possible, and I think fairly probable, that distinct physiological insults leave fairly distinct imprints on gene expression, so in theory we should be able to work backward from a gene expression profile to the insult that produced it.

- Once we've gathered a large collection of (gene expression profile, known diagnosis) pairs (we could build this dataset by, e.g., requiring hospitals to collect a gene expression sample whenever a diagnosis is made), we can start training computers to identify which expression patterns are associated with each illness. Finding these connections by hand is almost impossible for humans, but some computational approaches, such as machine-learning classifiers, are in theory well suited to the job.[1]
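
To make "working backward" from a profile to a diagnosis concrete, here is a minimal sketch of one simple way it could work: average the profiles for each known diagnosis into a signature, then score a new profile against every signature at once and flag the ones it resembles. Everything here is an illustrative assumption, not a worked-out method: profiles are taken to be log-scale deviations from a healthy reference, the six "genes" and the training pairs are made up, and Pearson correlation with a 0.5 cutoff is an arbitrary choice of similarity test.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def build_signatures(pairs):
    """Average every profile sharing a diagnosis into one (hypothetical) signature per illness."""
    sums, counts = {}, {}
    for profile, diagnosis in pairs:
        acc = sums.setdefault(diagnosis, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[diagnosis] = counts.get(diagnosis, 0) + 1
    return {d: [v / counts[d] for v in acc] for d, acc in sums.items()}

def screen(profile, signatures, threshold=0.5):
    """Run one similarity test per known illness; return all scores plus the flagged ones."""
    scores = {d: pearson(profile, sig) for d, sig in signatures.items()}
    flagged = sorted((d for d, s in scores.items() if s >= threshold),
                     key=lambda d: scores[d], reverse=True)
    return scores, flagged

# Made-up training pairs: (log-scale deviation from a healthy reference, diagnosis)
pairs = [
    ([2.1, 1.9, 0.1, 0.0, -0.1, 0.0], "kidney failure"),
    ([1.8, 2.2, -0.1, 0.1, 0.0, 0.1], "kidney failure"),
    ([0.0, 0.1, 2.0, 1.8, 0.0, -0.1], "malaria"),
    ([0.1, -0.1, 2.2, 2.1, 0.1, 0.0], "malaria"),
    ([0.0, 0.0, 0.1, -0.1, 1.9, 2.0], "appendicitis"),
    ([-0.1, 0.1, 0.0, 0.0, 2.1, 1.8], "appendicitis"),
]
signatures = build_signatures(pairs)
scores, flagged = screen([0.2, 0.0, 1.7, 2.0, 0.1, 0.0], signatures)
print(flagged)  # ['malaria']
```

A real system would need far richer statistics than one correlation cutoff, but the shape is the same: one test per known illness, all run against the same profile.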

So in short, it won't be easy, but I think there's really nothing standing in the way of gene expression tests which use broad-spectrum correlative analysis to screen for all known illnesses at once.

I suspect gene expression as a comprehensive diagnostic platform will become a "cool" thing for bioscience visionaries to yak about over the next 5 years. Large-scale data collection is, I think, the biggest hurdle, though finding solid correlations in a massive dataset when diagnostic criteria are applied inconsistently from doctor to doctor and hospital to hospital is also non-trivial. But it's coming.

Some specific hurdles:

- Training computer programs (e.g., classification ANNs, i.e., artificial neural networks) requires a lot of good data, as does screening out false positives when searching a space as wide as the full genome. Getting enough *good* samples, where all the right diagnoses have been made, will be challenging.
- Gene expression analysis is still having growing pains; see, e.g., "Protein sequencing gone awry: 1 sample, 27 labs, 20 results".
- Working out which of 22,000+ different genes (and potentially some non-protein-coding, RNA-producing genes) are correlated with each illness is far from a trivial problem.

Open questions:

- Will some illnesses require gene expression samples from multiple locations in the body to diagnose?
- It seems fairly safe to say we'll be able to diagnose, e.g., kidney failure or malaria from gene expression data. But what about internal bleeding? And what about some of the trickier or more subjectively defined mental illnesses? This technology will have its theoretical limits: what are they?

[1] This task falls outside the scope of this writeup, but going with what I know, I'd take a set of (gene expression profile, known illness) pairs, divide the gene data into smaller, more tractable pieces (perhaps along specific gene-network fault lines, perhaps randomly), and train a classifier neural network on each piece, which would attempt to predict a specific illness based on what it finds to be the most significant data in its subset. Layer these subset-based classifier networks under a 'master' classifier neural network that gives the final yes/no prediction. Test this model on progressively larger out-of-sample data sets. Repeat for each illness. There are undoubtedly solutions orders of magnitude better than this, but it's a baseline start.
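
The layered design in this footnote is what's usually called stacking: base classifiers each see one slice of the genes, and a master classifier learns from their outputs. Below is a minimal, pure-Python sketch under several assumptions of mine: logistic regression (trained by plain gradient descent) stands in for the small neural networks described above, genes are split into contiguous blocks rather than along gene-network fault lines, the data are synthetic (20 genes, four of which carry the illness signal), and the base and master models are trained on separate halves of the training set so the master never learns from base predictions on data those base models were fit to.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """w holds one weight per feature plus a trailing bias term."""
    return sigmoid(sum(wj * v for wj, v in zip(w, x)) + w[-1])

def train_logreg(X, y, epochs=300, lr=0.1):
    """Batch gradient descent on log-loss; a stand-in for a small classifier network."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def split_genes(n_genes, n_chunks):
    """Hypothetical split: contiguous blocks of gene indices."""
    size = math.ceil(n_genes / n_chunks)
    return [list(range(s, min(s + size, n_genes))) for s in range(0, n_genes, size)]

def base_outputs(base_models, chunks, row):
    """Each base model scores only its own slice of the genes."""
    return [predict_proba(w, [row[g] for g in chunk]) for w, chunk in zip(base_models, chunks)]

def fit_stacked(X, y, chunks):
    # First half trains the base models; second half trains the master on their outputs.
    half = len(X) // 2
    base_models = [
        train_logreg([[row[g] for g in chunk] for row in X[:half]], y[:half])
        for chunk in chunks
    ]
    meta_X = [base_outputs(base_models, chunks, row) for row in X[half:]]
    return base_models, train_logreg(meta_X, y[half:])

def predict(model, chunks, row):
    """Final yes (1) / no (0) call from the master classifier."""
    base_models, master = model
    return int(predict_proba(master, base_outputs(base_models, chunks, row)) >= 0.5)

def make_synthetic(n_samples, n_genes=20, informative=(0, 1, 2, 3), seed=0):
    """Toy data: the illness shifts a handful of genes; everything else is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 1.5 if label else -1.5
        X.append(row)
        y.append(label)
    return X, y

chunks = split_genes(20, 4)
model = fit_stacked(*make_synthetic(200, seed=0), chunks)
X_test, y_test = make_synthetic(200, seed=1)  # out-of-sample data
accuracy = sum(predict(model, chunks, r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping `train_logreg` for a real neural-network library, or `split_genes` for a network-aware partition, would leave the overall shape unchanged; "repeat for each illness" then means fitting one such stacked model per diagnosis.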

ETA 15-20 years.

Edit, 1-22-10: Based on the available information, I think gene expression is a highly representative level of abstraction to work from. However, I see strong arguments for also including the metabolome and the metagenome, if it's feasible to do so.

Edit, 6-22-10: My ETA may even be too conservative: a group of researchers from several California universities recently published a method that makes diagnoses by matching an arbitrary gene expression profile against clusters of expression profiles with known diagnoses.

Edit, 6-12-11: This idea depends on incredibly cheap gene expression sequencing becoming available in the mid-term. I don't think that's unrealistic, given the trends in sequencing costs (courtesy of genome.gov).

Edit, 6-29-11: Gene expression carries an incredible amount of context and nuance, which gives it a significant advantage over the current (very imperfect) practice of relying on simple biomarkers.
