This post was originally published on my dev blog, CodeSymphony.co.
I’m a programmer, but I’m also a nature lover, and I enjoy learning more about all of the sciences, especially biology. Recently, I’ve come to realize how much programming and biology have in common.
The basic building block of life is the cell. Actually, cells don’t have to just be building blocks. Single-celled creatures are just one single cell. And yet they have to confront all of the same basic challenges to life that you and I do.
Are Cells Computers?
Are cells living computers? No. They are so much more than that. But, just like you and I have a brain that has amazing computational power, cells have some aspects that are computer-like as well. Cells don’t have brains of course, or anything analogous to a nervous system. But they do have something else, an aspect that we don’t even understand yet in regard to the brain. They have software. Actually, we can go further than that. Cells have a complete OS.
There are several programming languages involved; one of the most well known is DNA. “But wait, isn’t DNA for storing information?” Thanks for asking! Actually, yes, you are correct, DNA is used by the cell to store huge volumes of information, which includes the blueprint not only for the cell’s structure, but also for its development. In your cells’ nuclei is all of the information needed to construct and maintain your body. How much is this? Over 3.2 billion base-pairs of DNA.
The Biotic Byte
Let’s convert that number to something more familiar. Instead of base-pairs, we could use bytes. Let’s take a minute to talk about bytes, just to show that this is a valid comparison. Bytes are actually groups of smaller units, called bits. Bits are binary; they can only be one of two things, a zero or a one. A byte is a string of exactly 8 bits. There are 2^8 or 256 different possible combinations of 8 bits, and so there are 256 unique bytes.
A strand of DNA is made up of base-pairs. These are read in groups of three, called codons. We can think of these codons as something like bytes, and like bytes they are made up of smaller units, the base-pairs. Unlike bits, which come in only two types, DNA is made of bases that come in 4 different letters: A, C, T, and G. That means that a single letter can store twice as much information as a bit, or 2 bits’ worth. A three-letter codon therefore holds 6 bits, a little less than a byte, and it takes 4 letters of DNA to store the same amount of information as one 8-bit byte.
Now that we know how to convert DNA letters to bytes, we can do the math. We have 3.2 billion base-pairs or letters, so to get the number of bytes we just divide by 4: 3.2 billion / 4 = 0.8 billion. So the size of the human genome is approximately 800 million bytes, or about 763 megabytes (counting 1,048,576 bytes per megabyte).
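The same arithmetic as a few lines of Python, in case you want to check it yourself. The constant names are mine, and the megabyte figure assumes binary megabytes (2^20 bytes each).

```python
BASE_PAIRS = 3_200_000_000  # approximate length of the human genome
BITS_PER_BASE = 2           # four possible letters -> 2 bits each
BITS_PER_BYTE = 8

genome_bytes = BASE_PAIRS * BITS_PER_BASE // BITS_PER_BYTE
genome_megabytes = genome_bytes / 2**20  # binary megabytes (an assumption)

print(genome_bytes)             # 800000000
print(round(genome_megabytes))  # 763
```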
Now think of this: Each cell in your body has two copies of the genome (except for red blood cells, which have none). And it’s estimated that there are 37.2 trillion cells in the average adult human body. Even if we assume that 17 trillion of these are red blood cells, that means that your body contains 23 trillion gigabytes of DNA. That could also be written as 22 million petabytes, or 21 zettabytes. To put this in perspective, the world’s total effective two-way telecommunications capacity was “only” 65,000 petabytes per day in 2007. At that rate, it would take almost a whole year to transmit all of the information encoded in all of the DNA in your body.
A year. And yet all of that information fits inside of you. Despite the fact that the strands of DNA in a single cell would stretch out to about 2 m (6 ft) long if laid end to end, in the nucleus they are packed into a space with a diameter of just 6 to 10 millionths of a meter. That means all of the DNA in your body could fit into a 22 cm (8.5 in) cube. Let’s compare that size to how much room it would take to store the same amount of information on computers. Let’s imagine we put it all onto 1 terabyte hard drives that measure 3 in by 4 in by 0.5 in. They would make a cube about 424 ft (130 m) on a side. A building of that size would have a volume of 76 million cu ft, which would make it the eighth largest building in the world.
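For the curious, here is a rough sketch of the hard-drive cube calculation. It starts from the 22 million petabyte figure above and assumes 1,000 terabytes per petabyte and drives stacked with no gaps between them; the constant names are mine.

```python
TOTAL_PETABYTES = 22_000_000   # figure quoted in the text
TB_PER_PB = 1000               # assumption: decimal units
DRIVE_DIMS_IN = (3, 4, 0.5)    # one 1 TB drive, in inches
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3
METERS_PER_FOOT = 0.3048

drive_count = TOTAL_PETABYTES * TB_PER_PB  # one drive per terabyte
drive_volume_cu_in = DRIVE_DIMS_IN[0] * DRIVE_DIMS_IN[1] * DRIVE_DIMS_IN[2]
total_volume_cu_ft = drive_count * drive_volume_cu_in / CUBIC_INCHES_PER_CUBIC_FOOT

# Side length of a cube with that volume.
side_ft = total_volume_cu_ft ** (1 / 3)
side_m = side_ft * METERS_PER_FOOT

print(round(total_volume_cu_ft / 1e6, 1))  # 76.4 (million cubic feet)
print(round(side_ft))                      # 424
```

Real drives would need racks, cabling, and cooling, so an actual building would be considerably bigger than this idealized cube.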
Not Just For Information Storage
DNA is obviously an extremely efficient medium of information storage. We’ve looked at it from the angle of just how much your body contains. But we can also look at it from the other angle. A single copy of the entire human genome takes up only 0.8 gigabytes. Compare that with the raw size of OS X Yosemite, which is 5.18 gigabytes. Windows 8 requires about 6–8 gigabytes. In other words, modern computer operating systems take roughly 6 to 10 times as much code as it takes to create and run your body.
DNA is like a computer program but far, far more advanced than any software ever created.—Bill Gates, founder of Microsoft, in The Road Ahead
The really amazing thing about DNA (and this is what I started out to say a while back) is that it isn’t just a blueprint. Most of it doesn’t encode genes. Not even close. The protein-coding portion takes up less than 2% of your DNA, or about 15 megabytes. So what does the rest of the DNA do? Lots of things, actually. It does so much, in fact, that we are only beginning to understand it all. But we do know enough to know that DNA is far more than a blueprint. Is it a computer program? Sort of. It really goes beyond that, but that’s the closest thing to it we’ve ever created.
As a programmer, I find it amazing how much DNA is like a programming language. However, I find it even more amazing how far DNA goes beyond modern programming.
How can DNA program for so much in such little space? We can’t yet fully answer that question, but we’re starting to find clues. One is that DNA isn’t just one programming language. It is several, all at once. The same stretch of DNA can carry several different codes at once, and it can be read in both directions. I can’t imagine trying to write code that has to do one thing when read forwards and another when read backwards. Most of our languages couldn’t possibly do that, because of their syntax. They are inherently one-way.
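To make the “both directions” idea a little more concrete, here is a small sketch. In double-stranded DNA, reading “backwards” means reading the complementary strand in the opposite direction (the reverse complement), and since codons are three letters long, each strand can be read starting at three different offsets. That gives six possible reading frames for one stretch of DNA. The function names and the example sequence below are made up for illustration.

```python
# A pairs with T and C pairs with G on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own forward direction."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into three-letter codons, starting at the given offset."""
    usable = seq[offset:]
    end = len(usable) - len(usable) % 3  # drop any incomplete trailing codon
    return [usable[i:i + 3] for i in range(0, end, 3)]


def six_frames(seq):
    """Return the codons for all six reading frames of a sequence."""
    frames = {}
    strands = (("forward", seq), ("reverse", reverse_complement(seq)))
    for strand_name, strand in strands:
        for offset in range(3):
            frames[(strand_name, offset)] = codons(strand, offset)
    return frames


example = "ATGAAATAGC"  # made-up sequence, not a real gene
for frame, frame_codons in six_frames(example).items():
    print(frame, frame_codons)
```

The same ten letters yield six different codon sequences. Whether any given frame actually means something to the cell is a separate question that this sketch doesn’t try to answer.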
Of course, some languages are simpler (like BASIC), and could potentially work forwards and backwards. These languages are also far less human-readable. They are already hard for us to grok as it is, so how in the world would we ever be able to write meaningful two-way code like that? It might seem like it would be easy to do, if we just wrote the one-way code and used computer algorithms to compress it into two-way code. But that’s far easier said than done.
The Modular Genome
Among programming best practices is that of writing modular code. Instead of creating one huge, garbled, interconnected whole, a project can be split into discrete parts that are interoperable.
While I was contemplating writing this post, I happened to come across an article that revealed that some genomes are like this. Actually, all genomes are modular, in the sense that they are made up of discrete genes. But what has been discovered in this case is something different. The DNA isn’t just modular, it is actually split into discrete packages.
The genome of the unicellular ciliate Stylonychia lemnae is really astounding. These creatures actually maintain two copies of their genome in separate nuclei. In one nucleus, called the micronucleus, all of the DNA is stored in a single chromosome. In the other nucleus the DNA is split into thousands of different chromosomes. More than 16 thousand, to be exact. This type of nucleus is much larger than the other, and is called the macronucleus.
The moment I read this, I thought of packagist.org. Thousands of different discrete modules maintained in a single repository. Actually though, it is much more like the plugin repository on WordPress.org, which isn’t just a listing directory, but actually holds all of the code for the 37,000+ plugins in a single SVN repository.
The fascinating thing is that the macronucleus is about 10 times larger than the micronucleus. In effect, this means that the copy of the genome which is used in genetic transmission is kept under 10x compression. 10x! It is amazing that the genome can be compressed this much, and yet still be usable for genetic recombination.
Languages like PHP get compiled into lower-level code before they run (in PHP’s case, bytecode for its virtual machine rather than native machine code). Some compilers have features that modify the compiled code in various ways to try to improve its performance. This is called compile-time optimization. It’s usually not trivial to do this, because the compiler risks introducing a bug instead of an optimization. It can also make compilation itself much slower, because the compiler has to run sophisticated algorithms over the code.
In the genome, we might think of the transcription of DNA to RNA as compilation. It’s been known for some time that the nucleus sometimes makes modifications to the RNA after transcription. That’s kind of like compile-time optimization. But in fact, it is much more than that. Sometimes the changes are very simple, and affect just a single base. It’s been recently discovered that this type of RNA editing may be very common. But it has also been known for some time that much more complex forms of RNA editing occur as well. This is called alternative splicing, and it involves taking a gene and splitting it into its modular components. These are then rearranged from their usual configuration, with some being doubled or removed. Then they might be combined with pieces of a completely different gene.
This goes beyond our conventional compile-time optimizations. It’d be like compiling two different components of a program, breaking them down into smaller pieces, and rearranging them to create something entirely new.
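Here is a toy sketch of the simplest part of that process: treating a gene’s modular components as a list of pieces and joining different subsets of them, in their original order, to get different transcripts. It only models skipping pieces. It doesn’t attempt the rearranging, doubling, or cross-gene combinations described above, and a real cell doesn’t produce every possible combination. The function name and the three short pieces are invented for the example.

```python
from itertools import combinations


def splice_variants(pieces, min_pieces=1):
    """Yield every transcript made by keeping a subset of pieces in order."""
    for count in range(min_pieces, len(pieces) + 1):
        # combinations() preserves the original left-to-right order.
        for kept in combinations(pieces, count):
            yield "".join(kept)


# Three made-up RNA pieces standing in for one gene's modular components.
gene_pieces = ["AUG", "GCC", "UAA"]
variants = list(splice_variants(gene_pieces))

print(len(variants))  # 7
print(variants)
```

Even with only three pieces there are seven possible transcripts, and the count roughly doubles with each piece added.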
As a programmer, all of this is fascinating. I can sit here and write computer programs because of the trillions of programs being run inside of my body’s cells. This naturally leads us to a question: where did those programs come from? Who wrote them?
You might answer, “I don’t know.” But a staunch evolutionist will tell you that is the wrong answer. (Unless you catch them off guard.) They will tell you no one wrote the program. As a programmer, I find that unbelievable. As a programmer, I know that programs don’t just happen, they take intelligence. And just being “smart” isn’t enough: you have to have skill too, you have to know the language. Even with high intelligence and superb skill, how often do we get it right the first time? How often do we have to do lots of testing to make sure the thing really works?
Yet evolutionists would have us believe that the unimaginable complexity of the genome happened by accident, that a programming language just created itself, and that, over time, a program was shaped through typos in the code.
Of course, as a programmer, I know that is ludicrous. One typo or mistake can easily kill a program. Even if a typo isn’t syntactically invalid, it can still cause the program to stop working properly. And even if that doesn’t happen, it’s still highly probable that a small bug has been introduced by it—and those small bugs are the real killers. You can argue that natural selection will, in effect, “weed out” those really bad bugs. And that’s true (though the reproduction rate isn’t high enough to sustain that level of mutation for millions of years). But you can’t say that about the small bugs. They’re little changes that don’t really seem to have much effect—most of the time. Instead, they’ll build up in the population until it is driven to the point of extinction.
Just imagine a program you’ve written being eroded this way over time. Before long, it would cease to do anything useful at all.
As a programmer, it is obvious: someone programmed me. And not just anyone either. Someone who has unbelievable intelligence, skill, and artistry. Someone who can build something infinitely more complex than Microsoft Windows, using less code, and even have that thing reproduce itself. Do you know anyone like that? It clearly wasn’t one of us. It clearly wasn’t any other form of biological life either (from here or elsewhere), because all life is based on programs. All life requires a Programmer.
As one living programmer, let me ask you: have you met the Programmer of all life? Have you met the living Programmer?