In a previous post (this one) I described how DNA is structured and how it is organized in the cell. I mentioned that some chromosome regions are accessible and others are closed, and that this situation can change quickly in response to different stimuli. I also mentioned that DNA can form broad loops thanks to specific factors (cohesin and CTCF), which contributes to the isolation of DNA domains within the same chromosome.

That said, let’s take a step further. What happens on the DNA? Why is it said that DNA is the brain of the cell? Because DNA stores all the information necessary for life in genes and their related regulatory regions. This is a big step, isn’t it? Let’s see how it works.

What is a gene?

The official definition is “a coding sequence of DNA”. To put it in other words, I will borrow a paragraph from another post, in which I wrote: “The genome consists of a coding part and a non-coding part, which alternate in all the chromosomes. The coding part consists of genes, while the non-coding part includes regulatory regions and others to be defined. Genes (approximately 20,000 in the human genome) code for proteins, and each gene encodes a protein. Proteins are the effectors of all functions in the cell. To give an example, we can say that each gene represents a command in an instruction booklet, and proteins are the physical actions of the person who executes the commands. Some proteins are constantly present in the cell (like the proteins that give it its “shape”), while others are produced only when necessary (like insulin, which is produced when the glucose level in the blood increases), and then they are quickly degraded. Furthermore, not all 20,000 genes are active, which means that we do not have all 20,000 kinds of proteins expressed in each cell. Some genes are universal and their proteins are omnipresent, while others are specific to the cell type (insulin is produced only by the beta cells in the pancreas). It is difficult to estimate, but approximately 10,000 genes are expressed on average in a cell.”

When a protein is needed in the cell, the related gene is copied by a machinery called RNA polymerase into a “disposable” version of it, called messenger RNA (mRNA). This process is called transcription. The mRNA works as a template that is translated into the desired protein. If you see the DNA as a big book, the mRNA is the copy of a single page. The reason why a protein is translated from the mRNA and not directly from the DNA is mainly to avoid overcrowding of the DNA and to prevent any damage to it. The use of mRNA also has other advantages: several copies can be used at the same time, increasing the protein translation rate, and when the protein is not needed anymore, the mRNA can be rapidly degraded and recycled into other mRNAs (very ecological!).
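For readers who like to see things in code, the copying step described above can be sketched in a few lines of Python. This is only a toy model: the function name `transcribe` and the nine-letter sequence are made up for illustration, and real transcription involves many more steps than simple base pairing.

```python
# Toy model of transcription: build the mRNA that pairs with a DNA template.
# The names and the sequence below are hypothetical, chosen for illustration.

# Each DNA base on the template strand pairs with one RNA base.
# Note that RNA uses U (uracil) where DNA would use T (thymine).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA complementary to a DNA template strand."""
    return "".join(PAIRING[base] for base in template_strand.upper())


template = "TACGGCATT"        # made-up template strand of a tiny "gene"
mrna = transcribe(template)
print(mrna)                   # AUGCCGUAA

# Several mRNA copies of the same "page" can exist at the same time.
copies = [transcribe(template) for _ in range(3)]
print(len(copies), "copies of", copies[0])
```

The DNA string is never modified here: each call only reads it and produces a new, disposable copy, which mirrors the book-and-page picture.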

Now let’s move on to a more practical issue. How is a gene recognized?

To copy a gene into mRNA, the RNA polymerase needs to know exactly where the gene starts and ends. As you may remember, DNA is a sequence of 4 different bricks called nucleotides, and the starting and ending points of a gene are determined by specific sequences of nucleotides. Therefore, the RNA polymerase knows exactly when to start copying and when to stop. Upstream of the gene, there is a regulatory sequence called the promoter, which is responsible for the assembly of the RNA polymerase machinery with its several co-factors. Promoters can be longer or shorter, and are characterized by specific sequences that determine which co-factors are recruited onto the DNA. The sequence recognized by a co-factor is called a “motif”. Different promoters are composed of different motifs, depending on which co-factors are needed for transcription. In a cell it is possible to find hundreds of factors involved in mRNA production. Some motifs are recurrent, but may differ in one or more nucleotides. These variants recruit the same factor, but with different strength, depending on the affinity between the co-factor and the motif’s sequence.
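The idea that the same factor binds slightly different motifs with different strength can also be sketched in Python. Everything here is an assumption made for illustration: the promoter sequence is invented, and "affinity" is reduced to the fraction of matching nucleotides, which is far cruder than the scoring used in real motif analysis.

```python
# Toy motif scan: slide a motif along a promoter and score each position.
# The sequences, function names and the scoring rule are all hypothetical.

def motif_affinity(site: str, motif: str) -> float:
    """Fraction of positions where the site matches the motif (1.0 = perfect)."""
    matches = sum(a == b for a, b in zip(site, motif))
    return matches / len(motif)


def scan_promoter(promoter: str, motif: str, min_affinity: float = 0.8):
    """Return (position, site, affinity) for every site scoring >= min_affinity."""
    hits = []
    for i in range(len(promoter) - len(motif) + 1):
        site = promoter[i:i + len(motif)]
        score = motif_affinity(site, motif)
        if score >= min_affinity:
            hits.append((i, site, score))
    return hits


promoter = "GGCTATAAAGGCGTATATAGC"   # made-up promoter sequence
motif = "TATAAA"                     # motif recognized by an imaginary co-factor

hits = scan_promoter(promoter, motif)
for position, site, score in hits:
    print(f"position {position}: {site} (affinity {score:.2f})")
```

With these made-up sequences the scan finds one perfect site and one site that differs by a single nucleotide, so the same imaginary co-factor would be recruited at both places, but more weakly at the second one.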

Co-factors recruited by “common” motifs are required for basic transcription, while the “specific” ones respond to stimuli and regulate transcription by allowing, boosting or halting it.

Other regulatory regions are called enhancers and insulators. They interact with promoters even though they are located far away from them, but still in the same DNA domain (remember the loops?). The interaction is made possible by small loops that bring them into close proximity. The main factor that allows sub-looping between enhancers and promoters is called YY1 (the name comes from the Tao symbol Yin-Yang), and it boosts the transcription of genes whose promoter contacts an enhancer. Other factors, instead, promote insulator-promoter interaction when a gene needs to be silenced. This silencing is achieved by packing the DNA region that contains the gene to make it inaccessible to the RNA polymerase machinery, thereby preventing its transcription.

In the previous post (part 1) I showed a photo and promised to explain in a later post what the arrows and red squares represent.


This is the picture. After the explanation I provided in this post, I guess it is easy to understand that the arrows are the sites where transcription starts (at the beginning of a gene) and the red squares are enhancers and other regulatory elements that interact with promoters. At the bottom right of the picture you can see a cluster of three red squares. There are some complex regions in the genome where clustered regulatory elements can control more than one gene, forming the so-called super-enhancers.

I hope you enjoyed this second part of our journey along the DNA!