Principles of Biochemistry/Cell Metabolism II: RNA transcription

Transcription is the process of creating a complementary RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes. During transcription, a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand. As opposed to DNA replication, transcription results in an RNA complement that includes uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.

Transcription can be summarized in five steps, each moving like a wave along the DNA. First, RNA polymerase unwinds ("unzips") the DNA by breaking the hydrogen bonds between complementary nucleotides. Second, RNA nucleotides are paired with complementary DNA bases. Third, the RNA sugar-phosphate backbone forms with assistance from RNA polymerase. Fourth, the hydrogen bonds of the untwisted RNA-DNA helix break, freeing the newly synthesized RNA strand. Fifth, if the cell has a nucleus, the RNA is further processed and then moves through the small nuclear pores to the cytoplasm.

Transcription is the first step leading to gene expression. The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. If the gene transcribed encodes a protein, the result of transcription is messenger RNA (mRNA), which will then be used to create that protein via the process of translation. Alternatively, the transcribed gene may encode ribosomal RNA (rRNA) or transfer RNA (tRNA), which are other components of the protein-assembly process, or other ribozymes. A DNA transcription unit encoding a protein contains not only the sequence that will eventually be directly translated into the protein (the coding sequence) but also regulatory sequences that direct and regulate the synthesis of that protein. The regulatory sequence before (upstream from) the coding sequence is called the five prime untranslated region (5'UTR), and the sequence following (downstream from) the coding sequence is called the three prime untranslated region (3'UTR).

Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.

As in DNA replication, DNA is read from 3' → 5' during transcription. Meanwhile, the complementary RNA is created in the 5' → 3' direction, which means its 5' end is created first. Although DNA is arranged as two antiparallel strands in a double helix, only one of the two DNA strands, called the template strand, is used for transcription. This is because RNA is only single-stranded, as opposed to double-stranded DNA. The other DNA strand is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). The use of only the 3' → 5' strand eliminates the need for the Okazaki fragments seen in DNA replication.

Transcription is divided into 5 stages: pre-initiation, initiation, promoter clearance, elongation and termination.
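
The relationship between the template strand, the coding strand, and the RNA transcript described above can be illustrated with a minimal Python sketch. It only models base pairing on strings; the function names and the nine-base example sequence are hypothetical and chosen for illustration.

```python
# Base that RNA polymerase pairs with each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base found opposite each DNA base on the other DNA strand.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA, written 5'->3', for a template strand written 3'->5'.

    Because the template is read 3'->5' while the RNA grows 5'->3',
    reading the input left to right yields the RNA in its usual orientation.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Return the coding (non-template) DNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"            # hypothetical template strand, 3'->5'
rna = transcribe(template)        # "AUGCCGUAA"
coding = coding_strand(template)  # "ATGCCGTAA"

# The transcript matches the coding strand except that U replaces T.
print(rna, coding, rna == coding.replace("T", "U"))
```

The final comparison shows why the non-template strand is called the coding strand: apart from the uracil-for-thymine substitution, it reads the same as the transcript.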

Initiation of transcription
Transcription initiation is more complex in eukaryotes than in bacteria. Eukaryotic RNA polymerase does not directly recognize the core promoter sequences. Instead, a collection of proteins called transcription factors mediates the binding of RNA polymerase and the initiation of transcription. Only after certain transcription factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of transcription factors and RNA polymerase binds to the promoter, forming a transcription initiation complex. Transcription in the archaea domain is similar to transcription in eukaryotes.

In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. RNA polymerase is a core enzyme consisting of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. At the start of initiation, the core enzyme is associated with a sigma factor, forming the holoenzyme. The sigma factor aids in finding the -35 and -10 promoter elements, which lie approximately 35 and 10 base pairs upstream of the transcription start site.
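
As a rough illustration of what locating the -35 and -10 elements involves, the sketch below scans a DNA sequence for approximate matches to two hexamers separated by a short spacer. The consensus sequences TTGACA and TATAAT and the 16 to 18 bp spacer are the values commonly cited for E. coli σ70 promoters and are not given in the text above; the function names, the mismatch tolerance, and the test sequence are hypothetical. Real promoter recognition depends on protein-DNA contacts, so this is a string-matching analogy only.

```python
MINUS_35 = "TTGACA"  # assumed sigma-70 consensus for the -35 element
MINUS_10 = "TATAAT"  # assumed sigma-70 consensus for the -10 element


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(coding_dna, max_mismatches=1, spacers=(16, 17, 18)):
    """Return (pos_35, pos_10, spacer) tuples for candidate promoters.

    Positions are 0-based indices into the coding strand, written 5'->3'.
    """
    seq = coding_dna.upper()
    n = len(MINUS_35)
    hits = []
    for i in range(len(seq) - n + 1):
        if mismatches(seq[i:i + n], MINUS_35) > max_mismatches:
            continue
        for gap in spacers:
            j = i + n + gap
            box = seq[j:j + n]
            # Skip truncated windows at the end of the sequence.
            if len(box) == n and mismatches(box, MINUS_10) <= max_mismatches:
                hits.append((i, j, gap))
    return hits


# Hypothetical sequence: both consensus hexamers with a 17 bp spacer.
demo = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGGGGA"
print(find_promoter_candidates(demo))
```

Allowing one mismatch per hexamer reflects the fact that natural promoters rarely match the consensus exactly; the tolerance chosen here is arbitrary.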

What is sigma factor?

A sigma factor (σ factor) is a prokaryotic transcription initiation factor that enables specific binding of RNA polymerase to gene promoters. Different sigma factors are activated in response to different environmental conditions. Every molecule of RNA polymerase holoenzyme contains exactly one sigma factor subunit. The model bacterium Escherichia coli has seven sigma factors; the number of sigma factors varies between bacterial species. Sigma factors are distinguished by their characteristic molecular weights. For example, σ70 refers to the sigma factor with a molecular weight of 70 kDa.

Transcription Factors
Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene. There are approximately 2600 proteins encoded in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors. Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.

In molecular biology, a transcription factor (sometimes called a sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the movement (or transcription) of genetic information from DNA to mRNA. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.

General transcription factors (GTFs) are intimately involved in the process of gene regulation, and most are required for life. TATA-binding protein (TBP) is a GTF that binds to the TATAA box (T = thymine, A = adenine), a DNA sequence motif found directly upstream of the coding region in many genes. TBP is responsible for the recruitment of the RNA Pol II holoenzyme, the final event in transcription initiation. GTFs are ubiquitous and interact with the core promoter region of DNA, which contains the transcription start site(s) of all class II genes. Not all GTFs play a role in transcriptional initiation; some are required for the second general step in transcription, elongation. For example, members of the FACT complex (Spt16/Pob3 in S. cerevisiae, SUPT16H/SSRP1 in humans) facilitate the rapid movement of RNA Pol II over the coding region of genes. This is accomplished by moving the histone octamer out of the way of an active polymerase and thereby decondensing the chromatin.

Transcription factors are modular in structure and contain the following domains:

DNA-binding domain (DBD), which attaches to specific sequences of DNA (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.

Trans-activating domain (TAD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs).

An optional signal-sensing domain (SSD) (e.g., a ligand-binding domain), which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. Also, the DBD and signal-sensing domains may reside on separate proteins that associate within the transcription complex to regulate gene expression.

Elongation of RNA


After the first bond is synthesized, the RNA polymerase must clear the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation and is common for both eukaryotes and prokaryotes. Abortive initiation continues to occur until the σ factor rearranges, resulting in the transcription elongation complex (which gives a 35 bp moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized. Once the transcript reaches approximately 23 nucleotides, it no longer slips and elongation can occur. This, like most of the remainder of transcription, is an energy-dependent process, consuming nucleoside triphosphates such as adenosine triphosphate (ATP).

One strand of the DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils, and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one fewer oxygen atom) in its sugar-phosphate backbone).

Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene. Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.

Transcription termination


Bacteria use two different strategies for transcription termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA bonds, now filling the DNA-RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, in effect, terminating transcription. In the "Rho-dependent" type of termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex. Transcription termination in eukaryotes is less understood but involves cleavage of the new transcript followed by template-independent addition of As at its new 3' end, in a process called polyadenylation.

Rho-dependent termination
A Rho factor acts on an RNA substrate. Rho's key function is its helicase activity, for which energy is provided by an RNA-dependent ATP hydrolysis. The initial binding site for Rho is an extended (~70 nucleotides, sometimes 80-100 nucleotides) single-stranded region in the RNA being synthesised, rich in cytosine and poor in guanine, called the rho utilization site or rut. It lies upstream of the actual terminator sequence. Several rho binding sequences have been discovered. No consensus is found among these, but the different sequences each seem specific, as small mutations in the sequence disrupt its function. Rho binds to RNA and then uses its ATPase activity to provide the energy to translocate along the RNA until it reaches the RNA-DNA helical region, where it unwinds the hybrid duplex structure. RNA polymerase pauses at the termination sequence because there is a specific site, called the Rho-sensitive pause site, around 100 nt away from the Rho binding site. So, even though the RNA polymerase is about 40 nt per second faster than Rho, this does not pose a problem for the Rho termination mechanism, as the pause allows Rho factor to catch up.
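
Since no consensus sequence exists for rut sites, one simple way to look for candidate regions is by base composition. The sketch below slides a 70-nucleotide window along an RNA sequence and reports windows that are rich in cytosine and poor in guanine. The window length comes from the description above; the function name, the thresholds (at least 35% C, at most 15% G), and the test sequence are assumptions made for illustration, not established cut-offs.

```python
def rut_like_windows(rna, window=70, min_c=0.35, max_g=0.15):
    """Return (start, c_fraction, g_fraction) for C-rich, G-poor windows.

    `start` is a 0-based index into the RNA, written 5'->3'.
    The thresholds are illustrative assumptions, not measured values.
    """
    rna = rna.upper()
    hits = []
    for start in range(len(rna) - window + 1):
        segment = rna[start:start + window]
        c_frac = segment.count("C") / window
        g_frac = segment.count("G") / window
        if c_frac >= min_c and g_frac <= max_g:
            hits.append((start, round(c_frac, 2), round(g_frac, 2)))
    return hits


# Hypothetical transcript: a 70 nt C-rich, G-free stretch flanked by G runs.
demo = "G" * 30 + "CUCACUCCAU" * 7 + "G" * 30
for start, c_frac, g_frac in rut_like_windows(demo):
    print(start, c_frac, g_frac)
```

A composition filter like this can only flag candidates; as noted above, small mutations can abolish function, so sequence composition alone does not establish that a region is a working rut site.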

Rho-independent termination
Rho-independent termination (also known as intrinsic termination) is a mechanism in prokaryotes that causes mRNA transcription to be stopped. In this mechanism, the mRNA contains a sequence that can base pair with itself to form a stem-loop structure 7-20 base pairs in length that is also rich in cytosine-guanine base pairs. These bases form three hydrogen bonds between each other, so the pairing is particularly strong. Following the stem-loop structure is a chain of uracil residues. The bonds between uracil and adenine are very weak. A protein bound to RNA polymerase (NusA) binds to the stem-loop structure tightly enough to cause the polymerase to temporarily stall. This pausing of the polymerase coincides with transcription of the poly-uracil sequence. The weak adenine-uracil bonds destabilize the RNA-DNA duplex, causing it to unwind and dissociate from the RNA polymerase. Stem-loop structures that are not followed by a poly-uracil sequence cause the RNA polymerase to pause, but it will typically continue transcription after a brief time because the duplex is too stable to unwind far enough to cause termination. Rho-independent transcription termination is a frequent mechanism underlying the activity of cis-acting RNA regulatory elements, such as riboswitches.
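
The two sequence features described above, a G-C-rich stem-loop followed by a run of uracils, can be searched for with a short Python sketch. It is a deliberate simplification: it requires a perfectly paired stem of fixed length, considers only Watson-Crick pairs (no G-U wobble), and ignores folding energy. The function names and default parameters (a 7 bp stem, 3 to 8 nt loop, at least 60% G+C, four trailing Us) are assumptions chosen for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the RNA sequence that would pair with `rna` in a hairpin stem."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_intrinsic_terminators(rna, stem=7, loops=range(3, 9), min_gc=0.6, min_u=4):
    """Return (stem_start, loop_length) for hairpins followed by a U run.

    A hit needs a G-C-rich left arm, a right arm that is its exact
    reverse complement, and `min_u` uracils immediately after the stem.
    """
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem):
        left = rna[i:i + stem]
        gc = (left.count("G") + left.count("C")) / stem
        if gc < min_gc:
            continue
        for loop in loops:
            j = i + stem + loop               # start of the right arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u]
            if right == reverse_complement(left) and tail == "U" * min_u:
                hits.append((i, loop))
    return hits


# Hypothetical terminator: 7 bp G-C stem, 4 nt loop, then six Us.
demo = "CAC" + "GCCGCGC" + "UUCG" + "GCGCGGC" + "UUUUUU" + "A"
print(find_intrinsic_terminators(demo))
```

Replacing the trailing Us with another base removes the hit, mirroring the observation that a stem-loop without a poly-uracil tract causes pausing but not termination.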