An Introduction to Molecular Biology/Transcription of RNA and its modification

The "life cycle" of an mRNA in a eukaryotic cell. RNA is transcribed in the nucleus; after processing, it is transported to the cytoplasm and translated by the ribosome. At the end of its life, the mRNA is degraded.

In transcription, the codons of a gene are copied into messenger RNA by RNA polymerase. In simple terms, the formation of RNA from DNA is known as transcription. This RNA copy is then decoded by a ribosome, which reads the RNA sequence by base-pairing the messenger RNA with transfer RNAs that carry amino acids. Since there are 4 bases in 3-letter combinations, there are 64 possible codons (4³ = 64 combinations). These encode the twenty standard amino acids, so most amino acids have more than one possible codon. There are also three 'stop' or 'nonsense' codons signifying the end of the coding region; written as DNA these are TAA, TGA and TAG (UAA, UGA and UAG in the mRNA).
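The codon arithmetic above can be checked with a short Python sketch. It enumerates every 3-letter word over the four DNA bases and removes the three stop codons, leaving 61 sense codons for the twenty amino acids, which is why most amino acids have several codons. The names `codons`, `sense_codons` and `to_mrna` are illustrative only.

```python
from itertools import product

BASES = "TCAG"                       # DNA bases, written as on the coding strand
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three stop codons named in the text

# Every 3-letter word over a 4-letter alphabet: 4**3 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
sense_codons = [c for c in codons if c not in STOP_CODONS]

def to_mrna(codon: str) -> str:
    """Write a coding-strand DNA codon as it appears in mRNA (T -> U)."""
    return codon.replace("T", "U")

print(len(codons))                               # 64
print(len(sense_codons))                         # 61
print(sorted(to_mrna(c) for c in STOP_CODONS))   # ['UAA', 'UAG', 'UGA']
```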

Transcription
Transcription can be explained easily in 4 or 5 simple steps, each moving like a wave along the DNA.

The DNA unwinds and "unzips" as the hydrogen bonds between its base pairs break.

Free RNA nucleotides pair with the complementary DNA bases.

The RNA sugar-phosphate backbone forms (aided by RNA polymerase).

The hydrogen bonds of the untwisted RNA-DNA "ladder" break, freeing the new RNA.

If the cell has a nucleus, the RNA is further processed and then moves through the small nuclear pores to the cytoplasm. The major steps of the transcription process are:


 * 1) Formation of Pre-initiation complex
 * 2) Initiation of transcription
 * 3) Promoter clearance
 * 4) Elongation of RNA
 * 5) Termination

Formation of Pre-initiation complex
In eukaryotes, RNA polymerase, and therefore the initiation of transcription, requires the presence of a core promoter sequence in the DNA. Promoters are regions of DNA that promote transcription; in eukaryotes, promoter elements are typically found around -30, -75 and -90 base pairs upstream of the transcription start site. Core promoters are the sequences within the promoter that are essential for transcription initiation. RNA polymerase is able to bind to core promoters in the presence of various specific transcription factors.

The most common type of core promoter in eukaryotes is a short DNA sequence known as a TATA box, found 25-30 base pairs upstream of the transcription start site. The TATA box, as a core promoter, is the binding site for a transcription factor known as TATA-binding protein (TBP), which is itself a subunit of another transcription factor, called Transcription Factor II D (TFIID). After TFIID binds to the TATA box via TBP, five more transcription factors and RNA polymerase combine around the TATA box in a series of stages to form a preinitiation complex. One of these transcription factors, TFIIH, has DNA helicase activity and so is involved in separating the opposing strands of double-stranded DNA to provide access to a single-stranded DNA template. However, the preinitiation complex alone drives only a low, or basal, rate of transcription. Other proteins known as activators and repressors, along with any associated coactivators or corepressors, are responsible for modulating the transcription rate.

Thus, the preinitiation complex contains:
 * Core Promoter Sequence,
 * Transcription Factors,
 * DNA Helicase,
 * RNA Polymerase,
 * Activators and Repressors.
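Locating a TATA box 25-30 base pairs upstream of the start site is a simple pattern search. The Python sketch below assumes the commonly cited consensus TATAWAW (W = A or T) and a search window of 25-35 bp upstream; both are illustrative assumptions, not exact rules, and `find_tata_boxes` is a hypothetical helper. Real promoter prediction also has to cope with degenerate motifs and the many promoters that lack a TATA box.

```python
import re

# Assumed TATA-box consensus TATAWAW (W = A or T); the lookahead allows overlapping hits.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")

def find_tata_boxes(upstream: str, min_dist: int = 25, max_dist: int = 35):
    """Return (position, motif) pairs for TATA-like motifs starting min_dist..max_dist bp upstream.

    `upstream` is the coding-strand sequence that ends immediately before the
    transcription start site, so its last base is position -1.
    """
    hits = []
    n = len(upstream)
    for m in TATA_PATTERN.finditer(upstream.upper()):
        distance = n - m.start()  # how many bases upstream of the start site the motif begins
        if min_dist <= distance <= max_dist:
            hits.append((-distance, m.group(1)))
    return hits

# Toy promoter: a TATA box beginning 30 bp upstream of the start site.
promoter = "C" * 20 + "TATAAAA" + "C" * 23
print(find_tata_boxes(promoter))  # [(-30, 'TATAAAA')]
```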

The transcription preinitiation in archaea is, in essence, homologous to that of eukaryotes, but is much less complex. The archaeal preinitiation complex assembles at a TATA-box binding site; however, in archaea, this complex is composed of only the single archaeal RNA polymerase, TBP, and TFB (the archaeal homologue of eukaryotic transcription factor II B (TFIIB)). In eukaryotes, the PIC is typically made up of six general transcription factors: TFIIA (GTF2A1, GTF2A2), TFIIB (GTF2B), TFIID (BTAF1, BTF3, BTF3L4, EDF1, TAF1-15, 16 total), TFIIE, TFIIF and TFIIH; B-TFIID (BTAF1, TBP) is a related TBP-containing complex. At some point during its assembly, the PIC is also joined by RNA polymerase II and the remaining components of the holoenzyme.

The formation of the transcription preinitiation complex (PIC) is analogous to the mechanism seen in bacterial initiation. In bacteria, the sigma factor recognizes and binds to the promoter sequence. In eukaryotes, the transcription factors perform this role.
 * 1) The TATA binding protein (TBP, a subunit of TFIID), TBPL1, or TBPL2 can bind the promoter or TATA box. Most genes lack a TATA box and use an initiator element (INR) or downstream core promoter instead. Nevertheless, TBP is always involved and is forced to bind without sequence specificity. TAFs from TFIID can also be involved when the TATA box is absent. A TFIID TAF will bind sequence specifically, and force the TBP to bind non-sequence specifically, bringing the remaining portions of TFIID to the promoter.
 * 2) TFIIA interacts with the TBP subunit of TFIID and aids in the binding of TBP to TATA-box containing promoter DNA. Although TFIIA does not recognize DNA itself, its interactions with TBP allow it to stabilize and facilitate formation of the PIC.
 * 3) The N-terminal domain of TFIIB brings the DNA into proper position for entry into the active site of RNA polymerase II. TFIIB binds partially sequence-specifically, with some preference for the TFIIB recognition element (BRE). The TFIID-TFIIA-TFIIB (DAB)-promoter complex subsequently recruits RNA polymerase II and TFIIF.
 * 4) TFIIF (two subunits, RAP30 and RAP74, showing some similarity to bacterial sigma factors) and Pol II enter the complex together. TFIIF helps to speed up the polymerization process.
 * 5) TFIIE joins the growing complex and recruits TFIIH. TFIIE may be involved in DNA melting at the promoter: it contains a zinc ribbon motif that can bind single-stranded DNA. TFIIE helps to open and close the jaw-like structure of Pol II, which enables movement down the DNA strand.
 * 6) DNA may be wrapped one complete turn around the preinitiation complex and it is TFIIF that helps keep this tight wrapping. In the process, the torsional strain on the DNA may aid in DNA melting at the promoter, forming the transcription bubble.
 * 7) TFIIH and TFIIJ enter the complex together. TFIIH is a large protein complex that contains among others the CDK7/cyclin H kinase complex and a DNA helicase. TFIIH has three functions: it binds specifically to the template strand to ensure that the correct strand of DNA is transcribed and melts or unwinds the DNA (ATP dependently) to separate the two strands using its helicase activity. It has a kinase activity that phosphorylates the C-terminal domain (CTD) of Pol II at the amino acid serine. This switches the RNA polymerase to start producing RNA. Finally it is essential for Nucleotide Excision Repair (NER) of damaged DNA. TFIIH and TFIIE strongly interact with one another. TFIIE affects TFIIH’s catalytic activity. Without TFIIE, TFIIH will not unwind the promoter.
 * 8) TFIIH helps create the transcription bubble and may be required for transcription if the DNA template is not already denatured or if it is supercoiled.
 * 9) Mediator then encases all the transcription factors and Pol II. It interacts with enhancers, regulatory DNA regions that can lie very far away from the gene (upstream or downstream) and help regulate transcription.
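The ordered assembly in the steps above can be treated as a dependency graph. The Python sketch below encodes a simplified reading of those steps as an assumption: Pol II and TFIIF are merged into one unit, TFIIJ is omitted, and real assembly is more dynamic than a fixed order. It uses the standard-library `graphlib.TopologicalSorter` to produce one valid order and checks whether a proposed order respects every prerequisite; `PREREQUISITES` and `is_valid_order` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Simplified prerequisites read off the assembly steps (an assumption, not a full model).
PREREQUISITES = {
    "TFIID": set(),                                 # TBP/TFIID binds the promoter first
    "TFIIA": {"TFIID"},                             # stabilises TBP on the DNA
    "TFIIB": {"TFIID"},                             # positions the DNA for Pol II
    "PolII+TFIIF": {"TFIID", "TFIIA", "TFIIB"},     # recruited by the DAB complex
    "TFIIE": {"PolII+TFIIF"},
    "TFIIH": {"TFIIE"},                             # recruited by TFIIE
    "Mediator": {"TFIIH"},
}

def is_valid_order(order):
    """True if every factor arrives after all of its prerequisites and none is missing."""
    seen = set()
    for factor in order:
        if not PREREQUISITES[factor] <= seen:
            return False
        seen.add(factor)
    return len(seen) == len(PREREQUISITES)

one_order = list(TopologicalSorter(PREREQUISITES).static_order())
print(one_order)
print(is_valid_order(one_order))                     # True
print(is_valid_order(["TFIIB", "TFIID", "TFIIA"]))   # False
```

The exact position of TFIIA relative to TFIIB in `one_order` is not fixed, since the simplified graph gives neither priority over the other.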

TATA-binding protein and Transcription factor II D
TBP is a subunit of the eukaryotic transcription factor TFIID. TFIID is the first protein to bind to DNA during the formation of the pre-initiation transcription complex of RNA polymerase II (RNA Pol II). Binding of TFIID to the TATA box in the promoter region of the gene initiates the recruitment of other factors required for RNA Pol II to begin transcription, including TFIIA, TFIIB and TFIIF. Each of these transcription factors is formed from the interaction of many protein subunits, indicating that transcription is a heavily regulated process. TBP is also a necessary component of transcription by RNA polymerase I and RNA polymerase III, and is thought to be the only subunit required by all three RNA polymerases.

The TATA-binding protein (TBP) is a transcription factor that binds specifically to a DNA sequence called the TATA box. This DNA sequence is found about 25-30 base pairs upstream of the transcription start site in some eukaryotic gene promoters. TBP, along with a variety of TBP-associated factors, makes up TFIID, a general transcription factor that in turn forms part of the RNA polymerase II preinitiation complex. As one of the few proteins in the preinitiation complex that binds DNA in a sequence-specific manner, it helps position RNA polymerase II over the transcription start site of the gene. However, it is estimated that only 10-20% of human promoters have TATA boxes. Therefore, TBP is probably not the only protein involved in positioning RNA polymerase II.

Transcription factor II D (TFIID) is one of several general transcription factors that make up the RNA polymerase II preinitiation complex. Before the start of transcription, the TFIID complex, consisting of TBP and at least nine other polypeptides, binds to the TATA box in the core promoter of the gene. TFIID is composed of the TATA-binding protein (TBP) and several subunits called TBP-associated factors (TAFs, of which there are 16). In a test tube, only TBP is necessary for transcription at promoters that contain a TATA box. TAFs, however, add promoter selectivity, especially if there is no TATA box sequence for TBP to bind to. TAFs are included in two distinct complexes, TFIID and B-TFIID. The TFIID complex is composed of TBP and more than eight TAFs. However, the majority of TBP is present in the B-TFIID complex, which is composed of TBP and TAFII170 (BTAF1) in a 1:1 ratio.

RNA polymerase
RNAP was discovered independently by Sam Weiss, Audrey Stevens, and Jerard Hurwitz in 1960.

RNA polymerase (RNAP) is an enzyme that produces RNA. In cells, RNAP is needed for constructing RNA chains using DNA genes as templates, a process called transcription. RNA polymerase enzymes are essential to life and are found in all organisms and many viruses. In chemical terms, RNAP is a nucleotidyl transferase that polymerizes ribonucleotides at the 3' end of an RNA transcript.

RNA polymerase in eukaryotes
Eukaryotes have several types of RNAP, distinguished by the type of RNA they synthesize. RNA polymerases I, II and III are found in all eukaryotic cells, while RNA polymerases IV and V are found in plants:

 * 1) RNA polymerase I
 * 2) RNA polymerase II
 * 3) RNA polymerase III
 * 4) RNA polymerase IV
 * 5) RNA polymerase V

RNA polymerase I
RNA polymerase I synthesizes a 45S pre-rRNA, which matures into the 28S, 18S and 5.8S rRNAs that form the major RNA sections of the ribosome. Pol I consists of 8-14 protein subunits (polypeptides), most of which have identical or related counterparts in Pol II and Pol III. rDNA transcription is confined to the nucleolus, where several hundred copies of the rRNA genes are present, arranged as tandem head-to-tail repeats.

Pol I transcribes each of these repeated rDNA genes into one large precursor transcript, over and over again. This transcript encodes the 18S, 5.8S and 28S RNA molecules of the eukaryotic ribosome and is cleaved into them in a processing pathway that involves snoRNAs. The 5S ribosomal RNA is transcribed separately by Pol III. Because of the simplicity of Pol I transcription, it is the fastest-acting polymerase. When rRNA synthesis is stimulated, SL1 (selectivity factor 1) binds to the promoters of previously silent rDNA genes and recruits a pre-initiation complex, to which Pol I binds to start transcription of rRNA. Changes in rRNA transcription can also occur through changes in the rate of transcription: although the exact mechanism by which Pol I increases its rate of transcription is not yet known, evidence shows that rRNA synthesis can increase or decrease without changes in the number of actively transcribed rDNA genes.

RNA polymerase II
RNA polymerase II synthesizes precursors of mRNAs and most snRNAs and microRNAs. This is the most studied type, and because of the high level of control required over transcription, a range of transcription factors is required for its binding to promoters. RNA polymerase II is a 550 kDa complex which contains 12 subunits. The eukaryotic core RNA polymerase II was first purified using transcription assays. The purified enzyme typically has 10-12 subunits (12 in humans and yeast) and is incapable of specific promoter recognition. Many subunit-subunit interactions are known.

Computer-generated image of the RPB1 subunit (encoded by POLR2A) with colour-coded domains: green - RPB1 domain 1, blue - domain 2, sand - domain 3, light blue - domain 4, brown - domain 6, magenta - CTD.

The individual subunits are:
 * RPB1 (DNA-directed RNA polymerase II subunit RPB1), encoded in humans by the POLR2A gene, is the largest subunit of RNA polymerase II. It contains a carboxy-terminal domain (CTD) composed of up to 52 heptapeptide repeats (YSPTSPS) that are essential for polymerase activity. In combination with several other polymerase subunits, it forms the DNA-binding domain of the polymerase, a groove in which the DNA template is transcribed into RNA. It strongly interacts with RPB8.
 * RPB2 (POLR2B) is the second-largest subunit; in combination with at least two other polymerase subunits, it forms a structure within the polymerase that maintains contact between the DNA template and the newly synthesized RNA in the active site of the enzyme.
 * RPB3 (POLR2C), the third-largest subunit, exists as a heterodimer with POLR2J, forming a core subassembly. RPB3 strongly interacts with RPB1-5, 7 and 10-12.
 * RPB4 (RNA polymerase II subunit B4), encoded by the POLR2D gene, is the fourth-largest subunit and may have a stress-protective role.
 * RPB5 is encoded in humans by the POLR2E gene. Two molecules of this subunit are present in each RNA polymerase II. RPB5 strongly interacts with RPB1, RPB3 and RPB6.
 * RPB6 (POLR2F) forms a structure with at least two other subunits that stabilizes the transcribing polymerase on the DNA template.
 * RPB7, encoded by POLR2G, may play a role in regulating polymerase function. RPB7 interacts strongly with RPB1 and RPB5.
 * RPB8 (POLR2H) interacts with subunits RPB1-3, 5 and 7.
 * RPB9 (POLR2I), together with RPB1, makes up the groove in which the DNA template is transcribed into RNA.
 * RPB10, the product of the POLR2L gene, interacts with RPB1-3 and 5, and strongly with RPB3.
 * RPB11 is itself composed of three subunits in humans: POLR2J (RPB11-a), POLR2J2 (RPB11-b) and POLR2J3 (RPB11-c).
 * RPB12 (POLR2K) also interacts with RPB3.

C-terminal domain (CTD) of RNA Pol II
RNAPII can exist in two forms: RNAPII0, with a highly phosphorylated CTD, and RNAPIIA, with a nonphosphorylated CTD. The carboxy-terminal domain (CTD) of RNA polymerase II is the portion of the polymerase that is involved in the initiation of transcription, the capping of the RNA transcript, and attachment to the spliceosome for RNA splicing. The CTD typically consists of up to 52 repeats of the sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser. The carboxy-terminal repeat domain (CTD) is essential for life: cells whose RNAPII has no CTD repeats, or only up to one-third of the normal number, are inviable.

The CTD is an extension appended to the C terminus of RPB1, the largest subunit of RNA polymerase II. It serves as a flexible binding scaffold for numerous nuclear factors, determined by the phosphorylation patterns on the CTD repeats. Each repeat contains an evolutionarily conserved heptapeptide, Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7, which is subjected to reversible phosphorylation during each transcription cycle. This domain is inherently unstructured yet evolutionarily conserved, and in eukaryotes it comprises from 25 to 52 tandem copies of the consensus heptad repeat. As the CTD is frequently not required for general transcription factor (GTF)-mediated initiation and RNA synthesis, it is not part of the catalytic core of RNAPII but performs other functions.
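Because the CTD is a tandem array of the Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 heptad, its repeat structure and candidate Ser2/Ser5 phosphorylation sites can be read off with simple slicing. The Python sketch below assumes repeats start at the first residue of the given string and are exactly seven residues long; the toy sequence and the helper names are hypothetical, not a real CTD.

```python
# Consensus CTD heptad: Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7
CONSENSUS = "YSPTSPS"

def split_heptads(ctd: str) -> list[str]:
    """Cut a CTD sequence into consecutive 7-residue repeats, dropping any short tail."""
    return [ctd[i:i + 7] for i in range(0, len(ctd) - 6, 7)]

def consensus_count(ctd: str) -> int:
    """Count heptads that match the consensus exactly."""
    return sum(1 for h in split_heptads(ctd) if h == CONSENSUS)

def serine_sites(ctd: str, position: int) -> list[int]:
    """0-based residue indices of the Ser at heptad position 2 or 5 (1-based within a repeat),
    i.e. the candidate Ser2 or Ser5 phosphorylation sites."""
    return [i * 7 + position - 1
            for i, h in enumerate(split_heptads(ctd))
            if h[position - 1] == "S"]

# Hypothetical toy CTD: three consensus repeats plus one variant repeat (Ser7 -> Lys).
toy = CONSENSUS * 3 + "YSPTSPK"
print(consensus_count(toy))   # 3
print(serine_sites(toy, 5))   # [4, 11, 18, 25]
```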

Phosphorylation occurs principally on Ser2 and Ser5 of the CTD repeats, although these positions are not equivalent. The phosphorylation state changes as RNAPII progresses through the transcription cycle: the initiating RNAPII is form IIA, and the elongating enzyme is form II0. While RNAPII0 does consist of RNAPs with hyperphosphorylated CTDs, the pattern of phosphorylation on individual CTDs can vary because of differential phosphorylation of Ser2 versus Ser5 residues and/or differential phosphorylation of repeats along the length of the CTD. The PCTD (phospho-CTD of an RNAPII0) physically links pre-mRNA processing to transcription by tethering processing factors, e.g. for 5′-end capping, 3′-end cleavage and polyadenylation, to elongating RNAPII.

Ser5 phosphorylation (Ser5PO4) near the 5′ ends of genes depends principally on the kinase activity of TFIIH (Kin28 in yeast; CDK7 in metazoans). TFIIH hyperphosphorylates the CTD of RNAP and, in doing so, causes the RNAP complex to move away from the initiation site. Subsequent to the action of the TFIIH kinase, Ser2 residues are phosphorylated by CTDK-I in yeast (CDK9 kinase in metazoans). Ctk1 (CDK9) acts in complement to Ser5 phosphorylation and is thus seen in middle to late elongation.

CDK8 and cyclin C (CCNC) are components of the RNA polymerase II holoenzyme that phosphorylate the CTD. CDK8 regulates transcription by targeting the CDK7/cyclin H subunits of the general transcription initiation factor IIH (TFIIH), thereby providing a link between the Mediator and the basal transcription machinery. The gene CTDP1 encodes a phosphatase that interacts with the carboxy terminus of transcription initiation factor TFIIF, a transcription factor that regulates elongation as well as initiation by RNA polymerase II.
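The kinase sequence described above (TFIIH/CDK7 marking Ser5 first, then CDK9 marking Ser2) can be summarised as a small lookup. The Python sketch below is a deliberate simplification under stated assumptions: it only records which kinases have acted by each stage, ignores dephosphorylation (for example by the CTDP1 phosphatase) and the repeat-by-repeat variation the text mentions, and uses hypothetical names such as `STAGE_KINASES`.

```python
# Kinase -> CTD residue it phosphorylates, as described above (metazoan names).
KINASE_TARGET = {
    "CDK7": "Ser5",  # TFIIH kinase; Kin28 in yeast
    "CDK9": "Ser2",  # Ctk1 / CTDK-I in yeast
}

# Simplified assumption: kinases that have acted by each stage of the cycle.
STAGE_KINASES = [
    ("initiation", []),
    ("promoter clearance", ["CDK7"]),
    ("middle to late elongation", ["CDK7", "CDK9"]),
]

def ctd_marks(kinases: list[str]) -> list[str]:
    """Residue types phosphorylated by the given kinases (no phosphatases modelled)."""
    return sorted({KINASE_TARGET[k] for k in kinases})

def pol_ii_form(kinases: list[str]) -> str:
    """RNAPIIA if the CTD is unphosphorylated, otherwise RNAPII0."""
    return "RNAPII0" if kinases else "RNAPIIA"

for stage, kinases in STAGE_KINASES:
    print(f"{stage:28s} {pol_ii_form(kinases):8s} {ctd_marks(kinases)}")
```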

RNA polymerase III
RNA polymerase III (also called Pol III) transcribes DNA to synthesize tRNAs, 5S ribosomal RNA and other small RNAs found in the nucleus and cytosol. The genes transcribed by Pol III fall in the category of "housekeeping" genes, whose expression is required in all cell types and under most environmental conditions. Therefore the regulation of Pol III transcription is primarily tied to the regulation of cell growth and the cell cycle, and requires fewer regulatory proteins than RNA polymerase II. Transcription by any polymerase has three main stages: initiation, which requires construction of the RNA polymerase complex on the gene's promoter; elongation, the synthesis of the RNA transcript; and termination, the completion of RNA synthesis and disassembly of the RNA polymerase complex.

RNA polymerase IV
RNA polymerase IV synthesizes siRNA in plants. Pol IV is specific to plant genomes and is required for the synthesis of over 90% of all siRNA. It helps silence transposons and repetitive DNA through the siRNA pathway. The siRNA plays a major role in defending the genome against invading viruses and transposable elements by RNA-directed DNA methylation. Pol IV and the ROS1 demethylase unlock and recondense the 5S rDNA chromatin, which is present in seeds and used for the development of adult features in plants; Pol IV is involved in setting the methylation patterns of the 5S genes during plant maturation. In Arabidopsis, Pol IV works with RNA-dependent RNA polymerase 2 (RDR2) and Dicer-like 3 (DCL3) in a silencing pathway: Pol IV produces RNA, RDR2 converts it to double-stranded RNA (dsRNA), and DCL3 then processes the dsRNA into siRNA.
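The Pol IV → RDR2 → DCL3 pathway is a three-stage pipeline, sketched below in Python as a string model. It assumes Pol IV's product is already given, models RDR2 as making the complementary strand, and models DCL3 as cutting the duplex into 24-nt pieces, the size usually attributed to DCL3 products but not stated in the text above. Real siRNA duplexes carry overhangs and are not cut into perfectly adjacent segments; `rdr2_copy` and `dcl3_dice` are hypothetical names.

```python
# Base pairing for RNA strands.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def rdr2_copy(pol_iv_rna: str) -> tuple[str, str]:
    """Hypothetical RDR2 step: copy a Pol IV transcript into its complementary strand.

    Returns the dsRNA as (sense, antisense), both written 5'->3'; the antisense
    strand is therefore the reverse complement of the sense strand."""
    antisense = "".join(PAIR[b] for b in reversed(pol_iv_rna))
    return pol_iv_rna, antisense

def dcl3_dice(dsrna: tuple[str, str], size: int = 24) -> list[str]:
    """Hypothetical DCL3 step: cut the duplex into consecutive `size`-nt pieces.

    Only the sense strand of each piece is returned; a leftover shorter than
    `size` is discarded."""
    sense, _antisense = dsrna
    return [sense[i:i + size] for i in range(0, len(sense) - size + 1, size)]

pol_iv_transcript = "ACGU" * 15          # toy 60-nt Pol IV product
dsrna = rdr2_copy(pol_iv_transcript)
sirnas = dcl3_dice(dsrna)
print(len(sirnas), [len(s) for s in sirnas])  # 2 [24, 24]
```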


RNA polymerase V
RNA polymerase V synthesizes RNAs involved in siRNA-directed heterochromatin formation in plants.

There are other RNA polymerase types in mitochondria and chloroplasts. There are also RNA-dependent RNA polymerases involved in RNA interference.

Initiation of transcription
Transcription initiation is more complex in eukaryotes than in bacteria. Eukaryotic RNA polymerase does not directly recognize the core promoter sequences. Instead, a collection of proteins called transcription factors mediates the binding of RNA polymerase and the initiation of transcription. Only after certain transcription factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of transcription factors and RNA polymerase bound to the promoter forms a transcription initiation complex. Transcription in the archaea domain is similar to transcription in eukaryotes.

In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. The RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. At the start of initiation, the core enzyme is associated with a sigma factor that aids in finding the appropriate promoter sequences, located about 35 and 10 base pairs upstream of the transcription start site (the -35 and -10 elements).

What is a sigma factor?

A sigma factor (σ factor) is a prokaryotic transcription initiation factor that enables specific binding of RNA polymerase to gene promoters. Different sigma factors are activated in response to different environmental conditions. Every RNA polymerase holoenzyme contains exactly one sigma factor subunit. The model bacterium Escherichia coli has seven sigma factors; the number of sigma factors varies between bacterial species. Sigma factors are distinguished by their characteristic molecular weights. For example, σ70 refers to the sigma factor with a molecular weight of 70 kDa.
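How a sigma factor "finds" the -35 and -10 elements can be illustrated by scoring candidate hexamer pairs against a consensus. The Python sketch below assumes the textbook E. coli σ70 consensus hexamers TTGACA (-35) and TATAAT (-10) separated by a 15-19 bp spacer (17 bp being typical); these values are assumptions rather than statements from the text, and scoring by simple mismatch counts is a crude stand-in for real binding energetics. `best_promoter` is a hypothetical helper.

```python
# Assumed E. coli sigma70 consensus hexamers and spacer range (17 bp is typical).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"

def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def best_promoter(seq: str, spacers=range(15, 20)):
    """Return (score, start_of_-35, spacer) for the best -35/-10 pair, or None.

    Score is the total mismatches to both hexamers; lower is better."""
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        score35 = mismatches(seq[i:i + 6], MINUS35)
        for spacer in spacers:
            j = i + 6 + spacer  # start of the -10 hexamer
            if j + 6 > len(seq):
                break
            score = score35 + mismatches(seq[j:j + 6], MINUS10)
            if best is None or score < best[0]:
                best = (score, i, spacer)
    return best

# Toy promoter with perfect consensus elements and a 17 bp spacer.
toy_promoter = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGG"
print(best_promoter(toy_promoter))  # (0, 4, 17)
```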

Transcription Factors
Transcription factors are essential for the regulation of gene expression and are, as a consequence, found in all living organisms. The number of transcription factors found within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene. There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors. Therefore, approximately 10% of genes in the genome code for transcription factors, which makes this family the single largest family of human proteins. Furthermore, genes are often flanked by several binding sites for distinct transcription factors, and efficient expression of each of these genes requires the cooperative action of several different transcription factors (see, for example, hepatocyte nuclear factors). Hence, the combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development.
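The combinatorial argument can be made concrete with a little arithmetic. Taking the approximate figure of 2000 human transcription factors quoted above, the Python sketch below counts the unordered sets of 1, 2 or 3 distinct factors; even sets of three give over a billion combinations, far more than the number of genes. It ignores binding-site order, spacing and factor concentrations, so it only illustrates the scale of the argument.

```python
from math import comb

N_TF = 2000  # approximate number of human transcription factors, as quoted above

# Number of distinct (unordered) sets of k transcription factors.
for k in (1, 2, 3):
    print(k, comb(N_TF, k))
# 1 2000
# 2 1999000
# 3 1331334000
```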

In molecular biology, a transcription factor (sometimes called a sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the movement (or transcription) of genetic information from DNA to mRNA. Transcription factors perform this function alone or with other proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase (the enzyme that performs the transcription of genetic information from DNA to RNA) to specific genes.

General transcription factors (GTFs) are intimately involved in the process of gene regulation, and most are required for life. TATA-binding protein (TBP) is a GTF that binds to the TATA box (TATAA; T = thymine, A = adenine), a DNA motif found just upstream of the transcription start site in many, but not all, genes. TBP is responsible for the recruitment of the RNA Pol II holoenzyme, the final event in transcription initiation. These proteins are ubiquitous and interact with the core promoter region of DNA, which contains the transcription start site(s) of all class II genes. Not all GTFs play a role in transcriptional initiation; some are required for the second general step in transcription, elongation. For example, members of the FACT complex (Spt16/Pob3 in S. cerevisiae, SUPT16H/SSRP1 in humans) facilitate the rapid movement of RNA Pol II over the coding region of genes. This is accomplished by moving the histone octamer out of the way of an active polymerase and thereby decondensing the chromatin.

Transcription factors are modular in structure and contain the following domains:

DNA-binding domain (DBD), which attaches to specific DNA sequences (enhancer or promoter sequences) adjacent to regulated genes. DNA sequences that bind transcription factors are often referred to as response elements.

Trans-activating domain (TAD), which contains binding sites for other proteins such as transcription coregulators. These binding sites are frequently referred to as activation functions (AFs).

An optional signal-sensing domain (SSD) (e.g., a ligand-binding domain), which senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression. The DBD and signal-sensing domains may also reside on separate proteins that associate within the transcription complex to regulate gene expression.

Elongation of RNA


After the first bond is synthesized, the RNA polymerase must clear the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation and is common to both eukaryotes and prokaryotes. Abortive initiation continues until the σ factor rearranges, resulting in the transcription elongation complex (which gives a 35 bp moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized. Once the transcript reaches approximately 23 nucleotides, it no longer slips and elongation can occur. This, like most of the remainder of transcription, is an energy-dependent process, consuming adenosine triphosphate (ATP).

One strand of the DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base-pairing complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3' that is an exact copy of the coding strand, except that thymines are replaced with uracils and the nucleotides contain ribose (a 5-carbon sugar) where DNA has deoxyribose (one less oxygen atom) in its sugar-phosphate backbone.

Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of a particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene. Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.
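The strand bookkeeping above (template read 3' → 5', RNA built 5' → 3', RNA equal to the coding strand with T replaced by U) can be checked with a minimal Python sketch. It models only Watson-Crick pairing; the helper names `template_of` and `transcribe` are illustrative, and the short sequence is a made-up example.

```python
# Watson-Crick pairing between a template-strand DNA base and the incoming ribonucleotide.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def template_of(coding_5to3: str) -> str:
    """Template strand, written 3'->5' so that it lines up base-by-base with the coding strand."""
    return "".join(DNA_PAIR[b] for b in coding_5to3)

def transcribe(template_3to5: str) -> str:
    """RNA written 5'->3', built by reading the template 3'->5'.

    Each step pairs the next template base with a ribonucleotide and appends it
    to the growing 3' end of the RNA."""
    rna = []
    for base in template_3to5:          # polymerase moves 3'->5' along the template
        rna.append(DNA_TO_RNA[base])    # A->U, T->A, G->C, C->G
    return "".join(rna)

coding = "ATGGCTTAA"                   # coding strand, 5'->3'
template = template_of(coding)         # 'TACCGAATT', written 3'->5'
print(transcribe(template))            # AUGGCUUAA
print(transcribe(template) == coding.replace("T", "U"))  # True
```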

Transcription termination


Bacteria use two different strategies for transcription termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA bonds that now fill the DNA-RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, in effect terminating transcription. In the "Rho-dependent" type of termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex. Transcription termination in eukaryotes is less well understood but involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation.

Rho-dependent termination
Rho factor acts on an RNA substrate. Rho's key function is its helicase activity, for which energy is provided by RNA-dependent ATP hydrolysis. The initial binding site for Rho is an extended (~70 nucleotides, sometimes 80-100 nucleotides) single-stranded region of the RNA being synthesized, rich in cytosine and poor in guanine, called the rho utilization site (rut), located upstream of the actual terminator sequence. Several Rho binding sequences have been discovered. No consensus is found among them, but each sequence seems specific, as small mutations in the sequence disrupt its function. Rho binds to the RNA and then uses its ATPase activity to provide the energy to translocate along the RNA until it reaches the RNA-DNA helical region, where it unwinds the hybrid duplex structure. RNA polymerase pauses at the termination sequence because of a specific site, the Rho-sensitive pause site, about 100 nt away from the Rho binding site. So, even though RNA polymerase moves about 40 nt per second faster than Rho, this does not pose a problem for the Rho termination mechanism, because the pause allows Rho to catch up.
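Since the rut site is described only by its composition (an extended single-stranded region rich in C and poor in G), a sliding-window scan is a natural first sketch. The Python example below uses a 70-nt window, as in the text, but the thresholds (at least 30% C, at most 15% G) are illustrative assumptions, not measured criteria, and real rut sites also depend on RNA structure. `rut_candidates` is a hypothetical helper.

```python
def rut_candidates(rna: str, window: int = 70, min_c: float = 0.3, max_g: float = 0.15):
    """Start positions of windows that are C-rich and G-poor.

    Thresholds are illustrative assumptions, not measured rut criteria."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - window + 1):
        w = rna[i:i + window]
        if w.count("C") / window >= min_c and w.count("G") / window <= max_g:
            hits.append(i)
    return hits

# Toy transcript: a C-rich, G-free stretch flanked by G-rich sequence.
toy_rna = "G" * 30 + "CAU" * 30 + "G" * 30
hits = rut_candidates(toy_rna)
print(hits[:3], len(hits))
```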

Rho-independent termination
Rho-independent termination (also known as intrinsic termination) is a mechanism in prokaryotes that causes mRNA transcription to stop. In this mechanism, the mRNA contains a sequence that can base pair with itself to form a stem-loop structure 7-20 base pairs in length that is also rich in cytosine-guanine base pairs. These bases form three hydrogen bonds between each other and are therefore particularly strong. Following the stem-loop structure is a chain of uracil residues. The bonds between uracil and adenine are very weak. A protein bound to RNA polymerase (NusA) binds to the stem-loop structure tightly enough to cause the polymerase to stall temporarily. This pausing of the polymerase coincides with transcription of the poly-uracil sequence. The weak adenine-uracil bonds destabilize the RNA-DNA duplex, causing it to unwind and dissociate from the RNA polymerase. Stem-loop structures that are not followed by a poly-uracil sequence cause the RNA polymerase to pause, but it will typically continue transcription after a brief time because the duplex is too stable to unwind far enough to cause termination. Rho-independent transcription termination is a frequent mechanism underlying the activity of cis-acting RNA regulatory elements, such as riboswitches.
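The two ingredients of an intrinsic terminator (a GC-rich stem-loop and a following run of uracils) translate directly into a search. The Python sketch below looks for a perfectly paired 7-bp stem with at least 60% G+C, a loop of 3-8 nt and at least six U residues afterwards; all of these parameters are illustrative assumptions, and the search ignores mismatched or bulged stems that real terminators can contain. `find_intrinsic_terminator` is a hypothetical helper.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (the strand that would pair with it)."""
    return "".join(PAIR[b] for b in reversed(rna))

def find_intrinsic_terminator(rna, stem=7, loop_sizes=range(3, 9), min_gc=0.6, min_u=6):
    """Return (start, loop) of the first perfect GC-rich hairpin followed by a U-run, else None."""
    rna = rna.upper()
    for i in range(len(rna)):
        left = rna[i:i + stem]
        if len(left) < stem:
            break
        if (left.count("G") + left.count("C")) / stem < min_gc:
            continue  # stem not GC-rich enough
        for loop in loop_sizes:
            j = i + stem + loop                    # start of the 3' half of the stem
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + min_u]  # bases right after the hairpin
            if right == revcomp(left) and tail == "U" * min_u:
                return i, loop
    return None

hairpin = "GCCGCCG" + "AAAA" + "CGGCGGC" + "UUUUUUUU"
print(find_intrinsic_terminator(hairpin))  # (0, 4)
```

Replacing the U-run with adenines makes the same hairpin return `None`, mirroring the text's point that a stem-loop without a poly-U tail only causes a pause.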

mRNA and its modification
The pre-mRNA molecule undergoes three main modifications. These modifications are 5' capping, 3' polyadenylation, and RNA splicing, which occur in the cell nucleus before the RNA is translated.

Capping of mRNA
The 5' cap looks like the 3' end of an RNA molecule (the 5' carbon of the cap ribose is bonded, and the 3' carbon is unbonded). This provides significant resistance to 5' exonucleases. Capping of the pre-mRNA involves the addition of 7-methylguanosine (m7G) to the 5' end. To achieve this, the terminal 5' phosphate must first be removed, which is done by a phosphatase enzyme (RNA triphosphatase) and leaves a diphosphate 5' end. The enzyme guanylyl transferase then catalyses the next reaction, in which the diphosphate 5' end attacks the α phosphorus atom of a GTP molecule, adding the guanine residue through a 5'-5' triphosphate link. The enzyme (guanine-N7-)-methyltransferase ("cap MTase") then transfers a methyl group from S-adenosyl methionine to the guanine ring. This type of cap, with just the m7G in position, is called a cap 0 structure. The ribose of the adjacent nucleotide may also be methylated to give a cap 1. Methylation of further downstream nucleotides produces cap 2, cap 3 structures and so on. In these cases the methyl groups are added to the 2' OH groups of the ribose sugar. The cap protects the 5' end of the primary RNA transcript from attack by ribonucleases that are specific for 3'-5' phosphodiester bonds.

The starting point is the unaltered 5' end of an RNA molecule. This features a final nucleotide followed by three phosphate groups attached to the 5' carbon.

One of the terminal phosphate groups is removed (by an RNA triphosphatase), leaving two terminal phosphates.

GTP is added to the terminal phosphates (by a guanylyl transferase), losing two phosphate groups (from the GTP) in the process. This results in the 5' to 5' triphosphate linkage.

The 7-nitrogen of guanine is methylated (by a methyl transferase).

Other methyltransferases are optionally used to carry out methylation of 5' proximal nucleotides.
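The capping steps above can be followed with a toy state model. This is a minimal sketch, assuming a hypothetical `FivePrimeEnd` record that tracks only the cap base, the number of phosphates in the 5' linkage, and which 5'-proximal riboses carry 2'-O-methyl groups; the function names are illustrative labels for the enzymatic steps, not real enzyme or library APIs.

```python
from dataclasses import dataclass, field


@dataclass
class FivePrimeEnd:
    """Hypothetical toy model of a pre-mRNA 5' end (not a real library API)."""
    cap_base: str = ""   # "" (uncapped), "G", or "m7G"
    phosphates: int = 3  # phosphates between the 5' end (or cap) and the first nucleotide
    ribose_methylated: set = field(default_factory=set)  # 1 = first transcribed nucleotide


def remove_terminal_phosphate(end):
    """RNA triphosphatase step: pppN... -> ppN..."""
    if end.cap_base or end.phosphates != 3:
        raise ValueError("expected an uncapped 5' triphosphate end")
    end.phosphates = 2


def add_guanylate(end):
    """Guanylyl transferase step: ppN + GTP -> GpppN + PPi (GTP keeps only its alpha phosphate)."""
    if end.cap_base or end.phosphates != 2:
        raise ValueError("expected an uncapped 5' diphosphate end")
    end.cap_base, end.phosphates = "G", 3


def methylate_guanine_n7(end):
    """Cap MTase step (methyl from S-adenosyl methionine): GpppN -> m7GpppN, a cap 0."""
    if end.cap_base != "G":
        raise ValueError("expected an unmethylated guanosine cap")
    end.cap_base = "m7G"


def methylate_ribose(end, position):
    """Optional 2'-O-methylation of a 5'-proximal nucleotide."""
    end.ribose_methylated.add(position)


def cap_type(end):
    """Return "cap 0", "cap 1", ... or None if there is no m7G cap yet."""
    if end.cap_base != "m7G":
        return None
    n = 0
    # Count consecutive 2'-O-methylated nucleotides starting at the first one.
    while n + 1 in end.ribose_methylated:
        n += 1
    return f"cap {n}"


end = FivePrimeEnd()
remove_terminal_phosphate(end)
add_guanylate(end)
methylate_guanine_n7(end)
print(end.cap_base, end.phosphates, cap_type(end))  # m7G 3 cap 0
methylate_ribose(end, 1)
print(cap_type(end))  # cap 1
```

Raising an error when a step is applied out of order reflects the fixed sequence of the pathway: the guanylate can only be added to a diphosphate end, and N7 methylation needs the guanosine already in place.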

The 5' cap has 4 main functions:

Regulation of nuclear export.

Prevention of degradation by exonucleases.

Promotion of translation (see ribosome and translation).

Promotion of 5' proximal intron excision.

Nuclear export of RNA is regulated by the cap binding complex (CBC), which binds exclusively to capped RNA. The CBC is recognized by the nuclear pore complex, and the RNA is exported. Once in the cytoplasm, after the pioneer round of translation, the CBC is replaced by the translation factors eIF-4E and eIF-4G. This complex is then recognized by other translation initiation machinery, including the ribosome. The cap prevents 5' degradation in two ways. First, because it looks like a 3' end, it prevents degradation of the mRNA by 5' exonucleases (as mentioned above). Second, the CBC and eIF-4E/eIF-4G block the access of decapping enzymes to the cap. This increases the half-life of the mRNA, which is essential in eukaryotes because the export process takes significant time. Decapping of an mRNA is catalyzed by a decapping complex made up of at least Dcp1 and Dcp2, which must compete with eIF-4E to bind the cap. The 5' cap is thus a marker of an actively translating mRNA and is used by cells to regulate mRNA half-lives in response to new stimuli. Undesirable mRNAs are sent to P-bodies for temporary storage or decapping, the details of which are still being resolved. The mechanism by which the cap promotes excision of the 5' proximal intron is not well understood, but the 5' cap appears to loop around and interact with the spliceosome during splicing, promoting intron excision.

Polyadenylation of mRNA
Processing at the 3' end of the pre-mRNA involves cleavage of its 3' end followed by the addition of about 200 adenine residues to form a poly(A) tail. The cleavage and adenylation reactions occur if a polyadenylation signal sequence (5'-AAUAAA-3') is located near the 3' end of the pre-mRNA molecule and is followed by a second sequence, usually 5'-CA-3', which marks the site of cleavage. A GU-rich sequence is also usually present further downstream on the pre-mRNA molecule. Once these sequence elements have been synthesized, two multisubunit proteins, cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF), are transferred from RNA polymerase II to the RNA molecule and bind to the sequence elements. A protein complex then forms that contains additional cleavage factors and the enzyme polyadenylate polymerase (PAP). This complex cleaves the RNA between the polyadenylation signal and the GU-rich sequence, at the cleavage site marked by the 5'-CA-3' sequence. Poly(A) polymerase then adds about 200 adenine units to the new 3' end of the RNA molecule, using ATP as a precursor. As the poly(A) tail is synthesised, it binds multiple copies of poly(A)-binding protein, which protects the 3' end from ribonuclease digestion.
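The order of events described above (find the AAUAAA signal, cleave after a downstream CA, then add adenines) can be sketched as a string operation. This is a simplified model, not a predictor of real poly(A) sites: the function name `process_3_prime_end`, the 10-30 nt search window measured from the end of the hexamer, and the fixed 200-nt tail are assumptions, and the GU-rich element is only discarded, not checked.

```python
POLYADENYLATION_SIGNAL = "AAUAAA"


def process_3_prime_end(pre_mrna, tail_length=200, min_gap=10, max_gap=30):
    """Cleave after the first CA lying min_gap..max_gap nt past AAUAAA, then add poly(A).

    Returns the processed RNA, or None if no signal or cleavage site is found.
    The search window and tail length are simplifying assumptions.
    """
    signal = pre_mrna.find(POLYADENYLATION_SIGNAL)
    if signal == -1:
        return None  # no polyadenylation signal in this toy model
    signal_end = signal + len(POLYADENYLATION_SIGNAL)
    # str.find(sub, start, end) only reports matches lying entirely inside [start, end).
    ca = pre_mrna.find("CA", signal_end + min_gap, signal_end + max_gap)
    if ca == -1:
        return None
    # Cleave 3' of the CA; the downstream fragment (with the GU-rich element) is discarded.
    cleaved = pre_mrna[:ca + 2]
    # Poly(A) polymerase then extends the new 3' end with adenines.
    return cleaved + "A" * tail_length


# Hypothetical toy pre-mRNA: signal, 12-nt spacer, CA cleavage site, GU-rich element.
pre = "AUGGCC" + "AAUAAA" + "U" * 12 + "CA" + "GUGUGUGU"
print(process_3_prime_end(pre, tail_length=5))  # AUGGCCAAUAAAUUUUUUUUUUUUCAAAAAA
```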

Poly(A)-binding protein or "PABP"

Poly(A)-binding protein (PABP) is an RNA-binding protein that binds to the poly(A) tail at the 3' end of mRNA. The nuclear isoform selectively binds to stretches of around 50 nucleotides and stimulates the activity of polyadenylate polymerase. The expression of mammalian PABP is regulated at the translational level by a feedback mechanism: the mRNA encoding PABP contains in its 5' UTR an A-rich sequence that binds PABP, and this binding represses translation of the PABP mRNA. The cytosolic isoform of eukaryotic PABP binds to the initiation factor eIF-4G via its C-terminal domain. eIF-4G is in turn bound to eIF-4E, another initiation factor, which is bound to the cap at the 5' end of the mRNA. This binding forms the characteristic loop structure of eukaryotic protein synthesis. PABP-interacting proteins in the cytosol compete for the eIF-4G binding sites. PABP has also been shown to interact with a termination factor, eRF3.
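The feedback loop in which PABP represses translation of its own mRNA is a case of negative autoregulation. The toy model below illustrates its effect on protein level: it assumes that synthesis falls as k_syn / (1 + P / k_half) when feedback is on, that degradation is first order, and that all parameter values are arbitrary; none of these numbers come from measurements of PABP, and the function name is hypothetical.

```python
def steady_state_pabp(feedback, k_syn=10.0, k_deg=0.1, k_half=20.0, dt=0.1, steps=5000):
    """Integrate dP/dt = synthesis(P) - k_deg * P with the explicit Euler method.

    P is a hypothetical PABP level in arbitrary units; parameters are illustrative.
    """
    p = 0.0
    for _ in range(steps):
        if feedback:
            # PABP bound to the A-rich 5' UTR of its own mRNA lowers translation.
            synthesis = k_syn / (1.0 + p / k_half)
        else:
            synthesis = k_syn
        p += dt * (synthesis - k_deg * p)
    return p


print(round(steady_state_pabp(feedback=False), 1))  # 100.0
print(round(steady_state_pabp(feedback=True), 1))   # 35.8
```

Without feedback the level settles at k_syn / k_deg; with feedback it settles where k_syn / (1 + P / k_half) = k_deg * P, which is lower, so excess PABP limits its own production.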

Alternative polyadenylation

Many protein-coding genes have more than one polyadenylation site, so a gene can code for several mRNAs that differ in their 3' end. Since alternative polyadenylation changes the length of the 3' untranslated region, it can change which microRNA binding sites the 3' untranslated region contains. MicroRNAs tend to repress translation and promote degradation of the mRNAs they bind to, although there are examples of microRNAs that stabilise transcripts. Alternative polyadenylation can also shorten the coding region, thus making the mRNA code for a different protein, but this is much less common than just shortening the 3' untranslated region. The choice of poly(A) site depends on the expression of the proteins that take part in polyadenylation. For example, the expression of CstF-64, a subunit of cleavage stimulatory factor (CstF), increases in macrophages in response to lipopolysaccharides (a group of bacterial compounds that trigger an immune response). This results in the selection of weak poly(A) sites and thus shorter transcripts, which removes regulatory elements from the 3' untranslated regions of mRNAs for defense-related products such as lysozyme and TNF-α. These mRNAs then have longer half-lives and produce more of these proteins. RNA-binding proteins other than those in the polyadenylation machinery can also affect whether a polyadenylation site is used, as can DNA methylation near the polyadenylation signal.
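How the choice of poly(A) site changes the set of microRNA binding sites kept in the 3' UTR can be shown with interval bookkeeping. This is a minimal sketch: positions are measured in nucleotides from the start of the 3' UTR, and the function name `utr_isoforms`, the site coordinates and the microRNA names are all hypothetical.

```python
def utr_isoforms(utr_length, polya_sites, mirna_sites):
    """Map each poly(A) site to the miRNA binding sites its 3' UTR isoform keeps.

    polya_sites: cleavage positions within the 3' UTR (nt from its start).
    mirna_sites: {name: (start, end)} in the same coordinates, end exclusive.
    """
    isoforms = {}
    for site in sorted(polya_sites):
        if not 0 < site <= utr_length:
            raise ValueError(f"poly(A) site {site} lies outside the 3' UTR")
        # A binding site survives only if it lies entirely upstream of the cleavage point.
        isoforms[site] = sorted(name for name, (start, end) in mirna_sites.items() if end <= site)
    return isoforms


# Hypothetical 3' UTR with a proximal and a distal poly(A) site.
sites = {"miR-X": (120, 128), "miR-Y": (900, 908), "miR-Z": (1500, 1508)}
print(utr_isoforms(2000, [400, 1800], sites))
# {400: ['miR-X'], 1800: ['miR-X', 'miR-Y', 'miR-Z']}
```

In this toy case, using the proximal (weaker) site yields a short isoform that escapes miR-Y and miR-Z, the same logic by which shorter transcripts in stimulated macrophages lose 3' UTR regulatory elements.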

Splicing of RNA


Splicing is the process by which pre-mRNA is modified to remove certain stretches of non-coding sequence called introns; the stretches that remain include protein-coding sequences and are called exons. Sometimes pre-mRNA messages may be spliced in several different ways, allowing a single gene to encode multiple proteins. This process is called alternative splicing. Splicing is usually performed by an RNA-protein complex called the spliceosome, but some RNA molecules are also capable of catalyzing their own splicing.

Introns

One plausible hypothesis for the observed distribution of introns is that ancient predecessors of modern-day eukaryotes contained large numbers of introns, and that selective pressure to control genome size in fast-growing species may have led to the elimination of many ancient introns. Complicating this issue is the finding that many introns are themselves mobile genetic elements that can be inserted into and deleted from genes. Shortly after the discovery of introns, investigators offered competing theories that propose alternative scenarios for the origin and early evolution of spliceosomal introns. These are popularly referred to as the Introns-Early (IE) and the Introns-Late (IL) views. (Other classes of introns, such as self-splicing and tRNA introns, are less subject to this debate.) The IE model, championed by Walter Gilbert, proposes that introns are extremely old and were abundant in the earliest theoretical ancestors of prokaryotes and eukaryotes, the progenotes. In this model, introns were subsequently lost from prokaryotic organisms, allowing them to attain growth efficiency. A central prediction of this theory is that the early introns mediated the recombination of exons that represented protein domains. This model cannot account for some observed positional variation of introns shared among related genes.

The IL model proposes that introns were recently inserted into originally intron-less contiguous genes after the divergence of eukaryotes and prokaryotes. In this model, introns probably originated from transposable elements. This model is based on the observation that the spliceosomal introns are restricted to eukaryotes alone. However, there is considerable debate over the presence of introns in the early prokaryote-eukaryote ancestors and the subsequent intron loss-gain during eukaryotic evolution. The evolution of introns and of the intron-exon structure may be largely independent of the evolution of coding-sequences.

Introns were first discovered in protein-coding genes of adenovirus, but are now known to occur within a wide variety of genes throughout all of the biological kingdoms. The frequency of introns within different genomes can vary widely across the spectrum of biological organisms. For example, introns are extremely common within the nuclear genome of higher vertebrates (e.g. humans and mice), where protein-coding genes almost always contain multiple introns, while introns are rare within the nuclear genes of some eukaryotic microorganisms, for example baker's yeast (Saccharomyces cerevisiae). In contrast, the mitochondrial genomes of vertebrates are entirely devoid of introns, while those of eukaryotic microorganisms may contain many introns.

Spliceosome and its assembly


Each spliceosome is composed of five small nuclear ribonucleoproteins, called snRNPs (pronounced "snurps"), and a range of non-snRNP protein factors. The snRNPs that make up the nuclear spliceosome are named U1, U2, U4, U5, and U6, and participate in several RNA-RNA and RNA-protein interactions. The RNA component of each snRNP is rich in uridine (the nucleoside of uracil). The canonical spliceosome assembles anew on each hnRNA (pre-mRNA). The hnRNA contains specific sequence elements that are recognized and used during spliceosome assembly: the 5' splice site, the branch point sequence, the polypyrimidine tract, and the 3' splice site. The spliceosome catalyzes the removal of introns and the ligation of the flanking exons. Introns typically have a GU nucleotide sequence at the 5' splice site and an AG at the 3' splice site. The 3' splice site can be further defined by a variable-length run of pyrimidines, called the polypyrimidine tract (PPT), which serves the dual function of recruiting factors to the 3' splice site and possibly recruiting factors to the branch point sequence (BPS). The BPS contains the conserved adenosine required for the first step of splicing. A group of less abundant snRNPs, U11, U12, U4atac, and U6atac, together with U5, are subunits of the so-called minor spliceosome, which splices a rare class of pre-mRNA introns denoted U12-type. These snRNPs form the U12-dependent spliceosome, which some studies have reported to be located in the cytosol, although this localization has been disputed. New evidence derived from the first crystal structure of a group II intron suggests that the spliceosome is actually a ribozyme, and that it uses a two-metal-ion mechanism for catalysis.
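The net result of splicing by the major spliceosome, removing each GU...AG intron and joining the flanking exons, can be sketched on a sequence string. This is a minimal illustration under stated assumptions: intron coordinates are supplied by the caller rather than recognized, only the GU-AG rule is checked (the branch point and polypyrimidine tract are not modelled), and the function name and toy sequence are hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the exons.

    introns: (start, end) pairs, 0-based with end exclusive. Each intron is checked
    against the GU-AG rule; branch point and polypyrimidine tract are not modelled.
    """
    exons = []
    previous_end = 0
    for start, end in sorted(introns):
        if start < previous_end:
            raise ValueError(f"intron {start}-{end} overlaps the previous intron")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU-AG rule")
        exons.append(pre_mrna[previous_end:start])  # exon upstream of this intron
        previous_end = end
    exons.append(pre_mrna[previous_end:])  # final exon
    return "".join(exons)


# Hypothetical toy pre-mRNA: exon, GU...AG intron (with a pyrimidine run), exon.
pre_mrna = "AUGGCC" + "GUAAGUCUAACUUUUCCAG" + "GAAUGA"
print(splice(pre_mrna, [(6, 25)]))  # AUGGCCGAAUGA
```

Alternative splicing fits the same picture: calling `splice` with a different set of intron coordinates on the same pre-mRNA (for example, treating an exon and its flanking introns as one longer intron) yields a different mature mRNA.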

The model for formation of the spliceosome active site involves an ordered, stepwise assembly of discrete snRNP particles on the hnRNA substrate. The first recognition of hnRNAs involves U1 snRNP binding to the 5' splice site of the hnRNA, together with other non-snRNP factors, to form the commitment complex, or early (E) complex in mammals. The commitment complex is an ATP-independent complex that commits the hnRNA to the splicing pathway. U2 snRNP is recruited to the branch region through interactions with the E complex component U2AF (U2 snRNP auxiliary factor) and possibly U1 snRNP. In an ATP-dependent reaction, U2 snRNP becomes tightly associated with the branch point sequence (BPS) to form complex A. A duplex formed between U2 snRNA and the hnRNA branch region bulges out the branch adenosine, specifying it as the nucleophile for the first transesterification.

The presence of a pseudouridine residue in U2 snRNA, nearly opposite the branch site, results in an altered conformation of the RNA-RNA duplex upon U2 snRNP binding. Specifically, the altered structure of the duplex induced by the pseudouridine places the 2' OH of the bulged adenosine in a favorable position for the first step of splicing. The U4/U5/U6 tri-snRNP is recruited to the assembling spliceosome to form complex B, and following several rearrangements, complex C (the spliceosome) is activated for catalysis. It is unclear how the tri-snRNP is recruited to complex A, but this process may be mediated through protein-protein interactions and/or base-pairing interactions between U2 snRNA and U6 snRNA.

The U5 snRNP interacts with sequences at the 5' and 3' splice sites via the invariant loop of U5 snRNA, and U5 protein components interact with the 3' splice site region.

Upon recruitment of the tri-snRNP, several RNA-RNA rearrangements precede the first catalytic step, and further rearrangements occur in the catalytically active spliceosome. Several of the RNA-RNA interactions are mutually exclusive; however, it is not known what triggers these interactions, nor the order of these rearrangements. The first rearrangement is probably the displacement of U1 snRNP from the 5' splice site and the formation of a U6 snRNA interaction. It is known that U1 snRNP is only weakly associated with fully formed spliceosomes, and that U1 snRNP inhibits formation of a U6-5' splice site interaction on a model substrate oligonucleotide containing a short 5' exon and 5' splice site. Binding of U2 snRNP to the branch point sequence (BPS) is one example of an RNA-RNA interaction displacing a protein-RNA interaction: upon recruitment of U2 snRNP, the branch binding protein SF1 in the commitment complex is displaced, since U2 snRNA and SF1 cannot bind the BPS at the same time.

Within U2 snRNA, other mutually exclusive rearrangements occur between competing conformations. For example, in the active form, stem loop IIa is favored; in the inactive form, a mutually exclusive interaction between the loop and a downstream sequence predominates. It is unclear how U4 is displaced from U6 snRNA, although RNA helicases have been implicated in spliceosome assembly and may function to unwind U4/U6 and promote the formation of a U2/U6 snRNA interaction. The interactions of U4/U6 stem loops I and II dissociate, and the freed stem loop II region of U6 folds on itself to form an intramolecular stem loop; U4 is then no longer required for further spliceosome assembly. The freed stem loop I region of U6 base pairs with U2 snRNA, forming the U2/U6 helix I. However, the helix I structure is mutually exclusive with the 3' half of an internal 5' stem loop region of U2 snRNA.

The RNA components of snRNPs interact with the intron and may be involved in catalysis. Two types of spliceosomes have been identified (the major and minor), which contain different snRNPs.

Major spliceosome

The major spliceosome splices introns containing GU at the 5' splice site and AG at the 3' splice site. It is composed of the U1, U2, U4, U5, and U6 snRNPs and is active in the nucleus. In addition, a number of proteins, including U2AF and SF1, are required for the assembly of the spliceosome.

E complex: U1 binds to the GU sequence at the 5' splice site, along with the accessory proteins ASF/SF2, U2AF (which binds at the Py-AG site), and SF1/BBP (BBP = branch binding protein);

A complex: U2 binds to the branch site, and ATP is hydrolyzed;

B1 complex: the U5/U4/U6 trimer binds, U5 binds exon sequence at the 5' site, and U6 binds to U2;

B2 complex: U1 is released, U5 shifts from exon to intron, and U6 binds at the 5' splice site;

C1 complex: U4 is released, and U6/U2 catalyzes the first transesterification, in which the 5' splice site is cleaved and the 5' end of the intron is joined to the branch-point A, forming the lariat; U5 binds exon sequence at the 3' splice site;

C2 complex: U2/U5/U6 remain bound to the lariat, the 3' site is cleaved, and the exons are ligated using ATP hydrolysis. The spliced RNA is released and the lariat debranches.
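The snRNP composition at each stage of the list above can be tracked with simple set operations. This sketch only encodes which snRNPs join or leave at each complex, as stated in the list; non-snRNP factors such as U2AF and SF1 are deliberately left out, and the names `ASSEMBLY_STEPS` and `snrnps_per_complex` are hypothetical.

```python
# (complex, snRNPs joining, snRNPs leaving), in the order of the steps listed above.
ASSEMBLY_STEPS = [
    ("E", {"U1"}, set()),
    ("A", {"U2"}, set()),
    ("B1", {"U4", "U5", "U6"}, set()),
    ("B2", set(), {"U1"}),
    ("C1", set(), {"U4"}),
    ("C2", set(), set()),
]


def snrnps_per_complex(steps=ASSEMBLY_STEPS):
    """Return [(complex, sorted snRNPs bound)] after each assembly step."""
    bound = set()
    history = []
    for name, joining, leaving in steps:
        bound = (bound | joining) - leaving
        history.append((name, sorted(bound)))
    return history


for name, snrnps in snrnps_per_complex():
    print(name, snrnps)
# E ['U1']
# A ['U1', 'U2']
# B1 ['U1', 'U2', 'U4', 'U5', 'U6']
# B2 ['U2', 'U4', 'U5', 'U6']
# C1 ['U2', 'U5', 'U6']
# C2 ['U2', 'U5', 'U6']
```

The final row reproduces the statement that U2, U5 and U6 remain bound to the lariat, while U1 and U4 act only during assembly and activation.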

This type of splicing is termed canonical splicing, or the lariat pathway, and accounts for more than 99% of splicing. By contrast, when the intronic flanking sequences do not follow the GU-AG rule, noncanonical splicing is said to occur (see "Minor spliceosome" below).

Minor spliceosome

The minor spliceosome is a ribonucleoprotein complex that catalyses the removal (splicing) of an atypical class of spliceosomal introns (U12-type) from eukaryotic messenger RNAs in plants, insects, vertebrates and some fungi (Rhizopus oryzae). This process is called noncanonical splicing, as opposed to U2-dependent canonical splicing. U12-type introns represent less than 1% of all introns in human cells; however, they are found in genes performing essential cellular functions. The minor spliceosome is very similar to the major spliceosome, but it splices out rare introns with different splice site sequences. While the minor and major spliceosomes contain the same U5 snRNP, the minor spliceosome has different but functionally analogous snRNPs for U1, U2, U4, and U6, which are called U11, U12, U4atac, and U6atac, respectively. Like the major spliceosome, it is only found in the nucleus.

Trans-splicing

Trans-splicing is a form of splicing that joins two exons that are not within the same RNA transcript.

Self splicing
Self-splicing occurs for rare introns that form a ribozyme, performing the functions of the spliceosome by RNA alone. There are three kinds of self-splicing introns: Group I, Group II and Group III. Group I and II introns perform splicing similar to the spliceosome without requiring any protein. This similarity suggests that Group I and II introns may be evolutionarily related to the spliceosome. Self-splicing may also be very ancient, and may have existed in an RNA world before proteins. Two transesterifications characterize the mechanism by which group I introns are spliced. First, the 3'OH of a free guanosine nucleoside (or one located in the intron) or of a nucleotide cofactor (GMP, GDP, GTP) attacks the phosphate at the 5' splice site. Second, the 3'OH of the 5' exon becomes a nucleophile and attacks the 3' splice site, joining the two exons. Group II introns are also spliced by two transesterification reactions. First, the 2'OH of a specific adenosine in the intron attacks the 5' splice site, forming the lariat. Second, the 3'OH of the 5' exon attacks the 3' splice site, joining the exons together.
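The products of the two transesterification pathways just described can be compared by tracking sequences. This is a minimal bookkeeping sketch, not a chemical model: the function names are hypothetical, the exon and intron sequences are made up, and the lariat is represented only as its loop (5' end of the intron through the branch-point A) and its 3' tail.

```python
def group_i_self_splice(exon5, intron, exon3):
    """Return (ligated exons, released linear intron) for a group I intron."""
    # Step 1: the 3'-OH of an exogenous guanosine (exoG) attacks the 5' splice site.
    #   The 5' exon is freed with a 3'-OH, and exoG becomes the intron's new 5' nucleotide.
    intron_with_exog = "G" + intron
    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site, ligating the exons.
    return exon5 + exon3, intron_with_exog


def group_ii_self_splice(exon5, intron, exon3, branch_index):
    """Return (ligated exons, lariat) for a group II intron; branch_index is 0-based in the intron."""
    if intron[branch_index] != "A":
        raise ValueError("the branch point must be an adenosine")
    # Step 1: the 2'-OH of the branch-point A attacks the 5' splice site, linking the
    #   intron's first nucleotide 2'-5' to that A and closing the lariat loop.
    lariat = {"loop": intron[:branch_index + 1], "tail": intron[branch_index + 1:]}
    # Step 2: the 3'-OH of the 5' exon attacks the 3' splice site, ligating the exons.
    return exon5 + exon3, lariat


# Hypothetical toy sequences.
print(group_i_self_splice("AAGCU", "UUAACG", "GGAC"))
# ('AAGCUGGAC', 'GUUAACG')
print(group_ii_self_splice("CCU", "GUGCGAACUAAC", "GGA", 9))
# ('CCUGGA', {'loop': 'GUGCGAACUA', 'tail': 'AC'})
```

Both pathways give the same ligated exons; they differ in the released intron, which is linear with an extra 5' guanosine for group I and a lariat for group II.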

Group I

Group I introns are distributed in bacteria, lower eukaryotes and higher plants. However, their occurrence in bacteria seems to be more sporadic than in lower eukaryotes, and they have become prevalent in higher plants. The genes that group I introns interrupt differ significantly: they interrupt rRNA, mRNA and tRNA genes in bacterial genomes, as well as in mitochondrial and chloroplast genomes of lower eukaryotes, but only invade rRNA genes in the nuclear genome of lower eukaryotes. In higher plants, these introns seem to be restricted to a few tRNA and mRNA genes of the chloroplasts and mitochondria. Both the introns-early and introns-late theories have found evidence to explain the origin of group I introns. Some group I introns contain homing endonuclease genes (HEGs), whose products catalyze intron mobility. HEGs are proposed to move the intron from one location to another and from one organism to another, and thus to account for the wide spread of the selfish group I introns. No biological role has been identified for group I introns thus far, other than splicing themselves out of the precursor so that they do not kill the host they live in. A small number of group I introns also encode a class of proteins called maturases that facilitate intron splicing.

Splicing of group I introns proceeds through two sequential ester-transfer reactions. The exogenous guanosine or guanosine nucleotide (exoG) first docks onto the active G-binding site located in P7, and its 3'-OH is aligned to attack the phosphodiester bond at the 5' splice site located in P1, resulting in a free 3'-OH group at the upstream exon and the exoG being attached to the 5' end of the intron. Then the terminal G of the intron (omega G) swaps with the exoG and occupies the G-binding site to organize the second ester-transfer reaction: the 3'-OH group of the upstream exon in P1 is aligned to attack the 3' splice site in P10, leading to ligation of the adjacent upstream and downstream exons and release of the catalytic intron. A two-metal-ion mechanism, like that seen in protein polymerases and phosphatases, was proposed for the phosphoryl transfer reactions of group I and group II introns, and this was unambiguously proven by a high-resolution structure of the Azoarcus group I intron.

Group II catalytic intron

Group II catalytic introns are found in rRNA, tRNA and mRNA of organelles in fungi, plants and protists, and also in mRNA in bacteria. They are large self-splicing ribozymes with six structural domains (usually designated dI to dVI). A subset of group II introns also encode essential splicing proteins in intronic ORFs, so these introns can be up to 3 kb long. Splicing occurs in almost identical fashion to nuclear pre-mRNA splicing, with two transesterification steps. The 2' hydroxyl of a bulged adenosine in domain VI attacks the 5' splice site, followed by nucleophilic attack on the 3' splice site by the 3' OH of the upstream exon. Protein machinery is required for splicing in vivo, and long-range intron-intron and intron-exon interactions are important for splice site positioning. Group II introns are further subclassified into groups IIA and IIB, which differ in splice site consensus and in the distance of the bulged adenosine in domain VI (the prospective branch point forming the lariat) from the 3' splice site.

Group III introns

Montandon, P. and Stutz, E. (1984) and Hallick, R.B. et al. (1988 and 1989) reported examples of a novel type of intron in the Euglena chloroplast. In 1989, David A. Christopher and Richard B. Hallick proposed the name Group III introns for this new class, which has the following characteristics. Group III introns are much shorter than other self-splicing intron classes, ranging from 95 to 110 nucleotides among those known to Christopher and Hallick, and were identified in chloroplasts; for comparison, Christopher and Hallick stated: "By contrast, the smallest Euglena chloroplast group II intron ... is 277 nucleotides." Their conserved sequences proximal to the splicing sites resemble those of group II introns but have fewer conserved positions. They do not map onto the conserved secondary structure of group II introns (indeed, Christopher and Hallick were unable to identify any conserved secondary structure elements among group III introns). They are usually associated with genes involved in translation and transcription, and they are very A+T rich. In 1994, the discovery of a group III intron an order of magnitude longer indicated that length alone is not the determinant of splicing in group III introns (Copertino D.W., Hall E.T.). Splicing of group III introns occurs through lariat and circular RNA formation. Similarities between group III and nuclear introns include conserved 5' boundary sequences, lariat formation, lack of internal structure, and the ability to use alternate splice boundaries.

RNA-editing


The RNA-editing system seen in animals may have evolved from mononucleotide deaminases, which gave rise to larger gene families that include the apobec-1 and adar genes. These genes share close identity with the bacterial deaminases involved in nucleotide metabolism. The adenosine deaminase of E. coli cannot deaminate a nucleoside in RNA because the enzyme's reaction pocket is too small for the RNA strand to bind. However, this active site is widened by amino acid changes in the corresponding human analog genes, APOBEC-1 and ADAR, allowing deamination. The insertional editing seen in trypanosome mitochondria has no relation to the nucleoside conversion process; other studies have shown that the enzymes involved were recruited and adapted from different sources. However, the specificity of nucleotide insertion via the interaction between the gRNA and mRNA is similar to the tRNA editing processes in animal and Acanthamoeba mitochondria. Furthermore, the eukaryotic ribose methylation of rRNAs by guide RNA molecules may provide another link between RNA editing and modification. Because of these subtle differences in mechanism, numerous studies suggest that RNA editing may have evolved separately in specific lineages. The data do not support the existence of RNA editing in the RNA world, since its mechanism is not linked to any hypothesized process that may have existed at that time. RNA editing therefore appears to have evolved later, to compensate for changes in gene sequences and to increase variation.

Editing by insertion or deletion
RNA editing through the addition and deletion of uracil has been found in transcripts from the kinetoplast (the mitochondrial DNA) of Trypanosoma brucei. Editing of the RNA starts with the base-pairing of the unedited primary transcript with a guide RNA (gRNA), which contains sequences complementary to the regions around the insertion/deletion points. The newly formed double-stranded region is then enveloped by an editosome, a large multi-protein complex that catalyzes the editing. The editosome opens the transcript at the first mismatched nucleotide and starts inserting uridines. The inserted uridines base-pair with the guide RNA, and insertion continues as long as A or G is present in the guide RNA and stops when a C or U is encountered. The inserted nucleotides cause a frameshift, so the translated protein differs from the one predicted from its gene.
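The stopping rule for uridine insertion (continue opposite A or G, stop at C or U) can be written as a short loop. This is a simplified sketch: it assumes the caller passes the guide RNA bases in the order they face successive insertion positions (the real duplex is antiparallel), the function names are hypothetical, and deletion and the editosome's enzymatic steps are not modelled.

```python
def u_insertions_opposite(guide_bases):
    """Count Us inserted opposite consecutive guide RNA bases.

    An inserted U can pair with A (Watson-Crick) or G (G-U wobble);
    insertion stops at the first C or U in the guide.
    """
    count = 0
    for base in guide_bases:
        if base not in "AG":
            break
        count += 1
    return count


def insert_uridines(mrna, site, guide_bases):
    """Insert the guided number of Us before index `site`; report whether the frame shifts."""
    n = u_insertions_opposite(guide_bases)
    edited = mrna[:site] + "U" * n + mrna[site:]
    shifts_frame = n % 3 != 0  # insertions that are not multiples of 3 shift the reading frame
    return edited, shifts_frame


# Hypothetical toy transcript and guide stretches.
print(insert_uridines("AUGCCG", 3, "AGAC"))  # ('AUGUUUCCG', False)
print(insert_uridines("AUGCCG", 3, "GGC"))   # ('AUGUUCCG', True)
```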

The Editosome Complex

The mechanism of the editosome involves an endonucleolytic cut at the mismatch point between the guide RNA and the unedited transcript. The next step is catalyzed by one of the enzymes in the complex, a terminal U-transferase, which adds Us from UTP to the 3' end of the cut mRNA. The opened ends are held in place by other proteins in the complex. Another enzyme, a U-specific exoribonuclease, removes the unpaired Us. After editing has made the mRNA complementary to the gRNA, an RNA ligase rejoins the ends of the edited mRNA transcript. As a consequence, the editosome can edit only in a 3' to 5' direction along the primary RNA transcript. The complex can act on only a single guide RNA at a time, so an RNA transcript requiring extensive editing needs more than one guide RNA and editosome complex.

Editing by deamination

C-U editing

The editing involves a cytidine deaminase that deaminates a cytidine base into a uridine base. An example of C-to-U editing is found in the apolipoprotein B gene in humans. Apo B100 is expressed in the liver and apo B48 is expressed in the intestines. In the intestines, a CAA codon in the apoB mRNA is edited to UAA, a stop codon, which yields the shorter apo B48; in the liver the mRNA is unedited, which yields apo B100.
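The effect of a single C-to-U change that turns CAA into the stop codon UAA can be shown by locating the first in-frame stop codon before and after editing. The toy sequence below is hypothetical and is not the apoB sequence; the function names are also illustrative.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(mrna):
    """Return the index (in codons) of the first in-frame stop codon, or None."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def deaminate_c_to_u(mrna, position):
    """Model cytidine deamination at one position: C -> U."""
    if mrna[position] != "C":
        raise ValueError(f"position {position} is {mrna[position]!r}, not C")
    return mrna[:position] + "U" + mrna[position + 1:]


# Hypothetical toy mRNA (NOT apoB): AUG GCU CAA GAA GCU UAA
toy = "AUGGCUCAAGAAGCUUAA"
print(first_stop_codon(toy))                       # 5
print(first_stop_codon(deaminate_c_to_u(toy, 6)))  # 2  (CAA edited to UAA)
```

Moving the first stop codon upstream shortens the encoded protein, which is the same logic that distinguishes the edited intestinal product from the unedited liver product.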

A-I editing

A-to-I editing occurs in regions of double-stranded RNA (dsRNA). Adenosine deaminases acting on RNA (ADARs) are the RNA-editing enzymes involved in the hydrolytic deamination of adenosine to inosine (A-to-I editing). A-to-I editing can be specific (a single adenosine is edited within the stretch of dsRNA) or promiscuous (up to 50% of the adenosines are edited). Specific editing occurs within short duplexes (e.g., those formed in an mRNA where intronic sequence base pairs with a complementary exonic sequence), while promiscuous editing occurs within longer regions of duplex (e.g., pre- or pri-miRNAs, duplexes arising from transgene or viral expression, and duplexes arising from paired repetitive elements). A-to-I editing has many effects, arising from the fact that I behaves as if it were G both in translation and when forming secondary structures. These effects include alteration of coding capacity, altered miRNA or siRNA target populations, heterochromatin formation, nuclear sequestration, cytoplasmic sequestration, endonucleolytic cleavage by Tudor-SN, inhibition of miRNA and siRNA processing, and altered splicing.
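Because inosine is read as guanosine during translation, a single A-to-I edit can change the encoded amino acid. The sketch below builds the standard genetic code table, decodes I as G, and translates a hypothetical toy mRNA before and after editing; the sequence and function names are illustrative, not taken from a specific edited gene.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in UCAG order ("*" = stop).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def a_to_i(mrna, position):
    """Model ADAR deamination at one position: A -> I."""
    if mrna[position] != "A":
        raise ValueError(f"position {position} is {mrna[position]!r}, not A")
    return mrna[:position] + "I" + mrna[position + 1:]


def translate(mrna):
    """Translate from the first base until a stop codon; inosine is decoded as G."""
    read_as = mrna.replace("I", "G")
    protein = []
    for i in range(0, len(read_as) - 2, 3):
        amino_acid = CODON_TABLE[read_as[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy = "AUGCAGUAA"                 # hypothetical: AUG CAG UAA -> Met-Gln
print(translate(toy))             # MQ
print(translate(a_to_i(toy, 4)))  # MR  (CIG is read as CGG, arginine)
```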