Proteomics/Protein Primary Structure/Alternative Splicing

Alternative splicing: Biological function
Alternative splicing is the process by which the primary transcript of a gene is processed into one of several possible mature mRNA transcripts. By including or excluding particular exonic regions, the cell controls the amino acid sequence encoded by the mRNA, so a single gene can give rise to proteins with different sequences, structures, and functions. Alternative splicing has been observed as a mechanism for producing tissue-specific proteins from a single gene, which allows increased cellular specialization without any change to the genome itself. This is critical in development because it allows cell fates to be determined in a manner that remains flexible and open to modification. Alternative splicing has been studied especially in the context of neuronal development; however, many splice variants have also been identified that are expressed only in certain tissues during development and that seem to play a critical role in triggering cell-type-specific gene expression.

Alternative splicing is a major source of protein diversity in living organisms. It was originally thought that alternative splicing affected only about 5% of human genes. It has since been estimated that at least 30% of all genes in the human genome are alternatively spliced, and this number continues to grow. With the sequencing of the human genome, it was revealed that the genome contains fewer than 30,000 genes. Alternative splicing could potentially account for the large gap between this relatively small number of genes and the vast number of proteins in the proteome.

It has been suggested that alternative splicing is the source of higher-level complexity in eukaryotes. This idea rests on the expectation that more complex organisms splice their genes alternatively more often in order to obtain more possible mRNA sequences. However, evidence from a study using ESTs (expressed sequence tags) indicates that the level of alternative splicing does not differ significantly between organisms of different complexity. ESTs, being portions of transcribed mRNA, are very effective at detecting alternatively spliced genes. As more EST studies are done, it has become apparent that there are more alternatively spliced genes than previously thought (see above).

Examples of Alternative Splicing
Alternative splicing has been implicated in several diseases. One disease in which alternative splicing plays a role is Rett syndrome (RTT). This disease is found primarily in girls and is characterized by problems with forming connections between neurons, or synapses. Girls with this disease typically develop normally until 6 to 18 months of age. Beyond this age, they begin to show a decline in motor and language skills that is typically accompanied by hand-wringing motions. It is believed that mutations in the gene MeCP2 disrupt the protein's normal function as a co-regulator of alternative splicing of some transcripts: the mutant MeCP2 protein lacks critical functions of the wild type and so does not allow normal regulation of alternative splicing. MeCP2 was originally identified as a methylation-dependent transcriptional repressor. However, it was later discovered that MeCP2 has additional functions, first revealed by its interaction with Y box-binding protein 1 (YB-1), a component of mRNPs. The complex of YB-1 and MeCP2 was shown to regulate splicing of reporter minigenes.

Microarray data from mice carrying mutations that cause Rett-like phenotypes showed a significant difference in the number of alternatively spliced genes compared with wild-type mice. The alternatively spliced transcripts were consistent with cell-based assays, which showed a role for MeCP2 in cassette exon changes.

Splicing Mechanism
Splicing is carried out by the spliceosome, a complex composed of a number of proteins and snRNAs, including the snRNAs U1, U2, U4, U5 and U6. These snRNAs recognize the splice sites and then recruit other proteins that bring the splice sites together; the assembled proteins and snRNAs form the spliceosome. Once the spliceosome has formed, the transcript is cleaved at the splice sites, the intron is removed, and the correct exons are joined. Splice sites have not been well characterized in most organisms. Some progress has been made in the area of computational prediction using HMM profiling and other forms of machine learning. This will be discussed later in this article.

The 5' end of an intron is characterized by a consensus sequence in which the intron most commonly begins with "GU". This region is known as the splice donor site. The 3' end of an intron is known as the splice acceptor site and contains a consensus sequence with an AG at the intron/exon boundary. There is also an adenine, usually close to the 3' end of the intron, known as the branch point, which plays a role in the splicing mechanism.

During splicing, U1 begins the formation of a complex at the splice donor site, while U2 binds at the branch point and recruits portions of the spliceosome complex toward the splice acceptor site. U4, U5, and U6 bridge the gap between these two sites, causing the intron to fold. The RNA is cleaved at the splice donor site. The freed 5' end of the intron is then ligated to the branch point, forming a lariat. This brings the exonic regions into close proximity to each other, allowing for ligation. Finally, the splice acceptor site is cut and the exonic regions are joined.
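
The GU...AG boundaries and the branch-point adenine described above can be illustrated with a short Python sketch. The function name `splice_out_intron`, the toy pre-mRNA, and the rule of taking the adenine nearest the acceptor site as the branch point are all assumptions made for illustration; in the cell the branch point is recognized by the spliceosome, not chosen by position alone.

```python
def splice_out_intron(pre_mrna, intron_start, intron_end):
    """Remove one intron, pre_mrna[intron_start:intron_end], from an RNA string.

    Returns (mature_rna, branch_index), where branch_index is the position in
    pre_mrna of the adenine used here as a stand-in for the branch point.
    """
    intron = pre_mrna[intron_start:intron_end]

    # Splice donor site: the intron most commonly begins with GU.
    if not intron.startswith("GU"):
        raise ValueError("intron does not begin with GU (splice donor site)")

    # Splice acceptor site: the intron ends with AG at the intron/exon boundary.
    if not intron.endswith("AG"):
        raise ValueError("intron does not end with AG (splice acceptor site)")

    # Illustrative assumption: use the last adenine that lies between the
    # donor GU and the acceptor AG as the branch point.
    branch_offset = intron.rfind("A", 2, len(intron) - 2)
    if branch_offset == -1:
        raise ValueError("no adenine available to act as a branch point")

    # Joining the flanking exons corresponds to the final ligation step.
    mature_rna = pre_mrna[:intron_start] + pre_mrna[intron_end:]
    return mature_rna, intron_start + branch_offset


# Hypothetical toy transcript: exon, intron, exon.
exon_1, intron, exon_2 = "AUGGCC", "GUAAGUCCUAACUUUCAG", "GGAUAA"
pre_mrna = exon_1 + intron + exon_2
mature, branch = splice_out_intron(pre_mrna, len(exon_1), len(exon_1) + len(intron))
print(mature, branch)
```

The sketch only checks the dinucleotides at each end of the intron; it says nothing about how strongly a real site would be recognized.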



There are four common types of alternative splicing:


 * Alternative promoter selection: Different splice variants are transcribed from different promoters. This results in a different start of the mRNA transcript.
 * Alternative selection of cleavage/polyadenylation sites: Different cleavage or polyadenylation sites are recognized, so different exons are included and entire exons can be skipped. This results in a different exon at the 3' end of the transcript.
 * Intron retention: A sequence that is normally treated as an intron is retained in the final transcript and serves as part of the template for translation, so the intron is used as a coding region.
 * Exon cassette: Entire exons in the middle of the transcript can be skipped, resulting in a different transcript.
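
Two of the types listed above, exon cassette and intron retention, can be sketched in Python to show how one gene yields several protein sequences. The exon and intron sequences, the names `EXON_1` to `EXON_3` and `INTRON_1`, and the deliberately partial codon table are hypothetical and chosen only so that the example stays in frame.

```python
# Hypothetical exons and intron of a toy gene (RNA alphabet).
EXON_1 = "AUGGCU"
EXON_2 = "GAAUUC"     # cassette exon that may be skipped
EXON_3 = "UGGUAA"
INTRON_1 = "GUAAGUCAG"  # begins with GU, ends with AG, length divisible by 3

# Partial codon table: only the codons that occur in this toy example.
CODONS = {
    "AUG": "M", "GCU": "A", "GAA": "E", "UUC": "F", "UGG": "W",
    "GUA": "V", "AGU": "S", "CAG": "Q", "UAA": "*",
}


def translate(mrna):
    """Translate an mRNA from its first codon up to the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


isoforms = {
    # All exons included, intron removed.
    "constitutive": EXON_1 + EXON_2 + EXON_3,
    # Exon cassette: the middle exon is skipped.
    "exon_skipped": EXON_1 + EXON_3,
    # Intron retention: the intron stays in the transcript and is translated.
    "intron_retained": EXON_1 + INTRON_1 + EXON_2 + EXON_3,
}

proteins = {name: translate(mrna) for name, mrna in isoforms.items()}
for name, protein in proteins.items():
    print(name, protein)
```

In this toy case the retained intron keeps the reading frame; a retained intron whose length is not a multiple of three would shift the frame of everything downstream.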



The Growing Need for Protein Identification:
Proteins are the basic units of structure and function within living systems. Consequently, the field of proteomics has taken on added significance in the modern study of biology. The genomic revolution, which culminated in the sequencing of many genomes, has generated a tremendous amount of data. The field of proteomics has unfortunately lagged behind, resulting in a disconnect between genomic information and observable phenotypes. Originally, proteins and protein-dependent pathways were studied individually. Recently, an emphasis on systems biology has led to changes in this methodology, and entire cells are now being characterized with high-throughput techniques.

Mass spectrometry has become the gold standard of protein identification. Briefly, proteins are broken down into peptides, which are transferred into the gas phase and ionized by one of several methods and then sent through an analyzer and detector that determine the mass-to-charge ratio of each peptide. Mass spectrometry can be easily automated and combined with other forms of protein separation, making it an ideal candidate for high-throughput analysis. In addition, thousands of peptides can be identified from a single sample at one time, making this technique much more applicable to systems biology than older techniques such as Edman degradation. Mass spectrometry can also be used to identify single proteins of interest that have been isolated using other techniques, such as chromatography.
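
The mass-to-charge ratio mentioned above can be made concrete with a small Python sketch. It assumes unmodified peptides ionized by adding protons, and uses standard monoisotopic residue masses; the function names `peptide_mass` and `mass_to_charge` and the example peptide are introduced here only for illustration.

```python
# Monoisotopic masses (in daltons) of amino acid residues, i.e. the amino
# acid minus one water, as incorporated into a peptide chain.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water for the free N- and C-termini
PROTON = 1.007276   # mass added per positive charge


def peptide_mass(sequence):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mass_to_charge(sequence, charge):
    """m/z of the peptide carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


# Hypothetical peptide observed at two charge states.
peptide = "GVLLEGR"
for z in (1, 2):
    print(peptide, z, round(mass_to_charge(peptide, z), 4))
```

Leucine and isoleucine share the same residue mass, so peptides that differ only at such positions cannot be separated by mass alone.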

Splicing and Protein Identification
A major drawback of mass spectrometry analysis, and even of identification by Edman degradation, is that proteins must be digested into peptides before identification. In shotgun approaches, proteins are often digested before any isolation step is performed. To confirm identifications, protein databases are searched to match unique peptides to entire proteins. This process is complicated by the high level of sequence identity shared by the peptides of proteins produced by alternative splicing. These proteins, although they share similar primary structures, may have very different or even antagonistic functions, making their identification critical from a biological perspective. What is more, the extent of alternative splicing is not well characterized, so it is not even known which proteins cannot be unambiguously identified. These problems will likely remain until alternative splicing is well documented or can be predicted efficiently by computational means.
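
The ambiguity described above can be seen in a short Python sketch that digests two isoforms in silico and compares their peptides. It assumes trypsin-like cleavage (after K or R, but not before P) with no missed cleavages; the two isoform sequences and the function name `tryptic_digest` are hypothetical.

```python
import re


def tryptic_digest(protein):
    """Split a protein after each K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


# Hypothetical isoforms of one gene: the second lacks the middle exon.
EXON_1, EXON_2, EXON_3 = "MASKGV", "LLEGR", "TTWPKDDER"
isoform_full = EXON_1 + EXON_2 + EXON_3
isoform_skipped = EXON_1 + EXON_3

peptides_full = set(tryptic_digest(isoform_full))
peptides_skipped = set(tryptic_digest(isoform_skipped))

# Peptides found in both isoforms cannot tell them apart.
shared = peptides_full & peptides_skipped
# Only peptides seen in exactly one isoform identify it unambiguously.
unique_full = peptides_full - peptides_skipped
unique_skipped = peptides_skipped - peptides_full

print("shared:", sorted(shared))
print("unique to full-length isoform:", sorted(unique_full))
print("unique to exon-skipped isoform:", sorted(unique_skipped))
```

Here the exon-skipped isoform is distinguished only by the single peptide that spans the new exon junction, so it is identified unambiguously only if that one peptide is detected.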

Efforts to analyze alternative splicing:
In vivo analysis is currently the most accurate way to identify alternative splicing, either at the transcript level or, in some cases, at the protein level. Numerous databases document proteins known to undergo alternative splicing, including the Alternative Splicing Database and the Transcript Diversity database. While these are good references, estimates of even the amount of alternative splicing present in eukaryotic organisms vary dramatically, as discussed above. There is, therefore, very little information on how complete these databases are. Steps have also been taken to predict alternative splicing computationally. Generally, these algorithms combine gene-finding approaches with experimental data: candidate splice sites are recognized and rated for efficiency based on consensus sequences, and sequences are then matched to known expressed sequence tags to make predictions. Tools such as BLAST, Spidey, and SIM4 can be adapted for these processes. Most current computational tools have difficulty comparing genomic data to sequences as small and variable as splice sites, so false positives and false negatives are fairly common. New approaches are still being developed.
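
As a rough illustration of rating splice sites against a consensus sequence, the Python sketch below scans genomic DNA for GT dinucleotides and scores each surrounding window against a simplified donor consensus, MAG|GTRAGT (M = A or C, R = A or G). The match-counting score, the 0.7 threshold, and the function names are assumptions for this example; this is not the method used by BLAST, Spidey, or SIM4, and real predictors use weighted models and EST evidence.

```python
# IUPAC codes needed for the simplified donor consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Three exonic positions followed by six intronic positions (DNA alphabet).
DONOR_CONSENSUS = "MAGGTRAGT"
EXONIC_POSITIONS = 3


def score_donor(window):
    """Fraction of positions in a 9-base window that match the consensus."""
    matches = sum(base in IUPAC[code] for code, base in zip(DONOR_CONSENSUS, window))
    return matches / len(DONOR_CONSENSUS)


def candidate_donors(dna, min_score=0.7):
    """Return (position of GT, score) for windows scoring at least min_score."""
    intronic = len(DONOR_CONSENSUS) - EXONIC_POSITIONS
    hits = []
    for i in range(EXONIC_POSITIONS, len(dna) - intronic + 1):
        if dna[i:i + 2] != "GT":
            continue  # a donor site must start the intron with GT (GU in RNA)
        window = dna[i - EXONIC_POSITIONS:i + intronic]
        score = score_donor(window)
        if score >= min_score:
            hits.append((i, score))
    return hits


# Hypothetical genomic fragment with one strong and one weak GT site.
genomic = "ATGCCAAGGTAAGTCCTGTCC"
print(candidate_donors(genomic))
```

Because the motif is so short, lowering `min_score` quickly admits GT sites that are not real donor sites, which is one reason false positives are common in this kind of prediction.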