Structural Biochemistry/Analyzing protein structure and function using ancestral gene reconstruction

Introduction
Understanding how protein sequence determines structure and function, and how the diverse structures and functions of extant proteins arose, requires knowing how structures and functions are distributed across the multidimensional space of possible protein sequences. However, characterizing that distribution is very difficult because of the vast number of possible sequences and the time required to generate and study them experimentally. One answer to this problem is to analyze the evolutionary record. Evolution can be viewed as one enormous experiment in the diversification and optimization of protein structure. The outcomes of that experiment are preserved in the sequences, structures, and functions of modern-day protein families. Evolutionary analysis of these families can provide key insights into the nature of protein sequence space and the determinants of protein structure and function.

Horizontal and vertical analysis
One way to study protein families is to identify candidate amino acid differences between divergent family members using sequence-based or structural analysis, and then to test the functional role of these residues by exchanging them between family members using site-directed mutagenesis. This is known as the "horizontal" approach. It identifies residues that are important to a function, because exchanging them results in an impaired or nonfunctional protein. However, it rarely identifies the set of residues sufficient to switch the function of one protein to that of another. Protein function evolves as mutations accumulate through time, that is, vertically, in ancestral protein lineages, whereas horizontal comparisons of modern proteins involve only the tips of the evolutionary tree. As a result, the horizontal approach has two major flaws. First, it is inefficient: many functionally irrelevant sequence differences may have accumulated during intervals in which the function of interest did not change. Second, lineage-specific sequence changes may lead to epistasis, the interdependence between mutations that causes a single change to have different effects in different protein family members.
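Epistasis can be made concrete with a double-mutant comparison: measure a property for a reference protein, for each single mutant, and for the double mutant, then ask how far the double mutant departs from the sum of the single-mutant effects. The following is a minimal sketch of that arithmetic. The function name and all measurements are hypothetical and are not taken from any study discussed here.

```python
def pairwise_epistasis(f_ref, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    f_ref: measured property of the reference protein
    f_a, f_b: the same property for each single mutant
    f_ab: the same property for the double mutant
    A result of zero means the two mutations act independently.
    """
    additive_expectation = (f_a - f_ref) + (f_b - f_ref)
    observed_change = f_ab - f_ref
    return observed_change - additive_expectation


# Hypothetical measurements in arbitrary units.
f_ref, f_a, f_b, f_ab = 0.0, 2.0, 3.0, 9.0

epsilon = pairwise_epistasis(f_ref, f_a, f_b, f_ab)

# The same mutation (A) has a different effect in each background:
effect_of_a_alone = f_a - f_ref        # effect in the reference background
effect_of_a_with_b = f_ab - f_b        # effect when mutation B is present

print(epsilon, effect_of_a_alone, effect_of_a_with_b)
```

With these made-up numbers the effect of mutation A depends on whether B is present, which is exactly the situation that makes swapping single residues between divergent family members hard to interpret.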

An explicitly phylogenetic approach to studying functional diversity within protein families can solve these issues. A vertical strategy addresses the mutations that occurred along the branch of the family tree on which functional diversification took place. This strategy is more efficient because only mutations that occurred during a limited period of evolutionary time need to be investigated. Furthermore, it avoids the confounding effect of epistatic interactions by using the protein background in which the sequence changes actually occurred. A vertical strategy can even identify restrictive and permissive epistatic mutations.
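The efficiency argument can be illustrated with a toy comparison. The sketch below counts the differences between two modern sequences (a horizontal comparison) and between the two nodes that bound a single branch (a vertical comparison). All four sequences are invented for illustration, and `sequence_differences` is a hypothetical helper, not part of any published pipeline.

```python
def sequence_differences(seq_from, seq_to):
    """List differences between two aligned sequences as e.g. 'I6L'
    (residue in seq_from, 1-based alignment position, residue in seq_to)."""
    if len(seq_from) != len(seq_to):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(seq_from, seq_to), start=1)
        if a != b
    ]


# Hypothetical aligned fragments (not real protein sequences).
node_before = "MKTAYIAKQR"  # ancestor at the start of the branch
node_after = "MKTAYLAKQR"   # ancestor at the end of the branch
modern_a = "MRTAYIGKQR"     # modern descendant with the ancestral function
modern_b = "MKSAYLAKHK"     # modern descendant with the derived function

# Vertical: only changes on the branch where the function switched.
vertical = sequence_differences(node_before, node_after)

# Horizontal: every difference between the two tips, relevant or not.
horizontal = sequence_differences(modern_a, modern_b)

print("vertical:", vertical)
print("horizontal:", horizontal)
```

In this made-up case the vertical comparison yields a single candidate, while the horizontal comparison also returns differences that accumulated on other branches of the tree.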

Resurrecting ancient proteins
Studying evolution along a branch of the family tree can be difficult because it requires access to the nodes at either end of the branch, that is, to ancestral proteins that no longer exist. Ancestral sequence reconstruction (ASR), a strategy for studying molecular evolution, addresses this problem. ASR is a mature technique that has been used to study many protein families, including GFP-like proteins, steroid receptors, and opsins. ASR first infers ancestral sequences from an alignment of extant protein sequences and a phylogeny relating them. The maximum likelihood sequence at any ancestral node on the phylogeny is the sequence with the highest probability of generating all of the sequence data observed in modern-day proteins. Once the ancestral protein sequence has been inferred, a DNA molecule coding for it can be synthesized. This allows the ancestral protein to be expressed and characterized experimentally. The following case studies show how ASR can be used to quantitatively dissect the interactions that determine function, reveal the multiple amino acids that underlie function, and determine the role of epistasis in shaping protein evolution.
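The inference step can be sketched for a single alignment column. The example below uses Felsenstein's pruning algorithm to compute, for each possible residue at the root of a small tree, the likelihood of the residues observed at the tips, and then normalizes those likelihoods. Several things here are simplifying assumptions rather than a description of how any published study was done: the equal-rates (Poisson) substitution model, the uniform prior over residues, the tree, the branch lengths, and the observed residues are all chosen for illustration. Real analyses typically use empirical substitution matrices and dedicated phylogenetics software.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def transition_prob(same, t):
    """Equal-rates model (an assumption): probability that a residue is the
    same, or a specific different one, after a branch of length t."""
    decay = math.exp(-K / (K - 1) * t)
    return 1 / K + (K - 1) / K * decay if same else (1 - decay) / K


def conditional_likelihoods(node):
    """Pruning step: P(tips below node | state at node) for every state.

    A leaf is a one-letter string (the observed residue). An internal node
    is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in AMINO_ACIDS}
    likes = {s: 1.0 for s in AMINO_ACIDS}
    for child, t in node:
        child_likes = conditional_likelihoods(child)
        for s in AMINO_ACIDS:
            likes[s] *= sum(
                transition_prob(s == c, t) * child_likes[c] for c in AMINO_ACIDS
            )
    return likes


def root_posterior(tree):
    """Normalized likelihoods at the root (uniform prior cancels out)."""
    likes = conditional_likelihoods(tree)
    total = sum(likes.values())
    return {s: value / total for s, value in likes.items()}


# Hypothetical column: two closely related tips show L, a more distant tip shows I.
tree = [([("L", 0.1), ("L", 0.1)], 0.05), ("I", 0.3)]

posterior = root_posterior(tree)
best_state = max(posterior, key=posterior.get)
print(best_state, round(posterior[best_state], 3))
```

Repeating this calculation for every column gives a most probable residue per site. The sketch handles only the root node. Reconstructing other internal nodes requires combining information from above and below the node, which this example does not attempt.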

Opsins: quantifying functional interactions

The study of the opsins, a family of G-protein-coupled receptors that absorb light in the vertebrate visual system, demonstrates the benefits of using ASR to study the effects of function-switching mutations. All opsins use the same covalently attached chromophore, yet each opsin has a distinct wavelength of maximum absorption. Comparative studies of modern opsins are difficult to interpret because the sequence determinants of the wavelength of maximum absorption are complex. Using ASR, however, researchers were able to dissect these interactions and obtain results applicable to the family as a whole.

GFP-like proteins

The work by Mikhail Matz's laboratory on GFP-like proteins from scleractinian corals demonstrated the effective use of ASR in identifying the residues that underlie a change in function. By using ASR to characterize ancient sequences throughout the family, Matz and colleagues found that the GFP-like protein of the ancestral Faviina fluoresced green, and that a variety of other colors evolved later. They then set out to identify the mutations responsible for the evolution of red fluorescence from this green ancestor in the great star coral Montastraea cavernosa. By using ASR, they were able to identify mutations that would have been impossible to identify using a horizontal approach.

Implications
These case studies demonstrate that ASR promises new insights into the physical-chemical determinants that have shaped protein evolution and the historical determinants of protein architecture. Furthermore, ASR has built a bridge between mechanistic biochemistry and evolutionary biology, fields of study that have been largely separate.