Proteomics/Proteomics and Drug Discovery


 * Written by: [mailto:pxk9006@rit.edu Piotr Kowalski] and [mailto:pok7810@rit.edu Patrick Kenney]

Introduction to the Drug Discovery Process
The process of drug discovery in the modern scientific context is complex, integrating many disciplines, including structural biology, metabolomics, proteomics, and computer science. The process is generally tedious and expensive, given the sheer number of possible drug-to-target interactions in vivo and the need for a candidate to pass rigorous pharmacokinetic studies and toxicology assays before it can even be considered for clinical trials. Key components of the drug discovery process, explained in more detail later in this text, include target selection, lead identification, and preclinical and clinical candidate selection. The schematic on the right outlines the steps involved in the drug discovery process.

The recent boom in proteomics, the analysis of an organism's ever-changing proteome, has brought many advances to how drug discovery is currently undertaken. Because proteomics can identify the proteins involved in disease pathogenesis and help reconstruct physiological pathways, it facilitates the discovery of novel drug targets as well as the characterization of their mechanisms of action and their toxicology.

The challenge in drug discovery is to find the exact causes of an underlying disease and a way to negate them or return them to normal levels. A mechanistic understanding of the disease in question is essential for developing any target-specific remedy. While the causes of documented clinical problems vary greatly in nature and origin, in some cases the cause lies at the protein level, involving protein function, protein regulation, or protein-protein interactions. One example of such a disorder is alkaptonuria, caused by a defect in the gene coding for the enzyme homogentisic acid oxidase, which blocks the conversion of homogentisic acid to maleylacetoacetic acid in the phenylalanine degradation pathway. Although the underlying cause of this inborn disease is a single-gene defect, its clinical manifestations, which include the excretion of black urine, result from the buildup of homogentisic acid caused by the defective enzyme.

Advances in applied genomics have helped the target identification process by allowing high-throughput screening of expressed genes. However, studies have shown a poor correlation between transcript levels and actual protein quantities, because genomic analysis does not account for post-translational processes such as protein modification and protein degradation. As a result, the methods employed in drug discovery began to shift from genomics to proteomics. Analysis of the dynamic proteome, as opposed to the static genome, promises a more accurate way to identify not only biomarkers that aid diagnosis but also effective remedies for diseases of varying origins.
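The transcript-versus-protein comparison above is usually quantified with a correlation coefficient over paired measurements of the same genes. The following is a minimal Python sketch using only the standard library, computing Pearson correlation and Spearman (rank) correlation; the function names and the example values are hypothetical and for illustration only, not data from any study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if either input is constant

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical paired measurements for five genes (illustrative only, not real data).
mrna = [12.0, 340.0, 55.0, 8.0, 150.0]
protein = [900.0, 40.0, 300.0, 15.0, 60.0]
print(pearson(mrna, protein), spearman(mrna, protein))
```

Values near 1 indicate that transcript and protein levels rise together; values near 0 indicate little agreement. Spearman correlation is included because it only assumes a monotonic relationship and is less sensitive to a few very abundant proteins.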

The field of proteomics faces some daunting challenges compared with genomics, for several reasons. First, protein science lacks an analogue of the polymerase chain reaction (PCR), which can generate many copies of a single native nucleic acid molecule in vitro. Several approaches have been applied to work around this problem. Chemical synthesis is possible but limited by yield, particularly for long peptides. In vivo expression systems exist as well, but they cannot be used to produce proteins that may disrupt normal cellular function. Cell-free ribosomal synthesis kits can also be used for accurate and rapid protein synthesis, although the intrinsic presence of ribosome-inactivating enzymes contributes to the instability of these systems. Second, in contrast to DNA, protein levels vary significantly depending on cell type and environment. Third, protein abundance is not directly correlated with protein activity, which is often determined by post-translational modifications such as phosphorylation; in drug discovery, it is protein activity, not protein abundance, that is of interest. Finally, proteins form many interactions with other proteins and with small molecules, and elucidating these interactions would greatly speed up drug discovery. One way this is currently done is through X-ray crystallography of ligand-bound proteins.
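The advantage PCR gives nucleic acid work comes from exponential amplification. As an idealized illustration, let $N_0$ be the starting number of template molecules, $n$ the number of thermal cycles, and $E$ the per-cycle amplification efficiency (these symbols are introduced here, not taken from the text above; $E = 1$ means every template is copied each cycle, and real reactions fall somewhat short of this):

```latex
\begin{aligned}
N_n &= N_0\,(1 + E)^n \\
N_n &= N_0\,2^n \qquad (E = 1) \\
N_{30} &= 1 \cdot 2^{30} = 1\,073\,741\,824 \approx 1.07 \times 10^{9} \qquad (N_0 = 1)
\end{aligned}
```

No comparable reaction exists for proteins, so a low-abundance protein cannot simply be amplified before analysis.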

The ideal proteomics technique for drug discovery would have the following features. It should be able to separate membrane proteins and detect low-abundance proteins, two abilities that current separation and analytical techniques require but have not yet fully achieved. It should identify protein activity independently of protein abundance. It should reveal protein-protein and protein-small-molecule interactions. Finally, it should be easy to implement, automatable, and capable of high throughput. Proteomics researchers are addressing these issues, and new methods are being developed.

Virtual drug libraries are being developed in both the public and private sectors. These databases contain potential drug compounds, which may or may not exist outside of a computer database, and new compounds developed through various methods of synthesis are continually added. Existing database entries are also modified to create new isomers and derivatives, so that the library covers a broader range of potential drug compounds. Docking and scoring are then performed by screening these virtual compounds against known or hypothetical drug-binding sites on a target protein. In docking, computational methods are used to position a chemical properly within a protein binding site; genetic algorithms and Monte Carlo methods are two popular approaches for evolving an optimal binding position. This process screens for chemicals that are potential drugs, which are initially termed hits. After docking, scoring is carried out using mathematical models that estimate the binding strength and energy state of the drug-protein complex. Hits with high-ranking scores are subsequently subjected to in vivo tests, and hits that perform well in both areas are then known as leads.
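The Monte Carlo approach to docking can be illustrated with a deliberately simplified Python sketch. Under these assumptions, the ligand is a rigid set of points that may only translate (no rotation or flexibility), the binding site is a few hypothetical atom positions, and the scoring function is a toy Lennard-Jones-style pair potential rather than the scoring function of any real docking program. Moves are accepted with the Metropolis criterion, and compounds are then ranked by their best score to pick hits. All names, coordinates, and parameters (`SITE`, `r0`, `temperature`, and so on) are hypothetical, and the genetic-algorithm alternative is not shown.

```python
import math
import random

# Hypothetical binding-site atom coordinates (angstroms); not a real protein.
SITE = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]

def pair_energy(d, r0=4.0, well=1.0):
    """Toy 12-6 Lennard-Jones-style term with its minimum (-well) at distance r0."""
    d = max(d, 0.5)  # clamp to avoid huge values at near-zero distances
    s = (r0 / d) ** 6
    return well * (s * s - 2.0 * s)

def score(ligand, site=SITE):
    """Sum of pairwise toy energies; lower (more negative) means tighter predicted binding."""
    return sum(pair_energy(math.dist(a, b)) for a in ligand for b in site)

def translate(ligand, dx, dy, dz):
    """Rigidly shift every ligand point by (dx, dy, dz)."""
    return [(x + dx, y + dy, z + dz) for x, y, z in ligand]

def monte_carlo_dock(ligand, steps=2000, step_size=0.5, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over rigid translations; returns (best pose, best score)."""
    rng = random.Random(seed)
    current, e_cur = ligand, score(ligand)
    best, e_best = current, e_cur
    for _ in range(steps):
        trial = translate(current, *(rng.uniform(-step_size, step_size) for _ in range(3)))
        e_trial = score(trial)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE / T).
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

def rank_hits(library, **kwargs):
    """Dock every compound and return (name, best score) pairs, best first."""
    results = [(name, monte_carlo_dock(lig, **kwargs)[1]) for name, lig in library.items()]
    return sorted(results, key=lambda item: item[1])

# Hypothetical two-compound "virtual library": starting coordinates only.
library = {
    "cmpd_A": [(2.0, 2.0, 3.0)],
    "cmpd_B": [(2.0, 2.0, 5.0), (3.5, 2.0, 5.0)],
}
for name, best_score in rank_hits(library, steps=500):
    print(name, round(best_score, 3))
```

The `temperature` parameter controls how readily the search climbs out of local minima: higher values explore more, lower values settle faster. Real docking programs also sample rotations and torsions and use far more detailed scoring functions.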

The docked and scored complexes are then evaluated by selecting an arbitrary number of top hits for further manual screening. Whereas docking and scoring are done entirely in silico, the best complexes must now be examined with visualization software, often in three dimensions. This allows scientists to confirm that the predicted docking orientation looks acceptable and that the score is consistent with known interaction energies, such as those of hydrogen bonds and ionic interactions.
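Part of the hand-off from automated ranking to manual evaluation can be sketched in code: take an arbitrary number of top-scoring complexes and attach a simple sanity check, here a count of donor-acceptor pairs close enough to form a hydrogen bond. This is a minimal Python sketch under stated assumptions: the 3.5 Å distance-only cutoff is a common rule of thumb rather than a complete hydrogen-bond criterion (real checks also consider angles), and the compound names, scores, and coordinates are hypothetical. It is meant to support visual inspection, not replace it.

```python
import heapq
import math

# Assumed donor-acceptor distance cutoff in angstroms; real checks also use angles.
HBOND_MAX_DIST = 3.5

def top_hits(scores, n):
    """Names of the n best (lowest) scores, best first."""
    return heapq.nsmallest(n, scores, key=scores.get)

def count_hbonds(pairs, cutoff=HBOND_MAX_DIST):
    """Count (donor, acceptor) coordinate pairs that lie within the distance cutoff."""
    return sum(1 for donor, acceptor in pairs if math.dist(donor, acceptor) <= cutoff)

def review_queue(complexes, n, cutoff=HBOND_MAX_DIST):
    """Pick the n best-scoring complexes and attach an H-bond count to each for manual review."""
    scores = {name: s for name, (s, _) in complexes.items()}
    return [(name, scores[name], count_hbonds(complexes[name][1], cutoff))
            for name in top_hits(scores, n)]

# Hypothetical docked complexes: name -> (score, [(donor_xyz, acceptor_xyz), ...]).
complexes = {
    "cmpd_A": (-9.1, [((0.0, 0.0, 0.0), (2.9, 0.0, 0.0))]),
    "cmpd_B": (-7.4, []),
    "cmpd_C": (-8.2, [((0.0, 0.0, 0.0), (4.2, 0.0, 0.0))]),
}
for name, s, n_hb in review_queue(complexes, 2):
    # A top-ranked pose with no plausible H-bonds may deserve a closer visual look.
    print(name, s, n_hb)
# prints "cmpd_A -9.1 1", then "cmpd_C -8.2 0"
```

Here `cmpd_C` scores well but shows no hydrogen-bond contact within the cutoff, which is the kind of mismatch between score and known interaction energies that manual evaluation is meant to catch.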

Compounds that make it through docking, scoring, and evaluation become drug leads and are then passed on to scientists in a wet lab for drug testing, to ensure that only compounds whose effects are relatively specific to the target system and that are safe for the rest of the organism are considered further. Even so, the drug company has already saved considerable time and money up to this point by having computers, rather than human scientists, perform the chemical screening.