Novel Automated Platform for Proteoform Driven Top-Down Mass Spectrometry Proteomics
Corbett, John Rawson
Top-Down proteomics studies protein complexity at the intact proteoform level in order to characterize chemical modifications, such as co- and post-translational modifications and non-enzymatic protein processing (e.g., redox-active modifications, glycation). This approach captures the information content associated with the diversity of chemical and biological processes that occur in vivo, such as glycosylation, lipidation, and proteolysis, and so gives a more representative view of biological complexity. To obtain this information, a traditional Top-Down approach uses liquid chromatography separations in conjunction with mass spectrometry and database querying techniques to identify proteoforms. For example, this approach was used in a study highlighting differentially expressed levels of phosphoproteoforms within cardiac myofilaments and their association with different degrees of congestive heart failure. Although these strategies have been well characterized, such an approach is not applicable to large-scale proteome analysis because of the high heterogeneity of expressed proteoforms. For this type of analysis, multiple dimensions of orthogonal chromatographic separations are used to reduce proteoform complexity, with prior attempts identifying over 3,000 unique proteoforms from the HeLa S3 cell line. These Top-Down platforms have also been used for proteome-scale label-free quantitative studies; however, such approaches have often struggled because of limited quantitative dynamic range. Additionally, chromatographic separation strategies have been protein driven, which restricts proteoform observation to only the most abundant species and in some cases causes a complete loss of proteoform information (i.e., related glycoproteoforms) due to limitations associated with charging/ionization efficiency, ion transfer, and mass spectrometer resolving power.
To address these obstacles, a novel platform based on isoelectric point separation has been implemented in order to perform chromatographic separations at the proteoform level. Using high-resolution in-solution isoelectric focusing with superficially porous liquid chromatography and Fourier-transform mass spectrometry, a ~5x increase in observed proteoforms from cardiac myofibril tissue (1D: 112 vs. 2D: 582 proteoforms) was obtained, with species ranging from 3 – 230 kDa in size. In addition, novel data processing strategies capable of distinguishing related proteoform information separated into different mass spectra have been implemented, with the objective of establishing the three quantitative levels of Top-Down proteomics (proteoform, protein, and proteoform ratios). Standard proteins with different physicochemical properties and modification classes were studied to create calibration curves under non-spiked and spiked conditions (i.e., E. coli matrix effect), establishing a linear dynamic range of 10^2 – 10^3 and low-femtomole limits of detection. Additionally, results indicate that proteoform ratio information, outside of matrix effects, is independent of protein loading. To aid in automating the data processing strategies associated with mass spectral deconvolution and data binning, triplicate E. coli proteome analyses were completed with a sliding window approach, showing reproducible spectral intensity values (~15.1% relative standard deviation) and chromatographic precision tolerances of ± 0.2 pI units and ± 12 seconds for weighted pI and hydrophobicity calculations, respectively. Using this platform, Lipocalin-type Prostaglandin D-Synthase, a highly glycosylated cerebrospinal fluid (CSF) protein, was fully characterized with 200+ proteoforms identified, a 65x improvement compared to other non-pI-based Top-Down platforms that are chromatographically protein driven.
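The reproducibility figures quoted above (relative standard deviation, weighted pI, and a ± tolerance across replicates) can be illustrated with a minimal Python sketch. The abstract does not give the formulas used, so everything here is an assumption: the weighted pI is taken to be an intensity-weighted mean over isoelectric focusing fractions, the tolerance is measured from the replicate mean, and the function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def weighted_pi(fraction_pis, intensities):
    """Intensity-weighted mean pI of one proteoform seen in several fractions.

    Assumed definition: sum(pI_k * I_k) / sum(I_k). The source does not
    state the formula it uses.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(p * i for p, i in zip(fraction_pis, intensities)) / total


def percent_rsd(values):
    """Relative standard deviation in percent (sample std. dev. / mean)."""
    return 100.0 * stdev(values) / mean(values)


def within_tolerance(values, tol):
    """True if every replicate lies within +/- tol of the replicate mean.

    Measuring from the mean is an assumption made for this sketch.
    """
    centre = mean(values)
    return all(abs(v - centre) <= tol for v in values)


if __name__ == "__main__":
    # Hypothetical values for one proteoform across three replicate runs.
    pis = [weighted_pi([5.0, 5.5, 6.0], [1.0, 2.0, 1.0]),
           weighted_pi([5.0, 5.5, 6.0], [1.2, 2.0, 1.0]),
           weighted_pi([5.0, 5.5, 6.0], [1.0, 2.0, 1.3])]
    intensities = [90.0, 100.0, 110.0]
    print("weighted pI per replicate:", [round(p, 3) for p in pis])
    print("intensity %RSD:", round(percent_rsd(intensities), 1))
    print("pI within +/- 0.2:", within_tolerance(pis, 0.2))
```

Under these assumptions, a proteoform would pass the precision check when its replicate weighted pI values agree to within 0.2 pI units and its retention times agree to within 12 seconds; the same `within_tolerance` helper covers both cases with a different `tol`.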
In the future, the completion of CSF proteome profiling investigations will contribute to the interpretation of changes in proteoform modifications and expression levels and their correlation with the unique pathobiology associated with different neurodegenerative and neuroinflammatory diseases.