A LARGE SYNTHETIC PEPTIDE AND PHOSPHOPEPTIDE REFERENCE LIBRARY FOR MASS SPECTROMETRY-BASED PROTEOMICS

Project Members: Harald Marx, Dr. Simone Lemeer, Prof. Dr. Bernhard Küster
Contact: kuster[at]tum.de and harald.marx[at]tum.de

Abstract

We present a peptide library and data resource of >100,000 synthetic unmodified peptides plus their phosphorylated counterparts with known sequences and phosphorylation sites. Mass spectrometric analysis generated a data set which can be used in numerous objective ways to develop, evaluate and improve experimental and computational proteomic strategies. We have evaluated the merits of different search engines (Mascot, Andromeda) and fragmentation methods (HCD, ETD) for peptide identification. We have also compared the sensitivity and accuracy of phosphorylation site localization tools (MDscore, PTMscore, phosphoRS) and characterized the peptides' chromatographic behavior. We find that HCD identifies more (phospho-)peptides than ETD, that phosphopeptides generally elute later from reversed-phase columns and are easier to identify than unmodified peptides, and that current computational tools for proteomics can still be significantly improved. Many more applications of the resource can be envisaged, which is why we are making it available to the research community.


Methods

Mascot

Raw MS data files were converted into Mascot generic format (MGF) files using Mascot Distiller (2.4.2.0, www.matrixscience.com). Important parameters included: i) a signal-to-noise ratio of 20 for MS/MS and ii) time domain off (no merging of spectra of the same precursor). The MGF files were searched against human IPI v3.72, including the sequences of all 96 libraries, using the Mascot search engine (2.3.1, 24). Search settings: decoy search enabled, using a randomized version of the human IPI v3.72 including the sequences of all 96 libraries; monoisotopic peptide mass (considering up to two 13C isotopes); trypsin/P as protease; a maximum of four missed cleavages; peptide charge +2 and +3; peptide tol. ± 5 ppm; MS/MS tol. ± 0.02 Da; instrument type ESI-Trap (for HCD data) or ETD-Trap (for ETD data); variable modifications: oxidation (M), phospho (ST), phospho (Y). The result files were exported to pepXML and Mascot XML with the default options provided by Mascot.
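The precursor tolerance setting above (± 5 ppm, considering up to two 13C isotopes) can be illustrated with a minimal sketch. This is not Mascot's implementation; the function names and the simple "subtract n isotope spacings" logic are assumptions made for illustration only.

```python
# Illustrative sketch only: function names and logic are assumptions,
# not taken from Mascot.
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def matching_isotope_offset(observed_mass, theoretical_mass,
                            tol_ppm=5.0, max_isotope_error=2):
    """Return the number of 13C atoms (0..max_isotope_error) that must be
    assumed for the observed mass to fall within tol_ppm of the theoretical
    monoisotopic mass, or None if no offset fits."""
    for n in range(max_isotope_error + 1):
        corrected = observed_mass - n * C13_C12_DELTA
        if abs(ppm_error(corrected, theoretical_mass)) <= tol_ppm:
            return n
    return None
```

With a theoretical mass of 1500.0 Da, an observed mass of 1500.005 Da lies about 3.3 ppm away and matches without isotope correction, whereas an observed mass of 1500.02 Da (about 13 ppm) is rejected at every allowed offset.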

MaxQuant

MaxQuant (version 1.3.0.3) was used to generate peak lists from the MS/MS spectra for database searching. High-resolution profile MS/MS data were deconvoluted before extraction of the ten most abundant peaks per 100 Th. All statistical filters in MaxQuant, such as peptide and protein false discovery rates and mass deviation filters, were disabled in order to score all submitted MS/MS spectra. Peptide masses were recalibrated by MaxQuant prior to Andromeda searches. Peak lists were searched against human IPI v3.72 (supplemented with 96 additional entries, each comprising concatenations of all theoretically possible peptides within a synthesized library). Oxidation (M) and phosphorylation (STY) were used as variable modifications. A mass tolerance of 5 ppm was used for the peptide mass. Both HCD and ETD data were searched with a 0.02 Da fragment mass tolerance window. Trypsin/P was set as the proteolytic enzyme and a maximum of four missed cleavages was allowed. The MS/MS.txt output file of the software was used for further data analysis.
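The peak filtering step (ten most abundant peaks per 100 Th) can be sketched as follows. This is a simplified illustration, not MaxQuant's code: it assumes fixed, non-overlapping 100 Th bins starting at m/z 0, and the function name is hypothetical.

```python
from collections import defaultdict


def top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most intense peaks within each fixed m/z window.

    peaks: iterable of (mz, intensity) tuples.
    Assumption: windows are non-overlapping bins [0, 100), [100, 200), ...
    Returns the retained peaks sorted by m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))

    kept = []
    for members in bins.values():
        # Sort each bin by descending intensity and keep the top n.
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:n])
    return sorted(kept)
```

A spectrum with twelve peaks between 100 and 200 Th would lose its two weakest peaks in that window, while a lone peak at 250 Th is retained regardless of its intensity.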

ProteomeDiscoverer

PhosphoRS site localization was performed using PhosphoRS 2.0 embedded in the Proteome Discoverer 1.3 software (Thermo Fisher Scientific, Bremen, Germany). Raw MS data files were converted into Mascot generic format (MGF) files using Mascot Distiller (2.4.2.0) as described above. The MGF files were searched against human IPI v3.72 (supplemented with 96 additional entries, each comprising concatenations of all theoretically possible peptides within a synthesized library) using the Mascot search engine (2.3.1) embedded in the Proteome Discoverer 1.3 software. In the spectrum selector node, the unrecognized mass analyzer replacement was set to FTMS and the unrecognized activation type replacement to HCD or ETD, respectively. Search settings for Mascot were identical to those described above. Phosphorylation site localization was performed on the Mascot results using PhosphoRS 2.0. The result files were exported to csv format.
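The idea of supplementing the database with one concatenated entry per library, searched with trypsin/P and missed cleavages, can be sketched as below. The peptide sequences are hypothetical placeholders, the function name is an assumption, and the sketch presumes library peptides that end in K or R; the actual construction of the 96 entries may differ.

```python
def digest_trypsin_p(sequence, max_missed=4):
    """In silico trypsin/P digestion: cleave C-terminal to every K or R
    (including before proline) and return all peptides with up to
    max_missed missed cleavages."""
    # First split the sequence into fully cleaved fragments.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Then join up to max_missed + 1 neighbouring fragments.
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


# Hypothetical library peptides, concatenated into a single database entry.
library = ["AAAK", "GGSPR", "LLTYK"]
entry = "".join(library)
```

Digesting `entry` with `max_missed=0` recovers exactly the three library peptides; allowing missed cleavages additionally yields joined sequences such as `AAAKGGSPR`, which is one reason a decoy search is useful for judging the resulting matches.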