Decoy spectral libraries - MA - Kuster Lab

Title: Generating and evaluating decoy spectral libraries

Type: MSc thesis

Category: Algorithm development

Programming language: [ any (e.g. Python) ]

Language: [ English ]

Prior experience: [ programming skills required, no biological knowledge necessary ] 

Complexity/Risk: [ low ]

Contact person: Matthew The

Brief background description

Spectral libraries are a popular technique for analyzing data-independent acquisition (DIA) data; however, the reported error rates are often viewed with skepticism. Initial analysis indicates that this mistrust is justified, so more work is needed to understand and solve the issue. One of our main hypotheses is that current decoy library generation methods do not deal well with fragment intensities, the prediction of which is currently an active research topic in the field.


  • Lam et al. 2010, Artificial Decoy Spectral Libraries for False Discovery Rate Estimation in Spectral Library Searching in Proteomics
  • Zhang et al. 2019, Reverse and Random Decoy Methods for False Discovery Rate Estimation in High Mass Accuracy Peptide Spectral Library Searches

Brief description of the project

First, current spectral library techniques need to be characterized to find out where the problematic cases lie. For example, we expect that current libraries contain many spectra originating from multiple peptides (chimeric spectra), and that spectra from peptides not present in the library could be matched incorrectly, with misleadingly high confidence, to similar peptide sequences in the library. Finally, we hope to show that by predicting decoy spectra with our new deep-learning algorithm, Prosit, we can obtain well-calibrated scores.
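As a rough illustration of the decoy side of this plan, the sketch below generates decoy peptide sequences whose spectra could then be predicted. It is a minimal example under stated assumptions: the function names are hypothetical, reversal and shuffling with a fixed C-terminal residue are common conventions rather than the method chosen for this project, and the call to the spectrum predictor is deliberately left out because its interface is not described here.

```python
import random


def reverse_decoy(peptide: str) -> str:
    """Reverse all residues except the C-terminal one.

    Keeping the last residue fixed preserves a tryptic K/R terminus
    (an assumption about the digestion, not a requirement).
    """
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


def shuffle_decoy(peptide: str, seed: int = 0) -> str:
    """Shuffle all residues except the C-terminal one, reproducibly per seed."""
    body = list(peptide[:-1])
    random.Random(seed).shuffle(body)
    return "".join(body) + peptide[-1:]


# Hypothetical target peptides, used only for illustration.
targets = ["PEPTIDEK", "ELVISLIVESR"]
decoys = [reverse_decoy(p) for p in targets]
print(decoys)  # ['EDITPEPK', 'SEVILSIVLER']
```

Both functions keep the amino acid composition, and therefore the precursor mass, of the target peptide, which is what makes the resulting decoys comparable to their targets.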

Expected result

An algorithm that generates decoy spectral libraries and produces well-calibrated scores, while dealing appropriately with confounding factors such as chimeric spectra and peptides missing from the library. The results could lead to a short article in a peer-reviewed journal.
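To make the evaluation side concrete, the sketch below shows one simple way to turn target and decoy match scores into q-values. It assumes that higher scores are better and estimates the false discovery rate at each threshold as the number of decoys divided by the number of targets. This is only one of several estimators in use, and the function name and example scores are hypothetical.

```python
def q_values(scores, is_decoy):
    """Return q-values in input order.

    FDR at a score threshold is estimated as (#decoys) / (#targets) among
    all matches scoring at least that high. Ties are not treated specially.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the smallest FDR at this threshold or any more lenient one.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


# Hypothetical scores: three target matches and two decoy matches.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_is_decoy = [False, False, True, False, True]
example_q = q_values(example_scores, example_is_decoy)
```

Scores would count as well calibrated if these decoy-based estimates track the true error rate. Checking that requires matches with known ground truth, which this sketch does not cover.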