Metaproteomics re-analysis - MA - Kuster Lab

Title: Deep learning assisted identification of peptides and proteins in human gut samples

Type: MA

Category: [ ML/DL , DS ]

Programming language: [ R | Python ]

Language: [ English ]

Prior experience: [ basic/advanced knowledge of R and Python, basics in Machine/Deep learning ]

Complexity/Risk: [ low ]

Contact person: Mathias & Patroklos

Brief background description (couple of sentences + literature): Metaproteomics is the study of all proteins recovered directly from environmental samples. The human gut is one such environment, potentially hosting hundreds or thousands of different bacterial species. Identifying the peptides and proteins present is particularly challenging here because of the unknown complexity of the sample, the diversity of the biological material, the large search space and the homology among proteins. Classical database search engines fail to efficiently separate correct from incorrect peptide identifications and thus report far fewer peptides than expected. Prosit, our deep learning architecture for the prediction of tandem mass spectra and retention times of peptides, can aid this process by increasing the specificity at which peptides can be identified.

Brief description of the project (couple of sentences): The goal of this project is to re-analyze an in-house dataset of 212 fecal samples from 56 hospitalized acute leukemia patients with multidrug-resistant Enterobacteriaceae (MRE) gut colonization. The samples were previously analyzed using metagenomics and metaproteomics, resulting in patient-specific databases and proteomic profiles. For the re-analysis, you will use the Prosit rescoring pipeline, which re-scores peptide-spectrum matches (PSMs) by comparing experimental to predicted spectra. The resulting list of peptides should then be investigated with the goal of finding potential markers that indicate progression, resistance and sensitivity in individual patients.
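
To illustrate what comparing experimental to predicted spectra can look like, here is a minimal Python sketch of one possible similarity measure, the normalized spectral angle. It is an illustration under assumptions, not the pipeline's actual code: the function name `spectral_angle` is hypothetical, and the sketch assumes that both intensity vectors are already aligned to the same fragment ions. How the pipeline turns such a similarity into a final score is not covered here.

```python
import numpy as np


def spectral_angle(observed, predicted):
    """Normalized spectral angle between two aligned intensity vectors.

    Hypothetical helper for illustration. Both inputs must list the
    intensities of the same fragment ions in the same order.
    Returns 1.0 for identical direction and 0.0 for orthogonal vectors.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("intensity vectors must have the same shape")

    norm_obs = np.linalg.norm(observed)
    norm_pred = np.linalg.norm(predicted)
    if norm_obs == 0.0 or norm_pred == 0.0:
        # No signal in one of the spectra: treat as no similarity.
        return 0.0

    # Cosine similarity, clipped to guard against rounding errors.
    cosine = np.dot(observed, predicted) / (norm_obs * norm_pred)
    cosine = np.clip(cosine, -1.0, 1.0)

    # Map the angle between the vectors onto a similarity scale.
    return float(1.0 - 2.0 * np.arccos(cosine) / np.pi)


# Made-up intensities for five fragment ions of one PSM.
experimental = [0.10, 0.80, 0.00, 0.35, 1.00]
predicted = [0.12, 0.75, 0.05, 0.30, 1.00]
similarity = spectral_angle(experimental, predicted)
```

A PSM whose experimental spectrum closely matches the prediction receives a similarity near 1, whereas an incorrect match tends to score lower. A similarity of this kind can serve as an additional feature when separating correct from incorrect identifications.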

Expected result: Prior results indicate that re-scoring pipelines can significantly increase the number of identified peptides and thus the coverage of the proteins present. The aim is to publish the findings in a follow-up paper and to showcase that the Prosit-assisted pipeline alleviates some of the issues that currently limit peptide identification in metaproteomics. There is also the potential to discover novel biomarkers.