Analysis of large proteomics datasets: pitfalls, challenges and solutions (#21)
Since it is becoming feasible to collect shotgun proteomics data on the scale of the whole human genome, it is crucial that computational workflows for the identification and quantification of peptides and proteins can handle mass spectral datasets of enormous size. The challenges are two-fold: (1) the computational efficiency requirements are daunting, including the need for a high degree of parallelization and efficient I/O; (2) procedures for controlling the false discovery rate (FDR) for peptides and proteins must be applied to ensure the validity of the results. We show how these and other crucial problems are solved in the MaxQuant software and present examples of its application to several large-scale proteomics datasets.
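To make the FDR-control point concrete, the following is a minimal sketch of the standard target-decoy approach to FDR estimation for peptide-spectrum matches (PSMs). This is an illustrative assumption, not MaxQuant's actual implementation; the function name, data layout, and 1% threshold are hypothetical.

```python
def fdr_filter(psms, threshold=0.01):
    """Keep target PSMs whose estimated FDR is at or below `threshold`.

    `psms` is a list of (score, is_decoy) tuples, where higher scores are
    better and `is_decoy` marks matches to reversed (decoy) sequences.
    Walking down the score-sorted list, the FDR at each cutoff is
    estimated as the number of decoy hits divided by target hits so far.
    """
    accepted = []
    decoys = targets = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at or above this one.
        if decoys / max(targets, 1) > threshold:
            break
        if not is_decoy:
            accepted.append((score, is_decoy))
    return accepted

# Hypothetical toy data: three strong target hits, then a decoy hit
# that pushes the estimated FDR above 1%, truncating the list there.
hits = [(95, False), (90, False), (88, False), (60, True), (55, False)]
print(len(fdr_filter(hits)))  # prints 3
```

The same idea extends hierarchically to the protein level, where accepted peptides are aggregated and a separate decoy-based FDR is imposed on protein groups.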