Computational prediction of ubiquitination proteins using evolutionary profiles and functional domains

Ubiquitination, a post-translational modification, is a crucial biological process in cell signaling, apoptosis and localization. Identifying ubiquitination proteins is of fundamental importance for understanding molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms. To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy is unsatisfactory. Predicting whether a protein can be ubiquitinated is meaningful in itself and would also help in predicting ubiquitination sites. However, existing computational methods predict only ubiquitination sites. In this study, we developed a computational method for predicting ubiquitination proteins that does not rely on ubiquitination site prediction. The method extracts features from sequence conservation information via a grey system model, as well as from functional domain annotation and subcellular localization. Combined with feature analysis and the Relief feature selection algorithm, the method achieved an accuracy of 0.898 and a Matthews correlation coefficient of 0.796 in 5-fold cross-validation on three datasets.
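For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names and the counts in the example are hypothetical and are not taken from the study's data; they were chosen only to illustrate that, for a balanced test set with symmetric errors, an accuracy of 0.898 corresponds to a coefficient of 0.796.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denom)


if __name__ == "__main__":
    # Hypothetical counts (assumption): 500 positives and 500 negatives,
    # with 51 errors in each class. Not the study's actual results.
    tp, tn, fp, fn = 449, 449, 51, 51
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

With these illustrative counts the script prints an accuracy of 0.898 and a coefficient of 0.796. Unlike accuracy, the coefficient uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.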