A Deep Neural Network Method for Predicting Mitochondria-Localized Proteins in Plants

Targeting and translocation of proteins to the appropriate subcellular compartments are crucial for cell organization and function. Newly synthesized proteins are transported to mitochondria with the assistance of targeting sequences, which are complex and consist of either an N-terminal presequence or multiple internal signals. Compared with experimental approaches, computational predictions provide an efficient and cost-effective way to infer the subcellular localization of any given protein. However, accurately predicting plant mitochondria-localized proteins remains challenging due to various limitations, and the performance of current tools is unsatisfactory. We present a novel computational approach for large-scale prediction of plant mitochondrial proteins. We collected protein subcellular localization data in plants from databases and the literature, and extracted different types of features from the training data, including amino acid composition, protein sequence profile, and gene co-expression information. We then trained deep neural networks to predict plant mitochondrial proteins. Tested on a non-redundant dataset of potato mitochondrial and Swiss-Prot proteins, our method achieves considerable improvements over existing tools in predicting mitochondria-localized proteins in plants.
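
To make the first of the feature types named above concrete, the following is a minimal sketch of how amino acid composition could be computed for a single protein sequence. It is illustrative only and is not taken from the method described here: the function name `amino_acid_composition`, the fixed alphabetical ordering of the 20 standard residues, and the choice to skip non-standard characters are all assumptions.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code; this ordering is an
# assumption made for the sketch, not one specified by the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies for one sequence.

    Characters outside the 20 standard residues (for example 'X' or '*')
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty sequence or no recognizable residues: return all zeros.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    features = amino_acid_composition("MLSLRQSIRFFK")
    print(len(features))
```

A vector of this kind has a fixed length regardless of protein length, so it could in principle be concatenated with sequence-profile and co-expression features before being passed to a neural network.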