Introduction

Pupylation is an important post-translational modification of prokaryotic proteins and plays a key role in regulating various biological processes. Accurate identification of pupylation sites is crucial for understanding the underlying mechanisms of pupylation. Although several computational methods have been developed to identify pupylation sites, their prediction accuracy remains unsatisfactory. Here, a novel bioinformatics tool named IMP-PUP is proposed to improve the prediction of pupylation sites. IMP-PUP is built on the composition of k-spaced amino acid pairs and trained with a modified semi-supervised self-training support vector machine (SVM) algorithm. The proposed algorithm iteratively trains a series of SVM classifiers on both annotated and non-annotated pupylated proteins. Computational results show that IMP-PUP achieves an area under the receiver operating characteristic curve of 0.91, 0.73 and 0.75 on our training set, Tung's testing set and our testing set, respectively, which is better than the results of the Different Error Cost SVM algorithm and the original self-training SVM algorithm. Independent tests also show that IMP-PUP significantly outperforms three other existing pupylation site predictors: GPS-PUP, iPUP and pbPUP. Therefore, IMP-PUP can be a useful tool for the accurate prediction of pupylation sites.
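To illustrate the feature encoding mentioned above, the following is a minimal Python sketch of the composition of k-spaced amino acid pairs for a single sequence fragment. It is not taken from the IMP-PUP MATLAB package: the function name `cksaap`, the parameter `k_max`, the 20-letter alphabet, and the normalization by the number of pair positions are all assumptions made for illustration, and the values used by IMP-PUP may differ.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted; any other
# character (e.g. "X" or a gap symbol) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(fragment, k_max=2):
    """Return the k-spaced pair composition of `fragment` for k = 0..k_max.

    For each k, a pair is formed by residues i and i + k + 1, and the
    count of each of the 400 pairs is divided by the number of pair
    positions (len(fragment) - k - 1). The result has 400 * (k_max + 1)
    entries. `k_max` is a hypothetical parameter chosen for this sketch.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(fragment) - k - 1  # number of positions where a pair fits
        for i in range(max(total, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features
```

For example, `cksaap("AKA", k_max=1)` returns 800 values: the 0-spaced pairs "AK" and "KA" each contribute 0.5, and the 1-spaced pair "AA" contributes 1.0.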

IMP-PUP software MATLAB code

The complete MATLAB source code of IMP-PUP is available by clicking here.

Usage of this MATLAB software package:

  1. Prepare your sequence file in FASTA format and name it "Xinput.fasta".
  2. Run the program "IMP_PUP.m".
  3. Retrieve the result file "Pre_result.xls".
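As a convenience for step 1, the following Python sketch shows one way to build and check the text of a FASTA input file before running the MATLAB program. It is independent of the IMP-PUP package: the helper names, the line width of 60, and the example records are hypothetical. Because pupylation occurs on lysine residues, the sketch also lists the lysine (K) positions in a sequence as candidate sites.

```python
def format_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text back into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def lysine_positions(sequence):
    """Return the 1-based positions of lysine (K) residues."""
    return [i + 1 for i, residue in enumerate(sequence) if residue == "K"]
```

The text returned by `format_fasta` can then be saved as "Xinput.fasta", for example with `open("Xinput.fasta", "w").write(format_fasta(records))`, where `records` is your own list of (name, sequence) pairs.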

Please note that this software package only supports 32-bit versions of the Windows operating system.

Dataset

Our training set, our independent testing set, Tung's independent testing set and the unlabeled training set are available by clicking here.

Reference

Zhe Ju, Hong Gu, Predicting pupylation sites in prokaryotic proteins using semi-supervised self-training support vector machine algorithm, Analytical Biochemistry (2016), doi: 10.1016/j.ab.2016.05.005.

If you have any questions, please contact Zhe Ju.