The reported performance of existing transmembrane (TM) topology prediction methods has often been based on evaluations that neglected the risk of signal peptides (SPs) also being predicted as putative TM segments. Here, we evaluated 12 selected TM topology prediction methods (TMpred, TopPred II, DAS, TMAP, MEMSAT 2, SOSUI, PRED-TMR2, TMHMM 2.0, HMMTOP 2.0, SPLIT 3.5, TM Finder, and MPEx) for the effect of SPs on prediction performance under three SP treatments: “remain” (untreated), “removed first”, and “removed later”. The results showed that the presence of SPs significantly affected the prediction performance of the 12 selected methods by reducing prediction accuracy. This held for each of the three predicted attributes (the number of transmembrane segments (TMSs), the number of TMSs plus their positions, and the N-tail location) and for the predicted topology (the combined prediction of all three attributes). Accuracy was lowest when SPs were left untreated (“remain”) and increased significantly when SPs were removed, either first or later; the difference between the “removed first” and “removed later” treatments was not statistically significant. In addition, machine learning-based prediction methods were less affected by the presence of SPs than hydropathy-based methods, although they still carry a risk of degraded prediction performance, to a lesser degree. Thus, when performing genome-wide analyses, the SP issue should be addressed during TM topology prediction.
In Silico Biology 2(4): 485-94.
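The abstract names the three SP treatments but does not spell out how each is applied. One plausible reading, assumed here and not confirmed by the source, is that “removed first” cleaves the SP before prediction, while “removed later” runs prediction on the full sequence and then discards any predicted TMS overlapping the SP. The Python sketch below illustrates that reading only. `toy_predictor` is a hypothetical hydrophobic-run detector that stands in for none of the 12 evaluated methods, the hydrophobic residue set and `min_len` threshold are arbitrary, and the SP boundary `sp_end` is assumed to be already known.

```python
from typing import Callable, List, Tuple

# A predicted TMS as a 1-based, inclusive (start, end) pair.
Segment = Tuple[int, int]
Predictor = Callable[[str], List[Segment]]

# Hypothetical hydrophobic residue set, used only by the toy predictor.
HYDROPHOBIC = set("AILMFVW")


def toy_predictor(seq: str, min_len: int = 8) -> List[Segment]:
    """Hypothetical stand-in: report maximal hydrophobic runs of >= min_len residues."""
    segments: List[Segment] = []
    start = None
    for i, aa in enumerate(seq, start=1):
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
        else:
            # The run covered positions start..i-1, i.e. i - start residues.
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that reaches the C-terminus.
    if start is not None and len(seq) - start + 1 >= min_len:
        segments.append((start, len(seq)))
    return segments


def remain(seq: str, predictor: Predictor, sp_end: int) -> List[Segment]:
    # "remain": the SP is left untreated and predicted on like any other region.
    return predictor(seq)


def removed_first(seq: str, predictor: Predictor, sp_end: int) -> List[Segment]:
    # Assumed "removed first": cleave residues 1..sp_end, predict on the mature
    # chain, then shift coordinates back to full-length numbering.
    mature = seq[sp_end:]
    return [(s + sp_end, e + sp_end) for s, e in predictor(mature)]


def removed_later(seq: str, predictor: Predictor, sp_end: int) -> List[Segment]:
    # Assumed "removed later": predict on the full sequence, then drop any
    # segment that starts inside the SP region (residues 1..sp_end).
    return [(s, e) for s, e in predictor(seq) if s > sp_end]


if __name__ == "__main__":
    # Hypothetical toy protein: a 15-residue SP-like region with a hydrophobic
    # core, then a mature chain with one 20-residue hydrophobic stretch.
    seq = "MK" + "L" * 10 + "AGS" + "KKDE" + "V" * 20 + "KDER"
    sp_end = 15
    for treat in (remain, removed_first, removed_later):
        print(treat.__name__, treat(seq, toy_predictor, sp_end))
```

On this toy sequence, “remain” reports the SP core (residues 3 to 13) as an extra TMS alongside the genuine one (20 to 39), while both removal treatments report only the latter. Under this reading, the two removal treatments can diverge only when removing the SP changes what the predictor reports near the new N-terminus. The sketch tracks segment positions only; N-tail location, the third attribute evaluated in the study, is not modelled.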