Prediction of High-Risk Types of Human Papillomaviruses Using Statistical Model of Protein “Sequence Space”





Hindawi Publishing Corporation


Discriminating high-risk types of human papillomaviruses (HPVs) plays an important role in the diagnosis and treatment of cervical cancer. Several computational methods based on protein sequence and structure information have recently been proposed, but none of them has used information about related proteins. In this paper, we propose using the protein "sequence space" to capture this information and use it to predict high-risk types of HPVs. The proposed method was tested on 68 samples with known HPV types and 4 samples of unknown type, and was compared with the available approaches. The results show that the proposed method achieved the best performance among all evaluated methods, with an accuracy of 95.59% and an F1-score of 90.91%, which indicates that the protein "sequence space" could potentially be used to improve the prediction of high-risk types of HPVs.



Papillomaviruses, Proteins, Statistics--Models, Sequence spaces

This work is supported by the National Natural Science Foundation of China (61370015, 61170316, and 61272312), research grants from the Zhejiang Provincial Natural Science Foundation of China (LY14F020046), the Medicine and Health Foundation of Zhejiang Province (2011-2011RCA012), and the 521 Talent Cultivation Plan of Zhejiang Sci-Tech University.


CC BY 3.0 (Attribution), ©2015 The Authors.