UbiProber (Visitors: 6916)
Systematic dissection of the ubiquitylation proteome is emerging as an appealing but challenging research topic because of the significant roles ubiquitylation plays not only in protein degradation but also in many other cellular functions. High-throughput experimental studies using mass spectrometry have identified many ubiquitylation sites, primarily from eukaryotes. However, the vast majority of ubiquitylation sites remain undiscovered, even in well-studied systems. Because mass spectrometry–based experimental approaches for identifying ubiquitylation events are costly, time-consuming and biased toward abundant proteins and proteotypic peptides, in silico prediction of ubiquitylation sites is a potentially useful alternative strategy for whole-proteome annotation. Because of various limitations, current ubiquitylation site prediction tools are not well designed to comprehensively assess proteomes. We present a novel tool known as UbiProber, specifically designed for large-scale predictions of both general and species-specific ubiquitylation sites. We collected proteomics data for ubiquitylation from multiple species from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates the information from key positions and key amino acid residues. Cross-validation tests reveal that UbiProber achieves some improvement over existing tools in predicting species-specific ubiquitylation sites. Moreover, independent tests show that UbiProber improves the areas under receiver operating characteristic curves by ~15% when using the Combined model.
Reference: Xiang Chen, Jianding Qiu*, Shaoping Shi, Shengbao Suo, Shuyun Huang, Ruping Liang. Incorporating key positions and amino acids features to identify general and species-specific ubiquitin conjugation sites, Bioinformatics, 29(13): 1614-1622.
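The UbiProber abstract compares tools by the area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen true site receives a higher score than a randomly chosen non-site. The function name and the example scores are illustrative assumptions and are not taken from UbiProber.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Inputs are hypothetical prediction
    scores for known modified sites and known unmodified sites.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Made-up scores: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0.9, 0.4], [0.5, 0.1]))
```

The pairwise form is quadratic in the number of sites, which is fine for illustration; a rank-based computation would be preferable for proteome-scale data.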
PupPred (Visitors: 1190)
Prokaryotic ubiquitin-like protein (Pup) is the first identified prokaryotic protein that is functionally analogous to ubiquitin. Recent studies have shown that Pup activation and conjugation to target proteins serve as a signal for selective protein degradation in Mycobacterium tuberculosis (Mtb). This process, known as pupylation, is an important post-translational modification that controls protein stability in actinomycetes. Detecting possible pupylation sites is crucial and fundamental for understanding the molecular mechanisms of Pup. Yet comparative studies suggest that the development of accurate and complete repertoires of pupylation sites is still in its early stages. Unbiased screening for pupylation sites by experimental methods is time-consuming and expensive; in silico prediction can provide functional candidates and help narrow down the experimental efforts. Here, we present an effective classifier, PupPred, for predicting pupylation sites, which shows better performance than existing classifiers. Importantly, this work not only investigates the sequence-derived, structural and evolutionary hallmarks around pupylation sites but also compares pupylation and ubiquitylation in terms of the environmental, conservation and functional characteristics of their substrates. These prediction and analysis results may be helpful for further experimental investigation of protein degradation in prokaryotes. Finally, the PupPred server is available at http://bioinfo.ncu.edu.cn/PupPred.aspx.
Reference: Xiang Chen, Jian-Ding Qiu*, Shao-Ping Shi, Sheng-Bao Suo, Ru-Ping Liang. Systematic Analysis and Prediction of Pupylation Sites in Prokaryotic Proteins, PLoS ONE, 2013, 8(9): 74002.
WAP-Palm (Visitors: 1470)
As an extremely important and ubiquitous post-translational lipid modification, palmitoylation plays a significant role in a variety of biological and physiological processes. Unlike other lipid modifications, protein palmitoylation and depalmitoylation are highly dynamic and can regulate both protein function and localization. The dynamic nature of palmitoylation is poorly understood because of the limitations of current assay methods. The in vivo or in vitro experimental identification of palmitoylation sites is both time-consuming and expensive. Given the large volume of protein sequences generated in the post-genomic era, it is extraordinarily important in both basic research and drug discovery to rapidly identify the palmitoylation sites of a new protein. In this work, a new computational method, WAP-Palm, which combines multiple feature extraction methods, has been developed to predict the palmitoylation sites of proteins. In a 10-fold cross-validation test, the WAP-Palm model achieved a sensitivity of 81.53%, a specificity of 90.45%, an accuracy of 85.99% and a Matthews correlation coefficient of 72.26%. The results obtained from both the cross-validation and independent tests suggest that the WAP-Palm model might facilitate the identification and annotation of protein palmitoylation sites. The online service is available at http://bioinfo.ncu.edu.cn/WAP-Palm.aspx.
Reference: Shaoping Shi, Xingyu Sun, Jianding Qiu*, Shengbao Suo, Xiang Chen, Shuyun Huang, Ruping Liang, The prediction of palmitoylation site locations using a multiple feature extraction methods, Journal of Molecular Graphics and Modelling, 2013, 40: 125-130.
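Several entries on this page, including WAP-Palm, report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). The sketch below shows how these four measures follow from a confusion matrix, using their standard definitions. The counts in the example are made up for illustration and are not the WAP-Palm results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)            # fraction of true sites recovered
    specificity = tn / (tn + fp)            # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0.0 then.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 8 of 10 sites found, 9 of 10 non-sites rejected.
    for name, value in binary_metrics(tp=8, fn=2, tn=9, fp=1).items():
        print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1; the entries on this page quote it as a percentage.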
AcetylAAVs (Visitors: 1281)
Next-generation sequencing (NGS) technologies are yielding ever higher volumes of genetic variation data. Given this large amount of data, it has become both possible and a priority to determine the functional implications of genetic variations. Considering the essential roles of acetylation in protein function, it is highly likely that acetylation-related genetic variations change protein function. In this work, we performed a proteome-wide analysis of amino acid variations that could potentially influence protein lysine acetylation characteristics in human variant proteins. We defined AcetylAAVs as acetylation-related amino acid variations that affect acetylation sites or their interacting acetyltransferases, and categorized three types of AcetylAAVs. Using the prediction system we developed, named KAcePred, we found that 50.87% of amino acid variations are potential AcetylAAVs and that 12.32% of disease mutations could result in AcetylAAVs. More interestingly, the statistical analysis showed that amino acid variations that directly create new potential lysine acetylation sites are more likely to cause disease. We anticipate that the analysis of AcetylAAVs might be useful for screening important polymorphisms and help to identify the mechanisms of genetic diseases. A user-friendly web interface for the analysis of AcetylAAVs is now freely available at http://bioinfo.ncu.edu.cn/AcetylAAVs_Home.aspx.
Reference: Shengbao Suo, Jianding Qiu*, Shaoping Shi, Xiang Chen, Shuyun Huang, Ruping Liang, Proteome-wide analysis of amino acid variations that influences protein lysine acetylation, Journal of Proteome Research, 2013, 12 (2): 949-958.
PMeS (Visitors: 6634)
Protein methylation is predominantly found on lysine and arginine residues, and carries many important biological functions, including gene regulation and signal transduction. Given their important involvement in gene expression, protein methylation and its regulatory enzymes are implicated in a variety of human disease states such as cancer, coronary heart disease and neurodegenerative disorders. Thus, identification of methylation sites can be very helpful for drug design targeting these diseases. In this study, we developed a method called PMeS to improve the prediction of protein methylation sites based on an enhanced feature encoding scheme and a support vector machine. The enhanced feature encoding scheme was composed of sparse property coding, normalized van der Waals volume, position weight amino acid composition and accessible surface area. In 10-fold cross-validation, PMeS achieved a sensitivity of 92.45%, a specificity of 93.18%, an accuracy of 92.82% and a Matthews correlation coefficient of 85.69% for arginine, and a sensitivity of 84.38%, a specificity of 93.94%, an accuracy of 89.16% and a Matthews correlation coefficient of 78.68% for lysine. Compared with other existing methods, PMeS provides better predictive performance and greater robustness. We anticipate that PMeS might be useful for guiding future experiments to identify potential methylation sites in proteins of interest. The online service is available at http://bioinfo.ncu.edu.cn/inquiries_PMeS.aspx.
Reference: Shaoping Shi, Jianding Qiu*, Xingyu Sun, Shengbao Suo, Shuyun Huang, Ruping Liang, PMeS: Prediction of methylation sites based on enhanced feature encoding scheme, PLoS ONE, 2012, 7(6): e38772.
PLMLA (Visitors: 2342)
Post-translational lysine methylation and acetylation are two major modifications of lysine residues. They play critical roles in various biological processes, especially in gene regulation. Identification of protein methylation and acetylation sites would be a foundation for understanding their modification dynamics and molecular mechanisms. This work presents a method called PLMLA that incorporates protein sequence information, secondary structure and amino acid properties to predict methylation and acetylation of lysine residues in whole protein sequences. We apply an encoding scheme based on grouped weight and position weight amino acid composition to extract sequence information and physicochemical properties around lysine sites. The prediction accuracies for methyllysine and acetyllysine are 83.02% and 83.08%, respectively. Feature analysis reveals that methyllysine is likely to occur in coil regions, whereas acetyllysine prefers to occur in helix regions of proteins. Upstream residues distant from the central site in sequence may be close to the methylated lysine in three-dimensional structure and have a significant influence on methyllysine, while positively charged residues may have a significant influence on acetyllysine. The online service is available at http://bioinfo.ncu.edu.cn/inquiries_PLMLA.aspx.
Reference: Shaoping Shi, Jianding Qiu*, Xingyu Sun, Shengbao Suo, Shuyun Huang, Ruping Liang, PLMLA: Prediction of lysine methylation and lysine acetylation by combining multiple features, Molecular BioSystems, 2012, 8 (5): 1520-1527.
PredSulSite (Visitors: 1660)
Tyrosine sulfation is a ubiquitous posttranslational modification that regulates extracellular protein–protein interactions, intracellular protein transport, and proteolytic processing of proteins. However, identifying tyrosine sulfation sites remains a challenge due to the lability of sulfation sequences. In this study, we developed a method called PredSulSite that incorporates protein secondary structure, physicochemical properties of amino acids, and residue sequence order information, based on a support vector machine, to predict sulfotyrosine sites. Three types of encoding algorithms (secondary structure, grouped weight, and autocorrelation function) were applied to mine features from tyrosine-sulfated proteins. The prediction model with multiple features achieved an accuracy of 92.89% in 10-fold cross-validation. Feature analysis showed that the coil structure, acidic amino acids, and residue interactions around the tyrosine sulfation sites all contributed to sulfation site determination. The detailed feature analysis in this work can help us to understand the sulfation mechanism and provide guidance for related experimental validation. PredSulSite is available as a community resource at http://www.bioinfo.ncu.edu.cn/inquiries_PredSulSite.aspx.
Reference: Shuyun Huang, Shaoping Shi, Jianding Qiu*, Xingyu Sun, Shengbao Suo, PredSulSite: Prediction of protein tyrosine sulfation sites with multiple features and analysis, Analytical Biochemistry, 2012, 428(1): 16-23.
PSKAcePred (Visitors: 1635)
Protein lysine acetylation is a type of reversible post-translational modification that plays a vital role in many cellular processes, such as transcriptional regulation, apoptosis and cytokine signaling. To fully decipher the molecular mechanisms of acetylation-related biological processes, an initial but crucial step is the recognition of acetylated substrates and the corresponding acetylation sites. In this study, we developed a position-specific method named PSKAcePred for lysine acetylation prediction based on support vector machines. The residues around the acetylation sites were selected or excluded based on their entropy values. We incorporated features of amino acid composition information, evolutionary similarity and physicochemical properties to predict lysine acetylation sites. The prediction model achieved an accuracy of 79.84% and a Matthews correlation coefficient of 59.72% using the 10-fold cross-validation on balanced positive and negative samples. A feature analysis showed that all features applied in this method contributed to the acetylation process. A position-specific analysis showed that the features derived from the critical neighboring residues contributed profoundly to the acetylation site determination. The detailed analysis in this paper can help us to understand more of the acetylation mechanism and can provide guidance for the related experimental validation.
Reference: Shengbao Suo, Jianding Qiu*, Shaoping Shi, Xingyu Sun, Shuyun Huang, Position-specific analysis and prediction for protein lysine acetylation based on multiple features, PLoS ONE, 2012, 7(11): e49108.
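The PSKAcePred abstract states that residues around acetylation sites were selected or excluded based on their entropy values. As a rough illustration only, the sketch below computes the Shannon entropy of each position across a set of equal-length sequence windows. The abstract does not give the entropy definition, the window length, or the selection rule, so the function name, the use of Shannon entropy and the toy windows are all assumptions.

```python
import math
from collections import Counter


def position_entropies(windows):
    """Shannon entropy (in bits) of the residue distribution at each window position.

    `windows` is a list of equal-length peptide strings centred on the
    candidate lysine. Low entropy means a position is strongly conserved.
    """
    if not windows:
        raise ValueError("at least one window is required")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    entropies = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(abs(h))  # abs() only normalises -0.0 to 0.0
    return entropies


if __name__ == "__main__":
    # Toy windows: first two positions fully conserved, last fully variable.
    print(position_entropies(["AKC", "AKD", "AKE", "AKF"]))
```

Positions could then be ranked by entropy and kept or dropped against a cutoff; the direction and value of any such cutoff would need to come from the paper itself.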
PSEA (Visitors: 1597)
Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in intracellular signal transduction. With the increasing number of kinase-specific phosphorylation sites and disease-related phosphorylation substrates identified by mass spectrometry-based proteomics or traditional experimental methods, there is growing motivation to explore the regulatory relationship between protein kinases and disease-related phosphorylation substrates. In this work, we systematically analyzed the kinase characteristics of all disease-related phosphorylation substrates using our kinase-specific predictors, which were developed on the basis of the Phosphorylation Set Enrichment Analysis (PSEA) method. We evaluated the efficiency of our method with an independent test and concluded that our approach is helpful for identifying the kinases responsible for phosphorylated substrates. More interestingly, the systematic analysis showed that the mitogen-activated protein kinase (MAPK) and glycogen synthase kinase (GSK) families are more inclined to catalyze abnormal phosphorylation and further result in diseases. We anticipate that the characteristic analysis of kinases involved in abnormal phosphorylation might help promote the development of protein kinase inhibitor drugs and help identify the mechanisms of phosphorylation-related diseases.
Reference: Shengbao Suo, Jianding Qiu*, Shaoping Shi, Xiang Chen, Ruping Liang, PSEA: Kinase-specific prediction and analysis of human phosphorylation substrates, Scientific Reports, 2014, 4, 4524.
SubPhos (Visitors: 826)
Protein phosphorylation is the most common post-translational modification (PTM), regulating major cellular processes such as cell division, growth, and differentiation through highly dynamic and complex signaling pathways. However, the dynamic interplay of protein phosphorylation does not occur randomly within the cell but is finely orchestrated by specific kinases and phosphatases that are unevenly distributed across subcellular compartments. This spatial separation not only regulates protein phosphorylation but can also control the activity of other enzymes and the transfer of other post-translational modifications. To better understand the role of phosphorylation in maintaining functional distinctions among subcellular compartments, we provide a compartment-wide map of phosphorylation sites from eight human subcellular compartments analyzed by proteomic characterization. We collected 10265 phosphorylated proteins from different subcellular compartments and provide the data set as a web-based database. Our data set reveals that the subcellular phosphorylation distribution is compartment-type dependent and that phosphorylation displays site-specific sequence motifs that diverge between subcellular compartments. Additionally, we demonstrate that phosphorylation targets compartment-specific pathways involved in fundamental physiological processes. Large-scale comparative phosphoproteomics studies have frequently been done on whole cells or organs by conventional bottom-up mass spectrometry approaches, that is, at the phosphopeptide level. With this approach, there is no way to know which compartment a phosphopeptide signal originated from. Also, as a consequence of the scale of these studies, important information on the localization of phosphorylation sites in subcellular compartments is not surveyed.
Here, we present a first account of the emerging field of subcellular phosphoproteomics, in which a support vector machine (SVM) approach is combined with a novel discrete wavelet transform (DWT) strategy to facilitate the identification of compartment-specific phosphorylation sites and to unravel the intricate regulation of protein phosphorylation. The method was implemented in a novel web tool termed SubPhosPred, which currently provides eight compartment-specific models. Cross-validation tests show that SubPhosPred achieves satisfactory performance. Application of SubPhosPred to proteins related to Alzheimer's disease suggested that Alzheimer's disease is more closely associated with the Golgi apparatus and endoplasmic reticulum (P-value<5.00E-2) than with other subcellular compartments. Furthermore, we briefly reviewed recent progress on diseases at the subcellular level and highlighted insights into their relationship with phosphorylation.
Reference: Xiang Chen, Shao-Ping Shi, Sheng-Bao Suo, Hao-Dong Xu, Jian-Ding Qiu*. Proteomic Analysis and Prediction of Human Phosphorylation Sites in Subcellular Level Reveals Subcellular Specificity, Bioinformatics, 2014, Accepted.
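The SubPhos abstract mentions a discrete wavelet transform (DWT) as part of the feature strategy but does not state the wavelet family, the number of decomposition levels, or how residues are turned into numbers. Purely as an illustration of what a DWT does to a numeric signal, the sketch below performs one level of the Haar transform; the choice of Haar and the example signal are assumptions and may differ from what SubPhosPred uses.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the length
    of the input. The input is any even-length list of numbers, for example
    a per-residue physicochemical value along a sequence window.
    """
    if len(signal) == 0 or len(signal) % 2 != 0:
        raise ValueError("signal length must be even and non-zero")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


if __name__ == "__main__":
    # Hypothetical numeric encoding of an 8-residue stretch.
    approx, detail = haar_dwt([1.0, 1.0, 2.0, 2.0, 0.5, 1.5, 3.0, 1.0])
    print(approx)
    print(detail)
```

The approximation coefficients capture the smoothed trend and the detail coefficients capture local changes; either set, or statistics of them, could be passed to a classifier such as an SVM.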
SubPhosDB (Visitors: 533)
To better understand the role of phosphorylation in maintaining functional distinctions among subcellular compartments, we provide a compartment-wide map of phosphorylation sites from eight human subcellular compartments analyzed by proteomic characterization. We collected 10265 phosphorylated proteins from different subcellular compartments and provide the data set as a web-based database (SubPhosDB). Our data set reveals that the subcellular phosphorylation distribution is compartment-type dependent and that phosphorylation displays site-specific sequence motifs that diverge between subcellular compartments. Additionally, we demonstrate that phosphorylation targets compartment-specific pathways involved in fundamental physiological processes.
Reference: Xiang Chen, Shao-Ping Shi, Sheng-Bao Suo, Hao-Dong Xu, Jian-Ding Qiu*. Proteomic Analysis and Prediction of Human Phosphorylation Sites in Subcellular Level Reveals Subcellular Specificity, Bioinformatics, 2014, Accepted
PTMProber (Visitors: 668)
Protein post-translational modifications are a widely important biological regulatory mechanism, and the speed of their discovery using high-throughput experimental strategies is rapidly increasing for certain organisms, particularly mammals. Although these strategies are widely used, few research groups routinely produce large data sets of post-translationally modified peptides for other organisms. For this reason, the study of protein post-translational modifications in those organisms, and related efforts to understand biological regulation, remain limited. We introduce a new general strategy called PTMProber, designed to globally identify a variety of post-translational modifications in an organism of interest to the user by using known modified sites from other experimentally investigated organisms. Although the current pre-constructed models involve only several post-translational modifications and are only correlated with cellular conditions of 329 organisms, PTMProber provides a unique functionality for constructing customized models (such as organism-specific and modification-specific models) from user-provided data sets. PTMProber is available as a web server at http://bioinfo.ncu.edu.cn/PTMProber/index.aspx.
Reference: Xiang Chen, Shao-Ping Shi, Sheng-Bao Suo, Hao-Dong Xu, Jian-Ding Qiu. PTMProber: A blast approach to deciphering post-translational modification site atlas for specific proteome, In preparation.
SuccFind (Visitors: 757)
Lysine succinylation is a newly identified protein post-translational modification present in both prokaryotic and eukaryotic cells, which plays pivotal roles in various biological processes. Succinylation was first discovered at the active site of homoserine trans-succinylase, although this was thought to represent an intermediate step in the transfer of a succinyl group from succinyl-CoA to homoserine. Compared with the well-known and extensively studied protein phosphorylation, protein succinylation has attracted much less attention, and its molecular mechanism is still incompletely understood. Yet annotation of succinylation in proteomes is a critical first step toward decoding protein function and understanding the physiological roles of succinylated proteins that have been implicated in pathological processes. Therefore, we developed the first online succinylation site prediction tool, called SuccPred, which predicts lysine succinylation sites based on two major categories of characteristics: sequence-derived features and evolutionary information derived from sequences. SuccPred can provide instructive guidance for further experimental investigation of protein succinylation.
Reference: Hao-Dong Xu, Jian-Ding Qiu, et al. (2015) SuccFind: a novel succinylation sites online prediction tool via enhanced characteristic strategy.
SUMOAMVR (Visitors: 152)
The complete identification of coding sequences in a number of species has heralded the beginning of the post-genomic era. Rapid advances in genomic sequencing and the application of new biotechnologies have made it possible to study complex phenomena in biological systems. Small ubiquitin-related modifier (SUMO) conjugation, or sumoylation, is a recently identified type of modification. It is a highly dynamic, reversible process whose outcomes are extremely diverse, ranging from changes in localization to altered activity and, in some cases, altered stability of the modified protein, and it has proven especially important in cell biology. Motivated by the importance of SUMO conjugation in biological processes, we report here the first exploratory assessment of whether sumoylation-related genetic variability impacts protein function and the occurrence of SUMO-related diseases. We defined SUMOAMVR as sumoylation-related amino acid variations that affect sumoylation sites or the enzymes involved in the conjugation process, and categorized four types of potential SUMOAMVR. We anticipate that our method can provide instructive guidance for identifying the mechanisms of genetic diseases.
Reference: Hao-Dong Xu, Jian-Ding Qiu, et al. (2014). Systematic Analysis of the Genetic Variability That Impacts SUMO Conjugation and Their Involvement in Human Disease.
PredHydroxy (Visitors: 144)
Annotation of hydroxylation in proteomes is a critical first step toward decoding protein function, understanding the physiological roles of hydroxylated proteins that have been implicated in pathological processes, and providing useful information for drug design targeting hydroxylation-related diseases. In this work, we present a web service, called PredHydroxy, which predicts proline and lysine hydroxylation sites based on position weight amino acid composition, 8 high-quality amino acid indices and a support vector machine. A window spanning positions -6 to +6 is used to construct the prediction model. The PredHydroxy web service was implemented in the .NET Framework 4.0. Users can submit one or multiple protein sequences, each containing at least 13 amino acids, in FASTA format and select which kind of post-translational modification (hydroxyproline or hydroxylysine) is to be predicted. The system efficiently returns the predictions, including the protein name, the position of the site, the flanking amino acids and the SVM probability. In addition, PredHydroxy supports continuous stringency adjustment to meet the various confidence requirements of users. To control false-positive predictions, we suggest that users pay more attention to sites with a stringency setting higher than 70%.
Reference: Shaoping Shi, Xiang Chen, Haodong Xu, Jianding Qiu*. PredHydroxy: computational prediction of protein hydroxylation site locations based on the primary structure
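The PredHydroxy description implies two simple steps around the model itself: cutting a 13-residue window (positions -6 to +6) around each candidate proline or lysine, and keeping only predictions above a chosen stringency. The sketch below illustrates both under stated assumptions: candidates without six residues on each side are skipped (the abstract does not say how sites near the termini are handled), and the stringency is treated as a cutoff on the SVM probability. The function names and the example scores are hypothetical and are not part of the PredHydroxy service.

```python
def candidate_windows(sequence, residues="PK", flank=6):
    """Return (1-based position, residue, window) for each candidate site.

    Only sites with `flank` residues on both sides are returned, so every
    window has length 2 * flank + 1 (13 for the default flank of 6).
    """
    hits = []
    for i, aa in enumerate(sequence):
        if aa in residues and i - flank >= 0 and i + flank < len(sequence):
            hits.append((i + 1, aa, sequence[i - flank:i + flank + 1]))
    return hits


def filter_by_stringency(predictions, threshold=0.70):
    """Keep (position, probability) pairs whose probability exceeds the threshold."""
    return [(pos, prob) for pos, prob in predictions if prob > threshold]


if __name__ == "__main__":
    seq = "AAAAAAPAAAAAAK"  # toy sequence: P at position 7, K at the C-terminus
    print(candidate_windows(seq))
    # Made-up probabilities standing in for SVM output.
    print(filter_by_stringency([(7, 0.82), (20, 0.55)]))
```

In the toy sequence only the proline at position 7 is returned, because the terminal lysine lacks downstream flanking residues.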