Protein phosphorylation catalysed by kinases plays crucial regulatory roles in intracellular signal transduction. As growing numbers of kinase-specific phosphorylation sites and disease-related phosphorylation substrates are identified, there is increasing interest in exploring the regulatory relationship between protein kinases and disease-related phosphorylation substrates. In this work, we analysed the kinase characteristics of all disease-related phosphorylation substrates using our Phosphorylation Set Enrichment Analysis (PSEA) method. We evaluated the method with an independent test and concluded that the approach is helpful for identifying the kinases responsible for phosphorylated substrates. In addition, we found that the mitogen-activated protein kinase (MAPK) and glycogen synthase kinase (GSK) families are more closely associated with abnormal phosphorylation. We anticipate that the method may help to elucidate the mechanism of phosphorylation and the relationship between kinases and phosphorylation-related diseases.
This service predicts the kinase types and kinase-specific phosphorylation sites for a protein from its amino acid sequence, supplied in FASTA format. Each query phosphorylation site (S/T/Y) in the sequences receives a score. A higher score indicates a higher probability that the site is phosphorylated by the selected kinase term. In addition, the rank of each P-value within the background P-values is reported. The background P-values were calculated from all specific phosphorylation sites (S/T/Y) on human proteins and then ranked from lowest to highest. Once a P-value is given, its rank in the background can be determined, which helps users to evaluate the predicted results. To control false-positive predictions, we suggest that users pay more attention to the phosphorylation sites whose P-values rank within the top 10% of the background. Specific phosphorylation sites (S/T/Y) passing the suggested cut-off are highlighted in colour in the table of prediction results on the web site. In our opinion, this cut-off can be loosened when the kinase is known to interact with the query protein. In applications, users can adjust the cut-off values according to the trade-off between discovering more putative phosphorylation sites and making fewer false-positive predictions. The prediction results will also be sent to the provided e-mail address. A tutorial with sample input and output can be viewed in the Help interface.
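The background ranking described above can be illustrated with a minimal Python sketch. It is not the server's code: the function names, the toy background values and the exact treatment of ties are assumptions made for illustration only.

```python
from bisect import bisect_right

def background_rank(p_value, background):
    """Fraction of background P-values that are <= p_value (0 to 1)."""
    ordered = sorted(background)  # lowest to highest, as in the text
    return bisect_right(ordered, p_value) / len(ordered)

def passes_cutoff(p_value, background, top_fraction=0.10):
    """True if p_value ranks within the lowest `top_fraction` of the background."""
    return background_rank(p_value, background) <= top_fraction

# Toy background of ten P-values (made-up numbers, not real data).
toy_background = [0.001, 0.004, 0.02, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]

for p in (0.001, 0.05):
    print(p, background_rank(p, toy_background), passes_cutoff(p, toy_background))
```

Raising `top_fraction` corresponds to loosening the cut-off: more putative sites pass, at the cost of more false-positive predictions.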
Gene Set Enrichment Analysis (GSEA) was developed for DNA microarray data to detect coordinated expression changes in a group of functionally related genes, and was later applied to find putative functions of long non-coding RNAs [1-4]. Building on the idea of GSEA, we proposed a new method called PSEA (Phosphorylation Set Enrichment Analysis) to detect new sites phosphorylated by a specific kinase, kinase family or kinase group. For each term, we focused on finding sites that are similar in sequence to those already discovered. We treated the phosphorylation sites and their surrounding amino acids as phosphorylated peptides. The phosphorylated peptides at each of these three levels of the kinase hierarchical classification formed a kinase-specific peptide set. To determine whether a given peptide could be phosphorylated, we only needed to know whether it was similar to the phosphorylated peptides in that set. The PSEA method was developed to estimate this similarity and its significance. For each query, we assigned a P-value according to its similarity to known phosphorylated peptides. The P-values for the query peptides lie between 0.001 and 1, with a minimum interval of 0.001. The smaller the P-value, the more significant the evidence that the given peptide is phosphorylated by the chosen kinase type.
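The general idea of scoring a query peptide against a kinase-specific peptide set can be sketched in Python as follows. This is only an illustration under stated assumptions, not the PSEA algorithm itself: the flank length of 7 residues, the identity-based similarity and the resampling scheme are all hypothetical stand-ins. With 999 resamplings, the empirical P-value falls on a grid from 0.001 to 1 in steps of 0.001, which matches the range described above.

```python
import random

def extract_peptides(sequence, flank=7, pad="-"):
    """Yield (1-based position, residue, peptide) for every S/T/Y site.

    The flank length and the padding character are assumptions.
    """
    padded = pad * flank + sequence + pad * flank
    for i, aa in enumerate(sequence):
        if aa in "STY":
            yield i + 1, aa, padded[i:i + 2 * flank + 1]

def identity(a, b):
    """Fraction of aligned positions with identical residues (stand-in similarity)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_similarity(query, peptide_set):
    """Average similarity of the query to every peptide in the set."""
    return sum(identity(query, p) for p in peptide_set) / len(peptide_set)

def empirical_p_value(query, kinase_set, background, n_perm=999, seed=0):
    """Share of random background sets at least as similar to the query
    as the kinase-specific set, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = mean_similarity(query, kinase_set)
    hits = sum(
        mean_similarity(query, rng.sample(background, len(kinase_set))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Toy usage with made-up peptides.
sites = list(extract_peptides("MASTK", flank=2))
p = empirical_p_value("AASAA", ["AASAA", "AASAA"], ["GGTGG"] * 10)
```

In practice a substitution-matrix score would be a more natural similarity than plain identity; identity is used here only to keep the sketch self-contained.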
First, we validated the PSEA method by leave-one-out validation and by estimating the background P-value distribution.
Second, we tested the PSEA method on an independent set from other species, which showed performance similar to that of the leave-one-out validation.
Third, we compared our method with other existing methods in an independent test, which showed that our kinase-specific method achieves superior performance.
Download all collected experimental phosphorylation sequence fragments.
Download all experimental phosphorylation sequence fragments which have kinase annotation information.
Download the independent set of phosphorylation sequence fragments for nonhuman eukaryotic species.
Download all collected disease-related phosphorylation data.
All comments, suggestions, questions, and bug reports are welcome. For inquiries, please send an e-mail to Sheng-Bao Suo, Department of Chemistry, Nanchang University via heapyssb@yahoo.com.cn.
We acknowledge with thanks the following software or web servers:
Blast2GO     DAVID     STRING     CD-HIT
[1]. M. Guttman et al., Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223 (Mar, 2009).
[2]. A. Subramanian et al., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545 (Oct, 2005).
[3]. V. K. Mootha et al., PGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 34, 267 (Jul, 2003).
[4]. T. T. Li et al., Characterization and Prediction of Lysine (K)-Acetyl-Transferase Specific Acetylation Sites. Mol. Cell. Proteomics 11, (Jan, 2012).
Copyright © 2013 Jian Ding Qiu's Lab. NanChang University.