Introduction
Next-generation sequencing (NGS) technologies are yielding ever higher volumes of genetic variation data. With this much data available, it has become both possible and a priority to determine the functional implications of genetic variations. Given the essential roles of acetylation in protein function, acetylation-related genetic variations are highly likely to change protein function. In this work, we performed a proteome-wide analysis of amino acid variations that could potentially influence protein lysine acetylation in humans. We defined AcetylAAVs as acetylation-related amino acid variations that affect acetylation sites or their interacting acetyltransferases, and grouped them into three types. Using the prediction system we developed, we identified 50.87% of amino acid variations as potential AcetylAAVs and found that 12.32% of disease mutations could result in AcetylAAVs. More interestingly, statistical analysis showed that amino acid variations that directly create new potential lysine acetylation sites are more likely to cause disease. Our method can be used to screen important polymorphisms and to help identify the mechanisms of genetic diseases.
 
Tools
In the sequence segmentation tool, users enter a Swiss-Prot accession number or a protein sequence, and the server breaks the amino acid sequence into candidate fragments with a window size of 25. This service can also search for known lysine acetylation sites in two main protein databases, UniProtKB/Swiss-Prot and PhosphoSitePlus. To shorten computing time, users can submit specific sequence fragments in FASTA format to the Prediction interface. In the section for finding substitutions from the SwissVar database, users can search SwissVar for known amino acid variations in their query protein. This variation information can be used directly in the Prediction interface. A tutorial with sample input and output is available in the Help interface.
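The segmentation step can be illustrated with a minimal Python sketch. It rests on assumptions that this page does not state: that each fragment is centred on a lysine (12 residues on either side for a window of 25) and that fragments near the termini are padded with "-". The function name lysine_windows is hypothetical and is not part of the server or of the downloadable programs.

```python
def lysine_windows(sequence, window=25, pad="-"):
    """Return (position, fragment) pairs, one per lysine (1-based positions).

    Assumption: each fragment is centred on the lysine and padded with
    `pad` where the window runs past either end of the sequence.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the lysine sits in the centre")
    half = window // 2
    seq = sequence.upper()
    padded = pad * half + seq + pad * half
    fragments = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # In the padded string the lysine sits at index i + half,
            # so the window starts at index i.
            fragments.append((i + 1, padded[i:i + window]))
    return fragments


if __name__ == "__main__":
    # Illustrative sequence only, not taken from any database.
    for position, fragment in lysine_windows("MSKGEELFTGVVPILVELDGDVNGHKFSVSG"):
        print(position, fragment)
```

Padding keeps every fragment the same length, which is convenient if the fragments are later encoded as fixed-size feature vectors.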
 
Prediction
This service predicts the AcetylAAV types and acetylation states for a protein from its amino acid sequence in FASTA format. For each query protein, the web server uses the lysine acetylation prediction model we constructed (KAcePred) to assign a score to each lysine site. A detailed description of KAcePred can be found in the Details section. The prediction results are also sent to the e-mail address provided. Users can alternatively run the local MATLAB GUI program provided for AcetylAAV prediction. This GUI performs the same analysis of query proteins as the web server, but programmatically rather than through manual interaction. A tutorial with sample input and output is available in the Help interface.
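For readers preparing input by hand, the following Python sketch shows one way to read a FASTA record and build the variant sequence that would be compared with the original. It is not the server's code. The substitution notation (reference residue, 1-based position, new residue, for example "R6K") and both function names are assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def apply_substitution(sequence, change):
    """Apply a single substitution such as 'R6K' (assumed notation)."""
    ref, pos, alt = change[0], int(change[1:-1]), change[-1]
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, "
                         f"found {sequence[pos - 1]}")
    return sequence[:pos - 1] + alt + sequence[pos:]


if __name__ == "__main__":
    # Illustrative record only.
    records = parse_fasta(">demo\nMKTAYR\n")
    original = records["demo"]
    variant = apply_substitution(original, "R6K")
    print(original, variant)
```

Checking the reference residue before substituting catches position mismatches between the variation record and the sequence being analysed.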
 
Details
Lysine Acetylation Prediction (KAcePred)
Human lysine acetylation site sequences were obtained from the UniProtKB/Swiss-Prot, PhosphoSitePlus, CPLA, HPRD and SysPTM databases. After removing homologous sequences at a sensitive cutoff of 0.3, we collected 3932 acetylated sequence fragments from 2220 protein sequences. To improve predictive performance and reduce computational complexity, we combined position-specific scoring matrix (PSSM) profiles with the nine best physicochemical properties as the optimal sequence encoding scheme.
Acetylation-related amino acid variations (AcetylAAVs)
AcetylAAVs are identified when acetylation sites differ between the original and variant sequences. Three types of AcetylAAVs are defined as follows: (i) type I AcetylAAVs are amino acid variations located at acetylation positions that directly create (Type I (+)) or remove (Type I (-)) acetylation sites; (ii) type II AcetylAAVs are amino acid variations located not at acetylation positions but at adjacent positions that create (Type II (+)) or remove (Type II (-)) acetylation sites; (iii) type III AcetylAAVs are caused by changes in the types of lysine acetyltransferases (KATs) involved, rather than in the acetylation sites themselves, regardless of the positions of the variations.
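The three definitions above can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the published implementation: predicted sites are supplied as dictionaries mapping a 1-based lysine position to a KAT label, "adjacent" is taken to mean within 12 residues of the site (half of a 25-residue window), and the function name and the KAT labels in the example are hypothetical.

```python
def classify_acetylaav(variant_pos, orig_sites, var_sites, flank=12):
    """Label the effect of one variation on predicted acetylation sites.

    orig_sites / var_sites: dict of site position -> KAT label, predicted
    for the original and the variant sequence (assumed input format).
    flank: assumed meaning of "adjacent" (distance in residues).
    """
    labels = []
    for site in sorted(set(orig_sites) | set(var_sites)):
        in_orig = site in orig_sites
        in_var = site in var_sites
        if in_orig and in_var:
            # Site kept; a different KAT label makes it Type III.
            if orig_sites[site] != var_sites[site]:
                labels.append(("Type III", site))
            continue
        sign = "+" if in_var else "-"  # site created or removed
        if site == variant_pos:
            labels.append((f"Type I ({sign})", site))
        elif abs(site - variant_pos) <= flank:
            labels.append((f"Type II ({sign})", site))
    return labels


if __name__ == "__main__":
    # Made-up predictions for a variation at position 10.
    before = {10: "CBP", 15: "p300"}
    after = {15: "GCN5", 18: "p300"}
    print(classify_acetylaav(10, before, after))
```

Comparing the two site dictionaries position by position keeps the type I and type II rules symmetric for created and removed sites, and type III is checked only for sites present in both predictions.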
 
Download
Download the training set of experimental lysine acetylation sequence fragments.
Download the independent set of lysine acetylation sequence fragments identified by nano-HPLC/MS/MS analysis.
Download the variation data for human single amino acid polymorphisms, missense mutations and unclassified variants.
Download the AcetylAAVs programs and source code, implemented in MATLAB.
 
Feedback

All comments, suggestions, questions, and bug reports are welcome. For inquiries, please send an e-mail to Sheng-Bao Suo, Department of Chemistry, Nanchang University via heapyssb@yahoo.com.cn.

 
Acknowledgments
We acknowledge with thanks the following software or web servers:
PSI-BLAST, Motif-x, WebLogo, Two Sample Logo, CD-HIT
 
Citation
Shengbao Suo, Jianding Qiu*, Shaoping Shi, Xiang Chen, Shuyun Huang, Ruping Liang. Proteome-wide analysis of amino acid variations that influence protein lysine acetylation, Journal of Proteome Research, 2013, 12 (2): 949-958.
 