Institutional Repository of the Dalian Institute of Chemical Physics, Chinese Academy of Sciences (DICP OpenIR)
Subject: Physical Chemistry
Title: A novel model to predict O-glycosylation sites using a highly unbalanced dataset
Authors: Zhou, Kun1, 2;  Ai, Chunzhi1;  Dong, Peipei3;  Fan, Xuran1;  Yang, Ling1
Corresponding author: Yang, Ling (杨凌)
Keywords: Protein glycosylation prediction ;  Amino acid index ;  Feature selection ;  PP-LDA
Journal: GLYCOCONJUGATE JOURNAL
Publication date: 2012-10-01
DOI: 10.1007/s10719-012-9434-x
Volume: 29, Issue: 7, Pages: 551-564
Indexed by: SCI
Document type: Article
Department: 18
Project: 1806
Ownership ranking: 1,1
WOS headings: Science & Technology ;  Life Sciences & Biomedicine
WOS category: Biochemistry & Molecular Biology
WOS research area: Biochemistry & Molecular Biology
Abstract: In silico approaches have become an alternative method to study O-glycosylation. In this paper, we developed a linear, interpretable model for O-glycosylation prediction based on an unbalanced dataset and used it to analyze the underlying biological knowledge of glycosylation. A training set of 4446 sites, comprising 468 positive sites and 3978 negative sites, was assembled during this research. The sites were encoded using the amino acid index (AAindex), and a forward stepwise procedure was used for feature selection. Linear discriminant analysis with equal a priori probabilities (PP-LDA) was employed to develop the interpretable model. Performance of the model was verified using both internal leave-one-out cross-validation and external validation. Two non-linear algorithms, the supervised support vector machine and the unsupervised self-organizing competitive neural network, were used for comparison. The PP-LDA model exhibited improved classification results, with an accuracy of 82.1% for cross-validation and 80.3% for external prediction. Further analysis of this linear model indicated that the properties at position R-1 and the properties related to hydrophobicity contributed more to the glycosylation prediction. However, the alpha and turn propensities at the C-terminus, together with physicochemical properties at the N-terminus, are also related to the glycosylation activity. This model is not only capable of predicting the possibility of glycosylation using an unbalanced dataset, but is also helpful for understanding the underlying biological mechanisms of glycosylation. To make the prediction model publicly accessible, a downloadable program is provided in the supplementary materials.
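The effect of the equal-prior choice described in the abstract can be illustrated with a minimal sketch. It is not the authors' program: it assumes a single made-up numeric feature instead of AAindex encodings, uses synthetic Gaussian data, and the function names `fit_lda_1d` and `sensitivity_specificity` are hypothetical. Only the class sizes (468 positive, 3978 negative) come from the abstract.

```python
import math
import random


def fit_lda_1d(x_pos, x_neg, equal_priors=True):
    """Fit a two-class, one-feature LDA and return a predict function.

    Illustrative only: the paper's model uses many AAindex-derived features.
    """
    n_pos, n_neg = len(x_pos), len(x_neg)
    mu_pos = sum(x_pos) / n_pos
    mu_neg = sum(x_neg) / n_neg
    # Pooled within-class variance, shared by both classes in LDA.
    ss = sum((v - mu_pos) ** 2 for v in x_pos)
    ss += sum((v - mu_neg) ** 2 for v in x_neg)
    var = ss / (n_pos + n_neg - 2)
    if equal_priors:
        # Equal a priori probabilities: class frequencies are ignored.
        prior_pos = prior_neg = 0.5
    else:
        # Empirical priors: estimated from the unbalanced class sizes.
        prior_pos = n_pos / (n_pos + n_neg)
        prior_neg = 1.0 - prior_pos

    def score(x, mu, prior):
        # Linear discriminant score for one class.
        return x * mu / var - mu * mu / (2 * var) + math.log(prior)

    def predict(x):
        # True means "predicted positive".
        return score(x, mu_pos, prior_pos) > score(x, mu_neg, prior_neg)

    return predict


def sensitivity_specificity(predict, x_pos, x_neg):
    """Return (fraction of positives found, fraction of negatives found)."""
    tp = sum(1 for v in x_pos if predict(v))
    tn = sum(1 for v in x_neg if not predict(v))
    return tp / len(x_pos), tn / len(x_neg)


rng = random.Random(0)
# Synthetic data: class sizes mirror the abstract, feature values are made up.
x_pos = [rng.gauss(1.0, 1.0) for _ in range(468)]
x_neg = [rng.gauss(-1.0, 1.0) for _ in range(3978)]

sens_eq, spec_eq = sensitivity_specificity(
    fit_lda_1d(x_pos, x_neg, equal_priors=True), x_pos, x_neg)
sens_emp, spec_emp = sensitivity_specificity(
    fit_lda_1d(x_pos, x_neg, equal_priors=False), x_pos, x_neg)

print(f"equal priors:     sensitivity={sens_eq:.3f} specificity={spec_eq:.3f}")
print(f"empirical priors: sensitivity={sens_emp:.3f} specificity={spec_emp:.3f}")
```

With empirical priors the log-prior term pushes the decision threshold toward the minority class, so fewer sites are called positive. Setting the priors equal removes that term and places the threshold midway between the two class means, which is one plausible reason such a choice suits a dataset with roughly one positive site for every 8.5 negative ones.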
WOS keywords: POLYPEPTIDE N-ACETYLGALACTOSAMINYLTRANSFERASE ;  AMINO-ACID-SEQUENCE ;  MAMMALIAN PROTEINS ;  GALNAC-TRANSFERASE ;  POSTTRANSLATIONAL MODIFICATIONS ;  NEURAL-NETWORK ;  UDP-GALNAC ;  IN-VITRO ;  SPECIFICITY ;  SELECTION
Language: English
WOS accession number: WOS:000308356000009
Content type: Journal article
URI: http://cas-ir.dicp.ac.cn/handle/321008/118136
Appears in Collections: Dalian Institute of Chemical Physics, Chinese Academy of Sciences: Journal Articles

Files in this item:
2012inkxNLf5Zi.PDF (482 KB), open access

Author affiliations:
1. Chinese Acad Sci, Dalian Inst Chem Phys, Lab Pharmaceut Resource Discovery, Dalian 116023, Peoples R China
2. Chinese Acad Sci, Grad Sch, Beijing 100049, Peoples R China
3. Western Med Dalian Med Univ, Res Inst Integrated Tradit, Dalian 116044, Peoples R China

Recommended Citation:
Zhou, Kun, Ai, Chunzhi, Dong, Peipei, et al. A novel model to predict O-glycosylation sites using a highly unbalanced dataset[J]. GLYCOCONJUGATE JOURNAL, 2012, 29(7): 551-564.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
