Prediction of diabetic protein markers based on an ensemble method

Published in Frontiers in Bioscience by Bioscience Research Institute Pte. Ltd.

2021, Volume 26, Issue 7, p. 207

Abstract

Introduction: A diabetic protein marker is a protein that is closely related to diabetes. Such proteins play an important role in the prevention and diagnosis of diabetes, so an effective method for predicting them is needed. In this study, we propose using ensemble methods to predict diabetic protein markers.

Methodological issues: The ensemble method has two parts. First, we combine feature extraction methods to obtain mixed features. Next, we classify the proteins using ensemble classifiers. We use three feature extraction methods: composition and physicochemical features (abbreviated as 188D), adaptive skip-gram features (abbreviated as 400D), and g-gap features (abbreviated as 670D). Six traditional classifiers are used: decision tree, Naive Bayes, logistic regression, PART, k-nearest neighbor, and kernel logistic regression. The ensemble classifiers are random forest and Vote. In the experiments, we first used each feature extraction method with the traditional classifiers to classify protein sequences. Then, we compared the combined feature extraction methods with the single methods. Next, we compared the ensemble classifiers with the traditional classifiers. Finally, we used the ensemble classifiers together with the combined feature extraction methods to predict the samples.

Results: The results indicate that ensemble methods outperform single methods, whether the ensemble is built at the classifier level or at the feature level (combined feature extraction). Random forest with 588D features (188D and 400D combined) performed best among all methods. The second-best combination was random forest with 1285D features (all three feature sets combined). Among the single feature extraction methods, 188D performed best and g-gap performed worst.
Conclusion: According to the results, the ensemble method, whether a combined feature extraction method or an ensemble classifier, performed better than single methods. We anticipate that ensemble methods will be a useful, cost-effective tool for identifying diabetic protein markers.
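The two ensemble ideas in the abstract, fusing feature sets and combining several classifiers, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `aa_composition`, `combine_features`, and `majority_vote` are hypothetical helpers; the 20-dimensional amino acid composition stands in for only the composition part of a descriptor such as 188D; feature fusion is assumed to be plain concatenation (consistent with 188D + 400D giving 588D); and hard majority voting is assumed as the combination rule, since the abstract does not say which rule its Vote ensemble uses.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence

# The 20 standard amino acids (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Hypothetical 20-D amino acid composition: fraction of each residue.

    Illustrative only; this is not the full 188-D descriptor.
    """
    counts = Counter(seq)
    # Count only standard residues; fall back to 1 to avoid dividing by zero.
    n = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / n for a in AMINO_ACIDS]


def combine_features(*blocks: Sequence[float]) -> list[float]:
    """Hypothetical feature fusion by concatenation, e.g. 188D + 400D -> 588D."""
    fused: list[float] = []
    for block in blocks:
        fused.extend(block)
    return fused


def majority_vote(predictions: Sequence[int]) -> int:
    """Hypothetical hard vote over base-classifier labels (1 = marker, 0 = not).

    On a tie, Counter.most_common returns the label encountered first.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(len(aa_composition("MKTAYIAKQR")))  # 20
    # Placeholder vectors standing in for 188D and 400D feature blocks.
    print(len(combine_features([0.0] * 188, [0.0] * 400)))  # 588
    # Made-up labels from six base classifiers for one protein.
    print(majority_vote([1, 0, 1, 1, 0, 1]))  # 1
```

Concatenation keeps every feature from each block, so a downstream classifier such as random forest can select among them; a probability-averaging rule could replace `majority_vote` if the base classifiers output scores.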

Archived Files and Locations

application/pdf   3.6 MB
www.fbscience.com (web)
web.archive.org (webarchive)
application/pdf   3.6 MB
api.imrpress.com (publisher)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2021-07-30
Language: en
DOI: 10.52586/4935
PubMed: 34340268
Container Metadata
Not in DOAJ
In Keepers Registry
ISSN-L: 1093-4715