Referee report. For: Detecting "protein words" through unsupervised word segmentation [version 1; referees: 1 approved with reservations, 1 not approved]

Judith Klein-Seetharaman
2016
Unsupervised word segmentation methods were applied to analyze protein sequences. Protein sequences, such as "MTMDKSELVQKA...," were used as input to these methods. Segmented protein word sequences, such as "MTM DKSE LVQKA," were then obtained. We compared the protein words derived via unsupervised segmentation with those derived via protein secondary structure segmentation. An interesting finding is that unsupervised word segmentation is more efficient than secondary structure segmentation in expressing ... on. Our experiment also suggests the presence of several "protein ruins" in current non-coding regions.
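The input and output described in the abstract can be illustrated with a minimal sketch. It is not the method used in the paper, which this record does not specify; it assumes a simple unigram word model decoded with dynamic programming (Viterbi-style). The function name `segment`, the parameters `max_len` and `unknown_prob`, and every probability in `lexicon` are hypothetical and chosen only so that the abstract's example is reproduced. An unsupervised method would have to learn such a lexicon from unsegmented sequences, which this sketch does not do.

```python
import math

def segment(sequence, word_probs, max_len=8, unknown_prob=1e-6):
    """Return the most probable split of `sequence` under a unigram word model.

    Illustrative assumption: words not in `word_probs` are scored as
    unknown_prob ** len(word), so unknown residues are heavily penalised.
    """
    n = len(sequence)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-probability of sequence[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = sequence[start:end]
            prob = word_probs.get(word, unknown_prob ** len(word))
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sequence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sequence[back[i]:i])
        i = back[i]
    return words[::-1]

# Hypothetical lexicon; the probabilities are invented for illustration only.
lexicon = {"MTM": 0.02, "DKSE": 0.01, "LVQKA": 0.01, "MT": 0.03, "LV": 0.03}

words = segment("MTMDKSELVQKA", lexicon)
print(" ".join(words))  # MTM DKSE LVQKA
```

With this toy lexicon, any split that leaves a residue outside the known words pays the unknown-word penalty, so the three-word split is preferred.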
doi:10.5256/f1000research.8005.r12977 fatcat:ikvqoiukzzfy5bne6zblxagyba