Protein Secondary Structure Prediction: A Literature Review with a Focus on Machine Learning Approaches


1 Department of Computer Engineering, Faculty of Electrical, IT and Computer Science, Qazvin Branch, Islamic Azad University, Qazvin, Iran

2 Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran


A DNA sequence, although it contains all genetic traits, is not itself a functional entity. Instead, the information it carries is expressed as protein sequences through the processes of transcription and translation. Each protein sequence then folds into a 3D structure. This structure is the functional unit, and it carries out biological interactions using the information encoded in DNA. Virtually every life process is carried out by proteins with specific functional responsibilities. Consequently, protein function prediction is an important task in bioinformatics. Protein function can be elucidated from protein structure, and predicted secondary structure serves as an input feature for many bioinformatics problems, so protein secondary structure prediction has attracted great attention. A very wide variety of computational methods has been proposed for protein secondary structure prediction. Nevertheless, their success has been limited by obstacles such as complex patterns in protein data, noise, class imbalance, and the high dimensionality of amino acid sequence encoding schemes. The advent of machine learning, and later of ensemble approaches, brought considerable improvement. To reach a meaningful conclusion about the strengths, bottlenecks, and limitations of the work done in this research area, a review of the literature is of great benefit. Such a review is useful not only for summarizing what has been accomplished so far but also for informing future decisions about potential and as yet unexplored solutions in this area. Consequently, this paper reviews different computational approaches to protein secondary structure prediction, with a focus on machine learning methods, and addresses different parts of the problem area.