Towards Activation Function Search for Long Short-Term Model Network: A Differential Evolution based Approach

K. Vijayaprabakaran, K. Sathiyamurthy
2020 Journal of King Saud University: Computer and Information Sciences  
In Deep Neural Networks (DNNs), several architectures have been proposed for complex tasks such as machine translation, natural language processing, and time series forecasting. Long Short-Term Memory (LSTM), a deep neural network, has become a popular architecture for sequential and time series problems and has achieved remarkable results. When building an LSTM model, many hyper-parameters, such as the activation function, loss function, and optimizer, need to be set in advance. These ... ters play a significant role in the performance of DNNs. This work concentrates on finding a novel activation function that can replace existing activation functions such as sigmoid and tanh in the LSTM. A Differential Evolution Algorithm (DEA) based search methodology is proposed to discover the novel activation function for the LSTM network. The proposed methodology finds an optimal activation function that outperforms traditional activation functions such as sigmoid (σ), hyperbolic tangent (tanh), and Rectified Linear Unit (ReLU). The activation function discovered by the DEA methodology is sinh(x) + sinh^{-1}(x), named the Combined Hyperbolic Sine (comb-H-sine) function. The proposed comb-H-sine activation function outperforms the traditional functions in LSTM, with accuracies of 98.83%, 93.49%, and 78.38% on the MNIST, IMDB, and UCI HAR datasets, respectively. © 2020 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
doi:10.1016/j.jksuci.2020.04.015 fatcat:ovik7d5h2va7xeexuizsdtb4cq
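
The comb-H-sine function named in the abstract can be written down in a few lines. The sketch below is illustrative only and is not taken from the paper: it assumes that sinh^{-1} denotes the inverse hyperbolic sine (arcsinh) rather than the reciprocal 1/sinh, and the function names `comb_h_sine` and `comb_h_sine_grad` are hypothetical. The abstract does not say where in the LSTM cell the function is applied, so the sketch covers only the scalar function and its derivative.

```python
import math


def comb_h_sine(x: float) -> float:
    """Combined hyperbolic sine: sinh(x) + asinh(x).

    Assumption: sinh^{-1} in the abstract is read as the inverse
    hyperbolic sine, not as 1 / sinh(x).
    """
    return math.sinh(x) + math.asinh(x)


def comb_h_sine_grad(x: float) -> float:
    """Analytic derivative: cosh(x) + 1 / sqrt(1 + x^2)."""
    return math.cosh(x) + 1.0 / math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    # Print the function and its slope at a few sample inputs.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, comb_h_sine(value), comb_h_sine_grad(value))
```

Under this reading the function is odd, passes through the origin with slope 2, and is unbounded, unlike sigmoid and tanh, which saturate. Because sinh grows exponentially, large inputs overflow quickly in floating point, so a practical implementation would likely need clipped or normalized pre-activations.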