Jonathan Ishii, Paul Salvador Inventado, Anand Panangadan, Kenneth Kung
In music, there is a set of rules a melody must follow in order to sound pleasant to the listener. In machine learning, artificial neural networks excel at learning patterns. The purpose of this work is to explore which types of artificial neural networks and data representations are best suited for generating melodies. Two datasets were used to analyze the architectures: the Nottingham dataset, comprising 1034 folk tunes, and a self-curated dataset from Gulf Coast Music consisting of 206 country songs. The neural network architectures used are Deep Convolutional Generative Adversarial Networks (DCGAN), Long Short-Term Memory (LSTM), and Long Short-Term Memory Generative Adversarial Networks (LSTM-GAN). The data representations used are one-hot encoding pitch together with the available note durations, binary encoding pitch with note durations, and binary encoding instruments with pitches and note durations. The primary outcome is to determine whether these architectures, paired with different data representations and datasets, can generate melodies that sound pleasant. Different network sizes were tested, varying the number of layers and the number of units in each layer. A variety of activation and loss functions, optimizers, epoch counts, and other hyperparameters were also tested to ensure that the models generated harmonious melodies. After analyzing all the architectures, LSTM models showed a high rate of success across a variety of data representations and datasets. The most common reason for success was the strength of LSTMs at learning time-series patterns in data. The data must also be represented in a way that captures all the categories to train on without losing too much information, and the dataset must be consistent throughout to avoid discarding correlations between instruments.
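The contrast between the one-hot and binary note representations mentioned above can be sketched as follows. This is an illustrative example only, not the paper's actual preprocessing: the vocabulary sizes (128 MIDI pitches, 8 duration classes) and the function names are assumptions.

```python
import numpy as np

# Assumed vocabulary sizes for illustration (not from the paper's datasets).
PITCHES = 128    # MIDI pitch range 0..127
DURATIONS = 8    # e.g. 8 duration classes

def one_hot_encode(pitch, duration):
    """One-hot: a single 1 at the position for this (pitch, duration) pair.

    Produces a sparse 128 * 8 = 1024-dimensional vector.
    """
    vec = np.zeros(PITCHES * DURATIONS)
    vec[pitch * DURATIONS + duration] = 1.0
    return vec

def binary_encode(pitch, duration):
    """Binary: pack pitch and duration as binary digits (LSB first).

    7 bits cover pitches 0..127 and 3 bits cover durations 0..7,
    so the vector is only 10-dimensional, at the cost of a denser,
    less directly interpretable input.
    """
    pitch_bits = [(pitch >> i) & 1 for i in range(7)]
    dur_bits = [(duration >> i) & 1 for i in range(3)]
    return np.array(pitch_bits + dur_bits, dtype=float)

# Middle C (MIDI 60) with an arbitrary duration class.
one_hot = one_hot_encode(60, 2)
binary = binary_encode(60, 2)
print(one_hot.shape, binary.shape)  # (1024,) (10,)
```

The trade-off illustrated here mirrors the abstract's point about data representation: one-hot vectors keep every category distinct but grow quickly as categories are combined, while binary encodings stay compact but blur category boundaries, which can lose information the network needs.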