Constructing an Algorithm for Selecting the Number of Histogram Bins in Statistical Hypothesis Testing for Normal Distribution of Sample Data

Ivelina Zlateva, Nikola Nikolov, Mariela Alexandrova, Violin Raykov
2018 Zenodo  
Abstract: In practice, a wide range of assumptions and conjectures is made about the type of frequency distribution in statistical samples, and deviations from these assumptions can significantly affect the quality of the model and the accuracy of its parameter estimates. Regrettably, no reliable and clearly defined criterion for their permissibility exists. For instance, the fish stock assessment procedure is initially based on the assumption that the ... frequencies in the length-frequency samples used for estimating fish growth parameters and analysing stock status are normally distributed or approximately follow the normal distribution [15,17]. The purpose of the present study is to construct an algorithm for identifying the statistical distribution of a random variable, focusing on the proper selection of the number of histogram bins and on assessing its impact on the resulting stochastic models. To that end, simulation studies were carried out to compensate for the lack of concrete evidence on how the number of histogram bins and the overall data accuracy affect the results of the statistical criterion used to verify the distribution law. The direct statistical method for determining the distribution law, the chi-square criterion, was applied along with some indirect methods. The simulation studies used machine-generated data sets and were run in the MATLAB programming environment.
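The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the kind of simulation it describes (the study itself used MATLAB). It assumes equal-width bins, a Pearson chi-square test against a normal distribution fitted to the sample, and three common bin-count rules (Sturges, Rice, square root) that the abstract does not name. The function names `bin_rules`, `chi_square_normality`, and `rejection_rate` are hypothetical, and the critical value uses the Wilson-Hilferty approximation rather than an exact chi-square quantile.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def bin_rules(n):
    """Common rules of thumb for the number of bins (assumed, not from the paper)."""
    return {
        "sturges": math.ceil(1 + math.log2(n)),
        "rice": math.ceil(2 * n ** (1 / 3)),
        "sqrt": math.ceil(math.sqrt(n)),
    }


def chi_square_normality(sample, k, alpha=0.05):
    """Pearson chi-square test of normality with k equal-width bins.

    Returns (statistic, approximate critical value, reject flag).
    """
    if k < 4:
        raise ValueError("need k >= 4 so that df = k - 3 is at least 1")
    n = len(sample)
    fitted = NormalDist(fmean(sample), stdev(sample))
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / k

    # Observed counts per bin; the maximum goes into the last bin.
    observed = [0] * k
    for x in sample:
        observed[min(int((x - lo) / width), k - 1)] += 1

    # Expected counts from the fitted normal. The two outer bins extend
    # to -inf and +inf so the bin probabilities sum to 1.
    cdf = [0.0] + [fitted.cdf(lo + i * width) for i in range(1, k)] + [1.0]
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(k)]

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # k bins, minus 1, minus 2 parameters estimated from the sample.
    df = k - 3
    # Wilson-Hilferty approximation to the chi-square quantile.
    z = NormalDist().inv_cdf(1 - alpha)
    crit = df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3
    return stat, crit, stat > crit


def rejection_rate(k, n=500, trials=200, alpha=0.05, seed=0):
    """Share of machine-generated normal samples for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        rejections += chi_square_normality(sample, k, alpha)[2]
    return rejections / trials


if __name__ == "__main__":
    for name, k in bin_rules(500).items():
        print(name, k, rejection_rate(k))
```

Because the samples are drawn from a normal distribution, a rejection rate well above `alpha` for a given `k` would suggest that the bin count, not the data, is driving the test result. Equal-width bins can leave the tail bins with very small expected counts, which makes the chi-square approximation unreliable; merging sparse bins is a common remedy that this sketch omits.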
doi:10.5281/zenodo.1745060 fatcat:u2nlpalccnadvcqc3m3ffdjgl4