Exploring case-based reasoning for web hypermedia project cost estimation

Emilia Mendes, Nile Mosley, Steve Counsell
2005 International Journal of Web Engineering and Technology  
This paper compares several methods of analogy-based effort estimation, including the use of adaptation rules as a contributing factor to better estimation accuracy. Two data sets are used in the analysis. Results show that the best predictions were obtained for the data set that presented a continuous "cost" function and was more "unspoiled". More recently, researchers have investigated the use of machine learning approaches to effort estimation [15], [39]. One of these approaches, estimation by analogy, has provided accuracy comparable to, or better than, algorithmic methods [26], [38], [39]. Estimation by analogy is a form of analogical reasoning where cases stored in the case base and the target case are instances of the same category [15]. As such, an effort estimate for a target case is obtained by searching for one or more similar cases, each representing information about finished software projects. Unfortunately, when comparing prediction accuracy between different cost estimation approaches, researchers have been unable to find a single technique that consistently provides the best estimates across different industrial data sets [8], [9], [20], [21], [29], [30]. These conflicting results suggested that factors other than the technique itself should be taken into consideration. This led to a simulation study by Shepperd and Kadoda [38], showing that data set characteristics (number of variables, data distribution, existence of collinearity and outliers, type of relationship between effort and cost drivers) influenced the effectiveness of effort estimation techniques. For example, if the data set used to estimate effort for a new project is roughly normally distributed, then algorithmic models, such as stepwise regression, are to be preferred. Conversely, if the data set presents outliers and collinearity, then techniques such as analogy-based estimation should be favored.
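The retrieval step described above, finding the finished projects most similar to a target project and deriving an effort estimate from them, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names (`pages`, `images`), the toy case base, the min-max scaling, the unweighted Euclidean distance, the choice of k, and the use of the mean effort of the retrieved cases. The adaptation rules mentioned in the abstract are not modelled.

```python
from math import sqrt

def feature_ranges(cases, features):
    """Return the (min, max) of each feature over the case base."""
    return {f: (min(c[f] for c in cases), max(c[f] for c in cases)) for f in features}

def scale(value, lo, hi):
    """Min-max scale a value; a constant feature maps to 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)

def distance(a, b, features, ranges):
    """Unweighted Euclidean distance over scaled features (an assumed similarity measure)."""
    return sqrt(sum((scale(a[f], *ranges[f]) - scale(b[f], *ranges[f])) ** 2
                    for f in features))

def estimate_effort(target, case_base, features, k=2):
    """Estimate effort as the mean effort of the k most similar finished projects."""
    ranges = feature_ranges(case_base, features)
    ranked = sorted(case_base, key=lambda c: distance(target, c, features, ranges))
    nearest = ranked[:k]
    return sum(c["effort"] for c in nearest) / len(nearest)

# Hypothetical case base: finished projects with size measures and actual effort.
case_base = [
    {"pages": 10, "images": 5, "effort": 20.0},
    {"pages": 12, "images": 6, "effort": 24.0},
    {"pages": 50, "images": 40, "effort": 110.0},
    {"pages": 55, "images": 45, "effort": 120.0},
]
target = {"pages": 11, "images": 5}  # hypothetical new project, effort unknown

print(estimate_effort(target, case_base, ["pages", "images"], k=2))
```

Scaling the features first keeps a variable with a large numeric range from dominating the distance; other similarity measures, feature weights, and numbers of analogies are equally plausible choices.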
doi:10.1504/ijwet.2005.007467 fatcat:ka6xouk2n5agphb73khequ4t7m