A Classification of Software Modules into Library and Application Components in the Open-Source Field
International Journal of Software Engineering and Its Applications
Software reuse significantly reduces the cost of software production in open-source software (OSS) development and leads to more reliable software systems. Software metrics have been proposed as indicators of software quality factors such as reusability. However, few empirical studies have validated the relationship between component reusability and software metrics. This research aims to validate the Chidamber and Kemerer (CK) metrics as predictors of software
... ty. To achieve this goal, an empirical study is conducted to validate the metrics in classifying two groups of components: library (reuse-prone) and non-library (less reuse-prone). A nearest-neighbors technique is used to classify library and application components using object-oriented software metrics. The approach is applied to a number of library and application systems available online. The resulting nearest-neighbors models produced acceptable classification performance. When the models are evaluated using the F-measure, the results provide evidence that metrics can serve as surrogates for software reusability. CK metrics can be used to measure component reuse-proneness and to differentiate between library and application components. A nearest-neighbors technique can be used to identify the reuse-prone components in open-source applications.
... reusability and indirect measurements are usually used as surrogates of reusability. The indirect measurements involve finding properties of the components, from different perspectives, that can be relevant to the degree of reuse of a particular component. Boetticher and Eichmann assessed the reusability of software components in Ada: software evaluators classified the components as either reusable or not reusable, and a neural network was then trained to mimic the evaluators using metrics that capture coupling, complexity, and adaptability. Bieman and Zhao found that the inheritance depth of classes in applications is notably different from that in libraries. The mean inheritance depth of libraries was found to be three times greater than that of applications. Barnard summarized the relationship between metrics and reusability as follows: components are more reusable when they are well encapsulated, cohesive, loosely coupled, have many children, are less complex, and are not data-centric (fewer instance and class variables).
On the other hand, components that have a large depth of inheritance and a large size are less reusable. Bansiya and Davis considered that high coupling reduces reusability, while high cohesion, extensive message passing among objects, and a large interface size increase reusability. Moser conducted an empirical study on a commercial system and used internal measures, namely cyclomatic complexity and object-oriented metrics, as surrogates for reusability. Capiluppi and Boldyreff studied internal and external reuse in four OSS systems. By studying the coupling characteristics of components, the authors found that stable components are more reusable. A component was considered reusable when it met particular thresholds (Instability <= 0.2; Extensibility >= 10), although this carried a high likelihood of false positives. Israel et al. analyzed code reuse in Android mobile applications. They found that almost 50% of the classes in the mobile apps under study inherit from a base class. The results showed that software reuse is more prevalent in mobile applications than in regular open-source software.
Few studies have empirically validated software metrics as indirect measures of software reusability. Barnard defined a new reusability metric that measures the reusability of object-oriented code using a combination of many OO metrics. The new metric was validated in controlled experiments in which two developers were asked to develop reusable and non-reusable components. Some findings showed that highly reusable code has low coupling and high cohesion. Gupta et al. reported on a case study that compares the software maintenance of a reused Java class framework with that of two applications reusing the framework. The framework was more stable, but the change profiles of all systems were similar. Ampatzoglou et al. investigated the optimal selection of component reuse and considered class, pattern, and package reuse.
The results showed that the reuse of design patterns was the optimal choice. Dallal and Morasca studied the use of internal measures, individually and in combination, to estimate class reuse-proneness using size, coupling, and cohesion metrics. They measured reuse-proneness using two measures: the number of descendants and the number of instantiations. Using logistic regression, the authors showed that internal metrics can be significant predictors of the reuse-proneness of classes.
Many research papers seek a reusability index, which expresses reusability as a linear function of software metrics [10, ...]. In these papers, researchers searched for a reusability index by studying the relationship between a set of metrics and code reuse in empirical settings. Reusability has either a negative or a positive relationship with each individual metric, but researchers have not reached consensus on the direction for some metrics. For instance, the depth of inheritance hierarchy (known as the DIT metric) has a negative relationship with reusability in one study and a positive relationship in another. The reusability index is estimated from empirical work without theoretical validation, which makes the calculated index dependent on the projects under investigation. The metrics may affect reusability in different directions in different projects, which makes their use in index calculation inconsistent and leads to different interpretations.
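The dependence of a linear reusability index on the sign assigned to DIT can be illustrated with a minimal Python sketch. All metric values and weights below are hypothetical and chosen only for illustration; they are not taken from any of the cited studies, and the function names are assumptions. CBO and LCOM are used simply as two further CK metrics.

```python
def reusability_index(metrics, weights):
    """Linear index: weighted sum of the metric values (illustrative form only)."""
    return sum(weights[name] * value for name, value in metrics.items())


# Two hypothetical classes that differ only in depth of inheritance (DIT).
shallow = {"DIT": 1, "CBO": 4, "LCOM": 2}
deep = {"DIT": 5, "CBO": 4, "LCOM": 2}

# Hypothetical weight sets: study A treats DIT as harmful to reuse,
# study B treats it as beneficial. All other weights are identical.
weights_a = {"DIT": -0.5, "CBO": -0.3, "LCOM": -0.2}
weights_b = {"DIT": 0.5, "CBO": -0.3, "LCOM": -0.2}


def more_reusable(weights):
    """Return the name of the class that the index ranks as more reusable."""
    scores = {
        "shallow": reusability_index(shallow, weights),
        "deep": reusability_index(deep, weights),
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print("Study A ranks highest:", more_reusable(weights_a))
    print("Study B ranks highest:", more_reusable(weights_b))
```

Under these assumed weights, the two indices rank the same pair of classes in opposite order, which is the kind of inconsistency described above when the direction of a metric's effect differs between projects.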