A Decision Analysis Approach for Selecting Software Defect Prediction Method in the Early Phases
As software usage grows, end-users increasingly prefer high-quality software products. One of the most important quality indicators of a software product is its defect rate. With the widespread use of methods and tools that support estimation tasks in software engineering, interest in software defect prediction (SDP) is increasing. Currently, most defect prediction models are built using metrics from the coding phase. As a result, information from the early stages of the software development life cycle (SDLC), such as requirements analysis or design, goes unused, and the benefits of early preventive action, such as cost reduction and effective resource planning, are lost. It is therefore important for stakeholders to build the desired defect prediction model as early as possible and to use it throughout the software development life cycle. The proliferation of data science methods in software engineering, combined with a shortage of industry knowledge on how to use them, creates a need to guide practitioners in selecting the best-fit methods for their specific needs. This thesis presents research aimed at addressing the method selection problem in software defect prediction during the early phases of the life cycle by using a formal decision analysis process. A two-phase decision analysis approach was proposed, structured using decision tree and multi-criteria decision analysis (MCDA) methodologies. To this end, an extensive literature review was conducted to obtain a general view of the characteristics and usefulness of Early Software Defect Prediction (ESDP) models reported in the scientific literature. As a result, the most preferred prediction methods, metrics, datasets, and performance evaluation methods, as well as the addressed SDLC phases, were highlighted.
Accordingly, the alternatives to be evaluated in the decision analysis and the criteria that may affect the method selection decision were systematically determined. To strengthen this knowledge, two different expert opinion surveys were conducted. In addition, to manage the operation of the decision analysis process, a questionnaire was proposed to reveal stakeholder needs and dataset characteristics. Afterwards, several case studies were performed to investigate the trustworthiness of the proposed approach with selected SDP methods using public datasets. For the case studies, the methods the decision analysis identified as most suitable were Naïve Bayes (NB), Decision Tree (DT), and Fuzzy Logic-based methods. It is concluded that the results of the decision analysis are consistent with both the empirical evidence from the experiments conducted in the thesis and the results reported in the scientific literature. Overall, the presented approach could help software practitioners decide which SDP method is advantageous by revealing their specific requirements for the software projects and associated defect data. While the results of this thesis provide guidance for future research in the context of ESDP, further studies on different software projects are necessary to expand the available knowledge and make the resulting decisions more reliable.
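
To illustrate the kind of multi-criteria ranking step that an MCDA phase involves, the following is a minimal sketch of a weighted-sum model in Python. It is not the procedure used in the thesis: the abstract does not state which MCDA technique was applied, and the weighted sum is chosen here only because it is the simplest one. The function name `weighted_sum_ranking`, the criteria names, the alternatives `method_a` and `method_b`, and all weights and scores are hypothetical placeholders, not values taken from the thesis or its surveys.

```python
from typing import Dict, List, Tuple


def weighted_sum_ranking(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank alternatives by the weighted sum of their criterion scores.

    `scores` maps each alternative to its score per criterion (assumed to be
    on a common 0..1 scale, higher is better). `weights` maps each criterion
    to a non-negative importance weight; weights are normalised to sum to 1.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    normalised = {c: w / total_weight for c, w in weights.items()}

    totals = {
        alternative: sum(normalised[c] * criterion_scores[c] for c in normalised)
        for alternative, criterion_scores in scores.items()
    }
    # Highest aggregate score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder inputs, invented purely for illustration.
weights = {"interpretability": 2.0, "small_data_tolerance": 1.0, "accuracy": 1.0}
scores = {
    "method_a": {"interpretability": 0.9, "small_data_tolerance": 0.5, "accuracy": 0.6},
    "method_b": {"interpretability": 0.4, "small_data_tolerance": 0.8, "accuracy": 0.7},
}

ranking = weighted_sum_ranking(scores, weights)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

In such a scheme, the weights would be the place where stakeholder needs (for example, those elicited by a questionnaire) enter the decision, while the per-criterion scores would describe how well each candidate method suits the available data.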