It is critical for software vendors to establish continuous quality management to avoid cost explosions. However, before we can control the quality of software systems, we have to assess it. In the case of maintainability, this is often done through manual expert reviews. The goal of this project is to establish an automated evaluation that is based on expert judgment.
In contrast to most other work using expert assessments, we investigate in depth which aspects experts take into account during quality assessments. To limit the subjective nature of "quality", we focus our research on the perceived maintainability of code and four subcategories thereof: Understandability, Comprehensibility, Adequate Size, and Perceived Complexity of code.
Currently, we are focusing on several research questions:
- Which inputs are suitable predictors for software maintainability?
- Which algorithms perform best to predict the experts' judgment?
- How effective and efficient is our approach in an industrial setting?
Before controlling the quality of software systems, we need to assess it. Current automatic approaches have received criticism because their results often do not reflect the opinion of experts or are biased towards a small group of experts. We use the judgments of a significantly larger expert group to create a robust maintainability dataset. In a large-scale survey, 70 professionals assessed code from nine open-source and closed-source Java projects with a combined size of 1.4 million source lines of code. The assessment covers an overall judgment as well as an assessment of several subdimensions of maintainability. Among these subdimensions, we present evidence that understandability is valued the most by the experts. Our analysis also reveals that disagreement between evaluators occurs frequently: significant dissent was detected in 17% of the cases. To overcome these differences, we present a method to determine a consensus, i.e., the most probable true label. The resulting dataset contains the consensus of the experts for more than 500 Java classes. This corpus can be used to learn precise and practical classifiers for software maintainability.
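To illustrate what aggregating several expert ratings into one label can look like, here is a minimal Python sketch. It is not the method from the publication: it substitutes a plain majority vote for the probabilistic consensus described above. The four-point rating scale (1 = best, 4 = worst), the dissent threshold, the pessimistic tie-break, and all class names and ratings are assumptions made up for this example.

```python
from collections import Counter


def label_distribution(ratings):
    """Return the relative frequency of each label among the given ratings."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: n / total for label, n in counts.items()}


def consensus_label(ratings):
    """Pick the most frequent label.

    Assumption: ties are broken in favour of the worse (higher) rating.
    """
    dist = label_distribution(ratings)
    return max(dist, key=lambda label: (dist[label], label))


def has_dissent(ratings, max_spread=1):
    """Flag a sample whose ratings differ by more than `max_spread` scale steps.

    Assumption: this threshold is illustrative, not the paper's definition.
    """
    return max(ratings) - min(ratings) > max_spread


# Hypothetical data: class name -> ratings from individual experts (1 = best, 4 = worst).
ratings_by_class = {
    "OrderService": [1, 1, 2],
    "LegacyParser": [4, 3, 4],
    "ReportBuilder": [1, 4, 2],
}

# For every class: (consensus label, whether the experts disagreed strongly).
summary = {
    name: (consensus_label(r), has_dissent(r))
    for name, r in ratings_by_class.items()
}

for name, (label, dissent) in summary.items():
    print(f"{name}: consensus={label}, dissent={dissent}")
```

A majority vote treats all experts as equally reliable. A probabilistic consensus, as referred to in the text, can additionally weigh how consistent each rater is, which is why the sketch should be read only as a simplified stand-in.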
For more details about the creation of this dataset, please refer to: M. Schnappinger, A. Fietzke, and A. Pretschner, "Defining a Software Maintainability Dataset: Collecting, Aggregating and Analysing Expert Evaluations of Software Maintainability", International Conference on Software Maintenance and Evolution (ICSME), 2020
The dataset, i.e. code plus labels and instructions on how to use the data, is available from here
In our research on software quality, we aim to create a dataset of source code and corresponding quality labels. To this end, we have built an online platform on which participants can evaluate strategically chosen code snippets from various code bases. With just a small investment of your time, you can actively support this important research. Curious what that looks like? https://coality.sse.in.tum.de/
- Defining a Software Maintainability Dataset: Collecting, Aggregating and Analysing Expert Evaluations of Software Maintainability. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME), IEEE, 2020
- A Software Maintainability Dataset. 2020
- Learning a Classifier for Prediction of Maintainability Based on Static Analysis Tools. 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), IEEE, 2019
- Software quality assessment in practice. Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM '18, ACM Press, 2018