Authors:
(1) Raquel Blanco, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(2) Manuel Trinidad, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(3) María José Suárez-Cabal, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]);
(4) Alejandro Calderón, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(5) Mercedes Ruiz, Software Process Improvement and Formal Methods Research Group, University of Cadiz, Department of Computer Science and Engineering, Cádiz, Spain ([email protected]);
(6) Javier Tuya, Software Engineering Research Group, University of Oviedo, Department of Computer Science, Gijón, Spain ([email protected]).
Editor's note: This is part 2 of 7 of a study detailing attempts by researchers to create effective tests using gamification. Read the rest below.
Software testing is a set of activities involved in software development conducted to determine whether a software product satisfies the specified requirements and fits the user needs, as well as to detect failures and defects (International Software Testing Qualification Board (ISTQB), 2020). One of these activities encompasses the design and implementation of sets of test cases, called test suites. A test case consists of the program input (test input) and the expected output that should be obtained. The execution of a test case against the program under test allows the tester to observe whether there is any deviation between the output obtained and the expected output. In that case, a failure is found. A failure is caused by the existence of defects in the program under test.
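To make the notion of a test case concrete, the following minimal sketch in JUnit 5 (with hypothetical class and method names, not taken from the study) pairs a test input (-5) with its expected output (5); any deviation between the output obtained from the program under test and the expected output is reported as a test failure.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical program under test: computes the absolute value of an integer.
class MathUtils {
    static int abs(int x) {
        return x < 0 ? -x : x;
    }
}

class MathUtilsTest {
    // A test case: the test input is -5 and the expected output is 5.
    // If the obtained output deviates from the expected output, the test
    // fails, revealing a failure caused by a defect in the program under test.
    @Test
    void absOfNegativeNumber() {
        assertEquals(5, MathUtils.abs(-5));
    }
}
```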
Software testing is crucial for evaluating and assuring the quality of a software product and for reducing the risk of failure when it is released. In 2020, 22% of the software development budget was allocated to software testing (Capgemini, 2021). Despite this, the total cost of poor software quality continues to trend upward. As stated in (Krasner, 2021), of the $2.08 trillion total cost of poor software quality in the US in 2020, $1.56 trillion was due to software failures, a figure that has grown by 22% over the last two years. This cost could have been reduced with comprehensive testing.
Testing all the possible input and output combinations of a program is impractical, and often impossible (Myers et al., 2012). For this reason, it is essential to design and implement test suites that are effective in order to reduce the impact of software defects and failures and the cost of software testing. The effectiveness of a test suite is its ability to find defects: the more defects it is capable of finding, the more effective it is.
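As a rough illustration of this notion (not necessarily the metric adopted later in this study), effectiveness can be expressed as the proportion of known or seeded defects that a test suite TS is able to reveal:

```latex
\mathit{effectiveness}(TS) = \frac{\lvert \text{defects revealed by } TS \rvert}{\lvert \text{defects present in the program under test} \rvert}
```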
Despite the importance of software testing, it is frequently neglected in computer science education (Jesus et al., 2020), where the amount of time spent studying software testing is significantly less than that spent on other software development activities (Sherif et al., 2020; Vos et al., 2020; Zivkovic & Zivkovic, 2021). Dedicated courses on software testing are not very common (Silvis-Cividjian, 2021) and, even when the curricula include them, much more effort is needed to provide students with practical problems (Zivkovic & Zivkovic, 2021). Besides, engaging students is challenging for several reasons. On the one hand, testing is perceived as a destructive task (Myers et al., 2012), whereas software design and implementation are considered more creative activities, so students are less interested in software testing (Deak et al., 2016; Garousi et al., 2020; Vos et al., 2020). Rather than perceiving the benefit of finding defects as a means to improve quality, students feel no satisfaction when defects in their own programs are exposed, since those defects signal that their code is incorrect; consequently, they are not motivated to find them (Garousi et al., 2020). On the other hand, software testing education tends to be more theoretical than practical, and students very frequently describe it as boring (Fraser et al., 2019; Garousi et al., 2020), especially when they find few or no defects. Moreover, the lack of real-life testing scenarios creates the sense of doing something repetitive and irrelevant (Isomöttönen & Lappalainen, 2012). In addition, if the programs to be tested contain few meaningful defects, students tend to perceive testing as a tedious and unsatisfying task (Silvis-Cividjian, 2021). Overcoming these problems is critical to improving students’ testing skills.
In order to engage software testing students, several learning environments and educational approaches have been devised, such as web-based tutorials (Elbaum et al., 2007), flipped classrooms (Elgrably & Oliveira, 2022), problem-based learning (Andrade et al., 2019), serious games (Valle et al., 2017), as well as gamification (Jesus et al., 2018), which is explored in this work.
Several definitions have been provided in the literature for the term gamification, such as “the use of game elements and game design techniques in non-game contexts” (Werbach & Hunter, 2012), “the phenomenon of creating gameful experiences” (Hamari et al., 2014), “the use of typical elements of games in contexts outside the game environment” (Deterding et al., 2011), or “the use of game elements in non-gaming context to boost engagement between humans and computers and resolve issues with high quality modern electronic applications” (Khaleel et al., 2016). They all agree that gamification is not the creation of a fully-fledged game, but consists of applying lessons from the game domain to increase commitment and motivation in non-game situations (Calderón et al., 2018).
In recent years, gamification has attracted the attention of both practitioners and researchers as a way to achieve a range of emotional, cognitive, and social benefits and to guide human behavior for inducing innovation, productivity, or engagement (Sardi et al., 2017) in different contexts, such as employee performance, customer engagement and social loyalty, and in a diversity of domains, including marketing, human resources, healthcare, education, environmental protection and wellbeing (Dichev & Dicheva, 2017).
Software engineering has also explored the strengths and weaknesses of applying gamification to the learning of software engineering processes. Over the last decade, several works have applied gamification to improve students’ engagement, performance and social skills, as well as to encourage the use of software engineering best practices (Alhammad & Moreno, 2018; Garcia et al., 2020). The results obtained are promising, but research on gamification in software engineering education is still at an early stage. Thus, further research and more empirical studies are needed to analyze whether gamification is a useful and effective technique in this context (Alhammad & Moreno, 2018; Kosa et al., 2016; Pedreira et al., 2015; Souza et al., 2018).
Previous works that applied gamification in software engineering education are mainly focused on the software construction and software engineering process areas of the SWEBOK guide (Bourque & Fairley, 2014) (Alhammad & Moreno, 2018; Pedreira et al., 2015; Souza et al., 2018). Software testing has also attracted the researchers’ interest and is a promising area for applying gamification. Some works focus on applying gamification to expose students of introductory computer science courses to software testing (Bell et al., 2011; Sheth et al., 2015, 2013). Other works apply gamification to engage students in the learning of Agile test practices (Elgrably & Oliveira, 2018; Lorincz et al., 2021), unit testing (Marabesi & Silveira, 2019), Graphical User Interface testing (Cacciotto et al., 2021; Garaccione et al., 2022), testing tools (Clarke et al., 2017; Fu & Clarke, 2016) or test incident reporting (Dal Sasso et al., 2017).
Most of the works use gamification to engage students in learning several testing techniques, such as the code review process (Dal Sasso et al., 2017; Khandelwal et al., 2017), exploratory testing (Costa & Oliveira, 2019, 2020; Lorincz et al., 2021), statement coverage (Clegg et al., 2017; Sherif et al., 2020), loop coverage (Clegg et al., 2017), control flow testing (Buckley & Clarke, 2018; Clarke et al., 2019, 2022, 2020), dataflow testing (Buckley & Clarke, 2018; Clarke et al., 2019, 2022, 2020; Clegg et al., 2017), equivalence partitioning (Buckley & Clarke, 2018; Clarke et al., 2019, 2022, 2020; Jesus et al., 2020), boundary value analysis (Buckley & Clarke, 2018; Clarke et al., 2019, 2022, 2020; Clegg et al., 2017; Jesus et al., 2020), state-based testing (Buckley & Clarke, 2018; Clarke et al., 2019, 2022, 2020) or mutation testing (Rojas & Fraser, 2016). Other works use gamification to motivate students to create test cases for finding bugs (Fraser et al., 2019, 2020; Silvis-Cividjian, 2021), which are the closest works to ours. (Silvis-Cividjian, 2021) presents an approach that uses the VU-BugZoo platform, which contains embedded buggy code; students have to design test strategies and create test cases to find the defects. (Fraser et al., 2019, 2020) use the game Code Defenders to engage students in testing Java classes: students play as attackers, who modify the source code to introduce artificial defects, or as defenders, who implement JUnit test cases that reveal the existence of those defects.
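To illustrate the attacker/defender dynamic described above, the sketch below uses hypothetical class names and a deliberately simple mutation (it is not code taken from Code Defenders): the attacker introduces an artificial defect by flipping an operator, and the defender writes a JUnit test whose expected output exposes it.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Original (correct) class under test.
class Discount {
    static double apply(double price, double rate) {
        return price - price * rate;
    }
}

// Attacker's mutant: the artificial defect flips '-' into '+'.
class DiscountMutant {
    static double apply(double price, double rate) {
        return price + price * rate;
    }
}

// Defender's test: it passes against the original class (100 - 10 = 90) but
// would fail if run against the mutant (100 + 10 = 110), thereby revealing
// the artificial defect introduced by the attacker.
class DiscountTest {
    @Test
    void tenPercentDiscountOnOneHundred() {
        assertEquals(90.0, Discount.apply(100.0, 0.10), 1e-9);
    }
}
```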
Table 1 summarizes the aforementioned works and compares them with ours. First, works are classified according to whether they report a practical experience. When a practical experience is reported, its scope, duration and number of participants are indicated. Table 1 also indicates whether a comparison with a non-gamified experience (control group) is made, the number of participants in the control group and whether a statistical analysis is reported. In general, the extent of the practical experiences reported is limited: the number of participants is not very large, which makes it difficult to extrapolate the results, and/or the experiences are of limited duration or are applied to individual assignments, which makes it difficult to analyze the long-term impact of gamification. Moreover, in most of the works these gamification experiences are not compared against non-gamified ones, and a statistical analysis is not commonly reported. Our work shares the aims of engaging students and improving their performance with the foregoing works, but unlike them, we present a long gamification experience that lasted a whole academic semester (15 weeks), in which 135 students were involved and the rewards were given at the end of the experience. In addition, we conducted a controlled experiment that compares the gamification experience against a non-gamified one, and we carried out a statistical analysis to test the stated hypotheses.
This paper is available on arxiv under CC BY 4.0 DEED license.