All computer scientists will have to learn how to write effective unit tests at some point in their (academic) career. Almost all computer science degrees that we have come across teach their students how to do this. Sometimes this already happens in an introductory Java programming course, to lay a good foundation, but more often we see it taught in more advanced Java software development courses.
For this guide, I have researched the best way to autograde student JUnit unit tests for Java assignments, and I will explain step by step how you can use code coverage to automatically assess JUnit tests in CodeGrade. This guide focuses on assessing JUnit tests, but the theory and principles also apply to other programming languages and unit testing frameworks. We have covered how to do this for Python in a separate guide.
Unit Test assessment metrics
There are multiple ways to assess unit tests effectively, of which code coverage and mutation testing are the two most common. They serve different purposes and differ in setup complexity. For this guide, I will stick to code coverage, which should be sufficient for most educational purposes, but I do want to briefly mention how effective (albeit harder to set up) mutation testing can be.
- Code Coverage is the most common metric to assess unit test quality. It very simply measures what percentage of source code is covered by the unit tests that you have written. Different metrics can be used to calculate this, for instance the percentage of subroutines or the percentage of instructions of the code that the tests cover.
- Mutation Testing is a more advanced metric to assess unit test quality that goes beyond simple code coverage. During mutation testing, a tool makes minor changes to your code (known as mutations) that should break it. It then checks whether each mutation makes your unit tests fail (this is what we want if our unit tests are correct and complete) or not (meaning that our unit tests are incorrect or incomplete).
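To make the idea of a mutation concrete, here is a minimal sketch in plain Java. It does not use JUnit or a real mutation testing tool: the class `MutationSketch`, the method `isAdult`, and the hand-written mutant are all hypothetical names made up for illustration, and the two "suites" are simple boolean checks standing in for unit tests. Both suites execute the single line of `isAdult`, so both would reach full line coverage, yet only one of them notices the mutation.

```java
import java.util.function.IntPredicate;

public class MutationSketch {
    // Original code under test (hypothetical): adults are 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Hand-written mutant: ">=" changed to ">".
    // A mutation testing tool would generate changes like this automatically.
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak suite: executes the line, but never checks the boundary value.
    static boolean weakSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5);
    }

    // Strong suite: additionally checks the boundary value 18.
    static boolean strongSuitePasses(IntPredicate impl) {
        return impl.test(30) && !impl.test(5) && impl.test(18);
    }

    public static void main(String[] args) {
        // Both suites pass on the original code.
        System.out.println("weak on original:   " + weakSuitePasses(MutationSketch::isAdult));
        System.out.println("strong on original: " + strongSuitePasses(MutationSketch::isAdult));

        // The weak suite still passes on the mutant: the mutant survives.
        System.out.println("weak on mutant:     " + weakSuitePasses(MutationSketch::isAdultMutant));

        // The strong suite fails on the mutant: the mutant is killed.
        System.out.println("strong on mutant:   " + strongSuitePasses(MutationSketch::isAdultMutant));
    }
}
```

A surviving mutant points to a missing or too lenient assertion, here the missing check for the boundary age of 18.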
For most educational purposes and introductory Java assignments, code coverage is sufficient for assessing the students’ JUnit tests, as the main learning goal is to teach students the good practice of writing unit tests that cover each line or function of their code. Using CodeGrade’s continuous feedback, we can very effectively motivate our students with instant feedback to go for a 100% code coverage score. However, for more advanced software engineering courses, you may want to consider mutation testing, which measures not only the number of lines we cover, but also how well these lines are actually covered by our tests. This metric can be somewhat off-putting for beginning students, but it is very useful in more advanced courses.
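The difference between the two metrics can be illustrated with a little arithmetic. The sketch below assumes the common definitions: coverage is the share of lines executed by the tests, and the mutation score is the share of mutants that make at least one test fail. The class `MetricSketch` and all numbers are hypothetical and only serve as an example; they are not output from CodeGrade or any coverage tool.

```java
public class MetricSketch {
    // Percentage of items (lines or mutants) that were hit out of the total.
    static double percentage(int hit, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * hit / total;
    }

    public static void main(String[] args) {
        // Hypothetical numbers for one student submission.
        int coveredLines = 20;
        int totalLines = 20;
        int killedMutants = 6;
        int totalMutants = 10;

        double lineCoverage = percentage(coveredLines, totalLines);
        double mutationScore = percentage(killedMutants, totalMutants);

        System.out.println("Line coverage (%):  " + lineCoverage);
        System.out.println("Mutation score (%): " + mutationScore);
    }
}
```

With these made-up numbers, the tests execute every line, yet four of the ten mutants survive. That gap is the extra information mutation testing provides on top of code coverage.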