I often get requests from teachers who want to automatically assess the structure of their students’ code using CodeGrade, in courses ranging from Introduction to Programming to Data Structures and beyond. Especially for larger, more free-form assignments, automatically grading structure is difficult because there are so many possible solutions. However, much like a linter (a tool that checks code style against the rules in a style guide), CodeGrade can automatically detect code structures and add or deduct points for desired structures or bad practices in student code.
In this blog post, I will explain how you can easily set up automatic structure testing for your CodeGrade assignments using a handy tool called Semgrep, and give you some concrete examples you can use in your own CodeGrade assignments!
Real world scenarios
To better understand when you may want to automatically grade code structure and which problems we are solving in this blog, I will go over two real requests I got from computer science instructors:
- For an Introduction to Programming in Python course, an instructor wants to effectively teach students the different types of loops. She has multiple assignments in the course, each focusing on a different loop. To make students use a while-loop for one assignment and a for-loop for another, she wants to automatically detect these structures in the code and deduct points if a loop is missing or if the wrong type of loop was used.
- For numerous programming courses, instructors want to enforce good coding practices that are not caught by traditional linters. For this, they want to deduct points for common bad “spaghetti code” practices. In our example, we will automatically deduct points for Java code with too many if-statements.
Of course, there are endless possibilities with the tools I explain in this blog and you are encouraged to translate the examples to fit your own assignments or programming languages.
Semgrep
Traditional linters, like pylint for Python or eslint for JavaScript, are easily used in CodeGrade and great for general, broad language standards, but not for specific code structure checks. Semgrep is a tool that can do static code analysis on the structure of code, based on very simple patterns you provide it. Originally designed to find security vulnerabilities in code, Semgrep is an open-source tool from the software security company r2c; it was originally developed at Facebook and supports many programming languages like Go, Java, JavaScript, Python and Ruby, with TypeScript, PHP and C currently being beta-tested. Semgrep can also be used for Jupyter Notebooks, after converting the notebook to Python code. Learn how to do that in our blog on grading Jupyter Notebooks.
Semgrep makes it surprisingly easy to perform more complex code analysis by allowing you to write rules in a human-readable format. You can provide generic or language-specific patterns, which Semgrep then searches for in the code. With its pattern syntax, you can find:
- Equivalences: Matching code that means the same thing even though it looks different.
- Wildcards / ellipsis (...): Matching any statement, expression or variable.
- Metavariables ($X): Matching expressions whose exact form you do not know in advance, but which must be the same in every place the metavariable appears in your pattern.
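To make the ellipsis and metavariable syntax listed above more concrete, here is a minimal sketch of two rules in plain Semgrep's YAML rule format. The file name, rule ids and messages are made up for illustration, and this is not necessarily the exact format the CodeGrade wrapper expects.

```yaml
# rules.yml: an illustrative file name; the rule ids and messages are made up for this example
rules:
  # Ellipsis: "..." stands for any loop body, so every for-loop matches.
  - id: uses-for-loop
    languages: [python]
    severity: INFO
    message: Found a for-loop.
    pattern: |
      for $ITEM in $ITERABLE:
          ...
  # Metavariable: both sides must be the same expression,
  # so `a == a` matches but `a == b` does not.
  - id: compares-value-to-itself
    languages: [python]
    severity: WARNING
    message: "$X is compared to itself."
    pattern: $X == $X
```

With plain Semgrep, a file like this can be run with `semgrep --config rules.yml` followed by the path to the code you want to scan.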
To make Semgrep effective for educational purposes, we have created a wrapper script around it in CodeGrade. This wrapper script integrates with our Unit Test step, so that each individual rule you define shows up as one specific Unit Test that can either pass or fail. What’s more, we have added support for “positive matches”: patterns that we do expect to find (e.g. when we require students to use a for-loop). This is necessary because, by default, Semgrep treats every match of a pattern as an error. The wrapper script, called `cg-semgrep`, is automatically installed on AutoTest.