Find and Replace Code at AST-level with Semgrep
Recently, I found a tool called Semgrep that lets you find and replace code at AST-level. This tool allows me to refactor legacy code without looking through all the code in the project.
In this article, I will give a simple example of how to use Semgrep to search through a repository without any prior knowledge of the repo. Before you read this article, you need to know how to use Semgrep. If you don’t know Semgrep yet, check out the official tutorial (it was a bit buggy when I used it). Make sure you complete the whole tutorial, or you won’t understand this article.
Update: For C, Coccinelle may work better. It was introduced to me by Richard Palethorpe.
Update: If you use VS Code, there is a “Semgrep” plugin that will provide Semgrep snippets when editing any YAML file. This has been helpful to me because I cannot remember the rule syntax at all.
Preparing the Repo
To search through some code, we first need to have some code. In this article, we use the source code of Janet.
git clone --depth 1 https://github.com/janet-lang/janet/
cd janet
Defining the Semgrep Rule
Inside the repo, create the file config.yml with the following content.
rules:
  - id: panic
    patterns:
      - pattern-inside: |
          $RET_TY $FUNC(...) {...}
      - pattern: janet_panic($...ARGS);
    message: |
      panic inside: $RET_TY $FUNC
    fix: |
      janet_panic("replaced");
    severity: INFO
    languages: [c]
What we are trying to do here is replace any janet_panic(...) call inside a function with janet_panic("replaced").
pattern-inside: the pattern should be inside a function
pattern: the pattern to find and delete
fix: the pattern to insert/replace
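Before pointing a rule at a whole repository, it can help to try it on a throwaway file first. The sketch below is one way to do that, not the only one. The scratch directory, demo.c, and the check function are made up for illustration. It assumes semgrep is on your PATH and that semgrep scan accepts a target path after the options.

```shell
# Scratch directory so nothing in the real repo is touched (hypothetical layout).
scratch=$(mktemp -d)

# Same rule as above, written into the scratch directory.
# The quoted 'EOF' keeps the shell from expanding $RET_TY and $FUNC.
cat > "$scratch/config.yml" <<'EOF'
rules:
  - id: panic
    patterns:
      - pattern-inside: |
          $RET_TY $FUNC(...) {...}
      - pattern: janet_panic($...ARGS);
    message: |
      panic inside: $RET_TY $FUNC
    fix: |
      janet_panic("replaced");
    severity: INFO
    languages: [c]
EOF

# A made-up C file with one call the rule is meant to match.
cat > "$scratch/demo.c" <<'EOF'
void check(int x) {
    if (x < 0) {
        janet_panic("negative input");
    }
}
EOF

# Scan only the scratch file; skip the scan if semgrep is not installed.
if command -v semgrep >/dev/null 2>&1; then
    semgrep scan --config "$scratch/config.yml" "$scratch/demo.c"
else
    echo "semgrep not found; install it to try the rule on $scratch/demo.c"
fi
```

If the rule behaves as intended, the scan should report a single finding inside check. No --autofix is passed here, so demo.c itself is left unchanged.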
Search
To search through the code with semgrep, we run
semgrep scan --config config.yml
You should now see a lot of results in your terminal like
src/core/string.c
panic
panic inside: void kmp_init
▶▶┆ Autofix ▶ janet_panic("replaced")
109┆ janet_panic("expected non-empty pattern");
⋮┆----------------------------------------
Replace
Since the results look good, we will now replace the search results with what we want. To do this, we run
semgrep scan --config config.yml --autofix
Here, you can use git diff to verify that the files were changed correctly. Then, commit the changed files to git and we are done with the task!
Additional Comments
Before semgrep, I tried tree-sitter. The query language of tree-sitter does not support the equivalent of pattern-inside yet, so I didn’t try it further.
Also, this task “can” be done with regex, but regex is far more flaky than searching through the code at AST level. My average experience with regex is worse than my first experience with semgrep, and that says something.
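To make the regex point concrete, here is a small sketch using a made-up C file. The file demo.c and the function f are hypothetical, and the sed expression is just one naive attempt at the same replacement, not a claim about how anyone would have to write it.

```shell
# Made-up input: a one-line call, a call split over two lines,
# and a call that only appears inside a comment.
tmp=$(mktemp -d)
cat > "$tmp/demo.c" <<'EOF'
void f(int x) {
    if (x < 0) janet_panic("negative");
    janet_panic(
        "spans two lines");
    /* janet_panic("in a comment"); */
}
EOF

# Naive line-based replacement: sed looks at one line at a time
# and has no idea what is code and what is a comment.
out=$(sed -E 's/janet_panic\(.*\);/janet_panic("replaced");/' "$tmp/demo.c")
printf '%s\n' "$out"
```

In this sketch the regex rewrites the call inside the comment and skips the call that spans two lines. A rule that matches on the syntax tree would be expected to handle both cases the other way round, since comments are not part of the tree and line breaks inside a call do not change it.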