This article was created at 2023-10-31, last modified at 2024-01-24, and is referenced as ia.www.b21.

Find and Replace Code at AST-level with Semgrep

Recently, I found a tool called Semgrep that lets you find and replace code at AST-level. This tool allows me to refactor legacy code without looking through all the code in the project.

In this article, I will give a simple example of how to use Semgrep to search in a repository without any previous knowledge of the repo. Before you read this article, you must know how to use Semgrep. If you don’t know Semgrep yet, check out the official tutorial (it was a bit buggy when I used it). Make sure you complete the whole tutorial, or you won’t understand this article.

Update: For C, Coccinelle may work better. It was introduced to me by Richard Palethorpe.

Update: If you use VS Code, there is a “Semgrep” plugin that will provide Semgrep snippets when editing any YAML file. This has been helpful to me because I cannot remember the rule syntax at all.


Preparing the Repo

To search through some code, we first need to have some code. In this article, we use the source code of Janet.

git clone --depth 1 https://github.com/janet-lang/janet/
cd janet

Defining the Semgrep Rule

Inside the repo, create the file config.yml with the following content.

rules:
- id: panic
  patterns:
  - pattern-inside: |
      $RET_TY $FUNC(...) {...}
  - pattern: janet_panic($...ARGS);
  message: |
    panic inside: $RET_TY $FUNC
  fix: |
    janet_panic("replaced");
  severity: INFO
  languages: [c]

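What we are trying to do here is replace every janet_panic(...) call inside a function with janet_panic("replaced"). The pattern-inside clause restricts matches to function bodies and binds $RET_TY and $FUNC, which the message then reports for each match.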

Search

To search through the code with semgrep, we run

semgrep scan --config config.yml

You should now see a lot of results in your terminal like

    src/core/string.c 
       panic               
          panic inside: void kmp_init
                                     
           ▶▶┆ Autofix ▶ janet_panic("replaced")
          109┆ janet_panic("expected non-empty pattern");
            ⋮┆----------------------------------------

Replace

Since the results look good, we will now replace the search results with what we want. To do this, we run

semgrep scan --config config.yml --autofix

Here, you can use git diff to verify that the files were changed correctly. Then commit the changed files to git, and we are done with the task!

Additional Comments

Before semgrep, I tried tree-sitter. The query language of tree-sitter does not support the equivalent of pattern-inside yet, so I didn’t try it further.

Also, this task “can” be done with regex, but regex is far more flaky than searching through the code at AST level. My average experience with regex is worse than my first experience with Semgrep, and that says something.
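
To make the regex point a bit more concrete, here is a minimal Python sketch. The C snippet and the regex are made up for illustration (they are not taken from the Janet source, and the function name check is hypothetical), and a more careful regex could certainly do better. The sketch only shows how a naive line-based pattern reports text inside a comment and a string literal while missing the one real call, because that call spans two lines.

```python
import re

# Hypothetical C snippet (an assumption for illustration, not Janet code).
# The text "janet_panic(" appears three times: inside a comment, as a real
# call split over two lines, and inside a string literal.
C_SOURCE = '''\
void check(int n) {
    /* janet_panic("old message"); was removed here */
    if (n < 0) janet_panic(
        "negative value");
    puts("usage: janet_panic(msg); aborts");
}
'''

# A naive regex for "a janet_panic call ending in a semicolon".
# "." does not match newlines, so a call spanning two lines is invisible to it.
NAIVE = re.compile(r'janet_panic\(.*\);')


def naive_matches(source):
    """Return every substring the naive regex treats as a janet_panic call."""
    return [m.group(0) for m in NAIVE.finditer(source)]


if __name__ == "__main__":
    for hit in naive_matches(C_SOURCE):
        print(hit)
```

Both hits come from the comment and the string literal, and neither is the actual call. A tool that works on the parsed code does not treat comment text or string contents as call expressions, and line breaks inside a call do not matter to it.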