Auto-Reviewing Claude's Code

A well-crafted system prompt will increase the quality of code produced by your coding assistant. It does make a difference. If you provide guidelines in your system prompt for writing code and tests, coding assistants will follow the guidelines.
Although that depends on your definition of “will follow”. If your definition is “will follow often”, then it’s accurate. If your definition is “will follow always”, or even “will follow most of the time”, then it’s inaccurate (unless you’ve found a way to make them reliable that I haven’t; if so, please let me know).
Coding agents will ignore instructions in the system prompt on a regular basis. As the context window fills up and starts to intoxicate them, all bets are off.
Even with the latest Opus 4.5 model, I haven’t noticed a major improvement. So if we can’t rely on models to follow system prompts, we need to invest in feedback cycles.
I’ll show you how I’m using Claude Code hooks to implement automatic code review on all AI-generated code, so that code quality is higher before it reaches the human in the loop.
A code example that demonstrates the concepts discussed in this post is on my GitHub: https://github.com/NTCoding/claude-skillz/tree/main/automatic-code-review
All my posts are 100% human written — all mistakes (and em dashes) are my own
Auto code review for fast, semantic feedback
When I talk about auto code review in this post, I am describing a fast feedback mechanism intended to catch common code quality issues. It runs whenever Claude has finished making edits, so it needs to be fast and efficient.
I also use coding assistants for detailed code reviews when reviewing a PR, for example. That will spin up multiple subagents and take a bit longer. That’s not what I’m talking about here.

The purpose of the auto code review is to reinforce what’s in your system prompt, project documentation, and on-demand skills: the things Claude may have ignored. It’s one part of a multi-pronged approach.
Wherever possible, I recommend using your lint and test rules to bake in quality, and leaving auto code review for the more semantic issues that those tools can’t check.
If you want to set a maximum length for your files or a maximum level of indentation, use your lint tool. If you want to enforce minimum test coverage, use your test framework.
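For example, here is a rough sketch of what that might look like with ESLint and Jest; the thresholds are illustrative, not recommendations:

// eslint.config.js (fragment): hard structural limits the linter can enforce
export default [
  {
    rules: {
      "max-lines": ["error", { "max": 300 }],
      "max-depth": ["error", 3],
    },
  },
];

// jest.config.js (fragment): fail the test run below a coverage floor
module.exports = {
  coverageThreshold: {
    global: { branches: 80, functions: 80, lines: 80, statements: 80 },
  },
};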
Semantic code review
A semantic code review looks at how well the code is designed. For example, naming. Does the code accurately describe the business concepts it represents?
AI will often default to names like “helper” and “utils”. But AI is also good at understanding the nuance and finding better names if you challenge it, and it can do this quickly. So this is a good example of a semantic rule.
You can ban certain words like “helper” and “utils” with lint tools (I recommend doing that). But that won’t catch everything.
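The identifier half of that is straightforward with ESLint’s built-in id-denylist rule; a minimal sketch (the word list is just a starting point, and file names would need a separate check):

// eslint.config.js (fragment): reject lazy identifier names outright
export default [
  {
    rules: {
      "id-denylist": ["error", "helper", "helpers", "util", "utils", "data", "temp"],
    },
  },
];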
Another example is logic leaking out of the domain model. When a use case / application service queries an entity and then makes a decision, it’s highly likely that your domain logic is leaking into the application layer. Not so easy to catch with lint tools, but worth addressing.
class RandomUseCase {
  ...
  const customer = repository.load(customerId)
  // Query the domain, then make a decision based on its state.
  // Common anti-pattern: the Customer entity should protect its invariants.
  if (customer.isActive()) {
    customer.awardLoyalty(...)
  }
  ...
}
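The shape I want instead pushes that decision inside the entity. A minimal sketch, assuming a Customer entity roughly like the one implied above (awardLoyalty and loyaltyPoints are illustrative names):

class Customer {
  private loyaltyPoints = 0
  private active = true

  // The entity protects its own invariant: only active customers earn loyalty
  awardLoyalty(points: number): void {
    if (!this.active) {
      throw new Error('Cannot award loyalty to an inactive customer')
    }
    this.loyaltyPoints += points
  }
}

The use case then simply calls customer.awardLoyalty(points) and saves; the business decision lives in the entity, not the application layer.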
Another example is default fallback values. When Claude encounters an undefined value where a value is expected, it will fall back to a default. It seems to hate throwing exceptions or challenging the type signature by asking “should we allow undefined here?”. It wants to make the code run no matter what, and no matter how much the system prompt tells it not to.
const paymentType = command.paymentType ?? 'bank-transfer' // Bad - we shouldn't guess or assume. But AI will do this a lot
const age = request.age ?? 18 // probably a bad default value
const effectiveDate = event.effectiveDate ?? new Date() // This is probably ok
const isEnabled = feature.enabled ?? false // This could be ok
You can catch some of this with lint rules but it’s very nuanced and depends on the context. Sometimes falling back to a default value is correct.
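Where a default genuinely isn’t appropriate, what I want instead is for Claude to fail loudly or to push back on the type. A minimal sketch of the first example rewritten both ways (the command type and error message are illustrative):

// Option 1: fail loudly instead of guessing
if (command.paymentType === undefined) {
  throw new Error('paymentType is missing from the command')
}
const paymentType = command.paymentType

// Option 2: challenge the type signature so undefined is impossible here
interface PaymentCommand {
  paymentType: 'bank-transfer' | 'card' // required: the caller decides, not this code
}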
Building an auto code review with Claude hooks
If you’re using Claude Code and want to build an auto code review for checks that you can’t easily define with lint or testing tools, then a solution is to configure a script that runs on the Stop hook.
The Stop hook fires when Claude has finished working and passes control back to the user. At this point, you can trigger a subagent to perform the review on the modified files.
To trigger the subagent, the hook needs to return a blocking error exit code, which stops the main agent and forces it to read the hook’s output.
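Wiring that up lives in the hooks configuration (in .claude/settings.json, or in a plugin’s hooks file), alongside the PostToolUse entry shown further down. A hedged sketch; the review argument here is my guess at the script’s interface rather than the plugin’s actual one:

"Stop": [
  {
    "hooks": [
      {
        "type": "command",
        "command": "${CLAUDE_PLUGIN_ROOT}/hooks/tools/automatic-code-review-plugin.sh review"
      }
    ]
  }
]

When the hook blocks, the main agent sees something like this: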
⏺ Ran 1 stop hook
⎿ Stop hook error: 📋 CODE REVIEW REQUIRED
Files modified since last review:
- /.../src/features/flows/components/CodeLinkMenu/CodeLinkMenu.tsx
INSTRUCTION: Use the Task tool with subagent_type "automatic-code-reviewer". Pass only the file list as the prompt. The agent will follow its configured review procedure. SHOW ALL FINDINGS TO THE USER.
⏺ automatic-code-review:automatic-code-reviewer(Review modified code)
⎿ Done (19 tool uses · 18.0k tokens · 52s)
⏺ Code Review Findings
The automatic code reviewer found 3 violations related to domain modeling:
Violations Found
1. [DOMAIN MODELING] - CodeLinkMenu.tsx:6
- Issue: filePath is a bare string primitive representing a domain concept. Should use a branded type like FilePath to enforce type safety.
- Fix: Create a branded type type FilePath = string & { readonly __brand: 'FilePath' } and update the parameter to filePath: FilePath
I think it’s generally considered a best practice to use a subagent focused on the review with a very critical mindset. Asking the main agent to mark its own homework is obviously not a good approach, and it would also eat into the main agent’s context window.
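In Claude Code, a subagent is defined as a markdown file with YAML frontmatter (under .claude/agents/ in a project, or in a plugin’s agents directory). A heavily trimmed sketch of what a reviewer agent can look like; the instructions below are illustrative, not the plugin’s actual prompt:

---
name: automatic-code-reviewer
description: Critically reviews recently modified files for semantic code quality issues
tools: Read, Grep, Glob
---

You are a critical code reviewer. For each file you are given:
- Flag vague names like "helper" and "utils" and suggest domain-specific alternatives
- Flag domain logic leaking out of entities into use cases or application services
- Flag default fallback values that silently hide missing data

Report every finding with the file, line number, and a suggested fix.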
The solution I use is available on GitHub. You can install it as a plugin in your repo and customize the code review instructions, or just use it as inspiration for your own solution. Any feedback is welcome, btw.
In the example above you can see it took 52s. That’s probably quicker than me reviewing and providing the feedback myself. But that’s not always the case; sometimes it can take a few minutes.
If you’re sitting there blocked waiting for review, this might be slower than doing it yourself. But if you’re not blocked and are working on something else (or watching TV), this saves you time because the end result will be higher quality and require less of your time to review and fix.
Scanning for updated files
I want my auto code review to only review files that have been modified since the last pull request. But Claude doesn’t provide this information in the context passed to the Stop hook.
I can use git to find all modified or unstaged files, but that’s not good enough.
What I do instead is hook into PostToolUse and keep a log of each modified file.
"PostToolUse": [
{
"matcher": "Write|Edit|MultiEdit",
"hooks": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/tools/automatic-code-review-plugin.sh log"
}
]
}
When the Stop hook is triggered, the review will find the files modified since the last review and ask the subagent to review only those. If there are no modified files, the code review is not activated.
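Under the hood, the logging side can be very small. A rough sketch, assuming the hook receives Claude Code’s JSON payload on stdin and that jq is available; the log path is arbitrary and the real plugin’s script may look different:

#!/usr/bin/env bash
# Called from the PostToolUse hook with the "log" argument
LOG_FILE=".claude/modified-files.log"

if [ "$1" = "log" ]; then
  # The hook payload arrives as JSON on stdin; record the edited file's path
  jq -r '.tool_input.file_path // empty' >> "$LOG_FILE"
fi

The Stop hook side then only has to dedupe that list, hand it to the subagent, and clear it once the review has run.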
Challenges with the Stop hook
Unfortunately the Stop hook is not 100% reliable for this use case, for a few reasons. Firstly, Claude might stop to ask a question, e.g. for you to clarify some requirements. You might not want the auto review to trigger here until you’ve answered Claude and it has finished.
The second reason is that Claude can commit changes before the Stop hook. So by the time the subagent performs the review, the changes are already committed to git.
That might not be a problem, and there are simple ways to solve it if it is. It’s just more to keep in mind and set up.
The ideal solution would be for Anthropic (or other tool vendors) to provide us with hooks that are higher level in abstraction — more aligned with the software development workflow and not just low-level file modification operations.
What I would really love is a CodeReadyForReview hook which provides all the files that Claude has modified — then we can throw away our custom solutions.
Let me know if you have a better approach
I don’t know if I’m not looking in the right places or if the information isn’t out there, but I feel like this solution is solving a problem that should already be solved.
I’d be really grateful if you can share any advice that helps to bake in code quality before the human in the loop has to review it.
Until then I’ll continue to use this auto code review solution. When you’re giving AI some autonomy to implement tasks and reviewing what it produces, this is a useful pattern that can save you time and reduce frustration from having to repeat the same feedback to AI.