AI codes. Humans engineer.
AI can write the code, but humans still need to own the engineering. A practical guide to using AI without drowning in slop.
Richard Gill
Engineers are now writing most of their code using AI. But despite what you read on Tech Twitter, reports of the death of the software engineering profession have been greatly exaggerated.
Whilst AI does the "coding", engineers will continue to do the "engineering". But first, the question on every engineer's mind...
Will AI replace software engineers?
Nobody truly knows where this is heading. I'd encourage skepticism of anybody who asserts they do; I'm wary of the incentives behind bullish predictions.
AI coding tools leapt forward in 2025, surprising many engineers, myself included. AI can now write pretty good code, but it cannot yet do all of "software engineering" to the standard of a top human engineer.
In the meantime, here's how to get the most out of AI coding tools today, given their current limitations.
Tasks where AI is superhuman for coding
Like many jobs, software engineering comprises a set of skills and tasks.
AI is already an incredible tool for some of these tasks:
- Searching the web and your code base
- Broad surface-level general knowledge
- Using `gh` and other command line tools
- Reproducing boilerplate patterns in an existing codebase
- Reproducing code patterns from the LLM's training data
- Fixing simpler issues when a feedback loop with clear success criteria can be established (the "ralph loop"; see the sketch below)
Engineers should be outsourcing all of these tasks to AI if they want to maximize their output.
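To make that last list item concrete, here's a minimal sketch of a ralph-style loop. It assumes Claude Code's non-interactive mode (`claude -p`) as the agent and a passing `pytest` run as the success criterion; the prompt and iteration cap are placeholders, not a prescription:

```python
import subprocess

MAX_ITERATIONS = 10  # placeholder cap; tune to taste
PROMPT = "Fix the failing tests. Run the test suite to check your work."

def tests_pass() -> bool:
    # Success criterion: the whole test suite exits cleanly.
    return subprocess.run(["pytest", "-q"]).returncode == 0

for i in range(MAX_ITERATIONS):
    if tests_pass():
        print(f"Success after {i} iteration(s).")
        break
    # Re-run the agent with the same prompt; it sees the repo's
    # current state, including its own earlier changes.
    subprocess.run(["claude", "-p", PROMPT])
else:
    print("Criteria still failing after the cap - hand back to a human.")
```

The loop only works because the success criterion is crisp; without one, the agent just churns indefinitely.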
Tasks where AI currently has limitations
AI currently falls short of top human engineers at some skills:
- Thinking through decisions and considering all the options
  - AI can implicitly take decisions which later turn out to be incorrect.
  - AI often won't stop to confirm its decisions.
- Distilling information to "what matters"
  - AI tends to expand information rather than compress it, burying the signal in a wall of text.
- Bringing broad or nuanced organizational context to a task
  - e.g. "Alison in accounting will likely object to this plan."
- Code reviews
  - AI won't think outside the box to identify incorrect architecture patterns that fall outside its training data.
  - AI struggles to rescue a poor-quality PR.
- Breaking out of incorrect thinking patterns / assumptions
These limitations mean that unsupervised AI coding cannot produce pull requests of the same overall quality as a top engineer's.
Throwing more AI at the problem does not work
The limitations mean that it's simply not possible to throw "more AI" at a pull request to bring the code up to a high standard. This is partially because AI is very susceptible to the broken windows effect: it tends to read low-quality code with incorrect assumptions into its context window, leading it to produce more poor-quality code with similar patterns.
AI code reviews are a useful tool I use every day, but currently they're not capable of really stepping back and applying common sense. As a result, the code often gets stuck in a local maximum rather than reaching the best solution.
Making a larger test suite does not scale
You can constrain AI coding with large end-to-end test suites to cover every edge case. This is particularly viable because AI makes it much easier to write tests.
However, the test suite ends up becoming an overfitted specification of what the software should do. The tests grow so large they become incomprehensible to humans.
The implementation code will "work", but it doesn't model the underlying problem correctly. Small changes to the code lead to bug "whack-a-mole" as multiple bugs pop up. In contrast, code which models the problem domain correctly often just works when you add a feature.
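To make the contrast concrete, here's a hypothetical illustration (the `shipping_cost` function and its prices are invented). The first test pins individual observed outputs; the second states an invariant of the domain:

```python
from pricing import shipping_cost  # hypothetical module for illustration

# Overfitted: pins dozens of observed outputs. It passes, but says
# nothing about why these are the right answers, and every pricing
# tweak breaks a pile of cases.
def test_shipping_examples():
    assert shipping_cost(weight_kg=1, country="DE") == 4.99
    assert shipping_cost(weight_kg=2, country="DE") == 6.99
    assert shipping_cost(weight_kg=1, country="FR") == 5.99
    # ...hundreds more generated cases...

# Models the domain: a heavier parcel never ships for less. Code that
# captures the pricing rules correctly passes this without new cases.
def test_heavier_parcels_never_cost_less():
    for country in ("DE", "FR", "US"):
        costs = [shipping_cost(weight_kg=w, country=country) for w in (1, 2, 5, 10)]
        assert costs == sorted(costs)
```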
The best strategy: keep autonomy in the places where humans are better
The best engineers delegate to AI aggressively, but they're deliberate about what stays with them. As AI has progressed, the goalposts of what can be delegated have constantly been moving.
Even with the limitations, agentic coding is an amazing tool. But now that so much of the code is written by AI, how should teams be handling code review?
If AI writes the code, do we need two human reviewers?
Before AI, the best practice for code review was:
- Human 1 writes the code
- Human 2 reviews the code
With AI writing the code, the process becomes:
- AI writes the code
- Human 1 reviews the code and tweaks it
- Human 2 reviews the code again?
Do we really need a second code review?
I think two human reviewers is still better than one.
It turns out that the process of a human writing code actually had a lot of reasoning and self-review baked in. While writing, the engineer runs the code, thinks it through, and explores the problem space. With hand-written code there's a "connection" between the engineer and the code. With AI-generated code, that connection is missing.
AI doesn't really reason in the same way as a human engineer. It one-shots a plausible solution: blazing fast, and often high quality. But as a reviewer you need to be skeptical. AI agents haven't scrutinized the code the way a human author would. It's human 1's job to apply this scrutiny in their review, keeping the AI honest, so that human 2 can perform a traditional review.
Another reason to use two human reviewers is that code review serves other purposes: knowledge sharing in a team, and upholding joint standards. And if AI writes all the code, reviewing it is one of the few ways junior engineers still learn. AI hasn't changed any of this.
Shouldn't we just skip review so we can 10x output?
I've seen people suggesting that to unlock the full speed and power of AI, you need to stop looking at the code and treat it like a compiled binary.
On some level this makes sense: Human review is now the bottleneck in the process, and to really realize the productivity gains, you need to remove it.
The question is: How long do you want to go fast for?
Unsupervised AI coding consistently produces tech debt - and we already know how that story ends.
If you want to go fast in the short term (a one-off script, an experiment, an MVP): AI is incredibly strong here, and some tech debt is a good trade-off.
If you want to go sustainably fast: you need to manage tech debt using human code reviews.
Team collaboration in an AI coding world
In just a couple of years we've unlocked the superpower of producing code 10x faster. It's perhaps not surprising that we need to reevaluate the social norms of how we work together in teams.
AI unlocks a lot of productivity, but it's not without downside risks. AI coding creates an asymmetry where an underperforming team member can drown reviewers with AI slop, and worse, it's hard to prove it is slop.
The fundamentals of craft, trust and accountability that have always been present in top software teams are just as important as ever.
Tips for submitting AI generated PRs for code review
You own the output
- Don't trust AI fully; always review its output (the code!).
- You, and you alone, are responsible for its output. You need to own it.
- Try to understand what AI can and can't do.
- Be careful what you outsource to AI.
- Important or high-level things should keep a human in the loop.
Run the code and perform manual QA testing
It's easier than ever to generate code without actually running it. You should:
- Run the code locally
- Or test using preview deploys on pull requests
In the end there's never a substitute for manual QA (which can be AI-assisted).
Be transparent about AI usage
AI produces output that looks ~human, but may contain issues. This makes it hard for the reviewer to know what is human-generated and what is AI-generated.
- Provide signal to reviewers of which output received human effort / review.
- You can do this by showing that you "took the care" (we can all spot AI slop!).
- Never present AI slop as your own work.
- If something is AI-generated and not properly human-reviewed, explicitly signal this is the case (sometimes this is fine/useful).
- To reduce the reviewer’s burden, you should human-review as much of the output as possible before submitting it.
Mark experiments clearly
- If something is heavily vibe coded or is an experiment, explicitly communicate this is the case and what your expectations are.
- Example 1: "I vibed this just to see if it would work. Not expecting to merge it in this state".
- Example 2: PR Title: "[Experiment] New pricing page with calculator".
- Do this even if you heavily reviewed the content but it still looks like AI (more relevant with words than code).
Write PR titles and descriptions by hand
- Title and description are the highest leverage part of the PR.
- Take the time to write them by hand to show you took the care.
- AI-generated descriptions tend to summarize "what changed", which the diff already shows.
- A good description focuses on the motivation for the PR.
- What does the reviewer need to know that isn't in the code: the why, the trade-offs, anything non-obvious.
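As a made-up example of the difference, compare a description that only narrates the diff with one that carries the why:

```text
Weak:  "Refactored the retry logic and updated tests."  (the diff already shows this)

Better:
  Title: Cap webhook retry backoff at 5 minutes

  Customers behind flaky proxies saw webhooks delayed for hours because
  the exponential backoff grew without bound. This caps it at 5 minutes.
  Trade-off: a permanently failing endpoint now receives more requests;
  acceptable because deliveries are idempotent.
```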
It's OK for reviewers to reject AI slop
Pre-AI, when a teammate raised a PR that needed work, I felt a responsibility to help them get it merged. Since creating a PR necessarily required human effort, spending human effort on the other side felt like a fair trade.
But in a world where someone can generate a PR by typing 3 sentences into Claude Code, the reviewer must have the right to send a PR back and ask for human effort to be applied.
The asymmetry of AI can move the work away from the submitter and to the reviewer. If we're not careful, senior engineers could end up spending all their time fielding AI slop, actually decreasing overall productivity.
AI is also bringing a wave of new coders into the PR process. Here's a companion guide for them: Code Review for Vibe Coders.
The fundamentals haven't changed
The best engineers will use AI wherever it outperforms them, and still hold the line on quality. Writing code feels unrecognizable from two years ago, but software engineering hasn't changed.
Principles like respecting the time of other engineers in the team, and taking care in the craft of your work remain timeless...at least for another 6-12 months, depending on who you listen to.