Back in 2019, I wrote the original version of this article. The arguments against unit tests were familiar: they don’t hit real dependencies, 100% coverage means nothing, passing tests don’t guarantee working software. All fair points. But the world has changed drastically since then.
AI coding assistants now generate thousands of lines of code per day. Engineers ship faster than ever. And the question “do we need unit tests?” has taken on an entirely different dimension — because the entity writing your code isn’t always human anymore.
My answer, seven years later, is more emphatic than before: yes, you need unit tests. And in the AI era, you need them more than ever.
The Classic Arguments (Still Standing)
Let me briefly revisit the original case, because the fundamentals haven’t changed.
Consider a system composed of three simple operations:
S(a, b) => mul(sum(a, b), sub(a, b))
where:
sum(a, b) => a + b
sub(a, b) => a - b
mul(a, b) => a * b
High-level tests check the final output:
S(0, 0) == 0 // (0 + 0) * (0 - 0) = 0
S(0, 1) == -1 // (0 + 1) * (0 - 1) = -1
S(1, 1) == 0 // (1 + 1) * (1 - 1) = 0
Now introduce a bug — change sum(a, b) from a + b to a + 1. Every system test still passes:
S(0, 0) == 0 // (0 + 1) * (0 - 0) = 0
S(0, 1) == -1 // (0 + 1) * (0 - 1) = -1
S(1, 1) == 0 // (1 + 1) * (1 - 1) = 0
The bug is invisible at the system level because the test inputs happen to mask it. A unit test for sum would catch it instantly. This principle scales: the more units you have, the more likely a high-level test will miss a localized defect. Real systems have thousands of units. A 1:1000 ratio between system tests and code paths is generous.
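The example above translates directly into code. Here is a minimal sketch (function names mirror the formulas) showing both the system-level tests passing against the buggy variant and a single unit test exposing it:

```typescript
// The three units and the composed system from the example above.
const sum = (a: number, b: number): number => a + b;
const sub = (a: number, b: number): number => a - b;
const mul = (a: number, b: number): number => a * b;
const S = (a: number, b: number): number => mul(sum(a, b), sub(a, b));

// The buggy variant: sum returns a + 1 instead of a + b.
const buggySum = (a: number, b: number): number => a + 1;
const buggyS = (a: number, b: number): number => mul(buggySum(a, b), sub(a, b));

// System-level tests: the buggy version passes all three,
// because every input pair happens to mask the defect.
const systemCases: Array<[number, number, number]> = [
  [0, 0, 0],
  [0, 1, -1],
  [1, 1, 0],
];
const systemTestsPass = systemCases.every(([a, b, want]) => buggyS(a, b) === want);

// Unit-level test: one direct check on sum exposes the bug instantly.
const unitTestPasses = buggySum(2, 3) === 5; // false: 2 + 1 === 3
```

Run it and `systemTestsPass` is true while `unitTestPasses` is false: the defect is invisible at the system boundary and obvious at the unit boundary.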
This was true in 2019. It’s true in 2026. The math doesn’t care about technology trends.
What Changed: AI Writes Code, Humans Verify It
Here’s what’s fundamentally different now. In 2019, humans wrote both the code and the tests. The common complaint was that writing unit tests doubled the workload. “I could be shipping features instead of writing test cases for obvious logic.”
That argument is dead.
AI coding assistants — Claude Code, GitHub Copilot, Cursor — generate code at a pace no human can match. They also generate unit tests. The cost of writing tests has dropped to near zero. What used to take an afternoon now takes minutes. You describe the behavior you want, and the AI scaffolds comprehensive test suites that cover happy paths, edge cases, and error conditions.
But here’s the twist that most people miss: the lower cost of generating code makes verification more important, not less.
When a human writes a function, they hold the full context in their head — the business requirement, the edge cases they considered, the trade-offs they made. When an AI generates a function, it’s making statistical predictions about what code should look like given its training data. It’s usually right. But “usually” isn’t a word you want in production.
I’ve seen AI-generated code that looked perfectly clean, passed a code review, and hid a subtle off-by-one error deep in a loop condition. I’ve seen it produce correct logic for 99% of inputs and silently corrupt data for a specific combination of null and empty string. These aren’t hypothetical scenarios — they’re Tuesday.
Unit tests are the verification layer for AI-generated code. They’re the contract that says: regardless of who or what wrote this function, here’s what it must do. Without them, you’re trusting a probabilistic system to be deterministic. That’s not engineering — that’s gambling.
The “Tests Are Documentation” Argument Got Stronger
In 2019, I argued that unit tests serve as documentation. That was true but somewhat academic — most teams had developers who knew the codebase intimately enough that reading tests for understanding was optional.
In 2026, the documentation argument is critical. Here’s why.
AI-assisted codebases grow faster than any team’s ability to hold them in their heads. Engineers context-switch between multiple services, often inheriting code that an AI generated in a previous sprint — or that a colleague generated with an AI and didn’t fully review. The codebase is increasingly nobody’s code.
When you pick up a function you’ve never seen and need to understand what it does, what do you reach for? The implementation might be 50 lines of dense logic. The commit message says “Add processing pipeline.” The PR description is AI-generated boilerplate.
But the tests? The tests say:
it('returns empty array when input is null', ...)
it('excludes items with negative quantity', ...)
it('applies discount before tax calculation', ...)
it('throws when discount exceeds item price', ...)
That’s a specification. It tells you exactly what the author intended — edge cases included. In a world where you can’t always ask the original author because the original author was an LLM, tests are the closest thing you have to a source of truth.
When Unit Tests Genuinely Don’t Help
I’m not a unit test absolutist. There are situations where they provide little value, and pretending otherwise makes the whole practice feel like cargo cult engineering.
Thin wrappers and pass-through code. If a function does nothing but call another function and return its result, testing it is testing the language runtime. Skip it.
UI layout and styling. Does this button render 20 pixels from the left edge? That’s a visual regression test, not a unit test. Tools like Playwright and Storybook handle this far better.
Glue code in rapidly changing prototypes. If you’re in discovery mode and the code will be rewritten three times in the next two weeks, investing in comprehensive unit tests is waste. Write a few smoke tests, move fast, and add proper tests when the design stabilizes.
Pure integration logic. Sometimes the interesting behavior is the interaction between components — database queries, API calls, message flows. Unit-testing these with mocks can create a false sense of security. If your mock says the database returns [{id: 1}] but the real query returns [{id: 1, deleted_at: null}] with an extra field that breaks your deserializer, the mock won’t save you. This is where integration tests and contract tests earn their keep.
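The mock-mismatch failure mode above can be sketched in a few lines. Assume (hypothetically) a strict deserializer that rejects unknown fields; the function and field names here are illustrative, not from any real library:

```typescript
// A strict deserializer that rejects unexpected keys (hypothetical example).
type Row = { id: number };

function deserialize(raw: Record<string, unknown>): Row {
  const allowed = new Set(['id']);
  for (const key of Object.keys(raw)) {
    if (!allowed.has(key)) throw new Error(`Unexpected field: ${key}`);
  }
  return { id: raw.id as number };
}

// Unit test with a mocked query result: passes, and proves nothing.
const mockedRow = { id: 1 };
const fromMock = deserialize(mockedRow); // ok

// What the real query actually returns: an extra column the mock never had.
const realRow = { id: 1, deleted_at: null };
let realQueryBreaks = false;
try {
  deserialize(realRow);
} catch {
  // This is the failure only an integration test against a real
  // database would have caught.
  realQueryBreaks = true;
}
```

The unit test is green, the production path throws. No amount of extra mocking fixes this; only a test that talks to the real dependency does.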
The key insight: unit tests excel at verifying logic — branching, calculations, transformations, state machines. They’re weak at verifying interactions. Know the difference and test accordingly.
The Testing Pyramid Is Dead. Long Live the Testing Diamond.
The traditional testing pyramid — many unit tests at the base, fewer integration tests in the middle, a handful of E2E tests at the top — was designed for a world where tests were expensive to write and slow to run.
Both constraints have loosened dramatically.
AI generates tests at every level almost equally fast. And modern tooling has made integration tests much faster than they used to be. Testcontainers spins up a real PostgreSQL instance in seconds. Playwright drives a full browser flow in seconds, not minutes. The infrastructure excuse for avoiding higher-level tests barely holds anymore.
What’s emerging in practice is more of a diamond shape:
- A small base of unit tests for complex, algorithmic logic — parsers, validators, state machines, financial calculations. Things with many branches and edge cases.
- A thick middle of integration and component tests that verify real interactions between real components. API endpoint tests that hit a real database. Service tests that use real queues. This is where the highest ROI lives for most web applications.
- A thin top of E2E tests for critical user journeys. Checkout flow. Login. The paths that, if broken, wake someone up at 3 AM.
This isn’t a rejection of unit tests. It’s a recognition that not all code benefits equally from them. The business logic layer with complex branching? Unit test it heavily. The CRUD controller that maps an HTTP request to a database query? An integration test is more honest about what can actually go wrong.
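To make the split concrete, here is a hedged sketch of the two kinds of code. The functions and rules are hypothetical, but the shape is typical: branching logic worth unit-testing heavily, next to pass-through glue where unit tests would mostly exercise mocks:

```typescript
// Logic with real branching: unit tests pay off here (hypothetical rules).
function shippingCost(weightKg: number, express: boolean): number {
  if (weightKg <= 0) throw new Error('weight must be positive');
  const base = weightKg <= 1 ? 5 : 5 + Math.ceil(weightKg - 1) * 2;
  return express ? base * 2 : base;
}

// Pass-through CRUD glue: unit-testing this mostly tests the mocks.
// An integration test against a real store is more honest here.
interface Store {
  save(order: { id: string; cost: number }): void;
}
function createOrder(store: Store, id: string, weightKg: number, express: boolean): void {
  store.save({ id, cost: shippingCost(weightKg, express) });
}
```

`shippingCost` has boundaries and branches worth pinning down in unit tests; `createOrder` has none of its own, so the interesting question is whether the real store accepts what it saves.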
Mutation Testing: Proving Your Tests Actually Work
Here’s a practice that was niche in 2019 and deserves far more attention in 2026: mutation testing.
The idea is simple. A mutation testing tool takes your code, introduces small changes (mutations) — flipping a > to >=, removing a return statement, replacing true with false — and runs your test suite against each mutant. If your tests still pass after a mutation, they’re not actually testing that code path. The mutant “survived,” and your test suite has a blind spot.
Coverage metrics tell you which lines your tests execute. Mutation testing tells you which lines your tests actually verify. Those are very different things.
Tools like Stryker (JavaScript/TypeScript), mutmut (Python), and pitest (Java) are mature and practical. Running them on your most critical modules — even quarterly — reveals exactly where your test suite is theatrical rather than useful.
This matters especially for AI-generated tests. An LLM will happily produce a test suite with 95% line coverage that catches almost nothing, because it optimizes for the superficial pattern of “code that looks like tests” rather than the deeper pattern of “assertions that constrain behavior.” Mutation testing catches this.
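Here is the coverage-versus-verification gap in miniature. The function and the "theatrical" test are hypothetical, and the mutant shown is hand-written, but it is exactly the kind of change a tool like Stryker generates automatically:

```typescript
// A boundary check we want tested (hypothetical function).
function canWithdraw(balance: number, amount: number): boolean {
  return amount > 0 && amount <= balance;
}

// A "theatrical" test: it executes every line (100% line coverage)
// but asserts nothing about the boundaries.
const theatricalTestPasses = typeof canWithdraw(100, 50) === 'boolean';

// A mutant a mutation testing tool would generate: <= flipped to <.
function canWithdrawMutant(balance: number, amount: number): boolean {
  return amount > 0 && amount < balance;
}

// The theatrical test cannot tell the mutant from the original...
const mutantSurvivesTheatricalTest =
  typeof canWithdrawMutant(100, 50) === 'boolean';

// ...but a real assertion at the boundary kills it.
const boundaryTestKillsMutant =
  canWithdraw(100, 100) === true && canWithdrawMutant(100, 100) === false;
```

Both tests contribute the same coverage number; only the boundary assertion constrains behavior. That difference is invisible in a coverage report and glaring in a mutation report.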
TDD in the Age of AI
Test-driven development — write the test first, watch it fail, make it pass — was already controversial in 2019. In 2026, it’s been reshaped but not replaced.
The classic TDD cycle assumed a human writing both tests and implementation, alternating in tight loops. With AI, a more effective pattern has emerged: specification-driven generation.
You write the tests first — or more precisely, you write the specifications as tests. What should this function accept? What should it return? What are the error cases? Then you hand those tests to an AI and say: make these pass.
This is TDD on steroids. The AI generates an implementation that satisfies your contracts, and you review it. If the implementation looks wrong despite passing tests, your tests are incomplete — add more. If it looks right, you’ve just built a verified component in minutes instead of hours.
The human remains the designer. The AI is the builder. The tests are the blueprint. This workflow only works if the tests exist before the implementation. Otherwise, you’re back to generating code and then generating tests that confirm what the code already does — which is circular and catches nothing.
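The workflow above can be sketched as executable specs written before any implementation exists. Everything here is hypothetical (the function, the rules, the cases); the point is the ordering, with the spec table written first and the implementation filled in second:

```typescript
// Specification written first, as data-driven assertions.
// applyDiscount(price, discount) should:
//   - subtract the discount from the price,
//   - never return a negative total,
//   - reject negative discounts.
const spec: Array<[number, number, number | 'throws']> = [
  [100, 20, 80],      // normal case
  [10, 50, 0],        // discount larger than price clamps to zero
  [10, -5, 'throws'], // negative discount is an error
];

// The implementation comes second. In the workflow described above,
// an AI would be handed the spec and asked to make it pass.
function applyDiscount(price: number, discount: number): number {
  if (discount < 0) throw new Error('discount must be non-negative');
  return Math.max(0, price - discount);
}

// Verify the implementation against the spec.
const specPasses = spec.every(([price, discount, want]) => {
  try {
    return applyDiscount(price, discount) === want;
  } catch {
    return want === 'throws';
  }
});
```

If the generated implementation passes but still looks wrong, the spec table is missing a row; you extend the spec, not the review checklist.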
What I’d Tell My 2019 Self
Seven years ago, I ended the original article by arguing that unit tests serve as documentation and that TDD helps maintain low coupling. Both still true. But I was too defensive about it. I spent too much energy justifying tests at all, rather than talking about how to test well.
Here’s what I know now that I didn’t know then:
The cost of writing tests is no longer the bottleneck. AI eliminated it. The cost is now in designing the right tests — choosing what to assert, which edge cases matter, where to draw the boundary between unit and integration.
Coverage is a vanity metric. Mutation score is a quality metric. Chase the latter.
Mocking is a design smell, not a testing strategy. If you need to mock ten dependencies to test one function, the function is doing too much. Fix the design, don’t duct-tape the test. The original article was right that coupling matters — but I’d go further now: excessive mocking is a sign you’re testing the wrong way, not that you need more mocks.
The best test suite is the one your team actually runs. Speed matters. Flaky tests get ignored. A fast, reliable suite of 200 tests beats a comprehensive but flaky suite of 2,000 every time.
Write tests for the code that scares you. The function with six nested conditions? Test it. The billing calculation? Test it. The straightforward getter? Don’t.
Unit tests aren’t about proving your code is correct. They never were. They’re about catching the moment it stops being correct — whether the change came from a human, an AI, or a 3 AM hotfix pushed by someone who “just needed to fix one thing.”
In the AI era, that catching matters more than ever. Write the tests. Let the machines write the code.