TDD with Claude Code | AI Toolkit

The biggest mistake when doing TDD with an LLM is letting it write all the tests first, then all the implementation. This is "horizontal slicing", and it produces bad tests. Tests written in bulk check imagined behavior, not actual behavior. They verify the shape of things (data structures, function signatures) instead of what users experience.

The correct approach: one vertical slice at a time. Write one test, make it pass, repeat. Each test responds to what you learned from the previous cycle.

This technique is informed by Matt Pocock's TDD skill and adapted for Claude Code workflows.

The prompt

We're going to build this feature using TDD with vertical slices. Follow this process strictly:

## Rules

1. ONE test at a time. Write a test, then write the minimal code to make it pass. Do not write the next test until the current one is green.

2. Tests verify BEHAVIOR through public interfaces, not implementation details. A good test reads like a specification: "user can checkout with valid cart." A bad test mocks internal collaborators or calls private methods.

3. If a test would break when you refactor internals (but behavior hasn't changed), it's testing implementation. Rewrite it.

4. Never write more code than the current test demands. No speculative features.

## Process

### Planning
Before writing any code:
- What is the public interface? (functions, API endpoints, component props)
- What are the key behaviors to test? (list them in priority order)
- What does the simplest possible end-to-end path look like?

### Cycle
For each behavior, in priority order:

**RED**: Write one test that fails. The test name should describe the behavior, not the implementation.
**GREEN**: Write the minimum code to make the test pass. Nothing more.
**REFACTOR**: If you see duplication or unclear code, clean it up. Run tests after each change. Never refactor while RED.

Show me each cycle explicitly:
1. The test you're writing and why
2. The failing test output
3. The code change to make it pass
4. The passing test output

### Anti-patterns to avoid
- Writing all tests first, then all implementation (horizontal slicing)
- Testing private methods or internal state
- Mocking things that aren't external dependencies
- Adding code "while you're in there" that no test requires

When to use it

- Building a new module or feature from scratch
- Adding behavior to an existing module with test coverage
- When you want Claude to explain its reasoning at each step (the explicit cycle format forces this)

When NOT to use it

- Fixing a simple bug (just write the regression test and the fix)
- Prototyping or exploring (TDD slows down exploration)
- UI-heavy work where the "behavior" is visual (test the logic, not the pixels)
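
For the UI case, "test the logic, not the pixels" can mean pulling the decision out of the component into a plain function and testing that. The sketch below is an illustration only: `checkout_button_state` and its return shape are hypothetical names, not part of any framework.

```python
def checkout_button_state(item_count, is_submitting):
    """Hypothetical view-logic helper: decides what the button should show."""
    if is_submitting:
        return {"label": "Placing order...", "disabled": True}
    if item_count == 0:
        return {"label": "Cart is empty", "disabled": True}
    return {"label": f"Checkout ({item_count})", "disabled": False}


# The tests exercise the decision, not how the button is rendered.
def test_checkout_is_disabled_for_empty_cart():
    assert checkout_button_state(0, is_submitting=False)["disabled"] is True


def test_checkout_shows_item_count_when_ready():
    state = checkout_button_state(3, is_submitting=False)
    assert state == {"label": "Checkout (3)", "disabled": False}
```

The component then only maps this state onto markup, which is usually cheaper to check by eye than by test.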

Why vertical slices matter

WRONG (horizontal):
  RED:   test1, test2, test3, test4, test5
  GREEN: impl1, impl2, impl3, impl4, impl5

RIGHT (vertical):
  RED→GREEN: test1→impl1
  RED→GREEN: test2→impl2
  RED→GREEN: test3→impl3
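
As a rough illustration of the vertical ordering, here is what the code might look like after two cycles. The `Cart` class and its methods are hypothetical, chosen only to make the sketch concrete.

```python
class Cart:
    """Hypothetical example class, shown in its state after cycle 2."""

    def __init__(self):
        # Added in cycle 2. After cycle 1 there was no storage at all:
        # total_cents() simply returned 0, which was enough for test 1.
        self._items = []

    def add(self, price_cents, quantity=1):
        # Added in cycle 2, because test 2 was the first to need it.
        self._items.append((price_cents, quantity))

    def total_cents(self):
        return sum(price * qty for price, qty in self._items)


# Cycle 1 (RED -> GREEN): written first, passed with `return 0`.
def test_empty_cart_totals_zero():
    assert Cart().total_cents() == 0


# Cycle 2 (RED -> GREEN): written only after cycle 1 was green.
def test_total_reflects_added_items():
    cart = Cart()
    cart.add(250, quantity=2)
    cart.add(100)
    assert cart.total_cents() == 600
```

Note that `add` did not exist until the second test demanded it; nothing was written speculatively.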

When you write all tests first, you're committing to an interface before you understand the implementation. The tests end up testing the shape you imagined, not the behavior that emerged. Vertical slices let each test respond to what you actually built in the previous cycle.
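
To make the "shape versus behavior" distinction concrete, here is a small contrast. It is a sketch under assumptions: `slugify` is a hypothetical function under test, not something from a particular library.

```python
def slugify(title):
    """Hypothetical function under test: lowercases and hyphenates words."""
    return "-".join(title.lower().split())


# Shape-focused: passes for almost any implementation that returns text,
# so it says little about whether the feature works.
def test_slugify_returns_a_string():
    assert isinstance(slugify("Hello World"), str)


# Behavior-focused: states what a caller can rely on, and keeps passing
# if the internals are rewritten (for example, with a regex).
def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Hello  World") == "hello-world"
```

Tests written in bulk before any implementation tend to look like the first one, because at that point the signature is the only thing known.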

Tips

- Start with the tracer bullet: the simplest test that proves the whole path works. For a data layer, that might be "can read a single entry by slug." For an API, "GET /endpoint returns 200."
- If Claude tries to write multiple tests at once, interrupt it. The discipline of one-at-a-time is the whole point.
- The refactor step is where the real design emerges. Don't skip it, but don't force it either. If nothing needs cleaning up, move on.
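
A tracer-bullet test for the data-layer example might look like the sketch below. `EntryStore` and `get_by_slug` are hypothetical names, and the in-memory dict stands in for whatever storage the real module uses.

```python
class EntryStore:
    """Hypothetical data layer; just enough to make the first test pass."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def get_by_slug(self, slug):
        return self._entries[slug]


# The tracer bullet: one entry in, the same entry out, through the public API.
def test_can_read_single_entry_by_slug():
    store = EntryStore({"hello-world": {"title": "Hello, world"}})
    assert store.get_by_slug("hello-world")["title"] == "Hello, world"
```

Missing slugs, validation, and persistence are deliberately absent here; each would arrive with its own later test.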