From Spec-Driven Development to Feature Plan Development
Spec-driven development felt like overkill. Vibe coding felt like chaos. Feature plan development is the middle ground I've open-sourced.
Sean Grove's AI Engineer talk about model specs sent me down a rabbit hole.
I started exploring spec-driven development. Looked at Brian Casel's Agent OS, tried BMAD. Good tools, but they didn't click with how I actually work.
In 24 years of contracting, the rhythm has always been the same: I work on tasks. Tasks relate to features. Features relate to epics.
So I started building what I couldn't find. Spent time with James Guest structuring a spec-driven tool. Early mornings, late nights, weekends.
The Pivot Moment
About six weeks into development, I read Birgitta Böckeler's ThoughtWorks article.
"Spec-driven development is like using a sledgehammer to crack a nut."
Ouch. But also... true?
So I stepped back and did some honest reviews with Claude. We realised something uncomfortable: we were building a full-blown spec-driven development tool, but I wasn't even using the requirements functionality. I was only using the task-create/task-work/task-complete slash commands.
The Reality of Day-to-Day Development
Part of the reason for this was that I was working solo, but even within teams, we often work with minimal specs due to time or resource pressures. We might also be working on a brownfield project, adding features and fixing bugs, with specs created long ago that are now out of date.
I've found that quite often the specs are minimal, and we infer what needs to be done from informal conversations.
The Split - GuardKit vs RequireKit
So I split the tool.
GuardKit for guardrails and quality gates.
RequireKit for when you actually need proper requirements.
Both tools work standalone, or you can use them together.
| GuardKit | RequireKit |
|---|---|
| Guardrails & quality gates | Requirements gathering |
| Task workflow commands | EARS notation support |
| Feature Plan command | Epic and Feature spec support |
| Always needed | When the work demands it |
| "How we build" | "What we build" |
I've always liked tools and frameworks that let you use the bits you want to, rather than force you to adopt an all-or-nothing approach.
What GuardKit Does
In short:
- Guardrails for AI output - stop the slop
- Enables parallel task development - the real productivity unlock
- Analyses your codebase - generates CLAUDE.md, rules, templates and sub-agent definitions
The rest of this post digs into how these pieces fit together.
The Feature Plan Workflow
I kept finding myself putting Claude into plan mode, creating tasks from that plan, then reviewing with Opus. I had already built commands for creating tasks, so I added a command for task review to replace the prompts I kept copying and pasting.
The task-review command is used for analysis and has a number of modes:
- architectural: SOLID/DRY/YAGNI compliance review
- code-quality: Maintainability and complexity assessment
- decision: Technical decision analysis with options evaluation
- security: Security audit and vulnerability assessment
The chosen mode determines which primary and secondary agents are used, and the recommendations are generated from their aggregated findings.
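As a rough sketch (how the mode is passed and which agents are selected are illustrative here, not the exact GuardKit behaviour), a security review might route like this:

/task-review TASK-REV-XXXX security
  → primary agent:   security reviewer
  → secondary agent: code-quality reviewer
  → recommendations aggregated from both agents' findings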
The feature-plan command evolved from a workflow I kept falling into: use the task-create command to create a review task, then run the task-review command, which ends by asking you to choose one of the following:
- [A]ccept - Approve findings, save for reference
- [R]evise - Request deeper analysis on specific areas
- [I]mplement - Create implementation tasks
- [C]ancel - Discard this review
Once we are happy with the review analysis, we can create implementation tasks.
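In practice, that manual loop looked roughly like this (the argument forms are illustrative rather than the exact command signatures):

/task-create "Review the pack list feature design"   → review task created in the backlog
/task-review TASK-REV-XXXX                           → findings, then the [A]/[R]/[I]/[C] prompt
choose [I]mplement                                   → implementation tasks created from the findings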
I found I was then generating a README with context and acceptance criteria, and an Implementation Guide, so I created the feature-plan command, which aggregates the task-create and task-review commands internally.
The plan details the dependencies between tasks and breaks the work into waves, identifying which tasks can be implemented in parallel. It also specifies whether to use the task-work command or to ask Claude to execute the task directly.
The feature-plan command uses this workflow to generate the tasks and documentation in a subfolder within the backlog. It also asks a series of clarifying questions to improve the context.
So GuardKit treats features as the unit of planning and tasks as the unit of execution.
Example feature-plan summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ FEATURE IMPLEMENTATION STRUCTURE CREATED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Created: tasks/backlog/pack-list-feature/
├── README.md
├── IMPLEMENTATION-GUIDE.md
├── TASK-PL-001-data-model-repository.md
├── TASK-PL-002-template-management-ui.md
├── TASK-PL-003-meeting-list-ui.md
├── TASK-PL-004-promotion-history.md
└── TASK-PL-005-navigation-integration.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 EXECUTION STRATEGY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Wave 1: Foundation (Sequential)
└── TASK-PL-001: Data model & repository layer (3-4 hrs)
Mode: task-work | Testing: TDD
Wave 2: UI Components (Parallel)
├── TASK-PL-002: Template management UI (3-4 hrs)
│ Workspace: pack-list-wave2-templates
│ Mode: task-work | Testing: TDD
│
└── TASK-PL-003: Meeting pack list UI (3-4 hrs)
Workspace: pack-list-wave2-lists
Mode: task-work | Testing: TDD
Wave 3: Features & Integration (Parallel)
├── TASK-PL-004: Item promotion & history (2-3 hrs)
│ Workspace: pack-list-wave3-features
│ Mode: task-work | Testing: TDD
│
└── TASK-PL-005: Navigation & integration (1-2 hrs)
Workspace: pack-list-wave3-nav
Mode: direct | Testing: TDD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Approach: Hybrid Copy-on-Create with Template References
Total Effort: 12-17 hours
Subtasks: 5
Testing: Full TDD (test-first)
Review Task: TASK-REV-47E0 (completed)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚀 NEXT STEPS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Review the implementation guide:
tasks/backlog/pack-list-feature/IMPLEMENTATION-GUIDE.md
2. Start Wave 1 (foundation must complete first):
/task-work TASK-PL-001
3. After Wave 1, run Wave 2 in parallel with Conductor
4. After Wave 2, run Wave 3 in parallel with Conductor
The Task Workflow Commands
- /task-create - Creates a task from a plan
- /task-work - Works on the task with quality gates
- /task-complete - Completes the task with automatic test execution
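Strung together, a single task's lifecycle looks something like this (the task ID and arguments are illustrative):

/task-create "Add promotion history to the pack list"   → TASK-PL-XXX created in the backlog
/task-work TASK-PL-XXX                                   → plan, review, implement and test (the phases below)
/task-complete TASK-PL-XXX                               → tests run automatically, task marked complete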
The GuardKit Workflow
The full task-work workflow includes several phases, some added as we discovered gaps along the way:
- Phase 1.6: Clarifying Questions (complexity-gated)
- Phase 2: Implementation Planning (Markdown format)
- Phase 2.5: Architectural Review (SOLID/DRY/YAGNI scoring)
- Phase 2.7: Complexity Evaluation (0-10 scale)
- Phase 2.8: Human Checkpoint (if complexity ≥7 or review required)
- Phase 3: Implementation (using appropriate subagents)
- Phase 4: Testing (compilation + coverage)
- Phase 4.5: Test Enforcement Loop (auto-fix, up to 3 attempts)
- Phase 5: Code Review
- Phase 5.5: Plan Audit (scope creep detection)
Note that the crazy phase numbering simply reflects the order in which we discovered each gap as we experimented with ideas.
The Parallel Development Unlock
All of these quality gates add overhead to each task - so what's the point? The real unlock came from combining this with parallel execution.
Working on 2, 3, or 4 things simultaneously in Conductor. That's the "aha" moment. That's where the productivity gains actually lie, provided each task's output is of good quality. Ensuring that quality is the challenge I've been trying to meet.
Adding the various quality gates to the task implementation workflow slows things down, so it doesn't really make sense to use the Claude Code CLI in a pseudo-pair-programmer mode. Still, now that we can implement these tasks in parallel, the time per task matters less, since we can work on something else, such as researching the next feature, instead of sitting and waiting for Claude Code.
There are limits to how much we can parallelise: our own cognitive capacity, the bottleneck of 'feeding the machine' with well-defined work, and, on the other side, the effort of testing, verifying and fixing the output. So it's still hard to say how much we gain in outright productivity.
I can see the potential gains, but I'm still unclear how close we really are to realising them - and even then, whether bottlenecks elsewhere slow the overall delivery of business value.
The screenshot below shows four tasks being implemented in parallel in Conductor:

Codebase Analysis and Templates
GuardKit templates are self-contained packages that provide stack-specific guidance, agents, and configuration for AI-assisted development. The system uses progressive disclosure to optimise context window usage while maintaining comprehensive documentation.
template-name/
├── CLAUDE.md # Primary AI guidance document
├── manifest.json # Template metadata and discovery
├── settings.json # Code style and generation settings
├── README.md # Human-readable documentation
├── agents/ # Stack-specific AI specialists
│ ├── {agent}.md # Core agent (always loaded)
│ └── {agent}-ext.md # Extended reference (on-demand)
├── docs/ # Extended documentation
│ ├── patterns/ # Design patterns (on-demand)
│ └── reference/ # Reference material (on-demand)
└── templates/ # Code templates (optional)
The first step to using GuardKit is to initialise it with a template. Typically that means using one of the five built-in templates, or analysing your codebase with the template-create command, which generates a template with stubs for the identified subagents. You then run the agent-enhance command for each of these agents to generate a complete definition, including a core and an extended file, for progressive disclosure.
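For a brownfield codebase, that sequence looks roughly like this (the agent names are hypothetical and the exact arguments may differ):

/template-create                          → analyses the codebase, generates a template with agent stubs
/agent-enhance api-endpoint-specialist    → expands the stub into core + extended (-ext) files
/agent-enhance ui-component-specialist    → repeat for each identified agent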
GuardKit includes five high-quality templates for learning and evaluation:
Stack-Specific Reference Templates
| Template | Stack | Description |
|---|---|---|
| react-typescript | Frontend | From Bulletproof React (28.5k stars) |
| fastapi-python | Backend API | From FastAPI Best Practices (12k+ stars) |
| nextjs-fullstack | Full-stack | Next.js App Router + production patterns |
Specialized Templates
| Template | Stack | Description |
|---|---|---|
| react-fastapi-monorepo | Full-stack | React + FastAPI monorepo |
Language-Agnostic Template
| Template | Languages | Description |
|---|---|---|
| default | Go, Rust, Ruby, Elixir, PHP, etc. | Language-agnostic foundation |
The /template-init command enables developers to create custom project templates through an interactive Q&A session, without requiring an existing codebase.
This "greenfield" approach guides users through a structured questionnaire covering technology stack choices, architecture patterns, testing strategies, and development practices. Based on the answers, the system generates a complete template package, including AI agents with intelligent behaviour boundaries, discovery metadata for automatic agent selection, and modular configuration files optimised for context-window efficiency.
The command also creates enhancement tasks for each generated agent, providing a clear path to incrementally improve template quality from initial scaffolding to production-ready specifications.
Standing on Shoulders
Jordan Hubbard's post on separating planning from execution reinforced what I was finding - I added an audit step and fix loop based on his insights. Steve Yegge's Beads introduced me to hash-based task IDs (which I'm hoping to integrate soon).
I built this for myself to explore ideas for 'taming the beast' when using Claude Code. Hopefully, it's useful for others too, either by using the tool or for idea exploration.
The Bigger Question
When IS full spec-driven development appropriate, and when is "feature plan development" - this middle ground - actually the better fit?
I'm curious what others have found. An initial stab at answering this is below.
When to Use What
A simple decision framework:
Use Full Spec-Driven Development When:
- Regulated industry (finance, healthcare, etc.)
- Multiple teams need alignment on requirements
- Long-lived product with formal change management
- Client/contract requires documentation
Use Feature Plan Development When:
- Solo dev or small team
- Fast iteration/startup pace
- Requirements are emergent (you're figuring it out as you go)
- Product owner is time-poor (most of us)
Use Vibe Coding When:
- Prototyping/throwaway code
- Learning a new technology
- Personal projects with no stakes
Vibe Coding ←——— Feature Plan Development ———→ Spec-Driven Development
(fast/loose) (middle ground) (rigorous/slow)
References
- Sean Grove - The New Code - The talk that started the rabbit hole
- Birgitta Böckeler's article on SDD - The pivot moment
- Jordan Hubbard's post on AI coding workflows
- Steve Yegge's Beads - Hash-based task IDs
- Brian Casel's Agent OS
- BMAD Method