
Claude Code Experiments: 5 Tests to Optimize Your Workflow

Prompt style showdown, context loading tests, planning mode comparison, and more. Run these 5 Claude Code experiments to stop guessing what works best.


Problem: You use the same prompting patterns every session, and you have no idea if they're optimal or costing you time.

Quick Win: Run this experiment right now to see how prompt style changes Claude's output:

# Experiment A: Direct command
claude -p "Fix the bug in auth.js"
 
# Experiment B: Collaborative framing
claude -p "Let's debug auth.js together. Walk me through what you see first."

Compare the outputs. The collaborative version typically produces more thorough analysis and catches edge cases the direct version misses. That's experimentation in action.

Experiment 1: Prompt Style Showdown

The same task produces different results based on how you frame it. Try these three styles on any debugging task:

# Style 1: Command
claude -p "Fix the login validation bug"
 
# Style 2: Teaching request
claude -p "Explain what's wrong with login validation and show me the fix step by step"
 
# Style 3: Review mode
claude -p "Review the login validation code. What bugs, security issues, or edge cases do you see?"

What you'll discover: Style 3 (review mode) consistently finds more issues because it frames the task as analysis rather than quick-fix execution.

Here's what this looks like in practice. I ran all three styles against a login handler that had an obvious null-check bug plus a subtle timing vulnerability in the rate limiter. Style 1 (command) fixed the null check and moved on. Style 2 (teaching) fixed the null check and explained why it mattered, but still missed the timing issue. Style 3 (review) identified both the null check and the timing vulnerability, and also flagged that the error messages were leaking internal state to the client.

The pattern holds across task types, but the magnitude varies. For security-related code, review mode tends to catch significantly more issues. For straightforward UI bugs, the difference shrinks and command style is fine. For refactoring, teaching mode often produces the cleanest results because it forces Claude to articulate the reasoning behind each change.

Document which style works best for different task types in your projects. After a few sessions, you'll have a personal lookup table: security work gets review mode, simple fixes get command mode, refactors get teaching mode.
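You can script the showdown so each run is saved for side-by-side comparison. Here's a minimal harness sketch: `CLAUDE_CMD` defaults to `echo` so the harness itself can be dry-run without touching your codebase; set it to `claude -p` for real runs.

```shell
# Save each framing's output to its own file for comparison.
# CLAUDE_CMD defaults to `echo` (dry run); use CLAUDE_CMD="claude -p" for real runs.
CLAUDE_CMD=${CLAUDE_CMD:-echo}
task="the login validation code"
mkdir -p prompt-styles
$CLAUDE_CMD "Fix $task" > prompt-styles/command.txt
$CLAUDE_CMD "Explain what's wrong with $task and show me the fix step by step" > prompt-styles/teaching.txt
$CLAUDE_CMD "Review $task. What bugs, security issues, or edge cases do you see?" > prompt-styles/review.txt
wc -l prompt-styles/*.txt
```

Diff the three files after a real run; in my sessions the review output is consistently the longest and the most thorough.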

Experiment 2: File Context Impact

Claude's code quality changes based on what files you provide as context. Test this yourself:

# Experiment A: Single file
claude -p "Refactor UserService.ts for better error handling"
 
# Experiment B: Related files included
claude -p "Refactor UserService.ts for better error handling. Related files: AuthController.ts, types/user.ts, utils/errors.ts"

The second version produces refactors that align with your existing patterns. Without context, Claude invents patterns that may conflict with your codebase.

I tested this on a real project with a custom error hierarchy. In experiment A, Claude created a generic try/catch with new Error() calls. Clean code, but it ignored my existing AppError, ValidationError, and NotFoundError classes entirely. The refactored file wouldn't even compile alongside the rest of the codebase without type mismatches.

In experiment B, Claude discovered my error classes from utils/errors.ts, saw how AuthController.ts already used them, and produced a refactor that matched the existing pattern exactly. Same task, same model, same session. The only difference was three extra file references.

The lesson goes deeper than "provide more files." There's a sweet spot. Including 2-4 closely related files improves output quality consistently. Beyond 6-7 files, you start hitting diminishing returns because Claude spreads attention across too many patterns. Start with the type definition file and one file that already implements the pattern you want.
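One habit that makes the file list automatic: build the prompt from an explicit list variable instead of typing file names ad hoc. A small sketch, reusing the example file names from above:

```shell
# Build the refactor prompt from an explicit related-files list so
# context is never forgotten. File names here are illustrative.
target="UserService.ts"
related="types/user.ts, utils/errors.ts, AuthController.ts"
prompt="Refactor $target for better error handling. Related files: $related"
echo "$prompt"
# real run:  claude -p "$prompt"
```

Start the list with the type definition file, then add one file that already implements the pattern you want matched.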

Experiment 3: Planning Mode vs Direct Execution

Planning mode (Shift+Tab twice) changes how Claude approaches problems. Run this comparison:

# Direct execution
claude -p "Add user role permissions to the dashboard"
 
# Planning first (enter planning mode, then)
claude -p "Add user role permissions to the dashboard"

In planning mode, Claude analyzes your codebase and presents options before touching files. You get to review the approach, catch potential issues, and choose the best path forward.

The difference becomes obvious with multi-file features. When I ran this experiment on a permissions system that touched the auth middleware, the dashboard routes, and the frontend components, direct execution jumped straight into the middleware file and started coding. It made assumptions about the role structure that conflicted with the existing user model, and I had to undo half the changes.

Planning mode, by contrast, surfaced three architecture options before writing any code. One option used the existing role enum. Another suggested a permissions table for finer granularity. The third proposed a hybrid approach. I picked option one, and the implementation aligned with the codebase from the start.

Where the threshold sits: Complex features touching 3+ files almost always benefit from planning mode. Simple fixes (single file, obvious change) don't. The break-even point in my workflow is roughly 2 files. If a task spans 2 files, planning mode adds maybe 30 seconds of overhead but saves backtracking. Below that, it's unnecessary ceremony.
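If you drive Claude Code non-interactively with -p, you can't press Shift+Tab, but you can approximate the plan-then-execute flow by asking for a plan up front. This is a prompt-level workaround of my own, not the real planning mode; `CLAUDE_CMD` defaults to `echo` so the sketch can be dry-run.

```shell
# Approximate plan-then-execute in a scripted run. This is NOT the
# Shift+Tab planning mode, just a prompt-level approximation.
CLAUDE_CMD=${CLAUDE_CMD:-echo}   # replace with: claude -p
$CLAUDE_CMD "Before changing any files, list the files you would touch and \
outline 2-3 implementation options for adding user role permissions to the \
dashboard. Wait for my approval before writing code."
```

You lose the interactive review step, but you still get the architecture options before any file is modified.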

Experiment 4: CLAUDE.md Configurations

Your CLAUDE.md file directly shapes Claude's behavior. Test different configurations:

# Configuration A: Minimal
 
Always run tests after changes.
 
# Configuration B: Detailed
 
Always run tests after changes.
Prefer small, focused commits.
Ask before modifying files outside src/.
Use existing patterns from the codebase.

Run the same task with each configuration and compare:

  • How many clarifying questions does Claude ask?
  • Does it follow your project patterns?
  • Are the commits appropriately scoped?

With Configuration A, Claude treats every directory as fair game. I asked it to add a feature to the dashboard and it modified the database schema, updated the API routes, and changed the deployment config in one shot. No questions asked, one massive commit.

With Configuration B, the same request triggered a clarifying question: "The dashboard feature will need a new API endpoint. Should I create that in src/api/ or modify the existing routes.ts?" The resulting work came as three focused commits, each touching only the files relevant to that layer.
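Commit scoping is easy to measure after the fact with plain git, nothing Claude-specific. The demo below builds a throwaway repo so it runs self-contained; in your own project, run just the `git log | while ...` loop at the end.

```shell
# Self-contained demo: count files touched per commit in a throwaway repo.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email demo@example.com
git config user.name demo
echo a > a.txt; echo b > b.txt; echo c > c.txt
git add a.txt b.txt c.txt && git commit -qm "broad commit: three files"
echo d > d.txt && git add d.txt && git commit -qm "focused commit: one file"

# The actual check: how many files did each recent commit touch?
git log --format='%h %s' -n 5 | while IFS=' ' read -r hash subject; do
  files=$(git show --name-only --format= "$hash" | grep -c .)
  printf '%s files  %s\n' "$files" "$subject"
done
```

Small per-commit file counts after a session are a decent proxy for "appropriately scoped."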

Your CLAUDE.md isn't documentation. It's an operating system. The instructions you put there shape every decision Claude makes. Experiment with different instructions and measure the output. Small changes in your CLAUDE.md produce surprisingly large changes in behavior. Check the rules directory guide for even more granular control over Claude's behavior per file type.

Experiment 5: Context Window Pressure

This one surprised me. Claude's output quality degrades under context pressure in ways that aren't immediately obvious. Run this test:

Start a fresh session and ask Claude to implement a moderately complex feature (something touching 3-4 files). Note the quality. Then, in an existing session that's been running for a while with accumulated context, ask for the exact same feature.

The fresh session version will typically produce cleaner code with better error handling. The loaded session version tends to cut corners: shorter variable names, fewer edge case checks, sometimes skipping tests entirely. It's not that Claude "forgets" how to write good code. It's that the model allocates attention across all the accumulated context, leaving less capacity for the current task.

The practical takeaway: use /compact aggressively before complex tasks. I now run /compact before any feature that touches more than two files, and the quality difference is noticeable. Some developers prefer /clear for a completely fresh start, but you lose conversation history. The /compact command strikes the right balance by summarizing previous context without discarding it.
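You can even encode the "more than two files" rule as a crude pre-flight check. The function name and the file-extension regex below are my own inventions, not part of any tool; it just counts distinct file names mentioned in a task description.

```shell
# Crude heuristic for the "compact before multi-file work" rule:
# count distinct file names in a task description. Illustrative only.
suggest_compact() {
  n=$(echo "$1" | grep -oE '[A-Za-z0-9_/.-]+\.(ts|js|py|go|rs)' | sort -u | wc -l | tr -d ' ')
  if [ "$n" -gt 2 ]; then
    echo "$n files mentioned - run /compact first"
  else
    echo "$n files mentioned - no compaction needed"
  fi
}

suggest_compact "Refactor UserService.ts, AuthController.ts and utils/errors.ts"
suggest_compact "Fix the null check in auth.js"
```

It's a rough proxy (tasks can touch files they don't name), but it makes the habit mechanical.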

Building Your Experiment Log

Create a simple tracking file to capture what you learn:

# Experiments Log
 
## 2025-01-15: Prompt Framing
 
- **Test**: "Fix X" vs "Review X for issues"
- **Result**: Review framing found 3 additional edge cases
- **Action**: Use review framing for any security-related code
 
## 2025-01-16: Context Loading
 
- **Test**: Single file vs multi-file context
- **Result**: Multi-file context matched existing error patterns
- **Action**: Always include related type files for refactors
 
## 2025-01-20: Planning Mode Threshold
 
- **Test**: Direct execution vs planning on 2-file and 4-file tasks
- **Result**: Planning mode saved 15 min on 4-file task, added 2 min overhead on 2-file task
- **Action**: Use planning mode when task touches 3+ files
 
## 2025-01-22: Context Pressure
 
- **Test**: Same feature request in fresh session vs 80% context session
- **Result**: Fresh session produced noticeably better test coverage, caught 2 edge cases
- **Action**: Run /compact before any multi-file implementation
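To keep entries in that exact format, a tiny shell helper can append them for you. The function name, argument order, and `LOG_FILE` variable are all illustrative, not part of any tool:

```shell
# Hypothetical helper: append a dated, consistently formatted entry
# to the experiments log (defaults to experiments.md).
log_experiment() {
  log=${LOG_FILE:-experiments.md}
  printf '\n## %s: %s\n\n- **Test**: %s\n- **Result**: %s\n- **Action**: %s\n' \
    "$(date +%F)" "$1" "$2" "$3" "$4" >> "$log"
}

log_experiment "Prompt Framing" \
  '"Fix X" vs "Review X for issues"' \
  "Review framing found 3 additional edge cases" \
  "Use review framing for security-related code"
cat "${LOG_FILE:-experiments.md}"
```

A one-line command after each experiment is cheap enough that the log actually gets kept.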

This log becomes your personal optimization guide. After a month, you'll have a dataset of what actually works in your specific codebase with your specific patterns. That's worth more than any generic advice, because the optimal approach varies by project, language, and even the type of work you're doing.

The key is being specific. "Review mode is better" isn't useful. "Review mode catches 2-3x more security issues in auth code" is something you can act on every time.

Next Experiments

Ready to test more? Try these:

  1. Context management: What happens at 60% vs 90% context usage?
  2. Auto-planning: Does automated planning outperform manual mode switching?
  3. Model selection: How do different models handle the same complex task?
  4. Deep thinking: Does extended thinking produce better architecture decisions?

Stop guessing what works. Run the experiment. Document the result. Iterate.

Every session is a chance to discover something that makes you faster. The developers who experiment systematically outperform those who stick with their first approach. ClaudeFast's Code Kit encodes this self-improvement loop directly into its CLAUDE.md, prompting Claude to surface reusable patterns and suggest skill updates whenever a workaround or new approach is discovered during a session.

This experimentation approach is what we call the "system evolution mindset." It's one of 5 Claude Code best practices that separate top developers from everyone else.
