Roi Gomez

How I built an AI agent with Claude Code in 20 minutes

2025-05-11

Claude Code is not just a glorified autocomplete. It’s an agent runtime with full access to your filesystem, terminal, and the ability to run any command. When you understand that, the possibilities shift.

The starting point

I wanted a simple agent: monitor a src/ directory, detect changes in .ts files, run the affected tests, and report the result. No external watchers. No CI. Just Claude doing the work.

The initial prompt was this direct:

claude "Monitor the src/ directory. When you detect changes in .ts files,
run the related tests with npx jest and report the result with a summary
of what passed and what failed."

Claude Code has Bash and Read tools built in. It knows how to run commands and read files. The agent worked on the first try — not because the prompt was perfect, but because the task matched the available tools.

Adding persistence with CLAUDE.md

The problem with the previous agent is it loses context between sessions. The solution is a CLAUDE.md in the project root — Claude Code reads it automatically on startup.

# Project: Users API

## Stack
- TypeScript + Node.js
- Jest for testing
- PostgreSQL with Drizzle ORM

## Test conventions
- Tests live in `src/__tests__/`
- Filename: `[feature].test.ts`
- Run only the tests for the affected module: `npx jest [filename]`

## Agent rules
- If a test fails, check the error in logs before editing code
- Do not modify files outside `src/`

With this persistent context, the agent has the instructions it needs to operate correctly even across new sessions.

The result

The agent took about 3 minutes to set up and ran reliably for weeks. The key wasn’t prompt complexity — it was clarity: concrete task, known tools, explicit constraints.

What surprised me was Claude’s ability to chain actions: detect the changed file → identify which tests to run → run the tests → parse the output → report only failures with context. All of that without additional code on my end.
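One link in that chain, going from a changed file to the test file to run, can be sketched in Python. This is only an illustration of the convention from the CLAUDE.md above (tests in `src/__tests__/`, named `[feature].test.ts`), not what Claude does internally. The function name `related_test_file` is made up for this example.

```python
from pathlib import PurePosixPath
from typing import Optional


def related_test_file(changed: str) -> Optional[str]:
    """Map a changed .ts file to the test file the convention expects."""
    path = PurePosixPath(changed)
    if path.suffix != ".ts" or path.parts[:1] != ("src",):
        return None  # not a TypeScript file under src/
    if path.name.endswith(".test.ts"):
        return str(path)  # a test file itself changed: run it directly
    # Assumed convention: src/__tests__/[feature].test.ts
    return f"src/__tests__/{path.stem}.test.ts"
```

The returned path is what would be handed to `npx jest`. Files outside `src/` return `None`, which matches the "do not modify files outside `src/`" rule.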

If you have a task that can be described in natural language and executed with terminal commands, you can probably automate it with Claude Code in less time than you think.

CLAUDE.md: best practices for configuring your agent

2025-05-06

CLAUDE.md is the configuration file that Claude Code reads automatically when starting in a project. It’s your way of telling the agent how to behave, what conventions to follow, and what context it needs to be useful from the first message.

What goes in CLAUDE.md and what doesn’t

CLAUDE.md is not technical documentation for the project. It’s a briefing for the agent. Include information Claude needs to operate well but can’t infer from reading the code.

Include:

Tech stack and specific versions
Project code conventions
Development, test, and deploy commands
Restrictions and files it shouldn’t touch
Business or domain context that isn’t obvious

Don’t include:

Function documentation (that goes in the code)
Change history
Documentation for human users
Things Claude can infer from reading the code

Recommended structure

# Project

[2-3 sentences describing what the project does and for whom]

## Stack

- Runtime: Node.js 20
- Framework: Next.js 14 App Router
- Database: PostgreSQL via Drizzle ORM
- Tests: Vitest + Testing Library

## Commands

```bash
npm run dev          # dev server at :3000
npm run test         # tests in watch mode
npm run test:ci      # tests without watch (for CI)
npm run db:push      # sync schema with DB
```

## Conventions

- Components in PascalCase, hooks with `use` prefix
- Absolute imports from `@/` (alias configured in tsconfig)
- No `any` in TypeScript; use `unknown` if you need to
- Tests next to the file they test, not in a separate folder

## Restrictions

- Do not modify `src/migrations/`; migrations are auto-generated
- Do not install dependencies without confirming first
- `src/lib/vendor/` is third-party code, don't touch it


The most common mistake: vague instructions

Vague instructions don't help. "Write clean code" or "follow best practices" tells the agent nothing useful. Be specific about what clean means in your project.

Instead of:

Use good security practices.


Write:

Input validation: use Zod on all API endpoints. Never build SQL queries with string concatenation — always use prepared parameters. API keys go in environment variables, never hardcoded.


CLAUDE.md in subdirectories

Claude Code reads all CLAUDE.md files in the directory hierarchy. You can have a root one with global config and additional files in subdirectories for specific context:

project/
├── CLAUDE.md          # global config
├── frontend/
│   └── CLAUDE.md      # frontend-specific conventions
└── backend/
    └── CLAUDE.md      # backend conventions


The agent combines all relevant contexts based on the directory it's working in.
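To make the idea of "combining contexts" concrete, here is a rough Python sketch that walks from the project root down to the working directory and collects every CLAUDE.md it finds, most general first. It illustrates the lookup idea only; it is not Claude Code's actual implementation, and `collect_claude_md` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def collect_claude_md(project_root: Path, working_dir: Path) -> List[Path]:
    """Return CLAUDE.md files from the project root down to working_dir."""
    root = project_root.resolve()
    target = working_dir.resolve()
    relative = target.relative_to(root)  # raises ValueError if outside the root
    found = []
    directory = root
    # Visit the root first, then each directory on the way to the target
    for part in ("",) + relative.parts:
        if part:
            directory = directory / part
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

With the tree above, working in `backend/` would yield the root file followed by `backend/CLAUDE.md`, so the more specific conventions come last.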

Update CLAUDE.md when the project changes

CLAUDE.md is a living document. If you change the stack, add a new convention, or notice the agent consistently doing something you don't want, update the file. It's more efficient than correcting it in every session.

A well-maintained CLAUDE.md is the difference between an agent that needs constant guidance and one that operates well from the start.

Prompt caching: how to cut Claude API costs by 80%

2025-05-05

There’s one line of code that cut my Claude API bill by 78% in a single month. It’s called cache_control. Most people using the Anthropic API aren’t using it. That’s a mistake.

What prompt caching is

When you call the Claude API, you pay for every token the model processes — both prompt tokens and response tokens. If you have a 2000-token system prompt you send with every request, you’re paying for those 2000 tokens every time.

Prompt caching changes that. You mark parts of the prompt as cacheable, and Anthropic stores them in their infrastructure for 5 minutes. Subsequent requests that use that cached portion cost roughly 90% less for that segment.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert TypeScript assistant...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Explain generics"}]
)

The cache_control field with type: "ephemeral" tells Anthropic: “cache everything up to this point.”

When it makes sense to use it

Caching is most effective when you have:

Long, stable system prompts. A system prompt that defines agent behavior, includes documentation, or provides extensive context. If your system prompt is over 1024 tokens (the minimum for caching on Sonnet) and doesn’t change between requests, cache it.

Reference documents or code. If you’re doing Q&A over a document, code analysis, or any task requiring fixed context, that context should be cached.

Long conversations. Previous messages in a conversation can be cached to avoid reprocessing the history.

Caching does NOT help if your prompts change constantly or are short (less than 1024 tokens for Sonnet — that’s the minimum for the cache to activate).

The real numbers

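In a project where I used Claude to analyze code in a repo, I had a ~3000-token system prompt that included project conventions. Assuming a Sonnet input price of $3 per million tokens, that is about $0.009 per request for the system prompt alone. With caching, the first request costs slightly more, because cache writes are billed at 1.25x the base input price (about $0.011). Every cache hit after that costs about $0.0009.

For 500 requests per day, that goes from about $4.50/day to roughly $0.46/day for the system prompt alone, provided the requests arrive close enough together to keep the cache warm. The write premium pays for itself on the first cache hit.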
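The arithmetic behind an estimate like this can be sketched in a few lines of Python. The base price and the write/read multipliers below are assumptions I plugged in for illustration; check current pricing for your model before relying on them. The function name is hypothetical.

```python
def system_prompt_cost(
    prompt_tokens: int,
    requests: int,
    price_per_mtok: float = 3.00,    # assumed base input price, USD per million tokens
    write_multiplier: float = 1.25,  # assumed premium for writing to the cache
    read_multiplier: float = 0.10,   # assumed discount for reading from the cache
) -> dict:
    """Compare the cost of resending a system prompt with and without caching.

    Assumes one cache write followed by a cache hit on every later request,
    i.e. requests arrive less than 5 minutes apart.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    base = prompt_tokens / 1_000_000 * price_per_mtok
    uncached = base * requests
    cached = base * write_multiplier + base * read_multiplier * (requests - 1)
    return {"uncached": uncached, "cached": cached}
```

Note that for a single request caching is more expensive than not caching, because of the write premium. The savings only appear once the cached prefix is reused.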

One important detail

The cache lasts 5 minutes, and every cache hit resets that window. If more than 5 minutes pass between requests that share the cached prefix, the next request pays the cache-write price and recreates the cache. This matters for intermittent workloads.

To verify the cache is working, check the API response — the usage field includes cache_creation_input_tokens and cache_read_input_tokens. If cache_read_input_tokens > 0, the cache is active.
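That check is easy to wrap in a small helper. A minimal sketch, assuming `usage` is the `response.usage` object from the SDK. The helper name and the `SimpleNamespace` stand-in are mine, used so the example runs without an API call.

```python
from types import SimpleNamespace


def cache_status(usage) -> str:
    """Classify a usage object as a cache hit, a cache write, or neither."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    if read > 0:
        return "hit"
    if created > 0:
        return "write"
    return "none"


# Stand-in for response.usage from a real API call
example_usage = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=3000)
print(cache_status(example_usage))  # prints "hit"
```

Logging this per request makes it obvious when an intermittent workload keeps falling out of the 5-minute window.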

Implementation takes under 10 minutes. There’s no reason not to do it.

Build your first MCP server in TypeScript

2025-04-28

MCP (Model Context Protocol) is the mechanism that lets Claude Code connect to external tools. An MCP server exposes functions that Claude can invoke, similar to how an LLM invokes function calls in the OpenAI API, but with a standardized protocol that works with any MCP client.

What an MCP server actually is

An MCP server is a process that exposes a set of tools via a JSON-RPC protocol. Claude Code connects to it at startup and can invoke those tools during the session.

The most useful use cases: connect Claude to your database, expose internal APIs, give it access to services it doesn’t have natively (Slack, Jira, your own CMS).

Initial setup

mkdir my-mcp-server && cd my-mcp-server
npm init -y
npm install @modelcontextprotocol/sdk
npm install -D typescript @types/node ts-node

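tsconfig.json (the server code uses ES module imports and top-level await, so the module setting has to be an ES module mode; also add "type": "module" to package.json so Node runs the compiled output as ESM):

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "dist",
    "strict": true
  }
}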

The minimal server

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "my-tools", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "get_timestamp",
      description: "Returns the current timestamp in ISO format",
      inputSchema: { type: "object", properties: {}, required: [] },
    },
  ],
}));

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "get_timestamp") {
    return {
      content: [{ type: "text", text: new Date().toISOString() }],
    };
  }
  throw new Error(`Tool not found: ${request.params.name}`);
});

const transport = new StdioServerTransport();
await server.connect(transport);
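Under the SDK, the exchange is plain JSON-RPC 2.0. The Python sketch below mimics what a tools-only server answers for `tools/list` and `tools/call`, to show the shape of the messages. It is a simplification under my own assumptions: it skips the initialization handshake and the stdio framing that the SDK handles, and `handle` is a hypothetical function, not part of any MCP library.

```python
import json
from datetime import datetime, timezone

TOOLS = [{
    "name": "get_timestamp",
    "description": "Returns the current timestamp in ISO format",
    "inputSchema": {"type": "object", "properties": {}},
}]


def _reply(request_id, **body) -> str:
    """Serialize a JSON-RPC 2.0 response with either `result` or `error`."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, **body})


def handle(raw: str) -> str:
    """Answer one JSON-RPC request for the two tool methods."""
    request = json.loads(raw)
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params") or {}
    if method == "tools/list":
        return _reply(request_id, result={"tools": TOOLS})
    if method == "tools/call":
        if params.get("name") != "get_timestamp":
            return _reply(request_id, error={"code": -32602, "message": "Unknown tool"})
        now = datetime.now(timezone.utc).isoformat()
        return _reply(request_id, result={"content": [{"type": "text", "text": now}]})
    return _reply(request_id, error={"code": -32601, "message": "Method not found"})
```

The two branches line up with the two request handlers in the TypeScript server: one lists the tools, the other runs one by name.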

Connecting it to Claude Code

In a .mcp.json file at the project root:

{
  "mcpServers": {
    "my-tools": {
      "command": "node",
      "args": ["./dist/index.js"]
    }
  }
}

After npx tsc && claude, Claude Code has access to your get_timestamp tool. From there, add whatever tools you need: database queries, API calls, reading config files.

The MCP SDK handles all the protocol. Your job is just to define the tools and their logic.

Extended thinking in Claude: when and how to use deep reasoning

2025-04-22

Extended thinking is the mode where Claude “thinks out loud” before giving a response. Instead of jumping straight to an answer, Claude works through the problem internally and you can see that process. For complex tasks, the quality difference is significant.

How it works internally

When you enable extended thinking, Claude generates a thinking block before the text block of the response. This block contains the internal reasoning: hypotheses, checks, self-corrections. It’s not decorative — it’s the real process that improves the final response.

Thinking tokens are billed as regular output tokens. What differs is how they are handled in context: thinking blocks from previous turns are stripped from the conversation and don’t count toward the context window on later turns.

Enabling extended thinking in the API

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Design a microservices architecture for an e-commerce platform with 50k concurrent users."
    }]
)

for block in response.content:
    if block.type == "thinking":
        print("REASONING:", block.thinking)
    elif block.type == "text":
        print("RESPONSE:", block.text)

The budget_tokens parameter caps how many tokens the thinking can use. It must be at least 1,024 and less than max_tokens. A larger budget allows more reasoning, which tends to help on hard problems, at higher cost and latency.

When to use extended thinking

Worth enabling for:

Complex software architecture problems
Code analysis with multiple dependencies
Mathematical or logical reasoning
Decisions with many trade-offs
Tasks where the first attempt is usually wrong

Not needed for:

Creative text generation
Translations
Structured data extraction
Questions with direct answers

Recommended budget tokens by use case

Task	Suggested budget
Simple code analysis	2,000
Architecture design	8,000
Hard math problems	10,000
Complex systems analysis	15,000+

The model stops when it reaches a satisfactory answer, not when it exhausts the budget. If you see thinking cutting off before concluding, increase the budget.
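The table can be turned into a small helper that builds the request arguments. A minimal sketch: the task keys, the `answer_tokens` default, and the function name are my own choices, and the two constraints it enforces (a budget of at least 1,024 tokens, and max_tokens larger than the budget) reflect my understanding of the API rather than anything stated above.

```python
from typing import Optional

# Suggested budgets from the table above (keys are hypothetical labels)
SUGGESTED_BUDGETS = {
    "simple_code_analysis": 2_000,
    "architecture_design": 8_000,
    "hard_math": 10_000,
    "complex_systems_analysis": 15_000,
}


def thinking_params(task: str, answer_tokens: int = 4_000, budget: Optional[int] = None) -> dict:
    """Build the `thinking` and `max_tokens` arguments for a request."""
    if budget is None:
        budget = SUGGESTED_BUDGETS[task]
    if budget < 1_024:
        raise ValueError("budget_tokens must be at least 1,024")
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget},
        # max_tokens has to exceed the budget, so leave room for the answer
        "max_tokens": budget + answer_tokens,
    }
```

Usage would look like `client.messages.create(model=..., messages=..., **thinking_params("architecture_design"))`.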

Streaming with extended thinking

For production, use streaming to avoid blocking while Claude thinks:

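with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "..."}]
) as stream:
    for event in stream:
        if event.type != "content_block_delta":
            continue
        # Each delta carries either a piece of reasoning or a piece of the answer
        if event.delta.type == "thinking_delta":
            print(event.delta.thinking, end="", flush=True)
        elif event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)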

Extended thinking is one of Claude’s most underused capabilities. For problems where the first response isn’t enough, it’s the change that has the most impact on quality.

Agent orchestration: patterns that scale

2025-04-20

A single agent working alone is easy. A system where multiple agents coordinate work is where most people get lost. In my experience, three patterns cover most cases.

The problem with scaling agents

A single agent with access to all tools and all context collapses quickly. The context window fills up, instructions mix together, and errors propagate uncontrolled. The solution isn’t bigger prompts — it’s smaller, well-coordinated systems.

Pattern 1: Orchestrator + Workers

The orchestrator receives the high-level task, breaks it down, and delegates to specialized workers. Each worker has a limited scope and specific tools.

Orchestrator: "Analyze this PR and write the report"
  → Worker A: reviews code changes (tools: Read, Bash/git)
  → Worker B: verifies tests (tools: Bash/npm test)
  → Worker C: drafts report (tools: Write)

The orchestrator receives the outputs and assembles them. No worker needs to know what the others are doing.
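Stripped of the model calls, the pattern is small. Here is a minimal Python sketch in which each worker is a plain function standing in for a separate agent call. The names `orchestrate` and `Worker` are hypothetical, and for brevity every worker receives the same task string instead of a tailored subtask.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]


def orchestrate(task: str, workers: Dict[str, Worker]) -> str:
    """Send the task to each worker and assemble the labelled outputs."""
    sections = []
    for name, worker in workers.items():
        output = worker(task)  # each worker sees only the task, never its peers
        sections.append(f"## {name}\n{output}")
    return "\n\n".join(sections)


# Stub workers; in a real system each would be its own model call
workers = {
    "Code changes": lambda task: "2 files changed",
    "Tests": lambda task: "14 passed, 0 failed",
}
report = orchestrate("Analyze this PR and write the report", workers)
```

Because workers only exchange data through the orchestrator, replacing one of them does not affect the others.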

Pattern 2: Sequential pipeline

For processes where each step depends on the previous one. Data flows from agent to agent, each transforming the previous agent’s output.

This pattern is simple but powerful for ETL processes, content pipelines, or any workflow where order matters. The risk is that an error in one step blocks the entire pipeline — you need to design handoffs with explicit validation.
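"Handoffs with explicit validation" can be as simple as a check between steps. A sketch, assuming each step is a function from text to text paired with a validator. `run_pipeline` and `HandoffError` are names invented for this example.

```python
from typing import Callable, List, Tuple

# (step name, transform, validator for the transform's output)
Step = Tuple[str, Callable[[str], str], Callable[[str], bool]]


class HandoffError(Exception):
    """Raised when a step's output fails validation before the handoff."""


def run_pipeline(data: str, steps: List[Step]) -> str:
    """Run the steps in order, validating each output before passing it on."""
    for name, transform, is_valid in steps:
        data = transform(data)
        if not is_valid(data):
            # Fail at the step that broke, not three steps later
            raise HandoffError(f"step '{name}' produced invalid output")
    return data
```

The benefit is that a bad intermediate result is reported with the name of the step that produced it, instead of surfacing as a confusing failure downstream.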

Pattern 3: Parallel fan-out

When you have independent tasks that can run simultaneously. The orchestrator launches them in parallel and waits for all results before continuing.

The implementation with the Claude API is straightforward: multiple async calls with asyncio.gather() in Python or Promise.all() in JS. Each agent’s context is independent, so there’s no interference risk.
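A minimal fan-out with `asyncio.gather()` looks like this. The `run_agent` coroutine is a stand-in I made up so the example runs without network access; in practice it would wrap an async API call.

```python
import asyncio
from typing import List


async def run_agent(name: str, task: str) -> str:
    """Stand-in for an async model call; replace with a real API request."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"{name}: done ({task})"


async def fan_out(tasks: List[str]) -> List[str]:
    """Launch one agent per task concurrently and wait for all of them."""
    coros = [run_agent(f"agent-{i}", task) for i, task in enumerate(tasks)]
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*coros)
```

Calling `asyncio.run(fan_out(["lint", "test", "docs"]))` returns the three results in the same order as the input list, regardless of which agent finished first.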

The practical rule

Start with one agent. When you notice the context window filling up or the agent mixing distinct concerns, that’s when to separate. Each split should have a concrete reason — no premature abstractions.

The 5 prompting techniques that actually matter

2025-04-14

There are dozens of “prompting techniques” floating around the internet. Most are noise. These five are what I use in production with measurable impact.

1. Specific role + context, not generic

“You are an expert in X” is too vague. A useful role includes specific domain context and the constraints that apply.

Instead of: "You are a TypeScript expert", try: "You are a senior engineer reviewing TypeScript code for production. The codebase uses Node.js 20, strict mode, and Drizzle ORM. You prioritize security over brevity." The difference in output quality is noticeable.

2. Explicit output format

If you don’t say how you want the output, Claude will choose. Sometimes it chooses well; often it doesn’t. Specify: type (list, JSON, code, markdown), approximate length, and what to include/exclude.

"Respond with a JSON array. Each object has: { error: string, severity: 'low'|'medium'|'high', fix: string }. No additional explanations."

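An explicit format also means you can check the reply mechanically. A minimal Python sketch of a validator for the JSON format in the prompt above; `parse_findings` is a hypothetical helper name.

```python
import json
from typing import List

REQUIRED_KEYS = {"error", "severity", "fix"}
SEVERITIES = {"low", "medium", "high"}


def parse_findings(raw: str) -> List[dict]:
    """Parse the model's reply and reject anything that is off-format."""
    findings = json.loads(raw)  # raises ValueError if the reply is not JSON
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array")
    for item in findings:
        if not isinstance(item, dict) or set(item) != REQUIRED_KEYS:
            raise ValueError(f"unexpected keys: {item!r}")
        if item["severity"] not in SEVERITIES:
            raise ValueError(f"unknown severity: {item['severity']!r}")
    return findings
```

In a pipeline, a `ValueError` here is the signal to retry or to fall back, rather than passing a malformed reply downstream.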
3. Chain-of-thought for complex tasks

For complex reasoning, explicitly asking for the process improves output quality. "Think step by step before answering" prompts the model to lay out its reasoning before committing to an answer. It’s especially useful in debugging and code analysis where the first impulse is often wrong.

4. Few-shot examples for exact format

When format matters (and it almost always does), one example is worth a thousand description words. Show input → expected output before giving the real task. With two or three examples, Claude replicates the pattern with high fidelity.
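One common way to pass few-shot examples through a chat API is as alternating user and assistant messages, with the real task last. A small sketch; the helper name is hypothetical.

```python
from typing import Dict, List, Tuple


def few_shot_messages(examples: List[Tuple[str, str]], task: str) -> List[Dict[str, str]]:
    """Turn (input, expected output) pairs into alternating chat messages."""
    messages = []
    for example_input, expected_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": expected_output})
    messages.append({"role": "user", "content": task})  # the real task goes last
    return messages
```

The resulting list can be passed as the `messages` argument of a request. Putting the examples inline in a single prompt also works; this form just keeps them separate from the task.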

5. Explicit negative constraints

“Don’t do X” is as important as “do Y”. Negative constraints prevent default behaviors you don’t want: "Don't explain the code you generate", "Don't use comments", "Don't add unnecessary error handling".

Without explicit constraints, Claude tends to be “helpful” by adding things you didn’t ask for. In an automated pipeline that’s noise, not value.

My Claude Code workflow after 3 months

2025-04-08

I’ve been using Claude Code as my primary tool for three months. Not as an occasional assistant — as part of the daily workflow. Here’s what I learned.

What changed from the first month

The first month I used it like a powerful autocomplete. I’d ask for code, review it, adjust it. Reasonable productivity but not spectacular.

The change came when I started working with tasks, not questions. Instead of “write a function that does X,” I started describing the full objective: “implement the /users/:id endpoint with permission validation, standard error handling, and tests.” The difference is substantial — Claude can maintain the context of a complete task much better than fragmented questions.

What I kept: CLAUDE.md as memory

The CLAUDE.md file in the project root is the most valuable thing I adopted. It includes:

Exact project stack (versions, libraries)
Code conventions (naming, folder structure)
Domain-specific rules (“no any in TypeScript”, “errors go in { code, message } format”)
Business context Claude can’t infer from code

Without this, every session started from scratch. With it, Claude has the context it needs to make correct decisions without asking me every time.

What I discovered: hooks are powerful

Claude Code hooks (pre-tool, post-tool) let you automate actions around tools. I use them to: log which files get modified, run the linter automatically after each Write, and alert if it tries to modify files outside the allowed scope.

This turns Claude Code from “agent that executes what you ask” to “agent with automatic guardrails.”

The real limit: context window

After three months, the main limit isn’t response quality — it’s the context window. In large projects, the context fills up and quality drops. The solution isn’t to force more context, it’s to work in shorter sessions with more specific tasks. One session = one well-defined task.

Hooks in Claude Code: automate guardrails with shell commands

2025-04-08

Hooks are one of Claude Code’s least-known and most powerful features. They let you run shell commands automatically before or after Claude uses any tool. Guardrails without friction.

What hooks are and what they’re for

A hook is a shell command that Claude Code runs in response to agent events. Available events:

PreToolUse — before Claude uses a tool
PostToolUse — after Claude uses a tool
Notification — when Claude wants to notify you of something
Stop — when Claude finishes a task

Real use cases: run the linter after each edit, take a git snapshot before destructive changes, block edits on protected files, send Slack notifications when the agent finishes.

Configuring hooks in settings.json

Hooks are configured in ~/.claude/settings.json (global) or .claude/settings.json (project):

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "f=$(jq -r '.tool_input.file_path // empty'); cd \"$CLAUDE_PROJECT_DIR\" && [ -n \"$f\" ] && npx eslint --fix \"$f\" 2>/dev/null || true"
          }
        ]
      }
    ]
  }
}

The matcher is a regex matching the tool name. Write|Edit captures both editing tools.

Available hook input

Claude Code passes event details to each hook as JSON on stdin and sets the project directory as an environment variable. The command above uses jq to pull the file path out of that JSON.

$CLAUDE_PROJECT_DIR (environment variable): project root directory
tool_input.file_path (stdin JSON): file affected by a Write or Edit call
tool_name (stdin JSON): name of the tool that ran
session_id (stdin JSON): current session ID

Blocking protected files

You can use the exit code to block actions. If the hook returns exit code 2, Claude Code cancels the action and shows stderr as an error message:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "f=$(jq -r '.tool_input.file_path // empty'); if echo \"$f\" | grep -Eq 'production\\.env|\\.secrets'; then echo 'Protected file: edit blocked' >&2; exit 2; fi"
          }
        ]
      }
    ]
  }
}
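Once the check grows beyond a one-liner, a small script is easier to maintain than inline shell. A sketch in Python, assuming the hook receives the event as JSON on stdin with the target path under `tool_input.file_path`. The protected names and function names are illustrative.

```python
import json
import sys

# Illustrative list; adjust to the files you want to protect
PROTECTED = ("production.env", ".secrets")


def is_protected(file_path: str) -> bool:
    """True if the path contains any protected name."""
    return any(name in file_path for name in PROTECTED)


def decide(raw_event: str) -> int:
    """Return the exit code for a PreToolUse event given as a JSON string."""
    event = json.loads(raw_event)
    file_path = (event.get("tool_input") or {}).get("file_path", "")
    if is_protected(file_path):
        print(f"Protected file: edit blocked ({file_path})", file=sys.stderr)
        return 2  # exit code 2 cancels the action
    return 0
```

Saved as a script, the entry point would be `sys.exit(decide(sys.stdin.read()))`, and the hook's command would simply invoke that script.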

Auto-commit hook after each task

A useful pattern: auto-commit when Claude finishes, to have a granular history of each agent change:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "cd \"$CLAUDE_PROJECT_DIR\" && git add -A && (git diff --cached --quiet || git commit -m 'chore: claude agent checkpoint')"
          }
        ]
      }
    ]
  }
}

Hooks turn Claude Code into an auditable, controlled agent. With a few shell commands you define exactly what it can and cannot do in your project.

Tool use in Claude API: practical guide with real examples

2025-03-25

Tool use (function calling) is the capability that transforms Claude from a text generator into an agent that can interact with the real world. With tools, Claude can query APIs, read databases, run calculations, or call any function you define.

How the tool use cycle works

The flow is a 3-step cycle:

You send Claude a list of available tools with their descriptions and schemas
Claude decides whether it needs a tool and returns a tool_use block with the parameters
You execute the tool, return the result, and Claude generates the final response

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Gets the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather like in London?"}]
)

Processing the tool use response

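When Claude wants to call a tool, the response comes back with stop_reason set to "tool_use" and its content includes one or more tool_use blocks. You need to handle this case: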

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    tool_name = tool_block.name
    tool_input = tool_block.input

    # Execute your real function here
    result = call_your_function(tool_name, tool_input)

    # Return the result to Claude
    final_response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's the weather like in London?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [
                {"type": "tool_result", "tool_use_id": tool_block.id, "content": str(result)}
            ]}
        ]
    )
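The snippet above handles only the first tool_use block, but a single response can contain several. A sketch of a dispatcher that answers all of them, assuming a `registry` dict that maps tool names to your own Python functions. Both `registry` and `build_tool_results` are hypothetical names.

```python
from typing import Callable, Dict, List


def build_tool_results(content: list, registry: Dict[str, Callable[..., object]]) -> List[dict]:
    """Run every tool_use block and return the matching tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # skip text blocks
        handler = registry.get(block.name)
        if handler is None:
            # Report the problem back to the model instead of crashing
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Unknown tool: {block.name}", "is_error": True})
            continue
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": str(handler(**block.input))})
    return results
```

The returned list goes in as the `content` of the follow-up user message, in place of the single hand-built tool_result.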

Multiple tools and tool choice

You can define several tools and Claude will choose the appropriate one. With tool_choice you can let Claude decide (auto, the default), require it to use some tool (any), or force one specific tool (tool).

# Force a specific tool
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "get_weather"},
    messages=[...]
)

When to use tool use vs plain prompt

Tool use adds latency and complexity. Use it when you need real-time external data, executing actions with side effects, or when the result depends on information not in context. For pure reasoning or text generation, a direct prompt is faster and cheaper.