How Claude Code + Nia Achieved 38% Higher Pass Rate on Next.js Benchmarks

We ran benchmarks using Vercel’s public Next.js evaluation suite to test AI agent competency. Claude Code paired with Nia achieved an 80% pass rate compared to standalone Claude Code’s 58% — a significant improvement driven by grounding AI responses in official, up-to-date documentation before writing code.

[Figure: Next.js evaluation results]

The Problem: AI Hallucination in Framework Code

Modern web frameworks like Next.js evolve rapidly. The App Router, Server Components, Server Actions, the 'use cache' directive, intercepting routes — these features have specific patterns and APIs that change between versions. When AI coding agents rely solely on training data, they often:

  1. Use outdated patterns — e.g., getServerSideProps instead of Server Components (see the contrast sketched below)
  2. Hallucinate APIs — inventing parameters that don’t exist
  3. Miss framework-specific conventions — like requiring Suspense boundaries for useSearchParams
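
A minimal sketch of the first failure mode (the endpoint and data shape are placeholders, not part of the benchmark): an ungrounded model often reaches for the Pages Router’s getServerSideProps, while the idiomatic App Router equivalent is an async Server Component that fetches data directly.

// Outdated (Pages Router): getServerSideProps only works under pages/
export async function getServerSideProps() {
  const res = await fetch('https://api.example.com/products');
  return { props: { products: await res.json() } };
}

// Current (App Router): an async Server Component, e.g. app/products/page.tsx
export default async function ProductsPage() {
  const res = await fetch('https://api.example.com/products');
  const products: { id: string; name: string }[] = await res.json();

  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>{product.name}</li>
      ))}
    </ul>
  );
}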

The hypothesis: What if we forced the AI to consult official documentation before writing code?


The Experiment Setup

The Benchmark: Vercel’s Next.js Evals

We used a modified version of vercel/next-evals-oss — a rigorous benchmark consisting of 50 real-world coding tasks.

Category             Examples
-------------------  -----------------------------------------------------
Core Next.js         Server Components, Client Components, Route Handlers
App Router Patterns  Parallel Routes, Intercepting Routes, Route Groups
Data & Caching       Server Actions, 'use cache' directive, Revalidation
Best Practices       Prefer next/link, next/image, next/font
AI SDK Integration   generateText, useChat, tool calling, embeddings

Each eval consists of:

  • A Next.js project with failing tests
  • A natural language prompt describing the task
  • Validation via build + lint + test (binary pass/fail)

The Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     EVALUATION PIPELINE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐     ┌─────────────┐     ┌──────────────┐     │
│   │  prompt.md  │────▶│ Claude Code │────▶│ File Changes │     │
│   └─────────────┘     └──────┬──────┘     └──────────────┘     │
│                              │                    │             │
│                              │ (with Nia)         │             │
│                              ▼                    ▼             │
│                       ┌──────────────┐    ┌──────────────┐     │
│                       │  Nia Search  │    │   Validate   │     │
│                       │  • Next.js   │    │   • Build    │     │
│                       │  • AI SDK    │    │   • Lint     │     │
│                       └──────────────┘    │   • Test     │     │
│                                           └──────────────┘     │
│                                                  │              │
│                                                  ▼              │
│                                           ┌──────────────┐     │
│                                           │  Pass / Fail │     │
│                                           └──────────────┘     │
└─────────────────────────────────────────────────────────────────┘

How Nia Integration Works

trynia.ai is a knowledge agent that provides AI models with access to up-to-date documentation through the Model Context Protocol (MCP). Here’s how we integrated it:

1. Pre-indexed Documentation Sources

# Pre-indexed documentation IDs
NIA_NEXTJS_DOCS_ID=a7e29fbd-213b-478a-9679-62a05023cffa  # Next.js docs
NIA_AISDK_DOCS_ID=4f80d3ee-e9e5-41f4-8593-faa63af93dd0   # AI SDK docs

2. MCP Server Configuration

A pre-eval hook dynamically configures the Nia MCP server:

# nia-mcp-pre.sh
cat > "$OUTPUT_DIR/.mcp.json" <<EOF
{
  "mcpServers": {
    "nia": {
      "type": "stdio",
      "command": "pipx",
      "args": ["run", "--no-cache", "nia-mcp-server"],
      "env": {
        "NIA_API_KEY": "$NIA_API_KEY",
        "NIA_API_URL": "$NIA_API_URL"
      }
    }
  }
}
EOF

3. Enhanced Prompt Strategy

When --with-nia is enabled, the prompt is enhanced to require documentation search before implementation:

enhancedPrompt = `IMPORTANT: Before writing ANY code, you MUST search the
relevant documentation using Nia MCP tools.

## Available Documentation Sources:

1. **Next.js Documentation** (ID: ${niaNextjsDocsId})
   - Use for: App Router, Server Components, Server Actions, etc.

2. **AI SDK Documentation** (ID: ${niaAisdkDocsId})
   - Use for: generateText, useChat, tool calling, embeddings, etc.

## Required Documentation Search Steps:

1. Analyze the task to determine which documentation source(s) you need
2. Search for relevant documentation using 'search' or 'nia_grep'
3. Read relevant pages using 'nia_read'
4. ONLY THEN implement the solution following best practices from the docs

---

## Your Task:

${originalPrompt}`;

The Results

Overall Performance

Configuration           Passed  Failed  Pass Rate
----------------------  ------  ------  ---------
Claude Code + Nia       40      10      80%
Claude Code (Baseline)  29      21      58%

Improvement: +11 evals passed, a 38% relative gain

Detailed Breakdown: What Improved?

The evals that passed with Nia but failed without it reveal exactly where documentation grounding helps:

Eval                            Task                           Why Documentation Helped
------------------------------  -----------------------------  ----------------------------------------------------
014-server-routing              Server-appropriate navigation  Docs clarify when to use redirect() vs useRouter()
017-use-search-params           useSearchParams with Suspense  Docs specify the required Suspense boundary pattern
029-use-cache-directive         'use cache' with cache tags    New API requires exact syntax from documentation
032-ai-sdk-model-specification  AI SDK model initialization    API format requires current documentation
040-intercepting-routes         Intercepting routes (.)/path   Complex convention requires docs reference
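
To make the 017-use-search-params row concrete, this is the documented pattern: useSearchParams only works in a Client Component, and Next.js expects a Suspense boundary around that component so the rest of the page can stay statically rendered. File names below are illustrative.

// app/search/search-results.tsx — Client Component that reads the query string
'use client';

import { useSearchParams } from 'next/navigation';

export function SearchResults() {
  const searchParams = useSearchParams();
  return <p>Showing results for: {searchParams.get('q')}</p>;
}

// app/search/page.tsx — Server Component page wrapping it in Suspense
import { Suspense } from 'react';
import { SearchResults } from './search-results';

export default function SearchPage() {
  return (
    <Suspense fallback={<p>Loading results…</p>}>
      <SearchResults />
    </Suspense>
  );
}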

Example: The ‘use cache’ Directive

This eval asks Claude to implement caching with selective invalidation:

Implement efficient data caching with selective invalidation:

  • Create a component that fetches data using the 'use cache' directive
  • Include cache tags for selective invalidation
  • Add a form with server action that invalidates the cache

Without documentation: Claude might use outdated caching patterns or guess at the API.

With Nia: Claude first searches for 'use cache' in the Next.js docs, reads the current API specification, then implements it correctly:

'use cache'
import { cacheTag } from 'next/cache';

async function ProductList() {
  cacheTag('products');
  const products = await getAllProducts();
  // ...
}
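
The same eval also asks for a form whose server action invalidates the cache. A minimal sketch of that half, assuming the 'products' tag from above (revalidateTag is the documented Next.js API; the file name and form markup are illustrative):

// app/actions.ts
'use server';

import { revalidateTag } from 'next/cache';

export async function refreshProducts() {
  // Invalidates every cached entry tagged 'products', so the next request
  // re-runs the cached ProductList data fetch.
  revalidateTag('products');
}

// Used from a form in a component:
// <form action={refreshProducts}><button type="submit">Refresh</button></form>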

Why This Matters

1. Reduced Hallucination

By grounding responses in official documentation, the AI is less likely to invent APIs or use deprecated patterns. This is particularly important for:

  • Rapidly evolving frameworks — Next.js, React
  • New features — Server Components, the RSC payload, Turbopack
  • Complex conventions — file-system routing, intercepting routes
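
For example, the intercepting-routes convention behind eval 040 is pure file-system syntax that models frequently misremember. A simplified sketch of the documented pattern (the photo/modal naming is illustrative):

app/
  layout.tsx                     // renders {children} alongside the @modal slot
  @modal/(.)photo/[id]/page.tsx  // (.) intercepts /photo/[id] during client-side navigation (e.g. as a modal)
  photo/[id]/page.tsx            // the full page on a direct visit or reload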

2. Version Awareness

Documentation is indexed with version information. When Next.js 15 introduces breaking changes, the AI consults Next.js 15 docs — not training data from Next.js 13.

3. Deterministic Accuracy

Instead of relying on probabilistic recall from its training data, the AI performs a deterministic lookup. For questions like “what’s the exact API for cache invalidation?”, documentation search provides the authoritative answer.


The Technical Implementation

Claude Code Runner Architecture

The evaluation framework spawns Claude Code as a subprocess with MCP configuration:

import path from 'node:path';
import { existsSync } from 'node:fs';
import { spawn } from 'node:child_process';

class ClaudeCodeRunner {
  private async executeClaudeCode(projectDir: string, prompt: string) {
    // Check for the MCP config written by the pre-eval hook (nia-mcp-pre.sh)
    const mcpConfigPath = path.join(projectDir, '.mcp.json');
    const mcpConfigExists = existsSync(mcpConfigPath);

    // prompt is the (optionally Nia-enhanced) prompt from the previous step
    const args = [
      ...(mcpConfigExists ? ['--mcp-config', mcpConfigPath] : []),
      '--print',
      '--dangerously-skip-permissions',
      prompt
    ];

    const claudeProcess = spawn('claude', args, {
      cwd: projectDir,
      env: process.env
    });
  }
}

Evaluation Scoring

Each eval is scored based on three binary checks:

const result = await this.runEvaluation(outputDir);

// Score calculation: each check is binary
const buildScore = result.buildSuccess ? 1.0 : 0.0;
const lintScore = result.lintSuccess ? 1.0 : 0.0;
const testScore = result.testSuccess ? 1.0 : 0.0;
const overallScore = buildScore * lintScore * testScore;  // All must pass
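
Aggregating the 50 binary results yields the headline numbers. A sketch of the arithmetic (not the repo’s actual reporting code):

// An eval counts as passed only when build, lint, and test all succeed.
function passRate(results: { overallScore: number }[]): number {
  const passed = results.filter((r) => r.overallScore === 1.0).length;
  return passed / results.length;
}

// With Nia:  40 passed of 50 -> 0.80
// Baseline:  29 passed of 50 -> 0.58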

Nia Tools Available to Claude

Tool         Purpose
-----------  ---------------------------------------------
search       Semantic search across indexed documentation
nia_explore  Browse documentation structure (tree/ls)
nia_grep     Regex search in documentation content
nia_read     Read specific documentation pages

Running the Benchmark Yourself

# Clone and setup
git clone https://github.com/nozomio/next-evals-oss
cd next-evals-oss
pnpm install

# Set environment variables
export ANTHROPIC_API_KEY="your-anthropic-key"
export NIA_API_KEY="your-nia-key"
export NIA_NEXTJS_DOCS_ID="a7e29fbd-213b-478a-9679-62a05023cffa"
export NIA_AISDK_DOCS_ID="4f80d3ee-e9e5-41f4-8593-faa63af93dd0"

# Run baseline (Claude Code only)
bun cli.ts --all --claude-code

# Run with Nia
bun cli.ts --all --claude-code --with-nia --with-hooks nia-mcp

Conclusion

The results demonstrate a clear pattern: AI coding agents perform measurably better when grounded in official documentation.

The 38% relative improvement (58% → 80%) isn’t just about passing more tests — it’s about:

  1. Reliability — Using documented APIs rather than hallucinated ones
  2. Maintainability — Code that follows current best practices
  3. Confidence — Developers can trust the generated code more

As frameworks continue to evolve faster than model training cycles, documentation grounding becomes not just helpful, but essential.


Built with Claude Code and Nia. Benchmark based on vercel/next-evals-oss.