How Claude Code + Nia Achieved 38% Higher Pass Rate on Next.js Benchmarks

We ran benchmarks using Vercel’s public Next.js evaluation suite to test AI agent competency. Claude Code paired with Nia achieved an 80% pass rate compared to standalone Claude Code’s 58% — a significant improvement driven by grounding AI responses in official, up-to-date documentation before writing code.

[Figure: Next.js evaluation results]

The Problem: AI Hallucination in Framework Code

Modern web frameworks like Next.js evolve rapidly. The App Router, Server Components, Server Actions, the 'use cache' directive, intercepting routes — these features have specific patterns and APIs that change between versions. When AI coding agents rely solely on training data, they often:

  1. Use outdated patterns — e.g., getServerSideProps instead of Server Components (see the contrast sketched below)
  2. Hallucinate APIs — inventing parameters that don’t exist
  3. Miss framework-specific conventions — like requiring Suspense boundaries for useSearchParams
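
A minimal sketch of the first failure mode (the endpoint and data shape are placeholders, not part of the benchmark): an ungrounded model often reaches for the Pages Router’s getServerSideProps, while the idiomatic App Router equivalent is an async Server Component that fetches data directly.

// Outdated (Pages Router): getServerSideProps only works under pages/
export async function getServerSideProps() {
  const res = await fetch('https://api.example.com/products');
  return { props: { products: await res.json() } };
}

// Current (App Router): an async Server Component, e.g. app/products/page.tsx
export default async function ProductsPage() {
  const res = await fetch('https://api.example.com/products');
  const products: { id: string; name: string }[] = await res.json();

  return (
    <ul>
      {products.map((product) => (
        <li key={product.id}>{product.name}</li>
      ))}
    </ul>
  );
}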

The hypothesis: What if we forced the AI to consult official documentation before writing code?


The Experiment Setup

The Benchmark: Vercel’s Next.js Evals

We used a modified version of vercel/next-evals-oss — a rigorous benchmark consisting of 50 real-world coding tasks.

Category             Examples
-------------------  -----------------------------------------------------
Core Next.js         Server Components, Client Components, Route Handlers
App Router Patterns  Parallel Routes, Intercepting Routes, Route Groups
Data & Caching       Server Actions, 'use cache' directive, Revalidation
Best Practices       Prefer next/link, next/image, next/font
AI SDK Integration   generateText, useChat, tool calling, embeddings

Each eval consists of:

  • A Next.js project with failing tests
  • A natural language prompt describing the task
  • Validation via build + lint + test (binary pass/fail)

The Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     EVALUATION PIPELINE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐     ┌─────────────┐     ┌──────────────┐     │
│   │  prompt.md  │────▶│ Claude Code │────▶│ File Changes │     │
│   └─────────────┘     └──────┬──────┘     └──────────────┘     │
│                              │                    │             │
│                              │ (with Nia)         │             │
│                              ▼                    ▼             │
│                       ┌──────────────┐    ┌──────────────┐     │
│                       │  Nia Search  │    │   Validate   │     │
│                       │  • Next.js   │    │   • Build    │     │
│                       │  • AI SDK    │    │   • Lint     │     │
│                       └──────────────┘    │   • Test     │     │
│                                           └──────────────┘     │
│                                                  │              │
│                                                  ▼              │
│                                           ┌──────────────┐     │
│                                           │  Pass / Fail │     │
│                                           └──────────────┘     │
└─────────────────────────────────────────────────────────────────┘

How Nia Integration Works

trynia.ai is a knowledge agent that provides AI models with access to up-to-date documentation through the Model Context Protocol (MCP). Here’s how we integrated it:

1. Pre-indexed Documentation Sources

# Pre-indexed documentation IDs
NIA_NEXTJS_DOCS_ID=a7e29fbd-213b-478a-9679-62a05023cffa  # Next.js docs
NIA_AISDK_DOCS_ID=4f80d3ee-e9e5-41f4-8593-faa63af93dd0   # AI SDK docs

2. MCP Server Configuration

A pre-eval hook dynamically configures the Nia MCP server:

# nia-mcp-pre.sh
cat > "$OUTPUT_DIR/.mcp.json" <<EOF
{
  "mcpServers": {
    "nia": {
      "type": "stdio",
      "command": "pipx",
      "args": ["run", "--no-cache", "nia-mcp-server"],
      "env": {
        "NIA_API_KEY": "$NIA_API_KEY",
        "NIA_API_URL": "$NIA_API_URL"
      }
    }
  }
}
EOF

3. Enhanced Prompt Strategy

When --with-nia is enabled, the prompt is enhanced to require documentation search before implementation:

enhancedPrompt = `IMPORTANT: Before writing ANY code, you MUST search the
relevant documentation using Nia MCP tools.

## Available Documentation Sources:

1. **Next.js Documentation** (ID: ${niaNextjsDocsId})
   - Use for: App Router, Server Components, Server Actions, etc.

2. **AI SDK Documentation** (ID: ${niaAisdkDocsId})
   - Use for: generateText, useChat, tool calling, embeddings, etc.

## Required Documentation Search Steps:

1. Analyze the task to determine which documentation source(s) you need
2. Search for relevant documentation using 'search' or 'nia_grep'
3. Read relevant pages using 'nia_read'
4. ONLY THEN implement the solution following best practices from the docs

---

## Your Task:

${originalPrompt}`;

The Results

Overall Performance

Configuration           Passed  Failed  Pass Rate
----------------------  ------  ------  ---------
Claude Code + Nia       40      10      80%
Claude Code (Baseline)  29      21      58%

Improvement: +11 evals passed, a 38% relative gain

Detailed Breakdown: What Improved?

The evals that passed with Nia but failed without it reveal exactly where documentation grounding helps:

Eval                            Task                           Why Documentation Helped
------------------------------  -----------------------------  ----------------------------------------------------
014-server-routing              Server-appropriate navigation  Docs clarify when to use redirect() vs useRouter()
017-use-search-params           useSearchParams with Suspense  Docs specify the required Suspense boundary pattern
029-use-cache-directive         'use cache' with cache tags    New API requires exact syntax from documentation
032-ai-sdk-model-specification  AI SDK model initialization    API format requires current documentation
040-intercepting-routes         Intercepting routes (.)/path   Complex convention requires docs reference
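
To make the 017-use-search-params row concrete, this is the documented pattern: useSearchParams only works in a Client Component, and Next.js expects a Suspense boundary around that component so the rest of the page can stay statically rendered. File names below are illustrative.

// app/search/search-results.tsx — Client Component that reads the query string
'use client';

import { useSearchParams } from 'next/navigation';

export function SearchResults() {
  const searchParams = useSearchParams();
  return <p>Showing results for: {searchParams.get('q')}</p>;
}

// app/search/page.tsx — Server Component page wrapping it in Suspense
import { Suspense } from 'react';
import { SearchResults } from './search-results';

export default function SearchPage() {
  return (
    <Suspense fallback={<p>Loading results…</p>}>
      <SearchResults />
    </Suspense>
  );
}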

Example: The ‘use cache’ Directive

This eval asks Claude to implement caching with selective invalidation:

Implement efficient data caching with selective invalidation:

  • Create a component that fetches data using the 'use cache' directive
  • Include cache tags for selective invalidation
  • Add a form with server action that invalidates the cache

Without documentation: Claude might use outdated caching patterns or guess at the API.

With Nia: Claude first searches for 'use cache' in the Next.js docs, reads the current API specification, then implements it correctly:

'use cache'
import { cacheTag } from 'next/cache';

async function ProductList() {
  cacheTag('products');
  const products = await getAllProducts();
  // ...
}
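
The same eval also asks for a form whose server action invalidates the cache. A minimal sketch of that half, assuming the 'products' tag from above (revalidateTag is the documented Next.js API; the file name and form markup are illustrative):

// app/actions.ts
'use server';

import { revalidateTag } from 'next/cache';

export async function refreshProducts() {
  // Invalidates every cached entry tagged 'products', so the next request
  // re-runs the cached ProductList data fetch.
  revalidateTag('products');
}

// Used from a form in a component:
// <form action={refreshProducts}><button type="submit">Refresh</button></form>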

Why This Matters

1. Reduced Hallucination

By grounding responses in official documentation, the AI is less likely to invent APIs or use deprecated patterns. This is particularly important for:

  • Rapidly evolving frameworks — Next.js, React
  • New features — Server Components, the RSC payload, Turbopack
  • Complex conventions — file-system routing, intercepting routes
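
For example, the intercepting-routes convention behind eval 040 is pure file-system syntax that models frequently misremember. A simplified sketch of the documented pattern (the photo/modal naming is illustrative):

app/
  layout.tsx                     // renders {children} alongside the @modal slot
  @modal/(.)photo/[id]/page.tsx  // (.) intercepts /photo/[id] during client-side navigation (e.g. as a modal)
  photo/[id]/page.tsx            // the full page on a direct visit or reload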

2. Version Awareness

Documentation is indexed with version information. When Next.js 15 introduces breaking changes, the AI consults Next.js 15 docs — not training data from Next.js 13.

3. Deterministic Accuracy

Instead of relying on probabilistic recall from its training data, the AI performs a deterministic lookup. For questions like “what’s the exact API for cache invalidation?”, documentation search provides the authoritative answer.


The Technical Implementation

Claude Code Runner Architecture

The evaluation framework spawns Claude Code as a subprocess with MCP configuration:

import path from 'node:path';
import { existsSync } from 'node:fs';
import { spawn } from 'node:child_process';

class ClaudeCodeRunner {
  private async executeClaudeCode(projectDir: string, prompt: string) {
    // Check for the MCP config written by the pre-eval hook (nia-mcp-pre.sh)
    const mcpConfigPath = path.join(projectDir, '.mcp.json');
    const mcpConfigExists = existsSync(mcpConfigPath);

    // prompt is the (optionally Nia-enhanced) prompt from the previous step
    const args = [
      ...(mcpConfigExists ? ['--mcp-config', mcpConfigPath] : []),
      '--print',
      '--dangerously-skip-permissions',
      prompt
    ];

    const claudeProcess = spawn('claude', args, {
      cwd: projectDir,
      env: process.env
    });
  }
}

Evaluation Scoring

Each eval is scored based on three binary checks:

const result = await this.runEvaluation(outputDir);

// Score calculation: each check is binary
const buildScore = result.buildSuccess ? 1.0 : 0.0;
const lintScore = result.lintSuccess ? 1.0 : 0.0;
const testScore = result.testSuccess ? 1.0 : 0.0;
const overallScore = buildScore * lintScore * testScore;  // All must pass
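
Aggregating the 50 binary results yields the headline numbers. A sketch of the arithmetic (not the repo’s actual reporting code):

// An eval counts as passed only when build, lint, and test all succeed.
function passRate(results: { overallScore: number }[]): number {
  const passed = results.filter((r) => r.overallScore === 1.0).length;
  return passed / results.length;
}

// With Nia:  40 passed of 50 -> 0.80
// Baseline:  29 passed of 50 -> 0.58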

Nia Tools Available to Claude

Tool         Purpose
-----------  ---------------------------------------------
search       Semantic search across indexed documentation
nia_explore  Browse documentation structure (tree/ls)
nia_grep     Regex search in documentation content
nia_read     Read specific documentation pages

Running the Benchmark Yourself

# Clone and setup
git clone https://github.com/nozomio/next-evals-oss
cd next-evals-oss
pnpm install

# Set environment variables
export ANTHROPIC_API_KEY="your-anthropic-key"
export NIA_API_KEY="your-nia-key"
export NIA_NEXTJS_DOCS_ID="a7e29fbd-213b-478a-9679-62a05023cffa"
export NIA_AISDK_DOCS_ID="4f80d3ee-e9e5-41f4-8593-faa63af93dd0"

# Run baseline (Claude Code only)
bun cli.ts --all --claude-code

# Run with Nia
bun cli.ts --all --claude-code --with-nia --with-hooks nia-mcp

Conclusion

The results demonstrate a clear pattern: AI coding agents perform measurably better when grounded in official documentation.

The 38% relative improvement (58% → 80%) isn’t just about passing more tests — it’s about:

  1. Reliability — Using documented APIs rather than hallucinated ones
  2. Maintainability — Code that follows current best practices
  3. Confidence — Developers can trust the generated code more

As frameworks continue to evolve faster than model training cycles, documentation grounding becomes not just helpful, but essential.


Built with Claude Code and Nia. Benchmark based on vercel/next-evals-oss.