Backtest a Pine strategy from Claude Code in 90 seconds
Walk-through of the @pineforge/codegen-mcp server: install in one npx command, ask Claude to transpile your Pine, run a Docker backtest, and read the trade list back. Your OHLCV never leaves the machine.
The PineForge codegen has lived behind a curl for the last few months. As of this week,
it also lives behind a Model Context Protocol server, which means you can drive a backtest
from inside any MCP-aware client — Claude Desktop, Claude Code, Cursor, Continue.dev, and
the growing list of others.
This is a walkthrough of what that looks like end-to-end. Total wall-clock time from "empty repo" to "I have a JSON backtest report": about 90 seconds, plus the one-time Docker pull.
What the server actually does
The MCP server (`@pineforge/codegen-mcp` on npm) is a thin local stdio bridge. It exposes
four tools to your AI client:
| Tool | Runs on | Cost |
|---|---|---|
| `transpile_pine` | Hosted codegen API | counts against your quota (refunded on compile error) |
| `get_quota` | Hosted codegen API | free |
| `backtest_pine` | Your local Docker daemon | counts as 1 transpile (for the codegen call inside) |
| `pull_engine_image` | Your local Docker daemon | free |
The privacy surface is the bit worth pausing on. The Pine source travels to the hosted codegen at codegen.pineforge.dev. The OHLCV CSV does not — `backtest_pine` runs the resulting `strategy.cpp` against the file you point it at via your local Docker daemon, with the file mounted into the container as a read-only volume. No network access from the runtime container. Your data stays on your laptop.
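To make that concrete, the container invocation behind `backtest_pine` amounts to something like the sketch below. Treat it as an illustration of the isolation guarantees rather than the package's exact command line: the in-container mount paths are assumptions, and the image name is the one the report's `_meta` block shows later in this post.

```sh
# Sketch of the assumed docker run (verify against the package source):
#   --rm            throw the container away afterwards
#   --network none  the runtime container gets no network access
#   :ro             strategy and OHLCV are mounted read-only
docker run --rm --network none \
  -v "$PWD/strategy.cpp:/work/strategy.cpp:ro" \
  -v "$PWD/eth_15m.csv:/work/ohlcv.csv:ro" \
  ghcr.io/fullpass-4pass/pineforge-engine:latest
```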
Install (one command)
```sh
npx -y @pineforge/codegen-mcp
```
That's it. The package downloads on first run, compiles its TypeScript on the fly, and starts speaking MCP over stdio. No global install required. No build step.
You'll need three things present:
- Node ≥ 20 (most recent macOS/Linux distros qualify)
- A Docker daemon running locally (the `backtest_pine` tool exec's `docker run`)
- A PineForge API key — `pf_…` — that you got via the pineforge.dev waitlist
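If you want to sanity-check the first two before wiring anything into a client:

```sh
node --version   # should print v20.x or newer
docker info      # fails fast if the daemon isn't running
```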
Wire it into Claude Desktop
Open the config file:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
Add an entry under `mcpServers`:

```json
{
  "mcpServers": {
    "pineforge-codegen": {
      "command": "npx",
      "args": ["-y", "@pineforge/codegen-mcp"],
      "env": {
        "PINEFORGE_API_KEY": "pf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}
```
Restart Claude Desktop. The server will be listed under "Available tools" in the sidebar. The first tool call will hit the hosted codegen and run `docker pull` for the runtime image — a one-time ~250 MB download.
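If you'd rather not sit through that download mid-conversation, warm the cache up front, either by asking the model to call `pull_engine_image` or by pulling manually (image name as reported in the backtest `_meta` block below):

```sh
docker pull ghcr.io/fullpass-4pass/pineforge-engine:latest
```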
Wire it into Claude Code (CLI)
If you're already in a terminal session, this is shorter:
```sh
claude mcp add pineforge-codegen \
  --transport stdio \
  --env PINEFORGE_API_KEY=pf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
  -- npx -y @pineforge/codegen-mcp
```
Now `claude` knows about the server for this project. Re-launch the session and the four tools become available.
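To confirm the registration took, the Claude Code CLI can list what it knows about (if your version lays the subcommands out differently, `claude mcp --help` will show the local equivalent):

```sh
claude mcp list   # pineforge-codegen should appear in the output
```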
Cursor
Settings → MCP → New MCP Server → paste the same JSON block as the Claude Desktop config above. Cursor handles the lifecycle.
Drive it with prose
Once the server is wired in, the interaction model is just English prompts. A realistic exchange (slightly compressed):
**You:** I have a Pine v6 strategy in `./strategy.pine` and 7 days of 15-minute ETHUSDT data in `./eth_15m.csv`. Backtest it and tell me the Sharpe and max drawdown.

**Claude:** Running `backtest_pine` on those files…

*[tool call: `backtest_pine({ source: "...", ohlcv_csv_path: "./eth_15m.csv" })` returns in ~3 seconds]*

Backtest complete. 49 trades over the 7-day window. Net PnL −$190.85, Sharpe 0.21, max drawdown $312. The strategy enters short on the 3rd losing trade and never recovers — would you like me to look at what triggered that entry, or sweep the entry threshold?
This is the productive surface. The model has the tool descriptions, knows what `backtest_pine` returns, and can chain the next call without you typing the curl yourself. The friction of "edit Pine → save → switch to TradingView → reload chart → read numbers → switch back to editor" collapses into a single conversation.
What the report looks like
`backtest_pine` returns the same JSON shape the standalone Docker image produces. The summary block:

```json
{
  "engine": "pineforge",
  "summary": {
    "total_trades": 49,
    "net_pnl": -190.85,
    "max_drawdown": 312.0,
    "sharpe": 0.21,
    "profit_factor": 0.78,
    "win_rate": 0.43
  },
  "trades": [
    /* 49 entries with timestamps, prices, PnL */
  ],
  "elapsed_seconds": 0.0042,
  "_meta": {
    "strategy_cpp_bytes": 5079,
    "image": "ghcr.io/fullpass-4pass/pineforge-engine:latest"
  }
}
```
The model gets the full structure. It can answer follow-ups like "what was the worst trade?" by scanning `trades[]` itself, no second tool call needed.
Quota awareness
Every conversation that uses the codegen API counts against your quota. The `get_quota` tool exists so the model can check before re-running an expensive parameter sweep. The free tier is 100 transpiles per month — plenty for hobby work and CI smoke tests, less so when you're driving an iterative optimization loop.

A useful pattern: in your project's `CLAUDE.md` (or equivalent), add a hint like:

> If asked to optimize a strategy, call `get_quota` first and report remaining budget before kicking off >5 transpile calls.
What's deliberately NOT in the server
- No live order placement. The server is read-only against your strategy and data; it doesn't connect to a broker. Live trading is a separate concern.
- No data fetching. You bring the OHLCV. The server doesn't pull from exchanges or feed providers — you control where the bytes come from and what gets backtested.
- No state between calls. Every `backtest_pine` run is a fresh Docker container. No persisted cache, no shared session.
These are deliberate scope limits. They keep the surface auditable and the failure modes simple.
Where this becomes interesting
A few patterns this enables that are awkward to do by hand:
- "Tighten the stop and re-run." The model edits one number in the Pine
source, calls
backtest_pineagain, compares Sharpe. Loop until you're tired of suggesting variations. - "Try this on the last 30 days instead of the last 7." You point at a different CSV, the model re-runs. No filter UI to navigate.
- "What's the parameter sensitivity here?" Sweep one input across a range, collect Sharpe per value, print a small markdown table. Model handles the iteration.
- "Compare against this other strategy I have." Two
backtest_pinecalls, side-by-side report.
None of these require new infrastructure. They emerge from the model + four tools + your prose.
Try it
- Get a free codegen API key via the pineforge.dev waitlist (key arrives by email)
- Full /ai setup page (Claude Desktop / Claude Code / Cursor configs)
- The npm package: `@pineforge/codegen-mcp`
The package is open source. The server itself is small enough to read in one sitting if you want to audit what gets sent over the wire.
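A quick way to do that audit without reading code is to speak MCP to the server by hand. The sketch below assumes the standard MCP stdio framing (newline-delimited JSON-RPC, with the protocol version string current as of this writing); it is a debugging aid, not a supported interface.

```sh
# Handshake, then list the four tools; responses print to stdout.
(
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"audit","version":"0.0.0"}}}'
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
) | PINEFORGE_API_KEY=pf_… npx -y @pineforge/codegen-mcp
```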