CI / GitHub Actions

Run PineScript backtests in CI (GitHub Actions, GitLab, anything Docker)

Every commit to your strategy repo triggers a backtest: parity is asserted, and regressions fail the build. The same Docker image you run locally executes in your runner.

Why parity in CI

Strategies drift like any code. A parameter rename, a refactor of a helper function, a Pine v6 API change you didn't notice — any of these can silently shift signals without breaking any existing test. Without a parity gate in CI, a commit you made two months ago can change last week's trade list and you'll only find out when the live P&L looks wrong.

The discipline of committing your reference trade list alongside your strategy source is the same discipline that makes software reliable: you describe the expected output, and your build fails if the actual output diverges. For a backtest, the expected output is the trade list — entry bar, exit bar, direction, size, fill price — against a pinned historical dataset.

When that comparison runs on every commit, you get a ratchet: your strategy's historical behavior can only improve, never accidentally regress. You know exactly which commit changed the output, because the build failed on that commit. You have a git blame for your equity curve.

This matters more than most quants realize at first. The gap between "I tested this strategy" and "I have a reproducible, version-controlled record of every historical signal this strategy has ever produced" is enormous. The first is a screenshot. The second is an audit trail. CI is how you maintain the second without manual effort on every push.

PineForge makes this possible for Pine strategies because the runtime is a Docker image: it runs everywhere Docker runs, it produces deterministic output given the same inputs, and it returns a stable JSON schema that's safe to parse in shell scripts. There's no browser, no authentication flow, no rate limit on the backtest execution itself.
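
You can sanity-check all three properties from a local shell before wiring up CI. Assuming you already have a transpiled strategy.so (the codegen step below produces one), a smoke test is just the container plus jq:

docker run --rm \
  -v "$PWD/strategy.so":/strategy.so \
  -v "$PWD/data/ohlcv.csv":/data.csv \
  ghcr.io/pineforge/runtime:latest \
  run /strategy.so --data /data.csv --output json \
  | jq '.summary.net_pnl, .summary.total_trades'

Run it twice against the same CSV: the output should be identical both times. That determinism is what makes the comparison in the workflow below meaningful.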

GitHub Actions example

The full workflow: check out your strategy, transpile to a shared object via the codegen API, run the backtest against your pinned OHLCV CSV, parse the JSON report with jq, compare against your committed baseline, and fail the build if the net PnL delta exceeds your threshold.

Add your PineForge API key as a repository secret named PINEFORGE_API_KEY, then create .github/workflows/backtest.yml:

name: Backtest parity

on:
  push:
    branches: [main, "feat/**"]
  pull_request:

jobs:
  backtest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Transpile strategy to .so
        run: |
          # jq -Rs emits an already-quoted JSON string, so it drops straight in as the value
          curl -fsS https://api.pineforge.io/v1/codegen \
            -H "Authorization: Bearer ${{ secrets.PINEFORGE_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{"source": '"$(jq -Rs . strategy.pine)"'}' \
            | jq -r '.artifact_url' \
            | xargs curl -sL -o strategy.so

      - name: Pull PineForge runtime
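        # Tip: pin a specific runtime version tag instead of latest so reruns are reproducible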
        run: docker pull ghcr.io/pineforge/runtime:latest

      - name: Run backtest
        run: |
          docker run --rm \
            -v "$PWD/strategy.so":/strategy.so \
            -v "$PWD/data/ohlcv.csv":/data.csv \
            ghcr.io/pineforge/runtime:latest \
            run /strategy.so --data /data.csv --output json \
            > report.json

      - name: Assert parity
        run: |
          ACTUAL=$(jq '.summary.net_pnl' report.json)
          BASELINE=$(jq '.summary.net_pnl' baseline/report.json)
          DELTA=$(awk -v a="$ACTUAL" -v b="$BASELINE" 'BEGIN{d=a-b; if(d<0)d=-d; print d}')
          THRESHOLD="0.01"
          if awk "BEGIN{exit !($DELTA > $THRESHOLD)}"; then
            echo "Parity drift: net_pnl delta $DELTA exceeds threshold $THRESHOLD"
            diff <(jq '.trades' baseline/report.json) \
                 <(jq '.trades' report.json)
            exit 1
          fi
          echo "Parity OK — delta $DELTA (threshold $THRESHOLD)"

Store your baseline report at baseline/report.json in the repository, committed alongside the strategy source. When a PR changes strategy behavior intentionally, the author updates the baseline as part of the PR. That diff becomes a permanent record of what changed and why.
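
Refreshing the baseline is the same run command pointed at the baseline path. A sketch, reusing the image and flags from the workflow above:

docker run --rm \
  -v "$PWD/strategy.so":/strategy.so \
  -v "$PWD/data/ohlcv.csv":/data.csv \
  ghcr.io/pineforge/runtime:latest \
  run /strategy.so --data /data.csv --output json \
  > baseline/report.json
git add baseline/report.json
git commit -m "Update backtest baseline: <reason for the behavior change>"

A one-line reason in the commit message is worth keeping; it is the human half of the audit trail the diff provides.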

The same workflow runs identically on GitLab CI, Bitbucket Pipelines, or any runner that supports Docker — swap the YAML syntax, keep the shell commands.
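
As a concrete illustration, here is a minimal .gitlab-ci.yml sketch. It assumes a shell-executor runner with Docker, curl, jq, and bash installed, PINEFORGE_API_KEY defined as a masked CI/CD variable, and the "Assert parity" shell from above extracted into a repo script (ci/assert_parity.sh is a hypothetical name):

backtest:
  stage: test
  script:
    - |
      curl -fsS https://api.pineforge.io/v1/codegen \
        -H "Authorization: Bearer $PINEFORGE_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"source": '"$(jq -Rs . strategy.pine)"'}' \
        | jq -r '.artifact_url' | xargs curl -sL -o strategy.so
    - docker pull ghcr.io/pineforge/runtime:latest
    - >
      docker run --rm
      -v "$PWD/strategy.so":/strategy.so
      -v "$PWD/data/ohlcv.csv":/data.csv
      ghcr.io/pineforge/runtime:latest
      run /strategy.so --data /data.csv --output json > report.json
    - bash ci/assert_parity.sh report.json baseline/report.json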

Exit codes and asserting parity

The PineForge runtime exits with code 0 on a clean run and non-zero on any engine error (malformed Pine, unsupported built-in, data file parse failure). CI systems pick up these exit codes automatically — no special handling needed for engine-level failures.
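
In practice that means the minimal handling is none at all: GitHub Actions executes each run step with bash -e by default, so a non-zero exit from the container fails the step on its own. If you want the failure labeled explicitly, a small wrapper works (sketch):

if ! docker run --rm \
    -v "$PWD/strategy.so":/strategy.so \
    -v "$PWD/data/ohlcv.csv":/data.csv \
    ghcr.io/pineforge/runtime:latest \
    run /strategy.so --data /data.csv --output json > report.json; then
  echo "Engine error: runtime exited non-zero, see log above" >&2
  exit 1
fi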

Parity assertion is a separate concern, implemented in your workflow. The JSON report schema is stable across patch versions:

{
  "summary": {
    "total_trades": 47,
    "net_pnl": 12483.50,
    "max_drawdown": 3201.00,
    "profit_factor": 1.82,
    "sharpe_ratio": 1.34,
    "win_rate": 0.617
  },
  "trades": [
    {
      "entry_bar": 142,
      "exit_bar": 156,
      "direction": "long",
      "size": 1.0,
      "entry_price": 42310.5,
      "exit_price": 44820.0,
      "pnl": 2509.50,
      "exit_reason": "strategy.close"
    }
  ]
}

For a simple numeric parity check, compare summary.net_pnl and summary.total_trades against baseline. For strict trade-for-trade parity, diff the full trades array — entry and exit bar indices, directions, and fill prices should all match. When they don't, the diff output pinpoints exactly which trade diverged, which bar it was on, and what the fill price difference was.

A practical threshold for numeric fields: accept floating-point deltas up to 0.01 (one cent, or one basis point on a normalized series) and fail on anything larger. For trade-for-trade diffs, any mismatch in entry_bar or exit_bar is always a hard failure regardless of PnL proximity — a trade landing on a different bar means your signal logic changed.
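
One way to encode that policy with jq alone: hard-fail on any bar or direction mismatch, tolerate up to 0.01 on fill prices. This is a sketch against the schema above; jq's error exits non-zero, which fails the CI step:

jq -n --slurpfile new report.json --slurpfile base baseline/report.json '
  def abs: if . < 0 then -. else . end;
  $new[0].trades as $n | $base[0].trades as $b
  | if ($n | length) != ($b | length)
    then error("trade count changed: \($b | length) -> \($n | length)")
    else ( range($n | length) as $i
      | if $n[$i].entry_bar != $b[$i].entry_bar
           or $n[$i].exit_bar != $b[$i].exit_bar
           or $n[$i].direction != $b[$i].direction
        then error("trade \($i): signal drift \($b[$i]) -> \($n[$i])")
        elif (($n[$i].entry_price - $b[$i].entry_price) | abs) > 0.01
          or (($n[$i].exit_price - $b[$i].exit_price) | abs) > 0.01
        then error("trade \($i): fill drift \($b[$i]) -> \($n[$i])")
        else empty
        end )
    end
' && echo "Trade-for-trade parity OK"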

Teams that adopt parity CI typically run the baseline update as a separate workflow step gated on a PR label. That way an intentional strategy improvement gets documented in the PR timeline — the baseline diff shows up in code review alongside the Pine change that caused it.
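
In GitHub Actions terms, that gate is a single if: expression on the update step. A sketch, assuming the label is named update-baseline, the job has contents: write permission, and checkout fetched the PR branch so the push lands on it:

- name: Update baseline (label-gated)
  if: github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'update-baseline')
  run: |
    cp report.json baseline/report.json
    git config user.name "parity-bot"
    git config user.email "parity-bot@users.noreply.github.com"
    git add baseline/report.json
    git diff --cached --quiet || git commit -m "chore: update backtest baseline"
    git push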
