
When two Pine engines disagree: cross-validating PineForge against PyneCore

We run every release through a parity sweep against both TradingView and PyneCore. Here's what the numbers from the latest sweep look like, and why having a friendly second-source engine is the most useful debugging tool we own.


There are exactly two production-quality off-platform Pine engines that we know of: ours, and PyneCore. PyneSys's PyneComp transpiles Pine to Python; PyneCore is the open-source Python runtime that executes the result. Different language, different runtime, same source material, same API definitions to honour.

We treat PyneCore as our second-source oracle. Every PineForge release runs the parity sweep against both TradingView's "List of Trades" CSV exports and PyneCore's executed output. When all three agree on a strategy's trades, we believe the result. When two agree and one diverges, the divergent one almost always has a bug — usually ours.
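
That decision rule is small enough to write down. A minimal sketch, assuming each output has already been normalized to a comparable trade list; the Trade shape and verdict helper are illustrative, and the real harness scores tolerance-banded deltas rather than exact equality:

from typing import NamedTuple

class Trade(NamedTuple):
    entry_bar: int  # bar index of the entry fill
    exit_bar: int   # bar index of the exit fill
    pnl: float      # realized profit/loss on the trade

def verdict(tv: list[Trade], forge: list[Trade], pyne: list[Trade]) -> str:
    """Three-way majority vote over one strategy's trade lists."""
    ft, pt, fp = forge == tv, pyne == tv, forge == pyne
    if ft and pt:
        return "all three agree: believe the result"
    if pt and not ft:
        return "PineForge diverges: suspect a PineForge bug"
    if ft and not pt:
        return "PyneCore diverges: suspect a PyneCore bug"
    if fp:
        return "both engines disagree with TV: suspect ambiguous Pine semantics"
    return "no two agree: inspect by hand"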

This post walks through the numbers from the latest sweep.

The corpus

50 reference strategies in the public sweep (the broader 162-strategy corpus is private; this subset is what we publish in the trade-comparison report). Each strategy gets three outputs to compare:

- TradingView's "List of Trades" CSV export, treated as the reference
- PineForge's executed trade list
- PyneCore's executed trade list, compiled from the same Pine source via PyneComp

The headline

Of the 50 strategies in the sweep, PineForge and PyneCore land within one tier of each other on 47. Both engines hit excellent (≥95% trade match against TV's CSV export) on 45 of those; on the other 2, PyneCore stays excellent while PineForge comes in one tier down at strong, an artifact of how PineForge's code-delta accounting handles a couple of border-case strategies.

The 3 with wider tier gaps are the interesting cases. All 3 are PineForge-higher.

Strategy                     PineForge     PyneCore     Notes
49-partial-exit-qty-percent  🟢 excellent  🟠 weak      PyneCore over-emits exit fills
06-liquidity-sweep           🟢 excellent  🟡 moderate  Exit timing drift on Pyne side
07-scalping-strategy         🟢 excellent  🟡 moderate  PnL p90 high on Pyne; entries match

The interesting one

49-partial-exit-qty-percent is the largest gap. The strategy uses strategy.exit(qty_percent=...) to peel off partial position size at a profit target, and the cycle repeats: fully exited, re-entered, partially exited again (the clause's shape is sketched just after the numbers). Across the test window:

TV trades (raw, in-window):  725
PineForge engine trades:     852  →  725 in-window  →  725 matched (100% of TV)
PyneCore  engine trades:    3297  →  2805 in-window →  582 matched (80.3% of TV)
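
The exit clause has roughly this shape in PyneCore's Pine-mirroring Python API. A minimal sketch: the signal, order ids, and levels are made up, and we're assuming qty_percent and profit pass through to the runtime 1:1, as the compiled example later in this post suggests for strategy.entry:

from pynecore.lib import close, script, strategy, ta

@script.strategy("partial exit cycle", overlay=True)
def main():
    # hypothetical entry signal
    if ta.crossover(close, ta.sma(close, 20)):
        strategy.entry("L", strategy.long)
    # peel off half the position at a profit target (in ticks); after the
    # remainder closes and "L" re-enters, the same clause arms again --
    # that re-arm timing is exactly what the count deltas below come down to
    strategy.exit("tp", from_entry="L", qty_percent=50, profit=200)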

PyneCore's count delta on this strategy is +74%. PineForge's is 0%. Looking at the percentile metrics:

PineForge: count_delta 0.0000% · entry p90 0.0000% · exit p90 0.0004% · PnL p90 0.1321%
PyneCore : count_delta 74.1533% · entry p90 0.0000% · exit p90 1.0376% · PnL p90 (high)

Translation: both engines fire the entries at the right bars (entry p90 is 0% for both — perfect alignment). The disagreement is on exits. PyneCore generates roughly 4× as many exit fills as TradingView does for the same strategy — most likely an over-firing of partial-exit cycles where the same qty_percent clause re-arms before TV's order processor considers it complete.
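
For anyone reproducing these numbers: the count delta is consistent with normalizing by the engine's own in-window count rather than TV's, since (2805 - 725) / 2805 = 74.1533%, exactly the figure above. A sketch under that assumption (the normalization and the nearest-rank percentile are inferred from the published numbers, not lifted from the actual harness):

import math

def count_delta(engine_in_window: int, tv_in_window: int) -> float:
    """Trade-count disagreement, normalized by the engine's own count.
    abs(2805 - 725) / 2805 = 0.741533 reproduces PyneCore's 74.1533%;
    725 vs 725 reproduces PineForge's 0.0000%."""
    return abs(engine_in_window - tv_in_window) / engine_in_window

def p90(per_trade_deltas: list[float]) -> float:
    """Nearest-rank 90th percentile of per-trade relative deltas
    (e.g. exit-timing or PnL differences over matched trades)."""
    if not per_trade_deltas:
        return 0.0
    ordered = sorted(per_trade_deltas)
    return ordered[math.ceil(0.9 * len(ordered)) - 1]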

This is not a value judgement on PyneCore's quality: it's a corner case in Pine's strategy.exit() semantics, and we only ironed out our own handling after the parity sweep flagged it. The bug-finding game flows in both directions: we've found bugs in our own codegen multiple times by seeing PyneCore agree with TV when we didn't.

The other two

06-liquidity-sweep and 07-scalping-strategy are smaller-magnitude versions of the same shape. Both have entry alignment at 100%, with exit drift around 1-2 percentage points on the PyneCore side versus under 0.05% on the PineForge side. Trail-style exits and partial-fill semantics are where Pine's reference implementation is least documented; both engines have to reverse-engineer them from observable trade behavior.

Calling the PyneCore compile API

PyneSys exposes the compile step over HTTP at api.pynesys.io. You can hit it directly with curl:

curl -X POST https://api.pynesys.io/compiler/compile \
  -H "Authorization: Bearer pyne_..." \
  --data-urlencode 'script=//@version=6
strategy("ma cross", overlay=true)
if ta.crossover(close, ta.sma(close, 20))
    strategy.entry("L", strategy.long)'

The response is a Python file that targets the open-source pynecore runtime:

"""
@pyne
 
This code was compiled by PyneComp v6.0.31 — the Pine Script to Python compiler.
Run with open-source PyneCore: https://pynecore.org
Compile Pine Scripts online at PyneSys: https://pynesys.io
"""
from pynecore.lib import close, script, strategy, ta
 
 
@script.strategy("ma cross", overlay=True)
def main():
    if ta.crossover(close, ta.sma(close, 20)):
        strategy.entry('L', strategy.long)
 
 
if __name__ == "__main__":
    from pynecore.standalone import run
    run(__file__)

Drop that into a Python file, pip install pynecore, point it at your OHLCV, and you get the second-engine output to diff against PineForge.
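
The diff step can start as simply as pairing trades on entry time. A minimal sketch, assuming both engines' trades were exported to CSV with an entry_time column; the column name and the entry-time-only matching rule are illustrative, not the report's exact schema:

import csv

def entry_times(path: str) -> list[str]:
    with open(path, newline="") as f:
        return [row["entry_time"] for row in csv.DictReader(f)]

def trade_match_rate(engine_csv: str, tv_csv: str) -> float:
    """Fraction of TV trades with an engine trade at the same entry time.
    The real sweep also scores exit timing and PnL per matched trade."""
    engine = set(entry_times(engine_csv))
    tv = entry_times(tv_csv)
    return sum(t in engine for t in tv) / len(tv)

# e.g. trade_match_rate("pynecore_trades.csv", "tv_list_of_trades.csv")
# gave 0.803 for 49-partial-exit-qty-percent above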

Why this matters

A single backtest engine reporting "your strategy returns 12.4% over the test window" tells you exactly one thing: that engine claims that result. If you have nothing to compare against, you have to take the engine's word for it.

A pair of independent engines reporting the same result on the same script is much stronger evidence. When they disagree, the disagreement itself is a hypothesis: one of them has a bug, the corner case in Pine semantics is ambiguous, or the test data has an edge condition neither engine handles. All three are valuable to know.

We use PyneCore this way every release. If you're shipping a strategy you'd put real money behind, you should too.


Try the cross-validation pattern yourself

Two engines, one source, three outputs to compare. If they all agree, ship it. If they disagree, you've learned something useful before any real money was on the line.