# When two Pine engines disagree: cross-validating PineForge against PyneCore
We run every release through a parity sweep against both TradingView and PyneCore. Here's what the numbers from the latest sweep look like, and why having a friendly second-source engine is the most useful debugging tool we own.
There are exactly two production-quality off-platform Pine engines that we know of: ours, and PyneCore. PyneSys's PyneComp transpiles Pine to Python; PyneCore is the open-source Python runtime that executes the result. Different language, different runtime, same source material, same API definitions to honour.
We treat PyneCore as our second-source oracle. Every PineForge release runs the parity sweep against both TradingView's "List of Trades" CSV exports and PyneCore's executed output. When all three agree on a strategy's trades, we believe the result. When two agree and one diverges, the divergent one almost always has a bug — usually ours.
The rest of this post walks through those numbers.
## The corpus

50 reference strategies in the public sweep (the broader 162-strategy corpus is private; this subset is what we publish in the trade-comparison report). Each strategy gets:

- Compiled to Python by PyneComp, executed by PyneCore on the canonical OHLCV
- Compiled to C++ by PineForge codegen, run by `pineforge-engine` on the same OHLCV
- Both diffed against TradingView's CSV export, window-clipped to `[OHLCV span] ∩ [TV entry span] ∩ [engine entry span]`
- Match-degree assigned per the 5-tier scale (excellent / strong / moderate / weak / minimal)
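The window-clipping step in the list above can be sketched in a few lines of Python. The list-of-dicts trade shape and the `entry_bar` field name are illustrative assumptions, not the report's actual schema:

```python
# A minimal sketch of the window-clip described above: trades from each
# source are restricted to the intersection of all entry spans before
# any matching happens. The trade representation is assumed.

def common_window(*trade_lists):
    """Intersect entry spans: latest start across sources, earliest end."""
    starts = [min(t["entry_bar"] for t in ts) for ts in trade_lists]
    ends = [max(t["entry_bar"] for t in ts) for ts in trade_lists]
    return max(starts), min(ends)

def clip(trades, start, end):
    """Keep only trades whose entry falls inside the common window."""
    return [t for t in trades if start <= t["entry_bar"] <= end]
```

With three sources (TV export, PineForge run, PyneCore run), `common_window(tv, pf, pc)` gives the span that all subsequent match percentages are computed over.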
## The headline

Of the 50 strategies in the sweep, 47 land in the same tier in both PineForge and PyneCore. Both engines reach excellent (≥95% trade match against TV's CSV export) on 45 of those 47; on the remaining 2, both land at strong, which on the PineForge side is an artifact of how its count-delta accounting handles a couple of border-case strategies.

The 3 that disagree by tier are the interesting cases. All 3 are PineForge-higher.
| Strategy | PineForge | PyneCore | Notes |
|---|---|---|---|
| 49-partial-exit-qty-percent | 🟢 excellent | 🟠 weak | PyneCore over-emits exit fills |
| 06-liquidity-sweep | 🟢 excellent | 🟡 moderate | Exit timing drift on Pyne side |
| 07-scalping-strategy | 🟢 excellent | 🟡 moderate | PnL p90 high on Pyne; entries match |
## The interesting one

49-partial-exit-qty-percent is the largest gap. The strategy uses `strategy.exit(qty_percent=...)` to peel off partial position size on profit-take, and the cycle repeats: fully exited, re-entered, partially exited again. Across the test window:
```
TV trades (raw, in-window):  725
PineForge engine trades:     852 → 725 in-window → 725 matched (100% of TV)
PyneCore engine trades:     3297 → 2805 in-window → 582 matched (80.3% of TV)
```

PyneCore's count delta on this strategy is +74%. PineForge's is 0%. The per-decile metrics:

```
PineForge: count_delta  0.0000% · entry p90 0.0000% · exit p90 0.0004% · PnL p90 0.1321%
PyneCore : count_delta 74.1533% · entry p90 0.0000% · exit p90 1.0376% · PnL p90 (high)
```

Translation: both engines fire the entries at the right bars (entry p90 is 0% for both, perfect alignment). The disagreement is on exits.
PyneCore generates roughly 4× as many exit fills as TradingView does for
the same strategy — most likely an over-firing of partial-exit cycles
where the same qty_percent clause re-arms before TV's order processor
considers it complete.
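A toy model makes the suspected mechanism concrete. This is not PyneCore's, PineForge's, or TradingView's actual order logic, just an illustration of how a re-arming policy multiplies fill counts:

```python
# Toy model of qty_percent partial exits. An engine that re-arms the
# exit clause on every bar at target keeps peeling the remaining
# position; one that fires the clause once per cycle emits a single
# fill. Neither branch is a real engine's implementation.

def partial_exit_fills(position_qty, qty_percent, bars_at_target, rearm_every_bar):
    fills, qty, armed = [], float(position_qty), True
    for _ in range(bars_at_target):
        if qty <= 0 or not armed:
            break
        fill = qty * qty_percent / 100.0  # peel off a fraction of what remains
        qty -= fill
        fills.append(fill)
        armed = rearm_every_bar  # once-per-cycle policy disarms after one fill
    return fills
```

`partial_exit_fills(100, 50, 3, rearm_every_bar=False)` emits one fill; with `rearm_every_bar=True` it emits three. Compound that over hundreds of entry cycles and you get exactly the shape of a several-fold exit-fill inflation.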
This is not a value judgement on PyneCore's quality — it's a corner case
in Pine's strategy.exit() semantics, and we've ironed out our handling
of it after the parity sweep flagged it. The bug-finding game flows in
both directions: we've found bugs in our own codegen multiple times by
seeing PyneCore agree with TV when we didn't.
## The other two
06-liquidity-sweep and 07-scalping-strategy are smaller-magnitude
versions of the same shape. Both have entry alignment at 100% and exit
drift around 1-2 percentage points (PyneCore) vs <0.05% (PineForge).
Trail-style exits and partial-fill semantics are where Pine's reference
implementation is least documented; both engines have to reverse-engineer
it from observable trade behavior.
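For reference, an "exit p90" style number can be computed as the 90th percentile of per-trade absolute percentage deviation between engine and TV values. The report's exact definition may differ, so treat this as a sketch:

```python
import statistics

def p90_abs_pct_delta(engine_vals, tv_vals):
    """90th percentile of per-trade absolute % deviation from the TV value."""
    deltas = [abs(e - t) / abs(t) * 100.0 for e, t in zip(engine_vals, tv_vals)]
    # statistics.quantiles with n=10 returns nine cut points; the last
    # one is the 90th percentile.
    return statistics.quantiles(deltas, n=10)[-1]
```

A p90 metric is deliberately insensitive to a handful of outlier trades, which is why entry p90 can sit at exactly 0% while the exit side still shows drift.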
## Calling the PyneCore compile API

PyneSys exposes the compile step over HTTP at `api.pynesys.io`. You can hit it directly with `curl`:
```shell
curl -X POST https://api.pynesys.io/compiler/compile \
  -H "Authorization: Bearer pyne_..." \
  --data-urlencode 'script=//@version=6
strategy("ma cross", overlay=true)
if ta.crossover(close, ta.sma(close, 20))
    strategy.entry("L", strategy.long)'
```

The response is a Python file that targets the open-source `pynecore` runtime:
"""
@pyne
This code was compiled by PyneComp v6.0.31 — the Pine Script to Python compiler.
Run with open-source PyneCore: https://pynecore.org
Compile Pine Scripts online at PyneSys: https://pynesys.io
"""
from pynecore.lib import close, script, strategy, ta
@script.strategy("ma cross", overlay=True)
def main():
if ta.crossover(close, ta.sma(close, 20)):
strategy.entry('L', strategy.long)
if __name__ == "__main__":
from pynecore.standalone import run
run(__file__)Drop that into a Python file, pip install pynecore, point it at your
OHLCV, and you get the second-engine output to diff against PineForge.
## Why this matters
A single backtest engine reporting "your strategy returns 12.4% over the test window" tells you exactly one thing: that engine claims that result. If you have nothing to compare against, you have to take the engine's word for it.
A pair of independent engines reporting the same result on the same script is much stronger evidence. When they disagree, the disagreement itself is a hypothesis: one of them has a bug, the corner case in Pine semantics is ambiguous, or the test data has an edge condition neither engine handles. All three are valuable to know.
We use PyneCore this way every release. If you're shipping a strategy you'd put real money behind, you should too.
## What's not in this post

- Indicator-level drift. A separate report tracks per-indicator agreement between PineForge, PyneCore, and TradingView's reference values. It lives in `benchmarks/results/indicator_comparison.md` in the engine repo.
- The other ~110 strategies in our internal corpus that aren't in the public sweep. Those exercise Pine corner cases (UDT methods, multi-timeframe lookups, OCA exit groups) and run through the parity sweep internally. Expect a future post once the methodology is published.
## Try the cross-validation pattern yourself
- Get a free PineForge codegen API key
- Get a PyneSys API key
- Pick a Pine strategy. Run both pipelines on the same OHLCV CSV. Diff the resulting trade lists.
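The diff in step 3 can be as simple as keying each trade on entry time and side, then computing overlap. The column names here are assumptions; adapt them to your export's actual headers:

```python
import csv

def load_trades(path, key_cols=("entry_time", "side")):
    """Read a trade-list CSV into a set of hashable keys. The column
    names are assumed; adjust key_cols to your export's headers."""
    with open(path, newline="") as f:
        return {tuple(row[c] for c in key_cols) for row in csv.DictReader(f)}

def match_pct(reference, candidate):
    """Percentage of reference trades that the candidate also reports."""
    if not reference:
        return 0.0
    return 100.0 * len(reference & candidate) / len(reference)
```

`match_pct(load_trades("tv.csv"), load_trades("pineforge.csv"))`, and the same call against the PyneCore run, give you two numbers to hold up against the ≥95% excellent bar.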
Two engines, one source, three outputs to compare. If they all agree, ship it. If they disagree, you've learned something useful before any real money was on the line.