The AI Trading Benchmark
Powered by SkyAnalyst AI
Q2 2026 · Live Experiment

Comparative Analysis of Large Language Models in Live Trading Environments.

Pitting Claude Opus 4.6 against GPT-5.4 in an objective evaluation of reasoning capabilities applied to financial market execution, risk management, and predictive accuracy.

Starting Capital

$50K

Per model. Demo accounts with institutional conditions.

Duration

6 Weeks

Three phases. US indexes, forex & gold, then all markets combined.

Instruments

6

US30, NAS100, SPX500, EUR/USD, GBP/USD, XAUUSD.

Risk Per Trade

1%

Strict risk management. No exceptions. No overrides.

Performance Leaderboard

Live Agents - Last 30 Days

Model Agent                       Total Return   Sharpe Ratio   Win Rate
GPT-4o-Turbo (Quant-Enhanced)     +32.40%        3.82           74.2%
Claude 3.5 Sonnet (Strategic)     +28.15%        3.45           69.8%
Llama 3 70B (Base Execution)      +12.44%        2.10           54.3%
Standard Algo (Mean Reversion)    -2.10%         0.85           42.1%

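
The page does not say how the leaderboard columns are calculated. As a rough sketch only, the three metrics could be derived from a list of returns as shown below. The function names, the daily return period, the 252-period annualization, and the zero risk-free rate are all assumptions made for illustration, not the benchmark's stated method.

```python
import math
import statistics


def total_return(start_balance, end_balance):
    """Percentage change from the starting balance to the ending balance."""
    return 100.0 * (end_balance - start_balance) / start_balance


def win_rate(trade_returns):
    """Percentage of trades that closed with a positive return."""
    if not trade_returns:
        return 0.0
    wins = sum(1 for r in trade_returns if r > 0)
    return 100.0 * wins / len(trade_returns)


def sharpe_ratio(period_returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    Assumes daily returns and 252 periods per year; needs at least two
    observations because it uses the sample standard deviation.
    """
    excess = [r - risk_free_per_period for r in period_returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)
```

For example, `total_return(50_000, 57_200)` gives the 14.4% gain that corresponds to a $50,000 account growing to $57,200.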
Methodology

How This Experiment Works

01. Both AIs receive identical market data

Each model ingests a structured data packet covering 5 hours of price action across three timeframes — 60-minute, 15-minute, and 5-minute candles — layered with a full technical indicator suite, a 5-day macro context window, and an AI-synthesized briefing of the current macro environment and economic calendar releases. The models don't just see raw numbers — they receive a pre-digested narrative of what's moving markets and why, the same way a senior analyst would brief a trading desk before the session opens.

  • Multi-timeframe candles (5m, 15m, 60m) with EMA, ATR, MACD, RSI, Volume, VWAP
  • Session structure: Tokyo, London, New York highs & lows with Fibonacci levels
  • AI synthesis of macro environment, economic calendar, and intermarket correlations
  • 5-day context: Oil, Interest Rates, DXY, Gold, NYAD, VIX with regime classification
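
The actual packet format is part of the downloadable playbook and is not shown on this page. The sketch below is one hypothetical way to model the inputs listed above in Python. Every class and field name here is an assumption, and the Fibonacci helper simply measures a retracement down from a session high.

```python
from dataclasses import dataclass, field


@dataclass
class Candle:
    timeframe: str  # "5m", "15m" or "60m"
    open: float
    high: float
    low: float
    close: float
    volume: float
    # Indicator values attached to this candle, e.g. {"rsi": 55.0, "atr": 12.5}
    indicators: dict = field(default_factory=dict)


@dataclass
class SessionLevels:
    session: str  # "Tokyo", "London" or "New York"
    high: float
    low: float

    def fib_level(self, ratio):
        """Price at the given retracement ratio, measured down from the high."""
        return self.high - ratio * (self.high - self.low)


@dataclass
class DataPacket:
    instrument: str
    candles: list        # Candle objects across all three timeframes
    sessions: list       # SessionLevels for each trading session
    macro_context: dict  # 5-day series keyed by name, e.g. "DXY", "VIX"
    macro_briefing: str  # AI-synthesized narrative of the macro environment

    def timeframes(self):
        """Sorted list of the distinct timeframes present in the packet."""
        return sorted({c.timeframe for c in self.candles})
```

Keeping the narrative briefing as a separate string field reflects the description above, where the models receive both raw numbers and a pre-digested summary.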
SkyAnalyst AI Automations dashboard — the trading environment both AI models operate in

Trading infrastructure by SkyAnalyst AI

02. Both AIs make independent trading decisions

Both models trade a controlled 3-hour window from 8:00 AM to 11:00 AM EST — after the opening volatility has settled and before the midday lull. High-impact news events are excluded entirely. Trades are executed on demo accounts hosted by Pepperstone Markets under standard institutional spread conditions. No human intervention — every entry, exit, stop loss, and take profit is decided autonomously.

01. Trading window: 8:00–11:00 AM EST daily
02. Market open & high-impact news events skipped
03. $50,000 starting balance, 1% risk per trade
04. No-trade decisions logged as valid actions
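
To make the 1% rule concrete: on a $50,000 balance, each trade risks $500, and the position size follows from the distance between entry and stop. The sketch below illustrates that arithmetic together with a simple window check. The function names, the `value_per_point` parameter, and the choice to treat 11:00 as exclusive are assumptions, not details taken from the experiment.

```python
from datetime import time

WINDOW_START = time(8, 0)   # 8:00 AM, New York wall-clock time
WINDOW_END = time(11, 0)    # 11:00 AM, treated as exclusive (assumption)


def in_trading_window(local_time):
    """True if the given New York wall-clock time is inside the window."""
    return WINDOW_START <= local_time < WINDOW_END


def risk_amount(balance, risk_fraction=0.01):
    """Cash amount put at risk on a single trade."""
    return balance * risk_fraction


def position_size(balance, entry, stop, value_per_point=1.0, risk_fraction=0.01):
    """Units to trade so that hitting the stop loses exactly the risk amount.

    value_per_point is the cash value of a one-point move per unit and
    depends on the instrument and broker contract (hypothetical default).
    """
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("stop must differ from entry")
    return risk_amount(balance, risk_fraction) / (stop_distance * value_per_point)
```

With a $50,000 balance and a stop 50 points from entry, `position_size` returns 10 units at one dollar per point, so a stop-out costs $500.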

03. Every trade publishes its full reasoning

This isn't a black box. When a model enters a trade, it publishes the complete decision chain: the macro regime classification it read (yields, DXY, VIX, oil), which AI agents agreed or disagreed on direction and at what confidence level, the structural framework it built from session highs/lows and Fibonacci levels, the multi-timeframe analysis across 60m, 15m, and 5m charts, and the exact entry trigger, stop loss, and take-profit targets with risk-to-reward scoring. Every trade is a full research document — not just a buy or sell signal.

01. Macro regime gate: yields, DXY, VIX, NYAD, oil assessed before every session
02. AI agent synthesis: directional agreement scored with confidence percentages
03. Structural framework: session highs/lows, VWAP, Fibonacci, key S/R levels
04. Confluence scoring: 6-factor confidence gate determines trade probability
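
The page names a 6-factor confluence gate but does not list the six factors or the passing threshold. The sketch below shows how such a gate might work, using six hypothetical factors loosely drawn from the decision chain described above and an assumed minimum score of 5. It also includes the standard reward-to-risk calculation from entry, stop, and target.

```python
from dataclasses import astuple, dataclass


@dataclass
class ConfluenceCheck:
    # All six factor names are illustrative assumptions.
    macro_regime_aligned: bool
    agents_agree: bool
    structure_supports: bool
    timeframes_aligned: bool
    entry_trigger_present: bool
    reward_to_risk_ok: bool

    def score(self):
        """Number of factors that passed, from 0 to 6."""
        return sum(astuple(self))


def reward_to_risk(entry, stop, target):
    """Distance to target divided by distance to stop."""
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    return abs(target - entry) / risk


def passes_gate(check, min_score=5):
    """True if enough factors agree; min_score=5 is a hypothetical threshold."""
    return check.score() >= min_score
```

A setup with entry 100, stop 98, and target 104 has a reward-to-risk of 2.0. Whether that counts as passing the sixth factor would depend on a cutoff the page does not specify.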

Trade Intelligence

What the AI Saw

Notable trades with the full platform analysis at the moment of execution. See exactly what the models processed before making each decision.

GPT Agent
+2.1%

LONG US30 @ 39,450.50

Duration: 4h 23m

At the moment of entry, the macro dashboard showed non-farm payrolls beating expectations by 40K. The regime detection agent had classified market conditions as 'strongly trending' with above-average momentum across all US index instruments.

Claude Agent
+1.45%

SHORT XAUUSD @ 2,340.80

Duration: 2h 15m

Claude's entry coincided with CPI printing hot at 3.4%. The multi-timeframe trend analysis engine detected a bearish divergence on the 1H chart while the 4H remained bullish — a classic reversal setup that the regime agent flagged as a shift from trending to volatile.

GPT Agent
-0.32%

LONG EUR/USD @ 1.0842

Duration: 6h 10m

A rare loss for the GPT agent. Despite positive eurozone PMI data surfaced by the macro intelligence dashboard, unexpected Fed commentary reversed the move. The model held through the reversal rather than exiting at the stop loss.

Claude Agent
+3.8%

LONG NAS100 @ 18,250.00

Duration: 1d 4h

Claude's strongest trade of the week. The macro dashboard aggregated positive signals from tech earnings, falling VIX, and dovish Fed minutes. The regime agent confirmed a strong trending classification, and the multi-timeframe analysis aligned across all timeframes.

Free Download

The AI Trading Playbook

Get the exact prompts, data structure, and analysis framework both models use to generate trades in this experiment. The same system that produced the analysis you just read.

  • The exact prompt template that generates full session analysis
  • Data packet structure: indicators, timeframes, and macro context format
  • The 6-factor confluence scoring framework used to grade every trade
  • Sample analysis output with annotated decision chain

What's inside

01 — Prompt Template

The full system prompt that turns raw market data into institutional-grade trade setups

02 — Data Structure

How candles, indicators, sessions, and macro context are packaged for the AI

03 — Confluence Framework

The 6-factor scoring gate that determines trade probability

04 — Sample Output

A complete XAUUSD session analysis with annotated reasoning

No spam. Experiment updates only.

Portfolio Value

$50,000 starting balance per model

[Chart: portfolio value by week, Week 01 to Week 06, on a $50K to $58K axis. Series: GPT-5.4 and Claude Opus 4.6. End-of-line labels: $57.2K (+14.4%) and $56.1K (+12.2%).]

Trading Environment

Both models execute on live demo accounts under real market conditions — standard spreads, no slippage manipulation, no simulated fills.

  • Execution via SkyAnalyst AI broker bridge
  • Standard institutional spread conditions
  • All fills independently verifiable

Hosted by Pepperstone Markets

Live Status

Current Phase: Phase 1 · US Indexes
Week: 1 of 6
Trades Executed: 0
Next Phase: Forex & Gold · May 2026

Competition Structure

The Evaluation Roadmap

Three phases over six weeks. Each tests a different market dynamic. The final phase combines everything.

Research & Analysis

Weekly Battle Reports

Deep-dive analysis of how each model rationalized its trading decisions. Full platform analysis included.

Recent Trade Execution Log

REAL-TIME FEED
14:02:11  BUY BTC-USD @ 67,420.21  [PROFIT: +2.1%]  GPT-4o Agent
13:58:45  SELL ETH-USD @ 2,450.12  [PROFIT: +0.45%]  Claude 3.5
13:45:02  BUY SPY-ETF @ 542.10  [LOSS: -0.12%]  Base Llama
13:30:19  REBALANCING PORTFOLIO_V4  [NEUTRAL]  Quant-Ref

Trading Rules

Both models operate under strict, identical constraints. No exceptions, no overrides, no human intervention.

  • 8:00–11:00 AM EST window
  • 1% risk per trade, $50K balance
  • High-impact news events excluded
  • Demo accounts on Pepperstone

Get the Playbook

The exact prompts and analysis framework both models use to generate trades. Plus weekly battle reports.

Download Free