Comparative Analysis of Large Language Models in Live Trading Environments
Pitting Claude Opus 4.6 against GPT-5.4 in an objective evaluation of reasoning capabilities applied to financial market execution, risk management, and predictive accuracy.
Starting Capital: $50,000
Per model. Real demo accounts with institutional conditions.
Duration: Six weeks
Three phases: US indexes, then forex & gold, then all markets combined.
Instruments
US30, NAS100, SPX500, EUR/USD, GBP/USD, XAUUSD.
Risk Per Trade: 1%
Strict risk management. No exceptions. No overrides.
Performance Leaderboard
Live Agents - Last 30 Days
| Model Agent | Total Return | Sharpe Ratio | Win Rate |
|---|---|---|---|
| GPT-4o-Turbo (Quant-Enhanced) | +32.40% | 3.82 | 74.2% |
| Claude 3.5 Sonnet (Strategic) | +28.15% | 3.45 | 69.8% |
| Llama 3 70B (Base Execution) | +12.44% | 2.10 | 54.3% |
| Standard Algo (Mean Reversion) | -2.10% | 0.85 | 42.1% |
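The leaderboard does not say how its metrics are computed. A minimal sketch of the usual definitions follows, assuming daily fractional returns, a zero risk-free rate, and square-root-of-252 annualization. These are all assumptions, not the experiment's documented method, and the sample returns are illustrative only.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio from daily fractional returns.

    Assumes a constant daily risk-free rate and sqrt(periods_per_year) scaling.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    sd = stdev(excess)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("returns have zero volatility; Sharpe is undefined")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def win_rate(trade_pnls: list[float]) -> float:
    """Fraction of closed trades with a strictly positive P&L."""
    if not trade_pnls:
        raise ValueError("no closed trades")
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)


def total_return(daily_returns: list[float]) -> float:
    """Compounded total return over the whole period."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Illustrative daily returns, not experiment data.
daily = [0.004, -0.002, 0.006, 0.001, -0.003]
print(f"Sharpe {sharpe_ratio(daily):.2f}, total return {total_return(daily):.2%}")
```

Note that a 30-day window gives very few daily observations, so an annualized Sharpe ratio computed this way is noisy.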
How This Experiment Works
Both AIs receive identical market data
Each model ingests a structured data packet covering 5 hours of price action on three timeframes (60-minute, 15-minute, and 5-minute candles), layered with a full technical indicator suite, a 5-day macro context window, and an AI-synthesized briefing on the current macro environment and economic calendar releases. The models don't just see raw numbers. They receive a pre-digested narrative of what is moving markets and why, the way a senior analyst briefs a trading desk before the session opens.
- Multi-timeframe candles (5m, 15m, 60m) with EMA, ATR, MACD, RSI, Volume, VWAP
- Session structure: Tokyo, London, New York highs & lows with Fibonacci levels
- AI synthesis of macro environment, economic calendar, and intermarket correlations
- 5-day context: Oil, Interest Rates, DXY, Gold, NYAD, VIX with regime classification
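All three timeframes can be derived from one 5-minute series. Five hours is 60 five-minute bars, which aggregate into 20 fifteen-minute bars and 5 sixty-minute bars. The sketch below shows that OHLCV aggregation. The `Candle` type and `resample` helper are hypothetical names. The code assumes the bars are gap-free and start on a higher-timeframe boundary. The source does not describe how the packet is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float
    volume: float


def resample(bars: list[Candle], factor: int) -> list[Candle]:
    """Aggregate consecutive base bars into higher-timeframe bars.

    factor=3 turns 5m bars into 15m bars; factor=12 turns 5m bars into 60m bars.
    Assumes no gaps and that bars[0] sits on a higher-timeframe boundary.
    A trailing incomplete group is dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out = []
    for i in range(0, len(bars) - factor + 1, factor):
        group = bars[i:i + factor]
        out.append(Candle(
            open=group[0].open,                   # first open in the group
            high=max(c.high for c in group),      # highest high
            low=min(c.low for c in group),        # lowest low
            close=group[-1].close,                # last close in the group
            volume=sum(c.volume for c in group),  # summed volume
        ))
    return out


# Synthetic 5-hour window of 5m bars (illustrative values only).
five_min = [Candle(100 + i, 101 + i, 99 + i, 100.5 + i, 10) for i in range(60)]
fifteen_min = resample(five_min, 3)
sixty_min = resample(five_min, 12)
```

Deriving the higher timeframes from one base series keeps all three views consistent with each other. Pulling each timeframe separately from a feed can leave the bars slightly misaligned.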

Trading infrastructure by SkyAnalyst AI
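The packet's session structure pairs each session's high and low with Fibonacci levels. The source does not list which ratios it uses. This sketch assumes the common retracement set (23.6%, 38.2%, 50%, 61.8%, 78.6%), measured down from the session high toward the low. The session ranges in the example are illustrative, not experiment data.

```python
# Assumed ratio set; the experiment does not publish its Fibonacci ratios.
RETRACEMENT_RATIOS = (0.236, 0.382, 0.5, 0.618, 0.786)


def fib_retracements(session_high: float, session_low: float,
                     ratios: tuple[float, ...] = RETRACEMENT_RATIOS) -> dict[float, float]:
    """Price at each retracement ratio, measured down from the session high."""
    if session_high <= session_low:
        raise ValueError("session_high must exceed session_low")
    span = session_high - session_low
    return {r: session_high - r * span for r in ratios}


# Illustrative (high, low) ranges per session, not data from the experiment.
sessions = {
    "Tokyo": (2345.0, 2330.0),
    "London": (2352.0, 2336.0),
    "New York": (2349.0, 2338.0),
}
for name, (high, low) in sessions.items():
    levels = fib_retracements(high, low)
    print(name, {f"{r:.1%}": round(p, 2) for r, p in levels.items()})
```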
Both AIs make independent trading decisions
Both models trade a controlled 3-hour window from 8:00 AM to 11:00 AM EST, which ends before the midday lull. The market open itself and high-impact news events are skipped. Trades are executed on demo accounts hosted by Pepperstone Markets under standard institutional spread conditions. There is no human intervention: every entry, exit, stop loss, and take profit is decided autonomously.
- Trading window: 8:00–11:00 AM EST daily
- Market open & high-impact news events skipped
- $50,000 starting balance, 1% risk per trade
- No-trade decisions logged as valid actions
- Every trade publishes its full reasoning
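With a $50,000 balance and 1% risk, each trade risks $500. The position size follows from the stop distance: size = risk amount / (stop distance × value per point per unit). The minimal sketch below assumes a hypothetical `value_per_point` input, meaning the account-currency value of a 1.0 price move for one unit. The source does not give this value, and real values depend on each instrument's contract specification. The stop levels in the examples are illustrative.

```python
def position_size(balance: float, risk_fraction: float, entry: float,
                  stop: float, value_per_point: float) -> float:
    """Units to trade so that hitting the stop loses risk_fraction of balance.

    value_per_point: account-currency value of a 1.0 price move per unit
    (an assumed input; real contract specs vary by instrument and broker).
    """
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("stop must differ from entry")
    risk_amount = balance * risk_fraction  # e.g. $50,000 * 1% = $500
    return risk_amount / (stop_distance * value_per_point)


# Index CFD: a 50-point stop, assuming $1 per point per unit, sizes to 10 units.
units = position_size(50_000, 0.01, entry=39_450.5, stop=39_400.5, value_per_point=1.0)

# EUR/USD in a USD account: one unit gains $1 per 1.0 move, so a 20-pip
# (0.0020) stop sizes to roughly 250,000 units, i.e. 2.5 standard lots.
eurusd_units = position_size(50_000, 0.01, entry=1.0842, stop=1.0822, value_per_point=1.0)
```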
This isn't a black box. When a model enters a trade, it publishes the complete decision chain. That chain covers the macro regime classification it read (yields, DXY, VIX, oil) and which AI agents agreed or disagreed on direction, at what confidence level. It also covers the structural framework built from session highs/lows and Fibonacci levels, the multi-timeframe analysis across the 60m, 15m, and 5m charts, and the exact entry trigger, stop loss, and take-profit targets with risk-to-reward scoring. Every trade is a full research document, not just a buy or sell signal.
- Macro regime gate: yields, DXY, VIX, NYAD, oil assessed before every session
- AI agent synthesis: directional agreement scored with confidence percentages
- Structural framework: session highs/lows, VWAP, Fibonacci, key S/R levels
- Confluence scoring: 6-factor confidence gate determines trade probability
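The source names a 6-factor confluence gate and risk-to-reward scoring, but it does not publish the factors, weights, or threshold. The sketch below is one hypothetical reading. It scores six stages taken from the decision chain (macro regime, agent agreement, structure, multi-timeframe alignment, entry trigger, reward-to-risk) on a 0 to 1 scale and averages them with equal weights. A trade passes only if the average clears a threshold. The factor names, the 2:1 reward-to-risk target, and the 0.7 threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TradePlan:
    entry: float
    stop: float
    target: float


def reward_to_risk(plan: TradePlan) -> float:
    """Reward distance divided by risk distance (2.0 means 2:1)."""
    risk = abs(plan.entry - plan.stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    return abs(plan.target - plan.entry) / risk


# Hypothetical factor names; the experiment does not publish its six factors.
FACTORS = ("macro_regime", "agent_agreement", "structure",
           "timeframe_alignment", "entry_trigger", "reward_to_risk")


def confluence_score(scores: dict[str, float], plan: TradePlan,
                     target_rr: float = 2.0) -> float:
    """Equal-weight mean of six factor scores, each clipped to [0, 1].

    The reward_to_risk factor is computed from the plan and reaches 1.0
    once the plan meets target_rr (an assumed 2:1 default).
    """
    filled = dict(scores)
    filled["reward_to_risk"] = min(reward_to_risk(plan) / target_rr, 1.0)
    missing = [f for f in FACTORS if f not in filled]
    if missing:
        raise KeyError(f"missing factor scores: {missing}")
    return sum(min(max(filled[f], 0.0), 1.0) for f in FACTORS) / len(FACTORS)


def passes_gate(score: float, threshold: float = 0.7) -> bool:
    """Trade only if the confluence score clears the (assumed) threshold."""
    return score >= threshold
```

A "no trade" result from `passes_gate` would then be logged like any other decision, in line with treating no-trade decisions as valid actions.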
What the AI Saw
Notable trades with the full platform analysis at the moment of execution. See exactly what the models processed before making each decision.
LONG US30 @ 39,450.50
Duration: 4h 23m
At the moment of entry, the macro dashboard showed non-farm payrolls beating expectations by 40K. The regime detection agent had classified market conditions as 'strongly trending' with above-average momentum across all US index instruments.
Platform analysis snapshot at entry
SHORT XAUUSD @ 2,340.80
Duration: 2h 15m
Claude's entry coincided with CPI printing hot at 3.4%. The multi-timeframe trend analysis engine detected a bearish divergence on the 1H chart while the 4H remained bullish — a classic reversal setup that the regime agent flagged as a shift from trending to volatile.
Platform analysis snapshot at entry
LONG EUR/USD @ 1.0842
Duration: 6h 10m
A rare loss for the GPT agent. Despite positive euro-zone PMI data surfaced by the macro intelligence dashboard, an unexpected Fed commentary reversed the move. The model held through the reversal rather than exiting at the stop loss.
Platform analysis snapshot at entry
LONG NAS100 @ 18,250.00
Duration: 1d 4h
Claude's strongest trade of the week. The macro dashboard aggregated positive signals from tech earnings, falling VIX, and dovish Fed minutes. The regime agent confirmed a strong trending classification, and the multi-timeframe analysis aligned across all timeframes.
Platform analysis snapshot at entry
The AI Trading Playbook
Get the exact prompts, data structure, and analysis framework both models use to generate trades in this experiment. The same system that produced the analysis you just read.
- The exact prompt template that generates full session analysis
- Data packet structure: indicators, timeframes, and macro context format
- The 6-factor confluence scoring framework used to grade every trade
- Sample analysis output with annotated decision chain
What's inside
01 — Prompt Template
The full system prompt that turns raw market data into institutional-grade trade setups
02 — Data Structure
How candles, indicators, sessions, and macro context are packaged for the AI
03 — Confluence Framework
The 6-factor scoring gate that determines trade probability
04 — Sample Output
A complete XAUUSD session analysis with annotated reasoning
Portfolio Value
$50,000 starting balance per model
Trading Environment
Both models execute on live demo accounts under real market conditions — standard spreads, no slippage manipulation, no simulated fills.
- Execution via SkyAnalyst AI broker bridge
- Standard institutional spread conditions
- All fills independently verifiable
Hosted by Pepperstone Markets
The Evaluation Roadmap
Three phases over six weeks. Each tests a different market dynamic. The final phase combines everything.
Phase 1: US Indexes
Trending equity markets with high macro sensitivity. Both models face US30, NAS100, and SPX500 under identical conditions.
Phase 2: Forex & Gold
Currency pair dynamics and safe-haven behavior. Tests how each model handles nuance and central bank sensitivity on EUR/USD, GBP/USD, and XAUUSD.
Phase 3: All Instruments
The finale. Both models must manage a full multi-asset portfolio simultaneously. Maximum pressure, maximum drama.
Weekly Battle Reports
Deep-dive analysis of how each model rationalized its trading decisions. Full platform analysis included.
The Opening Salvo: GPT Takes an Early Lead on US30
GPT-4o came out aggressive with three consecutive long positions on US30, capitalizing on a post-NFP rally. Claude played it cautious — only two trades, both profitable, but smaller. The macro dashboard showed strong trending conditions all week.
Claude Strikes Back: A Masterclass in NAS100 Swing Trading
Claude closed the gap with a 3.8% winner on NAS100 that held overnight through earnings. GPT suffered its first notable loss — a premature SPX500 short against the trend. The regime detection agent had flagged 'strongly trending' but GPT overrode the signal.
The Gold Rush: Both Models Navigate a CPI Shock
CPI printed hot at 3.4% and both models scrambled. Claude shorted XAUUSD within 90 seconds of the release — a textbook reaction to the macro dashboard signal. GPT hesitated, entered 12 minutes later at a worse price, but still closed green. EUR/USD was a bloodbath for both.
Trading Rules
Both models operate under strict, identical constraints. No exceptions, no overrides, no human intervention.
- 8:00–11:00 AM EST window
- 1% risk per trade, $50K balance
- High-impact news events excluded
- Demo accounts on Pepperstone
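The window and exclusion rules reduce to a simple time gate. This minimal sketch makes several assumptions:

- Timestamps are already in New York local time. Converting from UTC with `zoneinfo.ZoneInfo("America/New_York")` would handle the EST/EDT switch.
- The US cash open is at 9:30 AM.
- A hypothetical 15-minute blackout applies on either side of the open and of each high-impact release.

The experiment does not state its blackout length or how it defines the skipped open.

```python
from datetime import datetime, time, timedelta

WINDOW_START = time(8, 0)
WINDOW_END = time(11, 0)           # exclusive end of the trading window
MARKET_OPEN = time(9, 30)          # US cash-equity open, New York time
BLACKOUT = timedelta(minutes=15)   # assumed buffer; not stated by the source


def may_trade(now: datetime, news_events: list[datetime]) -> bool:
    """True if `now` (naive, New York local time) is inside 08:00-11:00,
    outside the blackout around the open, and outside the blackout
    around every listed high-impact event."""
    if not (WINDOW_START <= now.time() < WINDOW_END):
        return False
    open_dt = datetime.combine(now.date(), MARKET_OPEN)
    blocked = [open_dt, *news_events]
    return all(abs(now - t) > BLACKOUT for t in blocked)


# Example: an 8:30 AM release blocks trading until 8:45 under the assumed buffer.
cpi_release = datetime(2024, 6, 12, 8, 30)
print(may_trade(datetime(2024, 6, 12, 8, 35), [cpi_release]))
print(may_trade(datetime(2024, 6, 12, 8, 50), [cpi_release]))
```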