LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

gpt-5.4-mini lost spatial-1. gemini-2.5-pro (ambiguity-1, causality-1, common-sense-1) and gemini-2.5-flash (causality-1) recovering.

April 15, 2026 — 12:47 AM CT

Drift Alerts

Scorecard

| Model | ambiguity-1 | causality-1 | code-1 | common-sense-1 | logic-1 | math-1 | spatial-1 |
|---|---|---|---|---|---|---|---|
| anthropic/claude-haiku-4-5 | ✓ (4.5) | ✓ (4.5) | ✓ (4.67) | ✓ (3.2) | ✓ (5) | ✓ (5) | ✓ (5) |
| anthropic/claude-opus-4-6 | ✓ (5) | ✓ (4.8) | ✓ (4.83) | ✓ (4.4) | ✓ (5) | ✓ (5) | ✓ (5) |
| anthropic/claude-sonnet-4-6 | ✓ (4.33) | ✓ (4.8) | ✓ (4.5) | ✓ (3.67) | ✓ (5) | ✓ (5) | ✓ (5) |
| gemini/gemini-2.5-flash | ✓ (4.67) | ✓ (3.6), was ✗ (1.83) | ✓ (4.83) | ✓ (3.6) | ✓ (5) | ✓ (5) | ✓ (5) |
| gemini/gemini-2.5-pro | ✓ (5), was ✗ () | ✓ (4.5), was ✗ () | ✓ (4.67) | ✓ (5), was ✗ () | ✓ (5) | ✓ (5) | ✓ (4.83) |
| ollama/llama3 | | | | | | | |
| openai/gpt-5.4 | ✓ (4.6) | ✓ (5) | ✓ (4.8) | ✓ (4.5) | ✓ (5) | ✓ (5) | ✓ (5) |
| openai/gpt-5.4-mini | ✓ (4.8) | ✓ (4.6) | ✓ (4.6) | ✓ (4.5) | ✓ (5) | ✓ (5) | ✗ (3.6), was ✓ (4) |
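
The report does not say how its "lost" and "recovering" labels are derived. A minimal sketch, assuming a "was" entry records the pass/fail status from the previous run and that a label is raised only when that status flips. The names `Cell`, `classify`, and `drift_alerts` are hypothetical and not taken from the report's code. Scores are left out because the scorecard does not imply a fixed pass threshold (a 3.2 passes in one cell while a 3.6 fails in another).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One scorecard cell (hypothetical structure)."""
    passed: bool                        # current status: True for ✓, False for ✗
    prev_passed: Optional[bool] = None  # the "was" status, or None if unchanged


def classify(cell: Cell) -> str:
    """Label a cell by comparing its current status to the previous one."""
    if cell.prev_passed is None or cell.prev_passed == cell.passed:
        return "stable"
    return "recovering" if cell.passed else "lost"


def drift_alerts(scorecard: dict) -> list:
    """Collect (model, probe, label) for every cell whose status flipped."""
    alerts = []
    for model, probes in scorecard.items():
        for probe, cell in probes.items():
            label = classify(cell)
            if label != "stable":
                alerts.append((model, probe, label))
    return alerts


# Two rows taken from the scorecard above, reduced to pass/fail status.
scorecard = {
    "openai/gpt-5.4-mini": {
        "math-1": Cell(passed=True),
        "spatial-1": Cell(passed=False, prev_passed=True),
    },
    "gemini/gemini-2.5-flash": {
        "causality-1": Cell(passed=True, prev_passed=False),
    },
}

alerts = drift_alerts(scorecard)
for model, probe, label in alerts:
    print(f"{model} {label} {probe}")
```

Under these assumptions the gpt-5.4-mini spatial-1 cell is labeled "lost" and the gemini-2.5-flash causality-1 cell "recovering", which matches the summary line at the top of the report.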

Model Status

Raw Data