LLM Weather Report

Tracking raw LLM reasoning drift — pure endpoint, no agents

Latest Report

April 11, 2026 — 5:12 PM CT

gpt-5.4-mini lost spatial-1 (now ✗ at 3.67, was ✓ at 5). gemini-2.5-flash dropped on causality-1 (2.33, was 3.33) and is now failing it; it also dropped on common-sense-1 (4, was 5) but still passes.

Drift Alerts

Model Status

Scorecard

Model                        ambiguity-1  causality-1        code-1    common-sense-1  logic-1   math-1    spatial-1
anthropic/claude-haiku-4-5   ✓ (4.33)     ✓ (4.67)           ✓ (4.67)  ✓ (3.33)        ✓ (5)     ✓ (5)     ✓ (5)
anthropic/claude-opus-4-6    ✓ (5)        ✓ (4.67)           ✓ (4.83)  ✓ (4.33)        ✓ (5)     ✓ (5)     ✓ (5)
anthropic/claude-sonnet-4-6  ✓ (4.5)      ✓ (4.83)           ✓ (4.67)  ✓ (4)           ✓ (5)     ✓ (5)     ✓ (5)
gemini/gemini-2.5-flash      ✓ (4.5)      ✗ (2.33) was 3.33  ✓ (4.67)  ✓ (4) was 5     ✓ (4.83)  ✓ (5)     ✓ (5)
gemini/gemini-2.5-pro        ✓ (4.83)     ✓ (5)              ✓ (4.83)  ✓ (5)           ✓ (5)     ✓ (5)     ✓ (5)
ollama/llama3
openai/gpt-5.4               ✓ (4.33)     ✓ (4.67)           ✓ (4.67)  ✓ (4.33)        ✓ (4.83)  ✓ (4.67)  ✓ (5)
openai/gpt-5.4-mini          ✓ (4.67)     ✓ (4.83)           ✓ (4.67)  ✓ (4.33)        ✓ (4.67)  ✓ (5)     ✗ (3.67) was ✓ (5)
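
The "was" annotations in the scorecard are enough to reproduce the wording of the summary line. The sketch below is one possible way to do that, not the project's actual alerting logic, which the page does not describe. The names `Cell`, `CELLS`, and `drift_notes` are hypothetical, and the rules ("lost" means a pass turned into a fail, "dropped" means the score fell) are assumptions read off this one report. Pass/fail marks are taken from the scorecard as given rather than computed from a score threshold, since the page does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """One scorecard cell that carries a "was" annotation (hypothetical structure)."""
    model: str
    test: str
    passed: bool                        # current mark: True for ✓, False for ✗
    score: float                        # current score
    prev_score: Optional[float] = None  # score from the "was" note, if any
    prev_passed: Optional[bool] = None  # previous mark, None when the page omits it


# The three annotated cells from the April 11, 2026 scorecard.
CELLS = [
    Cell("gemini-2.5-flash", "causality-1", False, 2.33, prev_score=3.33),
    Cell("gemini-2.5-flash", "common-sense-1", True, 4.0, prev_score=5.0),
    Cell("gpt-5.4-mini", "spatial-1", False, 3.67, prev_score=5.0, prev_passed=True),
]


def drift_notes(cells):
    """Return one note per cell whose result got worse (assumed rules)."""
    notes = []
    for c in cells:
        if c.prev_passed is True and not c.passed:
            # A known pass turned into a fail.
            notes.append(f"{c.model} lost {c.test}")
        elif c.prev_score is not None and c.score < c.prev_score:
            # The score fell; the previous mark is unknown or unchanged.
            notes.append(f"{c.model} dropped on {c.test}")
    return notes


if __name__ == "__main__":
    for note in drift_notes(CELLS):
        print(note)
```

Keeping the previous mark optional matters here: the causality-1 cell shows only "was 3.33" with no ✓ or ✗, so the sketch reports it as a drop rather than guessing that a pass was lost.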

Past Reports

For Agents

Stay Updated

Get notified when models drift. Join the 2389 mailing list for updates on this project and what we're building.