LLM Weather Report

Tracking drift in raw LLM reasoning: direct endpoint calls, no agents

gemini-2.5-flash is failing causality-1.

April 30, 2026 — 12:37 PM CT

Provider Status

OpenAI Elevated errors for ChatGPT Go (5.3 Thinking)
OpenAI Partial Disruption of ChatGPT Workspace Connector Write Actions
OpenAI Users may experience elevated error rate for gpt-4o-mini in the API
OpenAI ChatGPT users may encounter issues in conversation
OpenAI Codex stream is disconnecting intermittently
Anthropic Elevated errors on Claude Haiku 4.5
Anthropic claude.ai and API unavailable
Anthropic Elevated errors on Claude Opus 4.7
Anthropic Claude.ai unavailable and elevated errors on the API

Scorecard

Model	ambiguity-1	causality-1	code-1	common-sense-1	logic-1	math-1	spatial-1
anthropic/claude-haiku-4-5	✓ (4.4)	✓ (4.5)	✓ (4.67)	✓ (3.17)	✓ (5)	✓ (5)	✓ (5)
anthropic/claude-opus-4-6	✓ (5)	✓ (5)	✓ (5)	✓ (4.33)	✓ (5)	✓ (5)	✓ (5)
anthropic/claude-sonnet-4-6	✓ (4.6)	✓ (4.67)	✓ (4.67)	✓ (3.5)	✓ (4.83)	✓ (5)	✓ (5)
gemini/gemini-2.5-flash	✓ (4.5)	✗ (2)	✓ (5)	✓ (4.4)	✓ (5)	✓ (5)	✓ (5)
gemini/gemini-2.5-pro	✓ (4.67)	✓ (4.67)	✓ (4.67)	✓ (5)	✓ (5)	✓ (5)	✓ (5)
ollama/llama3	—	—	—	—	—	—	—
openai/gpt-5.4	✓ (4.5)	✓ (4.83)	✓ (4.67)	✓ (4.5)	✓ (4.67)	✓ (4.33)	✓ (5)
openai/gpt-5.4-mini	✓ (4.5)	✓ (4.5)	✓ (5)	✓ (4.5)	✓ (4.83)	✓ (5)	✓ (5)
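Each scorecard cell is a pass mark (✓ or ✗) followed by a score in parentheses, or a dash when a model has no result. The table is tab-separated, so failing cells can be pulled out with a few lines of Python. This is a minimal sketch that parses only the text shown above. The function names are hypothetical. It does not assume a score scale or a pass threshold, since the report states neither. It also does not read the JSON export, whose schema is not described here.

```python
import re
from typing import Dict, List, Optional, Tuple

# Scorecard rows copied verbatim from the report (tab-separated).
# "\u2014" is the dash the report uses for "no result".
SCORECARD = "\n".join([
    "Model\tambiguity-1\tcausality-1\tcode-1\tcommon-sense-1\tlogic-1\tmath-1\tspatial-1",
    "anthropic/claude-haiku-4-5\t✓ (4.4)\t✓ (4.5)\t✓ (4.67)\t✓ (3.17)\t✓ (5)\t✓ (5)\t✓ (5)",
    "anthropic/claude-opus-4-6\t✓ (5)\t✓ (5)\t✓ (5)\t✓ (4.33)\t✓ (5)\t✓ (5)\t✓ (5)",
    "anthropic/claude-sonnet-4-6\t✓ (4.6)\t✓ (4.67)\t✓ (4.67)\t✓ (3.5)\t✓ (4.83)\t✓ (5)\t✓ (5)",
    "gemini/gemini-2.5-flash\t✓ (4.5)\t✗ (2)\t✓ (5)\t✓ (4.4)\t✓ (5)\t✓ (5)\t✓ (5)",
    "gemini/gemini-2.5-pro\t✓ (4.67)\t✓ (4.67)\t✓ (4.67)\t✓ (5)\t✓ (5)\t✓ (5)\t✓ (5)",
    "ollama/llama3" + "\t\u2014" * 7,
    "openai/gpt-5.4\t✓ (4.5)\t✓ (4.83)\t✓ (4.67)\t✓ (4.5)\t✓ (4.67)\t✓ (4.33)\t✓ (5)",
    "openai/gpt-5.4-mini\t✓ (4.5)\t✓ (4.5)\t✓ (5)\t✓ (4.5)\t✓ (4.83)\t✓ (5)\t✓ (5)",
])

# A scored cell: pass mark, a space, then the score in parentheses.
CELL = re.compile(r"^([✓✗]) \(([\d.]+)\)$")

# (passed, score), or None when the model has no result for that prompt.
Cell = Optional[Tuple[bool, float]]


def parse_cell(text: str) -> Cell:
    """Turn '✓ (4.5)' into (True, 4.5) and '✗ (2)' into (False, 2.0).

    Anything else (such as the no-result dash) becomes None.
    """
    match = CELL.match(text.strip())
    if match is None:
        return None
    mark, score = match.groups()
    return mark == "✓", float(score)


def parse_scorecard(text: str) -> Dict[str, Dict[str, Cell]]:
    """Map each model to {prompt: Cell}, using the header row for prompt names."""
    header, *rows = text.strip().splitlines()
    prompts = header.split("\t")[1:]
    table: Dict[str, Dict[str, Cell]] = {}
    for row in rows:
        model, *cells = row.split("\t")
        table[model] = {p: parse_cell(c) for p, c in zip(prompts, cells)}
    return table


def failures(table: Dict[str, Dict[str, Cell]]) -> List[Tuple[str, str, float]]:
    """List (model, prompt, score) for every cell marked ✗, in table order."""
    return [
        (model, prompt, cell[1])
        for model, cells in table.items()
        for prompt, cell in cells.items()
        if cell is not None and not cell[0]
    ]


if __name__ == "__main__":
    card = parse_scorecard(SCORECARD)
    print(failures(card))  # [('gemini/gemini-2.5-flash', 'causality-1', 2.0)]
```

Models with no results (here ollama/llama3) parse to all-None rows rather than being dropped, so a missing run stays visible instead of being silently counted as a pass.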

Model Status

→ anthropic/claude-haiku-4-5 stable
→ anthropic/claude-opus-4-6 stable
→ anthropic/claude-sonnet-4-6 stable
→ gemini/gemini-2.5-flash stable
→ gemini/gemini-2.5-pro stable
→ openai/gpt-5.4 stable
→ openai/gpt-5.4-mini stable

Raw Data

Detail log — full responses and judge verdicts per prompt
JSON — structured data for programmatic access
Markdown — plain text report
responses.json — raw model outputs
judgments.json — raw judge verdicts
run.log — debug log
Agent Skill — how to read and interpret this data
Methodology — how evaluations work