# LLM Weather Report

> Tracking raw LLM reasoning drift: pure endpoint, no agents. A 2389 Research project.

Each prompt is a single API call to the model's chat completions endpoint, with no tool use, no multi-turn conversation, and no agent scaffolding. This tests the model's raw reasoning capability and tracks how it changes over time.

## Agent Skill

For detailed instructions on how to read and interpret LLM Weather data, fetch the agent skill page:

- /skill.md

## Runs

Each run tests models on 7 reasoning prompts (logic, math, spatial, causality, code, ambiguity, common sense) and evaluates each response individually for correctness (boolean) and reasoning quality (score from 1 to 5).

- [2026-04-11T22-12-53](/runs/2026-04-11t22-12-53/): gpt-5.4-mini lost spatial-1. gemini-2.5-flash dropped on causality-1; gemini-2.5-flash dropped on common-sense-1. gemini-2.5-flash failing causality-1.
  - Markdown: /runs/2026-04-11t22-12-53/report.md
  - JSON: /runs/2026-04-11t22-12-53/data.json
- [2026-04-11T22-12-53 (Detail)](/runs/2026-04-11t22-12-53-detail/):
  - Markdown: /runs/2026-04-11t22-12-53-detail/report.md
  - JSON: /runs/2026-04-11t22-12-53-detail/data.json
- [2026-04-11T17-15-45](/runs/2026-04-11t17-15-45/): gemini-2.5-flash lost causality-1. gpt-5.4-mini recovering. gemini-2.5-flash scores rising.
  - Markdown: /runs/2026-04-11t17-15-45/report.md
  - JSON: /runs/2026-04-11t17-15-45/data.json
- [2026-04-11T17-15-45 (Detail)](/runs/2026-04-11t17-15-45-detail/):
  - Markdown: /runs/2026-04-11t17-15-45-detail/report.md
  - JSON: /runs/2026-04-11t17-15-45-detail/data.json
- [2026-04-11T17-11-10](/runs/2026-04-11t17-11-10/): gpt-5.4-mini lost spatial-1. gemini-2.5-flash dropped on common-sense-1. gemini-2.5-flash recovering.
  - Markdown: /runs/2026-04-11t17-11-10/report.md
  - JSON: /runs/2026-04-11t17-11-10/data.json
- [2026-04-11T17-11-10 (Detail)](/runs/2026-04-11t17-11-10-detail/):
  - Markdown: /runs/2026-04-11t17-11-10-detail/report.md
  - JSON: /runs/2026-04-11t17-11-10-detail/data.json
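
The run links listed above appear to follow one pattern: the run label is lowercased to form a slug, a `-detail` suffix selects the detail variant, and `report.md` and `data.json` sit under `/runs/<slug>/`. The following is a minimal Python sketch of that pattern, inferred only from the three runs listed here. The names `RunPaths` and `run_paths` are hypothetical helpers, not part of the site, and the paths are kept site-relative because this page does not state a host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunPaths:
    """Site-relative paths for one run (hypothetical helper type)."""
    page: str
    report_md: str
    data_json: str


def run_paths(run_label: str, detail: bool = False) -> RunPaths:
    """Build paths for a run label such as '2026-04-11T22-12-53'.

    Assumption: slugs are the lowercased label, with '-detail' appended
    for the detail variant, as seen in the runs listed on this page.
    """
    slug = run_label.strip().lower()
    if detail:
        slug += "-detail"
    base = f"/runs/{slug}/"
    return RunPaths(
        page=base,
        report_md=base + "report.md",
        data_json=base + "data.json",
    )


if __name__ == "__main__":
    # Labels copied from the run list on this page.
    for label in ("2026-04-11T22-12-53", "2026-04-11T17-15-45", "2026-04-11T17-11-10"):
        print(run_paths(label).data_json)
        print(run_paths(label, detail=True).data_json)
```

The structure of `data.json` is not described on this page, so the sketch stops at building paths; /skill.md is the place to look for how to interpret the data itself.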