Weather scientists may soon ask their models questions in plain English rather than writing code. Researchers at UC San Diego have built Zephyrus, an AI agent that translates natural language queries into executable analysis of AI-driven weather and climate models.
The work, which will be presented at the International Conference on Learning Representations (ICLR) in Rio de Janeiro this April, addresses a growing bottleneck in meteorology: AI weather models have become remarkably accurate, but analyzing their output still requires programming expertise.
The Problem
AI-driven weather forecasting has improved dramatically in recent years. Models like GraphCast and Pangu-Weather can now match or exceed traditional numerical weather prediction in many scenarios. NOAA recently deployed its own suite of AI-enhanced forecasting systems.
But these models don’t explain themselves. They produce vast datasets of predictions across space and time, and extracting useful information requires writing code to query, filter, and visualize the data. Climate scientists spend significant time on this data wrangling rather than actual analysis.
“A main issue is that these types of AI models are not able to describe their findings in plain language,” said Duncan Watson-Parris, a climate scientist at UC San Diego’s Scripps Institution of Oceanography who co-authored the research. “A secondary issue is that these models are not able to reason about text information, such as meteorology reports and weather bulletins.”
How Zephyrus Works
Zephyrus acts as an intermediary between scientists and weather models. It accepts questions in natural language, translates them into executable code, runs the analysis against weather model outputs, and returns results in plain language.
Ask “What was the temperature in San Diego on March 15?” and Zephyrus generates the appropriate data query, retrieves the matching values from the model output, and provides an answer. Ask “Where will rainfall exceed 2 inches this week?” and it constructs the spatial analysis needed to answer.
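To make the rainfall example concrete, here is a minimal sketch of the kind of analysis code such an agent might produce for that question. It is an illustration only, not Zephyrus's actual output: the grid, the rainfall values, and the names `weekly_rain_mm` and `cells_exceeding` are all hypothetical.

```python
# Hypothetical illustration only: not code generated by Zephyrus.
MM_PER_INCH = 25.4

# Toy weekly rainfall forecast in millimetres, keyed by (latitude, longitude).
# The values are invented for this example.
weekly_rain_mm = {
    (32.5, -117.0): 12.0,
    (32.5, -116.5): 58.4,
    (33.0, -117.0): 40.6,
    (33.0, -116.5): 71.1,
}

def cells_exceeding(forecast_mm, threshold_inches):
    """Return the grid cells whose rainfall total is strictly above the threshold."""
    threshold_mm = threshold_inches * MM_PER_INCH  # convert the question's units
    return sorted(cell for cell, total in forecast_mm.items() if total > threshold_mm)

wet_cells = cells_exceeding(weekly_rain_mm, threshold_inches=2.0)
print(wet_cells)
```

Even this small query involves a unit conversion and a spatial filter, the sort of routine step a scientist would otherwise write by hand.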
The system bridges what the researchers call the gap between “code-driven AI models and language-based AI agents.” Scientists can iterate on questions without debugging code, speeding up exploratory analysis.
What It Can and Can’t Do
In testing, Zephyrus performed well on straightforward queries: finding weather conditions at specific locations, retrieving forecasts for particular times and places, and answering factual questions about model outputs.
It struggled with complex tasks. Finding regions experiencing extreme weather — which requires defining thresholds, handling spatial distributions, and identifying anomalies — often produced errors. Generating comprehensive reports also proved difficult.
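A small sketch suggests why extreme-weather queries are harder to get right. The article does not say how Zephyrus or its benchmark defines "extreme," so the approach below is an assumption: it flags a region when its forecast sits more than a chosen number of standard deviations above that region's own historical mean. All data and names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical data: past daily maximum temperatures (deg C) for the same
# calendar week in three regions, plus one forecast value per region.
climatology_c = {
    "coast": [21.0, 22.0, 20.5, 21.5, 22.5, 21.0],
    "valley": [30.0, 31.5, 29.0, 32.0, 30.5, 31.0],
    "desert": [38.0, 39.5, 37.5, 40.0, 38.5, 39.0],
}
forecast_c = {"coast": 29.0, "valley": 31.5, "desert": 40.5}

def extreme_regions(climatology, forecast, z_threshold=2.0):
    """Flag regions whose forecast exceeds their own mean by more than
    z_threshold sample standard deviations (an assumed definition of 'extreme')."""
    flagged = {}
    for region, history in climatology.items():
        z = (forecast[region] - mean(history)) / stdev(history)
        if z > z_threshold:
            flagged[region] = round(z, 1)
    return flagged

print(extreme_regions(climatology_c, forecast_c))
print(extreme_regions(climatology_c, forecast_c, z_threshold=1.5))
```

With the default threshold only the coast is flagged, even though the desert forecast is hotter in absolute terms. Lowering the threshold to 1.5 adds the desert. The answer depends on definitional choices that a plain-English question leaves unstated.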
The researchers tested Zephyrus across four different large language models and found similar accuracy levels regardless of which underlying model powered the agent. The bottleneck appears to be the agent architecture itself, not the language model’s capabilities.
Why This Matters
Climate research increasingly depends on AI weather models, but the pool of researchers who can use these tools effectively remains limited to those with strong programming backgrounds. Tools like Zephyrus could democratize access.
The potential extends beyond research. Emergency managers, policy makers, and journalists often need quick answers from weather data without time to learn the technical stack. An agent that handles the translation could make climate information more accessible.
The researchers note that Zephyrus is a first step. The framework needs larger training datasets and fine-tuning for climate-specific terminology. The current limitations around extreme weather detection and report generation are significant gaps for operational use.
What’s Next
Sumanth Varambally, the lead author, and collaborators Rose Yu and Watson-Parris plan to expand the training data and fine-tune open-source language models specifically for climate applications.
The code and paper are available on arXiv ahead of the ICLR presentation in late April. Whether Zephyrus evolves into a practical tool or remains a research prototype will depend on how well future versions handle the complex queries that working scientists actually need answered.