Zephyrus: An AI Agent That Speaks Climate Science

UC San Diego researchers built an AI that translates plain English questions about weather data into code and answers, aiming to democratize climate research


Asking questions of climate data typically requires programming skills, specialized domain knowledge, and access to powerful computing resources. A team at UC San Diego wants to change that.

Their system, Zephyrus, is an AI agent designed to translate natural language questions about weather and climate into executable code, run that code against real meteorological datasets, and return answers in plain English.

The Problem It Solves

Weather foundation models have become remarkably capable at numerical forecasting. Large language models excel at understanding human queries. But there’s a gap between them: climate models can’t parse natural language, and LLMs can’t directly process petabytes of meteorological data.

Zephyrus bridges this by combining both. Ask it “Which European cities will experience temperatures above 35°C next week?” and the system translates your question into code that queries weather forecast data, executes the computation, and returns a comprehensible answer.
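To make the translation step concrete, here is a toy sketch of the kind of code such an agent might generate for that question. The city names and forecast values below are invented for illustration; a real query would run against gridded forecast data rather than a hand-written dictionary.

```python
# Hypothetical sketch of agent-generated code for:
# "Which European cities will experience temperatures above 35°C next week?"
# All values are made up for illustration.

# Peak forecast temperature (°C) per city over the coming week
forecast_peak_c = {
    "Madrid": 38.2,
    "Seville": 40.1,
    "Berlin": 29.5,
    "Oslo": 22.3,
    "Athens": 36.7,
}

THRESHOLD_C = 35.0

def cities_above(forecasts: dict[str, float], threshold: float) -> list[str]:
    """Return city names whose peak forecast exceeds the threshold, sorted."""
    return sorted(city for city, temp in forecasts.items() if temp > threshold)

print(cities_above(forecast_peak_c, THRESHOLD_C))
# In this toy example: Athens, Madrid, and Seville exceed 35°C
```

The final step, turning that list back into a plain-English sentence, is handled by the language model rather than the generated code.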

The research, led by computer scientist Rose Yu and climate scientist Duncan Watson-Parris from Scripps Institution of Oceanography, will be presented at the International Conference on Learning Representations (ICLR) in April.

How It Works

The framework has several components. ZephyrusWorld provides an environment containing the WeatherBench 2 dataset, natural language geocoding tools, weather forecasting capabilities, and climatology modules for statistical queries.

The agent interprets user questions, formulates them as code operations, executes those operations against the environment, and translates results back to language.
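The interpret→code→execute→answer loop described above can be sketched in a few lines. The function names and toy stand-ins below are hypothetical, not the actual Zephyrus implementation; in a real system the first and last steps would call a language model, and execution would run against the weather-data environment.

```python
# Minimal sketch of the agent loop, with toy stand-ins for each stage.

def generate_code(question: str) -> str:
    """Stand-in for the LLM step that turns a question into executable code."""
    return "result = 21.5  # pretend lookup of a forecast temperature"

def run_in_sandbox(code: str):
    """Execute generated code in an isolated scope and return its result."""
    scope: dict = {}
    exec(code, scope)
    return scope["result"]

def summarize(question: str, result) -> str:
    """Stand-in for the LLM step that renders the result as plain English."""
    return f"Answer to {question!r}: {result}"

def answer_weather_question(question: str) -> str:
    code = generate_code(question)      # interpret and formulate as code
    result = run_in_sandbox(code)       # execute against the environment
    return summarize(question, result)  # translate the result back to language

print(answer_weather_question("What is the forecast temperature in Paris?"))
```

Sandboxed execution matters here: the agent runs model-generated code, so isolating it from the host system is a natural design choice.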

The team built ZephyrusBench, a benchmark of question-answer pairs spanning basic lookups to advanced forecasting and counterfactual reasoning scenarios. Tested against text-only baselines, Zephyrus agents improved correctness by up to 44 percentage points.

What It Does Well—and What It Doesn’t

Zephyrus handles straightforward queries effectively: finding locations with specific weather conditions, retrieving forecasts for particular places and times, and making basic climatological comparisons.

It struggles with more complex tasks. Locating extreme weather events proves difficult. Report generation—synthesizing multiple data points into coherent summaries—remains a challenge. The researchers note that all four frontier language models they tested showed similar accuracy limitations on harder tasks.

The Democratization Angle

The researchers frame this as “lowering the barrier to entry” for climate science. Graduate students without programming backgrounds could query complex datasets. Researchers in regions with limited computational resources could access the same analytical capabilities as well-funded labs.

“The vision is to create AI co-scientists that dramatically lower the barrier to entry, allowing students and researchers everywhere to access and reason about critical weather and climate data at unprecedented speeds,” the researchers state.

What Comes Next

The code and benchmark are publicly available on GitHub. For the next iteration, the team plans to use larger training datasets and fine-tune open-source models specifically for climate-focused tasks.

The broader trajectory here is worth noting: scientific AI is increasingly about interfaces, not just algorithms. Making powerful analytical tools accessible through natural language could matter as much as making them more accurate. Zephyrus suggests one path toward that goal—imperfect but functional enough to be useful.

Whether climate scientists actually adopt such tools will depend on how well they integrate with existing workflows and how much researchers trust AI-generated analyses of data that informs policy decisions. Those questions remain open.