PyCon Lithuania 2024

Building Open Climate Change Information Services in Python



  • Trevor James Smith
    PyCon Lithuania
    April 4th, 2024
    Vilnius, Lithuania

Presentation Outline

  • Who am I? / What is Ouranos?
  • What's our context?
  • Climate Services?
  • xclim: climate operations
  • finch: xclim as a Service
  • Climate WPS Frontends
  • Open Source Climate Services
  • Acknowledgements
Photo: Extratropical Cyclone over Hudson Bay, Canada, August 2016. Credit: NASA Earth Observatory.

Who am I?

Trevor James Smith

github.com/Zeitsperre
Zeit@techhub.social

  • Research software developer/packager/maintainer from Montréal, Québec, Canada 🇨🇦
  • Studied climate change impacts on wine viticulture 🍇 in Southern Québec
  • Making stuff with Python 🐍 for ~6.5 years
  • Citizen of the Republic of Užupis 🖐️
    (since 2024)

What is Ouranos? 🌀

  • Non-profit research consortium established in 2003 in Montréal, Québec, Canada
  • Climate Change Adaptation Planning
  • Climate Model Data Producer/Provider
  • Climate Information Services

Photo credit: https://www.communitystories.ca/v2/grand-verglas-saint-jean-sur-richelieu_ice-storm/

Surface air temperature anomaly for February 2024 using ERA5 Reanalysis - Courtesy of C3S/ECMWF

What's the climate situation?

  • Climate Change is having major impacts on Earth's environmental systems
  • IPCC: Global average temperature has increased by more than 1.1 °C since the 1850s
    • An increase of more than 1.5 °C is considered beyond the safe limit

What's the climate data situation?

Climate science is a "Big Data" problem

  • New climate models being developed every year
  • More climate simulations being produced every day
  • Higher resolution input and output datasets (gridded data)
  • Specialised analyses and more personalized user needs

Climate Services

What do they provide?

  • Tailoring objectives and information to different user needs
  • Providing access to climate information
  • Building local mitigation/adaptation capacity
  • Offering training and support
  • Making sense of Big climate Data

What information do Climate Services provide?

Climate Indicators, e.g.:

  • Hot Days (days with temperature ≥ 22 °C) 🌡️
  • Beginning / End / Length of the growing season 🌷
  • Average seasonal rainfall (3-Month moving average precipitation) ☔
  • Many more examples
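
To make the indicator idea concrete: counting "hot days" is just a per-year threshold count. A minimal pure-Python sketch with synthetic data (xclim performs the same kind of operation lazily on xarray/dask arrays):

```python
from datetime import date, timedelta

def hot_days(temps, start, thresh=22.0):
    """Count days per year with temperature >= `thresh` (°C).

    `temps` is a list of daily temperatures starting at `start`.
    """
    counts = {}
    for i, t in enumerate(temps):
        day = start + timedelta(days=i)
        if t >= thresh:
            counts[day.year] = counts.get(day.year, 0) + 1
    return counts

# Synthetic data: 25 °C for the first 10 days of each year, 10 °C otherwise
temps = [25.0] * 10 + [10.0] * 355 + [25.0] * 10 + [10.0] * 355
print(hot_days(temps, date(2023, 1, 1)))  # {2023: 10, 2024: 10}
```

The real indicator additionally carries units and CF metadata through the computation; this sketch only shows the arithmetic core.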

Planning Tools, e.g.:

  • Maps 🗺️
  • Point estimates at geographic locations 📈
  • Gridded values 🌐
  • Not really sure what they need? ❓
    ➔ Guidance from experts!

Climate Services in the 2010s

  • MATLAB-based in-house libraries (proprietary 💰)
    • No source code review
  • Issues with data storage / access / processing
    • Small team unable to meet demand 😫
    • Lack of output data uniformity between researchers ⁉️
    • Lots of bugs 🐛 and human error 🙅
  • Data analysis/requests served manually ⏳
  • Software testing + data validation? Not really. 😱

Building a Climate Services library?


What are the requirements?

What does it need to perform?

  • Climate Indicators
    • Units management
    • Metadata management
  • Ensemble Statistics
  • Bias Adjustment
  • Data Quality Assurance Checks

Implementation goals?

  • Operational: Capable of handling very large ensembles of climate data
  • Foolproof: Automatic verification of data and metadata validity by default
  • Extensible: Flexible in use and able to easily provide custom indicators, as needed

Is there Python in this talk?

  • Yes

Why build a Climate Services library in Python?

  • Robust, trustworthy, and fast scientific Python libraries
  • Python's Readability / Reviewability (Peer Review)
  • Growing demand for climate services / products
    • Let the users help themselves
  • The timing was right
    • Internal and external demand for common tools
  • Less time writing code, more time spent doing research

How did we build Xclim?

  • Data Structure
  • Algorithms
  • Data and Metadata Conventions

and pytest(-xdist)

~1625 tests (baseline)
+ Doctests
+ Jupyter Notebook tests
+ Optional module tests
+ Multiplatform/Anaconda Python tests
+ ReadtheDocs (fail-on-warning: true)

Climate Indicator Example - Average Snow Depth

import xarray

from xclim.core.units import declare_units


@declare_units(snd="[length]")
def snow_depth(
    snd: xarray.DataArray,
    freq: str = "YS",
) -> xarray.DataArray:
    """Mean of daily average snow depth.

    Resample the original daily mean snow depth series by taking the mean over each period.

    Parameters
    ----------
    snd : xarray.DataArray
        Mean daily snow depth.
    freq : str
        Resampling frequency.

    Returns
    -------
    xarray.DataArray, [same units as snd]
        The mean daily snow depth at the given time frequency
    """
    return snd.resample(time=freq).mean(dim="time").assign_attrs(units=snd.units)
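
Stripped of xarray, the resample-and-mean pattern above reduces to grouping a daily series by period and averaging. A pure-Python sketch of the same logic, on synthetic data:

```python
from datetime import date, timedelta
from itertools import groupby
from statistics import fmean

def yearly_mean(values, start):
    """Annual mean of a daily series, mirroring snd.resample(time="YS").mean()."""
    days = [(start + timedelta(days=i), v) for i, v in enumerate(values)]
    return {
        year: fmean(v for _, v in group)
        for year, group in groupby(days, key=lambda d: d[0].year)
    }

# Synthetic snow depths in metres: 2023 (365 days), then 2024 (366 days, leap)
snd = [0.1] * 365 + [0.3] * 366
means = yearly_mean(snd, date(2023, 1, 1))
print(means)  # one annual mean per year
```

xarray's `resample(time="YS")` is doing exactly this grouping, but lazily and in parallel over whole gridded datasets.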

Xclim algorithm design

Two ways of calculating indicators

  • indicators (End-User API)
    • Metadata standards checks
    • Data quality checks
    • Time frequency checks
    • Missing data-compliance
    • Calendar-compliance
  • indices (Core API)
    • For users who want to bypass the standards and quality checks
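
The two entry points can be pictured as a thin, checked wrapper around an unchecked core function. A hypothetical sketch of the design (the name mimics xclim's `tg_mean`, but this is not its actual implementation):

```python
def tg_mean_indice(values):
    """Core computation: no checks, the caller is trusted (like xclim.indices)."""
    return sum(values) / len(values)

def tg_mean_indicator(values, units):
    """End-user entry point: validate first, then delegate (like xclim.indicators)."""
    if units not in {"K", "degC"}:
        raise ValueError(f"unexpected temperature units: {units}")
    if not values:
        raise ValueError("empty series: cannot compute an indicator")
    result = tg_mean_indice(values)
    # An indicator also attaches standardized metadata to its output
    return {"value": result, "units": units, "standard_name": "air_temperature"}

print(tg_mean_indicator([271.0, 275.0], "K"))
```

Power users call the core directly and accept responsibility for validity; everyone else gets the guard rails by default.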
Building Open Climate Change Information Services in Python
PyCon Lithuania 2024

What does Xclim do? ➔ Units Management

import xclim
from clisops.core import subset

# Data is in Kelvin, threshold is in Celsius, and other combinations

# Extract a single point location for the example
ds_pt = subset.subset_gridpoint(ds, lon=-73, lat=44)

# Calculate indicators with different units

# Kelvin and Celsius
out1 = xclim.atmos.growing_degree_days(tas=ds_pt.tas, thresh="5 degC", freq="MS")

# Fahrenheit and Celsius
out2 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="5 degC", freq="MS")

# Fahrenheit and Kelvin
out3 = xclim.atmos.growing_degree_days(tas=ds_pt.tas_F, thresh="278.15 K", freq="MS")
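
Internally, xclim leans on pint to reconcile units like these. Stripped of all machinery, the idea is to normalize every quantity to a common unit before comparing. A toy illustration (assumed values; not xclim's actual code):

```python
def to_celsius(value, unit):
    """Normalize a temperature to °C (toy stand-in for pint conversion)."""
    if unit == "degC":
        return value
    if unit == "K":
        return value - 273.15
    if unit == "degF":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown unit: {unit}")

def growing_degree_days(daily_temps, temp_unit, thresh, thresh_unit):
    """Sum of degrees above a threshold, regardless of input units."""
    t_c = to_celsius(thresh, thresh_unit)
    return sum(max(0.0, to_celsius(t, temp_unit) - t_c) for t in daily_temps)

# The same three days and the same threshold, in different unit systems:
kelvin = [278.15, 283.15, 288.15]   # i.e. 5, 10, 15 °C
fahrenheit = [41.0, 50.0, 59.0]     # i.e. 5, 10, 15 °C
print(growing_degree_days(kelvin, "K", 5, "degC"))
print(growing_degree_days(fahrenheit, "degF", 278.15, "K"))
```

Both calls give the same result (≈ 15 degree-days), which is the point: mixing data sources must never silently change the answer.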

What does Xclim do? ➔ Missing Data and Metadata Locales

import xarray as xr
import xclim

ds = xr.open_dataset("my_dataset.nc")

with xclim.set_options(
    # Drop timesteps with more than 5% of missing data
    check_missing="pct", missing_options=dict(pct={"tolerance": 0.05}),

    metadata_locales=["fr"] # Add French language metadata
):
    # Calculate Annual Frost Days (days with min temperature < 0 °C) 
    FD = xclim.atmos.frost_days(ds.tas, freq="YS")
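
The missing-data option amounts to a per-period tolerance check. A rough pure-Python rendering of the 5 % rule (illustrative only; xclim applies this to xarray objects):

```python
import math

def drop_incomplete_years(daily, tolerance=0.05):
    """Keep a year's mean only if its fraction of missing values is within tolerance."""
    out = {}
    for year, values in daily.items():
        missing = sum(1 for v in values if math.isnan(v))
        if missing / len(values) <= tolerance:
            out[year] = sum(v for v in values if not math.isnan(v)) / (len(values) - missing)
        else:
            out[year] = math.nan  # masked, analogous to xclim's "pct" missing check
    return out

daily = {
    2001: [1.0] * 365,                    # complete -> kept
    2002: [1.0] * 340 + [math.nan] * 25,  # ~6.8 % missing -> masked
}
result = drop_incomplete_years(daily)
print(result[2001], result[2002])  # 1.0 nan
```

Masking rather than silently averaging incomplete periods is what keeps the downstream statistics honest.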

What does Xclim do? ➔ Climate Ensemble Mean Analysis

Average temperature for the years 1991-2020, averaged across 14 Regional Climate Models (extreme warming scenario: SSP3-7.0)
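
An ensemble mean like the one mapped here is conceptually just an average across the model dimension at every grid cell, which xclim's ensembles module automates (along with percentiles and other statistics). A toy sketch with made-up values:

```python
from statistics import fmean

# Hypothetical annual-mean temperatures (°C) for 3 grid cells from 3 models
ensemble = [
    [4.2, 5.1, 4.8],  # model A
    [4.0, 5.3, 5.0],  # model B
    [4.4, 4.9, 4.6],  # model C
]
# Mean across the model axis, cell by cell
mean_map = [fmean(cell) for cell in zip(*ensemble)]
print([round(v, 2) for v in mean_map])  # [4.2, 5.1, 4.8]
```

In practice the "cells" are millions of grid points over many time steps, which is why the real computation runs on chunked xarray/dask arrays.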


What does Xclim do? ➔ Bias Adjustment

  • Model train / adjust approach
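
The simplest possible instance of the train/adjust pattern is an additive "delta" correction: learn the mean bias over a shared historical period, then remove it from the simulation. xclim's sdba module implements far more sophisticated quantile-based methods; this sketch only illustrates the workflow:

```python
from statistics import fmean

def train(obs, sim):
    """Learn an additive correction from a common historical period."""
    return fmean(obs) - fmean(sim)

def adjust(sim_future, delta):
    """Apply the learned correction to (possibly future) simulated data."""
    return [v + delta for v in sim_future]

obs_hist = [10.0, 12.0, 11.0]      # observations, training period (°C)
sim_hist = [12.0, 14.0, 13.0]      # model over the same period: +2 °C warm bias
delta = train(obs_hist, sim_hist)  # -2.0
print(adjust([15.0, 16.0], delta))  # [13.0, 14.0]
```

The key assumption, shared by fancier methods, is that the bias learned historically still applies to the future simulation.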

Upstream contributions from Xclim

  • Non-standard calendar (cftime) support in xarray.groupby
  • Quantile methods in xarray.groupby
  • Non-standard calendar conversion migrated from xclim to xarray
  • Climate and Forecasting (CF) unit definitions inspired from MetPy
    • Inspiring work in cf-xarray
  • Weighted variance, standard deviations, and quantiles in xarray
    (for ensemble statistics)
  • Faster NaN-aware quantiles in numpy
  • Initial polyfit function in xarray
  • Also, we help maintain xESMF, intake-esm, cf-xarray, xncml, climpred and others for xclim-related tools

That's great and all, but what if...

  • There's just too much data to crunch:

    • The data could be spread across servers globally
    • Local computing power is not enough for the analyses
  • The user knows programming, but not Python:

    • A biologist who uses R or a different program for their work
    • An engineer who just needs a range of estimates for future rainfall
  • The user just wants to see some custom maps:

    • An agronomist who is curious about average growing conditions in 10 years?

Xclim on Computation Platforms

Microsoft Planetary Computer


Enhancing Accessibility: Web Services

  • WMS: Web Map Service
    • Google Maps
  • WFS: Web Feature Service
  • WCS: Web Coverage Service
  • WPS: Web Processing Service
    • Running geospatial analyses over the internet

Finch : Climate Indicator Web Processing Service

Dynamically-generated indicators from xclim (~430 Indicators in total)

github.com/Bird-house/Finch


Using remote Finch Web Service from Python (with birdy)

from birdy import WPSClient

wps = WPSClient("https://ouranos.ca/example/finch/wps")

# Using the OPeNDAP protocol
remote_dataset = "www.exampledata.lt/climate.ncml"

# The indicator call looks a lot like the one from `xclim` but
# passing a url instead of an `xarray` object.
response = wps.growing_degree_days(
    remote_dataset,
    thresh='10 degC',
    freq='MS',
    variable='tas'
)

# Returned as a streaming `xarray` data object
out = response.get(asobj=True).output_netcdf

out.growing_degree_days.plot(hue='location')

Bird-house/birdy -> PyWPS Helper Library


Making it accessible ➔ Web Frontends

www.ClimateData.ca


Modern-day Climate Services with Python

  • Open Source Python libraries (numpy, sklearn, xarray, etc.)
  • Multithreading and streaming data formats (e.g. OPeNDAP and Zarr)
  • Common tools built collaboratively and shared widely (xclim, finch)
  • Docker-deployed Web-Service-based infrastructure
  • Testing, CI/CD pipelines, and validation workflows
  • Peer-Reviewed software (pyOpenSci and JOSS)

Thanks!

Colleagues and Collaborators

  • Pascal Bourgault
  • David Huard
  • Travis Logan
  • Abel Aoun
  • Juliette Lavoie
  • Éric Dupuis
  • Gabriel Rondeau-Genesse
  • Carsten Ehbrecht
  • Long Vu
  • Sarah Gammon
  • David Caron
    and many more contributors!

Ačiū!

Have a great rest of PyCon Lithuania! 🇱🇹

github.com/Ouranosinc/xclim


github.com/Bird-house/finch


This presentation:
https://zeitsperre.github.io/PyConLT2024/


Thanks so much to PyCon Lithuania and the organizers for this fantastic conference so far. Today, I’m talking about how we've been using Python to build our open source offerings to better equip researchers interested in climate science.

This presentation is going to start by providing some context on climate adaptation information services, what my company has built with xclim and how we're actively making these kinds of analyses more accessible worldwide.

So who am I? I'm a research software developer from Montréal, Québec. My background is in environmental science, specifically GIS and agroclimate modelling. I only really started picking up dev work on the job. I'm also learning Japanese for fun.

My employer, Ouranos, is a not-for-profit based in Montréal that works with the Canadian and Québec governments on climate change adaptation. We were created in response to an extreme ice storm that left 1.5 million people without power for weeks and caused around 5.5 billion dollars in damage. Our role is to connect government, industry, and academia with many types of climate information so that events like those are less impactful. For the past 8 years or so, we've been moving into software and research platform development. The core development team is small, but we do a lot of collaboration.

Before we get to the Python, it would be good to talk about the climate context. The fact that human-induced Climate Change is occurring is established fact. The temperature change alone has the potential to really impact a lot of things we depend on. Extreme global weather patterns are just one such side effect.

*"Since systematic scientific assessments began in the 1970s, the influence of human activities on the warming of the climate system has evolved from theory to established fact"* \- IPCC Sixth Assessment Report Technical Summary (IPCC AR6-TS)

Since we only have one Earth to run experiments on, climate models are one tool to give us physically consistent estimates of what the future _could_ look like. Unfortunately, this means we need more and more storage and computation resources to test more hypotheses. At some point it becomes completely unmanageable and really challenging to determine what climate data we want or even how to use it, so we need intermediaries to help. This field is what we call Climate Services.

"Overpeck, Jonathan T., Gerald A. Meehl, Sandrine Bony, and David R. Easterling. “Climate Data Challenges in the 21st Century.” Science 331, no. 6018 (February 11, 2011): 700–702. https://doi.org/10.1126/science.1197869"

Climate Services has been a developing field for a few decades now, more so lately with Climate Change. The idea behind a climate service provider is to act as the bridge between researchers in climate and general audiences. You can imagine being a city planner or someone in an industry that can be impacted by climate conditions; if you don't have a background in climate science, where do you even begin? For this we provide information and training to help **make sense of big climate data**.

So what exactly do we provide? Depending on the context, it could be raw historical data to establish trends, or it could be future projections of climate indicators. My background is more agricultural, so we can imagine wanting to know things that would impact our growing season or place stress on the crops. Presenting this information is again dependent on the user; some people like maps, some want time series, and more advanced users might want raw data. In many cases, they don't know what they want or need, so we help them figure that out!

Climate Services have been around for some time, but when I started working on them, there were a lot of not-so-great things we had to deal with...

This couldn't continue, so when we were negotiating with the Canadian Government for a development agreement for a website to show Climate Data for Canada, we were adamant that we needed to put some funding into a library to help tackle some of these logistical problems. This was approved.

At its base, the library needed to be able to calculate climate indicators, obviously, and what comes out should be easily used by users and fed into other tools. Often climate model data is averaged to remove bias, so statistics tools were also important, as well as ways of correcting bias from models and ensuring that what we get out is physically possible. Operationally, it needs to handle terabytes of data from different sources at times. It should be intuitive and relatively mistake-proof, and most importantly, we should be able to extend and build upon it, so that people can customize it to their needs.

At this point, you're probably wondering where Python is coming in, and it's on the next slide.

We decided to try our hand at building this all in Python for a number of reasons... Our ultimate goal was to ensure that the research comes first, and I think we've done that.

Xclim is what we came up with: The library is built with a few key modules, each handling things like indicators, statistics, bias correction and some other utilities. It manages to strike a nice balance between usability and extensibility, and most importantly, it's fast.

We based our data structures on several Open Source scientific Python libraries, namely, Pandas, Numpy and Xarray, ensuring that our code can benefit from the parallelization made possible by Dask. For projects based on xarray, it's customary to put an 'x' in the name, hence 'xclim'.

Xclim was built to be an operational library and as such, we built our algorithms based on more conventional libraries such as scipy for statistics, scikit-learn for bias-adjustment. Numba provides Just-in-time compilation and pandas provides the base API for array and time operations.

Units management and conventions are also key to ensuring that the outputs of operations can be easily used in other applications, and issues such as unit management are handled via libraries dedicated to ensuring that dimensions are always preserved and expected output units are always the same, regardless of inputs.

Since this is research software, we need to be validating at all times, so we've built an extensive array of tests thanks to `pytest`, `pytest-xdist`, `tox`, etc.

Here's the simplest example of a climate indicator I can find: we are taking daily values for snow depth and calculating the annual average. Everything is well documented using NumPy docstrings. The unit dimension is length, and we are resampling to a coarser time frequency. The input units could be inches or millimetres, and we explicitly check this in the decorator using metadata standards. This is then passed to more checks to make sure everything is valid.

The `indicators` module is what we suggest for users and it handles... But for those who want to circumvent all these checks, we expose the core `indices` module, which you can use as a basis to build your own indicators or you can use it directly if you trust yourself.

We often get data from many sources and the units can sometimes be wildly different, like Celsius and Fahrenheit for temperature, or sometimes precipitation is a total vs a rate. Also, in some fields, equations can be metric or imperial, so getting units right was key. Here we have an example for calculating monthly growing degree days, with different units for the source data and thresholds.

Running this block, we can see that regardless of the units used, the outputs are always consistent, which is great for mixing and matching data from different sources.

Since Quebec is a French-speaking region of Canada, `xclim` also has the capability to dynamically translate metadata depending on the locale. We actually built a pretty comprehensive engine for doing this. It can support any language, really, so Lithuanian would be possible if you're interested in implementing it. Here we have a calculation for days with temperature below 0 Celsius. I've also added a check for missing data, so any years that are missing more than 5% of values are dropped to strengthen the statistics.

Looking at the metadata of the object, we can see in the history how we called this operation: the missing data threshold, the call signature and operation, the version, etc. Below that, we also have the dynamically generated metadata for the indicator, complete with the customized thresholds, in both French and English.

Here's an example of what we can calculate with xclim. On the left is the average annual temperature for Quebec for 30-year periods calculated from 14 different climate model projections, while the right shows us the change in average temperature relative to the years 1990-2020 across those 14 models until the end of the century. I just want to reassure you that, for dramatic purposes, these values are showing the more extreme climate change scenario. There's still time to get our act together on the climate!

All models are inherently wrong, and climate models are no exception, so adjusting them so that their value distributions match what we would expect from observations is critical if we want to use them for real-world scenarios. This module is very complex and I'm not a die-hard stats person, so I just wanted to briefly show what that looks like on the right here.

Building this tool has also involved a lot of upstream contributions as well, addressing bugs or adding features to better work with climate data. Much of the changes center around calendar systems, standard units, and statistics. I'll also mention that my team regularly contributes to the maintenance of a few libraries in the domain.

This is all great if you have the data, resources, and technical training to run your analyses, but what if...

For those with the Python knowledge, you can run your scripts on a web platform, connected directly to the data. This figure shows some of the ways we've made either our tools or data or information more accessible to all kinds of users. Technical users can use either the tools directly, or run them on our Jupyter-based research platform we call PAVICS; People who just want data outputs could use programmable dashboards we've made; or they can just grab data values from a map or pre-computed indicators.

Another platform out there is The Microsoft Planetary Computer which hosts a bunch of climate data and has had `xclim` available for a few years now, with examples on how to use it. But accessibility is constantly a concern for climate services and there are still other ways of making use of it.

Another approach to solving the problem is to turn it into a service that can be deployed on a server and fetched via a web-based standard. By show of hands, who knows the following web services...? ... The last one, Web Processing Service is what we decided to implement so that more user types could serve themselves.

So we decided to move forward with building a Web Service in partnership with a project based in Germany called Bird-house, the idea being to be able to run analyses using Web Standards built in Python. ...Bird-house likes to name their projects after Birds, so to be clever, we landed on "Finch" to make reference to environmental adaptations that Charles Darwin talked about when he wrote about finches in the Galapagos Islands.

Bird-house offers a library called Birdy to help with interacting with WPS processes. This service could be running locally or remotely, and we have the option of chaining processes or running some processes in parallel if the service is configured that way. Once we make a connection, calling the indicator calculation is almost exactly the same as in xclim. Once it's done, we can either download the data or stream it to our local computer.

Here we asked for estimates of growing degree days at certain point locations in Canada, and that's what we get out from the remote analysis. You can imagine that once this is set up, we can go further and build a Web UI to make it even more accessible.

And that's what developers on the ClimateData.ca project did! This is the ClimateData.ca website that xclim and finch supports in the backend and it's been a huge success all-around. We're constantly making improvements, updating the datasets, etc. If you're interested in seeing what climate change may look like in Canada, please give it some traffic!

Today, our experience is very different having adopted Python and Open Source, and it's much more maintainable and reasonable. Being able to collaborate with many developers around the world has made for better tools and for a better software ecosystem in climate generally.

Thanks for coming and listening to me run through climate services! You can find our software here, and the link to these slides is at the bottom. I'm going to clean them up and finalize them very soon after PyCon Lithuania ends. Ačiū!