Thanks so much to PyCon Lithuania and the organizers for this fantastic conference so far. Today, I’m talking about how we've been using Python to build our open source offerings to better equip researchers interested in climate science.
This presentation will start with some context on climate adaptation information services, then cover what my company has built with xclim, and how we're actively making these kinds of analyses more accessible worldwide.
So who am I? I'm a research software developer from Montréal, Québec. My background is in environmental science, specifically GIS and agroclimate modelling. I only really started picking up dev work on the job. I'm also learning Japanese for fun.
My employer, Ouranos, is a not-for-profit based in Montréal that works with the Canadian and Quebec governments on climate change adaptation. We were created in response to an extreme storm event that left 1.5 million people without power for weeks and caused around 5.5 billion dollars in damage. Our role is to connect government, industry, and academia with many types of climate information so that events like those are less impactful. For the past 8 years or so, we've been moving into software and research platform development. The core development team is small, but we do a lot of collaboration.
Before we get to the Python, it would be good to talk about the climate context. That human-induced climate change is occurring is an established fact. The temperature change alone has the potential to really impact a lot of things we depend on, and more extreme weather patterns around the globe are just one side effect.
*"Since systematic scientific assessments began in the 1970s, the influence of human activities on the warming of the climate system has evolved from theory to established fact"* \- IPCC Sixth Assessment Report Technical Summary (IPCC AR6-TS)
Since we only have one Earth to run experiments on, climate models are one tool that gives us physically consistent estimates of what the future _could_ look like. Unfortunately, this means we need more and more storage and computation resources to test more hypotheses. At some point the volume becomes completely unmanageable, and it gets really challenging to figure out what climate data we even want, let alone how to use it, so we need intermediaries to help. This field is what we call Climate Services.
"Overpeck, Jonathan T., Gerald A. Meehl, Sandrine Bony, and David R. Easterling. “Climate Data Challenges in the 21st Century.” Science 331, no. 6018 (February 11, 2011): 700–702. https://doi.org/10.1126/science.1197869"
Climate Services has been a developing field for a few decades now, more so lately with Climate Change. The idea behind a climate service provider is to act as the bridge between researchers in climate and general audiences. You can imagine being a city planner or someone in an industry that can be impacted by climate conditions; if you don't have a background in climate science, where do you even begin? For this we provide information and training to help **make sense of big climate data**.
So what exactly do we provide? Depending on the context, it could be raw historical data to establish trends, or it could be future projections of climate indicators. My background is more agricultural, so we can imagine wanting to know things that would impact our growing season or place stress on the crops. Presenting this information is again dependent on the user; some people like maps, some people want time series, more advanced users might want raw data. In many cases they don't know what they want or need, so we help them figure that out!
Climate Services have been around for some time, but when I started working on them, there were a lot of not-so-great things we had to deal with...
This couldn't continue, so when we were negotiating with the Canadian Government for a development agreement for a website to show Climate Data for Canada, we were adamant that we needed to put some funding into a library to help tackle some of these logistical problems. This was approved.
At its base, the library needed to be able to calculate climate indicators, obviously, and what comes out should be easily used by users and fed into other tools. Climate model data is often averaged to remove bias, so statistics tools were also important, as well as ways of correcting bias from models and ensuring what we get out is physically possible. Operationally, it needs to handle terabytes of data from different sources at times. It should be intuitive and relatively mistake-proof, and most importantly, we should be able to extend and build upon it, so that people can customize it to their needs.
At this point, you're probably wondering where Python is coming in, and it's on the next slide.
We decided to try our hand at building this all in Python for a number of reasons... Ultimately, our goal was to make sure the research comes first, and I think we've done that.
Xclim is what we came up with: the library is built from a few key modules, handling things like indicators, statistics, bias correction, and other utilities. It manages to strike a nice balance between usability and extensibility, and most importantly, it's fast.
We based our data structures on several open source scientific Python libraries, namely pandas, NumPy, and xarray, ensuring that our code can benefit from the parallelization made possible by Dask. For projects based on xarray, it's customary to put an 'x' in the name, hence 'xclim'.
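To make the Dask point concrete, here's a minimal sketch of how climate data is typically opened so that downstream operations are lazy and parallelized; the file name and chunk size here are just illustrative:

```python
import xarray as xr

# Open a NetCDF file lazily; the chunks argument makes every variable a
# Dask array, so later computations run in parallel and out of core.
ds = xr.open_dataset("tas_daily.nc", chunks={"time": 365})

# Nothing is computed yet; this only builds a task graph.
annual_mean = ds["tas"].resample(time="YS").mean()

# Computation happens only when we actually ask for the values.
annual_mean.load()
```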
Xclim was built to be an operational library, and as such our algorithms lean on more conventional libraries such as SciPy for statistics and scikit-learn for bias adjustment. Numba provides just-in-time compilation, and pandas provides the base API for array and time operations.
Units management and conventions are also key to ensuring that the outputs of operations can be easily used in other applications. Unit handling is delegated to libraries dedicated to making sure that dimensions are always preserved and that the expected output units are always the same, regardless of the inputs.
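In our case that work leans on the pint library. As a rough illustration of the idea on its own, outside of xclim, dimensional checking with pint looks something like this:

```python
import pint

ureg = pint.UnitRegistry()

# A precipitation rate expressed in millimetres per day...
rate = 5.0 * ureg.millimeter / ureg.day

# ...converts cleanly to any other length-per-time unit,
print(rate.to("inch / day"))

# ...while a dimensionally impossible conversion raises an error
# instead of silently producing nonsense:
# rate.to("degC")  -> pint.DimensionalityError
```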
Since this is research software, we need to be validating at all times, so we've built an extensive array of tests thanks to `pytest`, `pytest-xdist`, `tox`, etc.
Here's the simplest example of a climate indicator I can find: we take daily values for snow depth and calculate the annual average. Everything is well documented using NumPy docstrings. The unit dimension is length, and we are resampling to a coarser time frequency. The input units could be inches or millimetres, and we explicitly check this in the decorator using metadata standards. This is then passed to more checks to make sure everything is valid.
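To give a feel for the structure, here's a simplified, illustrative sketch of what an indice like this can look like. It is not the actual xclim source; the function name is made up and the decorator usage is an approximation of xclim's `declare_units`:

```python
import xarray as xr
from xclim.core.units import declare_units

# Illustrative only -- a simplified stand-in for an xclim indice.
@declare_units(snd="[length]")
def mean_snow_depth(snd: xr.DataArray, freq: str = "YS") -> xr.DataArray:
    """Mean of daily snow depth over the requested period.

    The decorator verifies that `snd` carries units with a length
    dimension (millimetres, inches, ...) before anything is computed.
    """
    return snd.resample(time=freq).mean(dim="time", keep_attrs=True)
```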
The `indicators` module is what we suggest for users and it handles... But for those who want to circumvent all these checks, we expose the core `indices` module, which you can use as a basis to build your own indicators or you can use it directly if you trust yourself.
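Roughly, the difference between the two layers looks like this; the dataset and file name are assumptions for the sake of the example:

```python
import xarray as xr
import xclim

ds = xr.open_dataset("tasmax_daily.nc")  # illustrative file name

# High-level Indicator: validates units and metadata, checks for missing
# values, and writes CF-style attributes on the output.
hot_days = xclim.atmos.tx_days_above(ds.tasmax, thresh="30.0 degC", freq="YS")

# Low-level indice: the bare computation with none of those checks --
# handy as a building block if you trust your inputs.
hot_days_raw = xclim.indices.tx_days_above(ds.tasmax, thresh="30.0 degC", freq="YS")
```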
We often get data from many sources, and the units can sometimes be wildly different, like Celsius and Fahrenheit for temperature, or precipitation given as a total versus a rate. Also, in some fields, equations can be metric or imperial, so getting units right was key. Here we have an example for calculating monthly growing degree days, with different units for the source data and thresholds.
Running this block, we can see that regardless of the units used, the outputs are always consistent, which is great for mixing and matching data from different sources.
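The code from the slide isn't reproduced in these notes, but a rough sketch of what such a mixed-units call could look like is below; the synthetic Fahrenheit series and the chosen threshold are purely illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr
import xclim

# Synthetic daily temperatures in Fahrenheit, for illustration only.
time = pd.date_range("2000-01-01", "2000-12-31", freq="D")
tas = xr.DataArray(
    50 + 30 * np.sin(2 * np.pi * time.dayofyear / 365),
    dims="time",
    coords={"time": time},
    attrs={"units": "degF", "standard_name": "air_temperature"},
)

# The threshold is given in Celsius; the units are reconciled for us and
# the output always carries explicit degree-day units.
gdd = xclim.atmos.growing_degree_days(tas, thresh="4.0 degC", freq="MS")
print(gdd.attrs["units"])
```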
Since Quebec is a French-speaking region of Canada, `xclim` also has the capability to dynamically translate metadata depending on the locale. We actually built a pretty comprehensive engine for doing this. It can support any language, really, so Lithuanian would be possible if you're interested in implementing it. Here we have a calculation for days with temperature below 0 Celsius. I've also added a check for missing data, so any years that are missing more than 5% of values are dropped to strengthen the statistics.
Looking at the metadata of the object, we can see in the history how we called this operation: the missing data threshold, the call signature and operation, the version, etc. Below that we also have the dynamically generated metadata for the Indicator, complete with the customized thresholds, in both French and English.
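For these notes, here's a rough sketch of how those options can be set together. It assumes xclim's `set_options` accepts these option names (they may vary between versions), and the input file name is a placeholder:

```python
import xarray as xr
import xclim

ds = xr.open_dataset("tasmin_daily.nc")  # illustrative file name

# Ask for French metadata alongside English, and drop any year missing
# more than 5 % of its daily values.
with xclim.set_options(
    metadata_locales=["fr"],
    check_missing="pct",
    missing_options={"pct": {"tolerance": 0.05}},
):
    frost_days = xclim.atmos.frost_days(ds.tasmin, freq="YS")

# Translated attributes are added with a locale suffix, e.g. long_name_fr.
```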
Here's an example of what we can calculate with xclim. On the left is the average annual temperature for Quebec over 30-year periods, calculated from 14 different climate model projections, while the right shows the change in average temperature relative to 1990-2020 across those 14 models until the end of the century. I just want to reassure you that, for dramatic purposes, these values are showing the more extreme climate change scenario. There's still time to get our act together on the climate!
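Figures like these rely on xclim's ensemble utilities. A rough sketch of that workflow, with placeholder file paths and percentile choices, could look like this:

```python
from xclim import ensembles

# Stack several model runs along a new "realization" dimension
# (file paths are illustrative), then compute across-model statistics.
ens = ensembles.create_ensemble(["model_01.nc", "model_02.nc", "model_03.nc"])
stats = ensembles.ensemble_mean_std_max_min(ens)
pctls = ensembles.ensemble_percentiles(ens, values=[10, 50, 90])
```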
All models are inherently wrong, and climate models are no exception, so adjusting them so that their value distributions match what we should expect is critical if we want to use them for real-world scenarios. This module is very complex, and I'm not a die-hard stats person, so I just wanted to briefly show what that looks like on the right here.
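For the curious, here's a minimal sketch of the general train/adjust pattern in the bias-adjustment module, using tiny synthetic series; the data, the bias offsets, and the parameter choices are all illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr
from xclim import sdba

# Tiny synthetic series: "ref" stands in for observations,
# "hist" and "sim" for model output over the past and future.
time = pd.date_range("1981-01-01", periods=30 * 365, freq="D")
ref = xr.DataArray(
    20 + 5 * np.random.randn(time.size),
    dims="time", coords={"time": time}, attrs={"units": "degC"},
)
hist = (ref + 2).assign_attrs(units="degC")  # a model with a +2 degree bias
sim = (ref + 3).assign_attrs(units="degC")   # a warmer "future" from that model

# Train a quantile mapping on the historical overlap, then apply it to
# the simulation -- the general train/adjust pattern of sdba.
eqm = sdba.EmpiricalQuantileMapping.train(ref, hist, nquantiles=50, group="time.month")
scen = eqm.adjust(sim, interp="linear")
```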
Building this tool has also involved a lot of upstream contributions, addressing bugs or adding features to better work with climate data. Many of the changes center around calendar systems, standard units, and statistics. I'll also mention that my team regularly contributes to the maintenance of a few libraries in the domain.
This is all great if you have the data, resources, and technical training to run your analyses, but what if...
For those with the Python knowledge, you can run your scripts on a web platform connected directly to the data. This figure shows some of the ways we've made our tools, data, or information more accessible to all kinds of users. Technical users can use the tools directly, or run them on our Jupyter-based research platform, which we call PAVICS; people who just want data outputs can use the programmable dashboards we've made; or they can simply grab data values from a map or pre-computed indicators.
Another platform out there is the Microsoft Planetary Computer, which hosts a bunch of climate data and has had `xclim` available for a few years now, with examples of how to use it. But accessibility is a constant concern for climate services, and there are still other ways of broadening access.
Another approach to solving the problem is to turn it into a service that can be deployed on a server and fetched via a web-based standard. By show of hands, who knows the following web services...? ... The last one, Web Processing Service is what we decided to implement so that more user types could serve themselves.
So we decided to move forward with building a web service in partnership with a project based in Germany called Birdhouse, the idea being to run analyses using web standards, built in Python. ...Birdhouse likes to name their projects after birds, so to be clever, we landed on "Finch", a reference to the environmental adaptations Charles Darwin described in the finches of the Galápagos Islands.
Bird-house offers a library called Birdy to help with interacting with WPS processes. This service could be running locally or remotely, and we have the option of chaining processes or running some processes in parallel if the service is configured that way. Once we make a connection, calling the indicator calculation is almost exactly the same as in xclim. Once it's done, we can either download the data or stream it to our local computer.
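A rough sketch of what that interaction can look like with Birdy is below; the service URL is a placeholder, and the exact process names and parameters depend on how the Finch instance is configured:

```python
from birdy import WPSClient

# Connect to a running Finch instance; the URL is a placeholder.
wps = WPSClient("https://example.org/finch/wps")

# Process names mirror the xclim indicators, so the call feels familiar.
# Inputs can be links to remote data or local files, depending on the setup.
resp = wps.growing_degree_days(tas="tas_daily.nc", thresh="5.0 degC", freq="YS")

# Fetch the result, either as links to download or opened directly as
# Python objects (e.g. an xarray dataset).
out = resp.get(asobj=True)
```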
Here we asked for estimates of growing degree days at certain point locations in Canada, and that's what we get out from the remote analysis. You can imagine that once this is set up, we can go further and build a web UI to make it even more accessible.
And that's what developers on the ClimateData.ca project did! This is the ClimateData.ca website that xclim and finch support in the backend, and it's been a huge success all around. We're constantly making improvements, updating the datasets, etc. If you're interested in seeing what climate change may look like in Canada, please give it some traffic!
Today, having adopted Python and open source, our experience is very different, and everything is much more maintainable and reasonable. Being able to collaborate with many developers around the world has made for better tools and a better software ecosystem in climate generally.
Thanks for coming and listening to me run through climate services! You can find our software here, and the link to these slides is at the bottom. I'm going to clean them up and finalize them very soon after PyCon Lithuania ends. Ačiū!