Summary statistics
ScmRun objects have methods specific to calculating summary statistics. In this notebook we demonstrate them.
At present, the following methods are available:
process_over
quantiles_over
groupby
groupby_all_except
import numpy as np
import pandas as pd
from scmdata.run import ScmRun, run_append
Helper bits and pieces
def new_timeseries(
    n=101,
    count=1,
    model="example",
    scenario="ssp119",
    variable="Surface Temperature",
    unit="K",
    region="World",
    cls=ScmRun,
    **kwargs,
):
    data = np.random.rand(n, count) * np.arange(n)[:, np.newaxis]
    index = 2000 + np.arange(n)
    return cls(
        data,
        columns={
            "model": model,
            "scenario": scenario,
            "variable": variable,
            "region": region,
            "unit": unit,
            **kwargs,
        },
        index=index,
    )
Let’s create an ScmRun which contains a few variables and a number of runs. Such an object could be used to store the results from an ensemble of simple climate model runs.
runs = run_append(
    [
        new_timeseries(
            count=3,
            variable=[
                "Surface Temperature",
                "Atmospheric Concentrations|CO2",
                "Radiative Forcing",
            ],
            unit=["K", "ppm", "W/m^2"],
            run_id=run_id,
        )
        for run_id in range(10)
    ]
)
runs.metadata["source"] = "fake data"
runs
<ScmRun (timeseries: 30, timepoints: 101)>
Time:
Start: 2000-01-01T00:00:00
End: 2100-01-01T00:00:00
Meta:
model region run_id scenario unit variable
0 example World 0 ssp119 K Surface Temperature
1 example World 0 ssp119 ppm Atmospheric Concentrations|CO2
2 example World 0 ssp119 W/m^2 Radiative Forcing
3 example World 1 ssp119 K Surface Temperature
4 example World 1 ssp119 ppm Atmospheric Concentrations|CO2
5 example World 1 ssp119 W/m^2 Radiative Forcing
6 example World 2 ssp119 K Surface Temperature
7 example World 2 ssp119 ppm Atmospheric Concentrations|CO2
8 example World 2 ssp119 W/m^2 Radiative Forcing
9 example World 3 ssp119 K Surface Temperature
10 example World 3 ssp119 ppm Atmospheric Concentrations|CO2
11 example World 3 ssp119 W/m^2 Radiative Forcing
12 example World 4 ssp119 K Surface Temperature
13 example World 4 ssp119 ppm Atmospheric Concentrations|CO2
14 example World 4 ssp119 W/m^2 Radiative Forcing
15 example World 5 ssp119 K Surface Temperature
16 example World 5 ssp119 ppm Atmospheric Concentrations|CO2
17 example World 5 ssp119 W/m^2 Radiative Forcing
18 example World 6 ssp119 K Surface Temperature
19 example World 6 ssp119 ppm Atmospheric Concentrations|CO2
20 example World 6 ssp119 W/m^2 Radiative Forcing
21 example World 7 ssp119 K Surface Temperature
22 example World 7 ssp119 ppm Atmospheric Concentrations|CO2
23 example World 7 ssp119 W/m^2 Radiative Forcing
24 example World 8 ssp119 K Surface Temperature
25 example World 8 ssp119 ppm Atmospheric Concentrations|CO2
26 example World 8 ssp119 W/m^2 Radiative Forcing
27 example World 9 ssp119 K Surface Temperature
28 example World 9 ssp119 ppm Atmospheric Concentrations|CO2
29 example World 9 ssp119 W/m^2 Radiative Forcing
process_over
The process_over method allows us to calculate a specific set of statistics on groups of timeseries. A number of pandas functions can be called, including “sum”, “mean” and “describe”.
print(runs.process_over.__doc__)
Process the data over the input columns.
Parameters
----------
cols
Columns to perform the operation on. The timeseries will be grouped by all
other columns in :attr:`meta`.
operation : str or func
The operation to perform.
If a string is provided, the equivalent pandas groupby function is used. Note
that not all groupby functions are available as some do not make sense for
this particular application. Additional information about the arguments for
the pandas groupby functions can be found at
<https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html>`_.
If a function is provided, it will be applied to each group. The function must
take a dataframe as its first argument and return a DataFrame, Series or scalar.
Note that quantile means the value of the data at a given point in the cumulative
distribution of values at each point in the timeseries, for each timeseries
once the groupby is applied. As a result, using ``q=0.5`` is the same as
taking the median and not the same as taking the mean/average.
na_override: [int, float]
Convert any nan value in the timeseries meta to this value during processsing.
The meta values converted back to nan's before the run is returned. This
should not need to be changed unless the existing metadata clashes with the
default na_override value.
This functionality is disabled if na_override is None, but may result in incorrect
results if the timeseries meta includes any nan's.
op_cols: dict of str: str
Dictionary containing any columns that should be overridden after processing.
If a required column from :class:`scmdata.ScmRun` is specified in ``cols`` and
``as_run=True``, an override must be provided for that column in ``op_cols``
otherwise the conversion to :class:`scmdata.ScmRun` will fail.
as_run: bool or subclass of BaseScmRun
If True, return the resulting timeseries as an :class:`scmdata.ScmRun` object,
otherwise if False, a :class:`pandas.DataFrame`or :class:`pandas.Series` is
returned (depending on the nature of the operation). Some operations may not be
able to be converted to a :class:`scmdata.ScmRun`. For example if the operation
returns scalar values rather than timeseries.
If a class is provided, the return value will be cast to this class.
**kwargs
Keyword arguments to pass ``operation`` (or the pandas operation if ``operation``
is a string)
Returns
-------
:class:`pandas.DataFrame` or :class:`pandas.Series` or :class:`scmdata.ScmRun`
The result of ``operation``, grouped by all columns in :attr:`meta`
other than :obj:`cols`
Raises
------
ValueError
If the operation is not an allowed operation
If the value of na_override clashes with any existing metadata
If ``operation`` produces a :class:`pandas.Series`, but `as_run`` is True
If ``as_run`` is not True, False or a subclass of :class:`scmdata.run.BaseScmRun`
:class:`scmdata.errors.MissingRequiredColumnError`
If `as_run` is not False and the result does not have the required metadata
to convert to an :class`ScmRun <scmdata.ScmRun>`.
This can be resolved by specifying additional metadata via ``op_cols``
Mean
mean = runs.process_over(cols="run_id", operation="mean")
mean
model | region | scenario | unit | variable | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | 0.0 | 0.596345 | 1.128918 | 1.514027 | 2.495971 | 1.973448 | 3.781227 | 4.163771 | 3.614195 | 5.647494 | ... | 47.382414 | 51.984829 | 37.887534 | 55.764676 | 55.719420 | 52.470270 | 40.295844 | 41.257616 | 41.233322 | 57.062650 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.0 | 0.352113 | 1.151058 | 1.654809 | 2.465450 | 1.969925 | 3.181354 | 3.500164 | 3.919097 | 4.248761 | ... | 27.212770 | 50.252181 | 33.433778 | 55.230069 | 64.547888 | 60.127783 | 49.044180 | 59.789565 | 53.391649 | 33.517276 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.0 | 0.465642 | 0.852352 | 1.541577 | 2.084831 | 2.775635 | 3.229357 | 2.617146 | 3.008985 | 4.208213 | ... | 31.908838 | 43.124752 | 50.617634 | 61.602944 | 42.963605 | 36.562227 | 49.499952 | 42.543541 | 44.439230 | 53.544782 |
3 rows × 101 columns
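As a rough illustration of what processing over run_id amounts to, the following pandas-only sketch averages over run_id by grouping on every other index level. The toy data and the names `wide` and `group_cols` are assumptions made for this illustration; this mirrors the behaviour described above and is not scmdata's implementation.

```python
import numpy as np
import pandas as pd

# Toy ensemble: 2 variables x 3 runs, 4 time points (illustrative data only)
rng = np.random.default_rng(0)
rows = [
    {"variable": var, "unit": unit, "run_id": run_id}
    for var, unit in [("Surface Temperature", "K"), ("Radiative Forcing", "W/m^2")]
    for run_id in range(3)
]
index = pd.MultiIndex.from_frame(pd.DataFrame(rows))
wide = pd.DataFrame(
    rng.random((len(rows), 4)), index=index, columns=[2000, 2001, 2002, 2003]
)

# Group by every index level except the one we are processing over
group_cols = [name for name in wide.index.names if name != "run_id"]
ensemble_mean = wide.groupby(level=group_cols).mean()
print(ensemble_mean)
```

The result has one row per remaining combination of metadata (here variable and unit), which is the same shape of output that the mean table above shows for the real data.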
Median
median = runs.process_over(cols="run_id", operation="median")
median
model | region | scenario | unit | variable | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | 0.0 | 0.671150 | 1.118483 | 1.422507 | 2.688332 | 1.686841 | 4.078258 | 4.608650 | 2.776327 | 6.475209 | ... | 50.179144 | 54.696913 | 39.183757 | 57.732697 | 62.288314 | 55.309794 | 37.126860 | 30.490080 | 36.499659 | 54.377530 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.0 | 0.359544 | 1.295210 | 1.780380 | 2.437602 | 1.746289 | 3.149115 | 3.529822 | 4.993532 | 3.902942 | ... | 25.372975 | 49.245870 | 34.143264 | 53.098487 | 66.840681 | 62.592220 | 50.400562 | 53.436196 | 61.604735 | 30.662371 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.0 | 0.433650 | 0.857406 | 1.582820 | 2.278573 | 2.718223 | 3.519086 | 2.990340 | 2.446087 | 3.159592 | ... | 34.572217 | 35.874652 | 43.326254 | 64.652163 | 40.317418 | 32.605075 | 56.686231 | 41.038774 | 42.504582 | 50.797859 |
3 rows × 101 columns
Arbitrary functions
You can also apply an arbitrary function to each group.
def mean_and_invert(df, axis=0):
    # Take a mean across the group and then invert the result
    return -df.mean(axis=axis)
runs.process_over("run_id", operation=mean_and_invert)
model | region | scenario | unit | variable | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | -0.0 | -0.596345 | -1.128918 | -1.514027 | -2.495971 | -1.973448 | -3.781227 | -4.163771 | -3.614195 | -5.647494 | ... | -47.382414 | -51.984829 | -37.887534 | -55.764676 | -55.719420 | -52.470270 | -40.295844 | -41.257616 | -41.233322 | -57.062650 |
example | World | ssp119 | W/m^2 | Radiative Forcing | -0.0 | -0.352113 | -1.151058 | -1.654809 | -2.465450 | -1.969925 | -3.181354 | -3.500164 | -3.919097 | -4.248761 | ... | -27.212770 | -50.252181 | -33.433778 | -55.230069 | -64.547888 | -60.127783 | -49.044180 | -59.789565 | -53.391649 | -33.517276 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | -0.0 | -0.465642 | -0.852352 | -1.541577 | -2.084831 | -2.775635 | -3.229357 | -2.617146 | -3.008985 | -4.208213 | ... | -31.908838 | -43.124752 | -50.617634 | -61.602944 | -42.963605 | -36.562227 | -49.499952 | -42.543541 | -44.439230 | -53.544782 |
3 rows × 101 columns
runs.process_over("run_id", operation=mean_and_invert, axis=1)
model region run_id scenario unit variable
example World 0 ssp119 K Surface Temperature -23.399211
1 ssp119 K Surface Temperature -28.058534
2 ssp119 K Surface Temperature -26.310644
3 ssp119 K Surface Temperature -21.819595
4 ssp119 K Surface Temperature -27.532120
5 ssp119 K Surface Temperature -24.613193
6 ssp119 K Surface Temperature -25.478714
7 ssp119 K Surface Temperature -23.475447
8 ssp119 K Surface Temperature -23.304137
9 ssp119 K Surface Temperature -26.109388
0 ssp119 W/m^2 Radiative Forcing -25.546476
1 ssp119 W/m^2 Radiative Forcing -23.974383
2 ssp119 W/m^2 Radiative Forcing -24.857438
3 ssp119 W/m^2 Radiative Forcing -25.299717
4 ssp119 W/m^2 Radiative Forcing -26.869726
5 ssp119 W/m^2 Radiative Forcing -25.939956
6 ssp119 W/m^2 Radiative Forcing -25.824626
7 ssp119 W/m^2 Radiative Forcing -24.491061
8 ssp119 W/m^2 Radiative Forcing -25.906752
9 ssp119 W/m^2 Radiative Forcing -24.877365
0 ssp119 ppm Atmospheric Concentrations|CO2 -24.211620
1 ssp119 ppm Atmospheric Concentrations|CO2 -19.433876
2 ssp119 ppm Atmospheric Concentrations|CO2 -24.452935
3 ssp119 ppm Atmospheric Concentrations|CO2 -22.292322
4 ssp119 ppm Atmospheric Concentrations|CO2 -26.544962
5 ssp119 ppm Atmospheric Concentrations|CO2 -23.423582
6 ssp119 ppm Atmospheric Concentrations|CO2 -23.834897
7 ssp119 ppm Atmospheric Concentrations|CO2 -26.199737
8 ssp119 ppm Atmospheric Concentrations|CO2 -27.005808
9 ssp119 ppm Atmospheric Concentrations|CO2 -24.482185
dtype: float64
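The effect of the extra `axis` keyword is easier to see on a single small group. In the sketch below, `group` is a hypothetical stand-in for the DataFrame that one group might hand to the function (rows are runs, columns are time points); the values are made up for illustration.

```python
import pandas as pd


def mean_and_invert(df, axis=0):
    # Take a mean along the given axis and then invert the sign
    return -df.mean(axis=axis)


# Hypothetical single group: 3 runs (rows) x 2 time points (columns)
group = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=pd.Index([0, 1, 2], name="run_id"),
    columns=[2000, 2001],
)

over_runs = mean_and_invert(group)  # one value per time point
over_time = mean_and_invert(group, axis=1)  # one value per run
print(over_runs.tolist())  # [-3.0, -4.0]
print(over_time.tolist())  # [-1.5, -3.5, -5.5]
```

With `axis=0` the runs are collapsed and a timeseries remains; with `axis=1` time is collapsed and one scalar per run remains, which is why the second call above returns a Series that still carries run_id.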
Other quantiles
lower_likely_quantile = runs.process_over(cols="run_id", operation="quantile", q=0.17)
lower_likely_quantile
model | region | scenario | unit | variable | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | 0.0 | 0.277925 | 0.296714 | 0.583327 | 1.280206 | 0.704610 | 3.268624 | 2.307766 | 1.885095 | 2.369202 | ... | 20.546691 | 36.770388 | 19.251176 | 36.692466 | 19.611863 | 29.188249 | 19.126885 | 20.383975 | 11.456099 | 35.691795 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.0 | 0.145196 | 0.588132 | 0.504262 | 1.470827 | 0.993488 | 1.425550 | 2.427890 | 0.480049 | 2.223684 | ... | 10.840713 | 36.531543 | 2.360360 | 27.740388 | 51.581516 | 46.929276 | 13.762326 | 37.527886 | 19.760797 | 6.033400 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.0 | 0.065628 | 0.232688 | 0.327447 | 0.451701 | 1.516487 | 2.219258 | 0.855934 | 1.578310 | 1.989964 | ... | 4.559917 | 16.317969 | 23.904091 | 39.425163 | 14.148972 | 10.722061 | 28.922266 | 20.008239 | 18.867379 | 21.921475 |
3 rows × 101 columns
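The docstring's remark that a quantile is taken across the ensemble at each time point, and that `q=0.5` is the median rather than the mean, can be checked with a small NumPy sketch. The array `ensemble` is made-up data used only for illustration.

```python
import numpy as np

# Toy ensemble: 10 runs (rows) x 5 time points (columns); illustrative data only
rng = np.random.default_rng(1)
ensemble = rng.random((10, 5)) ** 3  # skewed, so mean and median differ

# Quantiles are taken across runs, independently at each time point
q17 = np.quantile(ensemble, 0.17, axis=0)
q50 = np.quantile(ensemble, 0.5, axis=0)

# q=0.5 coincides with the median at every time point
print(np.allclose(q50, np.median(ensemble, axis=0)))
# Compare with the mean, which generally differs for skewed data
print(q50 - ensemble.mean(axis=0))
```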
quantiles_over
If you want to calculate more than one summary statistic, quantiles_over will calculate and label multiple summary statistics before returning them.
print(runs.quantiles_over.__doc__)
Calculate quantiles of the data over the input columns.
Parameters
----------
cols
Columns to perform the operation on. The timeseries will be grouped by all
other columns in :attr:`meta`.
quantiles
The quantiles to calculate. This should be a list of quantiles to calculate
(quantile values between 0 and 1). ``quantiles`` can also include the strings
"median" or "mean" if these values are to be calculated.
**kwargs
Passed to :meth:`~ScmRun.process_over`.
Returns
-------
:class:`pandas.DataFrame`
The quantiles of the timeseries, grouped by all columns in :attr:`meta`
other than :obj:`cols`. Each calculated quantile is given a label which is
stored in the ``quantile`` column within the output index.
Raises
------
TypeError
``operation`` is included in ``kwargs``. The operation is inferred from ``quantiles``.
summary_stats = runs.quantiles_over(
    cols="run_id", quantiles=[0.05, 0.17, 0.5, 0.83, 0.95, "mean", "median"]
)
summary_stats
model | region | scenario | unit | variable | quantile | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | 0.05 | 0.0 | 0.128703 | 0.250772 | 0.362986 | 1.021021 | 0.400146 | 1.674319 | 1.390815 | 1.504516 | 0.548557 | ... | 12.390077 | 26.639664 | 11.700526 | 19.607688 | 5.336946 | 19.897363 | 15.685909 | 13.514556 | 5.701723 | 31.650094 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.05 | 0.0 | 0.069799 | 0.203154 | 0.376197 | 0.749395 | 0.388147 | 0.391588 | 2.403882 | 0.290108 | 1.081456 | ... | 4.762287 | 22.284656 | 0.726211 | 16.677124 | 37.907835 | 29.133397 | 9.598161 | 33.422452 | 7.558986 | 2.394234 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.05 | 0.0 | 0.022913 | 0.129041 | 0.253741 | 0.397409 | 0.967627 | 1.116097 | 0.218300 | 0.791392 | 1.544788 | ... | 1.944475 | 4.758727 | 20.819875 | 15.518729 | 7.024698 | 4.129051 | 17.560585 | 12.477884 | 14.910313 | 4.244109 |
example | World | ssp119 | K | Surface Temperature | 0.17 | 0.0 | 0.277925 | 0.296714 | 0.583327 | 1.280206 | 0.704610 | 3.268624 | 2.307766 | 1.885095 | 2.369202 | ... | 20.546691 | 36.770388 | 19.251176 | 36.692466 | 19.611863 | 29.188249 | 19.126885 | 20.383975 | 11.456099 | 35.691795 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.17 | 0.0 | 0.145196 | 0.588132 | 0.504262 | 1.470827 | 0.993488 | 1.425550 | 2.427890 | 0.480049 | 2.223684 | ... | 10.840713 | 36.531543 | 2.360360 | 27.740388 | 51.581516 | 46.929276 | 13.762326 | 37.527886 | 19.760797 | 6.033400 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.17 | 0.0 | 0.065628 | 0.232688 | 0.327447 | 0.451701 | 1.516487 | 2.219258 | 0.855934 | 1.578310 | 1.989964 | ... | 4.559917 | 16.317969 | 23.904091 | 39.425163 | 14.148972 | 10.722061 | 28.922266 | 20.008239 | 18.867379 | 21.921475 |
example | World | ssp119 | K | Surface Temperature | 0.5 | 0.0 | 0.671150 | 1.118483 | 1.422507 | 2.688332 | 1.686841 | 4.078258 | 4.608650 | 2.776327 | 6.475209 | ... | 50.179144 | 54.696913 | 39.183757 | 57.732697 | 62.288314 | 55.309794 | 37.126860 | 30.490080 | 36.499659 | 54.377530 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.5 | 0.0 | 0.359544 | 1.295210 | 1.780380 | 2.437602 | 1.746289 | 3.149115 | 3.529822 | 4.993532 | 3.902942 | ... | 25.372975 | 49.245870 | 34.143264 | 53.098487 | 66.840681 | 62.592220 | 50.400562 | 53.436196 | 61.604735 | 30.662371 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.5 | 0.0 | 0.433650 | 0.857406 | 1.582820 | 2.278573 | 2.718223 | 3.519086 | 2.990340 | 2.446087 | 3.159592 | ... | 34.572217 | 35.874652 | 43.326254 | 64.652163 | 40.317418 | 32.605075 | 56.686231 | 41.038774 | 42.504582 | 50.797859 |
example | World | ssp119 | K | Surface Temperature | 0.83 | 0.0 | 0.908219 | 1.880651 | 2.452630 | 3.637426 | 3.482034 | 4.489497 | 5.688110 | 5.552448 | 8.605425 | ... | 73.847889 | 62.151777 | 46.561900 | 77.070555 | 86.858658 | 72.080184 | 55.662564 | 72.838865 | 75.982725 | 76.436065 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.83 | 0.0 | 0.562836 | 1.695227 | 2.753568 | 3.799103 | 3.091657 | 4.859401 | 4.415586 | 6.973682 | 6.482025 | ... | 40.641648 | 64.372672 | 61.959632 | 84.024716 | 80.695721 | 72.047379 | 84.062565 | 91.813414 | 78.850076 | 59.895748 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.83 | 0.0 | 0.797124 | 1.275196 | 2.705109 | 3.299206 | 4.064952 | 4.126332 | 3.943280 | 5.010057 | 7.041511 | ... | 55.141889 | 71.809934 | 85.769916 | 84.020664 | 70.922486 | 64.533392 | 66.302927 | 61.872636 | 63.552881 | 85.189724 |
example | World | ssp119 | K | Surface Temperature | 0.95 | 0.0 | 0.974985 | 1.919968 | 2.588011 | 3.760912 | 4.413736 | 4.804926 | 6.505753 | 6.861928 | 8.818295 | ... | 77.811092 | 70.364449 | 69.122839 | 78.283629 | 90.688538 | 82.790394 | 73.150702 | 82.855424 | 83.714328 | 87.435935 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.95 | 0.0 | 0.611123 | 1.896075 | 2.800272 | 3.984080 | 4.194063 | 5.653744 | 4.734796 | 7.261602 | 7.285331 | ... | 58.998791 | 78.787894 | 74.681198 | 88.517980 | 82.993456 | 83.173390 | 90.093766 | 95.568408 | 89.202464 | 74.523839 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.95 | 0.0 | 0.949709 | 1.628910 | 2.910253 | 3.666489 | 4.590527 | 5.276475 | 4.865429 | 6.359732 | 7.992446 | ... | 64.161437 | 89.081493 | 92.150281 | 91.477217 | 81.001114 | 74.193005 | 69.579542 | 75.717243 | 79.095337 | 95.629739 |
example | World | ssp119 | K | Surface Temperature | mean | 0.0 | 0.596345 | 1.128918 | 1.514027 | 2.495971 | 1.973448 | 3.781227 | 4.163771 | 3.614195 | 5.647494 | ... | 47.382414 | 51.984829 | 37.887534 | 55.764676 | 55.719420 | 52.470270 | 40.295844 | 41.257616 | 41.233322 | 57.062650 |
example | World | ssp119 | W/m^2 | Radiative Forcing | mean | 0.0 | 0.352113 | 1.151058 | 1.654809 | 2.465450 | 1.969925 | 3.181354 | 3.500164 | 3.919097 | 4.248761 | ... | 27.212770 | 50.252181 | 33.433778 | 55.230069 | 64.547888 | 60.127783 | 49.044180 | 59.789565 | 53.391649 | 33.517276 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | mean | 0.0 | 0.465642 | 0.852352 | 1.541577 | 2.084831 | 2.775635 | 3.229357 | 2.617146 | 3.008985 | 4.208213 | ... | 31.908838 | 43.124752 | 50.617634 | 61.602944 | 42.963605 | 36.562227 | 49.499952 | 42.543541 | 44.439230 | 53.544782 |
example | World | ssp119 | K | Surface Temperature | median | 0.0 | 0.671150 | 1.118483 | 1.422507 | 2.688332 | 1.686841 | 4.078258 | 4.608650 | 2.776327 | 6.475209 | ... | 50.179144 | 54.696913 | 39.183757 | 57.732697 | 62.288314 | 55.309794 | 37.126860 | 30.490080 | 36.499659 | 54.377530 |
example | World | ssp119 | W/m^2 | Radiative Forcing | median | 0.0 | 0.359544 | 1.295210 | 1.780380 | 2.437602 | 1.746289 | 3.149115 | 3.529822 | 4.993532 | 3.902942 | ... | 25.372975 | 49.245870 | 34.143264 | 53.098487 | 66.840681 | 62.592220 | 50.400562 | 53.436196 | 61.604735 | 30.662371 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | median | 0.0 | 0.433650 | 0.857406 | 1.582820 | 2.278573 | 2.718223 | 3.519086 | 2.990340 | 2.446087 | 3.159592 | ... | 34.572217 | 35.874652 | 43.326254 | 64.652163 | 40.317418 | 32.605075 | 56.686231 | 41.038774 | 42.504582 | 50.797859 |
21 rows × 101 columns
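The idea of computing several statistics and labelling each one in a `quantile` index level can be sketched in plain pandas as follows. The helper `summarise` and the toy data are hypothetical, introduced only for this illustration; labels are stored as strings here for simplicity, which is a choice of this sketch and not necessarily how scmdata stores them.

```python
import numpy as np
import pandas as pd

# Toy ensemble for one variable: 10 runs x 3 time points (illustrative data only)
rng = np.random.default_rng(2)
wide = pd.DataFrame(
    rng.random((10, 3)),
    index=pd.MultiIndex.from_product(
        [["Surface Temperature"], range(10)], names=["variable", "run_id"]
    ),
    columns=[2000, 2001, 2002],
)


def summarise(df, stats):
    """Compute each requested statistic over run_id and label it."""
    grouped = df.groupby(level="variable")
    out = {}
    for stat in stats:
        if stat == "mean":
            out[str(stat)] = grouped.mean()
        elif stat == "median":
            out[str(stat)] = grouped.median()
        else:
            out[str(stat)] = grouped.quantile(stat)
    # Stack the results, recording which statistic each row holds
    res = pd.concat(out, names=["quantile"])
    return res.reorder_levels(["variable", "quantile"])


summary = summarise(wide, [0.05, 0.5, 0.95, "mean", "median"])
print(summary)
```

Each requested statistic contributes one row per group, so five statistics over one variable give five rows, in the same way that seven statistics over three variables give the 21 rows above.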
Plotting
Calculate quantiles within plotting function
We can use plumeplot directly to plot quantiles. This calculates the quantiles as part of making the plot, so if you make many such plots it may be faster to pre-calculate the quantiles and then make the plot instead (see below).
Note that in this case the default settings in plumeplot don’t produce anything that helpful; we show how to modify them in the cell below.
runs.plumeplot(quantile_over="run_id")
(<Axes: >,
[<matplotlib.patches.Patch at 0x7fcb54d33130>,
<matplotlib.collections.PolyCollection at 0x7fcb54d33940>,
<matplotlib.lines.Line2D at 0x7fcb54d11a90>,
<matplotlib.patches.Patch at 0x7fcb54cf0d30>,
<matplotlib.lines.Line2D at 0x7fcb54cf0ee0>,
<matplotlib.patches.Patch at 0x7fcb54cf0fa0>,
<matplotlib.lines.Line2D at 0x7fcb54d0dfa0>,
<matplotlib.lines.Line2D at 0x7fcb54d0dfd0>,
<matplotlib.lines.Line2D at 0x7fcb54d0dd90>])
runs.plumeplot(
    quantile_over="run_id",
    quantiles_plumes=[
        ((0.05, 0.95), 0.2),
        ((0.17, 0.83), 0.5),
        (("median",), 1.0),
    ],
    hue_var="variable",
    hue_label="Variable",
    style_var="scenario",
    style_label="Scenario",
)
(<Axes: >,
[<matplotlib.patches.Patch at 0x7fcb52425160>,
<matplotlib.collections.PolyCollection at 0x7fcb52425e80>,
<matplotlib.collections.PolyCollection at 0x7fcb50bd5a60>,
<matplotlib.lines.Line2D at 0x7fcb50b73580>,
<matplotlib.patches.Patch at 0x7fcb50be7c10>,
<matplotlib.lines.Line2D at 0x7fcb5241ecd0>,
<matplotlib.lines.Line2D at 0x7fcb5241ed00>,
<matplotlib.lines.Line2D at 0x7fcb5241ed30>,
<matplotlib.patches.Patch at 0x7fcb50be7b50>,
<matplotlib.lines.Line2D at 0x7fcb50be7a60>])
Pre-calculated quantiles
Alternatively, we can cast the output of quantiles_over to an ScmRun object for ease of filtering and plotting.
summary_stats_scmrun = ScmRun(summary_stats)
summary_stats_scmrun
<ScmRun (timeseries: 21, timepoints: 101)>
Time:
Start: 2000-01-01T00:00:00
End: 2100-01-01T00:00:00
Meta:
model quantile region scenario unit variable
0 example 0.05 World ssp119 K Surface Temperature
1 example 0.05 World ssp119 W/m^2 Radiative Forcing
2 example 0.05 World ssp119 ppm Atmospheric Concentrations|CO2
3 example 0.17 World ssp119 K Surface Temperature
4 example 0.17 World ssp119 W/m^2 Radiative Forcing
5 example 0.17 World ssp119 ppm Atmospheric Concentrations|CO2
6 example 0.5 World ssp119 K Surface Temperature
7 example 0.5 World ssp119 W/m^2 Radiative Forcing
8 example 0.5 World ssp119 ppm Atmospheric Concentrations|CO2
9 example 0.83 World ssp119 K Surface Temperature
10 example 0.83 World ssp119 W/m^2 Radiative Forcing
11 example 0.83 World ssp119 ppm Atmospheric Concentrations|CO2
12 example 0.95 World ssp119 K Surface Temperature
13 example 0.95 World ssp119 W/m^2 Radiative Forcing
14 example 0.95 World ssp119 ppm Atmospheric Concentrations|CO2
15 example mean World ssp119 K Surface Temperature
16 example mean World ssp119 W/m^2 Radiative Forcing
17 example mean World ssp119 ppm Atmospheric Concentrations|CO2
18 example median World ssp119 K Surface Temperature
19 example median World ssp119 W/m^2 Radiative Forcing
20 example median World ssp119 ppm Atmospheric Concentrations|CO2
As discussed above, casting the output of quantiles_over to an ScmRun object helps avoid repeatedly calculating the quantiles.
summary_stats_scmrun.plumeplot(
    quantiles_plumes=[
        ((0.05, 0.95), 0.2),
        ((0.17, 0.83), 0.5),
        (("median",), 1.0),
    ],
    hue_var="variable",
    hue_label="Variable",
    style_var="scenario",
    style_label="Scenario",
    pre_calculated=True,
)
(<Axes: >,
[<matplotlib.patches.Patch at 0x7fcb50b67e20>,
<matplotlib.collections.PolyCollection at 0x7fcb50aed8b0>,
<matplotlib.collections.PolyCollection at 0x7fcb50b265e0>,
<matplotlib.lines.Line2D at 0x7fcb50ab0e20>,
<matplotlib.patches.Patch at 0x7fcb50b175b0>,
<matplotlib.lines.Line2D at 0x7fcb54d045e0>,
<matplotlib.lines.Line2D at 0x7fcb54d04b50>,
<matplotlib.lines.Line2D at 0x7fcb50be7370>,
<matplotlib.patches.Patch at 0x7fcb50b17880>,
<matplotlib.lines.Line2D at 0x7fcb50b17760>])
If we don’t want a plume plot, we can always use the standard lineplot method.
summary_stats_scmrun.filter(variable="Radiative Forcing").lineplot(hue="quantile")
<Axes: xlabel='time', ylabel='W/m^2'>
groupby
The groupby method allows us to group the data by columns in scmrun.meta and then perform operations. An example is given below.
variable_means = []
for vdf in runs.groupby("variable"):
    vdf_mean = vdf.timeseries().mean(axis=0)
    vdf_mean.name = vdf.get_unique_meta("variable", True)
    variable_means.append(vdf_mean)
pd.DataFrame(variable_means)
time | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Atmospheric Concentrations|CO2 | 0.0 | 0.465642 | 0.852352 | 1.541577 | 2.084831 | 2.775635 | 3.229357 | 2.617146 | 3.008985 | 4.208213 | ... | 31.908838 | 43.124752 | 50.617634 | 61.602944 | 42.963605 | 36.562227 | 49.499952 | 42.543541 | 44.439230 | 53.544782 |
Radiative Forcing | 0.0 | 0.352113 | 1.151058 | 1.654809 | 2.465450 | 1.969925 | 3.181354 | 3.500164 | 3.919097 | 4.248761 | ... | 27.212770 | 50.252181 | 33.433778 | 55.230069 | 64.547888 | 60.127783 | 49.044180 | 59.789565 | 53.391649 | 33.517276 |
Surface Temperature | 0.0 | 0.596345 | 1.128918 | 1.514027 | 2.495971 | 1.973448 | 3.781227 | 4.163771 | 3.614195 | 5.647494 | ... | 47.382414 | 51.984829 | 37.887534 | 55.764676 | 55.719420 | 52.470270 | 40.295844 | 41.257616 | 41.233322 | 57.062650 |
3 rows × 101 columns
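The loop-and-collect pattern used above has a direct analogue in plain pandas, which may help when reasoning about what each group contains. This sketch uses made-up data and pandas' own `groupby` (which yields `(name, group)` pairs, unlike the ScmRun method above); it is an analogy and not scmdata code.

```python
import numpy as np
import pandas as pd

# Toy data: 2 variables x 4 runs, 3 time points (illustrative data only)
rng = np.random.default_rng(3)
index = pd.MultiIndex.from_product(
    [["Surface Temperature", "Radiative Forcing"], range(4)],
    names=["variable", "run_id"],
)
wide = pd.DataFrame(rng.random((8, 3)), index=index, columns=[2000, 2001, 2002])

# Loop form: one group per variable, mean over the rows of each group
looped = []
for name, group in wide.groupby(level="variable"):
    group_mean = group.mean(axis=0)
    group_mean.name = name
    looped.append(group_mean)
looped = pd.DataFrame(looped)

# One-shot form: let groupby apply the mean to every group
one_shot = wide.groupby(level="variable").mean()
print(np.allclose(looped.values, one_shot.values))
```

The loop form is more flexible when each group needs custom handling, while the one-shot form is shorter when a single aggregation is all that is required.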
groupby_all_except
The groupby_all_except method allows us to group the data by all columns in scmrun.meta except for a certain set. As with groupby, we can then use the groups to perform operations. An example is given below. Note that, in most cases, using process_over is likely to be more useful.
ensemble_means = []
for edf in runs.groupby_all_except("run_id"):
    edf_mean = edf.timeseries().mean(axis=0)
    edf_mean.name = edf.get_unique_meta("variable", True)
    ensemble_means.append(edf_mean)
pd.DataFrame(ensemble_means)
time | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Surface Temperature | 0.0 | 0.596345 | 1.128918 | 1.514027 | 2.495971 | 1.973448 | 3.781227 | 4.163771 | 3.614195 | 5.647494 | ... | 47.382414 | 51.984829 | 37.887534 | 55.764676 | 55.719420 | 52.470270 | 40.295844 | 41.257616 | 41.233322 | 57.062650 |
Radiative Forcing | 0.0 | 0.352113 | 1.151058 | 1.654809 | 2.465450 | 1.969925 | 3.181354 | 3.500164 | 3.919097 | 4.248761 | ... | 27.212770 | 50.252181 | 33.433778 | 55.230069 | 64.547888 | 60.127783 | 49.044180 | 59.789565 | 53.391649 | 33.517276 |
Atmospheric Concentrations|CO2 | 0.0 | 0.465642 | 0.852352 | 1.541577 | 2.084831 | 2.775635 | 3.229357 | 2.617146 | 3.008985 | 4.208213 | ... | 31.908838 | 43.124752 | 50.617634 | 61.602944 | 42.963605 | 36.562227 | 49.499952 | 42.543541 | 44.439230 | 53.544782 |
3 rows × 101 columns
As noted above, in most cases process_over is likely to be more useful. For example, the above can be done with process_over in one line (and more metadata is retained).
runs.process_over("run_id", "mean")
model | region | scenario | unit | variable | 2000-01-01 | 2001-01-01 | 2002-01-01 | 2003-01-01 | 2004-01-01 | 2005-01-01 | 2006-01-01 | 2007-01-01 | 2008-01-01 | 2009-01-01 | ... | 2091-01-01 | 2092-01-01 | 2093-01-01 | 2094-01-01 | 2095-01-01 | 2096-01-01 | 2097-01-01 | 2098-01-01 | 2099-01-01 | 2100-01-01 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
example | World | ssp119 | K | Surface Temperature | 0.0 | 0.596345 | 1.128918 | 1.514027 | 2.495971 | 1.973448 | 3.781227 | 4.163771 | 3.614195 | 5.647494 | ... | 47.382414 | 51.984829 | 37.887534 | 55.764676 | 55.719420 | 52.470270 | 40.295844 | 41.257616 | 41.233322 | 57.062650 |
example | World | ssp119 | W/m^2 | Radiative Forcing | 0.0 | 0.352113 | 1.151058 | 1.654809 | 2.465450 | 1.969925 | 3.181354 | 3.500164 | 3.919097 | 4.248761 | ... | 27.212770 | 50.252181 | 33.433778 | 55.230069 | 64.547888 | 60.127783 | 49.044180 | 59.789565 | 53.391649 | 33.517276 |
example | World | ssp119 | ppm | Atmospheric Concentrations|CO2 | 0.0 | 0.465642 | 0.852352 | 1.541577 | 2.084831 | 2.775635 | 3.229357 | 2.617146 | 3.008985 | 4.208213 | ... | 31.908838 | 43.124752 | 50.617634 | 61.602944 | 42.963605 | 36.562227 | 49.499952 | 42.543541 | 44.439230 | 53.544782 |
3 rows × 101 columns