Summary statistics

ScmRun objects provide several methods for calculating summary statistics. This notebook demonstrates each of them.

At present, the following methods are available:

  • process_over

  • quantiles_over

  • groupby

  • groupby_all_except

import numpy as np
import pandas as pd

from scmdata.run import ScmRun, run_append

Helper bits and pieces

def new_timeseries(
    n=101,
    count=1,
    model="example",
    scenario="ssp119",
    variable="Surface Temperature",
    unit="K",
    region="World",
    cls=ScmRun,
    **kwargs,
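):
    """Return a ``cls`` instance holding ``count`` random timeseries of length ``n``."""
    # Random values in [0, 1), scaled row-wise so that year 2000 + i lies in [0, i)
    data = np.random.rand(n, count) * np.arange(n)[:, np.newaxis]
    # Annual time axis starting in 2000
    index = 2000 + np.arange(n)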
    return cls(
        data,
        columns={
            "model": model,
            "scenario": scenario,
            "variable": variable,
            "region": region,
            "unit": unit,
            **kwargs,
        },
        index=index,
    )
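In new_timeseries, random values in [0, 1) are multiplied row-wise by np.arange(n), so the value for year 2000 + i always lies in [0, i). This explains why every statistic in the outputs below is exactly 0.0 in 2000 and why the spread grows over time. The quick check below reproduces that broadcasting with plain NumPy; it substitutes a seeded numpy.random.default_rng generator for np.random.rand purely so the result is reproducible.

```python
import numpy as np

n, count = 101, 3
rng = np.random.default_rng(0)
# Same construction as new_timeseries, using a seeded generator instead of np.random.rand
data = rng.random((n, count)) * np.arange(n)[:, np.newaxis]

# Per-row upper bound: row i (year 2000 + i) is scaled by i
upper = np.arange(n)[:, np.newaxis]

print(data.shape)
print(data[0])  # the first year is multiplied by 0, so it is always zero
print(bool(np.all((data >= 0) & (data <= upper))))
```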

Let’s create an ScmRun that contains three variables and ten runs. Data like this would typically store the results from an ensemble of simple climate model runs.

runs = run_append(
    [
        new_timeseries(
            count=3,
            variable=[
                "Surface Temperature",
                "Atmospheric Concentrations|CO2",
                "Radiative Forcing",
            ],
            unit=["K", "ppm", "W/m^2"],
            run_id=run_id,
        )
        for run_id in range(10)
    ]
)
runs.metadata["source"] = "fake data"
runs
<ScmRun (timeseries: 30, timepoints: 101)>
Time:
	Start: 2000-01-01T00:00:00
	End: 2100-01-01T00:00:00
Meta:
	      model region  run_id scenario   unit                        variable
	0   example  World       0   ssp119      K             Surface Temperature
	1   example  World       0   ssp119    ppm  Atmospheric Concentrations|CO2
	2   example  World       0   ssp119  W/m^2               Radiative Forcing
	3   example  World       1   ssp119      K             Surface Temperature
	4   example  World       1   ssp119    ppm  Atmospheric Concentrations|CO2
	5   example  World       1   ssp119  W/m^2               Radiative Forcing
	6   example  World       2   ssp119      K             Surface Temperature
	7   example  World       2   ssp119    ppm  Atmospheric Concentrations|CO2
	8   example  World       2   ssp119  W/m^2               Radiative Forcing
	9   example  World       3   ssp119      K             Surface Temperature
	10  example  World       3   ssp119    ppm  Atmospheric Concentrations|CO2
	11  example  World       3   ssp119  W/m^2               Radiative Forcing
	12  example  World       4   ssp119      K             Surface Temperature
	13  example  World       4   ssp119    ppm  Atmospheric Concentrations|CO2
	14  example  World       4   ssp119  W/m^2               Radiative Forcing
	15  example  World       5   ssp119      K             Surface Temperature
	16  example  World       5   ssp119    ppm  Atmospheric Concentrations|CO2
	17  example  World       5   ssp119  W/m^2               Radiative Forcing
	18  example  World       6   ssp119      K             Surface Temperature
	19  example  World       6   ssp119    ppm  Atmospheric Concentrations|CO2
	20  example  World       6   ssp119  W/m^2               Radiative Forcing
	21  example  World       7   ssp119      K             Surface Temperature
	22  example  World       7   ssp119    ppm  Atmospheric Concentrations|CO2
	23  example  World       7   ssp119  W/m^2               Radiative Forcing
	24  example  World       8   ssp119      K             Surface Temperature
	25  example  World       8   ssp119    ppm  Atmospheric Concentrations|CO2
	26  example  World       8   ssp119  W/m^2               Radiative Forcing
	27  example  World       9   ssp119      K             Surface Temperature
	28  example  World       9   ssp119    ppm  Atmospheric Concentrations|CO2
	29  example  World       9   ssp119  W/m^2               Radiative Forcing

process_over

The process_over method groups the timeseries by every metadata column except those given in cols and then applies an operation to each group. The operation can be the name of a pandas groupby function, such as “sum”, “mean” or “describe”, or an arbitrary function.
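Conceptually, this is a pandas groupby over all metadata except cols, followed by a reduction. The sketch below reproduces that idea on a small wide table whose row index holds the metadata and whose columns are years. It is an illustration under that assumption, not scmdata's actual implementation, and the helper name process_over_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def process_over_sketch(df, cols, operation="mean"):
    """Hypothetical sketch: group by every index level except ``cols``, then reduce."""
    if isinstance(cols, str):
        cols = [cols]
    # Keep every metadata level that is not being processed over
    group_levels = [name for name in df.index.names if name not in cols]
    return df.groupby(level=group_levels).agg(operation)


# Two variables x three runs, four timepoints (columns are years)
index = pd.MultiIndex.from_product(
    [["Surface Temperature", "Radiative Forcing"], [0, 1, 2]],
    names=["variable", "run_id"],
)
data = np.arange(24, dtype=float).reshape(6, 4)
df = pd.DataFrame(data, index=index, columns=[2000, 2001, 2002, 2003])

result = process_over_sketch(df, "run_id", "mean")
print(result)
```

The run_id level disappears from the result because it is the level being averaged over; groupby also sorts the remaining groups alphabetically.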

print(runs.process_over.__doc__)
        Process the data over the input columns.

        Parameters
        ----------
        cols
            Columns to perform the operation on. The timeseries will be grouped by all
            other columns in :attr:`meta`.

        operation : str or func
            The operation to perform.

            If a string is provided, the equivalent pandas groupby function is used. Note
            that not all groupby functions are available as some do not make sense for
            this particular application. Additional information about the arguments for
            the pandas groupby functions can be found at <https://pandas.pydata.org/pan
            das-docs/stable/reference/groupby.html>`_.

            If a function is provided, it will be applied to each group. The function must
            take a dataframe as its first argument and return a DataFrame, Series or scalar.

            Note that quantile means the value of the data at a given point in the cumulative
            distribution of values at each point in the timeseries, for each timeseries
            once the groupby is applied. As a result, using ``q=0.5`` is the same as
            taking the median and not the same as taking the mean/average.

        na_override: [int, float]
            Convert any nan value in the timeseries meta to this value during processsing.
            The meta values converted back to nan's before the run is returned. This
            should not need to be changed unless the existing metadata clashes with the
            default na_override value.

            This functionality is disabled if na_override is None, but may result in incorrect
            results if the timeseries meta includes any nan's.

        op_cols: dict of str: str
            Dictionary containing any columns that should be overridden after processing.

            If a required column from :class:`scmdata.ScmRun` is specified in ``cols`` and
            ``as_run=True``, an override must be provided for that column in ``op_cols``
            otherwise the conversion to :class:`scmdata.ScmRun` will fail.

        as_run: bool or subclass of BaseScmRun
            If True, return the resulting timeseries as an :class:`scmdata.ScmRun` object,
            otherwise if False, a :class:`pandas.DataFrame`or :class:`pandas.Series` is
            returned (depending on the nature of the operation). Some operations may not be
            able to be converted to a :class:`scmdata.ScmRun`. For example if the operation
            returns scalar values rather than timeseries.

            If a class is provided, the return value will be cast to this class.
        **kwargs
            Keyword arguments to pass ``operation`` (or the pandas operation if ``operation``
            is a string)

        Returns
        -------
        :class:`pandas.DataFrame` or :class:`pandas.Series` or :class:`scmdata.ScmRun`
            The result of ``operation``, grouped by all columns in :attr:`meta`
            other than :obj:`cols`

        Raises
        ------
        ValueError
            If the operation is not an allowed operation

            If the value of na_override clashes with any existing metadata

            If ``operation`` produces a :class:`pandas.Series`, but `as_run`` is True

            If ``as_run`` is not True, False or a subclass of :class:`scmdata.run.BaseScmRun`

        :class:`scmdata.errors.MissingRequiredColumnError`
            If `as_run` is not False and the result does not have the required metadata
            to convert to an :class`ScmRun <scmdata.ScmRun>`.
            This can be resolved by specifying additional metadata via ``op_cols``

        

Mean

mean = runs.process_over(cols="run_id", operation="mean")
mean
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable
example World ssp119 K Surface Temperature 0.0 0.596345 1.128918 1.514027 2.495971 1.973448 3.781227 4.163771 3.614195 5.647494 ... 47.382414 51.984829 37.887534 55.764676 55.719420 52.470270 40.295844 41.257616 41.233322 57.062650
W/m^2 Radiative Forcing 0.0 0.352113 1.151058 1.654809 2.465450 1.969925 3.181354 3.500164 3.919097 4.248761 ... 27.212770 50.252181 33.433778 55.230069 64.547888 60.127783 49.044180 59.789565 53.391649 33.517276
ppm Atmospheric Concentrations|CO2 0.0 0.465642 0.852352 1.541577 2.084831 2.775635 3.229357 2.617146 3.008985 4.208213 ... 31.908838 43.124752 50.617634 61.602944 42.963605 36.562227 49.499952 42.543541 44.439230 53.544782

3 rows × 101 columns

Median

median = runs.process_over(cols="run_id", operation="median")
median
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable
example World ssp119 K Surface Temperature 0.0 0.671150 1.118483 1.422507 2.688332 1.686841 4.078258 4.608650 2.776327 6.475209 ... 50.179144 54.696913 39.183757 57.732697 62.288314 55.309794 37.126860 30.490080 36.499659 54.377530
W/m^2 Radiative Forcing 0.0 0.359544 1.295210 1.780380 2.437602 1.746289 3.149115 3.529822 4.993532 3.902942 ... 25.372975 49.245870 34.143264 53.098487 66.840681 62.592220 50.400562 53.436196 61.604735 30.662371
ppm Atmospheric Concentrations|CO2 0.0 0.433650 0.857406 1.582820 2.278573 2.718223 3.519086 2.990340 2.446087 3.159592 ... 34.572217 35.874652 43.326254 64.652163 40.317418 32.605075 56.686231 41.038774 42.504582 50.797859

3 rows × 101 columns

Arbitrary functions

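You can also apply an arbitrary function to each group. The function receives each group as a DataFrame, and any extra keyword arguments passed to process_over (such as axis below) are forwarded to it.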

def mean_and_invert(df, axis=0):
    # Take a mean across the group and then invert the result
    return -df.mean(axis=axis)


runs.process_over("run_id", operation=mean_and_invert)
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable
example World ssp119 K Surface Temperature -0.0 -0.596345 -1.128918 -1.514027 -2.495971 -1.973448 -3.781227 -4.163771 -3.614195 -5.647494 ... -47.382414 -51.984829 -37.887534 -55.764676 -55.719420 -52.470270 -40.295844 -41.257616 -41.233322 -57.062650
W/m^2 Radiative Forcing -0.0 -0.352113 -1.151058 -1.654809 -2.465450 -1.969925 -3.181354 -3.500164 -3.919097 -4.248761 ... -27.212770 -50.252181 -33.433778 -55.230069 -64.547888 -60.127783 -49.044180 -59.789565 -53.391649 -33.517276
ppm Atmospheric Concentrations|CO2 -0.0 -0.465642 -0.852352 -1.541577 -2.084831 -2.775635 -3.229357 -2.617146 -3.008985 -4.208213 ... -31.908838 -43.124752 -50.617634 -61.602944 -42.963605 -36.562227 -49.499952 -42.543541 -44.439230 -53.544782

3 rows × 101 columns

runs.process_over("run_id", operation=mean_and_invert, axis=1)
model    region  run_id  scenario  unit   variable                      
example  World   0       ssp119    K      Surface Temperature              -23.399211
                 1       ssp119    K      Surface Temperature              -28.058534
                 2       ssp119    K      Surface Temperature              -26.310644
                 3       ssp119    K      Surface Temperature              -21.819595
                 4       ssp119    K      Surface Temperature              -27.532120
                 5       ssp119    K      Surface Temperature              -24.613193
                 6       ssp119    K      Surface Temperature              -25.478714
                 7       ssp119    K      Surface Temperature              -23.475447
                 8       ssp119    K      Surface Temperature              -23.304137
                 9       ssp119    K      Surface Temperature              -26.109388
                 0       ssp119    W/m^2  Radiative Forcing                -25.546476
                 1       ssp119    W/m^2  Radiative Forcing                -23.974383
                 2       ssp119    W/m^2  Radiative Forcing                -24.857438
                 3       ssp119    W/m^2  Radiative Forcing                -25.299717
                 4       ssp119    W/m^2  Radiative Forcing                -26.869726
                 5       ssp119    W/m^2  Radiative Forcing                -25.939956
                 6       ssp119    W/m^2  Radiative Forcing                -25.824626
                 7       ssp119    W/m^2  Radiative Forcing                -24.491061
                 8       ssp119    W/m^2  Radiative Forcing                -25.906752
                 9       ssp119    W/m^2  Radiative Forcing                -24.877365
                 0       ssp119    ppm    Atmospheric Concentrations|CO2   -24.211620
                 1       ssp119    ppm    Atmospheric Concentrations|CO2   -19.433876
                 2       ssp119    ppm    Atmospheric Concentrations|CO2   -24.452935
                 3       ssp119    ppm    Atmospheric Concentrations|CO2   -22.292322
                 4       ssp119    ppm    Atmospheric Concentrations|CO2   -26.544962
                 5       ssp119    ppm    Atmospheric Concentrations|CO2   -23.423582
                 6       ssp119    ppm    Atmospheric Concentrations|CO2   -23.834897
                 7       ssp119    ppm    Atmospheric Concentrations|CO2   -26.199737
                 8       ssp119    ppm    Atmospheric Concentrations|CO2   -27.005808
                 9       ssp119    ppm    Atmospheric Concentrations|CO2   -24.482185
dtype: float64
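The two calls differ only in the axis along which the mean is taken. Within each group, rows are individual timeseries (here, the ten run_id values) and columns are timepoints. So axis=0 averages across the ensemble and returns a timeseries, whereas axis=1 averages each timeseries over time and returns one number per run, which is why run_id still appears in the second output. The small pandas illustration below uses a made-up group to show the difference.

```python
import pandas as pd

# One made-up group: three runs of the same variable over three years
group = pd.DataFrame(
    [[0.0, 1.0, 2.0], [0.0, 3.0, 4.0], [0.0, 5.0, 9.0]],
    index=pd.Index([0, 1, 2], name="run_id"),
    columns=[2000, 2001, 2002],
)


def mean_and_invert(df, axis=0):
    # Same function as above: mean along ``axis``, then flip the sign
    return -df.mean(axis=axis)


# axis=0 collapses across runs: one value per timepoint
per_time = mean_and_invert(group, axis=0)
# axis=1 collapses across time: one value per run
per_run = mean_and_invert(group, axis=1)

print(per_time)
print(per_run)
```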

Other quantiles

lower_likely_quantile = runs.process_over(cols="run_id", operation="quantile", q=0.17)
lower_likely_quantile
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable
example World ssp119 K Surface Temperature 0.0 0.277925 0.296714 0.583327 1.280206 0.704610 3.268624 2.307766 1.885095 2.369202 ... 20.546691 36.770388 19.251176 36.692466 19.611863 29.188249 19.126885 20.383975 11.456099 35.691795
W/m^2 Radiative Forcing 0.0 0.145196 0.588132 0.504262 1.470827 0.993488 1.425550 2.427890 0.480049 2.223684 ... 10.840713 36.531543 2.360360 27.740388 51.581516 46.929276 13.762326 37.527886 19.760797 6.033400
ppm Atmospheric Concentrations|CO2 0.0 0.065628 0.232688 0.327447 0.451701 1.516487 2.219258 0.855934 1.578310 1.989964 ... 4.559917 16.317969 23.904091 39.425163 14.148972 10.722061 28.922266 20.008239 18.867379 21.921475

3 rows × 101 columns
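As the process_over docstring notes, q=0.5 gives the median, not the mean, and the two can differ substantially when the ensemble is skewed. The NumPy sketch below uses five made-up ensemble values at a single timepoint. It relies on linear interpolation between sorted values, which is the default in both numpy.quantile and pandas quantile.

```python
import numpy as np

# Five made-up ensemble values at one timepoint, skewed by a single outlier
values = np.array([1.0, 2.0, 3.0, 4.0, 40.0])

q50 = np.quantile(values, 0.5)   # 50th percentile
median = np.median(values)
mean = values.mean()             # pulled upwards by the outlier
q17 = np.quantile(values, 0.17)  # lower bound of the "likely" range used above

print(q50, median, mean, q17)
```

With five values, q=0.17 falls at position 0.17 × (5 - 1) = 0.68 in the sorted array, so linear interpolation gives 1 + 0.68 × (2 - 1) = 1.68.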

quantiles_over

To calculate several summary statistics at once, use quantiles_over. It calculates each requested statistic and labels it in a quantile column of the output index before returning the results.

print(runs.quantiles_over.__doc__)
        Calculate quantiles of the data over the input columns.

        Parameters
        ----------
        cols
            Columns to perform the operation on. The timeseries will be grouped by all
            other columns in :attr:`meta`.

        quantiles
            The quantiles to calculate. This should be a list of quantiles to calculate
            (quantile values between 0 and 1). ``quantiles`` can also include the strings
            "median" or "mean" if these values are to be calculated.

        **kwargs
            Passed to :meth:`~ScmRun.process_over`.

        Returns
        -------
        :class:`pandas.DataFrame`
            The quantiles of the timeseries, grouped by all columns in :attr:`meta`
            other than :obj:`cols`. Each calculated quantile is given a label which is
            stored in the ``quantile`` column within the output index.

        Raises
        ------
        TypeError
            ``operation`` is included in ``kwargs``. The operation is inferred from ``quantiles``.
        
summary_stats = runs.quantiles_over(
    cols="run_id", quantiles=[0.05, 0.17, 0.5, 0.83, 0.95, "mean", "median"]
)
summary_stats
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable quantile
example World ssp119 K Surface Temperature 0.05 0.0 0.128703 0.250772 0.362986 1.021021 0.400146 1.674319 1.390815 1.504516 0.548557 ... 12.390077 26.639664 11.700526 19.607688 5.336946 19.897363 15.685909 13.514556 5.701723 31.650094
W/m^2 Radiative Forcing 0.05 0.0 0.069799 0.203154 0.376197 0.749395 0.388147 0.391588 2.403882 0.290108 1.081456 ... 4.762287 22.284656 0.726211 16.677124 37.907835 29.133397 9.598161 33.422452 7.558986 2.394234
ppm Atmospheric Concentrations|CO2 0.05 0.0 0.022913 0.129041 0.253741 0.397409 0.967627 1.116097 0.218300 0.791392 1.544788 ... 1.944475 4.758727 20.819875 15.518729 7.024698 4.129051 17.560585 12.477884 14.910313 4.244109
K Surface Temperature 0.17 0.0 0.277925 0.296714 0.583327 1.280206 0.704610 3.268624 2.307766 1.885095 2.369202 ... 20.546691 36.770388 19.251176 36.692466 19.611863 29.188249 19.126885 20.383975 11.456099 35.691795
W/m^2 Radiative Forcing 0.17 0.0 0.145196 0.588132 0.504262 1.470827 0.993488 1.425550 2.427890 0.480049 2.223684 ... 10.840713 36.531543 2.360360 27.740388 51.581516 46.929276 13.762326 37.527886 19.760797 6.033400
ppm Atmospheric Concentrations|CO2 0.17 0.0 0.065628 0.232688 0.327447 0.451701 1.516487 2.219258 0.855934 1.578310 1.989964 ... 4.559917 16.317969 23.904091 39.425163 14.148972 10.722061 28.922266 20.008239 18.867379 21.921475
K Surface Temperature 0.5 0.0 0.671150 1.118483 1.422507 2.688332 1.686841 4.078258 4.608650 2.776327 6.475209 ... 50.179144 54.696913 39.183757 57.732697 62.288314 55.309794 37.126860 30.490080 36.499659 54.377530
W/m^2 Radiative Forcing 0.5 0.0 0.359544 1.295210 1.780380 2.437602 1.746289 3.149115 3.529822 4.993532 3.902942 ... 25.372975 49.245870 34.143264 53.098487 66.840681 62.592220 50.400562 53.436196 61.604735 30.662371
ppm Atmospheric Concentrations|CO2 0.5 0.0 0.433650 0.857406 1.582820 2.278573 2.718223 3.519086 2.990340 2.446087 3.159592 ... 34.572217 35.874652 43.326254 64.652163 40.317418 32.605075 56.686231 41.038774 42.504582 50.797859
K Surface Temperature 0.83 0.0 0.908219 1.880651 2.452630 3.637426 3.482034 4.489497 5.688110 5.552448 8.605425 ... 73.847889 62.151777 46.561900 77.070555 86.858658 72.080184 55.662564 72.838865 75.982725 76.436065
W/m^2 Radiative Forcing 0.83 0.0 0.562836 1.695227 2.753568 3.799103 3.091657 4.859401 4.415586 6.973682 6.482025 ... 40.641648 64.372672 61.959632 84.024716 80.695721 72.047379 84.062565 91.813414 78.850076 59.895748
ppm Atmospheric Concentrations|CO2 0.83 0.0 0.797124 1.275196 2.705109 3.299206 4.064952 4.126332 3.943280 5.010057 7.041511 ... 55.141889 71.809934 85.769916 84.020664 70.922486 64.533392 66.302927 61.872636 63.552881 85.189724
K Surface Temperature 0.95 0.0 0.974985 1.919968 2.588011 3.760912 4.413736 4.804926 6.505753 6.861928 8.818295 ... 77.811092 70.364449 69.122839 78.283629 90.688538 82.790394 73.150702 82.855424 83.714328 87.435935
W/m^2 Radiative Forcing 0.95 0.0 0.611123 1.896075 2.800272 3.984080 4.194063 5.653744 4.734796 7.261602 7.285331 ... 58.998791 78.787894 74.681198 88.517980 82.993456 83.173390 90.093766 95.568408 89.202464 74.523839
ppm Atmospheric Concentrations|CO2 0.95 0.0 0.949709 1.628910 2.910253 3.666489 4.590527 5.276475 4.865429 6.359732 7.992446 ... 64.161437 89.081493 92.150281 91.477217 81.001114 74.193005 69.579542 75.717243 79.095337 95.629739
K Surface Temperature mean 0.0 0.596345 1.128918 1.514027 2.495971 1.973448 3.781227 4.163771 3.614195 5.647494 ... 47.382414 51.984829 37.887534 55.764676 55.719420 52.470270 40.295844 41.257616 41.233322 57.062650
W/m^2 Radiative Forcing mean 0.0 0.352113 1.151058 1.654809 2.465450 1.969925 3.181354 3.500164 3.919097 4.248761 ... 27.212770 50.252181 33.433778 55.230069 64.547888 60.127783 49.044180 59.789565 53.391649 33.517276
ppm Atmospheric Concentrations|CO2 mean 0.0 0.465642 0.852352 1.541577 2.084831 2.775635 3.229357 2.617146 3.008985 4.208213 ... 31.908838 43.124752 50.617634 61.602944 42.963605 36.562227 49.499952 42.543541 44.439230 53.544782
K Surface Temperature median 0.0 0.671150 1.118483 1.422507 2.688332 1.686841 4.078258 4.608650 2.776327 6.475209 ... 50.179144 54.696913 39.183757 57.732697 62.288314 55.309794 37.126860 30.490080 36.499659 54.377530
W/m^2 Radiative Forcing median 0.0 0.359544 1.295210 1.780380 2.437602 1.746289 3.149115 3.529822 4.993532 3.902942 ... 25.372975 49.245870 34.143264 53.098487 66.840681 62.592220 50.400562 53.436196 61.604735 30.662371
ppm Atmospheric Concentrations|CO2 median 0.0 0.433650 0.857406 1.582820 2.278573 2.718223 3.519086 2.990340 2.446087 3.159592 ... 34.572217 35.874652 43.326254 64.652163 40.317418 32.605075 56.686231 41.038774 42.504582 50.797859

21 rows × 101 columns
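The table above is essentially one grouped result per requested statistic, stacked and labelled in a quantile index level. The pandas sketch below follows that idea under stated assumptions. The helper quantiles_over_sketch is hypothetical and is not scmdata's implementation, and the data are made up. It places the quantile level last to match the output above. Because the label is ordinary metadata, a single statistic can then be selected without recomputing anything.

```python
import pandas as pd


def quantiles_over_sketch(df, cols, quantiles):
    """Hypothetical sketch: compute several statistics and label each one."""
    if isinstance(cols, str):
        cols = [cols]
    group_levels = [name for name in df.index.names if name not in cols]
    grouped = df.groupby(level=group_levels)
    pieces = {}
    for q in quantiles:
        if q == "mean":
            pieces[q] = grouped.mean()
        elif q == "median":
            pieces[q] = grouped.median()
        else:
            pieces[q] = grouped.quantile(q)
    # Stack the results, adding each label as a new "quantile" index level
    stacked = pd.concat(pieces, names=["quantile"])
    # Move the label to the innermost level, as in the scmdata output above
    names = list(stacked.index.names)
    return stacked.reorder_levels(names[1:] + ["quantile"])


index = pd.MultiIndex.from_product(
    [["Surface Temperature"], [0, 1, 2, 3, 4]], names=["variable", "run_id"]
)
df = pd.DataFrame(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [40.0, 80.0]],
    index=index,
    columns=[2000, 2001],
)

summary = quantiles_over_sketch(df, "run_id", [0.05, 0.5, "mean", "median"])
print(summary)

# Select one labelled statistic, e.g. the median, without recomputing
print(summary.xs("median", level="quantile"))
```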

Plotting

Calculate quantiles within plotting function

We can use plumeplot directly to plot quantiles. In this case, plumeplot calculates the quantiles as part of making the plot, so if you are making many plots it may be faster to pre-calculate the quantiles and then plot them (see below).

Note that in this case the default settings in plumeplot don’t produce a particularly helpful plot. The cell below shows how to modify them.

runs.plumeplot(quantile_over="run_id")
[Figure: output of runs.plumeplot(quantile_over="run_id") with the default settings]
runs.plumeplot(
    quantile_over="run_id",
    quantiles_plumes=[
        ((0.05, 0.95), 0.2),
        ((0.17, 0.83), 0.5),
        (("median",), 1.0),
    ],
    hue_var="variable",
    hue_label="Variable",
    style_var="scenario",
    style_label="Scenario",
)
[Figure: plume plot with 5-95% and 17-83% plumes and median lines, coloured by variable and styled by scenario]

Pre-calculated quantiles

Alternatively, we can cast the output of quantiles_over to an ScmRun object for ease of filtering and plotting.

summary_stats_scmrun = ScmRun(summary_stats)
summary_stats_scmrun
<ScmRun (timeseries: 21, timepoints: 101)>
Time:
	Start: 2000-01-01T00:00:00
	End: 2100-01-01T00:00:00
Meta:
	      model quantile region scenario   unit                        variable
	0   example     0.05  World   ssp119      K             Surface Temperature
	1   example     0.05  World   ssp119  W/m^2               Radiative Forcing
	2   example     0.05  World   ssp119    ppm  Atmospheric Concentrations|CO2
	3   example     0.17  World   ssp119      K             Surface Temperature
	4   example     0.17  World   ssp119  W/m^2               Radiative Forcing
	5   example     0.17  World   ssp119    ppm  Atmospheric Concentrations|CO2
	6   example      0.5  World   ssp119      K             Surface Temperature
	7   example      0.5  World   ssp119  W/m^2               Radiative Forcing
	8   example      0.5  World   ssp119    ppm  Atmospheric Concentrations|CO2
	9   example     0.83  World   ssp119      K             Surface Temperature
	10  example     0.83  World   ssp119  W/m^2               Radiative Forcing
	11  example     0.83  World   ssp119    ppm  Atmospheric Concentrations|CO2
	12  example     0.95  World   ssp119      K             Surface Temperature
	13  example     0.95  World   ssp119  W/m^2               Radiative Forcing
	14  example     0.95  World   ssp119    ppm  Atmospheric Concentrations|CO2
	15  example     mean  World   ssp119      K             Surface Temperature
	16  example     mean  World   ssp119  W/m^2               Radiative Forcing
	17  example     mean  World   ssp119    ppm  Atmospheric Concentrations|CO2
	18  example   median  World   ssp119      K             Surface Temperature
	19  example   median  World   ssp119  W/m^2               Radiative Forcing
	20  example   median  World   ssp119    ppm  Atmospheric Concentrations|CO2

As discussed above, plotting from these pre-calculated quantiles (with pre_calculated=True) avoids recalculating the quantiles every time a plot is made.

summary_stats_scmrun.plumeplot(
    quantiles_plumes=[
        ((0.05, 0.95), 0.2),
        ((0.17, 0.83), 0.5),
        (("median",), 1.0),
    ],
    hue_var="variable",
    hue_label="Variable",
    style_var="scenario",
    style_label="Scenario",
    pre_calculated=True,
)
[Figure: the same plume plot as above, drawn from the pre-calculated quantiles]

If we don’t want a plume plot, we can always use the standard lineplot method.

summary_stats_scmrun.filter(variable="Radiative Forcing").lineplot(hue="quantile")
[Figure: line plot of the Radiative Forcing quantiles, coloured by quantile; x-axis: time, y-axis: W/m^2]

groupby

The groupby method groups the data by one or more columns in scmrun.meta and yields each group, on which we can then perform operations. In the example below, we calculate the mean timeseries for each variable.

variable_means = []
for vdf in runs.groupby("variable"):
    vdf_mean = vdf.timeseries().mean(axis=0)
    vdf_mean.name = vdf.get_unique_meta("variable", True)
    variable_means.append(vdf_mean)

pd.DataFrame(variable_means)
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
Atmospheric Concentrations|CO2 0.0 0.465642 0.852352 1.541577 2.084831 2.775635 3.229357 2.617146 3.008985 4.208213 ... 31.908838 43.124752 50.617634 61.602944 42.963605 36.562227 49.499952 42.543541 44.439230 53.544782
Radiative Forcing 0.0 0.352113 1.151058 1.654809 2.465450 1.969925 3.181354 3.500164 3.919097 4.248761 ... 27.212770 50.252181 33.433778 55.230069 64.547888 60.127783 49.044180 59.789565 53.391649 33.517276
Surface Temperature 0.0 0.596345 1.128918 1.514027 2.495971 1.973448 3.781227 4.163771 3.614195 5.647494 ... 47.382414 51.984829 37.887534 55.764676 55.719420 52.470270 40.295844 41.257616 41.233322 57.062650

3 rows × 101 columns
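The loop above is a hand-written grouped mean. On a plain pandas table whose row index holds the metadata, the same result can be obtained with a single groupby over the variable level. The sketch below uses made-up data and shows only the pandas equivalent, not any scmdata API; it checks that the loop and the one-liner agree.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("Surface Temperature", "K", 0),
        ("Surface Temperature", "K", 1),
        ("Radiative Forcing", "W/m^2", 0),
        ("Radiative Forcing", "W/m^2", 1),
    ],
    names=["variable", "unit", "run_id"],
)
df = pd.DataFrame(
    [[1.0, 3.0], [3.0, 5.0], [2.0, 2.0], [4.0, 6.0]],
    index=index,
    columns=[2000, 2001],
)

# Loop style, mirroring the cell above: one mean timeseries per variable
loop_means = []
for variable, vdf in df.groupby(level="variable"):
    vdf_mean = vdf.mean(axis=0)
    vdf_mean.name = variable
    loop_means.append(vdf_mean)
loop_result = pd.DataFrame(loop_means)

# Vectorised equivalent in one line
grouped_result = df.groupby(level="variable").mean()
print(grouped_result)
```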

groupby_all_except

The groupby_all_except method groups the data by all columns in scmrun.meta except a given set. As with groupby, we can then use the groups to perform operations, as in the example below. Note that, in most cases, process_over is likely to be more useful.

ensemble_means = []
for edf in runs.groupby_all_except("run_id"):
    edf_mean = edf.timeseries().mean(axis=0)
    edf_mean.name = edf.get_unique_meta("variable", True)
    ensemble_means.append(edf_mean)

pd.DataFrame(ensemble_means)
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
Surface Temperature 0.0 0.596345 1.128918 1.514027 2.495971 1.973448 3.781227 4.163771 3.614195 5.647494 ... 47.382414 51.984829 37.887534 55.764676 55.719420 52.470270 40.295844 41.257616 41.233322 57.062650
Radiative Forcing 0.0 0.352113 1.151058 1.654809 2.465450 1.969925 3.181354 3.500164 3.919097 4.248761 ... 27.212770 50.252181 33.433778 55.230069 64.547888 60.127783 49.044180 59.789565 53.391649 33.517276
Atmospheric Concentrations|CO2 0.0 0.465642 0.852352 1.541577 2.084831 2.775635 3.229357 2.617146 3.008985 4.208213 ... 31.908838 43.124752 50.617634 61.602944 42.963605 36.562227 49.499952 42.543541 44.439230 53.544782

3 rows × 101 columns

As we said, in most cases process_over is likely to be more useful. For example, the above can be done in a single line with process_over, which also retains more metadata.

runs.process_over("run_id", "mean")
time 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 ... 2091-01-01 2092-01-01 2093-01-01 2094-01-01 2095-01-01 2096-01-01 2097-01-01 2098-01-01 2099-01-01 2100-01-01
model region scenario unit variable
example World ssp119 K Surface Temperature 0.0 0.596345 1.128918 1.514027 2.495971 1.973448 3.781227 4.163771 3.614195 5.647494 ... 47.382414 51.984829 37.887534 55.764676 55.719420 52.470270 40.295844 41.257616 41.233322 57.062650
W/m^2 Radiative Forcing 0.0 0.352113 1.151058 1.654809 2.465450 1.969925 3.181354 3.500164 3.919097 4.248761 ... 27.212770 50.252181 33.433778 55.230069 64.547888 60.127783 49.044180 59.789565 53.391649 33.517276
ppm Atmospheric Concentrations|CO2 0.0 0.465642 0.852352 1.541577 2.084831 2.775635 3.229357 2.617146 3.008985 4.208213 ... 31.908838 43.124752 50.617634 61.602944 42.963605 36.562227 49.499952 42.543541 44.439230 53.544782

3 rows × 101 columns
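groupby_all_except is the complement of groupby: the grouping columns are every metadata column except the ones named. In pandas terms, this is a set difference over the index level names followed by a grouped reduction, which is why it lines up with process_over("run_id", "mean") and keeps the remaining metadata in the index. The sketch below uses made-up data, and the helper all_levels_except is hypothetical.

```python
import pandas as pd


def all_levels_except(index, exclude):
    """Hypothetical helper: index level names minus those in ``exclude``."""
    exclude = {exclude} if isinstance(exclude, str) else set(exclude)
    return [name for name in index.names if name not in exclude]


index = pd.MultiIndex.from_product(
    [["World"], ["Surface Temperature", "Radiative Forcing"], [0, 1]],
    names=["region", "variable", "run_id"],
)
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    index=index,
    columns=[2000, 2001],
)

levels = all_levels_except(df.index, "run_id")
print(levels)  # ['region', 'variable']

# Group by everything except run_id and average over the ensemble
ensemble_mean = df.groupby(level=levels).mean()
print(ensemble_mean)
```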