Processing

scmdata has some support for processing ScmRun instances to calculate statistics of interest. Here we provide examples of how to use them.

At present, we can calculate:

crossing times (e.g. 1.5C crossing times)
crossing time quantiles
exceedance probabilities
peak
peak year
categorisation in line with SR1.5
a set of summary variables

Load some data

For this demonstration, we are going to use MAGICC output from [RCMIP Phase 2] as available at https://zenodo.org/record/4624566/files/data-processed-submission-database-hadcrut5-target-MAGICCv7.5.1.tar.gz?download=1. Here we have just extracted the air temperature output for the SSPs from 1995 to 2100.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import scmdata.processing
from scmdata import ScmRun

/tmp/ipykernel_836/579853008.py:3: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd

/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/database/_database.py:9: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  import tqdm.autonotebook as tqdman

magicc_output = ScmRun("magicc-rcmip-phase-2-gsat-output.csv")
magicc_output

<ScmRun (timeseries: 4800, timepoints: 106)>
Time:
	Start: 1995-01-01T00:00:00
	End: 2100-01-01T00:00:00
Meta:
	     climate_model  ensemble_member        model region scenario unit  \
	0     MAGICCv7.5.1                0  unspecified  World   ssp585    K   
	1     MAGICCv7.5.1                1  unspecified  World   ssp585    K   
	2     MAGICCv7.5.1                2  unspecified  World   ssp585    K   
	3     MAGICCv7.5.1                3  unspecified  World   ssp585    K   
	4     MAGICCv7.5.1                4  unspecified  World   ssp585    K   
	...            ...              ...          ...    ...      ...  ...   
	4795  MAGICCv7.5.1              595  unspecified  World   ssp119    K   
	4796  MAGICCv7.5.1              596  unspecified  World   ssp119    K   
	4797  MAGICCv7.5.1              597  unspecified  World   ssp119    K   
	4798  MAGICCv7.5.1              598  unspecified  World   ssp119    K   
	4799  MAGICCv7.5.1              599  unspecified  World   ssp119    K   
	
	                            variable  
	0     Surface Air Temperature Change  
	1     Surface Air Temperature Change  
	2     Surface Air Temperature Change  
	3     Surface Air Temperature Change  
	4     Surface Air Temperature Change  
	...                              ...  
	4795  Surface Air Temperature Change  
	4796  Surface Air Temperature Change  
	4797  Surface Air Temperature Change  
	4798  Surface Air Temperature Change  
	4799  Surface Air Temperature Change  
	
	[4800 rows x 7 columns]

Crossing times

The first thing we do is show how to calculate the crossing times of a given threshold.

crossing_time_15 = scmdata.processing.calculate_crossing_times(
    magicc_output,
    threshold=1.5,
)
crossing_time_15

climate_model  ensemble_member  model        region  scenario  unit  variable                      
MAGICCv7.5.1   0                unspecified  World   ssp585    K     Surface Air Temperature Change    2025.0
               1                unspecified  World   ssp585    K     Surface Air Temperature Change    2029.0
               2                unspecified  World   ssp585    K     Surface Air Temperature Change    2024.0
               3                unspecified  World   ssp585    K     Surface Air Temperature Change    2023.0
               4                unspecified  World   ssp585    K     Surface Air Temperature Change    2023.0
                                                                                                        ...  
               595              unspecified  World   ssp119    K     Surface Air Temperature Change    2023.0
               596              unspecified  World   ssp119    K     Surface Air Temperature Change       NaN
               597              unspecified  World   ssp119    K     Surface Air Temperature Change    2026.0
               598              unspecified  World   ssp119    K     Surface Air Temperature Change    2019.0
               599              unspecified  World   ssp119    K     Surface Air Temperature Change    2024.0
Length: 4800, dtype: float64

The output is a pd.Series, which is useful for many other pieces of work.

For example, we could make a plot with e.g. seaborn.

label = "1.5C crossing time"
pdf = crossing_time_15.reset_index().rename({0: label}, axis="columns")
sns.histplot(data=pdf, x=label, hue="scenario")

<Axes: xlabel='1.5C crossing time', ylabel='Count'>

../_images/49c85ff137e7e956ba965ae670cd6265e2cdcb354cb3827ba2c1e4742c07fcb5.png

label = "2.0C crossing time"
crossing_time_20 = scmdata.processing.calculate_crossing_times(
    magicc_output,
    threshold=2.0,
)

pdf = crossing_time_20.reset_index().rename({0: label}, axis="columns")
sns.histplot(data=pdf, x=label, hue="scenario")

<Axes: xlabel='2.0C crossing time', ylabel='Count'>

../_images/c1dd0ec4773cf080255b2101980e657c8733efd9af878690ed890f2853a16e5e.png

Crossing time quantiles

Calculating the quantiles of crossing times is a bit fiddly because some ensemble members will not cross the threshold at all. We show how quantiles can be calculated sensibly below. The calculation will return nan if that quantile in the crossing times corresponds to an ensemble member which never crosses the threshold.

scmdata.processing.calculate_crossing_times_quantiles(
    crossing_time_15,
    groupby=["climate_model", "model", "scenario"],
    quantiles=(0.05, 0.17, 0.5, 0.83, 0.95),
)

climate_model  model        scenario     quantile
MAGICCv7.5.1   unspecified  ssp119       0.05        2021.00
                                         0.17        2023.00
                                         0.50        2027.00
                                         0.83            NaN
                                         0.95            NaN
                            ssp126       0.05        2021.00
                                         0.17        2023.00
                                         0.50        2028.00
                                         0.83        2038.17
                                         0.95            NaN
                            ssp245       0.05        2021.00
                                         0.17        2023.00
                                         0.50        2028.00
                                         0.83        2034.00
                                         0.95        2039.00
                            ssp370       0.05        2021.00
                                         0.17        2024.00
                                         0.50        2028.00
                                         0.83        2032.00
                                         0.95        2036.00
                            ssp434       0.05        2021.00
                                         0.17        2023.00
                                         0.50        2027.00
                                         0.83        2034.00
                                         0.95        2041.00
                            ssp460       0.05        2021.00
                                         0.17        2023.00
                                         0.50        2027.00
                                         0.83        2032.00
                                         0.95        2036.00
                            ssp534-over  0.05        2020.00
                                         0.17        2022.00
                                         0.50        2025.00
                                         0.83        2030.00
                                         0.95        2033.05
                            ssp585       0.05        2020.00
                                         0.17        2022.00
                                         0.50        2025.00
                                         0.83        2030.00
                                         0.95        2034.00
dtype: float64

In the above, we can see that the 83rd percentile of crossing times is in fact to not cross at all under the SSP1-1.9 scenario.

Datetime output

If desired, data could be interpolated first before calculating the crossing times. In such cases, returning the output as datetime rather than year might be helpful.

scmdata.processing.calculate_crossing_times(
    magicc_output.resample("MS"),
    threshold=2.0,
    return_year=False,
)

climate_model  ensemble_member  model        region  scenario  unit  variable                      
MAGICCv7.5.1   0                unspecified  World   ssp585    K     Surface Air Temperature Change   2042-02-01
               1                unspecified  World   ssp585    K     Surface Air Temperature Change   2041-10-01
               2                unspecified  World   ssp585    K     Surface Air Temperature Change   2035-11-01
               3                unspecified  World   ssp585    K     Surface Air Temperature Change   2035-09-01
               4                unspecified  World   ssp585    K     Surface Air Temperature Change   2036-01-01
                                                                                                         ...    
               595              unspecified  World   ssp119    K     Surface Air Temperature Change          NaT
               596              unspecified  World   ssp119    K     Surface Air Temperature Change          NaT
               597              unspecified  World   ssp119    K     Surface Air Temperature Change          NaT
               598              unspecified  World   ssp119    K     Surface Air Temperature Change   2027-07-01
               599              unspecified  World   ssp119    K     Surface Air Temperature Change          NaT
Length: 4800, dtype: datetime64[ns]

Exceedance probabilities

Next we show how to calculate exceedance probabilities.

exceedance_probability_2C = scmdata.processing.calculate_exceedance_probabilities(
    magicc_output,
    process_over_cols=["ensemble_member", "variable"],
    threshold=2.0,
)
exceedance_probability_2C

climate_model  model        region  scenario     unit         
MAGICCv7.5.1   unspecified  World   ssp119       dimensionless    0.091667
                                    ssp126       dimensionless    0.350000
                                    ssp245       dimensionless    0.995000
                                    ssp370       dimensionless    1.000000
                                    ssp434       dimensionless    0.868333
                                    ssp460       dimensionless    1.000000
                                    ssp534-over  dimensionless    0.983333
                                    ssp585       dimensionless    1.000000
Name: 2.0 exceedance probability, dtype: float64

We can make a plot to compare exceedance probabilities over multiple scenarios.

exceedance_probability_15C = scmdata.processing.calculate_exceedance_probabilities(
    magicc_output,
    process_over_cols=["ensemble_member", "variable"],
    threshold=1.5,
)

pdf = (
    pd.DataFrame(
        [
            exceedance_probability_2C,
            exceedance_probability_15C,
        ]
    )
    .T.melt(ignore_index=False, value_name="Exceedance probability")
    .reset_index()
)
display(pdf)  # noqa

ax = sns.barplot(data=pdf, x="scenario", y="Exceedance probability", hue="variable")
ax.tick_params(labelrotation=30)
ax.set_ylim([0, 1])
ax.legend(loc="center left", bbox_to_anchor=(1.01, 0.5), title="threshold")
plt.tight_layout()

	climate_model	model	region	scenario	unit	variable	Exceedance probability
0	MAGICCv7.5.1	unspecified	World	ssp119	dimensionless	2.0 exceedance probability	0.091667
1	MAGICCv7.5.1	unspecified	World	ssp126	dimensionless	2.0 exceedance probability	0.350000
2	MAGICCv7.5.1	unspecified	World	ssp245	dimensionless	2.0 exceedance probability	0.995000
3	MAGICCv7.5.1	unspecified	World	ssp370	dimensionless	2.0 exceedance probability	1.000000
4	MAGICCv7.5.1	unspecified	World	ssp434	dimensionless	2.0 exceedance probability	0.868333
5	MAGICCv7.5.1	unspecified	World	ssp460	dimensionless	2.0 exceedance probability	1.000000
6	MAGICCv7.5.1	unspecified	World	ssp534-over	dimensionless	2.0 exceedance probability	0.983333
7	MAGICCv7.5.1	unspecified	World	ssp585	dimensionless	2.0 exceedance probability	1.000000
8	MAGICCv7.5.1	unspecified	World	ssp119	dimensionless	1.5 exceedance probability	0.743333
9	MAGICCv7.5.1	unspecified	World	ssp126	dimensionless	1.5 exceedance probability	0.940000
10	MAGICCv7.5.1	unspecified	World	ssp245	dimensionless	1.5 exceedance probability	1.000000
11	MAGICCv7.5.1	unspecified	World	ssp370	dimensionless	1.5 exceedance probability	1.000000
12	MAGICCv7.5.1	unspecified	World	ssp434	dimensionless	1.5 exceedance probability	1.000000
13	MAGICCv7.5.1	unspecified	World	ssp460	dimensionless	1.5 exceedance probability	1.000000
14	MAGICCv7.5.1	unspecified	World	ssp534-over	dimensionless	1.5 exceedance probability	1.000000
15	MAGICCv7.5.1	unspecified	World	ssp585	dimensionless	1.5 exceedance probability	1.000000

../_images/c544ffc462b6e8d9361946a4d5ec4ede62795fae5c82c0b379b94f1817e96a89.png

Exceedance probabilities over time

It is also possible to calculate exceedance probabilities over time.

res = scmdata.processing.calculate_exceedance_probabilities_over_time(
    magicc_output,
    process_over_cols="ensemble_member",
    threshold=1.5,
)
res = scmdata.ScmRun(res)
res.lineplot(style="variable")

<Axes: xlabel='time', ylabel='dimensionless'>

../_images/f9edfeebabde25187d3e2ab1a242345bdc71e84ed5ff445b1af2f88afff736c6.png

Note that taking the maximum exceedance probability over all time will be less than or equal to the exceedance probability calculated with calculate_exceedance_probabilities because the order of operations matters: calculating whether each ensemble member exceeds the threshold or not then seeing how many ensemble members out of the total exceed the threshold is not the same as seeing how many ensemble members exceed the threshold at each timestep and then taking the maximum over all timesteps. In general, taking the maximum value from calculate_exceedance_probabilities_over_time will be less than or equal to the results of calculate_exceedance_probabilities, as demonstrated below.

comparison = (
    pd.DataFrame(
        {
            "calculate_exceedance_probabilities": exceedance_probability_15C,
            "max of calculate_exceedance_probabilities_over_time": (
                res.timeseries(meta=exceedance_probability_15C.index.names).max(axis=1)
            ),
        }
    )
    * 100
)
comparison.round(1)

					calculate_exceedance_probabilities	max of calculate_exceedance_probabilities_over_time
climate_model	model	region	scenario	unit
MAGICCv7.5.1	unspecified	World	ssp119	dimensionless	74.3	73.8
			ssp126	dimensionless	94.0	93.7
			ssp245	dimensionless	100.0	100.0
			ssp370	dimensionless	100.0	100.0
			ssp434	dimensionless	100.0	100.0
			ssp460	dimensionless	100.0	100.0
			ssp534-over	dimensionless	100.0	100.0
			ssp585	dimensionless	100.0	100.0

Peak

We can calculate the peaks in each timeseries.

peak_warming = scmdata.processing.calculate_peak(magicc_output)
peak_warming

climate_model  ensemble_member  model        region  scenario  unit  variable                           
MAGICCv7.5.1   0                unspecified  World   ssp585    K     Peak Surface Air Temperature Change    4.448044
               1                unspecified  World   ssp585    K     Peak Surface Air Temperature Change    5.372057
               2                unspecified  World   ssp585    K     Peak Surface Air Temperature Change    5.942238
               3                unspecified  World   ssp585    K     Peak Surface Air Temperature Change    5.220904
               4                unspecified  World   ssp585    K     Peak Surface Air Temperature Change    5.637467
                                                                                                              ...   
               595              unspecified  World   ssp119    K     Peak Surface Air Temperature Change    1.809161
               596              unspecified  World   ssp119    K     Peak Surface Air Temperature Change    1.288471
               597              unspecified  World   ssp119    K     Peak Surface Air Temperature Change    1.669414
               598              unspecified  World   ssp119    K     Peak Surface Air Temperature Change    2.459078
               599              unspecified  World   ssp119    K     Peak Surface Air Temperature Change    1.746321
Name: 0, Length: 4800, dtype: float64

From this we can then calculate median peak warming by scenario (or other quantiles).

peak_warming.groupby(["model", "scenario"]).median()

model        scenario   
unspecified  ssp119         1.653154
             ssp126         1.878890
             ssp245         2.838238
             ssp370         4.372581
             ssp434         2.371833
             ssp460         3.485626
             ssp534-over    2.613975
             ssp585         5.278692
Name: 0, dtype: float64

Or make a plot.

label = "Peak warming"
pdf = peak_warming.reset_index().rename({0: label}, axis="columns")
sns.histplot(data=pdf, x=label, hue="scenario")

<Axes: xlabel='Peak warming', ylabel='Count'>

../_images/27a040c178d5e5f11355cbe10e8b5413837e2a35523e220060648c82e4b32567.png

Similarly to exceedance probabilties, the order of operations matters: calculating the median of the peaks is different to calculating the median then taking the peak of this median timeseries. In general, the max of the median timeseries is less than or equal to the median of the peak in each timeseries.

comparison = pd.DataFrame(
    {
        "median of peak warming": peak_warming.groupby(["model", "scenario"]).median(),
        "max of median timeseries": (
            magicc_output.process_over(
                list(set(magicc_output.meta.columns) - {"model", "scenario"}),
                "median",
            ).max(axis=1)
        ),
    }
)
comparison.round(3)

		median of peak warming	max of median timeseries
model	scenario
unspecified	ssp119	1.653	1.645
	ssp126	1.879	1.874
	ssp245	2.838	2.838
	ssp370	4.373	4.373
	ssp434	2.372	2.368
	ssp460	3.486	3.486
	ssp534-over	2.614	2.613
	ssp585	5.279	5.279

Peak time

We can calculate the peak time in each timeseries.

peak_warming_year = scmdata.processing.calculate_peak_time(magicc_output)
peak_warming_year

climate_model  ensemble_member  model        region  scenario  unit  variable                                   
MAGICCv7.5.1   0                unspecified  World   ssp585    K     Year of peak Surface Air Temperature Change    2100
               1                unspecified  World   ssp585    K     Year of peak Surface Air Temperature Change    2100
               2                unspecified  World   ssp585    K     Year of peak Surface Air Temperature Change    2100
               3                unspecified  World   ssp585    K     Year of peak Surface Air Temperature Change    2100
               4                unspecified  World   ssp585    K     Year of peak Surface Air Temperature Change    2100
                                                                                                                    ... 
               595              unspecified  World   ssp119    K     Year of peak Surface Air Temperature Change    2048
               596              unspecified  World   ssp119    K     Year of peak Surface Air Temperature Change    2036
               597              unspecified  World   ssp119    K     Year of peak Surface Air Temperature Change    2039
               598              unspecified  World   ssp119    K     Year of peak Surface Air Temperature Change    2059
               599              unspecified  World   ssp119    K     Year of peak Surface Air Temperature Change    2048
Name: 0, Length: 4800, dtype: int64

From this we can then calculate median peak warming time by scenario (or other quantiles).

peak_warming_year.groupby(["model", "scenario"]).median()

model        scenario   
unspecified  ssp119         2048.0
             ssp126         2069.0
             ssp245         2100.0
             ssp370         2100.0
             ssp434         2094.0
             ssp460         2100.0
             ssp534-over    2060.0
             ssp585         2100.0
Name: 0, dtype: float64

label = "Peak warming year"
pdf = peak_warming_year.reset_index().rename({0: label}, axis="columns")
ax = sns.histplot(data=pdf, x=label, hue="scenario", bins=np.arange(2025, 2100 + 1))

../_images/ab466cc4defaadfed4f95657199755b3ff0bcc8a704fa818ff75fc8ea58a75f8.png

SR1.5 categorisation

It is also possible to categorise the scenarios using the same categorisation as in SR1.5. To do this we have to first calculate the appropriate quantiles.

magicc_output_categorisation_quantiles = scmdata.ScmRun(
    magicc_output.quantiles_over("ensemble_member", quantiles=[0.33, 0.5, 0.66])
)
magicc_output_categorisation_quantiles

/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)

<ScmRun (timeseries: 24, timepoints: 106)>
Time:
	Start: 1995-01-01T00:00:00
	End: 2100-01-01T00:00:00
Meta:
	   climate_model        model  quantile region     scenario unit  \
 MAGICCv7.5.1  unspecified      0.33  World       ssp119    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp126    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp245    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp370    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp434    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp460    K   
 MAGICCv7.5.1  unspecified      0.33  World  ssp534-over    K   
 MAGICCv7.5.1  unspecified      0.33  World       ssp585    K   
 MAGICCv7.5.1  unspecified      0.50  World       ssp119    K   
 MAGICCv7.5.1  unspecified      0.50  World       ssp126    K   
MAGICCv7.5.1  unspecified      0.50  World       ssp245    K   
MAGICCv7.5.1  unspecified      0.50  World       ssp370    K   
MAGICCv7.5.1  unspecified      0.50  World       ssp434    K   
MAGICCv7.5.1  unspecified      0.50  World       ssp460    K   
MAGICCv7.5.1  unspecified      0.50  World  ssp534-over    K   
MAGICCv7.5.1  unspecified      0.50  World       ssp585    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp119    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp126    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp245    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp370    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp434    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp460    K   
MAGICCv7.5.1  unspecified      0.66  World  ssp534-over    K   
MAGICCv7.5.1  unspecified      0.66  World       ssp585    K   
	
	                          variable  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
 Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  
Surface Air Temperature Change  

scmdata.processing.categorisation_sr15(
    magicc_output_categorisation_quantiles, ["climate_model", "scenario"]
)

climate_model  scenario   
MAGICCv7.5.1   ssp119         1.5C high overshoot
               ssp126                   Higher 2C
               ssp245                    Above 2C
               ssp370                    Above 2C
               ssp434                    Above 2C
               ssp460                    Above 2C
               ssp534-over               Above 2C
               ssp585                    Above 2C
Name: category, dtype: object

Set of summary variables

It is also possible to calculate a set of summary variables using the convenience function calculate_summary_stats. The documentation is given below.

print(scmdata.processing.calculate_summary_stats.__doc__)

    Calculate common summary statistics

    Parameters
    ----------
    scmrun : :class:`scmdata.ScmRun`
        Data of which to calculate the stats

    index : list[str]
        Columns to use in the index of the output (unit is added if not
        included)

    exceedance_probabilities_thresholds : list[float]
        Thresholds to use for exceedance probabilities

    exceedance_probabilities_variable : str
        Variable to use for exceedance probability calculations

    exceedance_probabilities_naming_base : str
        String to use as the base for naming the exceedance probabilities. Each
        exceedance probability output column will have a name given by
        ``exceedance_probabilities_naming_base.format(threshold)`` where
        threshold is the exceedance probability threshold to use. If not
        supplied, the default output of
        :func:`scmdata.processing.calculate_exceedance_probabilities` will be
        used.

    peak_quantiles : list[float]
        Quantiles to report in peak calculations

    peak_variable : str
        Variable of which to calculate the peak

    peak_naming_base : str
        Base to use for naming the peak outputs. This is combined with the
        quantile. If not supplied, ``"{} peak"`` is used so the outputs will be
        named e.g. "0.05 peak", "0.5 peak", "0.95 peak".

    peak_time_naming_base : str
        Base to use for naming the peak time outputs. This is combined with the
        quantile. If not supplied, ``"{} peak year"`` is used (unless
        ``peak_return_year`` is ``False`` in which case ``"{} peak time"`` is
        used) so the outputs will be named e.g. "0.05 peak year", "0.5 peak
        year", "0.95 peak year".

    peak_return_year : bool
        If ``True``, return the year of the peak of ``peak_variable``,
        otherwise return full dates

    categorisation_variable : str
        Variable to use for categorisation. Note that this variable point to
        timeseries that contain global-mean surface air temperatures  (GSAT)
        relative to 1850-1900 (using another reference period will not break
        this function, but is inconsistent with the original algorithm).

    categorisation_quantile_cols : list[str]
        Columns which represent individual ensemble members in the output (e.g.
        ["ensemble_member"]). The quantiles are taking over these columns
        before the data is passed to
        :func:`scmdata.processing.categorisation_sr15`.

    progress : bool
        Should a progress bar be shown whilst the calculations are done?

    Returns
    -------
    :class:`pd.DataFrame`
        Summary statistics, with each column being a statistic and the index
        being given by ``index``
    

It can be used to calculate summary statistics as shown below.

summary_stats = scmdata.processing.calculate_summary_stats(
    magicc_output,
    ["climate_model", "model", "scenario", "region"],
    exceedance_probabilities_thresholds=np.arange(1, 2.51, 0.1),
    exceedance_probabilities_variable="Surface Air Temperature Change",
    exceedance_probabilities_naming_base="Exceedance Probability|{:.2f}K",
    peak_quantiles=[0.05, 0.5, 0.95],
    peak_variable="Surface Air Temperature Change",
    peak_naming_base="Peak Surface Air Temperature Change|{}th quantile",
    peak_time_naming_base="Year of peak Surface Air Temperature Change|{}th quantile",
    peak_return_year=True,
    progress=True,
)
summary_stats

/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)
/home/docs/checkouts/readthedocs.org/user_builds/scmdata/checkouts/stable/src/scmdata/run.py:197: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df.reset_index(inplace=True)

  0%|          | 0/23 [00:00<?, ?it/s]

 22%|██▏       | 5/23 [00:00<00:00, 49.64it/s]

 48%|████▊     | 11/23 [00:00<00:00, 50.38it/s]

 74%|███████▍  | 17/23 [00:00<00:00, 53.27it/s]

100%|██████████| 23/23 [00:00<00:00, 54.97it/s]

100%|██████████| 23/23 [00:00<00:00, 53.52it/s]

climate_model  model        scenario     region  unit           statistic                   
MAGICCv7.5.1   unspecified  ssp119       World   dimensionless  Exceedance Probability|1.00K         1.0
                                                                Exceedance Probability|1.10K         1.0
                                                                Exceedance Probability|1.20K         1.0
                                                                Exceedance Probability|1.30K    0.976667
                                                                Exceedance Probability|1.40K    0.888333
                                                                                                  ...   
                            ssp370       World                  SR1.5 category                  Above 2C
                            ssp434       World                  SR1.5 category                  Above 2C
                            ssp460       World                  SR1.5 category                  Above 2C
                            ssp534-over  World                  SR1.5 category                  Above 2C
                            ssp585       World                  SR1.5 category                  Above 2C
Name: value, Length: 184, dtype: object

We can then use pandas to create summary tables of interest.

summary_stats.unstack(["climate_model", "statistic", "unit"])

		climate_model	MAGICCv7.5.1
		statistic	Exceedance Probability\|1.00K	Exceedance Probability\|1.10K	Exceedance Probability\|1.20K	Exceedance Probability\|1.30K	Exceedance Probability\|1.40K	Exceedance Probability\|1.50K	Exceedance Probability\|1.60K	Exceedance Probability\|1.70K	Exceedance Probability\|1.80K	Exceedance Probability\|1.90K	...	Exceedance Probability\|2.30K	Exceedance Probability\|2.40K	Exceedance Probability\|2.50K	Peak Surface Air Temperature Change\|0.05th quantile	Peak Surface Air Temperature Change\|0.5th quantile	Peak Surface Air Temperature Change\|0.95th quantile	Year of peak Surface Air Temperature Change\|0.05th quantile	Year of peak Surface Air Temperature Change\|0.5th quantile	Year of peak Surface Air Temperature Change\|0.95th quantile	SR1.5 category
		unit	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	dimensionless	...	dimensionless	dimensionless	dimensionless	K	K	K	K	K	K
model	scenario	region
unspecified	ssp119	World	1.0	1.0	1.0	0.976667	0.888333	0.743333	0.608333	0.418333	0.286667	0.156667	...	0.008333	0.003333	0.0	1.345672	1.653154	2.071348	2037.0	2048.0	2050.0	1.5C high overshoot
	ssp126	World	1.0	1.0	1.0	1.0	0.988333	0.94	0.836667	0.735	0.618333	0.465	...	0.096667	0.056667	0.025	1.478604	1.87889	2.425129	2059.0	2069.0	2081.0	Higher 2C
	ssp245	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	0.911667	0.846667	0.783333	2.218657	2.838238	3.642782	2100.0	2100.0	2100.0	Above 2C
	ssp370	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	1.0	3.476406	4.372581	5.35643	2100.0	2100.0	2100.0	Above 2C
	ssp434	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996667	0.978333	0.946667	...	0.58	0.463333	0.356667	1.898714	2.371833	2.958278	2092.0	2094.0	2100.0	Above 2C
	ssp460	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	0.995	2.753816	3.485626	4.373157	2100.0	2100.0	2100.0	Above 2C
	ssp534-over	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996667	...	0.811667	0.725	0.63	2.100912	2.613975	3.299928	2058.0	2060.0	2069.0	Above 2C
	ssp585	World	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	1.0	4.094323	5.278692	6.75008	2100.0	2100.0	2100.0	Above 2C

8 rows × 23 columns

index = ["climate_model", "scenario"]
pivot_merge_unit = summary_stats.to_frame().reset_index()
pivot_merge_unit["statistic"] = pivot_merge_unit["statistic"] + pivot_merge_unit[
    "unit"
].apply(lambda x: f"({x})" if x else "")
pivot_merge_unit = pivot_merge_unit.drop("unit", axis="columns")
pivot_merge_unit = pivot_merge_unit.set_index(
    list(set(pivot_merge_unit.columns) - {"value"})
).unstack("statistic")
pivot_merge_unit

				value
			statistic	Exceedance Probability\|1.00K(dimensionless)	Exceedance Probability\|1.10K(dimensionless)	Exceedance Probability\|1.20K(dimensionless)	Exceedance Probability\|1.30K(dimensionless)	Exceedance Probability\|1.40K(dimensionless)	Exceedance Probability\|1.50K(dimensionless)	Exceedance Probability\|1.60K(dimensionless)	Exceedance Probability\|1.70K(dimensionless)	Exceedance Probability\|1.80K(dimensionless)	Exceedance Probability\|1.90K(dimensionless)	...	Exceedance Probability\|2.30K(dimensionless)	Exceedance Probability\|2.40K(dimensionless)	Exceedance Probability\|2.50K(dimensionless)	Peak Surface Air Temperature Change\|0.05th quantile(K)	Peak Surface Air Temperature Change\|0.5th quantile(K)	Peak Surface Air Temperature Change\|0.95th quantile(K)	SR1.5 category	Year of peak Surface Air Temperature Change\|0.05th quantile(K)	Year of peak Surface Air Temperature Change\|0.5th quantile(K)	Year of peak Surface Air Temperature Change\|0.95th quantile(K)
region	scenario	model	climate_model
World	ssp119	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	0.976667	0.888333	0.743333	0.608333	0.418333	0.286667	0.156667	...	0.008333	0.003333	0.0	1.345672	1.653154	2.071348	1.5C high overshoot	2037.0	2048.0	2050.0
	ssp126	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	0.988333	0.94	0.836667	0.735	0.618333	0.465	...	0.096667	0.056667	0.025	1.478604	1.87889	2.425129	Higher 2C	2059.0	2069.0	2081.0
	ssp245	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	0.911667	0.846667	0.783333	2.218657	2.838238	3.642782	Above 2C	2100.0	2100.0	2100.0
	ssp370	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	1.0	3.476406	4.372581	5.35643	Above 2C	2100.0	2100.0	2100.0
	ssp434	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996667	0.978333	0.946667	...	0.58	0.463333	0.356667	1.898714	2.371833	2.958278	Above 2C	2092.0	2094.0	2100.0
	ssp460	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	0.995	2.753816	3.485626	4.373157	Above 2C	2100.0	2100.0	2100.0
	ssp534-over	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	0.996667	...	0.811667	0.725	0.63	2.100912	2.613975	3.299928	Above 2C	2058.0	2060.0	2069.0
	ssp585	unspecified	MAGICCv7.5.1	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	...	1.0	1.0	1.0	4.094323	5.278692	6.75008	Above 2C	2100.0	2100.0	2100.0

8 rows × 23 columns