scmdata.processing

Miscellaneous functions for processing scmdata.ScmRun

These functions are intended to be used directly with scmdata.ScmRun.process_over().

scmdata.processing.calculate_crossing_times(scmrun, threshold, return_year=True)[source]

Calculate the time at which each timeseries crosses a given threshold

Parameters
  • scmrun (scmdata.ScmRun) – Data to calculate the crossing time of

  • threshold (float) – Value to use as the threshold for crossing

  • return_year (bool) – If True, return the year instead of the datetime

Returns

Crossing time for scmrun, using the meta of scmrun as the output’s index. If the threshold is not crossed, pd.NA is returned.

Return type

pd.Series

Notes

This function only returns times that are in the columns of scmrun. If you want a finer resolution then you should interpolate your data first. For example, if you have data on a ten-year timestep but want crossing times on an annual resolution, interpolate (or resample) to annual data before calling calculate_crossing_times.
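The effect of time resolution described above can be illustrated without scmdata. The following is a minimal sketch in plain Python, not the library's implementation: the helpers crossing_time and interpolate_annual are hypothetical names, and it assumes that "crossing" means the first time at which the value is strictly greater than the threshold.

```python
def crossing_time(times, values, threshold):
    """Return the first time whose value is strictly above threshold.

    Returns None if the threshold is never crossed (assumed stand-in
    for the pd.NA described in the docs).
    """
    for t, v in zip(times, values):
        if v > threshold:
            return t
    return None


def interpolate_annual(times, values):
    """Linearly interpolate (year, value) points onto every integer year."""
    out_times, out_values = [], []
    pairs = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]):
        for year in range(t0, t1):
            frac = (year - t0) / (t1 - t0)
            out_times.append(year)
            out_values.append(v0 + frac * (v1 - v0))
    # The loop above stops one step short of the final point, so add it here
    out_times.append(times[-1])
    out_values.append(values[-1])
    return out_times, out_values


# Illustrative data on a ten-year timestep
decadal_years = [2000, 2010, 2020, 2030]
decadal_values = [1.0, 1.4, 1.8, 2.2]

# Only years present in the data can be returned
coarse = crossing_time(decadal_years, decadal_values, threshold=1.5)

# Interpolating first gives an annual-resolution answer
annual_years, annual_values = interpolate_annual(decadal_years, decadal_values)
fine = crossing_time(annual_years, annual_values, threshold=1.5)

# A threshold that is never reached
not_crossed = crossing_time(decadal_years, decadal_values, threshold=3.0)

print(coarse, fine, not_crossed)
```

With the ten-year data the crossing is reported at the first decadal point above the threshold; after interpolation it is reported at the first annual point above it.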

scmdata.processing.calculate_exceedance_probabilities(scmrun, threshold, process_over_cols, output_name=None)[source]

Calculate exceedance probability over all time

Parameters
  • scmrun (scmdata.ScmRun) – Ensemble of which to calculate the exceedance probability

  • threshold (float) – Value to use as the threshold for exceedance

  • process_over_cols (list[str]) – Columns to not use when grouping the timeseries (typically “run_id” or “ensemble_member” or similar)

  • output_name (str) – If supplied, the name of the output series. If not supplied, “{threshold} exceedance probability” will be used.

Returns

Exceedance probability over all time over all members of each group in scmrun

Return type

pd.Series

Raises

ValueError – scmrun has more than one variable or more than one unit (convert to a single unit before calling this function if needed)

Notes

See the notes of scmdata.processing.calculate_exceedance_probabilities_over_time() for an explanation of how the two calculations differ. For most purposes, this is the correct function to use.

scmdata.processing.calculate_exceedance_probabilities_over_time(scmrun, threshold, process_over_cols, output_name=None)[source]

Calculate exceedance probability at each point in time

Parameters
  • scmrun (scmdata.ScmRun) – Ensemble of which to calculate the exceedance probability over time

  • threshold (float) – Value to use as the threshold for exceedance

  • process_over_cols (list[str]) – Columns to not use when grouping the timeseries (typically “run_id” or “ensemble_member” or similar)

  • output_name (str) – If supplied, the value to put in the “variable” columns of the output pd.DataFrame. If not supplied, “{threshold} exceedance probability” will be used.

Returns

Timeseries of exceedance probability over time

Return type

pd.DataFrame

Raises

ValueError – scmrun has more than one variable or more than one unit (convert to a single unit before calling this function if needed)

Notes

This function differs from scmdata.processing.calculate_exceedance_probabilities() because it calculates the exceedance probability at each point in time. The other function instead counts the ensemble members which cross the threshold at any point in time and then divides by the number of ensemble members. In general, the maximum exceedance probability produced by this function is less than or equal to the output of scmdata.processing.calculate_exceedance_probabilities(), because different ensemble members can exceed the threshold at different times. In our opinion, scmdata.processing.calculate_exceedance_probabilities() is the correct function to use if you want to know the exceedance probability of a scenario. This function gives a sense of how the exceedance probability evolves over time but will generally slightly underestimate the exceedance probability over all time.
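The difference between the two calculations can be seen on a small made-up ensemble. This is a plain-Python sketch rather than the scmdata implementation: the ensemble dictionary and its values are invented for illustration, and exceedance is assumed to mean strictly greater than the threshold.

```python
# Hypothetical ensemble: one list of values per member, on a shared time axis
ensemble = {
    "member_1": [1.2, 1.6, 1.7],
    "member_2": [1.3, 1.4, 1.7],
    "member_3": [1.1, 1.2, 1.3],
    "member_4": [1.6, 1.4, 1.3],
}
threshold = 1.5
n_members = len(ensemble)
n_times = 3

# Over time: fraction of members above the threshold at each time step
prob_over_time = [
    sum(values[i] > threshold for values in ensemble.values()) / n_members
    for i in range(n_times)
]

# Over all time: fraction of members above the threshold at any time step
prob_all_time = (
    sum(any(v > threshold for v in values) for values in ensemble.values())
    / n_members
)

print(prob_over_time)  # one probability per time step
print(prob_all_time)   # a single probability for the whole ensemble
```

Here member_4 exceeds the threshold only at the first time step while member_1 and member_2 exceed it later, so no single time step captures every member that ever exceeds the threshold.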

scmdata.processing.calculate_peak(scmrun, output_name=None)[source]

Calculate peak i.e. maximum of each timeseries

Parameters
  • scmrun (scmdata.ScmRun) – Data of which to calculate the peak

  • output_name (str) – If supplied, the value to put in the “variable” columns of the output series. If not supplied, “Peak {variable}” will be used.

Returns

Peak of each timeseries

Return type

pd.Series

scmdata.processing.calculate_peak_time(scmrun, output_name=None, return_year=True)[source]

Calculate peak time i.e. the time at which each timeseries reaches its maximum

Parameters
  • scmrun (scmdata.ScmRun) – Data of which to calculate the peak time

  • output_name (str) – If supplied, the value to put in the “variable” columns of the output series. If not supplied, “Peak {variable}” will be used.

  • return_year (bool) – If True, return the year instead of the datetime

Returns

Peak time of each timeseries

Return type

pd.Series
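As a rough illustration of what "peak" and "peak time" mean for a single timeseries, consider the plain-Python sketch below. It does not use scmdata; the data are invented, and taking the first occurrence when the maximum is repeated is an assumption of this sketch.

```python
# Hypothetical single timeseries
years = [2020, 2030, 2040, 2050]
values = [1.1, 1.6, 1.9, 1.7]

# Peak: the maximum value of the timeseries
peak = max(values)

# Peak time: the time at which the maximum is reached
# (list.index returns the first occurrence if the maximum is repeated)
peak_year = years[values.index(peak)]

print(peak, peak_year)
```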

scmdata.processing.calculate_summary_stats(scmrun, index, exceedance_probabilities_thresholds=(1.5, 2.0, 2.5), exceedance_probabilities_variable='Surface Air Temperature Change', exceedance_probabilities_naming_base=None, peak_quantiles=(0.05, 0.17, 0.5, 0.83, 0.95), peak_variable='Surface Air Temperature Change', peak_naming_base=None, peak_time_naming_base=None, peak_return_year=True, categorisation_variable='Surface Air Temperature Change', categorisation_quantile_cols=('ensemble_member',), progress=False)[source]

Calculate common summary statistics

Parameters
  • scmrun (scmdata.ScmRun) – Data of which to calculate the stats

  • index (list[str]) – Columns to use in the index of the output (unit is added if not included)

  • exceedance_probabilities_thresholds (list[float]) – Thresholds to use for exceedance probabilities

  • exceedance_probabilities_variable (str) – Variable to use for exceedance probability calculations

  • exceedance_probabilities_naming_base (str) – String to use as the base for naming the exceedance probabilities. Each exceedance probability output column will have a name given by exceedance_probabilities_naming_base.format(threshold) where threshold is the exceedance probability threshold to use. If not supplied, the default output of scmdata.processing.calculate_exceedance_probabilities() will be used.

  • peak_quantiles (list[float]) – Quantiles to report in peak calculations

  • peak_variable (str) – Variable of which to calculate the peak

  • peak_naming_base (str) – Base to use for naming the peak outputs. This is combined with the quantile. If not supplied, "{} peak" is used so the outputs will be named e.g. “0.05 peak”, “0.5 peak”, “0.95 peak”.

  • peak_time_naming_base (str) – Base to use for naming the peak time outputs. This is combined with the quantile. If not supplied, "{} peak year" is used (unless peak_return_year is False in which case "{} peak time" is used) so the outputs will be named e.g. “0.05 peak year”, “0.5 peak year”, “0.95 peak year”.

  • peak_return_year (bool) – If True, return the year of the peak of peak_variable, otherwise return full dates

  • categorisation_variable (str) – Variable to use for categorisation. Note that this variable should point to timeseries that contain global-mean surface air temperatures (GSAT) relative to 1850-1900 (using another reference period will not break this function, but is inconsistent with the original algorithm).

  • categorisation_quantile_cols (list[str]) – Columns which represent individual ensemble members in the output (e.g. [“ensemble_member”]). The quantiles are taken over these columns before the data is passed to scmdata.processing.categorisation_sr15().

  • progress (bool) – Should a progress bar be shown whilst the calculations are done?

Returns

Summary statistics, with each column being a statistic and the index being given by index

Return type

pd.DataFrame
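The naming-base parameters above are format strings. The sketch below shows how the documented defaults would expand into output names, using only str.format. The custom exceedance base custom_exceedance_base is a hypothetical example, not a library default, and that peak names are built with str.format is an assumption based on the "{} peak" default.

```python
# Defaults taken from the function signature and parameter descriptions
peak_quantiles = (0.05, 0.17, 0.5, 0.83, 0.95)
peak_naming_base = "{} peak"
peak_time_naming_base = "{} peak year"

peak_names = [peak_naming_base.format(q) for q in peak_quantiles]
peak_time_names = [peak_time_naming_base.format(q) for q in peak_quantiles]

# Hypothetical user-supplied base for the exceedance probability columns
exceedance_thresholds = (1.5, 2.0, 2.5)
custom_exceedance_base = "Exceedance probability {}K"
exceedance_names = [
    custom_exceedance_base.format(t) for t in exceedance_thresholds
]

print(peak_names)
print(peak_time_names)
print(exceedance_names)
```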

scmdata.processing.categorisation_sr15(scmrun, index)[source]

Categorise using the algorithm employed in SR1.5

For more information, see the SR1.5 scenario analysis notebook.

Parameters
  • scmrun – Data to use for the classification. This should contain global-mean surface air temperatures (GSAT) relative to 1850-1900 (using another reference period will not break this function, but is inconsistent with the original algorithm). The data must have a “quantile” column and it must have the 0.33, 0.5 and 0.66 quantiles calculated. This can be done with scmdata.ScmRun.quantiles_over().

  • index (list[str]) – Columns in scmrun.meta to use as the index of the output

Returns

Categorisation of the timeseries

Return type

pd.Series

Raises
  • ValueError – scmrun contains more than one variable or more than one unit

  • DimensionalityError – The units cannot be converted to kelvin