scmdata.run

ScmRun provides a high-level analysis tool for data relevant to simple climate models.

It provides a simple interface for reading/writing, subsetting and visualising model data. A ScmRun can hold multiple model runs, which aids the analysis of ensembles of model runs.

BaseScmRun

class BaseScmRun(data=None, index=None, columns=None, metadata=None, copy_data=False, **kwargs)[source]

Bases: OpsMixin

Base class of a data container for timeseries data

add(other, op_cols, **kwargs)

Add values

Parameters:
  • other (ScmRun) – ScmRun containing data to add

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the addition will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
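
The role of op_cols can be illustrated with a minimal, library-free sketch. It is not scmdata's implementation: the helper add_aligned and the list-of-tuples data layout are hypothetical, and the values are assumed to be in a common unit already, whereas scmdata handles unit conversion itself.

```python
def add_aligned(left, right, op_cols):
    """Add two sets of timeseries, aligning on all metadata except op_cols."""

    def key_without_op_cols(meta):
        # Alignment key: every metadata column except those being dropped
        return tuple(sorted((k, v) for k, v in meta.items() if k not in op_cols))

    right_by_key = {key_without_op_cols(meta): values for meta, values in right}

    out = []
    for meta, values in left:
        other_values = right_by_key[key_without_op_cols(meta)]
        # The dropped columns take the values given in op_cols
        out_meta = {**meta, **op_cols}
        out.append((out_meta, [a + b for a, b in zip(values, other_values)]))
    return out


# Hypothetical data, assumed to be in a common unit (GtC / yr) already
fos = [
    ({"variable": "Emissions|CO2|Fossil", "region": "World|NH"}, [0.0, 4.0]),
    ({"variable": "Emissions|CO2|Fossil", "region": "World|SH"}, [2.0, 6.0]),
]
afolu = [
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|NH"}, [0.001, 0.005]),
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|SH"}, [0.003, 0.007]),
]

total = add_aligned(fos, afolu, {"variable": "Emissions|CO2|Fossil + AFOLU"})
for meta, values in total:
    print(meta["region"], meta["variable"], values)
```

Because "variable" is excluded from the alignment key, the Fossil and AFOLU rows pair up by region, and each output row carries the new "variable" value.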

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0
>>> fos_plus_afolu = fos.add(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil + AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_plus_afolu.head()
time                                                                  2010-01-01  2020-01-01
model     region   scenario  unit       variable
idealised World|NH idealised gigatC / a Emissions|CO2|Fossil + AFOLU       0.001       4.005
          World|SH idealised gigatC / a Emissions|CO2|Fossil + AFOLU       2.003       6.007
>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_plus_sh = nh.add(sh, op_cols={"region": "World|NH + SH"})
>>> nh_plus_sh.head()
time                                                               2010-01-01  2020-01-01
model     region        scenario  unit       variable
idealised World|NH + SH idealised gigatC / a Emissions|CO2|Fossil         2.0        10.0
                                  megatC / a Emissions|CO2|AFOLU          4.0        12.0

adjust_median_to_target(target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)

Adjust the median of (an ensemble of) timeseries to a specified target

Parameters:
  • target (float) – Value to which the median of each (group of) timeseries should be adjusted

  • evaluation_period (list[int]) – Period over which the median should be evaluated

  • process_over (list) – Metadata to treat as ‘ensemble members’ i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.

  • check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?

  • check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose

Raises:
  • NotImplementedError – evaluation_period is based on times not years

  • AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.

Returns:

ScmRun – Timeseries adjusted to have the intended median

append(other, inplace=False, duplicate_msg=True, metadata=None, **kwargs)[source]

Append additional data to the current data.

For details, see run_append().

Parameters:
  • other (GenericRun) – Data (in a format which can be cast to ScmRun) to append.

  • inplace (bool) – If True, append data in place, modifying the current object. Otherwise, a new ScmRun instance is created.

  • duplicate_msg (str | bool) – If True, raise a scmdata.errors.NonUniqueMetadataError error so the user can see the duplicate timeseries. If False, take the average and do not raise a warning or error. If "warn", raise a warning if duplicate data is detected.

  • metadata (MetadataType | None) – If not None, override the metadata of the resulting ScmRun with metadata. Otherwise, the metadata for the runs are merged. In the case where there are duplicate metadata keys, the values from the first run are used.

  • **kwargs (Any) – Keywords to pass to ScmRun.__init__() when reading other

Returns:

ScmRun – Object containing the results of appending the timeseries in other.

Raises:

NonUniqueMetadataError – If the appending results in timeseries with duplicate metadata and duplicate_msg is True

append_timewise(other, align_columns)[source]

Append timeseries along the time axis

Parameters:
  • other (scmdata.ScmRun) – scmdata.ScmRun containing the timeseries to append

  • align_columns (list) – Columns used to align other and self when joining

Returns:

scmdata.ScmRun – Result of joining self and other along the time axis

apply(func, *args, **kwargs)[source]

Apply a function to each timeseries and append the results

func is called like func(ar, *args, **kwargs) for each ScmRun ar in this group. If the result of this function call is None, then it is excluded from the results.

The results are appended together using run_append(). The function can change the size of the input ScmRun as long as run_append() can be applied to all results.
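
The calling convention and the handling of None results can be sketched without scmdata. This is an illustration only: apply_to_each and the dictionary layout are hypothetical, and a plain list stands in for the combination step that run_append() performs.

```python
def apply_to_each(timeseries, func, *args, **kwargs):
    """Call func on each item and collect the results that are not None."""
    results = []
    for ts in timeseries:
        result = func(ts, *args, **kwargs)
        if result is None:
            # None results are excluded from the combined output
            continue
        results.append(result)
    return results


def scale_temperature_only(ts, factor=2):
    """Hypothetical func: drop everything except surface temperature."""
    if ts["variable"] != "Surface Temperature":
        return None
    return {**ts, "values": [v * factor for v in ts["values"]]}


series = [
    {"variable": "Surface Temperature", "values": [1, 3]},
    {"variable": "Carbon Uptake", "values": [2, 4]},
]
out = apply_to_each(series, scale_temperature_only, factor=2)
print(out)
```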

Examples

>>> from scmdata import ScmRun
>>> def multiply_by_2(arr):
...     variable = arr.get_unique_meta("variable", True)
...     if variable == "Surface Temperature":
...         return arr * 2
...     return arr
...

>>> run = ScmRun(
...     data=[[1, 2], [3, 4]],
...     index=[2010, 2020],
...     columns={
...         "variable": ["Surface Temperature", "Carbon Uptake"],
...         "model": "model",
...         "scenario": "scenario",
...         "region": "World",
...         "unit": ["K", "GtC / yr"],
...     },
... )
>>> run.timeseries().sort_index()
time                                                2010-01-01  2020-01-01
model region scenario unit     variable
model World  scenario GtC / yr Carbon Uptake               2.0         4.0
                      K        Surface Temperature         1.0         3.0

>>> run.apply(multiply_by_2).timeseries().sort_index()
time                                                2010-01-01  2020-01-01
model region scenario unit     variable
model World  scenario GtC / yr Carbon Uptake               2.0         4.0
                      K        Surface Temperature         2.0         6.0

Parameters:
  • func (function) – Callable to apply to each timeseries.

  • *args (P.args) – Positional arguments passed to func.

  • **kwargs (P.kwargs) – Used to call func(ar, **kwargs) for each array ar.

Returns:

applied (ScmRun) – The result of splitting, applying and combining this array.

convert_unit(unit, context=None, inplace=False, **kwargs)[source]

Convert the units of a selection of timeseries.

Uses scmdata.units.UnitConverter to perform the conversion.

Parameters:
  • unit (str) – Unit to convert to. This must be recognised by UnitConverter.

  • context (str | None) – Context to use for the conversion i.e. which metric to apply when performing CO2-equivalent calculations. If None, no metric will be applied and CO2-equivalent calculations will raise DimensionalityError.

  • inplace (bool) – If True, apply the conversion inplace, otherwise a copy is performed.

  • **kwargs (Any) – Extra arguments which are passed to filter() to limit the timeseries which are attempted to be converted. Defaults to selecting the entire ScmRun, which will likely fail.

Returns:

ScmRun – A ScmRun object containing converted units.

Notes

If context is not None, then the context used for the conversion will be checked against any existing metadata and, if the conversion is valid, stored in the output’s metadata.

Raises:

ValueError – If "unit_context" is already included in self’s meta_attributes() and it does not match context for the variables to be converted.

copy()[source]

Return a copy.deepcopy() of self.

Also copies the underlying timeseries data.

Returns:

ScmRun – copy.deepcopy() of self

cumsum(out_var=None, check_annual=True)

Integrate with respect to time using a cumulative sum

This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or point-in-time estimate. When integrating, one can sum up each individual year to get the cumulative total, rather than using an alternative method for numerical integration, such as the trapezoidal rule, which assumes that the values change linearly between timesteps.

This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.

The output timesteps are the same as the timesteps of the input, but since the input timeseries are piecewise constant (i.e. a constant for a given year), the output can also be thought of as being a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented. If that would be useful, please raise an issue on GitHub.

If the timeseries are piecewise-linear, cumtrapz() should be used instead.
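
The difference between the two integration approaches can be seen in a small standard-library sketch. The emissions values are hypothetical, and the code only illustrates the arithmetic; it is not how scmdata implements either method.

```python
from itertools import accumulate

# Hypothetical annual emissions in GtC / yr for 2010, 2011 and 2012
annual_emissions = [10.0, 12.0, 11.0]

# Piecewise-constant view: each value is the total for its year,
# so the cumulative total is a running sum (in GtC)
cumulative_sum = list(accumulate(annual_emissions))

# Piecewise-linear view: trapezoid areas between successive points,
# which are one year apart here; the first value is zero
cumulative_trapezoid = [0.0]
for previous, current in zip(annual_emissions, annual_emissions[1:]):
    area = 0.5 * (previous + current) * 1.0
    cumulative_trapezoid.append(cumulative_trapezoid[-1] + area)

print(cumulative_sum)        # [10.0, 22.0, 33.0]
print(cumulative_trapezoid)  # [0.0, 11.0, 22.5]
```

The running sum counts every year's total in full, whereas the trapezoid rule starts from zero and treats the values as points on a line, which gives a different (and here smaller) cumulative total.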

Parameters:
  • out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.

  • check_annual (bool) – If True (default), check that the timeseries are on uniform annual intervals.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time

See also

cumtrapz()

Raises:

ValueError – If an unknown method is provided, if unit conversion fails, or if the timeseries are non-annual and check_annual is True

Warns:

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.

cumtrapz(out_var=None)

Integrate with respect to time using the trapezoid rule

This method should be used when dealing with piecewise-linear timeseries (concentrations, effective radiative forcing, decadal means etc.). This method handles non-uniform intervals without having to resample to annual values first.

The result will contain the same timesteps as the input timeseries, with the first timestep being zero. Each subsequent value represents the integral up to the day and time of the timestep. The function scmdata.run.ScmRun.relative_to_ref_period() can be used to calculate an integral relative to a reference year.
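
As a rough illustration of the behaviour described above, the following sketch applies the trapezoid rule to non-uniform timesteps. The helper cumulative_trapezoid and the data are hypothetical, and expressing the interval widths in days is an assumption made for readability; the time unit scmdata uses in its output is not stated here.

```python
import datetime as dt

SECONDS_PER_DAY = 24 * 60 * 60


def cumulative_trapezoid(times, values):
    """Cumulative integral in (value unit) * days; the first entry is zero."""
    out = [0.0]
    for i in range(1, len(times)):
        width_days = (times[i] - times[i - 1]).total_seconds() / SECONDS_PER_DAY
        area = 0.5 * (values[i] + values[i - 1]) * width_days
        out.append(out[-1] + area)
    return out


# Hypothetical forcing values in W / m^2 on non-uniform timesteps
times = [dt.datetime(2020, 1, 1), dt.datetime(2020, 1, 11), dt.datetime(2020, 1, 31)]
forcing = [1.0, 3.0, 2.0]

integral = cumulative_trapezoid(times, forcing)
print(integral)  # [0.0, 20.0, 70.0]
```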

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time

See also

cumsum()

Raises:

ValueError – If an unknown method is provided or if unit conversion fails

Warns:

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.

data_hierarchy_separator = '|'

String used to define different levels in our data hierarchies.

By default we follow pyam and use “|”. In such a case, emissions of CO2 for energy from coal would be “Emissions|CO2|Energy|Coal”.
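
A short sketch of how such a separator splits a variable name into hierarchy levels. The helper names are hypothetical; the depth is taken to be the number of separators, which matches the filter(level=1) example in this reference, where "Primary Energy|Coal" is selected and "Primary Energy" is not.

```python
SEPARATOR = "|"  # mirrors the default data_hierarchy_separator


def hierarchy_levels(variable):
    """Split a variable name into its hierarchy levels."""
    return variable.split(SEPARATOR)


def hierarchy_depth(variable):
    """Depth below the top level, i.e. the number of separators."""
    return variable.count(SEPARATOR)


levels = hierarchy_levels("Emissions|CO2|Energy|Coal")
print(levels)                                  # ['Emissions', 'CO2', 'Energy', 'Coal']
print(hierarchy_depth("Primary Energy"))       # 0
print(hierarchy_depth("Primary Energy|Coal"))  # 1
```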

Type:

str

delta_per_delta_time(out_var=None)

Calculate change in timeseries values for each timestep, divided by the size of the timestep

The output is placed on the middle of each timestep and is one timestep shorter than the input.
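
The calculation can be sketched with the standard library as follows. The helper and data are hypothetical, and the rate is expressed per day as an assumption for readability; the time unit scmdata uses for the result is not stated here.

```python
import datetime as dt

SECONDS_PER_DAY = 24 * 60 * 60


def delta_per_delta_time(times, values):
    """Return (mid-point times, change in value per day) for each timestep."""
    mid_times, rates = [], []
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        # Each output point sits in the middle of the timestep it describes
        mid_times.append(times[i - 1] + step / 2)
        step_days = step.total_seconds() / SECONDS_PER_DAY
        rates.append((values[i] - values[i - 1]) / step_days)
    return mid_times, rates


times = [dt.datetime(2020, 1, 1), dt.datetime(2020, 1, 11), dt.datetime(2020, 1, 31)]
values = [1.0, 3.0, 2.0]

mid_times, rates = delta_per_delta_time(times, values)
print(mid_times)  # 2020-01-06 and 2020-01-21
print(rates)      # approximately [0.2, -0.05]
```

Three input points give two output points, one per timestep.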

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Delta ”.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the changes in values of self, normalised by the change in time

Warns:

UserWarning – The data contains nans. If this happens, the output data will also contain nans.

divide(other, op_cols, **kwargs)

Divide values (self / other)

Parameters:
  • other (ScmRun) – ScmRun containing data to divide

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the division will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0
>>> fos_divide_afolu = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil / AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_divide_afolu.convert_unit("dimensionless").head()
time                                                                     2010-01-01  2020-01-01
model     region   scenario  unit          variable
idealised World|NH idealised dimensionless Emissions|CO2|Fossil / AFOLU    0.000000  800.000000
          World|SH idealised dimensionless Emissions|CO2|Fossil / AFOLU  666.666667  857.142857
>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_divide_sh = nh.divide(sh, op_cols={"region": "World|NH / SH"})
>>> nh_divide_sh.convert_unit("dimensionless").head()
time                                                                  2010-01-01  2020-01-01
model     region        scenario  unit          variable
idealised World|NH / SH idealised dimensionless Emissions|CO2|Fossil    0.000000    0.666667
                                                Emissions|CO2|AFOLU     0.333333    0.714286

drop_meta(columns, inplace=False)[source]

Drop meta columns out of the Run

Parameters:
  • columns (Iterable[str] | str) – The column or columns to drop

  • inplace (bool) – If True, do operation inplace, otherwise a copy is performed.

Raises:

KeyError – If any of the columns do not exist in the meta DataFrame

Returns:

Self – Object without the specified meta columns.

property empty: bool

Indicate whether ScmRun is empty i.e. contains no data

Returns:

bool – True if ScmRun is empty, False otherwise

filter(*, keep=True, inplace=False, log_if_empty=True, **kwargs)[source]

Return a filtered ScmRun (i.e., a subset of the data).

>>> from scmdata import ScmRun
>>> df = ScmRun(
...     data=[[1, 2, 3], [4, 5, 6], [3, 3, 1]],
...     index=[2005, 2010, 2015],
...     columns={
...         "model": "a_iam",
...         "scenario": ["a_scenario", "a_scenario", "a_scenario2"],
...         "region": "World",
...         "variable": [
...             "Primary Energy",
...             "Primary Energy|Coal",
...             "Primary Energy",
...         ],
...         "unit": "EJ/yr",
...     },
... )
>>> df
<ScmRun (timeseries: 3, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region     scenario   unit             variable
    0  a_iam  World   a_scenario  EJ/yr       Primary Energy
    1  a_iam  World   a_scenario  EJ/yr  Primary Energy|Coal
    2  a_iam  World  a_scenario2  EJ/yr       Primary Energy

>>> df.filter(scenario="a_scenario")
<ScmRun (timeseries: 2, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region    scenario   unit             variable
    0  a_iam  World  a_scenario  EJ/yr       Primary Energy
    1  a_iam  World  a_scenario  EJ/yr  Primary Energy|Coal

>>> df.filter(scenario="a_scenario", keep=False)
<ScmRun (timeseries: 1, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region     scenario   unit        variable
    2  a_iam  World  a_scenario2  EJ/yr  Primary Energy

>>> df.filter(level=1)
<ScmRun (timeseries: 1, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region    scenario   unit             variable
    1  a_iam  World  a_scenario  EJ/yr  Primary Energy|Coal

>>> df.filter(year=range(2000, 2011))
<ScmRun (timeseries: 3, timepoints: 2)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2010-01-01T00:00:00
Meta:
       model region     scenario   unit             variable
    0  a_iam  World   a_scenario  EJ/yr       Primary Energy
    1  a_iam  World   a_scenario  EJ/yr  Primary Energy|Coal
    2  a_iam  World  a_scenario2  EJ/yr       Primary Energy

Parameters:
  • keep (bool) – If True, keep all timeseries satisfying the filters, otherwise drop all the timeseries satisfying the filters

  • inplace (bool) – If True, do operation inplace, otherwise a copy is performed.

  • log_if_empty (bool) – If True, log a warning level message if the result is empty.

  • **kwargs (MetadataValue | Iterable[MetadataValue]) –

    Argument names are keys with which to filter, values are used to do the filtering. Filtering can be done on:

    • all metadata columns with strings, “*” can be used as a wildcard in search strings

    • ‘level’: the maximum “depth” of IAM variables (number of hierarchy levels, excluding the strings given in the ‘variable’ argument)

    • ‘time’: takes a datetime.datetime or list of datetime.datetime’s TODO: default to np.datetime64

    • ‘year’, ‘month’, ‘day’, ‘hour’: takes an int or list of int’s (‘month’ and ‘day’ also accept str or list of str)

    If regexp=True is included in kwargs then the pseudo-regexp syntax in pattern_match() is disabled.
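
The wildcard matching and the keep flag can be approximated with the standard library. This is only a sketch: fnmatchcase is a stand-in whose treatment of special characters may differ from pattern_match(), and filter_meta is a hypothetical helper operating on plain dictionaries.

```python
from fnmatch import fnmatchcase


def filter_meta(rows, keep=True, **filters):
    """Keep (or drop, if keep is False) the rows matching every filter."""
    out = []
    for row in rows:
        matches = all(
            fnmatchcase(str(row[column]), pattern)
            for column, pattern in filters.items()
        )
        if matches == keep:
            out.append(row)
    return out


rows = [
    {"scenario": "a_scenario", "variable": "Primary Energy"},
    {"scenario": "a_scenario", "variable": "Primary Energy|Coal"},
    {"scenario": "a_scenario2", "variable": "Primary Energy"},
]

coal = filter_meta(rows, variable="*Coal")
not_a_scenario = filter_meta(rows, keep=False, scenario="a_scenario")
print(coal)
print(not_a_scenario)
```

Without a wildcard the pattern must match the whole string, so "a_scenario" does not match "a_scenario2".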

Returns:

ScmRun – Object containing a filtered subset of timeseries.

classmethod from_nc()

Read a netCDF4 file from disk

Parameters:

fname (str) – Filename to read

Return type:

BaseScmRun

See also

scmdata.run.ScmRun.from_nc()

get_meta_columns_except(*not_group)[source]

Get columns in meta except a set

Parameters:

not_group (str or list of str) – Columns to exclude from the grouping

Returns:

list – Meta columns except the ones supplied (sorted alphabetically)

get_unique_meta(meta, no_duplicates=False)[source]

Get unique values in a metadata column.

Parameters:
  • meta (str) – Column to retrieve metadata for

  • no_duplicates (bool | None) – If True, raise an error if there is more than one unique value in the metadata column.

Raises:
  • ValueError – There is more than one unique value in the metadata column and no_duplicates is True.

  • KeyError – If a meta column does not exist in the run’s metadata

Returns:

[List[Any], Any] – List of unique metadata values. If no_duplicates is True the metadata value will be returned (rather than a list).

groupby(*group)[source]

Group the object by unique metadata

Enables iteration over groups of data. For example, to iterate over each scenario in the object

>>> from scmdata import ScmRun
>>> run = ScmRun(
...     data=[[1, 2, 3], [4, 5, 6], [3, 3, 1]],
...     index=[2005, 2010, 2015],
...     columns={
...         "model": "a_iam",
...         "scenario": ["a_scenario", "a_scenario", "a_scenario2"],
...         "region": "World",
...         "variable": [
...             "Primary Energy",
...             "Primary Energy|Coal",
...             "Primary Energy",
...         ],
...         "unit": "EJ/yr",
...     },
... )

>>> for group in run.groupby("scenario"):
...     print(group)
...
<ScmRun (timeseries: 2, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region    scenario   unit             variable
    0  a_iam  World  a_scenario  EJ/yr       Primary Energy
    1  a_iam  World  a_scenario  EJ/yr  Primary Energy|Coal
<ScmRun (timeseries: 1, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model region     scenario   unit        variable
    2  a_iam  World  a_scenario2  EJ/yr  Primary Energy

Parameters:

group (str or list of str) – Columns to group by

Returns:

RunGroupBy – See the documentation for RunGroupBy for more information

groupby_all_except(*not_group)[source]

Group the object by unique metadata apart from the input columns

In other words, the groups are determined by all columns in self.meta except for those in not_group

Parameters:

not_group (str or list of str) – Columns to exclude from the grouping

Returns:

RunGroupBy – See the documentation for RunGroupBy for more information

head(*args, **kwargs)[source]

Return head of self.timeseries().

Parameters:
  • *args (typing.Any) – Passed to self.timeseries().head()

  • **kwargs (typing.Any) – Passed to self.timeseries().head()

Returns:

pandas.DataFrame – Head of self.timeseries()

integrate(out_var=None)

Integrate with respect to time

This function has been deprecated since the method of integration depends on the type of data being integrated.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time

See also

cumsum(), cumtrapz()

Raises:

ValueError – If an unknown method is provided or if unit conversion fails

Warns:
  • UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.

  • DeprecationWarning – This function has been deprecated in preference to cumsum() and cumtrapz().

interpolate(target_times, interpolation_type='linear', extrapolation_type='linear', uniform_year_length=False)[source]

Interpolate the data onto a new time frame.

Parameters:
  • target_times (Iterable[dt.datetime | (dt.date | (int | float))]) – Time grid onto which to interpolate

  • interpolation_type (str) – Interpolation type. Options are ‘linear’

  • extrapolation_type (str or None) – Extrapolation type. Options are None, ‘linear’ or ‘constant’

  • uniform_year_length (bool) –

    If True, a 365-day calendar is assumed where each year has an equal length

    By default, the interpolation takes into account the different number of days in leap years.

Raises:

ValueError – If uniform_year_length=True and sub-annual timeseries are present

Returns:

ScmRun – A new ScmRun containing the data interpolated onto the target_times grid
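
The effect of leap years on linear interpolation can be illustrated with a standard-library sketch. The helper and values are hypothetical, and the decimal-year arithmetic used for the uniform case is an assumption about what a 365-day calendar implies, not a description of scmdata's internals.

```python
import datetime as dt


def interpolate_linear(t0, v0, t1, v1, target):
    """Linear interpolation using actual calendar durations."""
    fraction = (target - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return v0 + fraction * (v1 - v0)


t0, t1 = dt.datetime(2012, 1, 1), dt.datetime(2014, 1, 1)
target = dt.datetime(2013, 1, 1)

# Calendar-aware: 2012 is a leap year, so the target is 366 of 731 days along
calendar_value = interpolate_linear(t0, 0.0, t1, 2.0, target)

# Uniform year length: every year has equal weight, so the target is half way
uniform_value = 0.0 + (2013 - 2012) / (2014 - 2012) * (2.0 - 0.0)

print(calendar_value)  # slightly above 1.0
print(uniform_value)   # 1.0
```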

line_plot(**kwargs)

Make a line plot

Deprecated: use lineplot() instead

Parameters:

**kwargs – Keyword arguments to be passed to seaborn.lineplot. If none are passed, sensible defaults will be used.

Returns:

matplotlib.axes._subplots.AxesSubplot – Output of call to seaborn.lineplot

linear_regression()

Calculate linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Returns:

list of dict[str, Any] – List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.
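
The note about the x-axis can be checked with a small least-squares sketch. It uses only the standard library and hypothetical data, and it is not scmdata's implementation: it fits the same values against seconds since 1970-01-01 and against seconds since the first time point, then compares the results.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)


def least_squares(x, y):
    """Ordinary least-squares fit; returns (gradient, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x


times = [dt.datetime(2010, 1, 1), dt.datetime(2011, 1, 1), dt.datetime(2012, 1, 1)]
values = [1.0, 2.0, 3.0]

seconds_since_1970 = [(t - EPOCH).total_seconds() for t in times]
seconds_since_start = [(t - times[0]).total_seconds() for t in times]

gradient_1970, intercept_1970 = least_squares(seconds_since_1970, values)
gradient_start, intercept_start = least_squares(seconds_since_start, values)

print(gradient_1970, gradient_start)    # the same, up to rounding
print(intercept_1970, intercept_start)  # very different
```

The gradient does not depend on where the time origin sits, whereas the intercept is the fitted value at the origin and so changes with it.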

linear_regression_gradient(unit=None)

Calculate gradients of a linear regression of each timeseries

Parameters:

unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.

Returns:

pandas.DataFrame – self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.

linear_regression_intercept(unit=None)

Calculate intercepts of a linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Parameters:

unit (str) – Output unit for intercepts. If not supplied, the intercepts’ units will not be converted to a common unit.

Returns:

pandas.DataFrame – self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.

linear_regression_scmrun()

Re-calculate the timeseries based on a linear regression

Returns:

scmdata.ScmRun – The timeseries, re-calculated based on a linear regression

lineplot(time_axis=None, **kwargs)

Make a line plot via seaborn’s lineplot

See the seaborn documentation for a complete description of the kwargs: https://seaborn.pydata.org/generated/seaborn.lineplot.html

If only a single unit is present, it will be used as the y-axis label. The axis object is returned so this can be changed by the user if desired.

Parameters:
  • time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) –

    Time axis to use for the plot.

    If None, datetime.datetime objects will be used.

    If "year", the year of each time point will be used.

    If "year-month", the year plus (month - 0.5) / 12 will be used.

    If "days since 1970-01-01", the number of days since 1st Jan 1970 will be used (calculated using the datetime module).

    If "seconds since 1970-01-01", the number of seconds since 1st Jan 1970 will be used (calculated using the datetime module).

  • **kwargs – Keyword arguments to be passed to seaborn.lineplot. If none are passed, sensible defaults will be used.

Returns:

matplotlib.axes._subplots.AxesSubplot – Output of call to seaborn.lineplot

long_data(time_axis=None)[source]

Return data in long form, particularly useful for plotting with seaborn

Parameters:

time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) –

Time axis to use for the output’s columns.

If None, datetime.datetime objects will be used.

If "year", the year of each time point will be used.

If "year-month", the year plus (month - 0.5) / 12 will be used.

If "days since 1970-01-01", the number of days since 1st Jan 1970 will be used (calculated using the datetime module).

If "seconds since 1970-01-01", the number of seconds since 1st Jan 1970 will be used (calculated using the datetime module).

Returns:

pandas.DataFrame – pandas.DataFrame containing the data in ‘long form’ (i.e. one observation per row).
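
The time_axis options can be made concrete with a standard-library sketch. The function convert_time is hypothetical and follows the formulas given in the parameter description; whether scmdata returns integers or floats for the "days since" and "seconds since" options is not stated here.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)


def convert_time(time_point, time_axis=None):
    """Convert a datetime to the representation named by time_axis."""
    if time_axis is None:
        return time_point
    if time_axis == "year":
        return time_point.year
    if time_axis == "year-month":
        # Places each month at its mid-point within the year
        return time_point.year + (time_point.month - 0.5) / 12
    if time_axis == "days since 1970-01-01":
        return (time_point - EPOCH).days
    if time_axis == "seconds since 1970-01-01":
        return (time_point - EPOCH).total_seconds()
    raise NotImplementedError(f"time_axis = '{time_axis}'")


point = dt.datetime(2015, 7, 1)
for axis in (None, "year", "year-month", "days since 1970-01-01"):
    print(axis, convert_time(point, axis))
```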

property meta: DataFrame

Metadata

property meta_attributes

Get a list of all meta keys

Returns:

list – Sorted list of meta keys

multiply(other, op_cols, **kwargs)

Multiply values

Parameters:
  • other (ScmRun) – ScmRun containing data to multiply

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the multiplication will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0
>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil * AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_times_afolu.convert_unit("(GtC / yr) ** 2").head()
time                                                                       2010-01-01  2020-01-01
model     region   scenario  unit            variable
idealised World|NH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil * AFOLU       0.000       0.020
          World|SH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil * AFOLU       0.006       0.042
>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_times_sh = nh.multiply(sh, op_cols={"region": "World|NH * SH"})
>>> nh_times_sh.convert_unit("(GtC / yr) ** 2").head()
time                                                                    2010-01-01  2020-01-01
model     region        scenario  unit            variable
idealised World|NH * SH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil    0.000000   24.000000
                                                  Emissions|CO2|AFOLU     0.000003    0.000035
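The converted values in the example above can be checked by hand. The sketch below redoes the unit arithmetic in plain Python, using only the fact that 1 MtC is 0.001 GtC; the variable names are illustrative and nothing here calls scmdata.

```python
GTC_PER_MTC = 1e-3  # 1 MtC = 0.001 GtC

# World|NH in 2020: Fossil is 4.0 GtC / yr and AFOLU is 5.0 MtC / yr
fossil_times_afolu = 4.0 * (5.0 * GTC_PER_MTC)  # in (GtC / yr) ** 2

# World|NH * SH for AFOLU in 2020: 5.0 MtC / yr times 7.0 MtC / yr
afolu_nh_times_sh = (5.0 * GTC_PER_MTC) * (7.0 * GTC_PER_MTC)  # in (GtC / yr) ** 2

print(round(fossil_times_afolu, 3))  # 0.02
print(round(afolu_nh_times_sh, 6))   # 3.5e-05
```

These match the 0.020 and 0.000035 entries shown in the converted output.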
plumeplot(ax=None, quantiles_plumes=(((0.05, 0.95), 0.5), ((0.5,), 1.0)), hue_var='scenario', hue_label='Scenario', palette=None, style_var='variable', style_label='Variable', dashes=None, linewidth=2, time_axis=None, pre_calculated=False, quantile_over=('ensemble_member',))

Make a plume plot, showing plumes for custom quantiles

Parameters:
  • ax (matplotlib.axes._subplots.AxesSubplot) – Axes on which to make the plot

  • quantiles_plumes (list[tuple[tuple, float]]) – Configuration to use when plotting quantiles. Each element is a tuple, the first element of which is itself a tuple and the second element of which is the alpha to use for the quantile. If the first element has length two, these two elements are the quantiles to plot and a plume will be made between these two quantiles. If the first element has length one, then a line will be plotted to represent this quantile.

  • hue_var (str) – The column of self.meta which should be used to distinguish different hues.

  • hue_label (str) – Label to use in the legend for hue_var.

  • palette (dict) – Dictionary defining the colour to use for different values of hue_var.

  • style_var (str) – The column of self.meta which should be used to distinguish different styles.

  • style_label (str) – Label to use in the legend for style_var.

  • dashes (dict) – Dictionary defining the style to use for different values of style_var.

  • linewidth (float) – Width of lines to use (for quantiles which are not to be shown as plumes)

  • time_axis (str) – Time axis to use for the plot (see timeseries())

  • pre_calculated (bool) – Are the quantiles pre-calculated? If no, the quantiles will be calculated within this function. Pre-calculating the quantiles using ScmRun.quantiles_over() can lead to faster plotting if multiple plots are to be made with the same quantiles.

  • quantile_over (str, tuple[str]) – Columns of self.meta over which the quantiles should be calculated. Only used if pre_calculated is False.

Returns:

matplotlib.axes._subplots.AxesSubplot, list – Axes on which the plot was made and the legend items we have made (in case the user wants to move the legend to a different position for example)
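As a rough illustration of how a quantiles_plumes configuration is read, the sketch below classifies each entry as a plume or a line according to the length of its first element. The function classify_quantiles_plumes is hypothetical and not part of scmdata; only the default value is taken from the signature above.

```python
# Default value from the plumeplot signature
quantiles_plumes = (((0.05, 0.95), 0.5), ((0.5,), 1.0))


def classify_quantiles_plumes(config):
    """Label each (quantiles, alpha) entry as a plume or a line.

    Hypothetical helper for illustration only; not part of scmdata.
    """
    result = []
    for quantiles, alpha in config:
        if len(quantiles) == 2:
            kind = "plume"  # shaded band between the two quantiles
        elif len(quantiles) == 1:
            kind = "line"  # single line at this quantile
        else:
            raise ValueError(f"Expected one or two quantiles, got {quantiles!r}")
        result.append((kind, quantiles, alpha))
    return result


classified = classify_quantiles_plumes(quantiles_plumes)
# [('plume', (0.05, 0.95), 0.5), ('line', (0.5,), 1.0)]
```

With the default configuration, the 5th to 95th percentile range is drawn as a plume with alpha 0.5 and the median as a fully opaque line.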

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>> scmrun = ScmRun(
...     data=np.random.random((10, 3)).T,
...     columns={
...         "model": ["a_iam"],
...         "climate_model": ["a_model"] * 5 + ["a_model_2"] * 5,
...         "scenario": ["a_scenario"] * 5 + ["a_scenario_2"] * 5,
...         "ensemble_member": list(range(5)) + list(range(5)),
...         "region": ["World"],
...         "variable": ["Surface Air Temperature Change"],
...         "unit": ["K"],
...     },
...     index=[2005, 2010, 2015],
... )

Plot the plumes, calculated over the different ensemble members.

>>> scmrun.plumeplot(quantile_over="ensemble_member")  
(<Axes: ylabel='K'>, ...)

Pre-calculate the quantiles, then plot

>>> summary_stats = ScmRun(
...     scmrun.quantiles_over("ensemble_member", quantiles=[0.05, 0.5, 0.95])
... )
>>> summary_stats.plumeplot(pre_calculated=True)  
(<Axes: ylabel='K'>, ...)

Note

scmdata is not a plotting library so this function is provided as is, with little testing. In some ways, it is more intended as inspiration for other users than as a robust plotting tool.

process_over(cols, operation, na_override=-1000000.0, op_cols=None, as_run=False, **kwargs)[source]

Process the data over the input columns.

Parameters:
  • cols (str | list[str]) – Columns to perform the operation on. The timeseries will be grouped by all other columns in meta.

  • operation (str or func) –

    The operation to perform.

    If a string is provided, the equivalent pandas groupby function is used. Note that not all groupby functions are available as some do not make sense for this particular application. Additional information about the arguments for the pandas groupby functions can be found at <https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html>.

    If a function is provided, it will be applied to each group. The function must take a dataframe as its first argument and return a DataFrame, Series or scalar.

    Note that quantile means the value of the data at a given point in the cumulative distribution of values at each point in the timeseries, for each timeseries once the groupby is applied. As a result, using q=0.5 is the same as taking the median and not the same as taking the mean/average.

  • na_override ([int, float]) –

    Convert any nan value in the timeseries meta to this value during processing. The meta values are converted back to nan before the run is returned. This should not need to be changed unless the existing metadata clashes with the default na_override value.

    This functionality is disabled if na_override is None, but disabling it may produce incorrect results if the timeseries meta includes any nan values.

  • op_cols (dict of str: str) –

    Dictionary containing any columns that should be overridden after processing.

    If a required column from scmdata.ScmRun is specified in cols and as_run=True, an override must be provided for that column in op_cols otherwise the conversion to scmdata.ScmRun will fail.

  • as_run (bool or subclass of BaseScmRun) –

    If True, return the resulting timeseries as an scmdata.ScmRun object; if False, a pandas.DataFrame or pandas.Series is returned (depending on the nature of the operation). Some operations cannot be converted to an scmdata.ScmRun, for example if the operation returns scalar values rather than timeseries.

    If a class is provided, the return value will be cast to this class.

  • **kwargs (Any) – Keyword arguments to pass to operation (or to the pandas operation if operation is a string)

Returns:

pandas.DataFrame or pandas.Series or scmdata.ScmRun – The result of operation, grouped by all columns in meta other than cols

Raises:
  • ValueError – If the operation is not an allowed operation, if the value of na_override clashes with any existing metadata, if operation produces a pandas.Series but as_run is True, or if as_run is not True, False or a subclass of scmdata.run.BaseScmRun.

  • scmdata.errors.MissingRequiredColumnError – If as_run is not False and the result does not have the required metadata to convert to an scmdata.ScmRun. This can be resolved by specifying additional metadata via op_cols.
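The remark that a quantile with q=0.5 is the median and not the mean can be illustrated with the standard library. The values below are invented to stand for one time point across five ensemble members; the sketch does not call scmdata or pandas.

```python
import statistics

# Invented values of one time point across five ensemble members
values = [1.0, 2.0, 3.0, 4.0, 100.0]

median = statistics.median(values)
mean = statistics.mean(values)
# With n=2 there is a single cut point, which is the q=0.5 quantile
q50 = statistics.quantiles(values, n=2, method="inclusive")[0]

print(median, q50, mean)  # 3.0 3.0 22.0
```

The single large value pulls the mean up to 22.0 while the q=0.5 quantile stays at the median, 3.0.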

quantiles_over(cols, quantiles, **kwargs)[source]

Calculate quantiles of the data over the input columns.

Parameters:
  • cols (str | list[str]) – Columns to perform the operation on. The timeseries will be grouped by all other columns in meta.

  • quantiles (str | list[float]) – The quantiles to calculate. This should be a list of quantiles to calculate (quantile values between 0 and 1). quantiles can also include the strings “median” or “mean” if these values are to be calculated.

  • **kwargs (Any) – Passed to process_over().

Returns:

pandas.DataFrame – The quantiles of the timeseries, grouped by all columns in meta other than cols. Each calculated quantile is given a label which is stored in the quantile column within the output index.

Raises:

TypeError – operation is included in kwargs. The operation is inferred from quantiles.

reduce(func, dim=None, axis=None, **kwargs)[source]

Apply a function along a given axis

This is to provide the GroupBy functionality in ScmRun.groupby() and is not generally called directly.

This implementation is very bare-bones - no reduction along the time dimension is allowed and only the dim parameter is used.

Parameters:
  • func (function) –

  • dim (str) – Ignored

  • axis (int) – The dimension along which the function is applied. The only valid value is 0, which corresponds to the time-series dimension.

  • kwargs – Other parameters passed to func

Returns:

ScmRun

Raises:
relative_to_ref_period_mean(append_str=None, **kwargs)[source]

Return the timeseries relative to a given reference period mean.

The reference period mean is subtracted from all values in the input timeseries.

Parameters:
  • append_str – Deprecated

  • **kwargs – Arguments to pass to filter() to determine the data to be included in the reference time period. See the docs of filter() for valid options.

Returns:

ScmRun – New object containing the timeseries, adjusted to the reference period mean. The reference period year bounds are stored in the meta columns "reference_period_start_year" and "reference_period_end_year".

Raises:

NotImplementedError – append_str is not None

required_cols: tuple[str, ...] = ('variable', 'unit')

Required metadata columns

These are the bare minimum columns which are expected. Attempting to create a run without the metadata columns specified by required_cols will raise a MissingRequiredColumnError.

resample(rule='AS', **kwargs)[source]

Resample the time index of the timeseries data onto a custom grid.

This helper function allows values to be easily interpolated onto annual or monthly timesteps using rule='AS' or rule='MS' respectively. Internally, the interpolate function performs the regridding.

Parameters:
  • rule (str) – See the pandas user guide for a list of options. Note that Business-related offsets such as “BusinessDay” are not supported.

  • **kwargs (typing.Any) – Other arguments to pass through to interpolate()

Returns:

ScmRun – New ScmRun instance on a new time index

Examples

Resample a run to annual values

>>> import pandas as pd
>>> from scmdata import ScmRun
>>> scm_df = ScmRun(
...     pd.Series([1, 2, 10], index=(2000, 2001, 2009)),
...     columns={
...         "model": ["a_iam"],
...         "scenario": ["a_scenario"],
...         "region": ["World"],
...         "variable": ["Primary Energy"],
...         "unit": ["EJ/y"],
...     },
... )
>>> scm_df.timeseries().T  
model               a_iam
region              World
scenario       a_scenario
unit                 EJ/y
variable   Primary Energy
time
2000-01-01            1.0
2001-01-01            2.0
2009-01-01           10.0

An annual timeseries can then be created by interpolating to the start of years using the rule ‘AS’.

>>> res = scm_df.resample("AS")
>>> res.timeseries().T
model               a_iam
region              World
scenario       a_scenario
unit                 EJ/y
variable   Primary Energy
time
2000-01-01       1.000000
2001-01-01       2.000000
2002-01-01       2.999316
2003-01-01       3.998631
2004-01-01       4.997947
2005-01-01       6.000000
2006-01-01       6.999316
2007-01-01       7.998631
2008-01-01       8.997947
2009-01-01      10.000000
>>> m_df = scm_df.resample("MS")
>>> m_df.timeseries().T  
model               a_iam
region              World
scenario       a_scenario
unit                 EJ/y
variable   Primary Energy
time
2000-01-01       1.000000
2000-02-01       1.084699
2000-03-01       1.163934
...

Note that the values do not fall exactly on integer values as not all years are exactly the same length.
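The non-integer values can be reproduced by hand. The sketch below assumes plain linear interpolation in elapsed time between the 2001 and 2009 points of the example (an assumption that happens to reproduce the printed numbers, not a statement about scmdata's internals); the function interpolate_linear is hypothetical.

```python
import datetime as dt


def interpolate_linear(target, t0, v0, t1, v1):
    """Linearly interpolate in elapsed time between (t0, v0) and (t1, v1)."""
    fraction = (target - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return v0 + fraction * (v1 - v0)


t0, v0 = dt.datetime(2001, 1, 1), 2.0
t1, v1 = dt.datetime(2009, 1, 1), 10.0

annual = {
    year: round(interpolate_linear(dt.datetime(year, 1, 1), t0, v0, t1, v1), 6)
    for year in range(2002, 2006)
}
# {2002: 2.999316, 2003: 3.998631, 2004: 4.997947, 2005: 6.0}
```

The interval from 2001 to 2009 spans 2922 days because it contains two leap days (2004 and 2008). One 365-day year is therefore slightly less than one eighth of the interval, while 2005-01-01 sits exactly at the midpoint (1461 days) and so lands on 6.0.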

References

See the pandas documentation for resample <http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html> for more information about possible arguments.

round(decimals=3, inplace=False)[source]

Round data to a given number of decimal places.

For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.

Parameters:
  • decimals (int) – Number of decimal places to round each value to.

  • inplace (bool) – If True, round the data in place, otherwise return a rounded copy.

Returns:

ScmRun – ScmRun containing the rounded values.
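Python's built-in round follows the same round-half-to-even convention, so the rule can be illustrated without NumPy or scmdata. The inputs below are chosen to be exactly representable in binary floating point so that each one is a true tie.

```python
# Ties at the integer level: each value is exactly halfway between two integers
halves = [-0.5, 0.5, 1.5, 2.5]
rounded_to_integer = [round(x) for x in halves]
# [0, 0, 2, 2]

# Ties at two decimal places: 0.125 and 0.375 are exact in binary
rounded_to_two_places = [round(0.125, 2), round(0.375, 2)]
# [0.12, 0.38]
```

In each tie the result is the neighbour whose last kept digit is even, which is why 0.125 rounds down to 0.12 while 0.375 rounds up to 0.38.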

set_meta(dimension, value, **filter_kwargs)[source]

Update metadata

Optionally, a subset of metadata may be modified through the use of additional filter_kwargs which are passed to filter(). The metadata associated with the non-filtered timeseries are not modified.

This method does not preserve the order of the timeseries.

Parameters:
  • dimension (str) – Dimension of meta to update

  • value (Any) – Value to set the targeted meta to

  • filter_kwargs (Any) –

    Arguments used to filter which timeseries are updated

    All the filtering functionality of filter() is available, except for “inplace”.

See also

filter()

Returns:

BaseScmRun – A new instance with the updated metadata.

property shape: tuple[int, int]

Get the shape of the underlying data as (num_timeseries, num_timesteps)

Returns:

tuple of int

subtract(other, op_cols, **kwargs)

Subtract values

Parameters:
  • other (ScmRun) – ScmRun containing data to subtract

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before subtracting as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the subtraction will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0
>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_minus_afolu.head()
time                                                                  2010-01-01  2020-01-01
model     region   scenario  unit       variable
idealised World|NH idealised gigatC / a Emissions|CO2|Fossil - AFOLU      -0.001       3.995
          World|SH idealised gigatC / a Emissions|CO2|Fossil - AFOLU       1.997       5.993
>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                               2010-01-01  2020-01-01
model     region        scenario  unit       variable
idealised World|NH - SH idealised gigatC / a Emissions|CO2|Fossil        -2.0        -2.0
                                  megatC / a Emissions|CO2|AFOLU         -2.0        -2.0
tail(*args, **kwargs)[source]

Return tail of self.timeseries().

Parameters:
  • *args (typing.Any) – Passed to self.timeseries().tail()

  • **kwargs (typing.Any) – Passed to self.timeseries().tail()

Returns:

pandas.DataFrame – Tail of self.timeseries()

time_mean(rule)[source]

Take time mean of self

Note that this method will not copy the metadata attribute to the returned value.

Parameters:

rule (["AC", "AS", "A"]) –

How to take the time mean. The names reflect the pandas user guide where they can, but only the options given above are supported. For clarity, if rule is 'AC', then the mean is an annual mean i.e. each time point in the result is the mean of all values for that particular year. If rule is 'AS', then the mean is an annual mean centred on the beginning of the year i.e. each time point in the result is the mean of all values from July 1st in the previous year to June 30 in the given year. If rule is 'A', then the mean is an annual mean centred on the end of the year i.e. each time point in the result is the mean of all values from July 1st of the given year to June 30 in the next year.

Returns:

ScmRun – The time mean of self.
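The three averaging windows described above can be written out explicitly. The helper averaging_window is hypothetical and not part of scmdata; it only encodes the date ranges stated in the description of rule.

```python
import datetime as dt


def averaging_window(rule, year):
    """Return the (first day, last day) averaged for ``year`` under ``rule``.

    Hypothetical helper for illustration only; not part of scmdata.
    """
    if rule == "AC":
        # Calendar-year mean
        return dt.date(year, 1, 1), dt.date(year, 12, 31)
    if rule == "AS":
        # Centred on the start of the year
        return dt.date(year - 1, 7, 1), dt.date(year, 6, 30)
    if rule == "A":
        # Centred on the end of the year
        return dt.date(year, 7, 1), dt.date(year + 1, 6, 30)
    raise ValueError(f"Unsupported rule: {rule!r}")


windows = {rule: averaging_window(rule, 2015) for rule in ("AC", "AS", "A")}
```

For 2015, 'AC' covers 2015-01-01 to 2015-12-31, 'AS' covers 2014-07-01 to 2015-06-30 and 'A' covers 2015-07-01 to 2016-06-30.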

property time_points

Time points of the data

Returns:

scmdata.time.TimePoints

timeseries(meta=None, check_duplicated=True, time_axis=None, drop_all_nan_times=False)[source]

Return the data with metadata as a pandas.DataFrame.

Parameters:
  • meta (list[str]) – The list of meta columns that will be included in the output’s MultiIndex. If None (default), then all metadata will be used.

  • check_duplicated (bool) – If True, an exception is raised if any of the timeseries have duplicated metadata

  • time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) – See long_data() for a description of the options.

  • drop_all_nan_times (bool) – Should time points which contain only nan values be dropped? This operation is applied after any transforms introduced by the value of time_axis.

Returns:

pandas.DataFrame – DataFrame with datetimes as columns and timeseries as rows. Metadata is in the index.

Raises:
  • NonUniqueMetadataError – If the metadata are not unique between timeseries and check_duplicated is True

  • NotImplementedError – The value of time_axis is not recognised

  • ValueError – The value of time_axis would result in columns which aren’t unique

to_csv(fname, **kwargs)[source]

Write timeseries data to a csv file

Parameters:

fname (FilePath) – Path to write the file into

Return type:

None

to_iamdataframe()[source]

Convert to a LongDatetimeIamDataFrame instance.

LongDatetimeIamDataFrame is a subclass of pyam.IamDataFrame. We use LongDatetimeIamDataFrame to ensure all times can be handled, see docstring of LongDatetimeIamDataFrame for details.

Returns:

LongDatetimeIamDataFrame – LongDatetimeIamDataFrame instance containing the same data.

Raises:

ImportError – If pyam is not installed

to_xarray(dimensions=('region',), extras=(), unify_units=True)

Convert to a xarray.Dataset

Parameters:
  • dimensions (iterable of str) –

    Dimensions for each variable in the returned dataset. If an “_id” co-ordinate is required (see extras documentation for when “_id” is required) and is not included in dimensions then it will be the last dimension (or second last dimension if “time” is also not included in dimensions). If “time” is not included in dimensions it will be the last dimension.

  • extras (iterable of str) – Columns in self.meta from which to create “non-dimension co-ordinates” (see xarray terminology for more details). These non-dimension co-ordinates store extra information and can be mapped to each timeseries found in the data variables of the output xarray.Dataset. Where possible, these non-dimension co-ordinates will use dimension co-ordinates as their own co-ordinates. However, if the metadata in extras is not defined by a single dimension in dimensions, then the extras co-ordinates will have dimensions of “_id”. This “_id” co-ordinate maps the values in the extras co-ordinates to each timeseries in the serialised dataset. Where “_id” is required, an extra “_id” dimension will also be added to dimensions.

  • unify_units (bool) – If a given variable has multiple units, should we attempt to unify them?

Returns:

xarray.Dataset – Data in self, re-formatted as an xarray.Dataset

Raises:
  • ValueError – If a variable has multiple units and unify_units is False.

  • ValueError – If a variable has multiple units which are not able to be converted to a common unit because they have different base units.

property values: ndarray[Any, dtype[float64]]

Timeseries values without metadata

The values are returned such that each row is a different timeseries and each column is a different time (although no time information is included as a plain numpy.ndarray is returned).

Returns:

np.ndarray – The array in the same shape as ScmRun.shape, that is (num_timeseries, num_timesteps).

ScmRun

class ScmRun(data=None, index=None, columns=None, metadata=None, copy_data=False, **kwargs)[source]

Bases: BaseScmRun

Data container for holding one or many time-series of SCM data.

required_cols: tuple[str, ...] = ('model', 'scenario', 'region', 'variable', 'unit')

Minimum metadata columns required by an ScmRun.

If an application requires a different set of required metadata, this can be specified by overriding required_cols on a custom class inheriting scmdata.run.BaseScmRun. Note that at a minimum, (“variable”, “unit”) columns are required.

run_append

run_append(runs, inplace=False, duplicate_msg=True, metadata=None)[source]

Append together many objects.

When appending many objects, it may be more efficient to call this routine once with a list of ScmRun objects than to call ScmRun.append() multiple times.

Parameters:
  • runs (list of ScmRun or pd.DataFrame) – The runs to append. An attempt is made to cast each value to ScmRun.

  • inplace (bool) – If True, then the operation updates the first item in runs inplace. Otherwise, the results are appended to a new object.

  • duplicate_msg (str | bool) – If True, raise a NonUniqueMetadataError error so the user can see the duplicate timeseries. If False, take the average and do not raise a warning or error. If "warn", raise a warning if duplicate data is detected.

  • metadata (MetadataType | None) – If not None, override the metadata of the resulting ScmRun with metadata. Otherwise, the metadata for the runs are merged. In the case where there are duplicate metadata keys, the values from the first run are used.

Returns:

ScmRun – Object containing the appended data. The resultant class will be determined by the type of the first object.

Raises:
  • TypeError – If inplace is True but the first element in runs is not an instance of ScmRun, or if the runs argument is not a list.

  • ValueError – If the duplicate_msg option is not recognised, or if no runs are provided to be appended.