scmdata.ops

Operations for ScmRun objects

These largely rely on Pint’s Pandas interface to handle unit conversions automatically.

scmdata.ops.add(self, other, op_cols, **kwargs)

Add values

Parameters
  • other (ScmRun) – ScmRun containing data to add

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the addition will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns

Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type

ScmRun
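
The role of op_cols can be pictured with a minimal pure-Python sketch. It does not use scmdata or Pint and ignores units entirely. The helper name add_with_op_cols and the list of (metadata, values) pairs are hypothetical stand-ins for illustration, not the library's implementation.

```python
def add_with_op_cols(left, right, op_cols):
    """Add two collections of timeseries after dropping the op_cols keys.

    Each collection is a list of (metadata dict, list of values) pairs.
    Hypothetical helper for illustration; not part of scmdata.
    """

    def key_without_op_cols(meta):
        # Alignment key: every metadata column except those named in op_cols
        return tuple(sorted((k, v) for k, v in meta.items() if k not in op_cols))

    right_by_key = {key_without_op_cols(meta): values for meta, values in right}

    out = []
    for meta, values in left:
        key = key_without_op_cols(meta)
        if key not in right_by_key:
            # No partner to align with; the real library may behave differently
            continue
        summed = [a + b for a, b in zip(values, right_by_key[key])]
        # The dropped columns come back holding the values given in op_cols
        out.append(({**dict(key), **op_cols}, summed))
    return out


fos = [
    ({"variable": "Emissions|CO2|Fossil", "region": "World|NH"}, [0.0, 6.0, 12.0]),
    ({"variable": "Emissions|CO2|Fossil", "region": "World|SH"}, [2.0, 8.0, 14.0]),
]
afolu = [
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|NH"}, [1.0, 7.0, 13.0]),
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|SH"}, [3.0, 9.0, 15.0]),
]

total = add_with_op_cols(fos, afolu, op_cols={"variable": "Emissions|CO2"})
for meta, values in total:
    print(meta["region"], meta["variable"], values)
```

The rows are matched on "region" alone because "variable" was dropped before alignment, and every output row carries the "variable" value supplied in op_cols.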

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> total = fos.add(afolu, op_cols={"variable": "Emissions|CO2"})
>>> total.head()
time                                                   2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable      unit
idealised idealised World|NH Emissions|CO2 gigatC / a                  1.0                 13.0                 25.0
                    World|SH Emissions|CO2 gigatC / a                  5.0                 17.0                 29.0
>>>
>>> nh = start.filter(region="*NH")
>>> nh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU  GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
>>>
>>> sh = start.filter(region="*SH")
>>> sh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU  GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> world = nh.add(sh, op_cols={"region": "World"})
>>> world.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region variable             unit
idealised idealised World  Emissions|CO2|Fossil gigatC / a                  2.0                 14.0                 26.0
                           Emissions|CO2|AFOLU  gigatC / a                  4.0                 16.0                 28.0

scmdata.ops.adjust_median_to_target(self, target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)

Adjust the median of (an ensemble of) timeseries to a specified target

Parameters
  • target (float) – Value to which the median of each (group of) timeseries should be adjusted

  • evaluation_period (list[int]) – Period over which the median should be evaluated

  • process_over (list) – Metadata to treat as ‘ensemble members’ i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.

  • check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?

  • check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose

Raises
  • NotImplementedError – evaluation_period is based on times not years

  • AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.

Returns

Timeseries adjusted to have the intended median

Return type

ScmRun

scmdata.ops.cumsum(self, out_var=None, check_annual=True)

Integrate with respect to time using a cumulative sum

This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or point-in-time estimate. When integrating, one can sum up each individual year to get the cumulative total, rather than using an alternative method for numerical integration, such as the trapezoidal rule, which assumes that the values change linearly between timesteps.

This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.

The output timesteps are the same as the timesteps of the input, but since the input timeseries are piecewise constant (i.e. a constant for a given year), the output can also be thought of as being a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented. If that would be useful, raise an issue on GitHub.

If the timeseries are piecewise-linear, cumtrapz() should be used instead.

Parameters
  • out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.

  • check_annual (bool) – If True (default), check that the timeseries are on uniform annual intervals.

Returns

scmdata.ScmRun containing the integral of self with respect to time

Return type

scmdata.ScmRun

See also

cumtrapz()

Raises

  • ValueError – If an unknown method is provided

  • ValueError – Failed unit conversion

  • ValueError – Non-annual timeseries and check_annual is True

Warns

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
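
As a minimal sketch of the idea (plain Python, not scmdata's implementation), a cumulative sum over annual, piecewise-constant data is simply a running total of the yearly values. The years and emissions below are hypothetical, and the one-year timestep is assumed so that a value in GtC / yr contributes the same number of GtC.

```python
from itertools import accumulate

# Annual emissions: each value is the total flux over that whole year
years = [2010, 2011, 2012, 2013]
annual_emissions = [10.0, 12.0, 11.0, 9.0]  # e.g. GtC / yr

# Piecewise-constant data on a one-year step: the integral up to and
# including a given year is the sum of the yearly totals so far
cumulative = list(accumulate(annual_emissions))

for year, total in zip(years, cumulative):
    print(year, total)
```

The output keeps the input timesteps, and each value already includes the emissions of its own year, which matches the "up to and including the last day of a given year" reading described above.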

scmdata.ops.cumtrapz(self, out_var=None)

Integrate with respect to time using the trapezoid rule

This method should be used when dealing with piecewise-linear timeseries (concentrations, effective radiative forcing, decadal means, etc.). This method handles non-uniform intervals without having to resample to annual values first.

The result will contain the same timesteps as the input timeseries, with the first timestep being zero. Each subsequent value represents the integral up to the day and time of the timestep. The function scmdata.run.ScmRun.relative_to_ref_period() can be used to calculate an integral relative to a reference year.

Parameters

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.

Returns

scmdata.ScmRun containing the integral of self with respect to time

Return type

scmdata.ScmRun

See also

cumsum()

Raises

  • ValueError – If an unknown method is provided

  • ValueError – Failed unit conversion

Warns

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
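
The trapezoid rule on a non-uniform time axis can be sketched in plain Python as follows. This is an illustration of the technique rather than scmdata's code; the function name cumulative_trapezoid and the data are hypothetical, and times are given in years for simplicity.

```python
def cumulative_trapezoid(times, values):
    """Cumulative trapezoid-rule integral, starting from zero at the first time."""
    out = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of the trapezoid between two neighbouring points
        out.append(out[-1] + 0.5 * (values[i] + values[i - 1]) * dt)
    return out


# Non-uniform time axis in years (hypothetical data)
times = [2000.0, 2005.0, 2010.0, 2030.0]
forcing = [1.0, 2.0, 2.0, 4.0]  # e.g. W / m^2

integral = cumulative_trapezoid(times, forcing)
print(integral)
```

The result has one value per input timestep, the first value is zero, and the uneven spacing (5, 5 and 20 years) is handled through dt without any resampling.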

scmdata.ops.delta_per_delta_time(self, out_var=None)

Calculate change in timeseries values for each timestep, divided by the size of the timestep

The output is placed on the middle of each timestep and is one timestep shorter than the input.

Parameters

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Delta ”.

Returns

scmdata.ScmRun containing the changes in values of self, normalised by the change in time

Return type

scmdata.ScmRun

Warns

UserWarning – The data contains nans. If this happens, the output data will also contain nans.
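
A minimal plain-Python sketch of the calculation described above: each change in value is divided by the length of its timestep and assigned to the midpoint of that timestep. The function below is a hypothetical stand-alone version that works on years as floats; it is not the library's implementation and ignores units.

```python
def delta_per_delta_time(times, values):
    """Return (midpoint times, rates of change) for neighbouring timesteps."""
    mid_times = []
    rates = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Place the result on the middle of the timestep
        mid_times.append(times[i - 1] + dt / 2)
        rates.append((values[i] - values[i - 1]) / dt)
    return mid_times, rates


times = [2010.0, 2020.0, 2030.0, 2035.0]  # years
cumulative_emissions = [4.0, 10.0, 16.0, 21.0]  # e.g. GtC

mid_times, rates = delta_per_delta_time(times, cumulative_emissions)
print(mid_times)
print(rates)
```

With four input points there are three output points, which is the "one timestep shorter" behaviour noted above.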

scmdata.ops.divide(self, other, op_cols, **kwargs)

Divide values (self / other)

Parameters
  • other (ScmRun) – ScmRun containing data to divide

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the division will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns

Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_afolu_ratio = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_afolu_ratio.head()
time                                                                     2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil : AFOLU dimensionless             0.000000             0.857143             0.923077
                    World|SH Emissions|CO2|Fossil : AFOLU dimensionless             0.666667             0.888889             0.933333

scmdata.ops.inject_ops_methods(cls)

Inject the operation methods

Parameters

cls – Target class

scmdata.ops.integrate(self, out_var=None)

Integrate with respect to time

This function has been deprecated since the method of integration depends on the type of data being integrated.

Parameters

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.

Returns

scmdata.ScmRun containing the integral of self with respect to time

Return type

scmdata.ScmRun

See also

cumsum(), cumtrapz()

Raises

  • ValueError – If an unknown method is provided

  • ValueError – Failed unit conversion

Warns
  • UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.

  • DeprecationWarning – This function has been deprecated in preference to cumsum() and cumtrapz().

scmdata.ops.linear_regression(self)

Calculate linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Returns

List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.

Return type

list of dict[str, Any]
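
To see why the choice of time origin matters for the intercept but not for the gradient, here is a small plain-Python least-squares sketch. It does not use scmdata or Pint; the helpers fit_line and seconds_since, the data, and the alternative 2010 origin are all hypothetical and exist only for comparison.

```python
from datetime import datetime, timezone


def fit_line(x, y):
    """Ordinary least-squares fit of y = gradient * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, y_mean - gradient * x_mean


def seconds_since(epoch_year, year):
    """Seconds from 1 January of epoch_year to 1 January of year (UTC)."""
    start = datetime(epoch_year, 1, 1, tzinfo=timezone.utc)
    return (datetime(year, 1, 1, tzinfo=timezone.utc) - start).total_seconds()


years = [2010, 2020, 2030]
values = [0.0, 6.0, 12.0]

# x-axis as described in the note: seconds since 1970-01-01
grad_1970, icpt_1970 = fit_line([seconds_since(1970, y) for y in years], values)
# Same data with a different, hypothetical origin for the x-axis
grad_2010, icpt_2010 = fit_line([seconds_since(2010, y) for y in years], values)

print(grad_1970, grad_2010)  # gradients agree up to rounding
print(icpt_1970, icpt_2010)  # intercepts differ
```

Shifting the origin moves every x value by the same amount, so the slope is unchanged while the value of the fitted line at x = 0 changes.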

scmdata.ops.linear_regression_gradient(self, unit=None)

Calculate gradients of a linear regression of each timeseries

Parameters

unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.

Returns

self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.

Return type

pandas.DataFrame

scmdata.ops.linear_regression_intercept(self, unit=None)

Calculate intercepts of a linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Parameters

unit (str) – Output unit for intercepts. If not supplied, the intercepts’ units will not be converted to a common unit.

Returns

self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.

Return type

pandas.DataFrame

scmdata.ops.linear_regression_scmrun(self)

Re-calculate the timeseries based on a linear regression

Returns

The timeseries, re-calculated based on a linear regression

Return type

scmdata.ScmRun

scmdata.ops.multiply(self, other, op_cols, **kwargs)

Multiply values

Parameters
  • other (ScmRun) – ScmRun containing data to multiply

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the multiplication will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns

Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_times_afolu.head()
time                                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil : AFOLU gigatC ** 2 / a ** 2                  0.0                 42.0                156.0
                    World|SH Emissions|CO2|Fossil : AFOLU gigatC ** 2 / a ** 2                  6.0                 72.0                210.0

scmdata.ops.prep_for_op(inp, op_cols, meta, ur=<openscm_units._unit_registry.ScmUnitRegistry object>)

Prepare dataframe for operation

Parameters
  • inp (ScmRun) – ScmRun containing data to prepare

  • op_cols (dict of str: str) – Dictionary containing the columns to drop in order to prepare for the operation as the keys (the values are not used). For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then we will drop the “variable” column from the index.

  • ur (pint.UnitRegistry) – Pint unit registry to use for the operation

Returns

Timeseries to use for the operation. They are the transpose of the normal ScmRun.timeseries() output with the columns being Pint arrays (unless “unit” is in op_cols in which case no units are available to be used so the columns are standard numpy arrays). We do this so that we can use Pint’s Pandas interface to handle unit conversions automatically.

Return type

pandas.DataFrame

scmdata.ops.set_op_values(output, op_cols)

Set operation values in output

Parameters
  • output (pandas.DataFrame) – Dataframe of which to update the values

  • op_cols (dict of str: str) – Dictionary containing the columns to update as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

Returns

output with the relevant columns being set according to op_cols.

Return type

pandas.DataFrame

scmdata.ops.subtract(self, other, op_cols, **kwargs)

Subtract values

Parameters
  • other (ScmRun) – ScmRun containing data to subtract

  • op_cols (dict of str: str) – Dictionary containing the columns to drop before subtracting as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the subtraction will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

  • **kwargs (any) – Passed to prep_for_op()

Returns

Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> fos_minus_afolu.head()
time                                                                  2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
                    World|SH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
>>>
>>> nh = start.filter(region="*NH")
>>> sh = start.filter(region="*SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                               2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region        variable             unit
idealised idealised World|NH - SH Emissions|CO2|Fossil gigatC / a                 -2.0                 -2.0                 -2.0
                                  Emissions|CO2|AFOLU  gigatC / a                 -2.0                 -2.0                 -2.0