scmdata.ops

Operations for ScmRun objects

These largely rely on Pint’s Pandas interface to handle unit conversions automatically.

scmdata.ops.add(self, other, op_cols, **kwargs)[source]

Add values

Parameters:

other (ScmRun) – ScmRun containing data to add
op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the addition will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type:

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> total = fos.add(afolu, op_cols={"variable": "Emissions|CO2"})
>>> total.head()
time                                                   2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable      unit
idealised idealised World|NH Emissions|CO2 gigatC / a                  1.0                 13.0                 25.0
                    World|SH Emissions|CO2 gigatC / a                  5.0                 17.0                 29.0
>>>
>>> nh = start.filter(region="*NH")
>>> nh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU  GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
>>>
>>> sh = start.filter(region="*SH")
>>> sh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU  GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> world = nh.add(sh, op_cols={"region": "World"})
>>> world.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region variable             unit
idealised idealised World  Emissions|CO2|Fossil gigatC / a                  2.0                 14.0                 26.0
                           Emissions|CO2|AFOLU  gigatC / a                  4.0                 16.0                 28.0

scmdata.ops.adjust_median_to_target(self, target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)[source]

Adjust the median of (an ensemble of) timeseries to a specified target

Parameters:

target (float) – Value to which the median of each (group of) timeseries should be adjusted
evaluation_period (list[int]) – Period over which the median should be evaluated
process_over (list) – Metadata to treat as ‘ensemble members’ i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.
check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?
check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose

Raises:

NotImplementedError – If evaluation_period is based on times rather than years
AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.

Returns:

Timeseries adjusted to have the intended median

Return type:

ScmRun
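The grouping and adjustment logic can be illustrated without scmdata. The following is a minimal sketch using only the standard library. It makes two assumptions that are not confirmed by the text above: that each member is first averaged over the evaluation period before the median is taken, and that the adjustment is a uniform additive shift (the library may instead rescale). The names ensemble, period_median and shift_median_to_target are hypothetical.

```python
from statistics import mean, median

# Hypothetical ensemble: member name -> {year: value}
ensemble = {
    "member_a": {2010: 1.0, 2011: 2.0, 2012: 3.0},
    "member_b": {2010: 2.0, 2011: 3.0, 2012: 4.0},
    "member_c": {2010: 4.0, 2011: 5.0, 2012: 6.0},
}


def period_median(members, years):
    """Median across members of each member's mean over the given years."""
    return median(mean(ts[y] for y in years) for ts in members.values())


def shift_median_to_target(members, target, years):
    """Shift every member by one common offset (assumed additive adjustment)."""
    offset = target - period_median(members, years)
    return {
        name: {year: value + offset for year, value in ts.items()}
        for name, ts in members.items()
    }


evaluation_period = [2010, 2011]
adjusted = shift_median_to_target(ensemble, target=0.5, years=evaluation_period)
```

Because every member receives the same offset, the spread of the ensemble is unchanged; only its median over the evaluation period moves to the target.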

scmdata.ops.cumsum(self, out_var=None, check_annual=True)[source]

Integrate with respect to time using a cumulative sum

This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or point-in-time estimate. When integrating, one can sum the individual years to get the cumulative total, rather than using an alternative method of numerical integration such as the trapezoidal rule, which assumes that the values change linearly between timesteps.

This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.

The output timesteps are the same as the timesteps of the input, but since the input timeseries are piecewise constant (i.e. constant for a given year), the output can also be thought of as being a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented. If that would be useful, please raise an issue on GitHub.

If the timeseries are piecewise-linear, cumtrapz() should be used instead.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.
check_annual (bool) – If True (default), check that the timeseries are on uniform annual intervals.

Returns:

scmdata.ScmRun containing the integral of self with respect to time

Return type:

scmdata.ScmRun

See also

cumtrapz()

Raises:

ValueError – If an unknown method is provided or if unit conversion fails

Warns:

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
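To make the difference between the two integration approaches concrete, here is a minimal standard-library sketch (it does not call scmdata, and the numbers are made up for illustration). It compares a running sum of annual totals with a hand-written trapezoidal rule on the same values.

```python
from itertools import accumulate

# Hypothetical annual emissions; each value is the total flux for that year.
years = [2010, 2011, 2012, 2013]
annual_emissions = [10.0, 12.0, 8.0, 6.0]  # e.g. GtC / yr

# Piecewise-constant view: the cumulative total is a running sum of the years.
cumulative_sum = list(accumulate(annual_emissions))

# Piecewise-linear view: the trapezoidal rule joins the points with straight
# lines and integrates the area underneath, starting from zero.
cumulative_trapz = [0.0]
for i in range(1, len(years)):
    dt = years[i] - years[i - 1]
    area = 0.5 * (annual_emissions[i] + annual_emissions[i - 1]) * dt
    cumulative_trapz.append(cumulative_trapz[-1] + area)

print(cumulative_sum)    # [10.0, 22.0, 30.0, 36.0]
print(cumulative_trapz)  # [0.0, 11.0, 21.0, 28.0]
```

The running sum accounts for every year in full, whereas the trapezoidal rule effectively counts only half of the first and last years, so it underestimates the total when the data are annual fluxes.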

scmdata.ops.delta_per_delta_time(self, out_var=None)[source]

Calculate change in timeseries values for each timestep, divided by the size of the timestep

The output is placed on the middle of each timestep and is one timestep shorter than the input.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Delta ”.

Returns:

scmdata.ScmRun containing the changes in values of self, normalised by the change in time

Return type:

scmdata.ScmRun

Warns:

UserWarning – The data contains nans. If this happens, the output data will also contain nans.
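The calculation described above can be sketched with the standard library alone. This is an illustration, not scmdata's implementation: the data are hypothetical, and expressing the rate per second is an assumption made here for simplicity.

```python
from datetime import datetime

# Hypothetical, unevenly spaced timeseries.
times = [datetime(2010, 1, 1), datetime(2011, 1, 1), datetime(2013, 1, 1)]
values = [1.0, 3.0, 7.0]

mid_times = []
rates = []  # change in value per second (the time unit is an assumption)
for i in range(len(times) - 1):
    step = times[i + 1] - times[i]
    # The output sits on the middle of each input timestep.
    mid_times.append(times[i] + step / 2)
    rates.append((values[i + 1] - values[i]) / step.total_seconds())
```

Three input points give two output points, each placed halfway between its neighbouring input times.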

scmdata.ops.divide(self, other, op_cols, **kwargs)[source]

Divide values (self / other)

Parameters:

other (ScmRun) – ScmRun containing data to divide
op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the division will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type:

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_afolu_ratio = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_afolu_ratio.head()
time                                                                     2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil : AFOLU dimensionless             0.000000             0.857143             0.923077
                    World|SH Emissions|CO2|Fossil : AFOLU dimensionless             0.666667             0.888889             0.933333

scmdata.ops.inject_ops_methods(cls)[source]

Inject the operation methods

Parameters:

cls – Target class

scmdata.ops.integrate(self, out_var=None)[source]

Integrate with respect to time

This function has been deprecated since the method of integration depends on the type of data being integrated.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.

Returns:

scmdata.ScmRun containing the integral of self with respect to time

Return type:

scmdata.ScmRun

See also

cumsum(), cumtrapz()

Raises:

ValueError – If an unknown method is provided or if unit conversion fails

Warns:

UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
DeprecationWarning – This function has been deprecated in preference to cumsum() and cumtrapz().

scmdata.ops.linear_regression(self)[source]

Calculate linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Returns:

List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.

Return type:

list of dict[str, Any]
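The note about the x-axis can be checked with a small standard-library sketch. It fits the same hypothetical values twice, once against seconds since 1970-01-01 and once against seconds since the first timestep. The least_squares helper is written here for illustration and is not part of scmdata.

```python
from datetime import datetime, timezone


def least_squares(x, y):
    """Ordinary least-squares fit of y = gradient * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    return gradient, y_mean - gradient * x_mean


# Hypothetical timeseries on three timesteps.
times = [datetime(y, 1, 1, tzinfo=timezone.utc) for y in (2010, 2020, 2030)]
values = [0.0, 6.0, 12.0]

# x-axis as seconds since 1970-01-01, and as seconds since the first timestep.
epoch_seconds = [t.timestamp() for t in times]
seconds_since_start = [s - epoch_seconds[0] for s in epoch_seconds]

grad_epoch, icpt_epoch = least_squares(epoch_seconds, values)
grad_start, icpt_start = least_squares(seconds_since_start, values)
```

Shifting the origin of the x-axis leaves the gradient unchanged (up to floating-point rounding) but moves the intercept, which is why the choice of 1970-01-01 only matters when interpreting intercept values.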

scmdata.ops.linear_regression_gradient(self, unit=None)[source]

Calculate gradients of a linear regression of each timeseries

Parameters:

unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.

Returns:

self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.

Return type:

pandas.DataFrame

scmdata.ops.linear_regression_intercept(self, unit=None)[source]

Calculate intercepts of a linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Parameters:

unit (str) – Output unit for intercepts. If not supplied, the intercepts’ units will not be converted to a common unit.

Returns:

self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.

Return type:

pandas.DataFrame

scmdata.ops.linear_regression_scmrun(self)[source]

Re-calculate the timeseries based on a linear regression

Returns:

The timeseries, re-calculated based on a linear regression

Return type:

scmdata.ScmRun

scmdata.ops.multiply(self, other, op_cols, **kwargs)[source]

Multiply values

Parameters:

other (ScmRun) – ScmRun containing data to multiply
op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the multiplication will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type:

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_times_afolu.head()
time                                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil : AFOLU gigatC ** 2 / a ** 2                  0.0                 42.0                156.0
                    World|SH Emissions|CO2|Fossil : AFOLU gigatC ** 2 / a ** 2                  6.0                 72.0                210.0

scmdata.ops.prep_for_op(inp, op_cols, meta, ur=None)[source]

Prepare dataframe for operation

Parameters:

inp (ScmRun) – ScmRun containing data to prepare
op_cols (dict of str: str) – Dictionary containing the columns to drop in order to prepare for the operation as the keys (the values are not used). For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then we will drop the “variable” column from the index.
ur (pint.UnitRegistry) – Pint unit registry to use for the operation

Returns:

Timeseries to use for the operation. They are the transpose of the normal ScmRun.timeseries() output with the columns being Pint arrays (unless “unit” is in op_cols in which case no units are available to be used so the columns are standard numpy arrays). We do this so that we can use Pint’s Pandas interface to handle unit conversions automatically.

Return type:

pandas.DataFrame

scmdata.ops.set_op_values(output, op_cols)[source]

Set operation values in output

Parameters:

output (pandas.DataFrame) – DataFrame whose values should be updated
op_cols (dict of str: str) – Dictionary containing the columns to update as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

Returns:

output with the relevant columns being set according to op_cols.

Return type:

pandas.DataFrame
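How op_cols is used by the binary operations can be sketched in plain Python: drop the op_cols columns, align on the remaining metadata, operate, then write the op_cols values back. The sketch below ignores units and time entirely, and META_COLS, drop_cols and set_cols are hypothetical stand-ins, not the real prep_for_op() and set_op_values().

```python
# Hypothetical metadata columns and two single-value "runs", keyed by metadata.
META_COLS = ("variable", "region")

fos = {
    ("Emissions|CO2|Fossil", "World|NH"): 6.0,
    ("Emissions|CO2|Fossil", "World|SH"): 8.0,
}
afolu = {
    ("Emissions|CO2|AFOLU", "World|NH"): 7.0,
    ("Emissions|CO2|AFOLU", "World|SH"): 9.0,
}


def drop_cols(run, op_cols):
    """Re-key values on every metadata column that is not in op_cols."""
    keep = [i for i, col in enumerate(META_COLS) if col not in op_cols]
    return {tuple(key[i] for i in keep): value for key, value in run.items()}


def set_cols(aligned, op_cols):
    """Re-insert the dropped columns, filled with the values from op_cols."""
    kept_cols = [col for col in META_COLS if col not in op_cols]
    out = {}
    for key, value in aligned.items():
        meta = dict(zip(kept_cols, key))
        meta.update(op_cols)
        out[tuple(meta[col] for col in META_COLS)] = value
    return out


op_cols = {"variable": "Emissions|CO2"}
left, right = drop_cols(fos, op_cols), drop_cols(afolu, op_cols)
# With "variable" dropped, both runs are keyed by region only and can be added.
total = set_cols({key: left[key] + right[key] for key in left}, op_cols)
```

Without dropping the “variable” column, the two runs would share no keys and nothing could be aligned; after the operation, the column returns with the single value given in op_cols.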

scmdata.ops.subtract(self, other, op_cols, **kwargs)[source]

Subtract values

Parameters:

other (ScmRun) – ScmRun containing data to subtract
op_cols (dict of str: str) – Dictionary containing the columns to drop before subtracting as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the subtraction will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Return type:

ScmRun

Examples

>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> fos_minus_afolu.head()
time                                                                  2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
                    World|SH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
>>>
>>> nh = start.filter(region="*NH")
>>> sh = start.filter(region="*SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                               2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region        variable             unit
idealised idealised World|NH - SH Emissions|CO2|Fossil gigatC / a                 -2.0                 -2.0                 -2.0
                                  Emissions|CO2|AFOLU  gigatC / a                 -2.0                 -2.0                 -2.0