scmdata.ops

Operations for ScmRun objects

These largely rely on Pint’s Pandas interface to handle unit conversions automatically.

prep_for_op

prep_for_op(inp, op_cols, meta, ur=None)

Prepare dataframe for operation

Parameters:

inp (ScmRun) – ScmRun containing data to prepare
op_cols (dict of str: str) – Dictionary containing the columns to drop in order to prepare for the operation as the keys (the values are not used). For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then we will drop the “variable” column from the index.
ur (pint.UnitRegistry) – Pint unit registry to use for the operation

Returns:

pandas.DataFrame – Timeseries to use for the operation. They are the transpose of the normal ScmRun.timeseries() output with the columns being Pint arrays (unless “unit” is in op_cols in which case no units are available to be used so the columns are standard numpy arrays). We do this so that we can use Pint’s Pandas interface to handle unit conversions automatically.

set_op_values

set_op_values(output, op_cols)

Set operation values in output

Parameters:

output (pandas.DataFrame) – DataFrame whose values should be updated
op_cols (dict of str: str) – Dictionary containing the columns to update as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.

Returns:

pandas.DataFrame – output with the relevant columns set according to op_cols.
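
The way op_cols is used to drop columns before an operation and to set them again afterwards can be illustrated without scmdata. The following is a plain-Python sketch, not the library's implementation: the helpers drop_cols and set_cols and the dictionary-based rows are hypothetical, and all values are assumed to share one unit, so no unit conversion is shown.

```python
# Hypothetical stand-ins for timeseries: a metadata dict plus a list of values.
fos = [
    ({"variable": "Emissions|CO2|Fossil", "region": "World|NH"}, [0.0, 4.0]),
    ({"variable": "Emissions|CO2|Fossil", "region": "World|SH"}, [2.0, 6.0]),
]
afolu = [
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|NH"}, [1.0, 5.0]),
    ({"variable": "Emissions|CO2|AFOLU", "region": "World|SH"}, [3.0, 7.0]),
]


def drop_cols(rows, op_cols):
    """Index each row by its metadata with the op_cols keys removed."""
    out = {}
    for meta, values in rows:
        key = tuple(sorted((k, v) for k, v in meta.items() if k not in op_cols))
        out[key] = values
    return out


def set_cols(key, op_cols):
    """Rebuild the metadata, giving the dropped columns their op_cols values."""
    meta = dict(key)
    meta.update(op_cols)
    return meta


op_cols = {"variable": "Emissions|CO2|Fossil - AFOLU"}

# With "variable" dropped, the rows align on the remaining column ("region").
left = drop_cols(fos, op_cols)
right = drop_cols(afolu, op_cols)

result = [
    (set_cols(key, op_cols), [a - b for a, b in zip(left[key], right[key])])
    for key in left
    if key in right
]
```

Each entry of result carries the "variable" value given in op_cols, while the "region" column, which was used for the alignment, is unchanged.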

subtract

subtract(self, other, op_cols, **kwargs)

Subtract values

Parameters:

other (ScmRun) – ScmRun containing data to subtract
op_cols (dict of str: str) – Dictionary containing the columns to drop before subtracting as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the subtraction will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun

>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )

>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0

>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0

>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0

>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_minus_afolu.head()
time                                                                  2010-01-01  2020-01-01
model     region   scenario  unit       variable
idealised World|NH idealised gigatC / a Emissions|CO2|Fossil - AFOLU      -0.001       3.995
          World|SH idealised gigatC / a Emissions|CO2|Fossil - AFOLU       1.997       5.993

>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                               2010-01-01  2020-01-01
model     region        scenario  unit       variable
idealised World|NH - SH idealised gigatC / a Emissions|CO2|Fossil        -2.0        -2.0
                                  megatC / a Emissions|CO2|AFOLU         -2.0        -2.0

add

add(self, other, op_cols, **kwargs)

Add values

Parameters:

other (ScmRun) – ScmRun containing data to add
op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the addition will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun

>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )

>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0

>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0

>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0

>>> fos_plus_afolu = fos.add(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil + AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_plus_afolu.head()
time                                                                  2010-01-01  2020-01-01
model     region   scenario  unit       variable
idealised World|NH idealised gigatC / a Emissions|CO2|Fossil + AFOLU       0.001       4.005
          World|SH idealised gigatC / a Emissions|CO2|Fossil + AFOLU       2.003       6.007

>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_plus_sh = nh.add(sh, op_cols={"region": "World|NH + SH"})
>>> nh_plus_sh.head()
time                                                               2010-01-01  2020-01-01
model     region        scenario  unit       variable
idealised World|NH + SH idealised gigatC / a Emissions|CO2|Fossil         2.0        10.0
                                  megatC / a Emissions|CO2|AFOLU          4.0        12.0

multiply

multiply(self, other, op_cols, **kwargs)

Multiply values

Parameters:

other (ScmRun) – ScmRun containing data to multiply
op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the multiplication will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun

>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )

>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0

>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0

>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0

>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil * AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_times_afolu.convert_unit("(GtC / yr) ** 2").head()
time                                                                       2010-01-01  2020-01-01
model     region   scenario  unit            variable
idealised World|NH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil * AFOLU       0.000       0.020
          World|SH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil * AFOLU       0.006       0.042

>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_times_sh = nh.multiply(sh, op_cols={"region": "World|NH * SH"})
>>> nh_times_sh.convert_unit("(GtC / yr) ** 2").head()
time                                                                    2010-01-01  2020-01-01
model     region        scenario  unit            variable
idealised World|NH * SH idealised (GtC / yr) ** 2 Emissions|CO2|Fossil    0.000000   24.000000
                                                  Emissions|CO2|AFOLU     0.000003    0.000035

divide

divide(self, other, op_cols, **kwargs)

Divide values (self / other)

Parameters:

other (ScmRun) – ScmRun containing data to divide
op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the division will be performed with an index that uses all columns except the “variable” column and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()

Returns:

ScmRun – Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.

Examples

>>> import numpy as np
>>> from scmdata import ScmRun

>>> start = ScmRun(
...     data=np.arange(8).reshape(2, 4),
...     index=[2010, 2020],
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...         ],
...         "unit": ["GtC / yr", "MtC / yr", "GtC / yr", "MtC / yr"],
...         "region": [
...             "World|NH",
...             "World|NH",
...             "World|SH",
...             "World|SH",
...         ],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )

>>> start.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
                             MtC / yr Emissions|CO2|AFOLU          1.0         5.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0
                             MtC / yr Emissions|CO2|AFOLU          3.0         7.0

>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised GtC / yr Emissions|CO2|Fossil         0.0         4.0
          World|SH idealised GtC / yr Emissions|CO2|Fossil         2.0         6.0

>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01  2020-01-01
model     region   scenario  unit     variable
idealised World|NH idealised MtC / yr Emissions|CO2|AFOLU         1.0         5.0
          World|SH idealised MtC / yr Emissions|CO2|AFOLU         3.0         7.0

>>> fos_divide_afolu = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil / AFOLU"}
... )
>>> # The rows align and the units are handled automatically
>>> fos_divide_afolu.convert_unit("dimensionless").head()
time                                                                     2010-01-01  2020-01-01
model     region   scenario  unit          variable
idealised World|NH idealised dimensionless Emissions|CO2|Fossil / AFOLU    0.000000  800.000000
          World|SH idealised dimensionless Emissions|CO2|Fossil / AFOLU  666.666667  857.142857

>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_divide_sh = nh.divide(sh, op_cols={"region": "World|NH / SH"})
>>> nh_divide_sh.convert_unit("dimensionless").head()
time                                                                  2010-01-01  2020-01-01
model     region        scenario  unit          variable
idealised World|NH / SH idealised dimensionless Emissions|CO2|Fossil    0.000000    0.666667
                                                Emissions|CO2|AFOLU     0.333333    0.714286

cumsum

cumsum(self, out_var=None, check_annual=True)

Integrate with respect to time using a cumulative sum

This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or point-in-time estimate. When integrating, one can sum up each individual year to get the cumulative total, rather than using an alternative method for numerical integration, such as the trapezoidal rule, which assumes that the values change linearly between timesteps.

This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.

The output timesteps are the same as the timesteps of the input, but since the input timeseries are piecewise constant (i.e. a constant for a given year), the output can also be thought of as being a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented. If that would be useful, raise an issue on GitHub.

If the timeseries are piecewise-linear, cumtrapz() should be used instead.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.
check_annual (bool) – If True (default), check that the timeseries are on uniform annual intervals.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time
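
The difference between the piecewise-constant and piecewise-linear assumptions can be seen in a small plain-Python sketch. It does not call scmdata; the emissions values are made up for illustration and the variable names are hypothetical.

```python
from itertools import accumulate

# Made-up annual emissions: each value is the total flux over that year.
years = [2010, 2011, 2012, 2013]
annual_emissions = [10.0, 12.0, 11.0, 9.0]  # e.g. GtC / yr

# Piecewise-constant integration: add up the yearly totals. Each entry is
# the cumulative total up to and including that year.
cumulative = list(accumulate(annual_emissions))
# [10.0, 22.0, 33.0, 42.0]

# For comparison, the trapezoidal rule (with 1-year steps) treats the values
# as points joined by straight lines and starts from zero.
trapezoid = [0.0]
for previous, current in zip(annual_emissions, annual_emissions[1:]):
    trapezoid.append(trapezoid[-1] + 0.5 * (previous + current))
# [0.0, 11.0, 22.5, 32.5]

for year, total in zip(years, cumulative):
    print(year, total)
```

The trapezoidal result ends well below the 42.0 total of the four annual values, which is why the cumulative sum is the appropriate choice for annual totals.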

cumtrapz

cumtrapz(self, out_var=None)

Integrate with respect to time using the trapezoid rule

This method should be used when dealing with piecewise-linear timeseries (concentrations, effective radiative forcing, decadal means, etc.). This method handles non-uniform intervals without having to resample to annual values first.

The result will contain the same timesteps as the input timeseries, with the first timestep being zero. Each subsequent value represents the integral up to the day and time of the timestep. The function scmdata.run.ScmRun.relative_to_ref_period() can be used to calculate an integral relative to a reference year.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time
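
A cumulative trapezoid rule on non-uniform timesteps can be sketched in plain Python as follows. This is an illustration of the technique rather than scmdata's implementation: the sample values are made up, and measuring time in days is an assumption of the sketch (this page does not state the time unit of the library's output).

```python
from datetime import datetime

# Made-up samples on non-uniform timesteps (1 year, then 4 years).
times = [datetime(2010, 1, 1), datetime(2011, 1, 1), datetime(2015, 1, 1)]
values = [1.0, 3.0, 3.0]

SECONDS_PER_DAY = 24 * 60 * 60

# The first output is zero; each later output adds the area of the
# trapezoid spanning the preceding interval.
integral = [0.0]
for i in range(1, len(times)):
    width_days = (times[i] - times[i - 1]).total_seconds() / SECONDS_PER_DAY
    mean_height = 0.5 * (values[i] + values[i - 1])
    integral.append(integral[-1] + mean_height * width_days)
```

The output has one value per input timestep, and no resampling is needed because each interval's own width enters the calculation.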

integrate

integrate(self, out_var=None)

Integrate with respect to time

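This function has been deprecated since the method of integration depends on the type of data being integrated. Use cumsum() for piecewise-constant timeseries or cumtrapz() for piecewise-linear timeseries instead.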

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative “.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the integral of self with respect to time

delta_per_delta_time

delta_per_delta_time(self, out_var=None)

Calculate change in timeseries values for each timestep, divided by the size of the timestep

The output is placed on the middle of each timestep and is one timestep shorter than the input.

Parameters:

out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Delta ”.

Returns:

scmdata.ScmRun – scmdata.ScmRun containing the changes in values of self, normalised by the change in time

Warns:

UserWarning – If the data contains NaNs. In this case, the output data will also contain NaNs.
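
The calculation described above can be sketched in plain Python. This is an illustration under stated assumptions, not scmdata's implementation: the sample values are made up, and expressing the timestep size in days is a choice made for the sketch.

```python
from datetime import datetime

# Made-up samples on non-uniform timesteps.
times = [datetime(2010, 1, 1), datetime(2012, 1, 1), datetime(2013, 1, 1)]
values = [1.0, 5.0, 4.0]

SECONDS_PER_DAY = 24 * 60 * 60

mid_times = []
rates = []
for i in range(len(times) - 1):
    step = times[i + 1] - times[i]
    step_days = step.total_seconds() / SECONDS_PER_DAY
    # The output is placed in the middle of the timestep.
    mid_times.append(times[i] + step / 2)
    # Change in value divided by the size of the timestep.
    rates.append((values[i + 1] - values[i]) / step_days)
```

Three input points give two output points, located at 2011-01-01 and 2012-07-02.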

linear_regression

linear_regression(self)

Calculate linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Returns:

list of dict[str, Any] – List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.
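
Why the choice of time origin matters for the intercept but not for the gradient can be checked with a small least-squares sketch. It uses only the standard library and made-up values; the helpers seconds_since_epoch and least_squares are hypothetical and are not part of scmdata.

```python
from datetime import datetime, timezone


def seconds_since_epoch(year):
    """Seconds between 1970-01-01 and 1 January of ``year`` (UTC)."""
    return datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()


def least_squares(x, y):
    """Ordinary least-squares fit of y = gradient * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    gradient = sxy / sxx
    intercept = y_mean - gradient * x_mean
    return gradient, intercept


years = [2010, 2020, 2030]
values = [1.0, 2.0, 4.0]

# x-axis in seconds since 1970-01-01, as described in the note above.
x_epoch = [seconds_since_epoch(year) for year in years]
# The same x-axis shifted so that the first timestep is the origin.
x_shifted = [xi - x_epoch[0] for xi in x_epoch]

grad_epoch, icpt_epoch = least_squares(x_epoch, values)
grad_shifted, icpt_shifted = least_squares(x_shifted, values)
```

The two gradients agree to within floating-point error, whereas the intercepts differ because each is the fitted value at that axis's own zero point.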

linear_regression_gradient

linear_regression_gradient(self, unit=None)

Calculate gradients of a linear regression of each timeseries

Parameters:

unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.

Returns:

pandas.DataFrame – self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.

linear_regression_intercept

linear_regression_intercept(self, unit=None)

Calculate intercepts of a linear regression of each timeseries

Note

Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.

Parameters:

unit (str) – Output unit for intercepts. If not supplied, the intercepts’ units will not be converted to a common unit.

Returns:

pandas.DataFrame – self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.

linear_regression_scmrun

linear_regression_scmrun(self)

Re-calculate the timeseries based on a linear regression

Returns:

scmdata.ScmRun – The timeseries, re-calculated based on a linear regression

adjust_median_to_target

adjust_median_to_target(self, target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)

Adjust the median of (an ensemble of) timeseries to a specified target

Parameters:

target (float) – Value to which the median of each (group of) timeseries should be adjusted
evaluation_period (list[int]) – Period over which the median should be evaluated
process_over (list) – Metadata to treat as ‘ensemble members’ i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.
check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?
check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose

Raises:

NotImplementedError – If evaluation_period is based on times rather than years
AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.

Returns:

ScmRun – Timeseries adjusted to have the intended median

inject_ops_methods

inject_ops_methods(cls)

Inject the operation methods

Parameters:

cls – Target class
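
This page does not describe how the injection is done. One common way to attach functions that take self as their first argument to a class is with setattr, sketched below. The names Run, double and inject_methods are hypothetical and serve only to illustrate the pattern; this is not necessarily how scmdata does it.

```python
def double(self):
    """Hypothetical operation written as a plain function taking ``self``."""
    return type(self)([2 * value for value in self.values])


def inject_methods(cls, methods):
    """Attach each (name, function) pair to ``cls`` as a method."""
    for name, func in methods:
        setattr(cls, name, func)


class Run:
    """Hypothetical stand-in for the target class."""

    def __init__(self, values):
        self.values = list(values)


inject_methods(Run, [("double", double)])

# After injection, the function is available as a regular method.
doubled = Run([1, 2, 3]).double()
```

Defining the operations as module-level functions and injecting them afterwards keeps them in their own module while still exposing them as methods of the class.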