scmdata.ops
Operations for ScmRun objects
These largely rely on Pint’s Pandas interface to handle unit conversions automatically.
- scmdata.ops.add(self, other, op_cols, **kwargs)
Add values
- Parameters:
op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then the addition will be performed with an index that uses all columns except the “variable” column, and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()
- Returns:
Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type:
scmdata.ScmRun
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> total = fos.add(afolu, op_cols={"variable": "Emissions|CO2"})
>>> total.head()
time                                                   2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable      unit
idealised idealised World|NH Emissions|CO2 gigatC / a                  1.0                 13.0                 25.0
                    World|SH Emissions|CO2 gigatC / a                  5.0                 17.0                 29.0
>>>
>>> nh = start.filter(region="*NH")
>>> nh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU  GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
>>>
>>> sh = start.filter(region="*SH")
>>> sh.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU  GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> world = nh.add(sh, op_cols={"region": "World"})
>>> world.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region variable             unit
idealised idealised World  Emissions|CO2|Fossil gigatC / a                  2.0                 14.0                 26.0
                           Emissions|CO2|AFOLU  gigatC / a                  4.0                 16.0                 28.0
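The alignment controlled by op_cols can be illustrated with plain pandas. The sketch below is an approximation, not scmdata's implementation. It makes the following assumptions:

- Each input is a DataFrame with a metadata MultiIndex on the rows and one column per time.
- The levels named in op_cols are dropped, and pandas aligns the inputs on the remaining levels.
- The dropped columns are then re-attached with the requested output values.

Unlike scmdata, it performs no unit conversion, so the "unit" level must already match on both sides. The helpers make_frame and add_with_op_cols are hypothetical names.

```python
import pandas as pd


def make_frame(rows, names=("variable", "unit", "region"), times=(2010, 2020, 2030)):
    """Build a timeseries-style DataFrame: metadata MultiIndex rows, time columns."""
    index = pd.MultiIndex.from_tuples([meta for meta, _ in rows], names=list(names))
    return pd.DataFrame([values for _, values in rows], index=index, columns=list(times))


def add_with_op_cols(left, right, op_cols):
    """Hypothetical sketch: add after dropping the op_cols index levels."""
    drop = list(op_cols)
    # Drop the levels named in op_cols so that rows differing only in
    # those columns line up; pandas aligns on the remaining levels
    summed = left.droplevel(drop) + right.droplevel(drop)
    # Re-attach the dropped columns with the requested output values
    out = summed.reset_index()
    for col, value in op_cols.items():
        out[col] = value
    meta_cols = [c for c in out.columns if c not in summed.columns]
    return out.set_index(meta_cols)


fos = make_frame([
    (("Emissions|CO2|Fossil", "GtC / yr", "World|NH"), [0.0, 6.0, 12.0]),
    (("Emissions|CO2|Fossil", "GtC / yr", "World|SH"), [2.0, 8.0, 14.0]),
])
afolu = make_frame([
    (("Emissions|CO2|AFOLU", "GtC / yr", "World|NH"), [1.0, 7.0, 13.0]),
    (("Emissions|CO2|AFOLU", "GtC / yr", "World|SH"), [3.0, 9.0, 15.0]),
])
total = add_with_op_cols(fos, afolu, {"variable": "Emissions|CO2"})
print(total)
```

The sums match the doctest above (1, 13, 25 for World|NH). The same drop, align, and re-label steps apply to subtract, multiply and divide with a different arithmetic operator.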
- scmdata.ops.adjust_median_to_target(self, target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)
Adjust the median of (an ensemble of) timeseries to a specified target
- Parameters:
target (float) – Value to which the median of each (group of) timeseries should be adjusted
evaluation_period (list[int]) – Period over which the median should be evaluated
process_over (list) – Metadata to treat as ‘ensemble members’, i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.
check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?
check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose
- Raises:
NotImplementedError – If evaluation_period is based on times rather than years
AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.
- Returns:
Timeseries adjusted to have the intended median
- Return type:
scmdata.ScmRun
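A minimal NumPy sketch of the adjustment for a single group of ensemble members follows. It rests on two assumptions that the text above does not confirm:

- the adjustment is one additive shift applied to every member;
- the "median" is the median, across members, of each member's mean over evaluation_period.

scmdata's exact procedure may differ. adjust_median_sketch is a hypothetical name.

```python
import numpy as np


def adjust_median_sketch(values, years, target, evaluation_period):
    """Shift one group of ensemble members so their median matches ``target``.

    ``values`` has shape (n_members, n_years); ``years`` labels its columns.
    """
    in_period = np.isin(years, evaluation_period)
    # Assumption: summarise each member by its mean over the evaluation period
    member_means = values[:, in_period].mean(axis=1)
    # Median across ensemble members
    current_median = np.median(member_means)
    # Assumption: apply the same additive shift to every member and year
    return values + (target - current_median)


years = np.array([1850, 1851, 1852, 1853])
ensemble = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.4, 0.5, 0.6],
    [0.5, 0.6, 0.7, 0.8],
])
adjusted = adjust_median_sketch(ensemble, years, target=0.0, evaluation_period=[1850, 1851])
```

With process_over, the same calculation would be repeated for each group formed from the remaining metadata columns.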
- scmdata.ops.cumsum(self, out_var=None, check_annual=True)
Integrate with respect to time using a cumulative sum
This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or a point-in-time estimate. When integrating, one can therefore sum the individual years to get the cumulative total. An alternative method for numerical integration, such as the trapezoidal rule, is not appropriate here because it assumes that the values change linearly between timesteps.
This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.
The output timesteps are the same as the timesteps of the input. However, since the input timeseries are piecewise constant (i.e. constant for a given year), the output can also be thought of as a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented; if that would be useful, raise an issue on GitHub.
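If the timeseries are piecewise-linear, cumtrapz() should be used instead.
- Parameters:
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.
check_annual (bool) – If True (the default), raise a ValueError if the timeseries are not on uniform annual intervals.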
- Returns:
scmdata.ScmRun containing the integral of self with respect to time
- Return type:
scmdata.ScmRun
- Raises:
ValueError – If an unknown method is provided, if unit conversion fails, or if the timeseries are not annual and check_annual is True
- Warns:
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
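For annual, piecewise-constant data, the cumulative sum described above reduces to a running total of each yearly value multiplied by a one-year timestep. The sketch below uses plain NumPy and hypothetical emissions values. The annual-spacing check only mirrors the idea behind check_annual and is not scmdata's implementation. Units are left implicit (GtC / yr × 1 yr → GtC), and cumsum_annual is a hypothetical name.

```python
import numpy as np


def cumsum_annual(years, values, check_annual=True):
    """Hypothetical sketch: running total of piecewise-constant annual values."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    if check_annual and not np.all(np.diff(years) == 1):
        raise ValueError("timeseries must be on uniform annual intervals")
    # Each value is the flux over a whole year, so the integral up to and
    # including year i is the sum of values[0..i] times a 1-year timestep
    timestep_years = 1.0
    return np.cumsum(values * timestep_years)


years = [2000, 2001, 2002, 2003]
emissions = [10.0, 11.0, 12.0, 13.0]  # hypothetical GtC / yr
cumulative = cumsum_annual(years, emissions)  # GtC
```

The first output value equals the first year's emissions, because the total includes that whole year.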
- scmdata.ops.cumtrapz(self, out_var=None)
Integrate with respect to time using the trapezoid rule
This method should be used when dealing with piecewise-linear timeseries (concentrations, effective radiative forcing, decadal means, etc.). It handles non-uniform intervals without having to resample to annual values first.
The result contains the same timesteps as the input timeseries, with the value at the first timestep being zero. Each subsequent value is the integral up to the day and time of that timestep. scmdata.run.ScmRun.relative_to_ref_period() can be used to calculate an integral relative to a reference year.
- Parameters:
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.
- Returns:
scmdata.ScmRun containing the integral of self with respect to time
- Return type:
scmdata.ScmRun
- Raises:
ValueError – If an unknown method is provided, or if unit conversion fails
- Warns:
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
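The trapezoid rule described above can be written in a few lines of NumPy. The sketch integrates a hypothetical, non-uniformly spaced series (time in years) and prepends zero, so the output has the same timesteps as the input. It illustrates the technique only: it is not scmdata's code and does no unit handling. cumulative_trapezoid_sketch is a hypothetical name.

```python
import numpy as np


def cumulative_trapezoid_sketch(t, y):
    """Cumulative trapezoid-rule integral of y(t), starting from zero."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Area of the trapezoid between each pair of consecutive timesteps;
    # np.diff(t) handles non-uniform spacing directly
    areas = np.diff(t) * (y[1:] + y[:-1]) / 2.0
    # Prepend zero so the first timestep of the output is zero
    return np.concatenate([[0.0], np.cumsum(areas)])


t = [2000.0, 2010.0, 2011.0]  # decadal step, then an annual step
forcing = [1.0, 2.0, 2.0]     # hypothetical values
result = cumulative_trapezoid_sketch(t, forcing)
```

SciPy's scipy.integrate.cumulative_trapezoid(forcing, x=t, initial=0) should give the same numbers for this input.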
- scmdata.ops.delta_per_delta_time(self, out_var=None)
Calculate change in timeseries values for each timestep, divided by the size of the timestep
The output is placed at the midpoint of each timestep and is one timestep shorter than the input.
- Parameters:
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Delta ”.
- Returns:
scmdata.ScmRun containing the changes in values of self, normalised by the change in time
- Return type:
scmdata.ScmRun
- Warns:
UserWarning – The data contains nans. If this happens, the output data will also contain nans.
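The calculation described above is a forward difference divided by the timestep, reported at each step's midpoint. The NumPy sketch below uses hypothetical values, with time in years, and is not scmdata's implementation. delta_per_delta_time_sketch is a hypothetical name.

```python
import numpy as np


def delta_per_delta_time_sketch(t, y):
    """Rate of change for each timestep, placed at the timestep midpoint."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Change in value divided by the size of each timestep
    rate = np.diff(y) / np.diff(t)
    # Midpoints of each timestep: one fewer point than the input
    t_mid = (t[1:] + t[:-1]) / 2.0
    return t_mid, rate


t_mid, rate = delta_per_delta_time_sketch([2000, 2010, 2015], [1.0, 3.0, 5.0])
```

Because the steps have different lengths (10 and 5 years), the same change of 2 gives rates of 0.2 and 0.4 per year.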
- scmdata.ops.divide(self, other, op_cols, **kwargs)
Divide values (self / other)
- Parameters:
op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then the division will be performed with an index that uses all columns except the “variable” column, and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()
- Returns:
Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type:
scmdata.ScmRun
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_afolu_ratio = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_afolu_ratio.head()
time                                                                     2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil : AFOLU dimensionless            0.000000             0.857143             0.923077
                    World|SH Emissions|CO2|Fossil : AFOLU dimensionless            0.666667             0.888889             0.933333
- scmdata.ops.inject_ops_methods(cls)
Inject the operation methods
- Parameters:
cls – Target class
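The docstring above does not say how the methods are attached. A common Python pattern for this kind of injection is to call setattr on the class. The sketch below shows that pattern with a hypothetical Container class and a stand-in add function. It is an assumption about the general technique, not scmdata's code.

```python
def add(self, other):
    """Hypothetical stand-in for an operation defined as a module-level function."""
    return self.value + other.value


def inject_methods_sketch(cls):
    """Attach module-level functions to ``cls`` so they become methods."""
    for name, func in {"add": add}.items():
        # A plain function stored on a class is bound as a method on access
        setattr(cls, name, func)


class Container:
    def __init__(self, value):
        self.value = value


inject_methods_sketch(Container)
result = Container(1).add(Container(2))
```

Because the functions take self as their first argument, they work both as module-level functions (add(a, b)) and as injected methods (a.add(b)). This matches how the signatures in this module are written.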
- scmdata.ops.integrate(self, out_var=None)
Integrate with respect to time
This function has been deprecated since the method of integration depends on the type of data being integrated.
- Parameters:
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with “Cumulative ”.
- Returns:
scmdata.ScmRun containing the integral of self with respect to time
- Return type:
scmdata.ScmRun
- Raises:
ValueError – If an unknown method is provided, or if unit conversion fails
- Warns:
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
DeprecationWarning – This function has been deprecated in preference to cumsum() and cumtrapz().
- scmdata.ops.linear_regression(self)
Calculate linear regression of each timeseries
Note
Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This choice does not affect the gradients, but it does determine the intercept values.
- Returns:
List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.
- Return type:
list of dict[str, Any]
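The note about the x-axis can be checked with NumPy alone. The sketch fits a line to hypothetical values twice: first with the x-axis in seconds since 1970-01-01 (as described above), then with the x-axis in calendar years. The two gradients agree once they are expressed in the same time unit, but the intercepts do not, because each one is the fitted value at a different origin. np.polyfit stands in for whatever fitting routine scmdata uses internally, and no pint units are attached.

```python
import numpy as np

# Hypothetical annual time points and values (e.g. a temperature in K)
time_points = np.array(["2000-01-01", "2010-01-01", "2020-01-01"], dtype="datetime64[s]")
values = np.array([1.0, 2.0, 3.0])

# x-axis as in the note above: seconds since 1970-01-01
x_seconds = time_points.astype(np.int64)
gradient_per_s, intercept_1970 = np.polyfit(x_seconds, values, deg=1)

# The same regression with the x-axis in calendar years instead
x_years = np.array([2000.0, 2010.0, 2020.0])
gradient_per_yr, intercept_year0 = np.polyfit(x_years, values, deg=1)

seconds_per_year = 365.25 * 24 * 60 * 60
# Gradients agree once converted to the same time unit (about 0.1 per year)
print(gradient_per_s * seconds_per_year, gradient_per_yr)
# Intercepts refer to different origins: 1970-01-01 versus year 0
print(intercept_1970, intercept_year0)
```

The seconds-based gradient only approximately equals 0.1 per year, because leap years make the real intervals slightly uneven.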
- scmdata.ops.linear_regression_gradient(self, unit=None)
Calculate gradients of a linear regression of each timeseries
- Parameters:
unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.
- Returns:
self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.
- Return type:
pandas.DataFrame
- scmdata.ops.linear_regression_intercept(self, unit=None)
Calculate intercepts of a linear regression of each timeseries
Note
Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This choice does not affect the gradients, but it does determine the intercept values.
- Parameters:
unit (str) – Output unit for intercepts. If not supplied, the intercepts’ units will not be converted to a common unit.
- Returns:
self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.
- Return type:
pandas.DataFrame
- scmdata.ops.linear_regression_scmrun(self)
Re-calculate the timeseries based on a linear regression
- Returns:
The timeseries, re-calculated based on a linear regression
- Return type:
scmdata.ScmRun
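Re-calculating a timeseries from its regression means evaluating the fitted line at the original time points. The NumPy sketch below does this for one hypothetical, not-quite-linear series, using seconds since 1970-01-01 as the x-axis (as in linear_regression). np.polyfit is an assumed stand-in for scmdata's internal fit.

```python
import numpy as np

# Hypothetical values at three annual time points; not exactly linear
time_points = np.array(["2000-01-01", "2010-01-01", "2020-01-01"], dtype="datetime64[s]")
values = np.array([1.0, 2.5, 3.0])

x_seconds = time_points.astype(np.int64)  # seconds since 1970-01-01
gradient, intercept = np.polyfit(x_seconds, values, deg=1)

# Re-calculated timeseries: the fitted line evaluated at the input times
fitted = gradient * x_seconds + intercept
```

The re-calculated series keeps the input's timesteps. Because an intercept is fitted, its residuals sum to zero.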
- scmdata.ops.multiply(self, other, op_cols, **kwargs)
Multiply values
- Parameters:
op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then the multiplication will be performed with an index that uses all columns except the “variable” column, and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()
- Returns:
Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type:
scmdata.ScmRun
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil * AFOLU"}
... )
>>> fos_times_afolu.head()
time                                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil * AFOLU gigatC ** 2 / a ** 2                  0.0                 42.0                156.0
                    World|SH Emissions|CO2|Fossil * AFOLU gigatC ** 2 / a ** 2                  6.0                 72.0                210.0
- scmdata.ops.prep_for_op(inp, op_cols, meta, ur=None)
Prepare dataframe for operation
- Parameters:
op_cols (dict of str: str) – Dictionary containing the columns to drop in order to prepare for the operation as the keys (the values are not used). For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then we will drop the “variable” column from the index.
ur (pint.UnitRegistry) – Pint unit registry to use for the operation
- Returns:
Timeseries to use for the operation. They are the transpose of the normal ScmRun.timeseries() output, with the columns being Pint arrays. The exception is when “unit” is in op_cols: then no units are available, so the columns are standard NumPy arrays. We do this so that we can use Pint’s Pandas interface to handle unit conversions automatically.
- Return type:
pandas.DataFrame
- scmdata.ops.set_op_values(output, op_cols)
Set operation values in output
- Parameters:
output (pandas.DataFrame) – DataFrame whose values should be updated
op_cols (dict of str: str) – Dictionary containing the columns to update as the keys and the value those columns should hold in the output as the values. For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
- Returns:
output with the relevant columns set according to op_cols.
- Return type:
pandas.DataFrame
- scmdata.ops.subtract(self, other, op_cols, **kwargs)
Subtract values
- Parameters:
op_cols (dict of str: str) – Dictionary containing the columns to drop before subtracting as the keys and the value those columns should hold in the output as the values. For example, if we have
op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}
then the subtraction will be performed with an index that uses all columns except the “variable” column, and the output will have a “variable” column with the value “Emissions|CO2 - Emissions|CO2|Fossil”.
**kwargs (any) – Passed to prep_for_op()
- Returns:
Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type:
scmdata.ScmRun
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                  2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                  3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                  4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                        2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                  0.0                  6.0                 12.0
                              World|SH idealised idealised                  2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                       2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                  1.0                  7.0                 13.0
                             World|SH idealised idealised                  3.0                  9.0                 15.0
>>>
>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> fos_minus_afolu.head()
time                                                                  2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
                    World|SH Emissions|CO2|Fossil - AFOLU gigatC / a                 -1.0                 -1.0                 -1.0
>>>
>>> nh = start.filter(region="*NH")
>>> sh = start.filter(region="*SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                               2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region        variable             unit
idealised idealised World|NH - SH Emissions|CO2|Fossil gigatC / a                 -2.0                 -2.0                 -2.0
                                  Emissions|CO2|AFOLU  gigatC / a                 -2.0                 -2.0                 -2.0