scmdata.run
- class scmdata.run.ScmRun(data: Optional[Any] = None, index: Optional[Any] = None, columns: Optional[Union[Dict[str, list], Dict[str, str]]] = None, metadata: Optional[Dict[str, Union[str, int, float]]] = None, copy_data: bool = False, **kwargs: Any)[source]
Bases: BaseScmRun
Data container for holding one or many time-series of SCM data.
- __init__(data: Optional[Any] = None, index: Optional[Any] = None, columns: Optional[Union[Dict[str, list], Dict[str, str]]] = None, metadata: Optional[Dict[str, Union[str, int, float]]] = None, copy_data: bool = False, **kwargs: Any)
Initialize the container with timeseries data.
- Parameters
data (Union[ScmRun, IamDataFrame, pd.DataFrame, np.ndarray, str]) –
If a ScmRun object is provided, then a new ScmRun is created with a copy of the values and metadata from data.
A pandas.DataFrame with IAMC-format data columns (the result from ScmRun.timeseries()) can be provided without any additional columns and index information.
If a numpy array of timeseries data is provided, columns and index must also be specified. The shape of the numpy array should be (n_times, n_series), where n_times is the number of timesteps and n_series is the number of time series.
If a string is passed, an attempt will be made to read data from file. Currently, reading from CSV, gzipped CSV and Excel formatted files is supported. The string could also be a URL in a format handled by pandas; valid URL schemes include http, ftp, s3, gs and file (file requires pandas>1.2). For more information about the remote formats that can be read, see the pd.read_csv documentation for the version of pandas which is installed.
If no data is provided, an empty ScmRun object is created.
index (np.ndarray) –
If index is not None, then index is used as the timesteps for the run. All timeseries in the run use the same set of timesteps.
The values will be converted to numpy.datetime64[s] values where possible. Possible input formats include:
int – start of year
float – decimal year
str – parsed using dateutil.parser(); slow and should be avoided if possible
If index is None, then the time index will be obtained from data if possible.
columns –
If None, ScmRun will attempt to infer the values from the source. Otherwise, use this dict to write the metadata for each timeseries in data. For each metadata key (e.g. "model", "scenario"), an array of values (one per timeseries) is expected. Alternatively, providing a list of length 1 applies the same value to all timeseries in data. For example, if you had three timeseries from 'rcp26' for 3 different models 'model1', 'model2' and 'model3', the columns dict could look like either 'col_1' or 'col_2':
>>> col_1 = {
...     "scenario": ["rcp26"],
...     "model": ["model1", "model2", "model3"],
...     "region": ["unspecified"],
...     "variable": ["unspecified"],
...     "unit": ["unspecified"],
... }
>>> col_2 = {
...     "scenario": ["rcp26", "rcp26", "rcp26"],
...     "model": ["model1", "model2", "model3"],
...     "region": ["unspecified"],
...     "variable": ["unspecified"],
...     "unit": ["unspecified"],
... }
>>> pd.testing.assert_frame_equal(
...     ScmRun(d, columns=col_1).meta, ScmRun(d, columns=col_2).meta
... )
metadata –
Optional dictionary of metadata for instance as a whole.
This can be used to store information such as the longer-form information about a particular dataset, for example, dataset description or DOIs.
Defaults to an empty dict if no metadata is provided.
copy_data (bool) –
If True, an explicit copy of data is performed.
Note
The copy can be very expensive on large timeseries and should only be needed in cases where the original data is manipulated.
**kwargs – Additional parameters passed to _read_file() when reading files
- Raises
ValueError –
* If you try to load from multiple files at once. If you wish to do this, please use scmdata.run.run_append() instead.
* If index and columns are not specified when data is a numpy.ndarray.
scmdata.errors.MissingRequiredColumn – If metadata for required_cols is not found.
TypeError – If timeseries cannot be read from data.
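The (n_times, n_series) shape convention and the length-1 broadcasting of columns can be sketched with plain numpy and pandas (an illustration of the expected layout, not scmdata's internals; all names here are hypothetical):

```python
import numpy as np
import pandas as pd

# Three timesteps (rows) and two timeseries (columns): shape (n_times, n_series)
index = [2010, 2020, 2030]
data = np.array(
    [
        [1.0, 4.0],  # values for 2010
        [2.0, 5.0],  # values for 2020
        [3.0, 6.0],  # values for 2030
    ]
)

# Metadata: one value per timeseries, or a length-1 list that applies
# to every timeseries (as the `columns` argument allows).
columns = {
    "scenario": ["rcp26"],  # length 1: applies to both timeseries
    "variable": ["Emissions|CO2|Fossil", "Emissions|CO2|AFOLU"],
    "unit": ["GtC / yr", "GtC / yr"],
}

n_times, n_series = data.shape
broadcast = {k: v * n_series if len(v) == 1 else v for k, v in columns.items()}
meta = pd.DataFrame(broadcast)  # one row of metadata per timeseries
```

Passing an array of this shape together with a columns dict like the one above is what the numpy-array construction path expects.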
- add(other, op_cols, **kwargs)
Add values
- Parameters
op_cols (dict of str: str) – Dictionary containing the columns to drop before adding as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the addition will be performed with an index that uses all columns except the "variable" column and the output will have a "variable" column with the value "Emissions|CO2 - Emissions|CO2|Fossil".
**kwargs (any) – Passed to prep_for_op()
- Returns
Sum of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                  unit      region    model      scenario
Emissions|CO2|Fossil      GtC / yr  World|NH  idealised  idealised            0.0                  6.0                 12.0
Emissions|CO2|AFOLU       GtC / yr  World|NH  idealised  idealised            1.0                  7.0                 13.0
Emissions|CO2|Fossil      GtC / yr  World|SH  idealised  idealised            2.0                  8.0                 14.0
Emissions|CO2|AFOLU       GtC / yr  World|SH  idealised  idealised            3.0                  9.0                 15.0
Cumulative Emissions|CO2  GtC       World     idealised  idealised            4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable              unit      region    model      scenario
Emissions|CO2|Fossil  GtC / yr  World|NH  idealised  idealised                0.0                  6.0                 12.0
                                World|SH  idealised  idealised                2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit      region    model      scenario
Emissions|CO2|AFOLU  GtC / yr  World|NH  idealised  idealised                 1.0                  7.0                 13.0
                               World|SH  idealised  idealised                 3.0                  9.0                 15.0
>>>
>>> total = fos.add(afolu, op_cols={"variable": "Emissions|CO2"})
>>> total.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model      scenario   region    variable       unit
idealised  idealised  World|NH  Emissions|CO2  gigatC / a                    1.0                 13.0                 25.0
                      World|SH  Emissions|CO2  gigatC / a                    5.0                 17.0                 29.0
>>>
>>> nh = start.filter(region="*NH")
>>> nh.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable              unit      region    model      scenario
Emissions|CO2|Fossil  GtC / yr  World|NH  idealised  idealised                0.0                  6.0                 12.0
Emissions|CO2|AFOLU   GtC / yr  World|NH  idealised  idealised                1.0                  7.0                 13.0
>>>
>>> sh = start.filter(region="*SH")
>>> sh.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable              unit      region    model      scenario
Emissions|CO2|Fossil  GtC / yr  World|SH  idealised  idealised                2.0                  8.0                 14.0
Emissions|CO2|AFOLU   GtC / yr  World|SH  idealised  idealised                3.0                  9.0                 15.0
>>>
>>> world = nh.add(sh, op_cols={"region": "World"})
>>> world.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model      scenario   region  variable              unit
idealised  idealised  World   Emissions|CO2|Fossil  gigatC / a               2.0                 14.0                 26.0
                              Emissions|CO2|AFOLU   gigatC / a               4.0                 16.0                 28.0
- adjust_median_to_target(target, evaluation_period, process_over=None, check_groups_identical=False, check_groups_identical_kwargs=None)
Adjust the median of (an ensemble of) timeseries to a specified target
- Parameters
target (float) – Value to which the median of each (group of) timeseries should be adjusted
evaluation_period (list[int]) – Period over which the median should be evaluated
process_over (list) – Metadata to treat as 'ensemble members', i.e. all other columns in the metadata of self will be used to group the timeseries before calculating the median. If not supplied, timeseries will not be grouped.
check_groups_identical (bool) – Should we check that the median of each group is the same before making the adjustment?
check_groups_identical_kwargs (dict) – Only used if check_groups_identical is True, in which case these are passed through to np.testing.assert_allclose.
- Raises
NotImplementedError – If evaluation_period is based on times, not years.
AssertionError – If check_groups_identical is True and the median of each group is not the same before making the adjustment.
- Returns
Timeseries adjusted to have the intended median
- Return type
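Conceptually the adjustment is a constant shift: compute the median over the evaluation period, then add the difference between the target and that median to every value. A minimal numpy sketch of that logic (illustrative only; not the scmdata implementation):

```python
import numpy as np


def shift_median_to_target(values, years, target, evaluation_period):
    """Shift an ensemble of timeseries (rows) so the ensemble median
    over ``evaluation_period`` equals ``target``."""
    years = np.asarray(years)
    mask = np.isin(years, evaluation_period)
    # Median across all ensemble members and all times in the period
    current_median = np.median(values[:, mask])
    return values + (target - current_median)


ensemble = np.array(
    [
        [0.8, 1.0, 1.2],
        [1.2, 1.4, 1.6],
    ]
)
adjusted = shift_median_to_target(
    ensemble, years=[1850, 1900, 1950], target=0.0, evaluation_period=[1850, 1900]
)
```

After the shift, the median of the adjusted values over the evaluation period equals the target exactly; all other timesteps move by the same constant.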
- append(other, inplace: bool = False, duplicate_msg: Union[str, bool] = True, metadata: Optional[Dict[str, Union[str, int, float]]] = None, **kwargs: Any)
Append additional data to the current data.
For details, see run_append().
- Parameters
other – Data (in a format which can be cast to ScmRun) to append
inplace – If True, append data in place and return None. Otherwise, return a new ScmRun instance with the appended data.
duplicate_msg – If True, raise a scmdata.errors.NonUniqueMetadataError error so the user can see the duplicate timeseries. If False, take the average and do not raise a warning or error. If "warn", raise a warning if duplicate data is detected.
metadata – If not None, override the metadata of the resulting ScmRun with metadata. Otherwise, the metadata for the runs are merged. In the case where there are duplicate metadata keys, the values from the first run are used.
**kwargs – Keywords to pass to ScmRun.__init__() when reading other
- Returns
If not inplace, return a new ScmRun instance containing the result of the append.
- Return type
- Raises
NonUniqueMetadataError – If the appending results in timeseries with duplicate metadata and duplicate_msg is True
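The duplicate detection that duplicate_msg controls amounts to checking for repeated rows in the combined metadata. A pandas sketch of the idea (hypothetical frames; not the scmdata internals):

```python
import pandas as pd

# Metadata of two runs being appended: every metadata value coincides,
# so the timeseries are duplicates of each other.
meta_a = pd.DataFrame({"model": ["m1"], "scenario": ["s1"], "variable": ["v"]})
meta_b = pd.DataFrame({"model": ["m1"], "scenario": ["s1"], "variable": ["v"]})

combined = pd.concat([meta_a, meta_b], ignore_index=True)
# keep=False marks every member of a duplicated group, not just repeats
duplicates = combined.duplicated(keep=False)
has_duplicates = bool(duplicates.any())
```

When such duplicates are found, append() raises, warns or averages depending on duplicate_msg.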
- append_timewise(other, align_columns)
Append timeseries along the time axis
- Parameters
other (scmdata.ScmRun) – scmdata.ScmRun containing the timeseries to append
align_columns (list) – Columns used to align other and self when joining
- Returns
Result of joining self and other along the time axis
- Return type
scmdata.ScmRun
- apply(func, *args, **kwargs)
Apply a function to each timeseries and append the results
func is called like func(ar, *args, **kwargs) for each ScmRun ar in this group. If the result of this function call is None, it is excluded from the results.
The results are appended together using run_append(). The function can change the size of the input ScmRun as long as run_append() can be applied to all results.
Examples
>>> def multiply_by_2(arr):
...     variable = arr.get_unique_meta("variable", True)
...     if variable == "Surface Temperature":
...         return arr * 2
...     return arr
>>> run.apply(multiply_by_2)
- Parameters
func (function) – Callable to apply to each timeseries.
*args – Positional arguments passed to func.
**kwargs – Used to call func(ar, **kwargs) for each array ar.
- Returns
applied – The result of splitting, applying and combining this array.
- Return type
- convert_unit(unit: str, context: Optional[str] = None, inplace: bool = False, **kwargs: Any)
Convert the units of a selection of timeseries.
Uses scmdata.units.UnitConverter to perform the conversion.
- Parameters
unit – Unit to convert to. This must be recognised by UnitConverter.
context – Context to use for the conversion, i.e. which metric to apply when performing CO2-equivalent calculations. If None, no metric will be applied and CO2-equivalent calculations will raise DimensionalityError.
inplace – If True, apply the conversion inplace and return None
**kwargs – Extra arguments which are passed to filter() to limit the timeseries which are attempted to be converted. Defaults to selecting the entire ScmRun, which will likely fail.
- Returns
If not inplace, a new ScmRun instance with the converted units.
- Return type
Notes
If context is not None, then the context used for the conversion will be checked against any existing metadata and, if the conversion is valid, stored in the output's metadata.
- Raises
ValueError – If "unit_context" is already included in self's meta_attributes() and it does not match context for the variables to be converted.
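At its simplest, a unit conversion is multiplication by a scale factor; scmdata's UnitConverter handles this (plus dimensionality checks and CO2-equivalence metrics) via pint. A minimal sketch of the arithmetic for a pure prefix conversion from GtC/yr to MtC/yr (plain numpy, not the pint machinery):

```python
import numpy as np

GT_TO_MT = 1000.0  # 1 Gt = 1000 Mt, a pure metric-prefix rescaling

emissions_gtc = np.array([10.0, 11.5, 13.0])  # GtC / yr
emissions_mtc = emissions_gtc * GT_TO_MT      # MtC / yr
```

Conversions between different species (e.g. CH4 to CO2-equivalent) additionally need a metric, which is what the context argument selects.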
- copy()
Return a copy.deepcopy() of self.
Also copies the underlying timeseries data.
- Returns
copy.deepcopy() of self
- Return type
- cumsum(out_var=None, check_annual=True)
Integrate with respect to time using a cumulative sum
This method should be used when dealing with piecewise-constant timeseries (such as annual emissions) or step functions. In the case of annual emissions, each timestep represents a total flux over a whole year, rather than an average value or a point-in-time estimate. When integrating, one can sum up each individual year to get the cumulative total, rather than using an alternative method for numerical integration, such as the trapezoidal rule, which assumes that the values change linearly between timesteps.
This method requires data to be on uniform annual intervals. scmdata.run.ScmRun.resample() can be used to resample the data onto annual timesteps.
The output timesteps are the same as the timesteps of the input, but since the input timeseries are piecewise constant (i.e. constant for a given year), the output can also be thought of as a sum up to and including the last day of a given year. The functionality to modify the output timesteps to an arbitrary day/month of the year has not been implemented; if that would be useful, raise an issue on GitHub.
If the timeseries are piecewise-linear, cumtrapz() should be used instead.
- Parameters
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with "Cumulative ".
check_annual (bool) – If True, check that the data are on uniform annual intervals before integrating.
- Returns
scmdata.ScmRun containing the integral of self with respect to time
- Return type
See also
- Raises
ValueError – If an unknown method is provided, a unit conversion fails, or the timeseries are not annual while check_annual is True
- Warns
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
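For piecewise-constant annual fluxes, the integral is a running total. A numpy sketch of the computation described above (units handling aside):

```python
import numpy as np

years = np.arange(2010, 2015)                    # uniform annual steps
emissions = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # GtC / yr, one total per year

# Each value is a whole-year flux, so the cumulative total up to and
# including a given year is a plain cumulative sum.
cumulative = np.cumsum(emissions)  # GtC
```

Each output value is the total emitted up to and including the end of that year, matching the interpretation described above.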
- cumtrapz(out_var=None)
Integrate with respect to time using the trapezoid rule
This method should be used when dealing with piecewise-linear timeseries (concentrations, effective radiative forcing, decadal means, etc.). This method handles non-uniform intervals without having to resample to annual values first.
The result will contain the same timesteps as the input timeseries, with the first timestep being zero. Each subsequent value represents the integral up to the day and time of the timestep. The function scmdata.run.ScmRun.relative_to_ref_period() can be used to calculate an integral relative to a reference year.
- Parameters
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with "Cumulative ".
- Returns
scmdata.ScmRun containing the integral of self with respect to time
- Return type
See also
- Raises
ValueError – If an unknown method is provided or a unit conversion fails
- Warns
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
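The trapezoid rule sums, for each interval, the timestep width times the mean of the endpoint values, with a leading zero so the output has the same timesteps as the input. A numpy sketch of that behaviour (illustrative; the helper name is hypothetical):

```python
import numpy as np


def cumulative_trapezoid(times, values):
    """Cumulative integral via the trapezoid rule, with a leading zero
    so the output has the same length as the input."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Area of each trapezoid: interval width times mean of the endpoints
    increments = np.diff(times) * (values[:-1] + values[1:]) / 2.0
    return np.concatenate([[0.0], np.cumsum(increments)])


# Non-uniform timesteps are handled without resampling
result = cumulative_trapezoid([0.0, 1.0, 3.0], [2.0, 4.0, 4.0])
```

Because the values are assumed to change linearly between timesteps, no resampling to annual values is needed first.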
- data_hierarchy_separator = '|'
String used to define different levels in our data hierarchies.
By default we follow pyam and use “|”. In such a case, emissions of CO2 for energy from coal would be “Emissions|CO2|Energy|Coal”.
- Type
str
- delta_per_delta_time(out_var=None)
Calculate change in timeseries values for each timestep, divided by the size of the timestep
The output is placed on the middle of each timestep and is one timestep shorter than the input.
- Parameters
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with "Delta ".
- Returns
scmdata.ScmRun containing the changes in values of self, normalised by the change in time
- Return type
- Warns
UserWarning – The data contains nans. If this happens, the output data will also contain nans.
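Numerically this is a first difference divided by the timestep size, reported at the timestep midpoints. A numpy sketch (decimal years used for simplicity):

```python
import numpy as np

times = np.array([2010.0, 2020.0, 2030.0])
values = np.array([0.0, 10.0, 30.0])

# Change over each timestep, divided by the timestep size ...
rates = np.diff(values) / np.diff(times)
# ... placed at the midpoint of each timestep, so the output is one
# timestep shorter than the input
midpoints = (times[:-1] + times[1:]) / 2.0
```

The output has one fewer point than the input, as described above.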
- divide(other, op_cols, **kwargs)
Divide values (self / other)
- Parameters
op_cols (dict of str: str) – Dictionary containing the columns to drop before dividing as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the division will be performed with an index that uses all columns except the "variable" column and the output will have a "variable" column with the value "Emissions|CO2 - Emissions|CO2|Fossil".
**kwargs (any) – Passed to prep_for_op()
- Returns
Quotient of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                  unit      region    model      scenario
Emissions|CO2|Fossil      GtC / yr  World|NH  idealised  idealised            0.0                  6.0                 12.0
Emissions|CO2|AFOLU       GtC / yr  World|NH  idealised  idealised            1.0                  7.0                 13.0
Emissions|CO2|Fossil      GtC / yr  World|SH  idealised  idealised            2.0                  8.0                 14.0
Emissions|CO2|AFOLU       GtC / yr  World|SH  idealised  idealised            3.0                  9.0                 15.0
Cumulative Emissions|CO2  GtC       World     idealised  idealised            4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable              unit      region    model      scenario
Emissions|CO2|Fossil  GtC / yr  World|NH  idealised  idealised                0.0                  6.0                 12.0
                                World|SH  idealised  idealised                2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit      region    model      scenario
Emissions|CO2|AFOLU  GtC / yr  World|NH  idealised  idealised                 1.0                  7.0                 13.0
                               World|SH  idealised  idealised                 3.0                  9.0                 15.0
>>>
>>> fos_afolu_ratio = fos.divide(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_afolu_ratio.head()
time                                                                           2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model      scenario   region    variable                      unit
idealised  idealised  World|NH  Emissions|CO2|Fossil : AFOLU  dimensionless             0.000000             0.857143             0.923077
                      World|SH  Emissions|CO2|Fossil : AFOLU  dimensionless             0.666667             0.888889             0.933333
- drop_meta(columns: Union[list, str], inplace: Optional[bool] = False)
Drop meta columns out of the Run
- Parameters
columns – The column or columns to drop
inplace – If True, do operation inplace and return None.
- Raises
KeyError – If any of the columns do not exist in the meta DataFrame
- filter(keep: bool = True, inplace: bool = False, log_if_empty: bool = True, **kwargs: Any)
Return a filtered ScmRun (i.e., a subset of the data).
>>> df
<scmdata.ScmRun (timeseries: 3, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model    scenario region             variable   unit climate_model
    0  a_iam  a_scenario  World       Primary Energy  EJ/yr       a_model
    1  a_iam  a_scenario  World  Primary Energy|Coal  EJ/yr       a_model
    2  a_iam a_scenario2  World       Primary Energy  EJ/yr       a_model
[3 rows x 7 columns]
>>> df.filter(scenario="a_scenario")
<scmdata.ScmRun (timeseries: 2, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model   scenario region             variable   unit climate_model
    0  a_iam a_scenario  World       Primary Energy  EJ/yr       a_model
    1  a_iam a_scenario  World  Primary Energy|Coal  EJ/yr       a_model
[2 rows x 7 columns]
>>> df.filter(scenario="a_scenario", keep=False)
<scmdata.ScmRun (timeseries: 1, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model    scenario region        variable   unit climate_model
    2  a_iam a_scenario2  World  Primary Energy  EJ/yr       a_model
[1 rows x 7 columns]
>>> df.filter(level=1)
<scmdata.ScmRun (timeseries: 2, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model    scenario region        variable   unit climate_model
    0  a_iam  a_scenario  World  Primary Energy  EJ/yr       a_model
    2  a_iam a_scenario2  World  Primary Energy  EJ/yr       a_model
[2 rows x 7 columns]
>>> df.filter(year=range(2000, 2011))
<scmdata.ScmRun (timeseries: 3, timepoints: 2)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2010-01-01T00:00:00
Meta:
       model    scenario region             variable   unit climate_model
    0  a_iam  a_scenario  World       Primary Energy  EJ/yr       a_model
    1  a_iam  a_scenario  World  Primary Energy|Coal  EJ/yr       a_model
    2  a_iam a_scenario2  World       Primary Energy  EJ/yr       a_model
[2 rows x 7 columns]
- Parameters
keep – If True, keep all timeseries satisfying the filters, otherwise drop all the timeseries satisfying the filters
inplace – If True, do operation inplace and return None
log_if_empty – If True, log a warning-level message if the result is empty.
**kwargs –
Argument names are keys with which to filter, values are used to do the filtering. Filtering can be done on:
all metadata columns, with strings; "*" can be used as a wildcard in search strings
'level': the maximum "depth" of IAM variables (number of hierarchy levels, excluding the strings given in the 'variable' argument)
'time': takes a datetime.datetime or list of datetime.datetime's TODO: default to np.datetime64
'year', 'month', 'day', 'hour': takes an int or list of int's ('month' and 'day' also accept str or list of str)
If regexp=True is included in kwargs then the pseudo-regexp syntax in pattern_match() is disabled.
- Returns
If not inplace, return a new instance with the filtered data.
- Return type
- classmethod from_nc(fname)
Read a netCDF4 file from disk
- Parameters
fname (str) – Filename to read
See also
- get_meta_columns_except(*not_group)
Get columns in meta except a set
- get_unique_meta(meta: str, no_duplicates: Optional[bool] = False) → Union[List[Any], Any]
Get unique values in a metadata column.
- Parameters
meta – Column to retrieve metadata for
no_duplicates – Should I raise an error if there is more than one unique value in the metadata column?
- Raises
ValueError – If there is more than one unique value in the metadata column and no_duplicates is True.
KeyError – If the meta column does not exist in the run's metadata.
- Returns
List of unique metadata values. If no_duplicates is True, the metadata value will be returned (rather than a list).
- Return type
Union[List[Any], Any]
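The behaviour can be mimicked with pandas directly; the helper below is hypothetical but mirrors the semantics described above:

```python
import pandas as pd

meta = pd.DataFrame(
    {
        "scenario": ["rcp26", "rcp26", "rcp26"],
        "model": ["model1", "model2", "model3"],
    }
)


def unique_meta(meta, column, no_duplicates=False):
    """Unique values of a metadata column; with no_duplicates=True,
    require exactly one unique value and return it directly."""
    unique = list(meta[column].unique())
    if no_duplicates:
        if len(unique) != 1:
            # Mirrors the ValueError described above
            raise ValueError(f"`{column}` column is not unique: {unique}")
        return unique[0]
    return unique


scenario = unique_meta(meta, "scenario", no_duplicates=True)  # bare value
models = unique_meta(meta, "model")  # list of values
```

With no_duplicates=True the single value is returned unwrapped, which is convenient when a column is known to be homogeneous.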
- groupby(*group)
Group the object by unique metadata
Enables iteration over groups of data. For example, to iterate over each scenario in the object
>>> for group in df.groupby("scenario"):
...     print(group)
<scmdata.ScmRun (timeseries: 2, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model   scenario region             variable   unit climate_model
    0  a_iam a_scenario  World       Primary Energy  EJ/yr       a_model
    1  a_iam a_scenario  World  Primary Energy|Coal  EJ/yr       a_model
<scmdata.ScmRun (timeseries: 1, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-01-01T00:00:00
Meta:
       model    scenario region        variable   unit climate_model
    2  a_iam a_scenario2  World  Primary Energy  EJ/yr       a_model
- groupby_all_except(*not_group)
Group the object by unique metadata apart from the input columns
In other words, the groups are determined by all columns in self.meta except for those in not_group.
- head(*args, **kwargs)
Return head of self.timeseries().
- Parameters
*args – Passed to self.timeseries().head()
**kwargs – Passed to self.timeseries().head()
- Returns
Head of self.timeseries()
- Return type
- integrate(out_var=None)
Integrate with respect to time
This function has been deprecated since the method of integration depends on the type of data being integrated.
- Parameters
out_var (str) – If provided, the variable column of the output is set equal to out_var. Otherwise, the output variables are equal to the input variables, prefixed with "Cumulative ".
- Returns
scmdata.ScmRun containing the integral of self with respect to time
- Return type
See also
- Raises
ValueError – If an unknown method is provided or a unit conversion fails
- Warns
UserWarning – The data being integrated contains nans. If this happens, the output data will also contain nans.
DeprecationWarning – This function has been deprecated in preference to cumsum() and cumtrapz().
- interpolate(target_times: Union[ndarray, List[Union[datetime, int]]], interpolation_type: str = 'linear', extrapolation_type: str = 'linear')
Interpolate the data onto a new time frame.
- Parameters
target_times – Time grid onto which to interpolate
interpolation_type – Interpolation type; options are 'linear'
extrapolation_type – Extrapolation type; options are 'linear', 'constant' or None
- Returns
A new ScmRun containing the data interpolated onto the target_times grid
- Return type
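The linear case can be sketched with numpy (np.interp performs linear interpolation only; scmdata's interpolation and extrapolation options go beyond this sketch):

```python
import numpy as np

# Known data on a coarse decadal grid (decimal years for simplicity)
times = np.array([2010.0, 2020.0, 2030.0])
values = np.array([1.0, 3.0, 7.0])

# Linear interpolation onto a finer target grid
target_times = np.array([2010.0, 2015.0, 2025.0, 2030.0])
interpolated = np.interp(target_times, times, values)
```

Points inside the known range are interpolated linearly between the neighbouring values; extrapolation beyond the range is where extrapolation_type comes in.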
- line_plot(**kwargs)
Make a line plot via seaborn’s lineplot
Deprecated: use lineplot() instead
- Parameters
**kwargs – Keyword arguments to be passed to seaborn.lineplot. If none are passed, sensible defaults will be used.
- Returns
Output of call to seaborn.lineplot
- Return type
matplotlib.axes._subplots.AxesSubplot
- linear_regression()
Calculate linear regression of each timeseries
Note
Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.
- Returns
List of dictionaries. Each dictionary contains the metadata for the timeseries plus the gradient (with key "gradient") and intercept (with key "intercept"). The gradient and intercept are stored as pint.Quantity.
- Return type
list of dict[str, Any]
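The x-axis convention in the note can be reproduced with numpy alone (np.polyfit standing in for the regression; scmdata additionally attaches metadata and pint units):

```python
import numpy as np

# Annual time points converted to seconds since 1970-01-01, matching the
# x-axis convention described in the note above
time_points = np.array(
    ["2000-01-01", "2001-01-01", "2002-01-01"], dtype="datetime64[s]"
)
x = time_points.astype("int")  # seconds since epoch
y = np.array([1.0, 2.0, 3.0])  # e.g. K

gradient, intercept = np.polyfit(x, y, 1)  # gradient in units of y per second
```

The gradient is independent of where the time origin sits, but the intercept is the fitted value at 1970-01-01, which is why the convention matters for intercepts.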
- linear_regression_gradient(unit=None)
Calculate gradients of a linear regression of each timeseries
- Parameters
unit (str) – Output unit for gradients. If not supplied, the gradients’ units will not be converted to a common unit.
- Returns
self.meta plus a column with the value of the gradient for each timeseries. The "unit" column is updated to show the unit of the gradient.
- Return type
- linear_regression_intercept(unit=None)
Calculate intercepts of a linear regression of each timeseries
Note
Times in seconds since 1970-01-01 are used as the x-axis for the regressions. Such values can be accessed with self.time_points.values.astype("datetime64[s]").astype("int"). This decision does not matter for the gradients, but is important for the intercept values.
- Parameters
unit (str) – Output unit for intercepts. If not supplied, the intercepts' units will not be converted to a common unit.
- Returns
self.meta plus a column with the value of the intercept for each timeseries. The "unit" column is updated to show the unit of the intercept.
- Return type
- linear_regression_scmrun()
Re-calculate the timeseries based on a linear regression
- Returns
The timeseries, re-calculated based on a linear regression
- Return type
- lineplot(time_axis=None, **kwargs)
Make a line plot via seaborn’s lineplot
If only a single unit is present, it will be used as the y-axis label. The axis object is returned so this can be changed by the user if desired.
- Parameters
time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) –
Time axis to use for the plot.
If None, datetime.datetime objects will be used.
If "year", the year of each time point will be used.
If "year-month", the year plus (month - 0.5) / 12 will be used.
If "days since 1970-01-01", the number of days since 1 January 1970 will be used (calculated using the datetime module).
If "seconds since 1970-01-01", the number of seconds since 1 January 1970 will be used (calculated using the datetime module).
**kwargs – Keyword arguments to be passed to seaborn.lineplot. If none are passed, sensible defaults will be used.
- Returns
Output of call to seaborn.lineplot
- Return type
matplotlib.axes._subplots.AxesSubplot
- long_data(time_axis=None)
Return data in long form, particularly useful for plotting with seaborn
- Parameters
time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) –
Time axis to use for the output's columns.
If None, datetime.datetime objects will be used.
If "year", the year of each time point will be used.
If "year-month", the year plus (month - 0.5) / 12 will be used.
If "days since 1970-01-01", the number of days since 1 January 1970 will be used (calculated using the datetime module).
If "seconds since 1970-01-01", the number of seconds since 1 January 1970 will be used (calculated using the datetime module).
- Returns
pandas.DataFrame containing the data in 'long form' (i.e. one observation per row).
- Return type
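'Long form' here is the usual tidy layout: one (metadata, time, value) observation per row. A pandas sketch of the reshape (hypothetical frames; not the scmdata internals):

```python
import pandas as pd

# Wide timeseries: one row per timeseries, one column per time point
wide = pd.DataFrame(
    {
        "variable": ["Emissions|CO2", "Emissions|CH4"],
        "unit": ["GtC / yr", "MtCH4 / yr"],
        2010: [1.0, 300.0],
        2020: [2.0, 310.0],
    }
)

# Long form: one observation per row, the layout seaborn expects
long = wide.melt(id_vars=["variable", "unit"], var_name="time", value_name="value")
```

Each (timeseries, time point) pair becomes its own row, which is why this layout plugs straight into seaborn plotting functions.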
- property meta_attributes
Get a list of all meta keys
- Returns
Sorted list of meta keys
- Return type
- multiply(other, op_cols, **kwargs)
Multiply values
- Parameters
op_cols (dict of str: str) – Dictionary containing the columns to drop before multiplying as the keys and the value those columns should hold in the output as the values. For example, if we have op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"} then the multiplication will be performed with an index that uses all columns except the "variable" column and the output will have a "variable" column with the value "Emissions|CO2 - Emissions|CO2|Fossil".
**kwargs (any) – Passed to prep_for_op()
- Returns
Product of self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                  unit      region    model      scenario
Emissions|CO2|Fossil      GtC / yr  World|NH  idealised  idealised            0.0                  6.0                 12.0
Emissions|CO2|AFOLU       GtC / yr  World|NH  idealised  idealised            1.0                  7.0                 13.0
Emissions|CO2|Fossil      GtC / yr  World|SH  idealised  idealised            2.0                  8.0                 14.0
Emissions|CO2|AFOLU       GtC / yr  World|SH  idealised  idealised            3.0                  9.0                 15.0
Cumulative Emissions|CO2  GtC       World     idealised  idealised            4.0                 10.0                 16.0
>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable              unit      region    model      scenario
Emissions|CO2|Fossil  GtC / yr  World|NH  idealised  idealised                0.0                  6.0                 12.0
                                World|SH  idealised  idealised                2.0                  8.0                 14.0
>>>
>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                            2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit      region    model      scenario
Emissions|CO2|AFOLU  GtC / yr  World|NH  idealised  idealised                 1.0                  7.0                 13.0
                               World|SH  idealised  idealised                 3.0                  9.0                 15.0
>>>
>>> fos_times_afolu = fos.multiply(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil : AFOLU"}
... )
>>> fos_times_afolu.head()
time                                                                                  2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model      scenario   region    variable                      unit
idealised  idealised  World|NH  Emissions|CO2|Fossil : AFOLU  gigatC ** 2 / a ** 2                    0.0                 42.0                156.0
                      World|SH  Emissions|CO2|Fossil : AFOLU  gigatC ** 2 / a ** 2                    6.0                 72.0                210.0
- plumeplot(ax=None, quantiles_plumes=[((0.05, 0.95), 0.5), ((0.5,), 1.0)], hue_var='scenario', hue_label='Scenario', palette=None, style_var='variable', style_label='Variable', dashes=None, linewidth=2, time_axis=None, pre_calculated=False, quantile_over=('ensemble_member',))
Make a plume plot, showing plumes for custom quantiles
- Parameters
ax (matplotlib.axes._subplots.AxesSubplot) – Axes on which to make the plot
quantiles_plumes (list[tuple[tuple, float]]) – Configuration to use when plotting quantiles. Each element is a tuple whose first element is itself a tuple and whose second element is the alpha to use for the quantile. If the first element has length two, the two values are the quantiles between which a plume will be drawn. If the first element has length one, a line will be plotted to represent that quantile.
hue_var (str) – The column of self.meta which should be used to distinguish different hues.
hue_label (str) – Label to use in the legend for hue_var.
palette (dict) – Dictionary defining the colour to use for different values of hue_var.
style_var (str) – The column of self.meta which should be used to distinguish different styles.
style_label (str) – Label to use in the legend for style_var.
dashes (dict) – Dictionary defining the style to use for different values of style_var.
linewidth (float) – Width of lines to use (for quantiles which are not shown as plumes)
time_axis (str) – Time axis to use for the plot (see timeseries())
pre_calculated (bool) – Are the quantiles pre-calculated? If False, the quantiles will be calculated within this function. Pre-calculating the quantiles using ScmRun.quantiles_over() can lead to faster plotting if multiple plots are to be made with the same quantiles.
quantile_over (str, tuple[str]) – Columns of self.meta over which the quantiles should be calculated. Only used if pre_calculated is False.
- Returns
Axes on which the plot was made, along with the legend items (in case, for example, the user wants to move the legend to a different position)
- Return type
matplotlib.axes._subplots.AxesSubplot, list
Examples
>>> scmrun = ScmRun(
...     data=np.random.random((10, 3)).T,
...     columns={
...         "model": ["a_iam"],
...         "climate_model": ["a_model"] * 5 + ["a_model_2"] * 5,
...         "scenario": ["a_scenario"] * 5 + ["a_scenario_2"] * 5,
...         "ensemble_member": list(range(5)) + list(range(5)),
...         "region": ["World"],
...         "variable": ["Surface Air Temperature Change"],
...         "unit": ["K"],
...     },
...     index=[2005, 2010, 2015],
... )
Plot the plumes, calculated over the different ensemble members.
>>> scmrun.plumeplot(quantile_over="ensemble_member")
Pre-calculate the quantiles, then plot
>>> quantiles = [0.05, 0.5, 0.95]
>>> summary_stats = ScmRun(
...     scmrun.quantiles_over("ensemble_member", quantiles=quantiles)
... )
>>> summary_stats.plumeplot(pre_calculated=True)
Note
scmdata
is not a plotting library so this function is provided as is, with little testing. In some ways, it is more intended as inspiration for other users than as a robust plotting tool.
- process_over(cols: Union[str, List[str]], operation: Union[str, Callable[[DataFrame], Union[DataFrame, Series, float]]], na_override=-1000000.0, op_cols=None, as_run=False, **kwargs: Any) DataFrame
Process the data over the input columns.
- Parameters
cols – Columns to perform the operation on. The timeseries will be grouped by all other columns in meta.
operation (str or func) –
The operation to perform.
If a string is provided, the equivalent pandas groupby function is used. Note that not all groupby functions are available, as some do not make sense for this particular application. Additional information about the arguments for the pandas groupby functions can be found in the pandas groupby documentation (https://pandas.pydata.org/pandas-docs/stable/reference/groupby.html).
If a function is provided, it will be applied to each group. The function must take a dataframe as its first argument and return a DataFrame, Series or scalar.
Note that quantile means the value of the data at a given point in the cumulative distribution of values at each point in the timeseries, for each timeseries once the groupby is applied. As a result, using q=0.5 is the same as taking the median, not the same as taking the mean/average.
na_override (scalar) – Convert any nan values in the timeseries meta to this value during processing. The meta values are converted back to nan before the run is returned. This should not need to be changed unless the existing metadata clashes with the default na_override value. This functionality is disabled if na_override is None, but that may result in incorrect results if the timeseries meta includes any nan values.
op_cols (dict of str: str) – Dictionary containing any columns that should be overridden after processing. If a required column from scmdata.ScmRun is specified in cols and as_run=True, an override must be provided for that column in op_cols, otherwise the conversion to scmdata.ScmRun will fail.
as_run (bool or subclass of BaseScmRun) – If True, return the resulting timeseries as an scmdata.ScmRun object; if False, a pandas.DataFrame or pandas.Series is returned (depending on the nature of the operation). Some operations cannot be converted to an scmdata.ScmRun, for example operations which return scalar values rather than timeseries. If a class is provided, the return value will be cast to this class.
**kwargs – Keyword arguments to pass to operation (or to the pandas operation if operation is a string)
- Returns
The result of operation, grouped by all columns in meta other than cols
- Return type
pandas.DataFrame or pandas.Series or scmdata.ScmRun
- Raises
ValueError – If the operation is not an allowed operation, if the value of na_override clashes with any existing metadata, if operation produces a pandas.Series but as_run is True, or if as_run is not True, False or a subclass of scmdata.run.BaseScmRun
scmdata.errors.MissingRequiredColumnError – If as_run is not False and the result does not have the required metadata to convert to an ScmRun. This can be resolved by specifying additional metadata via op_cols
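Under the hood, process_over is essentially a pandas groupby on the timeseries metadata index. The semantics can be sketched with plain pandas (this is an illustration of the behaviour, not scmdata's actual implementation; the metadata names are invented for the example):

```python
import numpy as np
import pandas as pd

# Timeseries in the same shape as ScmRun.timeseries():
# metadata in a MultiIndex, one column per time point.
index = pd.MultiIndex.from_tuples(
    [
        ("a_model", "a_scenario", 0),
        ("a_model", "a_scenario", 1),
        ("a_model", "a_scenario", 2),
    ],
    names=["climate_model", "scenario", "ensemble_member"],
)
ts = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], index=index, columns=[2010, 2020]
)

# process_over(cols="ensemble_member", operation="mean") groups by every
# *other* metadata column and applies the operation to each group.
other_levels = [n for n in ts.index.names if n != "ensemble_member"]
result = ts.groupby(other_levels).mean()
print(result)
#                            2010  2020
# climate_model scenario
# a_model       a_scenario   3.0   4.0
```

The grouped result has one row per unique combination of the remaining metadata, which is why dropping a required column with as_run=True needs an op_cols override before it can be turned back into a run.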
- quantiles_over(cols: Union[str, List[str]], quantiles: Union[str, List[float]], **kwargs: Any) DataFrame
Calculate quantiles of the data over the input columns.
- Parameters
cols – Columns to perform the operation on. The timeseries will be grouped by all other columns in meta.
quantiles – The quantiles to calculate (quantile values between 0 and 1). quantiles can also include the strings "median" or "mean" if these values are to be calculated.
**kwargs – Passed to process_over().
- Returns
The quantiles of the timeseries, grouped by all columns in meta other than cols. Each calculated quantile is given a label which is stored in the quantile column within the output index.
- Return type
pandas.DataFrame
- Raises
TypeError – operation is included in kwargs. The operation is inferred from quantiles.
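The note in process_over about q=0.5 being the median, not the mean, can be checked with a plain pandas sketch of the same grouping (illustrative data and metadata names, not scmdata's implementation):

```python
import numpy as np
import pandas as pd

# Five ensemble members of a single scenario (illustrative data).
index = pd.MultiIndex.from_product(
    [["a_scenario"], range(5)], names=["scenario", "ensemble_member"]
)
ts = pd.DataFrame(
    np.arange(10.0).reshape(5, 2), index=index, columns=[2010, 2020]
)

# quantiles_over("ensemble_member", quantiles=[0.05, 0.5, 0.95]) reduces,
# per remaining group, to the requested quantiles at each time point.
quantiles = ts.groupby("scenario").quantile([0.05, 0.5, 0.95])

# q=0.5 is the median of each time column, not the mean.
assert (quantiles.loc[("a_scenario", 0.5)] == ts.median()).all()
```

In scmdata the quantile label ends up in a dedicated quantile column of the output index, which is what plumeplot consumes when pre_calculated=True.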
- reduce(func, dim=None, axis=None, **kwargs)
Apply a function along a given axis
This provides the GroupBy functionality used by ScmRun.groupby() and is not generally called directly.
This implementation is very bare-bones: no reduction along the time dimension is allowed and only the dim parameter is used.
- Parameters
- Return type
- Raises
ValueError – If a dimension other than None is provided
NotImplementedError – If axis is anything other than 0
- relative_to_ref_period_mean(append_str=None, **kwargs)
Return the timeseries relative to a given reference period mean.
The reference period mean is subtracted from all values in the input timeseries.
- Parameters
- Returns
New object containing the timeseries, adjusted to the reference period mean. The reference period year bounds are stored in the meta columns "reference_period_start_year" and "reference_period_end_year".
- Return type
- Raises
NotImplementedError – append_str is not None
- required_cols = ('model', 'scenario', 'region', 'variable', 'unit')
Minimum metadata columns required by an ScmRun.
If an application requires a different set of required metadata, this can be specified by overriding required_cols on a custom class inheriting from scmdata.run.BaseScmRun. Note that, at a minimum, the ("variable", "unit") columns are required.
- resample(rule: str = 'AS', **kwargs: Any)
Resample the time index of the timeseries data onto a custom grid.
This helper function allows values to be easily interpolated onto annual or monthly timesteps using rule='AS' or rule='MS' respectively. Internally, the interpolate function performs the regridding.
- Parameters
rule – See the pandas user guide for a list of options. Note that Business-related offsets such as “BusinessDay” are not supported.
**kwargs – Other arguments to pass through to
interpolate()
- Returns
New ScmRun instance on a new time index
- Return type
Examples
Resample a run to annual values
>>> scm_df = ScmRun(
...     pd.Series([1, 2, 10], index=(2000, 2001, 2009)),
...     columns={
...         "model": ["a_iam"],
...         "scenario": ["a_scenario"],
...         "region": ["World"],
...         "variable": ["Primary Energy"],
...         "unit": ["EJ/y"],
...     },
... )
>>> scm_df.timeseries().T
model                 a_iam
scenario         a_scenario
region                World
variable     Primary Energy
unit                   EJ/y
year
2000                      1
2001                      2
2009                     10
An annual timeseries can then be created by interpolating to the start of each year using the rule 'AS'.
>>> res = scm_df.resample('AS')
>>> res.timeseries().T
model                          a_iam
scenario                  a_scenario
region                         World
variable              Primary Energy
unit                            EJ/y
time
2000-01-01 00:00:00         1.000000
2001-01-01 00:00:00         2.001825
2002-01-01 00:00:00         3.000912
2003-01-01 00:00:00         4.000000
2004-01-01 00:00:00         4.999088
2005-01-01 00:00:00         6.000912
2006-01-01 00:00:00         7.000000
2007-01-01 00:00:00         7.999088
2008-01-01 00:00:00         8.998175
2009-01-01 00:00:00        10.000000
>>> m_df = scm_df.resample('MS')
>>> m_df.timeseries().T
model                          a_iam
scenario                  a_scenario
region                         World
variable              Primary Energy
unit                            EJ/y
time
2000-01-01 00:00:00         1.000000
2000-02-01 00:00:00         1.084854
2000-03-01 00:00:00         1.164234
2000-04-01 00:00:00         1.249088
2000-05-01 00:00:00         1.331204
2000-06-01 00:00:00         1.416058
2000-07-01 00:00:00         1.498175
2000-08-01 00:00:00         1.583029
2000-09-01 00:00:00         1.667883
...
2008-05-01 00:00:00         9.329380
2008-06-01 00:00:00         9.414234
2008-07-01 00:00:00         9.496350
2008-08-01 00:00:00         9.581204
2008-09-01 00:00:00         9.666058
2008-10-01 00:00:00         9.748175
2008-11-01 00:00:00         9.833029
2008-12-01 00:00:00         9.915146
2009-01-01 00:00:00        10.000000
[109 rows x 1 columns]
Note that the values do not fall exactly on integer values as not all years are exactly the same length.
References
See the pandas documentation for resample (http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html) for more information about possible arguments.
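The regridding that resample performs can be mimicked with plain pandas: place the values on a datetime index and interpolate in time. This is a sketch of the idea, not scmdata's internal code:

```python
import pandas as pd

# Yearly values on a datetime index (same data as the example above).
s = pd.Series(
    [1.0, 2.0, 10.0],
    index=pd.to_datetime(["2000-01-01", "2001-01-01", "2009-01-01"]),
)

# Regrid to the start of each month and fill the gaps by interpolating
# against elapsed time, analogous to ScmRun.resample("MS").
monthly = s.resample("MS").asfreq().interpolate("time")

# Endpoints are preserved and the series spans Jan 2000 to Jan 2009,
# i.e. 109 monthly points, matching the example output above.
print(len(monthly), monthly.iloc[0], monthly.iloc[-1])
```

Time-weighted interpolation is also why the intermediate values do not land exactly on integers: calendar months (and leap years) have different lengths.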
- round(decimals=3, inplace=False)
Round data to a given number of decimal places.
For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc.
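The round-half-to-even ("banker's rounding") behaviour described above is NumPy's default and is easy to verify (illustration only):

```python
import numpy as np

# NumPy rounds ties to the nearest even value, so repeated rounding
# does not systematically bias sums upwards.
vals = np.array([-0.5, 0.5, 1.5, 2.5])
rounded = np.round(vals, 0)
print(rounded)  # [-0.  0.  2.  2.]
```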
- set_meta(dimension: str, value: Any, **filter_kwargs: Any)
Update metadata
Optionally, a subset of the metadata may be modified by passing additional filter_kwargs, which are forwarded to filter(). The metadata associated with timeseries excluded by the filter are not modified.
This method does not preserve the order of the timeseries.
- Parameters
See also
- Returns
A new instance with the updated metadata.
- Return type
BaseScmRun
- subtract(other, op_cols, **kwargs)
Subtract values
- Parameters
op_cols (dict of str: str) – Dictionary whose keys are the columns to drop before subtracting and whose values are the value those columns should hold in the output. For example, if op_cols={"variable": "Emissions|CO2 - Emissions|CO2|Fossil"}, then the subtraction will be performed with an index that uses all columns except the "variable" column, and the output will have a "variable" column with the value "Emissions|CO2 - Emissions|CO2|Fossil".
**kwargs (any) – Passed to prep_for_op()
- Returns
Difference between self and other, using op_cols to define the columns which should be dropped before the data is aligned and to define the value of these columns in the output.
- Return type
Examples
>>> import numpy as np
>>> from scmdata import ScmRun
>>>
>>> IDX = [2010, 2020, 2030]
>>>
>>> start = ScmRun(
...     data=np.arange(18).reshape(3, 6),
...     index=IDX,
...     columns={
...         "variable": [
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Emissions|CO2|Fossil",
...             "Emissions|CO2|AFOLU",
...             "Cumulative Emissions|CO2",
...             "Surface Air Temperature Change",
...         ],
...         "unit": ["GtC / yr", "GtC / yr", "GtC / yr", "GtC / yr", "GtC", "K"],
...         "region": ["World|NH", "World|NH", "World|SH", "World|SH", "World", "World"],
...         "model": "idealised",
...         "scenario": "idealised",
...     },
... )
>>>
>>> start.head()
time                                                           2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable                 unit     region   model     scenario
Emissions|CO2|Fossil     GtC / yr World|NH idealised idealised                 0.0                  6.0                 12.0
Emissions|CO2|AFOLU      GtC / yr World|NH idealised idealised                 1.0                  7.0                 13.0
Emissions|CO2|Fossil     GtC / yr World|SH idealised idealised                 2.0                  8.0                 14.0
Emissions|CO2|AFOLU      GtC / yr World|SH idealised idealised                 3.0                  9.0                 15.0
Cumulative Emissions|CO2 GtC      World    idealised idealised                 4.0                 10.0                 16.0

>>> fos = start.filter(variable="*Fossil")
>>> fos.head()
time                                                           2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable             unit     region   model     scenario
Emissions|CO2|Fossil GtC / yr World|NH idealised idealised                     0.0                  6.0                 12.0
                              World|SH idealised idealised                     2.0                  8.0                 14.0

>>> afolu = start.filter(variable="*AFOLU")
>>> afolu.head()
time                                                          2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
variable            unit     region   model     scenario
Emissions|CO2|AFOLU GtC / yr World|NH idealised idealised                     1.0                  7.0                 13.0
                             World|SH idealised idealised                     3.0                  9.0                 15.0

>>> fos_minus_afolu = fos.subtract(
...     afolu, op_cols={"variable": "Emissions|CO2|Fossil - AFOLU"}
... )
>>> fos_minus_afolu.head()
time                                                                   2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region   variable                     unit
idealised idealised World|NH Emissions|CO2|Fossil - AFOLU gigatC / a                  -1.0                 -1.0                 -1.0
                    World|SH Emissions|CO2|Fossil - AFOLU gigatC / a                  -1.0                 -1.0                 -1.0

>>> nh = start.filter(region="World|NH")
>>> sh = start.filter(region="World|SH")
>>> nh_minus_sh = nh.subtract(sh, op_cols={"region": "World|NH - SH"})
>>> nh_minus_sh.head()
time                                                                 2010-01-01 00:00:00  2020-01-01 00:00:00  2030-01-01 00:00:00
model     scenario  region        variable             unit
idealised idealised World|NH - SH Emissions|CO2|Fossil gigatC / a                   -2.0                 -2.0                 -2.0
                                  Emissions|CO2|AFOLU  gigatC / a                   -2.0                 -2.0                 -2.0
- tail(*args: Any, **kwargs: Any) DataFrame
Return the tail of self.timeseries().
- Parameters
*args – Passed to self.timeseries().tail()
**kwargs – Passed to self.timeseries().tail()
- Returns
Tail of self.timeseries()
- Return type
pandas.DataFrame
- time_mean(rule: str)
Take the time mean of self.
Note that this method will not copy the metadata attribute to the returned value.
- Parameters
rule (["AC", "AS", "A"]) – How to take the time mean. The names reflect the pandas user guide where they can, but only the options given above are supported. For clarity: if rule is 'AC', the mean is an annual mean, i.e. each time point in the result is the mean of all values for that particular year. If rule is 'AS', the mean is an annual mean centred on the beginning of the year, i.e. each time point in the result is the mean of all values from July 1st of the previous year to June 30th of the given year. If rule is 'A', the mean is an annual mean centred on the end of the year, i.e. each time point in the result is the mean of all values from July 1st of the given year to June 30th of the next year.
- Returns
The time mean of self.
- Return type
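The 'AC' rule (a plain annual mean) can be sketched with pandas by grouping on the calendar year. This illustrates the semantics only, not scmdata's implementation:

```python
import pandas as pd

# Monthly data for two years; values chosen so the annual means are obvious.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
s = pd.Series([1.0] * 12 + [3.0] * 12, index=idx)

# rule="AC": each output point is the mean of all values in that calendar year.
annual_mean = s.groupby(s.index.year).mean()
print(annual_mean)
# 2000    1.0
# 2001    3.0
```

The 'AS' and 'A' rules differ only in how the twelve-month window is anchored (centred on the start or end of the year respectively) rather than following calendar-year boundaries.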
- property time_points
Time points of the data
- Return type
- timeseries(meta=None, check_duplicated=True, time_axis=None, drop_all_nan_times=False)
Return the data with metadata as a pandas.DataFrame.
- Parameters
meta (list[str]) – The list of meta columns that will be included in the output's MultiIndex. If None (default), then all metadata will be used.
check_duplicated (bool) – If True, an exception is raised if any of the timeseries have duplicated metadata.
time_axis ({None, "year", "year-month", "days since 1970-01-01", "seconds since 1970-01-01"}) – See long_data() for a description of the options.
drop_all_nan_times (bool) – Should time points which contain only nan values be dropped? This operation is applied after any transforms introduced by the value of time_axis.
- Returns
DataFrame with datetimes as columns and timeseries as rows. Metadata is in the index.
- Return type
- Raises
NonUniqueMetadataError – If the metadata are not unique between timeseries and check_duplicated is True
NotImplementedError – If the value of time_axis is not recognised
ValueError – If the value of time_axis would result in columns which aren't unique
- to_csv(fname: str, **kwargs: Any) None
Write timeseries data to a csv file
- Parameters
fname – Path to write the file into
- to_iamdataframe() None
Convert to a LongDatetimeIamDataFrame instance.
LongDatetimeIamDataFrame is a subclass of pyam.IamDataFrame. We use LongDatetimeIamDataFrame to ensure all times can be handled; see the docstring of LongDatetimeIamDataFrame for details.
- Returns
LongDatetimeIamDataFrame instance containing the same data.
- Return type
LongDatetimeIamDataFrame
- Raises
ImportError – If pyam is not installed
- to_nc(fname, dimensions=('region',), extras=(), **kwargs)
Write timeseries to disk as a netCDF4 file
Each unique variable will be written as a variable within the netCDF file. Choosing the dimensions and extras such that there are as few empty (or nan) values as possible will lead to the best compression on disk.
- Parameters
fname (str) – Path to write the file into
dimensions (iterable of str) – Dimensions to include in the netCDF file. The time dimension is always included (if not provided it will be the last dimension). An additional dimension (specifically a co-ordinate in xarray terms), "_id", will be included if extras is provided and any of the metadata in extras is not uniquely defined by dimensions. "_id" maps the timeseries in each variable to their relevant metadata.
extras (iterable of str) – Metadata columns to write as variables in the netCDF file (specifically as "non-dimension co-ordinates" in xarray terms; see the xarray terminology for more details). Where possible, these non-dimension co-ordinates will use dimension co-ordinates as their own co-ordinates. However, if the metadata in extras is not defined by a single dimension in dimensions, then the extras co-ordinates will have dimensions of "_id". This "_id" co-ordinate maps the values in the extras co-ordinates to each timeseries in the serialised dataset. Where "_id" is required, an extra "_id" dimension will also be added to dimensions.
kwargs – Passed through to xarray.Dataset.to_netcdf()
See also
- to_xarray(dimensions=('region',), extras=(), unify_units=True)
Convert to an xarray.Dataset.
- Parameters
dimensions (iterable of str) – Dimensions for each variable in the returned dataset. If an "_id" co-ordinate is required (see the extras documentation for when "_id" is required) and is not included in dimensions, then it will be the last dimension (or the second-last dimension if "time" is also not included in dimensions). If "time" is not included in dimensions, it will be the last dimension.
extras (iterable of str) – Columns in self.meta from which to create "non-dimension co-ordinates" (see the xarray terminology for more details). These non-dimension co-ordinates store extra information and can be mapped to each timeseries found in the data variables of the output xarray.Dataset. Where possible, these non-dimension co-ordinates will use dimension co-ordinates as their own co-ordinates. However, if the metadata in extras is not defined by a single dimension in dimensions, then the extras co-ordinates will have dimensions of "_id". This "_id" co-ordinate maps the values in the extras co-ordinates to each timeseries in the serialised dataset. Where "_id" is required, an extra "_id" dimension will also be added to dimensions.
unify_units (bool) – If a given variable has multiple units, should we attempt to unify them?
- Returns
Data in self, re-formatted as an xarray.Dataset
- Return type
- Raises
ValueError – If a variable has multiple units and unify_units is False.
ValueError – If a variable has multiple units which cannot be converted to a common unit because they have different base units.
- property values: ndarray
Timeseries values without metadata
The values are returned such that each row is a different timeseries and each column is a different time (although no time information is included, as a plain numpy.ndarray is returned).
- Returns
The array in the same shape as ScmRun.shape(), that is (num_timeseries, num_timesteps).
- Return type
np.ndarray
- scmdata.run.run_append(runs: List[BaseScmRun], inplace: bool = False, duplicate_msg: Union[str, bool] = True, metadata: Optional[Dict[str, Union[str, int, float]]] = None) Optional[BaseScmRun] [source]
Append together many objects.
When appending many objects, it may be more efficient to call this routine once with a list of ScmRun's than to use ScmRun.append() multiple times.
If timeseries with duplicate metadata are found, the timeseries are appended and values falling on the same timestep are averaged if duplicate_msg is not "return". If duplicate_msg is "return", then the result will contain the duplicated timeseries for further inspection.

>>> res = base.append(other, duplicate_msg="return")
<scmdata.ScmRun (timeseries: 5, timepoints: 3)>
Time:
    Start: 2005-01-01T00:00:00
    End: 2015-06-12T00:00:00
Meta:
          scenario             variable  model climate_model region   unit
    0   a_scenario       Primary Energy  a_iam       a_model  World  EJ/yr
    1   a_scenario  Primary Energy|Coal  a_iam       a_model  World  EJ/yr
    2  a_scenario2       Primary Energy  a_iam       a_model  World  EJ/yr
    3  a_scenario3       Primary Energy  a_iam       a_model  World  EJ/yr
    4   a_scenario       Primary Energy  a_iam       a_model  World  EJ/yr
>>> ts = res.timeseries(check_duplicated=False)
>>> ts[ts.index.duplicated(keep=False)]
time                                                        2005-01-01  ...  2015-06-12
scenario   variable       model climate_model region unit              ...
a_scenario Primary Energy a_iam a_model       World  EJ/yr        1.0  ...         7.0
                                                     EJ/yr       -1.0  ...         1.0
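The averaging of duplicated timeseries can be sketched with plain pandas: after concatenation, rows with identical metadata are grouped and averaged. This is an illustration of the behaviour (with a cut-down metadata index), not scmdata's code:

```python
import pandas as pd

meta = ["scenario", "variable"]
idx = pd.MultiIndex.from_tuples(
    [("a_scenario", "Primary Energy")], names=meta
)
# Two runs holding the same metadata but different values
# (the same values as the duplicated rows in the example above).
a = pd.DataFrame([[1.0, 7.0]], index=idx, columns=[2005, 2015])
b = pd.DataFrame([[-1.0, 1.0]], index=idx, columns=[2005, 2015])

# Appending produces duplicated metadata; values on the same timestep
# are averaged, as run_append does when duplicate_msg is not "return".
combined = pd.concat([a, b])
averaged = combined.groupby(level=meta).mean()
print(averaged)
#                            2005  2015
# scenario   variable
# a_scenario Primary Energy   0.0   4.0
```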
- Parameters
runs (list of ScmRun) – The runs to append. Values will be attempted to be cast to ScmRun.
inplace – If True, then the operation updates the first item in runs and returns None.
duplicate_msg – If True, raise a NonUniqueMetadataError error so the user can see the duplicate timeseries. If False, take the average and do not raise a warning or error. If "warn", raise a warning if duplicate data is detected.
metadata – If not None, override the metadata of the resulting ScmRun with metadata. Otherwise, the metadata for the runs are merged. In the case where there are duplicate metadata keys, the values from the first run are used.
- Returns
If not inplace, the return value is the object containing the merged data. The resultant class will be determined by the type of the first object.
- Return type
- Raises
TypeError – If inplace is True but the first element in runs is not an instance of ScmRun, or if the runs argument is not a list
ValueError – If the duplicate_msg option is not recognised, or if no runs are provided to be appended