ScmRun
Suggestions for update: add examples of handling of timeseries interpolation plus how the guessing works
In this notebook we provide an overview of the capabilities provided by scmdata’s ScmRun class. ScmRun provides an efficient interface for analysing timeseries data.
Imports
import traceback
import numpy as np
from openscm_units import unit_registry as ur
from pint.errors import DimensionalityError
from scmdata import ScmRun
from scmdata.errors import NonUniqueMetadataError
Loading data
ScmRun instances can read many different data types and can be loaded in many different ways.
For a full explanation, see the docstring of ScmRun’s __init__ method.
print(ScmRun.__init__.__doc__)
Initialize the container with timeseries data.
Parameters
----------
data: Union[ScmRun, IamDataFrame, pd.DataFrame, np.ndarray, str, pathlib.Path]
If a :class:`ScmRun <scmdata.run.ScmRun>` object is provided, then a new
:class:`ScmRun <scmdata.run.ScmRun>` is created with a copy of the values and metadata from :obj:
`data`.
A :class:`pandas.DataFrame` with IAMC-format data columns (the result from
:func:`ScmRun.timeseries()`) can be provided without any additional
:obj:`columns` and :obj:`index` information.
If a numpy array of timeseries data is provided, :obj:`columns` and
:obj:`index` must also be specified. The shape of the numpy array should be
``(n_times, n_series)`` where `n_times` is the number of timesteps and
`n_series` is the number of time series.
If a string or :class:`pathlib.Path` is passed, data will be attempted to be
read from file.
Currently, reading from CSV, gzipped CSV and Excel formatted files is
supported. The string could be a URL in a format handled by pandas.
Valid URL schemes include http, ftp, s3, gs, and file if pandas>1.2
is used. For more information about the remote formats that can be read,
see the ``pd.read_csv`` documentation for the version of pandas
which is installed.
If no data is provided than an empty :class:`ScmRun <scmdata.run.ScmRun>`
object is created.
index: np.ndarray
If :obj:`index` is not ``None``, then the :obj:`index` is used as the timesteps
for run. All timeseries in the run use the same set of timesteps.
The values will be attempted to be converted to :class:`numpy.datetime[s]` values.
Possible input formats include :
* :class:`datetime.datetime`
* :obj:`int` Start of year
* :obj:`float` Decimal year
* :obj:`str` Uses :func:`dateutil.parser`. Slow and should be avoided if possible
If :obj:`index` is ``None``, than the time index will be obtained from the
:obj:`data` if possible.
columns
If None, ScmRun will attempt to infer the values from the source.
Otherwise, use this dict to write the metadata for each timeseries in data.
For each metadata key (e.g. "model", "scenario"), an array of values (one
per time series) is expected. Alternatively, providing a list of length 1
applies the same value to all timeseries in data. For example, if you had
three timeseries from 'rcp26' for 3 different models 'model', 'model2' and
'model3', the column dict would look like either 'col_1' or 'col_2':
.. code:: python
>>> d = [[1, 2, 3]]
>>> index = [2010]
>>> col_1 = {
... "scenario": ["rcp26"],
... "model": ["model1", "model2", "model3"],
... "region": ["unspecified"],
... "variable": ["unspecified"],
... "unit": ["unspecified"],
... }
>>> single_value_init = ScmRun(d, index, columns=col_1)
>>> col_2 = {
... "scenario": ["rcp26", "rcp26", "rcp26"],
... "model": ["model1", "model2", "model3"],
... "region": ["unspecified"],
... "variable": ["unspecified"],
... "unit": ["unspecified"],
... }
>>> multi_value_init = ScmRun(d, index, columns=col_2)
>>> pd.testing.assert_frame_equal(
... single_value_init.meta, multi_value_init.meta
... )
metadata:
Optional dictionary of metadata for instance as a whole.
This can be used to store information such as the longer-form information
about a particular dataset, for example, dataset description or DOIs.
Defaults to an empty :obj:`dict` if no default metadata are provided.
copy_data: bool
If True, an explicit copy of data is performed.
.. note::
The copy can be very expensive on large timeseries and should only be needed
in cases where the original data is manipulated.
**kwargs:
Additional parameters passed to :func:`_read_file` to read files
Raises
------
ValueError
* If you try to load from multiple files at once. If you wish to do this,
please use :func:`scmdata.run.run_append` instead.
* Not specifying :obj:`index` and :obj:`columns` if :obj:`data` is a
:class:`numpy.ndarray`
:class:`scmdata.errors.MissingRequiredColumn`
If metadata for :attr:`required_cols` is not found
TypeError
Timeseries cannot be read from :obj:`data`
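The docstring above lists several accepted formats for index values: datetime objects, integers (start of year) and floats (decimal years). As a rough illustration of what such a conversion could look like, here is a minimal sketch using only the standard library. The helper names are hypothetical and this is not scmdata's implementation; in particular, treating the fractional part as a fraction of that year's length is an assumption, and string parsing is left out.

```python
import datetime as dt


def decimal_year_to_datetime(value: float) -> dt.datetime:
    """Convert a decimal year (e.g. 2010.5) to a datetime.

    Hypothetical helper for illustration; not scmdata's implementation.
    """
    year = int(value)
    start = dt.datetime(year, 1, 1)
    end = dt.datetime(year + 1, 1, 1)
    # Assumption: the fractional part is a fraction of that year's length
    return start + (end - start) * (value - year)


def to_datetime(value):
    """Handle three of the documented input formats: datetime, int and float."""
    if isinstance(value, dt.datetime):
        return value
    if isinstance(value, int):
        return dt.datetime(value, 1, 1)  # integer: start of year
    if isinstance(value, float):
        return decimal_year_to_datetime(value)
    raise TypeError(f"unsupported time value: {value!r}")


converted = [to_datetime(v) for v in [2010, 2010.5, dt.datetime(2011, 3, 1)]]
print(converted)
```

Under these assumptions, 2010.5 lands halfway through 2010 (midday on 2 July), since 2010 has 365 days.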
Here we load data from a file.
Note: here we load RCP26 emissions data. This originally came from http://www.pik-potsdam.de/~mmalte/rcps/ and has since been re-written, using the pymagicc library, into a format which can be read by scmdata. We are not currently planning on importing Pymagicc’s readers into scmdata by default; please raise an issue here if you would like us to consider doing so.
rcp26 = ScmRun("rcp26_emissions.csv", lowercase_cols=True)
Timeseries
ScmRun is ideally suited to working with timeseries data.
The timeseries method allows you to easily get the data back in wide format as a pandas DataFrame.
Here ‘wide’ format refers to representing each timeseries as a row, with the metadata contained in the row labels.
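To make the distinction between long and wide format concrete, the following sketch builds a small long-format table with plain pandas and pivots it into the wide layout described above. The values are made up for illustration and the code does not use scmdata at all.

```python
import pandas as pd

# Long format: one row per (timeseries, time) observation
long_df = pd.DataFrame(
    {
        "variable": ["Emissions|BC", "Emissions|BC", "Emissions|CH4", "Emissions|CH4"],
        "unit": ["Mt BC / yr", "Mt BC / yr", "Mt CH4 / yr", "Mt CH4 / yr"],
        "time": [2010, 2020, 2010, 2020],
        "value": [1.0, 2.0, 10.0, 20.0],  # illustrative values only
    }
)

# Wide format: one row per timeseries, metadata in the row labels,
# one column per timestep
wide_df = long_df.pivot_table(
    index=["variable", "unit"], columns="time", values="value"
)
print(wide_df)
```

Each row of wide_df is a single timeseries and its row label (a MultiIndex entry) carries the metadata, which is the same layout as the output of timeseries().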
rcp26.timeseries().head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt BC / yr | Emissions|BC | 0.000000 | 0.106998 | 0.133383 | 0.159847 | 0.186393 | 0.213024 | 0.239742 | 0.266550 | 0.293450 | 0.320446 | ... | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 |
kt C2F6 / yr | Emissions|C2F6 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | |||
kt C6F14 / yr | Emissions|C6F14 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | |||
kt CCl4 / yr | Emissions|CCl4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |||
kt CF4 / yr | Emissions|CF4 | 0.010763 | 0.010752 | 0.010748 | 0.010744 | 0.010740 | 0.010736 | 0.010731 | 0.010727 | 0.010723 | 0.010719 | ... | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 |
5 rows × 736 columns
type(rcp26.timeseries())
pandas.core.frame.DataFrame
Operations with scalars
Basic operations with scalars are easily performed.
rcp26.head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt BC / yr | Emissions|BC | 0.000000 | 0.106998 | 0.133383 | 0.159847 | 0.186393 | 0.213024 | 0.239742 | 0.266550 | 0.293450 | 0.320446 | ... | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 |
kt C2F6 / yr | Emissions|C2F6 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | 0.0857 | |||
kt C6F14 / yr | Emissions|C6F14 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | 0.0887 | |||
kt CCl4 / yr | Emissions|CCl4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |||
kt CF4 / yr | Emissions|CF4 | 0.010763 | 0.010752 | 0.010748 | 0.010744 | 0.010740 | 0.010736 | 0.010731 | 0.010727 | 0.010723 | 0.010719 | ... | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 | 1.0920 |
5 rows × 736 columns
(rcp26 + 2).head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Gt C / yr | Emissions|CO2|MAGICC AFOLU | 2.000 | 2.005338 | 2.010677 | 2.016015 | 2.021353 | 2.026691 | 2.032030 | 2.037368 | 2.042706 | 2.048045 | ... | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 | 2.0000 |
Emissions|CO2|MAGICC Fossil and Industrial | 2.003 | 2.003000 | 2.003000 | 2.003000 | 2.003000 | 2.003000 | 2.004000 | 2.004000 | 2.004000 | 2.004000 | ... | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | 1.0692 | ||||
Mt BC / yr | Emissions|BC | 2.000 | 2.106998 | 2.133383 | 2.159847 | 2.186393 | 2.213024 | 2.239742 | 2.266550 | 2.293450 | 2.320446 | ... | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | 5.3578 | |||
Mt CH4 / yr | Emissions|CH4 | 2.000 | 3.963262 | 4.436448 | 4.911105 | 5.387278 | 5.865015 | 6.344362 | 6.825372 | 7.308094 | 7.792582 | ... | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | 144.0527 | |||
Mt CO / yr | Emissions|CO | 2.000 | 11.050221 | 14.960844 | 18.876539 | 22.797465 | 26.723782 | 30.655658 | 34.593264 | 38.536778 | 42.486382 | ... | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 | 609.8438 |
5 rows × 736 columns
(rcp26 / 4).head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Gt C / yr | Emissions|CO2|MAGICC AFOLU | 0.00000 | 0.001335 | 0.002669 | 0.004004 | 0.005338 | 0.006673 | 0.008007 | 0.009342 | 0.010677 | 0.012011 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Emissions|CO2|MAGICC Fossil and Industrial | 0.00075 | 0.000750 | 0.000750 | 0.000750 | 0.000750 | 0.000750 | 0.001000 | 0.001000 | 0.001000 | 0.001000 | ... | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | -0.232700 | ||||
Mt BC / yr | Emissions|BC | 0.00000 | 0.026749 | 0.033346 | 0.039962 | 0.046598 | 0.053256 | 0.059935 | 0.066637 | 0.073362 | 0.080112 | ... | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | 0.839450 | |||
Mt CH4 / yr | Emissions|CH4 | 0.00000 | 0.490815 | 0.609112 | 0.727776 | 0.846820 | 0.966254 | 1.086091 | 1.206343 | 1.327023 | 1.448145 | ... | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | 35.513175 | |||
Mt CO / yr | Emissions|CO | 0.00000 | 2.262555 | 3.240211 | 4.219135 | 5.199366 | 6.180945 | 7.163915 | 8.148316 | 9.134195 | 10.121595 | ... | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 | 151.960950 |
5 rows × 736 columns
ScmRun instances also support operations with Pint scalars, permitting automatic unit conversion and error raising. For interested readers, the scmdata package uses the OpenSCM-Units unit registry.
to_add = 500 * ur("MtCO2 / yr")
If we try to add 0.5 GtCO2 / yr (i.e. 500 MtCO2 / yr) to all the timeseries, we’ll get a DimensionalityError because most of the timeseries have units which cannot be converted to CO2.
try:
    rcp26 + to_add
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'megatCO2 / a' ([carbon] * [mass] / [time])
However, if we filter things correctly, this operation is perfectly valid.
(rcp26.filter(variable="Emissions|CO2|MAGICC AFOLU") + to_add).head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | C * gigametric_ton / a | Emissions|CO2|MAGICC AFOLU | 0.136364 | 0.141702 | 0.14704 | 0.152379 | 0.157717 | 0.163055 | 0.168393 | 0.173732 | 0.17907 | 0.184408 | ... | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 | 0.136364 |
1 rows × 736 columns
This can be compared to the raw data as shown below.
rcp26.filter(variable="Emissions|CO2|MAGICC AFOLU").head()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Gt C / yr | Emissions|CO2|MAGICC AFOLU | 0.0 | 0.005338 | 0.010677 | 0.016015 | 0.021353 | 0.026691 | 0.03203 | 0.037368 | 0.042706 | 0.048045 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 rows × 736 columns
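The two tables above can be checked by hand. The sketch below redoes the conversion of 500 Mt CO2 / yr into Gt C / yr with plain arithmetic, assuming a carbon-to-CO2 mass ratio of 12 / 44 (an assumption that is consistent with the numbers shown, not something taken from the OpenSCM-Units source).

```python
# Hand calculation of 500 Mt CO2 / yr expressed in Gt C / yr.
# Assumption: a carbon-to-CO2 mass ratio of 12 / 44.
MT_PER_GT = 1000.0
C_PER_CO2 = 12.0 / 44.0

to_add_mtco2 = 500.0
to_add_gtc = to_add_mtco2 * C_PER_CO2 / MT_PER_GT

# Emissions|CO2|MAGICC AFOLU in 1766, in Gt C / yr (read from the raw table)
raw_1766 = 0.005338

print(round(to_add_gtc, 6))  # offset applied to every timestep
print(round(raw_1766 + to_add_gtc, 6))  # value expected in 1766 after adding
```

The first number matches the 1765 value of the summed timeseries (where the raw value is zero) and the second matches its 1766 value.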
Unit conversion
The scmdata package uses the OpenSCM-Units unit registry and the Pint library to handle unit conversion.
Calling the convert_unit method of an ScmRun returns a new ScmRun instance with converted units.
rcp26.filter(variable="Emissions|BC").timeseries()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt BC / yr | Emissions|BC | 0.0 | 0.106998 | 0.133383 | 0.159847 | 0.186393 | 0.213024 | 0.239742 | 0.26655 | 0.29345 | 0.320446 | ... | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 | 3.3578 |
1 rows × 736 columns
rcp26.filter(variable="Emissions|BC").convert_unit("kg BC / day").timeseries()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | kg BC / day | Emissions|BC | 0.0 | 292944.558522 | 365181.6564 | 437636.605065 | 510316.112252 | 583227.186858 | 656376.947296 | 729772.785763 | 803422.313484 | 877333.360712 | ... | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 | 9.193155e+06 |
1 rows × 736 columns
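As a sanity check on the conversion above, the 1766 value can be reproduced with plain arithmetic. This sketch assumes that a year is 365.25 days long, which is consistent with the converted values shown.

```python
KG_PER_MT = 1e9  # 1 Mt = 1e9 kg
DAYS_PER_YEAR = 365.25  # assumption: a Julian year

# Emissions|BC in 1766, in Mt BC / yr (read from the unconverted table)
bc_1766_mt_per_yr = 0.106998

bc_1766_kg_per_day = bc_1766_mt_per_yr * KG_PER_MT / DAYS_PER_YEAR
print(round(bc_1766_kg_per_day, 3))
```

The result agrees with the 1766 entry of the converted table to within the rounding of the displayed input.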
Note that you must filter your data first, as the unit conversion is applied to all available variables. If you do not, you will receive a DimensionalityError.
try:
    rcp26.convert_unit("kg BC / day").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'C * gigametric_ton / a' ([carbon] * [mass] / [time]) to 'BC * kilogram / day' ([black_carbon] * [mass] / [time])
Having said this, thanks to Pint’s idea of contexts, we are able to trivially convert to CO2 equivalent units (as long as we restrict our conversion to variables which have a CO2 equivalent).
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).timeseries()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt CH4 / yr | Emissions|CH4 | 0.000 | 1.963262 | 2.436448 | 2.911105 | 3.387278 | 3.865015 | 4.344362 | 4.825372 | 5.308094 | 5.792582 | ... | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 | 142.0527 |
Gt C / yr | Emissions|CO2|MAGICC AFOLU | 0.000 | 0.005338 | 0.010677 | 0.016015 | 0.021353 | 0.026691 | 0.032030 | 0.037368 | 0.042706 | 0.048045 | ... | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |||
Emissions|CO2|MAGICC Fossil and Industrial | 0.003 | 0.003000 | 0.003000 | 0.003000 | 0.003000 | 0.003000 | 0.004000 | 0.004000 | 0.004000 | 0.004000 | ... | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | -0.9308 | ||||
Mt N2ON / yr | Emissions|N2O | 0.000 | 0.005191 | 0.010117 | 0.015043 | 0.019969 | 0.024896 | 0.029822 | 0.034750 | 0.039677 | 0.044605 | ... | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 | 5.2823 |
4 rows × 736 columns
rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
).timeseries()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | unit_context | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|CO2|MAGICC AFOLU | 0.0 | 19.573753 | 39.147508 | 58.721260 | 78.295012 | 97.868767 | 117.442519 | 137.016271 | 156.590027 | 176.163779 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Emissions|CO2|MAGICC Fossil and Industrial | 11.0 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 14.666666 | 14.666666 | 14.666666 | 14.666666 | ... | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | |||||
Emissions|CH4 | 0.0 | 49.081547 | 60.911202 | 72.777625 | 84.681955 | 96.625365 | 108.609062 | 120.634295 | 132.702345 | 144.814540 | ... | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | |||||
Emissions|N2O | 0.0 | 2.430911 | 4.737559 | 7.044330 | 9.351227 | 11.658254 | 13.965417 | 16.272717 | 18.580161 | 20.887751 | ... | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 |
4 rows × 736 columns
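The CO2-equivalent values above can be approximated by hand from the 1766 inputs. The sketch below uses the AR4 100-year global warming potentials (25 for CH4 and 298 for N2O) and assumes simple mass ratios of 44 / 12 (CO2 per C) and 44 / 28 (N2O per unit of nitrogen in N2O). These ratios are assumptions that happen to be consistent with the table, not values taken from the OpenSCM-Units source.

```python
# AR4 100-year global warming potentials (CO2 = 1)
AR4_GWP100 = {"CO2": 1.0, "CH4": 25.0, "N2O": 298.0}

# Assumed mass ratios for converting the source units to the gas itself
CO2_PER_C = 44.0 / 12.0  # Gt C -> Gt CO2
N2O_PER_N2ON = 44.0 / 28.0  # Mt N2ON (nitrogen content) -> Mt N2O

# 1766 inputs read from the unconverted table
ch4 = 1.963262 * AR4_GWP100["CH4"]  # Mt CH4 / yr -> Mt CO2 / yr
n2o = 0.005191 * N2O_PER_N2ON * AR4_GWP100["N2O"]  # Mt N2ON / yr -> Mt CO2 / yr
afolu = 0.005338 * 1000 * CO2_PER_C * AR4_GWP100["CO2"]  # Gt C / yr -> Mt CO2 / yr

print(ch4, n2o, afolu)
```

The results differ from the table only in the later decimal places, because the inputs used here are the rounded values displayed above.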
Without the context, a DimensionalityError is once again raised.
try:
    rcp26.convert_unit("Mt CO2 / yr").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'CO2 * megametric_ton / a' ([carbon] * [mass] / [time])
In addition, when we do a conversion with contexts, the context information is automatically added to the metadata. This ensures we can’t accidentally use a different context for further conversions.
ar4gwp100_converted = rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
)
ar4gwp100_converted.timeseries()
time | 1765-01-01 00:00:00 | 1766-01-01 00:00:00 | 1767-01-01 00:00:00 | 1768-01-01 00:00:00 | 1769-01-01 00:00:00 | 1770-01-01 00:00:00 | 1771-01-01 00:00:00 | 1772-01-01 00:00:00 | 1773-01-01 00:00:00 | 1774-01-01 00:00:00 | ... | 2491-01-01 00:00:00 | 2492-01-01 00:00:00 | 2493-01-01 00:00:00 | 2494-01-01 00:00:00 | 2495-01-01 00:00:00 | 2496-01-01 00:00:00 | 2497-01-01 00:00:00 | 2498-01-01 00:00:00 | 2499-01-01 00:00:00 | 2500-01-01 00:00:00 | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
model | region | scenario | unit | unit_context | variable | |||||||||||||||||||||
IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|CO2|MAGICC AFOLU | 0.0 | 19.573753 | 39.147508 | 58.721260 | 78.295012 | 97.868767 | 117.442519 | 137.016271 | 156.590027 | 176.163779 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
Emissions|CO2|MAGICC Fossil and Industrial | 11.0 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 14.666666 | 14.666666 | 14.666666 | 14.666666 | ... | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | -3412.933333 | |||||
Emissions|CH4 | 0.0 | 49.081547 | 60.911202 | 72.777625 | 84.681955 | 96.625365 | 108.609062 | 120.634295 | 132.702345 | 144.814540 | ... | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | 3551.317500 | |||||
Emissions|N2O | 0.0 | 2.430911 | 4.737559 | 7.044330 | 9.351227 | 11.658254 | 13.965417 | 16.272717 | 18.580161 | 20.887751 | ... | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 | 2473.625629 |
4 rows × 736 columns
Trying to convert without a context, or with a different context, raises an error.
try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `None`, drop `unit_context` metadata before doing conversion
try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr", context="AR5GWP100")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `AR5GWP100`, drop `unit_context` metadata before doing conversion
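The guard behind these two errors can be pictured as a simple comparison between the context already stored in the metadata and the context requested for the new conversion. The following is a minimal sketch of that idea using a plain pandas DataFrame. The function check_unit_context is hypothetical and only mimics the behaviour shown above; it is not scmdata's code.

```python
import pandas as pd


def check_unit_context(meta: pd.DataFrame, context=None) -> None:
    """Raise if existing ``unit_context`` metadata conflicts with ``context``.

    Hypothetical helper; a sketch of the behaviour shown above only.
    """
    if "unit_context" not in meta:
        return  # nothing recorded yet, so any context is acceptable
    existing = meta["unit_context"].unique().tolist()
    if existing != [context]:
        raise ValueError(
            f"Existing unit conversion context(s), {existing}, "
            f"doesn't match input context, {context}"
        )


meta = pd.DataFrame({"variable": ["Emissions|CH4"], "unit_context": ["AR4GWP100"]})
check_unit_context(meta, context="AR4GWP100")  # matches, so no error

for bad_context in (None, "AR5GWP100"):
    try:
        check_unit_context(meta, context=bad_context)
    except ValueError as exc:
        print(exc)
```

Recording the context alongside the data means that a later conversion has to state explicitly which context it uses, which prevents silently mixing GWP conventions.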
Metadata handling
Each timeseries within an ScmRun object has metadata associated with it. The meta attribute provides this timeseries-specific metadata as a pd.DataFrame. This DataFrame is effectively the index of the DataFrame returned by ScmRun.timeseries().
The timeseries-specific metadata can be modified using the [] notation, which modifies the metadata in place, or alternatively using the set_meta method, which returns a new ScmRun with updated metadata. set_meta also makes it easy to update a subset of timeseries.
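The relationship between the metadata and the index of the wide DataFrame can be illustrated with plain pandas. This is only a sketch with made-up values; it does not use scmdata, and the assign call simply stands in for the copy-returning style of set_meta.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("IMAGE", "Emissions|BC", "Mt BC / yr"),
        ("IMAGE", "Emissions|CH4", "Mt CH4 / yr"),
    ],
    names=["model", "variable", "unit"],
)
# Illustrative values only
wide = pd.DataFrame([[0.0, 0.1], [0.0, 2.0]], index=index, columns=[1765, 1766])

# The per-timeseries metadata is the row index turned into ordinary columns
meta = wide.index.to_frame(index=False)
print(meta)

# Returning a modified copy leaves the original metadata untouched
updated = meta.assign(unit_context="AR4GWP100")
print(updated)
```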
ar4gwp100_converted.meta
model | region | scenario | unit | unit_context | variable | |
---|---|---|---|---|---|---|
0 | IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|CO2|MAGICC AFOLU |
1 | IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|CO2|MAGICC Fossil and Industrial |
2 | IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|CH4 |
3 | IMAGE | World | RCP26 | Mt CO2 / yr | AR4GWP100 | Emissions|N2O |
# Update inplace
ar4gwp100_converted["unit_context"] = "inplace"
ar4gwp100_converted["unit_context"]
0 inplace
1 inplace
2 inplace
3 inplace
Name: unit_context, dtype: object
# set_meta returns a new `ScmRun` with the updated metadata
ar4gwp100_converted.set_meta(
    "unit_context", "updated-in-set_meta", variable="Emissions|CO2|*"
)
<ScmRun (timeseries: 4, timepoints: 736)>
Time:
Start: 1765-01-01T00:00:00
End: 2500-01-01T00:00:00
Meta:
model region scenario unit unit_context \
0 IMAGE World RCP26 Mt CO2 / yr updated-in-set_meta
1 IMAGE World RCP26 Mt CO2 / yr updated-in-set_meta
2 IMAGE World RCP26 Mt CO2 / yr inplace
3 IMAGE World RCP26 Mt CO2 / yr inplace
variable
0 Emissions|CO2|MAGICC AFOLU
1 Emissions|CO2|MAGICC Fossil and Industrial
2 Emissions|CH4
3 Emissions|N2O
# The original `ScmRun` was not modified by `set_meta`
ar4gwp100_converted
<ScmRun (timeseries: 4, timepoints: 736)>
Time:
Start: 1765-01-01T00:00:00
End: 2500-01-01T00:00:00
Meta:
model region scenario unit unit_context \
0 IMAGE World RCP26 Mt CO2 / yr inplace
1 IMAGE World RCP26 Mt CO2 / yr inplace
2 IMAGE World RCP26 Mt CO2 / yr inplace
3 IMAGE World RCP26 Mt CO2 / yr inplace
variable
0 Emissions|CO2|MAGICC AFOLU
1 Emissions|CO2|MAGICC Fossil and Industrial
2 Emissions|CH4
3 Emissions|N2O
ScmRun instances are strict with respect to metadata handling. If you try to either a) instantiate an ScmRun instance with duplicate metadata or b) change an existing ScmRun instance so that it has duplicate metadata, you will receive a NonUniqueMetadataError.
try:
    ScmRun(
        data=np.arange(6).reshape(2, 3),
        index=[10, 20],
        columns={
            "variable": "Emissions",
            "unit": "Gt",
            "model": "idealised",
            "scenario": "idealised",
            "region": "World",
        },
    )
except NonUniqueMetadataError:
    traceback.print_exc(limit=0, chain=False)
scmdata.errors.NonUniqueMetadataError: Duplicate metadata (numbers show how many times the given metadata is repeated).
model region scenario unit variable repeats
0 idealised World idealised Gt Emissions 3
try:
    rcp26["variable"] = "Emissions|CO2|MAGICC AFOLU"
except NonUniqueMetadataError:
    traceback.print_exc(limit=0, chain=False)
scmdata.errors.NonUniqueMetadataError: Duplicate metadata (numbers show how many times the given metadata is repeated).
model region scenario unit variable repeats
0 IMAGE World RCP26 Gt C / yr Emissions|CO2|MAGICC AFOLU 2
4 IMAGE World RCP26 Mt N / yr Emissions|CO2|MAGICC AFOLU 2
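The "repeats" column in these error messages counts how many timeseries share each metadata combination. A comparable check can be sketched with plain pandas, as below. This is an illustration of the idea, not scmdata's implementation, and it reuses the metadata from the first failing example above.

```python
import pandas as pd

# Three timeseries that all carry identical metadata
meta = pd.DataFrame(
    {
        "model": ["idealised"] * 3,
        "scenario": ["idealised"] * 3,
        "region": ["World"] * 3,
        "variable": ["Emissions"] * 3,
        "unit": ["Gt"] * 3,
    }
)

# Count how often each metadata combination occurs
repeats = meta.groupby(list(meta.columns)).size().reset_index(name="repeats")
duplicated = repeats[repeats["repeats"] > 1]

if not duplicated.empty:
    print("Duplicate metadata:")
    print(duplicated)
```

Because the metadata is what identifies a timeseries, any combination that occurs more than once makes it impossible to tell the corresponding timeseries apart.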
There is also a metadata attribute which provides metadata for the ScmRun instance as a whole.
These metadata can be used to store information about the collection of runs as a whole, such as the file where the data are stored or longer-form information about a particular dataset.
rcp26.metadata["filename"] = "rcp26_emissions.csv"
rcp26.metadata
{'filename': 'rcp26_emissions.csv'}
Convenience methods
Below we showcase a few convenience methods of ScmRun. These will grow over time; please open a pull request adding more where they are useful!
get_unique_meta
This method helps with getting the unique metadata values in an ScmRun. Here we show how it can be useful. Check out its docstring for full details.
By itself, it doesn’t do anything special; it just returns the unique metadata values as a list.
rcp26.get_unique_meta("variable")
['Emissions|CO2|MAGICC AFOLU']
However, it can be useful if you expect there to be only one unique metadata value. In such a case, you can use the no_duplicates argument to ensure that you get a single value as its native type (not a list) and that an error is raised if this isn’t the case.
rcp26.get_unique_meta("model", no_duplicates=True)
'IMAGE'
try:
    rcp26.get_unique_meta("unit", no_duplicates=True)
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: `unit` column is not unique (found values: ['Mt BC / yr', 'kt C2F6 / yr', 'kt C6F14 / yr', 'kt CCl4 / yr', 'kt CF4 / yr', 'kt CFC11 / yr', 'kt CFC113 / yr', 'kt CFC114 / yr', 'kt CFC115 / yr', 'kt CFC12 / yr', 'kt CH3Br / yr', 'kt CH3CCl3 / yr', 'kt CH3Cl / yr', 'Mt CH4 / yr', 'Mt CO / yr', 'Gt C / yr', 'kt HCFC141b / yr', 'kt HCFC142b / yr', 'kt HCFC22 / yr', 'kt HFC125 / yr', 'kt HFC134a / yr', 'kt HFC143a / yr', 'kt HFC227ea / yr', 'kt HFC23 / yr', 'kt HFC245fa / yr', 'kt HFC32 / yr', 'kt HFC4310 / yr', 'kt Halon1202 / yr', 'kt Halon1211 / yr', 'kt Halon1301 / yr', 'kt Halon2402 / yr', 'Mt N2ON / yr', 'Mt N / yr', 'Mt NMVOC / yr', 'Mt OC / yr', 'kt SF6 / yr', 'Mt S / yr'])
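The behaviour of no_duplicates shown above can be summarised in a few lines of pandas. The function unique_meta below is a hypothetical stand-in written for illustration, operating on a plain DataFrame of metadata rather than on an ScmRun; it is not the library's implementation.

```python
import pandas as pd


def unique_meta(meta: pd.DataFrame, column: str, no_duplicates: bool = False):
    """Return the unique values of ``column``.

    Hypothetical stand-in for illustration only.
    """
    values = meta[column].unique().tolist()
    if not no_duplicates:
        return values
    if len(values) != 1:
        raise ValueError(f"`{column}` column is not unique (found values: {values})")
    return values[0]  # single value, returned as its native type


meta = pd.DataFrame(
    {"model": ["IMAGE", "IMAGE"], "unit": ["Mt BC / yr", "Mt CH4 / yr"]}
)
print(unique_meta(meta, "model", no_duplicates=True))
print(unique_meta(meta, "unit"))
```

Using no_duplicates=True doubles as an assertion: code that relies on there being a single model or scenario fails loudly instead of silently picking one of several values.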