ScmRun

Suggestions for update: add examples of handling of timeseries interpolation plus how the guessing works

This notebook provides an overview of scmdata’s ScmRun class, which offers an efficient interface for analysing timeseries data.

Imports

import traceback

import numpy as np
from openscm_units import unit_registry as ur
from pint.errors import DimensionalityError

from scmdata import ScmRun
from scmdata.errors import NonUniqueMetadataError

Loading data

ScmRun can read many different data types and can be loaded in many different ways. For a full explanation, see the docstring of ScmRun’s __init__ method.

print(ScmRun.__init__.__doc__)
        Initialize the container with timeseries data.

        Parameters
        ----------
        data: Union[ScmRun, IamDataFrame, pd.DataFrame, np.ndarray, str, pathlib.Path]
            If a :class:`ScmRun <scmdata.run.ScmRun>` object is provided, then a new
            :class:`ScmRun <scmdata.run.ScmRun>` is created with a copy of the values and metadata from :obj:
            `data`.

            A :class:`pandas.DataFrame` with IAMC-format data columns (the result from
            :func:`ScmRun.timeseries()`) can be provided without any additional
            :obj:`columns` and :obj:`index` information.

            If a numpy array of timeseries data is provided, :obj:`columns` and
            :obj:`index` must also be specified. The shape of the numpy array should be
            ``(n_times, n_series)`` where `n_times` is the number of timesteps and
            `n_series` is the number of time series.

            If a string or :class:`pathlib.Path` is passed, data will be attempted to be
            read from file.

            Currently, reading from CSV, gzipped CSV and Excel formatted files is
            supported. The string could be a URL in a format handled by pandas.
            Valid URL schemes include http, ftp, s3, gs, and file if pandas>1.2
            is used. For more information about the remote formats that can be read,
            see the ``pd.read_csv`` documentation for the version of pandas
            which is installed.

            If no data is provided than an empty :class:`ScmRun <scmdata.run.ScmRun>`
            object is created.

        index: np.ndarray
            If :obj:`index` is not ``None``, then the :obj:`index` is used as the timesteps
            for run. All timeseries in the run use the same set of timesteps.

            The values will be attempted to be converted to :class:`numpy.datetime[s]` values.
            Possible input formats include :

            * :class:`datetime.datetime`
            * :obj:`int` Start of year
            * :obj:`float` Decimal year
            * :obj:`str` Uses :func:`dateutil.parser`. Slow and should be avoided if possible

            If :obj:`index` is ``None``, than the time index will be obtained from the
            :obj:`data` if possible.

        columns
            If None, ScmRun will attempt to infer the values from the source.
            Otherwise, use this dict to write the metadata for each timeseries in data.
            For each metadata key (e.g. "model", "scenario"), an array of values (one
            per time series) is expected. Alternatively, providing a list of length 1
            applies the same value to all timeseries in data. For example, if you had
            three timeseries from 'rcp26' for 3 different models 'model', 'model2' and
            'model3', the column dict would look like either 'col_1' or 'col_2':

            .. code:: python

                >>> d = [[1, 2, 3]]
                >>> index = [2010]
                >>> col_1 = {
                ...     "scenario": ["rcp26"],
                ...     "model": ["model1", "model2", "model3"],
                ...     "region": ["unspecified"],
                ...     "variable": ["unspecified"],
                ...     "unit": ["unspecified"],
                ... }
                >>> single_value_init = ScmRun(d, index, columns=col_1)
                >>> col_2 = {
                ...     "scenario": ["rcp26", "rcp26", "rcp26"],
                ...     "model": ["model1", "model2", "model3"],
                ...     "region": ["unspecified"],
                ...     "variable": ["unspecified"],
                ...     "unit": ["unspecified"],
                ... }
                >>> multi_value_init = ScmRun(d, index, columns=col_2)
                >>> pd.testing.assert_frame_equal(
                ...     single_value_init.meta, multi_value_init.meta
                ... )

        metadata:
            Optional dictionary of metadata for instance as a whole.

            This can be used to store information such as the longer-form information
            about a particular dataset, for example, dataset description or DOIs.

            Defaults to an empty :obj:`dict` if no default metadata are provided.

        copy_data: bool
            If True, an explicit copy of data is performed.

            .. note::
                The copy can be very expensive on large timeseries and should only be needed
                in cases where the original data is manipulated.

        **kwargs:
            Additional parameters passed to :func:`_read_file` to read files

        Raises
        ------
        ValueError
            * If you try to load from multiple files at once. If you wish to do this,
                please use :func:`scmdata.run.run_append` instead.
            * Not specifying :obj:`index` and :obj:`columns` if :obj:`data` is a
                :class:`numpy.ndarray`

        :class:`scmdata.errors.MissingRequiredColumn`
            If metadata for :attr:`required_cols` is not found

        TypeError
            Timeseries cannot be read from :obj:`data`
        

Here we load data from a file.

Note: here we load RCP26 emissions data. The data originally came from http://www.pik-potsdam.de/~mmalte/rcps/ and has since been rewritten, using the pymagicc library, into a format which scmdata can read. We are not currently planning on importing Pymagicc’s readers into scmdata by default. Please raise an issue if you would like us to consider doing so.

rcp26 = ScmRun("rcp26_emissions.csv", lowercase_cols=True)

Timeseries

ScmRun is ideally suited to working with timeseries data. The timeseries method returns the data in wide format as a pandas DataFrame. Here ‘wide’ format means that each timeseries is a row, with the metadata held in the row labels (the index) and one column per timestep.

rcp26.timeseries().head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Mt BC / yr Emissions|BC 0.000000 0.106998 0.133383 0.159847 0.186393 0.213024 0.239742 0.266550 0.293450 0.320446 ... 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578
kt C2F6 / yr Emissions|C2F6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857
kt C6F14 / yr Emissions|C6F14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887
kt CCl4 / yr Emissions|CCl4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
kt CF4 / yr Emissions|CF4 0.010763 0.010752 0.010748 0.010744 0.010740 0.010736 0.010731 0.010727 0.010723 0.010719 ... 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920

5 rows × 736 columns

type(rcp26.timeseries())
pandas.core.frame.DataFrame
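
To make the wide layout concrete, the following sketch builds a tiny wide-format DataFrame by hand with plain pandas (no scmdata needed) and reshapes it into long format, where each row holds a single value. The numbers are the first three values of two of the timeseries shown in this notebook. The names `wide` and `long` are just illustrative.

```python
import pandas as pd

meta_names = ["model", "region", "scenario", "unit", "variable"]

# Metadata lives in the row labels (a MultiIndex)
index = pd.MultiIndex.from_tuples(
    [
        ("IMAGE", "World", "RCP26", "Mt BC / yr", "Emissions|BC"),
        ("IMAGE", "World", "RCP26", "Mt CH4 / yr", "Emissions|CH4"),
    ],
    names=meta_names,
)

# Timesteps are the columns
times = pd.to_datetime(["1765-01-01", "1766-01-01", "1767-01-01"])

wide = pd.DataFrame(
    [
        [0.0, 0.106998, 0.133383],
        [0.0, 1.963262, 2.436448],
    ],
    index=index,
    columns=pd.Index(times, name="time"),
)

# Long format: one row per (timeseries, timestep) pair
long = wide.reset_index().melt(
    id_vars=meta_names, var_name="time", value_name="value"
)

print(wide.shape)  # (2, 3): 2 timeseries, 3 timesteps
print(long.shape)  # (6, 7): 6 values, 5 metadata columns + time + value
```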

Operations with scalars

Basic operations with scalars are easily performed.

rcp26.head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Mt BC / yr Emissions|BC 0.000000 0.106998 0.133383 0.159847 0.186393 0.213024 0.239742 0.266550 0.293450 0.320446 ... 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578
kt C2F6 / yr Emissions|C2F6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857 0.0857
kt C6F14 / yr Emissions|C6F14 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887 0.0887
kt CCl4 / yr Emissions|CCl4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
kt CF4 / yr Emissions|CF4 0.010763 0.010752 0.010748 0.010744 0.010740 0.010736 0.010731 0.010727 0.010723 0.010719 ... 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920 1.0920

5 rows × 736 columns

(rcp26 + 2).head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Gt C / yr Emissions|CO2|MAGICC AFOLU 2.000 2.005338 2.010677 2.016015 2.021353 2.026691 2.032030 2.037368 2.042706 2.048045 ... 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
Emissions|CO2|MAGICC Fossil and Industrial 2.003 2.003000 2.003000 2.003000 2.003000 2.003000 2.004000 2.004000 2.004000 2.004000 ... 1.0692 1.0692 1.0692 1.0692 1.0692 1.0692 1.0692 1.0692 1.0692 1.0692
Mt BC / yr Emissions|BC 2.000 2.106998 2.133383 2.159847 2.186393 2.213024 2.239742 2.266550 2.293450 2.320446 ... 5.3578 5.3578 5.3578 5.3578 5.3578 5.3578 5.3578 5.3578 5.3578 5.3578
Mt CH4 / yr Emissions|CH4 2.000 3.963262 4.436448 4.911105 5.387278 5.865015 6.344362 6.825372 7.308094 7.792582 ... 144.0527 144.0527 144.0527 144.0527 144.0527 144.0527 144.0527 144.0527 144.0527 144.0527
Mt CO / yr Emissions|CO 2.000 11.050221 14.960844 18.876539 22.797465 26.723782 30.655658 34.593264 38.536778 42.486382 ... 609.8438 609.8438 609.8438 609.8438 609.8438 609.8438 609.8438 609.8438 609.8438 609.8438

5 rows × 736 columns

(rcp26 / 4).head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Gt C / yr Emissions|CO2|MAGICC AFOLU 0.00000 0.001335 0.002669 0.004004 0.005338 0.006673 0.008007 0.009342 0.010677 0.012011 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Emissions|CO2|MAGICC Fossil and Industrial 0.00075 0.000750 0.000750 0.000750 0.000750 0.000750 0.001000 0.001000 0.001000 0.001000 ... -0.232700 -0.232700 -0.232700 -0.232700 -0.232700 -0.232700 -0.232700 -0.232700 -0.232700 -0.232700
Mt BC / yr Emissions|BC 0.00000 0.026749 0.033346 0.039962 0.046598 0.053256 0.059935 0.066637 0.073362 0.080112 ... 0.839450 0.839450 0.839450 0.839450 0.839450 0.839450 0.839450 0.839450 0.839450 0.839450
Mt CH4 / yr Emissions|CH4 0.00000 0.490815 0.609112 0.727776 0.846820 0.966254 1.086091 1.206343 1.327023 1.448145 ... 35.513175 35.513175 35.513175 35.513175 35.513175 35.513175 35.513175 35.513175 35.513175 35.513175
Mt CO / yr Emissions|CO 0.00000 2.262555 3.240211 4.219135 5.199366 6.180945 7.163915 8.148316 9.134195 10.121595 ... 151.960950 151.960950 151.960950 151.960950 151.960950 151.960950 151.960950 151.960950 151.960950 151.960950

5 rows × 736 columns

ScmRun instances also support operations with Pint scalars, permitting automatic unit conversion and error raising. For interested readers, the scmdata package uses the OpenSCM-Units unit registry.

to_add = 500 * ur("MtCO2 / yr")

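If we try to add 500 MtCO2 / yr to all the timeseries, we’ll get a DimensionalityError, because most of the timeseries have units which cannot be converted to CO2 units (for example Mt BC / yr).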

try:
    rcp26 + to_add
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'megatCO2 / a' ([carbon] * [mass] / [time])

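However, if we first filter down to timeseries with compatible units, the operation is perfectly valid. The scalar is converted to the units of the timeseries (here Gt C / yr) before being added.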

(rcp26.filter(variable="Emissions|CO2|MAGICC AFOLU") + to_add).head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 C * gigametric_ton / a Emissions|CO2|MAGICC AFOLU 0.136364 0.141702 0.14704 0.152379 0.157717 0.163055 0.168393 0.173732 0.17907 0.184408 ... 0.136364 0.136364 0.136364 0.136364 0.136364 0.136364 0.136364 0.136364 0.136364 0.136364

1 rows × 736 columns
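
The values above can be checked by hand. The sketch below redoes the conversion in plain Python, assuming a carbon-to-CO2 mass ratio of 12/44 (an assumption that is consistent with the numbers shown in this notebook). The constant names are illustrative.

```python
# Assumed mass ratio of carbon to CO2 (molar masses 12 and 44)
C_PER_CO2 = 12.0 / 44.0
MT_PER_GT = 1000.0

to_add_mtco2_per_yr = 500.0

# Convert 500 MtCO2 / yr into the units of the timeseries, GtC / yr
to_add_gtc_per_yr = to_add_mtco2_per_yr * C_PER_CO2 / MT_PER_GT
print(round(to_add_gtc_per_yr, 6))  # 0.136364, the 1765 value (raw value is 0.0)

# The raw AFOLU value in 1766 is 0.005338 GtC / yr
raw_1766_gtc_per_yr = 0.005338
print(round(raw_1766_gtc_per_yr + to_add_gtc_per_yr, 6))  # 0.141702
```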

This can be compared to the raw data as shown below.

rcp26.filter(variable="Emissions|CO2|MAGICC AFOLU").head()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Gt C / yr Emissions|CO2|MAGICC AFOLU 0.0 0.005338 0.010677 0.016015 0.021353 0.026691 0.03203 0.037368 0.042706 0.048045 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

1 rows × 736 columns

Unit conversion

The scmdata package uses the OpenSCM-Units unit registry and uses the Pint library to handle unit conversion.

Calling the convert_unit method of an ScmRun returns a new ScmRun instance with converted units.

rcp26.filter(variable="Emissions|BC").timeseries()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Mt BC / yr Emissions|BC 0.0 0.106998 0.133383 0.159847 0.186393 0.213024 0.239742 0.26655 0.29345 0.320446 ... 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578 3.3578

1 rows × 736 columns

rcp26.filter(variable="Emissions|BC").convert_unit("kg BC / day").timeseries()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 kg BC / day Emissions|BC 0.0 292944.558522 365181.6564 437636.605065 510316.112252 583227.186858 656376.947296 729772.785763 803422.313484 877333.360712 ... 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06 9.193155e+06

1 rows × 736 columns
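
As a sanity check, the same conversion can be done by hand. This sketch assumes a year of 365.25 days, which is consistent with the converted values shown above. The helper name is hypothetical and is not part of scmdata.

```python
KG_PER_MT = 1e9  # 1 Mt = 1e6 t = 1e9 kg
DAYS_PER_YEAR = 365.25  # assumption, consistent with the output above


def mt_per_yr_to_kg_per_day(value_mt_per_yr):
    """Convert a flux in Mt / yr to kg / day (hypothetical helper)."""
    return value_mt_per_yr * KG_PER_MT / DAYS_PER_YEAR


bc_1766 = mt_per_yr_to_kg_per_day(0.106998)
bc_2500 = mt_per_yr_to_kg_per_day(3.3578)

print(round(bc_1766, 3))  # close to the 292944.558522 shown for 1766
print(f"{bc_2500:.6e}")  # 9.193155e+06
```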

Note that you must filter your data first, because the unit conversion is applied to every timeseries in the run. If you do not, you will receive a DimensionalityError.

try:
    rcp26.convert_unit("kg BC / day").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'C * gigametric_ton / a' ([carbon] * [mass] / [time]) to 'BC * kilogram / day' ([black_carbon] * [mass] / [time])

Having said this, thanks to Pint’s idea of contexts, we are able to trivially convert to CO2 equivalent units (as long as we restrict our conversion to variables which have a CO2 equivalent).

rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).timeseries()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit variable
IMAGE World RCP26 Mt CH4 / yr Emissions|CH4 0.000 1.963262 2.436448 2.911105 3.387278 3.865015 4.344362 4.825372 5.308094 5.792582 ... 142.0527 142.0527 142.0527 142.0527 142.0527 142.0527 142.0527 142.0527 142.0527 142.0527
Gt C / yr Emissions|CO2|MAGICC AFOLU 0.000 0.005338 0.010677 0.016015 0.021353 0.026691 0.032030 0.037368 0.042706 0.048045 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Emissions|CO2|MAGICC Fossil and Industrial 0.003 0.003000 0.003000 0.003000 0.003000 0.003000 0.004000 0.004000 0.004000 0.004000 ... -0.9308 -0.9308 -0.9308 -0.9308 -0.9308 -0.9308 -0.9308 -0.9308 -0.9308 -0.9308
Mt N2ON / yr Emissions|N2O 0.000 0.005191 0.010117 0.015043 0.019969 0.024896 0.029822 0.034750 0.039677 0.044605 ... 5.2823 5.2823 5.2823 5.2823 5.2823 5.2823 5.2823 5.2823 5.2823 5.2823

4 rows × 736 columns

rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
).timeseries()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit unit_context variable
IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|CO2|MAGICC AFOLU 0.0 19.573753 39.147508 58.721260 78.295012 97.868767 117.442519 137.016271 156.590027 176.163779 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Emissions|CO2|MAGICC Fossil and Industrial 11.0 11.000000 11.000000 11.000000 11.000000 11.000000 14.666666 14.666666 14.666666 14.666666 ... -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333
Emissions|CH4 0.0 49.081547 60.911202 72.777625 84.681955 96.625365 108.609062 120.634295 132.702345 144.814540 ... 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500
Emissions|N2O 0.0 2.430911 4.737559 7.044330 9.351227 11.658254 13.965417 16.272717 18.580161 20.887751 ... 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629

4 rows × 736 columns
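
The arithmetic behind this conversion can be sketched in plain Python. The AR4 100-year global warming potentials are 25 for CH4 and 298 for N2O. The mass ratios below are assumptions inferred from the outputs above: "Gt C" is read as the carbon mass of CO2 and "Mt N2ON" as the nitrogen mass of N2O. The function names are illustrative and are not part of scmdata or Pint.

```python
# AR4 100-year global warming potentials
GWP_AR4_100 = {"CH4": 25.0, "N2O": 298.0}

# Assumed mass ratios (inferred from the notebook outputs)
CO2_PER_C = 44.0 / 12.0  # CO2 mass per unit mass of carbon
N2O_PER_N2ON = 44.0 / 28.0  # N2O mass per unit mass of nitrogen in N2O
MT_PER_GT = 1000.0


def co2_from_gtc(gt_c_per_yr):
    """Gt C / yr -> Mt CO2 / yr (no GWP needed, it is already CO2)."""
    return gt_c_per_yr * MT_PER_GT * CO2_PER_C


def co2eq_from_ch4(mt_ch4_per_yr):
    """Mt CH4 / yr -> Mt CO2-equivalent / yr."""
    return mt_ch4_per_yr * GWP_AR4_100["CH4"]


def co2eq_from_n2on(mt_n2on_per_yr):
    """Mt N2ON / yr -> Mt CO2-equivalent / yr."""
    return mt_n2on_per_yr * N2O_PER_N2ON * GWP_AR4_100["N2O"]


fossil_1765 = co2_from_gtc(0.003)  # about 11.0
ch4_2500 = co2eq_from_ch4(142.0527)  # about 3551.3175
n2o_2500 = co2eq_from_n2on(5.2823)  # about 2473.6256

print(round(fossil_1765, 4), round(ch4_2500, 4), round(n2o_2500, 4))
```

These results agree with the first Fossil and Industrial value and the final CH4 and N2O values in the table above.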

Without the context, a DimensionalityError is once again raised.

try:
    rcp26.convert_unit("Mt CO2 / yr").timeseries()
except DimensionalityError:
    traceback.print_exc(limit=0, chain=False)
pint.errors.DimensionalityError: Cannot convert from 'BC * megametric_ton / a' ([black_carbon] * [mass] / [time]) to 'CO2 * megametric_ton / a' ([carbon] * [mass] / [time])

In addition, when we do a conversion with contexts, the context information is automatically added to the metadata. This ensures we can’t accidentally use a different context for further conversions.

ar4gwp100_converted = rcp26.filter(variable=["*CO2*", "*CH4*", "*N2O*"]).convert_unit(
    "Mt CO2 / yr", context="AR4GWP100"
)
ar4gwp100_converted.timeseries()
time 1765-01-01 00:00:00 1766-01-01 00:00:00 1767-01-01 00:00:00 1768-01-01 00:00:00 1769-01-01 00:00:00 1770-01-01 00:00:00 1771-01-01 00:00:00 1772-01-01 00:00:00 1773-01-01 00:00:00 1774-01-01 00:00:00 ... 2491-01-01 00:00:00 2492-01-01 00:00:00 2493-01-01 00:00:00 2494-01-01 00:00:00 2495-01-01 00:00:00 2496-01-01 00:00:00 2497-01-01 00:00:00 2498-01-01 00:00:00 2499-01-01 00:00:00 2500-01-01 00:00:00
model region scenario unit unit_context variable
IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|CO2|MAGICC AFOLU 0.0 19.573753 39.147508 58.721260 78.295012 97.868767 117.442519 137.016271 156.590027 176.163779 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
Emissions|CO2|MAGICC Fossil and Industrial 11.0 11.000000 11.000000 11.000000 11.000000 11.000000 14.666666 14.666666 14.666666 14.666666 ... -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333 -3412.933333
Emissions|CH4 0.0 49.081547 60.911202 72.777625 84.681955 96.625365 108.609062 120.634295 132.702345 144.814540 ... 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500 3551.317500
Emissions|N2O 0.0 2.430911 4.737559 7.044330 9.351227 11.658254 13.965417 16.272717 18.580161 20.887751 ... 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629 2473.625629

4 rows × 736 columns

Trying to convert without a context, or with a different context, raises an error.

try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `None`, drop `unit_context` metadata before doing conversion
try:
    ar4gwp100_converted.convert_unit("Mt CO2 / yr", context="AR5GWP100")
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: Existing unit conversion context(s), `['AR4GWP100']`, doesn't match input context, `AR5GWP100`, drop `unit_context` metadata before doing conversion

Metadata handling

Each timeseries within an ScmRun object has metadata associated with it. The meta attribute provides this timeseries-specific metadata as a pd.DataFrame. This DataFrame is effectively the index of the DataFrame returned by ScmRun.timeseries().

The timeseries-specific metadata can be modified using the [] notation, which modifies the metadata in place, or using the set_meta method, which returns a new ScmRun with updated metadata. set_meta also makes it easy to update a subset of timeseries.

ar4gwp100_converted.meta
model region scenario unit unit_context variable
0 IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|CO2|MAGICC AFOLU
1 IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|CO2|MAGICC Fossil and Industrial
2 IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|CH4
3 IMAGE World RCP26 Mt CO2 / yr AR4GWP100 Emissions|N2O
# Update inplace
ar4gwp100_converted["unit_context"] = "inplace"
ar4gwp100_converted["unit_context"]
0    inplace
1    inplace
2    inplace
3    inplace
Name: unit_context, dtype: object
# set_meta returns a new `ScmRun` with the updated metadata
ar4gwp100_converted.set_meta(
    "unit_context", "updated-in-set_meta", variable="Emissions|CO2|*"
)
<ScmRun (timeseries: 4, timepoints: 736)>
Time:
	Start: 1765-01-01T00:00:00
	End: 2500-01-01T00:00:00
Meta:
	   model region scenario         unit         unit_context  \
	0  IMAGE  World    RCP26  Mt CO2 / yr  updated-in-set_meta   
	1  IMAGE  World    RCP26  Mt CO2 / yr  updated-in-set_meta   
	2  IMAGE  World    RCP26  Mt CO2 / yr              inplace   
	3  IMAGE  World    RCP26  Mt CO2 / yr              inplace   
	
	                                     variable  
	0                  Emissions|CO2|MAGICC AFOLU  
	1  Emissions|CO2|MAGICC Fossil and Industrial  
	2                               Emissions|CH4  
	3                               Emissions|N2O  
# The original `ScmRun` was not modified by `set_meta`
ar4gwp100_converted
<ScmRun (timeseries: 4, timepoints: 736)>
Time:
	Start: 1765-01-01T00:00:00
	End: 2500-01-01T00:00:00
Meta:
	   model region scenario         unit unit_context  \
	0  IMAGE  World    RCP26  Mt CO2 / yr      inplace   
	1  IMAGE  World    RCP26  Mt CO2 / yr      inplace   
	2  IMAGE  World    RCP26  Mt CO2 / yr      inplace   
	3  IMAGE  World    RCP26  Mt CO2 / yr      inplace   
	
	                                     variable  
	0                  Emissions|CO2|MAGICC AFOLU  
	1  Emissions|CO2|MAGICC Fossil and Industrial  
	2                               Emissions|CH4  
	3                               Emissions|N2O  

ScmRun instances are strict with respect to metadata handling. If you try to either a) instantiate an ScmRun instance with duplicate metadata or b) change an existing ScmRun instance so that it has duplicate metadata, you will receive a NonUniqueMetadataError.

try:
    ScmRun(
        data=np.arange(6).reshape(2, 3),
        index=[10, 20],
        columns={
            "variable": "Emissions",
            "unit": "Gt",
            "model": "idealised",
            "scenario": "idealised",
            "region": "World",
        },
    )
except NonUniqueMetadataError:
    traceback.print_exc(limit=0, chain=False)
scmdata.errors.NonUniqueMetadataError: Duplicate metadata (numbers show how many times the given metadata is repeated).
       model region   scenario unit   variable  repeats
0  idealised  World  idealised   Gt  Emissions        3
try:
    rcp26["variable"] = "Emissions|CO2|MAGICC AFOLU"
except NonUniqueMetadataError:
    traceback.print_exc(limit=0, chain=False)
scmdata.errors.NonUniqueMetadataError: Duplicate metadata (numbers show how many times the given metadata is repeated).
   model region scenario       unit                    variable  repeats
0  IMAGE  World    RCP26  Gt C / yr  Emissions|CO2|MAGICC AFOLU        2
4  IMAGE  World    RCP26  Mt N / yr  Emissions|CO2|MAGICC AFOLU        2
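
The first error above reports 3 repeats because the array has shape (2, 3), i.e. two timesteps and three timeseries, and all three timeseries were given identical metadata. The idea behind such a check can be sketched with plain pandas. This is only an illustration of the concept, not scmdata’s actual implementation, and `find_duplicate_meta` is a hypothetical helper.

```python
import pandas as pd


def find_duplicate_meta(meta):
    """Return metadata combinations which appear more than once."""
    counts = meta.groupby(list(meta.columns), as_index=False).size()
    counts = counts.rename(columns={"size": "repeats"})
    return counts[counts["repeats"] > 1].reset_index(drop=True)


# Three timeseries which all share the same metadata
meta = pd.DataFrame(
    {
        "model": ["idealised"] * 3,
        "region": ["World"] * 3,
        "scenario": ["idealised"] * 3,
        "unit": ["Gt"] * 3,
        "variable": ["Emissions"] * 3,
    }
)

duplicates = find_duplicate_meta(meta)
print(duplicates)  # one row, with repeats equal to 3

# Giving each timeseries its own variable name removes the clash
fixed = meta.assign(variable=["Emissions|A", "Emissions|B", "Emissions|C"])
print(find_duplicate_meta(fixed).empty)  # True
```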

There is also a metadata attribute, which holds metadata for the ScmRun instance as a whole rather than for individual timeseries.

These metadata can be used to store information about the collection of runs as a whole, such as the file where the data are stored or longer-form information about a particular dataset.

rcp26.metadata["filename"] = "rcp26_emissions.csv"
rcp26.metadata
{'filename': 'rcp26_emissions.csv'}

Convenience methods

Below we showcase a few convenience methods of ScmRun. These will grow over time. Please open a pull request adding more where they are useful!

get_unique_meta

This method helps with getting the unique metadata values in an ScmRun. Here we show how it can be useful. Check out its docstring for full details.

By itself, it doesn’t do anything special: it simply returns the unique metadata values as a list.

rcp26.get_unique_meta("variable")
['Emissions|CO2|MAGICC AFOLU']

However, it can be useful if you expect there to only be one unique metadata value. In such a case, you can use the no_duplicates argument to ensure that you only get a single value as its native type (not a list) and that an error will be raised if this isn’t the case.

rcp26.get_unique_meta("model", no_duplicates=True)
'IMAGE'
try:
    rcp26.get_unique_meta("unit", no_duplicates=True)
except ValueError:
    traceback.print_exc(limit=0, chain=False)
ValueError: `unit` column is not unique (found values: ['Mt BC / yr', 'kt C2F6 / yr', 'kt C6F14 / yr', 'kt CCl4 / yr', 'kt CF4 / yr', 'kt CFC11 / yr', 'kt CFC113 / yr', 'kt CFC114 / yr', 'kt CFC115 / yr', 'kt CFC12 / yr', 'kt CH3Br / yr', 'kt CH3CCl3 / yr', 'kt CH3Cl / yr', 'Mt CH4 / yr', 'Mt CO / yr', 'Gt C / yr', 'kt HCFC141b / yr', 'kt HCFC142b / yr', 'kt HCFC22 / yr', 'kt HFC125 / yr', 'kt HFC134a / yr', 'kt HFC143a / yr', 'kt HFC227ea / yr', 'kt HFC23 / yr', 'kt HFC245fa / yr', 'kt HFC32 / yr', 'kt HFC4310 / yr', 'kt Halon1202 / yr', 'kt Halon1211 / yr', 'kt Halon1301 / yr', 'kt Halon2402 / yr', 'Mt N2ON / yr', 'Mt N / yr', 'Mt NMVOC / yr', 'Mt OC / yr', 'kt SF6 / yr', 'Mt S / yr'])
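
The behaviour described above can be mimicked with a few lines of pandas. The sketch below is a rough stand-in for illustration only: `unique_meta` is a hypothetical function written for this notebook, not scmdata’s implementation, and the small `meta` table is made up from values shown earlier.

```python
import pandas as pd


def unique_meta(meta, column, no_duplicates=False):
    """Return unique values of a metadata column (hypothetical helper).

    With ``no_duplicates=True``, return the single value itself and
    raise a ``ValueError`` if there is not exactly one unique value.
    """
    values = meta[column].unique().tolist()
    if not no_duplicates:
        return values
    if len(values) != 1:
        raise ValueError(
            f"`{column}` column is not unique (found values: {values})"
        )
    return values[0]


meta = pd.DataFrame(
    {
        "model": ["IMAGE", "IMAGE"],
        "unit": ["Mt BC / yr", "Mt CH4 / yr"],
    }
)

print(unique_meta(meta, "unit"))  # ['Mt BC / yr', 'Mt CH4 / yr']
print(unique_meta(meta, "model", no_duplicates=True))  # IMAGE

try:
    unique_meta(meta, "unit", no_duplicates=True)
except ValueError as exc:
    print(exc)
```

Returning the bare value rather than a one-element list is convenient when the value is passed straight on, for example as a plot label or a unit string.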