scmdata.groupby

Functionality for grouping and filtering ScmRun objects

RunGroupBy

class RunGroupBy(run, groups, na_fill_value=-10000)

Bases: ImplementsArrayReduce, Generic[GenericRun]

GroupBy object specialized to grouping ScmRun objects

all(dim=None, axis=None, **kwargs)

Reduce this RunGroupBy’s data by applying all along some dimension(s).

Parameters:

dim (str or sequence of str, optional) – Dimension(s) over which to apply all.
axis (int or sequence of int, optional) – Axis(es) over which to apply all. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither is supplied, then all is calculated over all axes.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating all on this object’s data.

Returns:

reduced (RunGroupBy) – New RunGroupBy object with all applied to its data and the indicated dimension(s) removed.

any(dim=None, axis=None, **kwargs)

Reduce this RunGroupBy’s data by applying any along some dimension(s).

Parameters:

dim (str or sequence of str, optional) – Dimension(s) over which to apply any.
axis (int or sequence of int, optional) – Axis(es) over which to apply any. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither is supplied, then any is calculated over all axes.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating any on this object’s data.

Returns:

reduced (RunGroupBy) – New RunGroupBy object with any applied to its data and the indicated dimension(s) removed.
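
As a rough illustration of what reducing with all or any along an axis means, the sketch below uses plain nested lists rather than ScmRun data. The helper reduce_axis is hypothetical and is not part of scmdata; it only shows how the chosen axis is collapsed and removed from the result.

```python
# Pure-Python stand-in for a boolean reduction along one axis of a 2-D array.
# `reduce_axis` is a hypothetical helper, not an scmdata function.
rows = [
    [True, True, False],
    [True, False, False],
]


def reduce_axis(data, reducer, axis):
    """Apply `reducer` (all or any) along `axis` of a rectangular nested list."""
    if axis == 0:
        # Collapse the rows: one result per column.
        return [reducer(column) for column in zip(*data)]
    if axis == 1:
        # Collapse the columns: one result per row.
        return [reducer(row) for row in data]
    raise ValueError("axis must be 0 or 1")


all_over_rows = reduce_axis(rows, all, axis=0)
any_over_rows = reduce_axis(rows, any, axis=0)
any_over_cols = reduce_axis(rows, any, axis=1)
```

In each case the reduced axis disappears from the result: reducing over axis 0 leaves one value per column, and reducing over axis 1 leaves one value per row.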

apply(func, *args, **kwargs)

Apply a function to each group and append the results

func is called like func(ar, *args, **kwargs) for each ScmRun group. If the result of this function call is None, then it is excluded from the results.

The results are appended together using run_append(). The function can change the size of the input ScmRun as long as run_append() can be applied to all results.

Examples

>>> from scmdata import ScmRun
>>> def show_var_and_convert_unit(arr: ScmRun) -> ScmRun:
...     variable = arr.get_unique_meta("variable", True)
...     unit = arr.get_unique_meta("unit", True)
...     print(f"{variable}'s original unit was {unit}")
...
...     return arr.convert_unit("MtC")

>>> df = ScmRun(
...     data=[[1, 2], [3, 4]],
...     index=[2010, 2020],
...     columns={
...         "variable": ["v1", "v2"],
...         "model": "model",
...         "scenario": "scenario",
...         "region": "World",
...         "unit": ["tC", "GtC"],
...     },
... )
>>> df.groupby("variable").apply(show_var_and_convert_unit)
v1's original unit was tC
v2's original unit was GtC
<ScmRun (timeseries: 2, timepoints: 2)>
Time:
    Start: 2010-01-01T00:00:00
    End: 2020-01-01T00:00:00
Meta:
       model region  scenario unit variable
    0  model  World  scenario  MtC       v1
    1  model  World  scenario  MtC       v2
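
The rule that None results are excluded before the remaining results are appended can be pictured with a pure-Python analogue. This is only a sketch of the control flow described above, using lists of dicts in place of ScmRun objects and list concatenation in place of run_append(); the names group_apply and scale_unless_v2 are hypothetical.

```python
from collections import defaultdict


def group_apply(records, key, func, *args, **kwargs):
    """Group `records` by `key`, call `func` on each group, combine the results."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    results = []
    for group in groups.values():
        out = func(group, *args, **kwargs)
        if out is None:
            # Mirrors the documented behaviour: None results are dropped.
            continue
        # Stand-in for run_append(): concatenate the per-group outputs.
        results.extend(out)
    return results


def scale_unless_v2(group, factor):
    """Return None for the "v2" group, otherwise scale every value."""
    if group[0]["variable"] == "v2":
        return None
    return [{**record, "value": record["value"] * factor} for record in group]


records = [
    {"variable": "v1", "value": 1},
    {"variable": "v2", "value": 3},
    {"variable": "v1", "value": 2},
]
combined = group_apply(records, "variable", scale_unless_v2, 10)
```

Here the "v2" group contributes nothing to combined because its function call returned None, while both "v1" records are kept and scaled.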

Parameters:

func (Callable[Concatenate[GenericRun, P], GenericRun | (pd.DataFrame | None)]) – Callable to apply to each group.
*args (P.args) – Positional arguments passed to func.
**kwargs (P.kwargs) – Keyword arguments passed to func.

Returns:

GenericRun – The result of applying and combining.

apply_parallel(func, parallel_processor=None, *args, **kwargs)

Apply a function to each group in parallel and append the results

Provides the same functionality as apply() except that parallel processing can be used via the parallel_processor argument. By default, joblib is used to apply func to each group in parallel. This can be slower than apply() when there are few groups or when func is fast, because setting up the processing pool adds overhead.

get_joblib_parallel_processor

get_joblib_parallel_processor(n_jobs=-1, backend='loky', *args, **kwargs)

Get parallel processor using joblib as the backend.

Parameters:

n_jobs (int) – Number of jobs to run in parallel. If -1, all CPUs are used.
backend (str) – Backend used for parallelisation. Defaults to ‘loky’, which uses separate processes for each worker. See joblib.Parallel for a more complete description of the available options.
*args (typing.Any) – Passed to initialiser of joblib.Parallel
**kwargs (typing.Any) – Passed to initialiser of joblib.Parallel

Returns:

typing.Callable[[typing.Callable[[typing.TypeVar(RunLike, bound= scmdata.run.BaseScmRun), typing.ParamSpec(Q)], typing.Union[typing.TypeVar(RunLike, bound= scmdata.run.BaseScmRun), pandas.core.frame.DataFrame, None]], collections.abc.Iterable[typing.TypeVar(RunLike, bound= scmdata.run.BaseScmRun)], typing.ParamSpec(Q)], collections.abc.Iterable[typing.Union[typing.TypeVar(RunLike, bound= scmdata.run.BaseScmRun), pandas.core.frame.DataFrame, None]]] – Function that can be used for parallel processing in RunGroupBy.apply_parallel()
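
Read literally, the return type above describes a callable that takes func, an iterable of groups, and any extra arguments, and returns one result per group. The sketch below shows a hypothetical processor with that shape, built on the standard library's concurrent.futures rather than joblib. The factory name get_thread_parallel_processor is an assumption made for illustration, and whether RunGroupBy.apply_parallel() invokes its processor exactly this way should be checked against the scmdata source.

```python
from concurrent.futures import ThreadPoolExecutor


def get_thread_parallel_processor(max_workers=2):
    """Hypothetical factory returning a processor with the documented shape."""

    def processor(func, groups, *args, **kwargs):
        # Call func(group, *args, **kwargs) for each group on a thread pool.
        # pool.map returns results in the same order as the input groups.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(
                pool.map(lambda group: func(group, *args, **kwargs), groups)
            )

    return processor


def total(group, scale=1):
    """Toy per-group function: sum the group and scale the result."""
    return sum(group) * scale


processor = get_thread_parallel_processor(max_workers=2)
results = processor(total, [[1, 2], [3, 4], [5]], scale=10)
```

Threads are used here only to keep the sketch dependency-free. A process-based backend such as joblib's ‘loky’ is the better fit when func is CPU-bound.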