scmdata.groupby

Functionality for grouping and filtering ScmRun objects

class scmdata.groupby.RunGroupBy(run, groups)

Bases: scmdata.groupby._GroupBy

GroupBy object specialized to grouping ScmRun objects

all(dim=None, axis=None, **kwargs)

Reduce this RunGroupBy’s data by applying all along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply all.
axis (int or sequence of int, optional) – Axis(es) over which to apply all. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then all is calculated over axes.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating all on this object’s data.

Returns

reduced – New RunGroupBy object with all applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

any(dim=None, axis=None, **kwargs)

Reduce this RunGroupBy’s data by applying any along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply any.
axis (int or sequence of int, optional) – Axis(es) over which to apply any. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then any is calculated over axes.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating any on this object’s data.

Returns

reduced – New RunGroupBy object with any applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

count(dim=None, axis=None, **kwargs)

Reduce this RunGroupBy’s data by applying count along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply count.
axis (int or sequence of int, optional) – Axis(es) over which to apply count. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then count is calculated over axes.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating count on this object’s data.

Returns

reduced – New RunGroupBy object with count applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

map(func, *args, **kwargs)

Apply a function to each group and append the results

func is called like func(ar, *args, **kwargs) for each ScmRun ar in this group. If the result of this function call is None, then it is excluded from the results.

The results are appended together using run_append(). The function can change the size of the input ScmRun as long as run_append() can be applied to all results.

Examples

>>> def write_csv(arr):
...     variable = arr.get_unique_meta("variable")
...     arr.to_csv("out-{}.csv".format(variable))
...
>>> df.groupby("variable").map(write_csv)

Parameters

func (function) – Callable to apply to each timeseries.
*args – Positional arguments passed to func.
**kwargs – Used to call func(ar, **kwargs) for each array ar.

Returns

applied – The result of splitting, applying and combining this array.

Return type

ScmRun
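The split, apply and combine behaviour described for map can be sketched without scmdata. The block below is a toy model under stated assumptions, not the library's implementation: plain lists of dicts stand in for ScmRun objects, list concatenation stands in for run_append(), and group_map, keep_large_groups and double are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def group_map(rows, key, func, *args, **kwargs):
    """Toy split-apply-combine over a list of dicts (not scmdata code)."""
    # Split: collect rows that share the same value for ``key``.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    results = []
    for group in groups.values():
        # Apply: call func once per group, forwarding extra arguments.
        out = func(group, *args, **kwargs)
        if out is None:
            # A None result is excluded, as documented for map.
            continue
        # Combine: list concatenation stands in for run_append().
        results.extend(out)
    return results


def keep_large_groups(group, min_size=2):
    """Return the group unchanged, or None to drop it from the output."""
    return group if len(group) >= min_size else None


def double(group):
    """Return a new group with every value doubled."""
    return [{**row, "value": row["value"] * 2} for row in group]


rows = [
    {"variable": "Emissions|CO2", "value": 1.0},
    {"variable": "Emissions|CO2", "value": 3.0},
    {"variable": "Emissions|CH4", "value": 5.0},
]

kept = group_map(rows, "variable", keep_large_groups, min_size=2)
doubled = group_map(rows, "variable", double)
```

Here keep_large_groups returns None for the single-row group, so that group does not appear in kept, while double changes the values but keeps every group.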

max(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying max along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply max.
axis (int or sequence of int, optional) – Axis(es) over which to apply max. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then max is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating max on this object’s data.

Returns

reduced – New RunGroupBy object with max applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

mean(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying mean along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply mean.
axis (int or sequence of int, optional) – Axis(es) over which to apply mean. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then mean is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating mean on this object’s data.

Returns

reduced – New RunGroupBy object with mean applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

median(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying median along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply median.
axis (int or sequence of int, optional) – Axis(es) over which to apply median. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then median is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating median on this object’s data.

Returns

reduced – New RunGroupBy object with median applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

min(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying min along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply min.
axis (int or sequence of int, optional) – Axis(es) over which to apply min. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then min is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating min on this object’s data.

Returns

reduced – New RunGroupBy object with min applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

prod(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying prod along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply prod.
axis (int or sequence of int, optional) – Axis(es) over which to apply prod. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then prod is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
min_count (int, default: None) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA. Only used if skipna is set to True or defaults to True for the array’s dtype. New in version 0.10.8: Added with the default being None. Changed in version 0.17.0: if specified on an integer array and skipna=True, the result will be a float array.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating prod on this object’s data.

Returns

reduced – New RunGroupBy object with prod applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

reduce(func, dim=None, axis=None, **kwargs)

Reduce the items in this group by applying func along some dimension(s).

Parameters

func (function) – Function which can be called in the form func(x, axis=axis, **kwargs) to return the result of collapsing an np.ndarray over an integer valued axis.
dim (…, str or sequence of str, optional) – Not used in this implementation
axis (int or sequence of int, optional) – Axis(es) over which to apply func. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither is supplied, then func is calculated over all dimensions for each group item.
**kwargs (dict) – Additional keyword arguments passed on to func.

Returns

reduced – Array with summarized data and the indicated dimension(s) removed.

Return type

ScmRun
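A callable suitable for the func parameter has to accept an array plus an axis keyword and return the collapsed result. The sketch below shows one such callable applied directly to a NumPy array; value_range is a hypothetical name, and how the axes of a real ScmRun map onto timeseries and time is not stated in this reference, so the example stays at the array level.

```python
import numpy as np


def value_range(x, axis=None, **kwargs):
    """Hypothetical reducer: maximum minus minimum along ``axis``."""
    # Both calls collapse the requested axis, so the result has one
    # dimension fewer than ``x`` (or is a scalar when axis is None).
    return np.max(x, axis=axis) - np.min(x, axis=axis)


x = np.array([[1.0, 4.0], [3.0, 10.0]])

collapsed_axis0 = value_range(x, axis=0)  # one value per column
collapsed_axis1 = value_range(x, axis=1)  # one value per row
```

Built-in NumPy reductions such as np.mean or np.nanmax already follow the same func(x, axis=axis) calling convention.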

std(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying std along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply std.
axis (int or sequence of int, optional) – Axis(es) over which to apply std. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then std is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating std on this object’s data.

Returns

reduced – New RunGroupBy object with std applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy

sum(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying sum along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply sum.
axis (int or sequence of int, optional) – Axis(es) over which to apply sum. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then sum is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
min_count (int, default: None) – The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA. Only used if skipna is set to True or defaults to True for the array’s dtype. New in version 0.10.8: Added with the default being None. Changed in version 0.17.0: if specified on an integer array and skipna=True, the result will be a float array.
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating sum on this object’s data.

Returns

reduced – New RunGroupBy object with sum applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy
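The interaction between skipna and min_count described for sum can be illustrated on a plain list of floats. This is a toy model of the documented rules only, not the code that scmdata or its underlying array library runs; nan_sum is a hypothetical helper.

```python
import math


def nan_sum(values, skipna=True, min_count=None):
    """Toy sum over floats following the documented skipna/min_count rules."""
    if not skipna:
        # Without skipping, any NaN propagates into the result, and
        # min_count is ignored because it only applies when skipping.
        return sum(values)

    # Keep only the valid (non-NaN) values.
    valid = [v for v in values if not math.isnan(v)]
    if min_count is not None and len(valid) < min_count:
        # Too few valid values: the result is NA.
        return math.nan
    return sum(valid)


data = [1.0, math.nan, 2.0]

total_skipping = nan_sum(data)                  # NaN ignored
total_strict = nan_sum(data, skipna=False)      # NaN propagates
total_too_few = nan_sum(data, min_count=3)      # only 2 valid values
total_enough = nan_sum(data, min_count=2)       # threshold met
```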

var(dim=None, axis=None, skipna=None, **kwargs)

Reduce this RunGroupBy’s data by applying var along some dimension(s).

Parameters

dim (str or sequence of str, optional) – Dimension(s) over which to apply var.
axis (int or sequence of int, optional) – Axis(es) over which to apply var. Only one of the ‘dim’ and ‘axis’ arguments can be supplied. If neither are supplied, then var is calculated over axes.
skipna (bool, optional) – If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).
keep_attrs (bool, optional) – If True, the attributes (attrs) will be copied from the original object to the new one. If False (default), the new object will be returned without attributes.
**kwargs (dict) – Additional keyword arguments passed on to the appropriate array function for calculating var on this object’s data.

Returns

reduced – New RunGroupBy object with var applied to its data and the indicated dimension(s) removed.

Return type

RunGroupBy