scmdata.database
Database for handling large datasets in a performant but flexible way.
Data is chunked using unique combinations of metadata. This allows the database to expand as new data is added without having to change any of the existing data.
Subsets of data can also be read without first loading all the data and then filtering. For example, one could save model results from a number of different climate models and then load just the Surface Temperature data for all models.
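The chunking idea can be illustrated with a minimal, standard-library-only sketch. It is not scmdata's implementation: the `LEVELS` tuple, the `chunk_by_levels` helper and the sample records are all hypothetical stand-ins for timeseries and their metadata.

```python
from collections import defaultdict

# Hypothetical levels, chosen for illustration only.
LEVELS = ("climate_model", "variable")


def chunk_by_levels(records, levels=LEVELS):
    """Group records by their unique combination of level values."""
    chunks = defaultdict(list)
    for record in records:
        key = tuple(record[level] for level in levels)
        chunks[key].append(record)
    return dict(chunks)


records = [
    {"climate_model": "model_a", "variable": "Surface Temperature", "value": 1.1},
    {"climate_model": "model_b", "variable": "Surface Temperature", "value": 1.3},
    {"climate_model": "model_a", "variable": "Emissions|CO2", "value": 10.0},
]
chunks = chunk_by_levels(records)

# Reading a subset only touches the chunks whose key matches the request.
surface_temperature = [
    record
    for key, chunk in chunks.items()
    if key[1] == "Surface Temperature"
    for record in chunk
]
```

Because each chunk is keyed by its own metadata combination, adding a record for a new model or variable creates a new chunk and leaves the existing ones untouched.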
- class scmdata.database.DatabaseBackend(**kwargs)
Bases: abc.ABC
Abstract backend for serialising/deserialising data
Data is stored as objects represented by keys. These keys can be used later to load data.
- abstract get(filters)
Get all matching keys for a given filter
- Parameters
filters (dict of str) – String filters. If a level is missing then all values are fetched.
- Returns
Each item is a key which may contain data which is of interest
- Return type
list of str
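As a rough sketch of the key-based contract described above, the following uses a hypothetical abstract class and a toy in-memory implementation. `KeyBackend`, `InMemoryBackend` and the `index` attribute are invented for illustration and are not part of scmdata; only the behaviour of `get` (a missing level matches all values) is taken from the description.

```python
from abc import ABC, abstractmethod


class KeyBackend(ABC):
    """Hypothetical stand-in for an abstract key-based backend."""

    @abstractmethod
    def get(self, filters):
        """Return all keys matching ``filters``."""


class InMemoryBackend(KeyBackend):
    """Toy backend that maps each key to its level metadata in a dict."""

    def __init__(self):
        self.index = {}  # key -> dict of level name -> value

    def get(self, filters):
        # A level absent from ``filters`` is unconstrained, so every value matches.
        return [
            key
            for key, meta in self.index.items()
            if all(meta.get(level) == value for level, value in filters.items())
        ]


backend = InMemoryBackend()
backend.index["a/tas"] = {"climate_model": "a", "variable": "tas"}
backend.index["b/tas"] = {"climate_model": "b", "variable": "tas"}
backend.index["a/co2"] = {"climate_model": "a", "variable": "co2"}

matching = backend.get({"variable": "tas"})  # climate_model is unconstrained
everything = backend.get({})  # no filters, so all keys are returned
```

The returned keys are only candidates: as the Returns entry notes, each key may contain data of interest, so a caller would still load and inspect the objects behind them.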
- class scmdata.database.NetCDFBackend(**kwargs)
Bases: scmdata.database.DatabaseBackend
On-disk database handler for outputs from SCMs
Data is split into groups as specified by levels. This allows for fast reading and writing of new subsets of data when a single output file is no longer performant or data cannot all fit in memory.
- get(filters)
Get all matching objects for a given filter
- Parameters
filters (dict of str) – String filters. If a level is missing then all values are fetched.
- Returns
- Return type
list of str
- get_key(sr)
Get key where the data will be stored
The key is the root directory joined with the other information provided. The filepath is also cleaned to remove spaces and special characters.
- Parameters
sr (scmdata.ScmRun) – Data to save
- Raises
ValueError – If non-unique metadata is found for any of self.kwargs["levels"]
KeyError – If metadata is missing for any of self.kwargs["levels"]
- Returns
Path in which to save the data without spaces or special characters
- Return type
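The key construction described for get_key can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `clean_component`, `build_key`, the hyphen replacement rule, the directory-per-level layout and the `.nc` file name are all hypothetical choices.

```python
import os
import re


def clean_component(value):
    """Replace spaces and special characters with hyphens (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", str(value)).strip("-")


def build_key(root_dir, levels, metadata):
    """Join root_dir with one cleaned path component per level."""
    missing = [level for level in levels if level not in metadata]
    if missing:
        raise KeyError("missing metadata for levels: {}".format(missing))

    components = [clean_component(metadata[level]) for level in levels]
    # Assumed layout: one directory per level, file named after all levels.
    filename = "__".join(components) + ".nc"
    return os.path.join(root_dir, *components, filename)


key = build_key(
    "db",
    ("climate_model", "variable"),
    {"climate_model": "MAGICC 7", "variable": "Surface Temperature"},
)
```

Cleaning each component before joining keeps the path free of spaces and special characters, which is also why metadata values can look modified when they are later read back from file paths.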
- save(sr)
Save a ScmRun to the database
The dataset should not contain any duplicate metadata for the database levels
- Parameters
sr (scmdata.ScmRun) – Data to save
- Raises
ValueError – If duplicate metadata are present for the requested database levels
KeyError – If metadata for the requested database levels are not found
- Returns
Key where the data is saved
- Return type
- class scmdata.database.ScmDatabase(root_dir, levels=('climate_model', 'variable', 'region', 'scenario'), backend='netcdf', backend_config=None)
Bases: object
On-disk database handler for outputs from SCMs
Data is split into groups as specified by levels. This allows for fast reading and writing of new subsets of data when a single output file is no longer performant or data cannot all fit in memory.
- available_data()
Get all the data which is available to be loaded
If metadata includes non-alphanumeric characters then it might appear modified in the returned table. The original metadata values can still be used to filter data.
- Returns
- Return type
pd.DataFrame
- delete(**filters)
Delete data from the database
- Parameters
filters (dict of str) – Filters for the data to delete. Defaults to deleting all data if nothing is specified.
- Raises
ValueError – If a filter for a level not in levels is specified
- load(disable_tqdm=False, **filters)
Load data from the database
- Parameters
disable_tqdm (bool) – If True, do not show the progress bar
filters (dict of str : [str, list[str]]) – Filters for the data to load. Defaults to loading all values for a level if it isn’t specified.
If a filter is a list then OR logic is applied within the level. For example, if we have scenario=["ssp119", "ssp126"] then both the ssp119 and ssp126 scenarios will be loaded.
- Returns
Loaded data
- Return type
scmdata.ScmRun
- Raises
ValueError – If a filter for a level not in levels is specified
ValueError – If no data matching filters is found
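The filter semantics of load can be sketched as a small predicate. This is a hypothetical, standard-library-only illustration rather than scmdata's code: the `matches` helper and the sample entries are invented, and combining different levels with AND is an assumption of the sketch (the description above only states OR logic within a level).

```python
def matches(metadata, levels, **filters):
    """Return True if ``metadata`` satisfies every filter."""
    unknown = set(filters) - set(levels)
    if unknown:
        raise ValueError("filters for unknown levels: {}".format(sorted(unknown)))

    for level, wanted in filters.items():
        # A single string is treated as a one-element list.
        options = [wanted] if isinstance(wanted, str) else list(wanted)
        # OR within a level: any listed value is acceptable.
        if metadata[level] not in options:
            return False
    # Assumed AND across levels: every specified level had to match.
    return True


levels = ("climate_model", "scenario")
entries = [
    {"climate_model": "a", "scenario": "ssp119"},
    {"climate_model": "a", "scenario": "ssp126"},
    {"climate_model": "a", "scenario": "ssp585"},
]
selected = [e for e in entries if matches(e, levels, scenario=["ssp119", "ssp126"])]
```

A level that is not mentioned in the filters places no constraint on the entry, which mirrors the default of loading all values for an unspecified level.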
- save(scmrun, disable_tqdm=False)
Save data to the database
The results are saved with one file for each unique combination of levels in a directory structure underneath root_dir.
Use available_data() to see what data is available. Subsets of data can then be loaded as an scmdata.ScmRun using load().
- Parameters
scmrun (scmdata.ScmRun) – Data to save. The timeseries in this run should have valid metadata for each of the columns specified in levels.
disable_tqdm (bool) – If True, do not show the progress bar
- Raises
KeyError – If a filter for a level not in levels is specified