scmdata.database
Database for handling large datasets in a performant but flexible way.
Data is chunked using unique combinations of metadata. This allows for the database to expand as new data is added without having to change any of the existing data.
Subsets of data can also be read without first loading all the data
and then filtering. For example, one could save model results from a number of different
climate models and then load just the Surface Temperature
data for all models.
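The chunking idea described above can be illustrated with a minimal, standard-library sketch. This is not how scmdata implements it: `chunk_by_levels`, the record dictionaries and their values are hypothetical stand-ins, used only to show how grouping by unique metadata combinations lets a subset be selected without touching the other chunks.

```python
from collections import defaultdict

# Default grouping levels taken from the ScmDatabase signature
LEVELS = ("climate_model", "variable", "region", "scenario")


def chunk_by_levels(records, levels=LEVELS):
    """Group records by their unique combination of level values (hypothetical helper)."""
    chunks = defaultdict(list)
    for record in records:
        key = tuple(record[level] for level in levels)
        chunks[key].append(record)
    return dict(chunks)


# Illustrative records only; the values are made up
records = [
    {"climate_model": "model_a", "variable": "Surface Temperature",
     "region": "World", "scenario": "ssp119", "values": [1.0, 1.1]},
    {"climate_model": "model_b", "variable": "Surface Temperature",
     "region": "World", "scenario": "ssp119", "values": [0.9, 1.2]},
    {"climate_model": "model_a", "variable": "Emissions|CO2",
     "region": "World", "scenario": "ssp119", "values": [10.0, 9.0]},
]

chunks = chunk_by_levels(records)

# Select the Surface Temperature chunks for all models by inspecting keys only
variable_index = LEVELS.index("variable")
surface_temperature = {
    key: chunk
    for key, chunk in chunks.items()
    if key[variable_index] == "Surface Temperature"
}
```

Because each chunk is keyed independently, adding records with a new combination of level values creates a new chunk and leaves the existing ones unchanged.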
- class scmdata.database.ScmDatabase(root_dir, levels=('climate_model', 'variable', 'region', 'scenario'), backend='netcdf', backend_config=None)[source]
Bases: object
On-disk database handler for outputs from SCMs
Data is split into groups as specified by levels. This allows for fast reading and writing of new subsets of data when a single output file is no longer performant or data cannot all fit in memory.
- __init__(root_dir, levels=('climate_model', 'variable', 'region', 'scenario'), backend='netcdf', backend_config=None)[source]
Initialise the database
Note
Creating a new ScmDatabase does not modify any existing data on disk. To load an existing database, ensure that the root_dir, levels and backend settings are the same as the previous instance.
- Parameters:
root_dir (str) – The root directory of the database
levels (tuple of str) – Specifies how the runs should be stored on disk.
The data will be grouped by levels. These levels should be adapted to best match the input data and desired access pattern. If there are any additional varying dimensions, they will be stored as dimensions.
backend (str or BaseDatabaseBackend) – Determines the backend used to serialize and deserialize data.
Defaults to using NetCDFDatabaseBackend, which reads and writes data as netCDF files. Note that this requires the optional dependency netCDF4 to be installed.
If a custom backend class is being used, it must extend the BaseDatabaseBackend class.
backend_config (dict) – Additional configuration to pass to the backend.
See the documentation for the target backend to determine which configuration options are available.
- available_data()[source]
Get all the data which is available to be loaded
If metadata includes non-alphanumeric characters then it might appear modified in the returned table. The original metadata values can still be used to filter data.
- Return type:
pd.DataFrame
- delete(**filters)[source]
Delete data from the database
- Parameters:
filters (dict of str) – Filters selecting the data to delete.
Defaults to deleting all data if nothing is specified.
- Raises:
ValueError – If a filter is specified for a level that is not in levels
- load(disable_tqdm=False, **filters)[source]
Load data from the database
- Parameters:
disable_tqdm (bool) – If True, do not show the progress bar
filters (dict of str : [str, list[str]]) – Filters for the data to load.
Defaults to loading all values for a level if it isn’t specified.
If a filter is a list then OR logic is applied within the level. For example, if we have scenario=["ssp119", "ssp126"] then both the ssp119 and ssp126 scenarios will be loaded.
- Returns:
Loaded data
- Return type:
scmdata.ScmRun
- Raises:
ValueError – If a filter is specified for a level that is not in levels
ValueError – If no data matching filters is found
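The filter semantics documented for load() can be sketched in plain Python. This is an illustration only, not the library's code: `select_chunks` and the `available` list are hypothetical, and combining filters on different levels with AND logic is an assumption that the documentation above does not state explicitly.

```python
LEVELS = ("climate_model", "variable", "region", "scenario")


def select_chunks(available, levels=LEVELS, **filters):
    """Return the metadata entries matching the filters (hypothetical helper).

    A string filter must match exactly; a list filter matches any of its
    entries (OR within a level). Levels without a filter match everything.
    Filters on different levels are assumed to combine with AND.
    """
    unknown = set(filters) - set(levels)
    if unknown:
        raise ValueError(f"Filters for unknown levels: {sorted(unknown)}")

    selected = []
    for meta in available:
        for level, wanted in filters.items():
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if meta[level] not in allowed:
                break
        else:
            # No filter rejected this entry
            selected.append(meta)

    if not selected:
        raise ValueError("No data matching filters found")
    return selected


# Illustrative metadata only
available = [
    {"climate_model": "model_a", "variable": "Surface Temperature",
     "region": "World", "scenario": "ssp119"},
    {"climate_model": "model_a", "variable": "Surface Temperature",
     "region": "World", "scenario": "ssp126"},
    {"climate_model": "model_a", "variable": "Surface Temperature",
     "region": "World", "scenario": "ssp585"},
]

two_scenarios = select_chunks(available, scenario=["ssp119", "ssp126"])
everything = select_chunks(available)
```

Both error cases listed under Raises map onto the two `ValueError` branches of the sketch: an unknown level name, and a filter combination that matches nothing.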
- save(scmrun, disable_tqdm=False)[source]
Save data to the database
The results are saved with one file for each unique combination of levels in a directory structure underneath root_dir.
Use available_data() to see what data is available. Subsets of data can then be loaded as an scmdata.ScmRun using load().
- Parameters:
scmrun (scmdata.ScmRun) – Data to save.
The timeseries in this run should have valid metadata for each of the columns specified in levels.
disable_tqdm (bool) – If True, do not show the progress bar
- Raises:
KeyError – If a filter for a level not in levels is specified
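A typical round trip through save(), available_data() and load() might look like the sketch below. It uses only the ScmDatabase methods documented above. The ScmRun construction (a data array plus `index` and `columns` arguments) and all metadata values are assumptions for illustration and should be checked against the ScmRun documentation. The function name `save_and_reload` is hypothetical, and running it requires scmdata, numpy and the optional netCDF4 dependency.

```python
def save_and_reload(root_dir):
    """Save a small run under root_dir and load one scenario back (illustrative)."""
    # Imported lazily so the sketch can be read without scmdata installed
    import numpy as np
    from scmdata import ScmRun
    from scmdata.database import ScmDatabase

    # Assumed ScmRun construction: rows are time points, columns are timeseries
    run = ScmRun(
        data=np.array([[1.0, 1.1], [1.2, 1.4], [1.3, 1.8]]),
        index=[2010, 2020, 2030],
        columns={
            "model": "example_iam",
            "climate_model": "model_a",
            "scenario": ["ssp119", "ssp126"],
            "variable": "Surface Temperature",
            "unit": "K",
            "region": "World",
        },
    )

    # Default levels and the default netCDF backend
    db = ScmDatabase(root_dir)
    db.save(run, disable_tqdm=True)

    # Table describing what can be loaded
    summary = db.available_data()

    # Load a single scenario; other levels are left unfiltered
    subset = db.load(disable_tqdm=True, scenario="ssp119")
    return summary, subset
```

Calling `save_and_reload` with a temporary directory, for example one created by `tempfile.TemporaryDirectory()`, keeps the example from writing into an existing database.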
scmdata.database.backends
Database backends are responsible for the fetching and storage of ScmRun objects. All
backends should be based upon BaseDatabaseBackend.
- class scmdata.database.backends.BaseDatabaseBackend(**kwargs)[source]
Bases: ABC
Abstract backend for serialising/deserialising data
Data is stored as objects represented by keys. These keys can be used later to load data.
- class scmdata.database.backends.NetCDFDatabaseBackend(**kwargs)[source]
Bases: BaseDatabaseBackend
Database backend for handling local files stored as NetCDF
- get_key(sr)[source]
Get key where the data will be stored
The key is the root directory joined with the other information provided. The filepath is also cleaned to remove spaces and special characters.
- Parameters:
sr (scmdata.ScmRun) – Data to save
- Raises:
ValueError – If non-unique metadata is found for any of self.kwargs["levels"]
ValueError – If any metadata end with ‘.’
KeyError – If metadata is missing for any of self.kwargs["levels"]
- Returns:
Path in which to save the data without spaces or special characters
- Return type:
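The key construction described for get_key can be sketched as follows. This is a hedged illustration rather than the backend's actual code: `build_key`, the shape of the `meta` argument, the character replacement rule and the `.nc` suffix are all assumptions chosen to mirror the documented behaviour and error cases.

```python
import os
import re


def build_key(root_dir, meta, levels):
    """Build a file path from level metadata (hypothetical helper).

    ``meta`` maps each metadata column to the list of values found in the run.
    """
    parts = []
    for level in levels:
        if level not in meta:
            raise KeyError(f"Missing metadata for level {level!r}")

        values = set(meta[level])
        if len(values) != 1:
            raise ValueError(f"Non-unique metadata for level {level!r}")

        value = str(values.pop())
        if value.endswith("."):
            raise ValueError(f"Metadata for level {level!r} ends with '.'")

        # Assumed cleaning rule: replace spaces and special characters with '-'
        parts.append(re.sub(r"[^A-Za-z0-9_-]", "-", value))

    # Assumed layout: one directory per level, with a netCDF file at the end
    return os.path.join(root_dir, *parts) + ".nc"


key = build_key(
    "db",
    {"climate_model": ["model_a"], "variable": ["Surface Temperature"]},
    ["climate_model", "variable"],
)
```

Cleaning the values is also why metadata containing non-alphanumeric characters can appear modified when it is read back from the paths on disk.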
- save(sr)[source]
Save a ScmRun to the database
The dataset should not contain any duplicate metadata for the database levels
- Parameters:
sr (scmdata.ScmRun) – Data to save
- Raises:
ValueError – If duplicate metadata are present for the requested database levels
KeyError – If metadata for the requested database levels are not found
- Returns:
Key where the data is saved
- Return type: