Python API reference

All described objects can be imported from the zappend.api module.

Function zappend()

zappend.api.zappend(slices, config=None, **kwargs)

Robustly create or update a Zarr dataset from dataset slices.

The zappend function concatenates the dataset slices from the given slices along a given append dimension, e.g., "time" (the default) for geospatial satellite observations. Each append step is atomic, that is, it is a transaction that can be rolled back if the append operation fails. This ensures the integrity of the target data cube target_dir given in config or kwargs.

Each slice item in slices provides a slice dataset to be appended. The interpretation of a given slice item depends on whether a slice source is configured or not (setting slice_source).

If no slice source is configured, a slice item must be an object of type str, FileObj, xarray.Dataset, or SliceSource. A str or FileObj is interpreted as a local dataset path or a dataset URI. If a URI is used, protocol-specific parameters apply, given by the configuration parameter slice_storage_options.

If a slice source is configured, a slice item represents the argument(s) passed to that slice source. Multiple positional arguments can be passed as a list, multiple keyword arguments as a dict, and both together as a tuple of list and dict.

Parameters:

Name Type Description Default
slices Iterable[Any]

An iterable that yields slice items.

required
config ConfigLike

Processor configuration. Can be a file path or URI, a dict, None, or a sequence of the aforementioned. If a sequence is used, subsequent configurations are incremental to the previous ones.

None
kwargs Any

Additional configuration parameters. Can be used to pass or override configuration values in config.

{}

Returns:

Type Description
int

The number of slices processed. The value can be useful if the number of items in slices is unknown.

Class SliceSource

Bases: ABC

Slice source interface definition.

A slice source is a closable source for a slice dataset.

A slice source is intended to be implemented by users. An implementation must provide the methods get_dataset() and close().

If your slice source class requires the processing context, your class constructor may define a ctx: Context as its first positional argument or as a keyword argument.

close()

Close this slice source. This should include cleaning up any temporary resources.

This method is not intended to be called directly and is called exactly once for each instance of this class.

dispose()

Deprecated since version 0.6.0; override close() instead.

get_dataset() abstractmethod

Open this slice source, do any processing, and return a dataset of type xarray.Dataset as the result.

This method is not intended to be called directly and is called exactly once for each instance of this class.

It should return a dataset that is compatible with the target dataset:

  • the slice must have the same fixed dimensions;
  • the append dimension must exist in the slice.

Returns:

Type Description
Dataset

A slice dataset.

Class Context

Provides access to configuration values and values derived from them.

Parameters:

Name Type Description Default
config Dict[str, Any] | Config

A validated configuration dictionary or a Config instance.

required

Raises:

Type Description
ValueError

If target_dir is missing in the configuration.

config: Config property

The processor configuration.

last_append_label: Any | None property

The last label found in the coordinate variable that corresponds to the append dimension. Its value is None if no such variable exists or the variable is empty or if config.append_step is None.

target_metadata: DatasetMetadata | None property writable

The metadata for the target dataset. May be None while the target dataset hasn't been created yet. It will be set once the target dataset has been created from the first slice dataset.

get_dataset_metadata(dataset)

Get the dataset metadata from configuration and the given dataset.

Parameters:

Name Type Description Default
dataset Dataset

The dataset

required

Returns:

Type Description
DatasetMetadata

The dataset metadata

Class Config

Provides access to configuration values and values derived from them.

Parameters:

Name Type Description Default
config_dict Dict[str, Any]

A validated configuration dictionary.

required

Raises:

Type Description
ValueError

If target_dir is missing in the configuration.

append_dim: str property

The name of the append dimension along which slice datasets will be concatenated. Defaults to "time".

append_step: int | float | str | None property

The enforced step size in the append dimension between two slices. Defaults to None.

attrs: dict[str, Any] property

Global dataset attributes. May include dynamically computed placeholders of the form {{ expression }}.

attrs_update_mode: Literal['keep'] | Literal['replace'] | Literal['update'] property

The mode used to deal with global slice dataset attributes. One of "keep", "replace", "update".

disable_rollback: bool property

Whether to disable transaction rollbacks.

dry_run: bool property

Whether to run in dry mode.

excluded_variables: list[str] property

Names of excluded variables.

extra: dict[str, Any] property

Extra settings. Intended use is by a slice_source that expects an argument named ctx to access the extra settings and other configuration.

force_new: bool property

If set, an existing target dataset will be deleted.

included_variables: list[str] property

Names of included variables.

logging: dict[str, Any] | str | bool | None property

Logging configuration.

permit_eval: bool property

Whether dynamically computed values in the dataset attributes attrs, using the syntax {{ expression }}, are permitted. Executing arbitrary Python expressions is a security risk; therefore, this must be explicitly enabled.

persist_mem_slices: bool property

Whether to persist in-memory slice datasets.

profiling: dict[str, Any] | str | bool | None property

Profiling configuration.

slice_engine: str | None property

The configured slice engine to be used if a slice path or URI does not point to a dataset in Zarr format. If defined, it will be passed to the xarray.open_dataset() function.

slice_polling: tuple[float, float] | tuple[None, None] property

The configured slice dataset polling. If slice polling is enabled, returns the tuple (interval, timeout) in seconds; otherwise, returns (None, None).

slice_source: Callable[[...], Any] | None property

A class or function that receives a slice item as argument(s) and provides the slice dataset.

  • If a class is given, it must be derived from zappend.api.SliceSource.
  • If the function is a context manager, it must yield an xarray.Dataset.
  • If a plain function is given, it must return any valid slice item type.

Refer to the user guide for more information.
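For instance, the second bullet could be covered by a context-manager function (a hypothetical sketch; the variable name chl is illustrative):

```python
from contextlib import contextmanager

import numpy as np
import xarray as xr

@contextmanager
def open_slice(day: int):
    # Yields an xarray.Dataset; cleanup runs after zappend has consumed it.
    ds = xr.Dataset(
        {"chl": (("time", "y", "x"), np.zeros((1, 2, 2)))},
        coords={"time": [np.datetime64("2024-01-01")
                         + np.timedelta64(day, "D")]},
    )
    try:
        yield ds
    finally:
        ds.close()
```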

slice_source_kwargs: dict[str, Any] | None property

Extra keyword-arguments passed to a specified slice_source together with each slice item.

slice_storage_options: dict[str, Any] | None property

The configured slice storage options to be used if a slice item is a URI.

target_dir: FileObj property

The configured directory that represents the target data cube in Zarr format.

temp_dir: FileObj property

The configured directory used for temporary files such as rollback data.

variables: dict[str, Any] property

Variable definitions.

zarr_version: int property

The configured Zarr version for the target dataset.

Class FileObj

An object that represents a file or directory in some filesystem.

Parameters:

Name Type Description Default
uri str

The file or directory URI

required
storage_options dict[str, Any] | None

Optional storage options specific to the protocol of the URI

None
fs AbstractFileSystem | None

Optional fsspec filesystem instance. Use with care; the filesystem must be consistent with uri and storage_options. For internal use only.

None
path str | None

The path into the filesystem fs. Use with care; the path must be consistent with uri. For internal use only.

None

filename: str property

The filename part of the URI.

fs: fsspec.AbstractFileSystem property

The filesystem.

parent: FileObj property

The parent file object.

path: str property

The path of the file or directory within the filesystem.

storage_options: dict[str, Any] | None property

Storage options for creating the filesystem object.

uri: str property

The URI.

__truediv__(rel_path)

Overridden to call for_path(rel_path).

Parameters:

Name Type Description Default
rel_path str

Relative path to append.

required

close()

Close the filesystem used by this file object.

delete(recursive=False)

Delete the file or directory represented by this file object.

Parameters:

Name Type Description Default
recursive bool

Set to True to delete a non-empty directory.

False

exists()

Check if the file or directory represented by this file object exists.

for_path(rel_path)

Gets a new file object for the given relative path.

Parameters:

Name Type Description Default
rel_path str

Relative path to append.

required

Returns:

Type Description
FileObj

A new file object

mkdir()

Create the directory represented by this file object.

read(mode='rb')

Read the contents of the file represented by this file object.

Parameters:

Name Type Description Default
mode Literal['rb'] | Literal['r']

Read mode, must be "rb" or "r"

'rb'

Returns:

Type Description
bytes | str

The contents of the file, either as bytes if mode is "rb" or as str if mode is "r".

write(data, mode=None)

Write the contents of the file represented by this file object.

Parameters:

Name Type Description Default
data str | bytes

The data to write.

required
mode Literal['wb'] | Literal['w'] | Literal['ab'] | Literal['a'] | None

Write mode, must be "wb", "w", "ab", or "a".

None

Returns:

Type Description
int

The number of bytes written.

Types

zappend.api.SliceItem = str | FileObj | xr.Dataset | ContextManager[xr.Dataset] | SliceSource module-attribute

The possible types that can represent a slice dataset.

zappend.api.SliceCallable = Type[SliceSource] | Callable[[...], SliceItem] module-attribute

This type is either a class derived from SliceSource or a function that returns a SliceItem. Both can be invoked with any number of positional or keyword arguments. The processing context, if used, must be named ctx and must be either the 1st positional argument or a keyword argument. Its type is Context.
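A plain function can serve as a SliceCallable; this hypothetical example takes a single slice item day and returns an xarray.Dataset (a valid SliceItem):

```python
import numpy as np
import xarray as xr

def make_daily_slice(day: int) -> xr.Dataset:
    # The slice item (day) arrives as a positional argument;
    # no ctx parameter is declared, so no Context is injected.
    return xr.Dataset(
        {"sst": (("time", "y", "x"), np.zeros((1, 2, 2)))},
        coords={"time": [np.datetime64("2024-01-01")
                         + np.timedelta64(day, "D")]},
    )
```

With slice_source=make_daily_slice, zappend([0, 1, 2], ...) would invoke make_daily_slice(0), make_daily_slice(1), and make_daily_slice(2).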

zappend.api.ConfigItem = FileObj | str | dict[str, Any] module-attribute

The possible types used to represent zappend configuration.

zappend.api.ConfigList = list[ConfigItem] | tuple[ConfigItem] module-attribute

A sequence of possible zappend configuration types.

zappend.api.ConfigLike = ConfigItem | ConfigList | None module-attribute

Type for a zappend configuration-like object.

Contributions

This module contributes to zappend's core functionality.

The function signatures in this module are less stable, and their implementations are considered experimental. They may also rely on external packages. For more information, please refer to the individual function documentation. Due to these reasons, this module is excluded from the project's automatic coverage analysis.

Function write_levels()

Write a dataset given by source_ds or source_path to target_path using the multi-level dataset format as specified by xcube.

It resembles the store.write_data(dataset, "<name>.levels", ...) method provided by the xcube filesystem data stores ("file", "s3", "memory", etc.). The zappend version may be used for potentially very large datasets in terms of dimension sizes or for datasets with a very large number of chunks. It is considerably slower than the xcube version (which basically uses xarray.to_zarr() for each resolution level) but should run robustly with stable memory consumption.

The function opens the source dataset and subdivides it into dataset slices along the append dimension given by append_dim, which defaults to "time". The slice size in the append dimension is one. Each slice is downsampled to the number of levels, and each slice level dataset is created in or appended to the target dataset's corresponding level dataset.

The target dataset's chunk size in the spatial x- and y-dimensions will be the same as the specified (or derived) tile size, and its chunk size in the append dimension will be one. The chunking will be reflected as the variables configuration parameter passed to each zappend() call. If the configuration parameter variables is also given as part of zappend_config, it will be merged with the chunk definitions.

Important notes:

  • This function depends on xcube.core.gridmapping.GridMapping and xcube.core.subsampling.subsample_dataset() of the xcube package.
  • write_levels() is not as robust as zappend itself. For example, there may be inconsistent dataset levels if the processing is interrupted while a level is appended.
  • There is a remaining issue with (coordinate) variables that have a dimension that is not a dimension of any variable with one of the spatial dimensions, e.g., time_bnds with dimensions time and bnds. Please exclude such variables using the parameter excluded_variables.

Parameters:

Name Type Description Default
source_ds Dataset | None

The source dataset. Must be given in case source_path is not given.

None
source_path str | None

The source dataset path. If source_ds is provided and link_level_zero is true, then source_path must also be provided in order to determine the path of the level zero source.

None
source_storage_options dict[str, Any] | None

Storage options for the source dataset's filesystem.

None
source_append_offset int | None

Optional offset in the append dimension. Only slices with indexes greater than or equal to the offset are appended.

None
target_path str | None

The target multi-level dataset path. Filename extension should be .levels, by convention. If not given, target_dir should be passed as part of the zappend_config. (The name target_path is used here for consistency with source_path.)

None
num_levels int | None

Optional number of levels. If not given, a reasonable number of levels is computed from tile_size.

None
tile_size tuple[int, int] | None

Optional tile size in the x- and y-dimension in pixels. If not given, the tile size is computed from the source dataset's chunk sizes in the x- and y-dimensions.

None
xy_dim_names tuple[str, str] | None

Optional dimension names that identify the x- and y-dimensions. If not given, derived from the source dataset's grid mapping, if any.

None
agg_methods str | dict[str, Any] | None

An aggregation method for all data variables or a mapping that provides the aggregation method for a variable name. Possible aggregation methods are "first", "min", "max", "mean", "median".

None
use_saved_levels bool

Whether a given, already written resolution level serves as input to aggregation for the next level. If False, the default, each resolution level other than zero is computed from the source dataset. If True, the function may perform significantly faster, but be aware that the aggregation methods "first" and "median" will produce inaccurate results.

False
link_level_zero bool

Whether to link, rather than write, the level zero dataset of the target multi-level dataset. If True, a link file {target_path}/0.link will be written. If False, the default, a level dataset {target_path}/0.zarr will be written instead.

False
zappend_config

Configuration passed to zappend as zappend(slice, **zappend_config) for each slice in the append dimension. The zappend config parameter is not supported.

{}