User Guide
After installation, you can either use the zappend
CLI command
zappend -t output/mycube.zarr inputs/*.nc
or the zappend
Python function
import glob

from zappend.api import zappend

zappend(glob.glob("inputs/*.nc"), target_dir="output/mycube.zarr")
Both invocations will create the Zarr dataset output/mycube.zarr
by concatenating
the "slice" datasets provided in the inputs
directory along their time
dimension.
target_dir must specify a directory for the Zarr dataset. (Its parent directory must
already exist.) Both the CLI command and the Python function can be run without any further
configuration, provided that the paths of the target dataset and the source slice datasets
are given. The target dataset path must point to a directory that will contain a Zarr
group to be created and updated. The slice datasets may be given in Zarr format as well,
or in any other data format supported by the xarray.open_dataset()
function. Because we provided no additional configuration, the default append dimension
time is used above.
The target and slice datasets may live in filesystems other than the local
one if their paths are given as URIs prefixed with a filesystem protocol such as
s3:// or memory://. Additional filesystem storage options may be specified via
dedicated configuration settings. More on this is given
in the section Data I/O below.
Zarr Format v2
By default, zappend uses version 2 of the Zarr storage specification
and has only been tested with this version. The zarr_version setting can be used
to change it, e.g., to 3, but any value other than 2
is currently unsupported.
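For example, the default can be stated explicitly in a configuration file:
{
  "zarr_version": 2
}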
The tool takes care of generating the target dataset from slice datasets, but doesn't
care how the slice datasets are created. Hence, when using the Python zappend()
function, the slice datasets can be provided in various forms. More on this is given
in section Slice Sources below.
To run the zappend tool with configuration, you can pass one or more
configuration files in JSON or YAML format:
zappend -t output/mycube.zarr -c config.yaml inputs/*.nc
If multiple configuration files are passed, they are merged into one by incrementally updating the first with the subsequent ones.
Environment Variables
It is possible to include the values of environment variables in JSON or YAML
configuration files using the syntax ${ENV_VAR} or just $ENV_VAR.
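For example, a configuration may reference credentials from the environment; the variable names below are just placeholders:
{
  "target_dir": "s3://${MY_BUCKET}/cubes/mycube.zarr",
  "target_storage_options": {
    "key": "${MY_S3_KEY}",
    "secret": "${MY_S3_SECRET}"
  }
}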
You can pass configuration settings to the zappend Python function with
the optional config keyword argument. Other keyword arguments are
interpreted as individual configuration settings and will be merged into the
configuration given by the config argument, if any. The config keyword argument can be
given as a local file path or URL (type str) pointing to a JSON or YAML file.
It can also be given as a dictionary, or as a sequence of the aforementioned
types. Configuration sequences are again merged into one.
import glob

from zappend.api import zappend

zappend(glob.glob("inputs/*.nc"),
        config=["configs/base.yaml",
                "configs/mycube.yaml"],
        target_dir="outputs/mycube.zarr",
        dry_run=True)
The remainder of this guide explains how to use the various zappend
configuration settings.
Note
We use the term Dataset in the same way xarray
does: A dataset comprises any
number of multidimensional Data Variables, and usually 1-dimensional
Coordinate Variables that provide the labels for the dimensions used by the data
variables. A variable comprises the actual data array as well as metadata describing
the data dimensions, units, and encoding, such as chunking and compression.
Dataset Metadata
Outline
If no further configuration is supplied, then the target dataset's outline and data
encoding are fully prescribed by the first slice dataset provided. By default, the
dimension along which subsequent slice datasets are concatenated is time. If
you use a different append dimension, the append_dim setting can be used to
specify its name:
{
"append_dim": "depth"
}
The configuration setting append_step can be used to validate the step sizes
between the labels of a coordinate variable associated with the append dimension.
Its value can be a number for numerical labels or a timedelta value of the form
<count><unit> for date/time labels. In the latter case, <count> is an integer
and <unit> is one of the possible numpy datetime units,
for example, 8h (8 hours) or 2D (two days). Numerical and timedelta values
may be negative. append_step can also take the two special values "+" and
"-", in which case it is only verified that the append labels are monotonically
increasing or decreasing, respectively.
{
"append_dim": "time",
"append_step": "2D"
}
Other, non-variadic dimensions besides the append dimension can and should
be specified using the fixed_dims setting, which is a mapping from dimension
name to the fixed dimension size, e.g.:
{
"fixed_dims": {
"x": 16384,
"y": 8192
}
}
By default, without further configuration, all data variables seen in the first
dataset slice will be included in the target dataset. If only a subset of
variables should be used from the slice dataset, they can be specified using
the included_variables
setting, which is a list of names of variables that
will be included:
{
"included_variables": [
"time", "y", "x",
"chl",
"tsm"
]
}
Often, it is easier to specify which variables should be excluded:
{
"excluded_variables": ["GridCellId"]
}
Attributes
The target dataset should expose information about itself through global
metadata attributes.
There are several choices for how the global attributes of the target
dataset are updated from slices. The configuration setting attrs_update_mode
controls how this is done:
- "keep": use the attributes from the first slice dataset and keep them (default);
- "replace": replace existing attributes by the attributes of the last slice dataset;
- "update": update existing attributes by the attributes of the last slice dataset;
- "ignore": ignore attributes from slice datasets.
Extra attributes can be added using the optional configuration setting attrs:
{
"attrs_update_mode": "keep",
"attrs": {
"Conventions": "CF-1.10",
"title": "SMOS Level 2C Soil Moisture 2-Days Composite"
}
}
Independently of the attrs_update_mode
setting, extra attributes configured
by the attrs
setting will always be used to update the resulting target
attributes.
Attribute values in the attrs setting may also be computed dynamically using
the syntax {{ expression }}, where expression is an arbitrary Python
expression. For this to work, the setting permit_eval must be explicitly
set for security reasons:
{
"permit_eval": true,
"attrs_update_mode": "keep",
"attrs": {
"time_coverage_start": "{{ ds.time[0] }}",
"time_coverage_end": "{{ ds.time[-1] }}"
}
}
Currently, the only variable accessible from expressions is ds, which is
a reference to the current state of the target dataset after the last slice
append. It is of type xarray.Dataset.
Evil eval()
The expressions in {{ expression }}
are evaluated using the Python
eval() function.
This can pose a threat to your application and environment.
Although zappend does not allow you to directly access Python built-in
functions via expressions, this feature should be used judiciously and with extreme
caution if zappend is used as part of a web service where configuration is injected
from outside your network.
The following utility functions can be used as well and are handy if you need to store the upper and lower bounds of coordinates as attribute values:
- lower_bound(array, ref: "lower"|"upper"|"center" = "lower"): Return the lower bound of the one-dimensional (coordinate) array given by array.
- upper_bound(array, ref: "lower"|"upper"|"center" = "lower"): Return the upper bound of the one-dimensional (coordinate) array given by array.
The ref value specifies the reference within an array element that is used
as a basis for the boundary computation. E.g., if coordinate labels refer to
array element centers, pass ref="center".
{
"attrs": {
"time_coverage_start": "{{ lower_bound(ds.time, 'center') }}",
"time_coverage_end": "{{ upper_bound(ds.time, 'center') }}"
}
}
Variable Metadata
Without any additional configuration, zappend
uses the dimensions, attributes,
and encoding information from the data variables of the first slice dataset.
Encoding information is used only to the extent applicable to the Zarr format.
Non-applicable encoding information will be reported by a warning log record
but is otherwise ignored.
Variable metadata can be specified by the variables
setting, which is a
mapping from variable name to a mapping that provides the dimensions, attributes,
and encoding information of data variables for the target dataset. All such
information is optional. The provided settings will be merged with the
information retrieved from the data variables with the same name included in the
first dataset slice.
A special "variable name" is the wildcard *
that can be used to define default
values for all variables:
{
"variables": {
"*": {
}
}
}
If * is specified, the metadata for a particular variable is generated by
merging the specific metadata for that variable into the common metadata given
by *, which is in turn merged into the metadata of that variable in the first
dataset slice.
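For example, the following hypothetical settings apply a common storage data type to all variables, while chl additionally defines its own chunk sizes:
{
  "variables": {
    "*": {
      "encoding": {
        "dtype": "float32"
      }
    },
    "chl": {
      "encoding": {
        "chunks": [1, 2048, 2048]
      }
    }
  }
}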
Note
Only metadata from the first slice dataset is used. Metadata of variables from subsequent slice datasets is ignored entirely.
Dimensions
To ensure a slice variable has the expected dimensionality and shape, the dims
setting is used. The following example defines the dimensions of a data variable
named chl
(Chlorophyll):
{
"variables": {
"chl": {
"dims": ["time", "y", "x"]
}
}
}
An error will be raised if a variable from a subsequent slice has different dimensions.
Attributes
Extra variable attributes can be provided using the attrs
setting:
{
"variables": {
"chl": {
"attrs": {
"units": "mg/m^3",
"long_name": "chlorophyll_concentration"
}
}
}
}
Encoding
Encoding metadata specifies how array data is stored in the target dataset and includes
storage data type, packing, chunking, and compression. Encoding metadata for a given
variable is provided by the encoding
setting. Since the encoding is often shared by
multiple variables, the wildcard variable name * can often be of help.
Verify encoding is as expected
To verify that zappend uses the expected encoding for your variables, create a
target dataset for testing from your first slice dataset and open it using
ds = xarray.open_zarr(target_dir, decode_cf=False). Then inspect the dataset ds
using the Python console or a Jupyter Notebook (attribute ds.<var>.encoding).
You can also inspect the Zarr directly by opening the <target_dir>/<var>/.zarray
or <target_dir>/.zmetadata metadata JSON files.
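A minimal sketch of such a check, reusing the target path and variable name from earlier examples in this guide:
import xarray as xr

# Open the test target without CF decoding so the raw encoding is visible
ds = xr.open_zarr("output/mycube.zarr", decode_cf=False)
# Inspect chunks, dtype, compressor, fill value, etc.
print(ds["chl"].encoding)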
Chunking
Chunking refers to the subdivision of multidimensional data arrays into
smaller multidimensional blocks. Using the Zarr format, such blocks become
individual data files after optional data packing
and compression. The chunk sizes of the
dimensions of the multidimensional blocks therefore determine the number of
blocks used per data array and also their size. Hence, chunk sizes have
a very large impact on I/O performance of datasets, especially if they are
persisted in remote filesystems such as S3. The chunk sizes are specified
using the chunks
setting in the encoding of each variable.
The value of chunks
can also be null
, which means no chunking is
desired and the variable's data array will be persisted as one block.
By default, the chunking of the coordinate variable corresponding to the append
dimension will be its dimension size in the first slice dataset. Often, the size
will be 1
or another small number. Since xarray
loads coordinates eagerly
when opening a dataset, this can lead to performance issues if the target
dataset is served from object storage such as S3. The reason for this is that a
separate HTTP request is required for every single chunk. It is therefore highly
advisable to set the chunk size of that variable to a larger number using the
chunks
setting. For other variables, you could still use a small chunk size
in the append dimension.
Here is a typical chunking configuration for the append dimension "time"
:
{
"append_dim": "time",
"variables": {
"*": {
"encoding": {
"chunks": null
}
},
"time": {
"dims": ["time"],
"encoding": {
"chunks": [1024]
}
},
"chl": {
"dims": ["time", "y", "x"],
"encoding": {
"chunks": [1, 2048, 2048]
}
}
}
}
Sometimes, you may explicitly wish to not chunk a given dimension of a variable.
If you know the size of that dimension in advance, you can then use its size as
chunk size. But there are situations where the final dimension size depends
on some processing parameters. For example, you could define your own
slice source that takes a geodetic bounding box bbox
parameter to spatially crop your variables in the x
and y
dimensions.
If you want such dimensions to not be chunked, you can set their chunk sizes
to null
(None
in Python):
{
"variables": {
"chl": {
"dims": ["time", "y", "x"],
"encoding": {
"chunks": [1, null, null]
}
}
}
}
Missing Data
To indicate missing data in a variable data array, a dedicated no-data or missing value
can be specified by the fill_value
setting. The value is given in a variable's storage
type and storage units, see next section Data Packing.
{
"variables": {
"chl": {
"encoding": {
"fill_value": -999
}
}
}
}
If the fill_value
is not specified, the default is NaN
(given as string "NaN"
in JSON) if the storage data type is floating point; it is None
(null
in JSON)
if the storage data type is integer, which effectively means that no fill value is used.
You can also explicitly set fill_value
to null
(None
in Python) to not use one.
Setting the fill_value
for a variable can be important for saving storage space and
improving data I/O performance in many cases, because zappend
does not write empty
array chunks - chunks that comprise missing data only, i.e.,
slice.to_zarr(target_dir, write_empty_chunks=False, ...)
.
Data Packing
Data packing refers to a simple lossy data compression method where 32- or 64-bit
floating point values are linearly scaled so that their value range can be fully or
partially represented by a lower precision integer data type. Packed values usually
also give higher compression rates when using a compressor
, see next section.
Data packing is specified using the scale_factor
and add_offset
settings together
with the storage data type setting dtype
. The settings should be given as a triple:
{
"variables": {
"chl": {
"encoding": {
"dtype": "int16",
"scale_factor": 0.005,
"add_offset": 0.0
}
}
}
}
The in-memory value in its physical units for a given encoded value in storage is computed according to
memory_value = scale_factor * storage_value + add_offset
Hence, the encoded value is computed from an in-memory value in physical units as
storage_value = (memory_value - add_offset) / scale_factor
You can compute scale_factor and add_offset from a given data range in physical units
according to
add_offset = memory_value_min
scale_factor = (memory_value_max - memory_value_min) / (2 ** num_bits - 1)
with num_bits
being the number of bits for the integer type to be used.
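For illustration, here is a minimal sketch that derives these settings for a hypothetical chlorophyll value range packed into int16; the range of 0 to 100 mg/m^3 is just an assumption:
# Assumed physical value range and integer width
memory_value_min = 0.0    # minimum value in physical units, e.g., mg/m^3
memory_value_max = 100.0  # maximum value in physical units
num_bits = 16             # number of bits of the int16 storage type

add_offset = memory_value_min
scale_factor = (memory_value_max - memory_value_min) / (2 ** num_bits - 1)

print({"dtype": "int16", "scale_factor": scale_factor, "add_offset": add_offset})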
Compression
Data compression is specified by the compressor
setting, optionally paired with the
filters
setting:
{
"variables": {
"chl": {
"encoding": {
"compressor": {},
"filters": []
}
}
}
}
By default, if not specified otherwise, zappend uses Zarr's default blosc compressor.
To explicitly disable compression, you must set the compressor to None (null in JSON).
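As a sketch, a non-default compressor might be configured as follows, assuming the compressor object follows the numcodecs codec configuration, i.e., an "id" plus codec-specific parameters:
{
  "variables": {
    "chl": {
      "encoding": {
        "compressor": {
          "id": "blosc",
          "cname": "zstd",
          "clevel": 3,
          "shuffle": 2
        }
      }
    }
  }
}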
The usage of compressors and filters is best explained in dedicated sections of the Zarr Tutorial, namely Compressors and Filters.
Data I/O
This section describes the configuration of how data is read and written.
All input and output can be configured to take place in different filesystems. To
specify a filesystem other than the local one, you can use URIs and URLs for path
configuration settings such as target_dir and temp_dir as well as for the slice
dataset paths. The filesystem is given by a URI's protocol prefix, such as s3://,
which specifies the S3 filesystem. Additional storage parameters may be required to
access the data; they can be provided by the settings target_storage_options,
temp_storage_options, and slice_storage_options, which must be given as dictionaries
or JSON objects. The supported filesystems and their storage options are given by the
fsspec package.
Tip
You can use the dry_run setting to suppress the creation or modification of any files
in the filesystem. This is useful for testing, e.g., to make sure the configuration is
valid and slice datasets can be read without errors.
While the target dataset is being modified, a file lock is created that effectively
prevents concurrent dataset modifications. After a complete slice dataset has been
successfully appended, the lock is removed from the target. The lock file is written next
to the target dataset, using the same filesystem and parent directory path. Its filename
is the filename of the target dataset suffixed by the extension .lock.
Transactions
Appending a slice dataset is an atomic operation that ensures the target dataset's integrity. That is, if an append step fails, a rollback is performed to restore the last valid state of the target dataset. The rollback takes place immediately after a target dataset modification has failed. It includes restoring all changed files and removing added files. After the rollback you can analyse what went wrong and try to continue appending slices at the point where it failed.
To allow for rollbacks, a slice append operation is treated as a transaction,
hence temporary files must be written, e.g., to record required rollback actions and to
save backup files with the original data. The location of the temporary transaction
files can be configured using the optional temp_dir
and temp_storage_options
settings:
{
"temp_dir": "memory://temp"
}
By default, temp_dir is your operating system's location for temporary
data (Python tempfile.gettempdir()).
You can disable transaction management by specifying
{
"disable_rollback": true
}
Target Dataset
The target_dir
setting is mandatory. If it is not specified in the configuration,
it must be passed either as --target
or -t
option to the zappend
command or as
target_dir
keyword argument when using the zappend
Python function.
Note that the parent directory of target_dir must already exist.
If the target path is given for another filesystem, additional storage options may be
passed using the optional target_storage_options
setting.
{
"target_dir": "s3://wqservices/cubes/chl-2023.zarr",
"target_storage_options": {
"anon": false,
"key": "...",
"secret": "...",
"endpoint_url": "https://s3.acme.org"
}
}
Sometimes you may want to start a new target dataset from scratch when calling
zappend
. A typical case is testing if a given configuration yields the desired
results. The configuration flag force_new
can be used to delete an existing
target dataset (and an existing lock) upfront.
{
"target_dir": "s3://wqservices/cubes/chl-2023.zarr",
"force_new": true
}
However, keep in mind that the deletion is not a transaction that can be rolled
back. Therefore, a log message with warning level will be emitted if the
force_new
flag is set.
Setting force_new
The configuration flag force_new
will force generating a new
target dataset. If it already exists, it will be permanently deleted!
If the deletion fails, there will be no rollback.
Slice Datasets
A slice dataset is the dataset that is appended for every slice item passed
to zappend
. Slice datasets can be provided in various ways.
- When using the zappend CLI command, slice items are passed as command arguments, where they point to slice datasets by local file paths or URIs.
- When using the zappend Python function, slice items are passed using the slices argument, which is a Python iterable. You can pass a list or tuple of slice items or provide a Python generator that provides the slice items.
Each slice item can be a local file path or URI of type str
or FileObj
,
a dataset of type xarray.Dataset
, or a SliceSource
object explained in more detail
below.
Paths and URIs
A slice item of type str is interpreted as a local file path, or as a URI if the path
has a protocol prefix such as s3://, as described above.
In the majority of zappend
use cases the slice datasets to be appended to a target
dataset are passed as local file paths or URIs. A slice URI starts with a protocol
prefix, such as s3://
, or memory://
. Additional storage options may be required
for the filesystem given by the URI's protocol. They may be specified using the
slice_storage_options
setting.
{
"slice_storage_options": {
"anon": true
}
}
Sometimes, the slice datasets to be processed are not yet available, e.g.,
because another process is currently generating them. For such cases, the
slice_polling
setting can be used. It provides the poll interval and the timeout
values in seconds. If this setting is used, and the slice dataset does not yet exist or
fails to open, the tool will retry opening it after the given interval. It will stop
doing so and exit with an error if the total time for opening the slice dataset exceeds
the given timeout:
{
"slice_polling": {
"interval": 2,
"timeout": 600
}
}
Or use default polling:
{
"slice_polling": true
}
An alternative to providing the slice dataset as path or URI is using the
zappend.api.FileObj
class, which combines a URI with dedicated filesystem
storage options.
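For example, a minimal sketch that passes a slice via FileObj, assuming its constructor accepts a URI and optional storage_options; the bucket and credentials are placeholders:
from zappend.api import FileObj, zappend

slice_file = FileObj("s3://mybucket/slices/slice-2023-01-01.nc",
                     storage_options={"anon": False,
                                      "key": "...",
                                      "secret": "..."})

zappend([slice_file], target_dir="output/mycube.zarr")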
Dataset Objects
You can also use dataset objects of type xarray.Dataset as slice items.
Such objects may originate from opening datasets from some storage, e.g.,
xarray.open_dataset(slice_store, ...), or from composing, aggregating, or resampling
slice datasets from other datasets and dataset variables.
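For example, a minimal sketch that yields in-memory datasets from a generator; the file pattern and the dropped variable are just assumptions:
import glob

import xarray as xr
from zappend.api import zappend

def make_slices():
    for path in sorted(glob.glob("inputs/*.nc")):
        ds = xr.open_dataset(path)
        # Any in-memory preparation could happen here, e.g., dropping a variable
        yield ds.drop_vars("GridCellId", errors="ignore")

zappend(make_slices(), target_dir="output/mycube.zarr")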
Datasets are not closed automatically
If you pass xarray.Dataset objects to zappend, they will not be
closed automatically. This may become an issue if you have many datasets
and each one binds resources such as open file handles. Consider using a
slice source in that case, see below.
Chunked data arrays of an xarray.Dataset
are usually instances of
Dask arrays, to allow for out-of-core
computation of large datasets. As a Dask array may represent complex and/or
expensive processing graphs, high CPU loads and memory consumption are common issues
for computed slice datasets, especially if the specified target dataset chunking is
different from the slice dataset chunking. This may cause Dask graphs to be
computed multiple times if the source chunking overlaps multiple target chunks,
potentially causing large resource overheads while recomputing and/or reloading the same
source chunks multiple times. In such cases it can help to "terminate" computations for
each slice by persisting the computed dataset first and then reopening it. This can be
specified using the persist_mem_slice
setting:
{
"persist_mem_slice": true
}
If the flag is set, in-memory slices will be persisted to a temporary Zarr before
appending them to the target dataset. It may prevent expensive re-computation of chunks
at the cost of additional I/O. It therefore defaults to false.
Slice Sources
A slice source gives you full control over how a slice dataset is created, loaded,
or generated and how its bound resources, if any, are released. In its simplest form,
a slice source is a plain Python function that can take any arguments and returns
an xarray.Dataset
:
import xarray as xr

# Slice source argument `path` is just an example.
def get_dataset(path: str) -> xr.Dataset:
    # Provide dataset here. No matter how, e.g.:
    return xr.open_dataset(path)
If you need cleanup code that is executed after the slice dataset has been appended, you can turn your slice source function into a context manager (new in zappend v0.7):
from contextlib import contextmanager

import xarray as xr

# Slice source argument `path` is just an example.
@contextmanager
def get_dataset(path: str) -> xr.Dataset:
    # Bind any resources and provide dataset here, e.g.:
    dataset = xr.open_dataset(path)
    try:
        # Yield (not return!) the dataset
        yield dataset
    finally:
        # Cleanup code here, release any bound resources, e.g.:
        dataset.close()
You can also implement your slice source as a class derived from the abstract
zappend.api.SliceSource class. Its interface methods are:
- get_dataset(): a zero-argument method that returns the slice dataset of type xarray.Dataset. You must implement this abstract method.
- close(): an optional method. Put your cleanup code here. (In zappend < v0.7, the close method was called dispose.)
import xarray as xr

from zappend.api import SliceSource


class MySliceSource(SliceSource):
    # Slice source argument `path` is just an example.
    def __init__(self, path: str):
        self.path = path
        self.dataset = None

    def get_dataset(self) -> xr.Dataset:
        # Bind any resources and provide dataset here, e.g.:
        self.dataset = xr.open_dataset(self.path)
        return self.dataset

    def close(self):
        # Cleanup code here, release any bound resources, e.g.:
        if self.dataset is not None:
            self.dataset.close()
You may prefer implementing a class because your slice source is complex and you want
to split its logic into separate methods. You may also just prefer classes as a matter
of your personal taste. Another advantage of using a class is that you can pass
instances of it as slice items to the zappend
function without further configuration.
However, the intended use of a slice source is to configure it by specifying the
slice_source
setting. In a JSON or YAML configuration file it specifies the fully
qualified name of the slice source function or class:
{
"slice_source": "mymodule.MySliceSource"
}
If you use the zappend
function, you can pass the function or class directly:
zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
target_dir="target.zarr",
slice_source=MySliceSource)
If the slice source setting is used, each slice item passed to zappend
is passed as
argument(s) to your slice source.
- Slices passed to the zappend CLI command become slice source arguments of type str.
- Slice items passed to the zappend function via the slices argument can be of any type, but the tuple, list, and dict types have a special meaning (see the sketch below):
  - tuple: a pair of the form (args, kwargs), where args is a list or tuple of positional arguments and kwargs is a dictionary of keyword arguments;
  - list: positional arguments only;
  - dict: keyword arguments only;
  - any other type is interpreted as a single positional argument.
You can also pass extra keyword arguments to your slice source using the
slice_source_kwargs
setting. Keyword arguments passed as slice items take
precedence, that is, they overwrite arguments passed by slice_source_kwargs
.
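For example, a sketch in which every slice source invocation receives the same keyword argument; the bbox parameter and a slice source accepting it are hypothetical:
from zappend.api import zappend

zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
        target_dir="target.zarr",
        slice_source=MySliceSource,  # assumed to accept a `bbox` keyword
        slice_source_kwargs={"bbox": [10.0, 50.0, 12.0, 52.0]})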
If your slice source has many parameters that stay the same for all slices, you may
prefer providing parameters as configuration settings rather than function or class
arguments. This can be achieved using the extra
setting:
{
"extra": {
"quantiles": [0.1, 0.5, 0.9],
"use_threshold": true,
"filter": "gauss"
}
}
To access the settings in extra, your slice source function or class constructor
must define a special argument named ctx. It must be the first positional argument or
a keyword argument. The argument ctx is the current processing context of type
zappend.api.Context, which also contains the configuration. The settings in extra
can be accessed using the dictionary returned from ctx.config.extra.
Here is a more advanced example of a slice source that opens datasets from a given file path and averages the values along the time dimension:
import numpy as np
import xarray as xr

from zappend.api import Context
from zappend.api import SliceSource
from zappend.api import zappend


class MySliceSource(SliceSource):
    def __init__(self, ctx: Context, slice_path: str):
        self.quantiles = ctx.config.extra.get("quantiles", [0.5])
        self.slice_path = slice_path
        self.ds = None

    def get_dataset(self):
        self.ds = xr.open_dataset(self.slice_path)
        return self.get_agg_slice(self.ds)

    def close(self):
        if self.ds is not None:
            self.ds.close()

    def get_agg_slice(self, slice_ds: xr.Dataset) -> xr.Dataset:
        agg_slice_ds = slice_ds.quantile(self.quantiles, dim="time")
        # Re-introduce time dimension of size one
        agg_slice_ds = agg_slice_ds.expand_dims("time", axis=0)
        agg_slice_ds.coords["time"] = self.get_mean_time(slice_ds)
        return agg_slice_ds

    @classmethod
    def get_mean_time(cls, slice_ds: xr.Dataset) -> xr.DataArray:
        time = slice_ds.time
        t0 = time[0]
        dt = time[-1] - t0
        return xr.DataArray(np.array([t0 + dt / 2],
                                     dtype=slice_ds.time.dtype),
                            dims="time")


zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
        target_dir="target.zarr",
        slice_source=MySliceSource)
Profiling
Runtime profiling is very important for understanding program runtime behavior
and performance. The configuration setting profiling
can be used to
analyse and improve the runtime performance of zappend
itself as well as
the runtime performance of the computation of in-memory slices passed to the
zappend()
function.
To log the output of the profiling with level INFO
(see next section
Logging), you can use the value true
:
{
"profiling": true
}
If you would also like to see the output in a file, then set profiling
to the
desired local file path:
{
"profiling": "perf.out"
}
You can also set profiling
to an object that allows for fine-grained
control of the runtime logging:
{
"profiling": {
"log_level": "WARNING",
"path": "perf.out",
"keys": ["tottime", "time", "ncalls"]
}
}
Please refer to section profiling
in
the Configuration Reference for more details.
Logging
The zappend
logging output is configured using the logging
setting.
In the simplest case, if you just want logging output from zappend
to the console:
{
"logging": true
}
The above uses log level INFO
. If you want a different log level,
just provide its name:
{
"logging": "DEBUG"
}
If you also want logging output in a file or in a different format, or if you want to
see logging output of other Python modules, you can configure Python's logging
system following the logging dictionary schema.
Given here is an example that logs zappend
's output to the console using
the INFO
level (same as "logging": true
):
{
"logging": {
"version": 1,
"formatters": {
"normal": {
"format": "%(asctime)s %(levelname)s %(message)s",
"style": "%"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"formatter": "normal"
}
},
"loggers": {
"zappend": {
"level": "INFO",
"handlers": ["console"]
}
}
}
}
Using the loggers entry, you can configure the loggers of other Python modules, e.g.,
xarray or dask. The logger used by the zappend tool is named zappend.
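For example, a sketch that additionally routes Dask logging output to the same console handler; the chosen logger name and level are just an illustration:
{
  "logging": {
    "version": 1,
    "handlers": {
      "console": {
        "class": "logging.StreamHandler"
      }
    },
    "loggers": {
      "zappend": {
        "level": "INFO",
        "handlers": ["console"]
      },
      "dask": {
        "level": "WARNING",
        "handlers": ["console"]
      }
    }
  }
}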