How do I ...
... create datacubes from a directory of GeoTIFFs
Files in GeoTIFF format cannot be opened directly by zappend unless you add rioxarray to your Python environment. Then write your own slice source and use the configuration setting slice_source:
import glob

import numpy as np
import rioxarray as rxr
import xarray as xr

from zappend.api import zappend


def get_dataset_from_geotiff(tiff_path):
    ds = rxr.open_rasterio(tiff_path)
    # Add missing time dimension
    slice_time = get_slice_time(tiff_path)
    slice_ds = ds.expand_dims("time", axis=0)
    # Assign the time label as a 1-d coordinate along "time"
    slice_ds.coords["time"] = xr.DataArray(np.array([slice_time]), dims="time")
    try:
        yield slice_ds
    finally:
        # Release the file handle once the slice has been processed
        ds.close()


zappend(sorted(glob.glob("inputs/*.tif")),
        slice_source=get_dataset_from_geotiff,
        target_dir="output/tif-cube.zarr")
In the example above, function get_slice_time() returns the time label of a given GeoTIFF file as a value of type np.datetime64.
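The helper get_slice_time() is not part of zappend; you have to provide it yourself. A minimal sketch follows, assuming a hypothetical naming scheme in which each file name contains the acquisition date as YYYYMMDD (for example scene_20240115.tif). Adapt the pattern to your own files, or read the time from the GeoTIFF metadata instead.

```python
import re
from pathlib import Path

import numpy as np

# Hypothetical naming scheme: the file name contains the date as YYYYMMDD
_DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")


def get_slice_time(tiff_path: str) -> np.datetime64:
    """Extract the time label from the name of a GeoTIFF file."""
    match = _DATE_PATTERN.search(Path(tiff_path).stem)
    if match is None:
        raise ValueError(f"no date found in file name: {tiff_path}")
    year, month, day = match.groups()
    # Nanosecond precision matches what xarray uses for time coordinates
    return np.datetime64(f"{year}-{month}-{day}", "ns")
```

Because the dates sort lexicographically in this scheme, sorted(glob.glob(...)) also yields the files in chronological order.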
... create datacubes from datasets without append dimension
zappend expects the append dimension to exist in every slice dataset and at least one variable to make use of that dimension.
For example, if you are appending spatial 2-d images with dimensions x and y along a dimension time, you first need to expand the images into the time dimension. Here the 2-d image dataset is called image_ds and slice_time is its associated time value of type np.datetime64.
slice_ds = image_ds.expand_dims("time", axis=0)
slice_ds.coords["time"] = xr.DataArray(np.array([slice_time]), dims="time")
See also How do I create datacubes from a directory of GeoTIFFs above.
... dynamically update global metadata attributes
Refer to the section about target attributes in the user guide.
... find out what is limiting the performance
Use the logging configuration to see which processing steps take most of the time. Use the profiling configuration to inspect in more detail which parts of the processing are the bottlenecks.
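As a starting point, the sketch below builds a configuration that turns on verbose console logging together with profiling. It assumes that the logging setting accepts a log level name as a shortcut and that the profiling setting accepts an object with the keys enabled, path and keys; check both against the configuration reference before relying on them. The report file name is hypothetical.

```python
import json

# Assumption: "logging" accepts a log level name as a shortcut and
# "profiling" accepts an object with "enabled", "path" and "keys".
# Verify both against the zappend configuration reference.
perf_config = {
    "target_dir": "output/tif-cube.zarr",
    "logging": "DEBUG",
    "profiling": {
        "enabled": True,
        "path": "zappend-profile.txt",  # hypothetical report file
        "keys": ["tottime"],  # sort the report by time spent per function
    },
}

# Serialize the settings, e.g. to store them in a JSON configuration file
config_text = json.dumps(perf_config, indent=2)
print(config_text)
```

Keeping these settings in a separate configuration makes it easy to switch them off again once the bottleneck is found, since profiling itself adds overhead.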
... write a log file
Use the following logging configuration:
{
"logging": {
"version": 1,
"formatters": {
"normal": {
"format": "%(asctime)s %(levelname)s %(message)s",
"style": "%"
}
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"formatter": "normal"
},
"file": {
"class": "logging.FileHandler",
"formatter": "normal",
"filename": "zappend.log",
"mode": "w",
"encoding": "utf-8"
}
},
"loggers": {
"zappend": {
"level": "INFO",
"handlers": ["console", "file"]
}
}
}
}
... address common errors
Error Target parent directory does not exist
For security reasons, zappend does not create target directories automatically. You should make sure the parent directory exists before calling zappend.
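For a target on the local filesystem, the parent directory can be created up front with the standard library. The helper below is a minimal sketch; ensure_target_parent is a hypothetical name, not part of zappend, and the approach does not apply to remote targets such as object storage URIs.

```python
from pathlib import Path


def ensure_target_parent(target_dir: str) -> Path:
    """Create the parent directory of a local target path if it is missing."""
    parent = Path(target_dir).resolve().parent
    # parents=True also creates intermediate directories;
    # exist_ok=True makes the call safe to repeat
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

Calling ensure_target_parent("output/tif-cube.zarr") before zappend creates output/ if needed, but leaves the target dataset itself untouched.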
Error Target is locked
In this case the target lock file still exists, which means that a former rollback did not complete successfully. You can no longer trust the integrity of any existing target dataset. The recommended remedy is to remove the lock file and any remaining target dataset artifacts. You can do that manually or use the configuration setting force_new.
Error Append dimension 'foo' not found in dataset
Refer to How do I create datacubes from datasets without append dimension.