stac-vrt¶
Note
stac-vrt is lightly maintained these days, and its use case is now better filled by other libraries:
GDAL now natively supports STAC items: See https://gdal.org/drivers/raster/stacit.html
stackstac provides a nicer way to stack STAC items into a DataArray
stac-vrt
is a small library for quickly generating a GDAL VRT from a collection
of STAC items. This makes it fast and easy to generate a mosaic of many
raster images.
Installation¶
stac-vrt
can be installed from conda-forge
conda install -c conda-forge stac-vrt
or from PyPI
pip install stac-vrt
Usage¶
stac_vrt.build_vrt()
is the primary function to use. You provide it a list of STAC items:
>>> import stac_vrt
>>> import requests
>>> stac_items = requests.get(
... "http://pct-mqe-staging.westeurope.cloudapp.azure.com/collections/usda-naip/items"
... ).json()["features"]
These STAC items contain essentially all of the information needed to build a VRT.
>>> vrt = stac_vrt.build_vrt(stac_items, data_type="Byte", block_width=512, block_height=512)
The vrt
variable is just a Python string that’s a valid VRT (an XML document). It can
be written to disk, or passed directly to rasterio or rioxarray (this example uses rioxarray
and also requires Dask to read the chunks in parallel).
>>> import rioxarray
>>> ds = rioxarray.open_rasterio(vrt, chunks=(4, -1, "auto"))
>>> ds
<xarray.DataArray (band: 4, y: 11588, x: 20704)>
dask.array<open_rasterio-a61f0d99384a83218d8164684d89e2db<this-array>, shape=(4, 11588, 20704), dtype=uint8, chunksize=(1, 11520, 11520), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3 4
* y (y) float64 2.986e+06 2.986e+06 2.986e+06 ... 2.98e+06 2.98e+06
* x (x) float64 5.248e+05 5.248e+05 ... 5.372e+05 5.372e+05
spatial_ref int64 0
Attributes:
scale_factor: 1.0
add_offset: 0.0
grid_mapping: spatial_ref
Background¶
VRTs are a pretty cool concept in GDAL. The basic idea is to make document that’s essentially just metadata; it points to other documents or URLs for the actual data. They’re extremely useful for creating a mosiac of many images: the VRT just has information like “this sub-dataset goes at position (x, y)
in the full dataset”.
VRTs pair extremely nicely with Dask-backed xarray DataArrays: you build up a mosaic of a whole bunch of images that just involves reading some metadata and doing some geospatial reprojections. No actual data is read. Then you can (lazily) read the actual data into an xarray DataArray for your analysis, and the separate original images can be read into separate chunks.
One downside to (large) VRTs is that they can be time-consuming to build. You’d need to make at least one HTTP requests for each file going into the VRT to read the metadata (things like the CRS, shape, and transformation).
When you’re using STAC to discover your assets, you already have all of that information avaiable. And so stac-vrt
is able to build the VRT without any additional network requests. An informal benchmark on a set of 500 images stored in Azure Blob Storage showed that gdal.BuildVRT
took about 90 seconds, while stac-vrt.build_vrt
took a handful of milliseconds.
Building stac-vrt compatible STAC Items¶
If you’re responsible for creating STAC items, stac-vrt
would appreciate if you include
These are used by stac-vrt
to build the VRT.