stac-vrt is lightly maintained these days, and its use case is now better filled by other libraries:
stac-vrt can be installed from conda-forge:
conda install -c conda-forge stac-vrt
or from PyPI:
pip install stac-vrt
>>> import stac_vrt
>>> import requests
>>> stac_items = requests.get(
...     "http://pct-mqe-staging.westeurope.cloudapp.azure.com/collections/usda-naip/items"
... ).json()["features"]
These STAC items contain essentially all of the information needed to build a VRT.
>>> vrt = stac_vrt.build_vrt(stac_items, data_type="Byte", block_width=512, block_height=512)
The vrt variable is just a Python string that’s a valid VRT (an XML document). It can
be written to disk, or passed directly to rasterio or rioxarray. This example uses
rioxarray, and also requires Dask to read the chunks in parallel.
>>> import rioxarray
>>> ds = rioxarray.open_rasterio(vrt, chunks=(4, -1, "auto"))
>>> ds
<xarray.DataArray (band: 4, y: 11588, x: 20704)>
dask.array<open_rasterio-a61f0d99384a83218d8164684d89e2db<this-array>, shape=(4, 11588, 20704), dtype=uint8, chunksize=(1, 11520, 11520), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3 4
  * y            (y) float64 2.986e+06 2.986e+06 2.986e+06 ... 2.98e+06 2.98e+06
  * x            (x) float64 5.248e+05 5.248e+05 ... 5.372e+05 5.372e+05
    spatial_ref  int64 0
Attributes:
    scale_factor:  1.0
    add_offset:    0.0
    grid_mapping:  spatial_ref
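Because the VRT is an ordinary string, writing it to disk needs nothing beyond the standard library. The following is a minimal sketch: save_vrt is a hypothetical helper (not part of stac-vrt), and example_vrt is a placeholder standing in for the string returned by stac_vrt.build_vrt.

```python
import tempfile
from pathlib import Path


def save_vrt(vrt_xml: str, path) -> Path:
    """Write a VRT XML string to `path` and return the path (hypothetical helper)."""
    path = Path(path)
    path.write_text(vrt_xml, encoding="utf-8")
    return path


# Placeholder content; in practice this would be the output of stac_vrt.build_vrt.
example_vrt = '<VRTDataset rasterXSize="1" rasterYSize="1"></VRTDataset>'

# Written to the system temp directory here; any writable location works.
out = save_vrt(example_vrt, Path(tempfile.gettempdir()) / "mosaic.vrt")
```

A real VRT saved this way can then be opened by its file path instead of by passing the XML string.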
VRTs are a pretty cool concept in GDAL. The basic idea is to make a document that’s essentially just metadata; it points to other documents or URLs for the actual data. They’re extremely useful for creating a mosaic of many images: the VRT just has information like “this sub-dataset goes at position (x, y) in the full dataset”.
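To make the "metadata only" idea concrete, here is a rough sketch that assembles a tiny single-band VRT by hand. The element names (VRTDataset, VRTRasterBand, SimpleSource, SrcRect, DstRect) come from GDAL's VRT format; the sketch_vrt function and the example.com URLs are made up for illustration, and the georeferencing elements (SRS, GeoTransform) that a real mosaic carries are left out for brevity.

```python
import xml.etree.ElementTree as ET


def sketch_vrt(width, height, sources):
    """Build a single-band VRT document as a string.

    `sources` is a list of (url, x_off, y_off, x_size, y_size) tuples giving
    each image's pixel position and size within the full mosaic.
    """
    root = ET.Element("VRTDataset", rasterXSize=str(width), rasterYSize=str(height))
    band = ET.SubElement(root, "VRTRasterBand", dataType="Byte", band="1")
    for url, x_off, y_off, x_size, y_size in sources:
        src = ET.SubElement(band, "SimpleSource")
        ET.SubElement(src, "SourceFilename", relativeToVRT="0").text = url
        ET.SubElement(src, "SourceBand").text = "1"
        # SrcRect: which part of the source image to read (here, all of it).
        ET.SubElement(src, "SrcRect", xOff="0", yOff="0",
                      xSize=str(x_size), ySize=str(y_size))
        # DstRect: where that part lands in the full dataset.
        ET.SubElement(src, "DstRect", xOff=str(x_off), yOff=str(y_off),
                      xSize=str(x_size), ySize=str(y_size))
    return ET.tostring(root, encoding="unicode")


# Two hypothetical 100x100 images placed side by side in a 200x100 mosaic.
xml_text = sketch_vrt(200, 100, [
    ("/vsicurl/https://example.com/a.tif", 0, 0, 100, 100),
    ("/vsicurl/https://example.com/b.tif", 100, 0, 100, 100),
])
```

Note that no pixel data is touched: the document only records where each source lives and where it belongs in the mosaic.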
VRTs pair extremely nicely with Dask-backed xarray DataArrays: you build up a mosaic of a whole bunch of images that just involves reading some metadata and doing some geospatial reprojections. No actual data is read. Then you can (lazily) read the actual data into an xarray DataArray for your analysis, and the separate original images can be read into separate chunks.
One downside to (large) VRTs is that they can be time-consuming to build. You’d need to make at least one HTTP request for each file going into the VRT to read the metadata (things like the CRS, shape, and transformation).
When you’re using STAC to discover your assets, you already have all of that information available, so stac-vrt is able to build the VRT without any additional network requests. An informal benchmark on a set of 500 images stored in Azure Blob Storage showed that gdal.BuildVRT took about 90 seconds, while stac_vrt.build_vrt took a handful of milliseconds.
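As an illustration of why no network requests are needed, the sketch below places images in a mosaic using only STAC metadata. It is not stac-vrt's actual implementation. It assumes each item carries the projection extension fields proj:transform (affine coefficients in a, b, c, d, e, f order) and proj:shape (height, width) in its properties, and that all images are north-up and share a CRS and pixel size. The function names and the two sample items are hypothetical.

```python
def mosaic_transform(items):
    """Affine transform [a, b, c, d, e, f] of a grid covering every item.

    Assumes north-up images (e < 0) with a shared CRS and pixel size.
    """
    transforms = [item["properties"]["proj:transform"] for item in items]
    a, e = transforms[0][0], transforms[0][4]
    x_min = min(t[2] for t in transforms)  # left-most x origin
    y_max = max(t[5] for t in transforms)  # top-most y origin
    return [a, 0.0, x_min, 0.0, e, y_max]


def pixel_window(item, mosaic):
    """Return (x_off, y_off, width, height) of an item in mosaic pixels."""
    t = item["properties"]["proj:transform"]
    height, width = item["properties"]["proj:shape"]
    x_off = round((t[2] - mosaic[2]) / mosaic[0])
    y_off = round((t[5] - mosaic[5]) / mosaic[4])
    return x_off, y_off, width, height


# Two made-up items with 1-unit pixels; the second sits 300 px right and 50 px down.
items = [
    {"properties": {"proj:transform": [1.0, 0.0, 500000.0, 0.0, -1.0, 3000000.0],
                    "proj:shape": [200, 300]}},
    {"properties": {"proj:transform": [1.0, 0.0, 500300.0, 0.0, -1.0, 2999950.0],
                    "proj:shape": [200, 300]}},
]
mosaic = mosaic_transform(items)
windows = [pixel_window(item, mosaic) for item in items]
```

Everything here is arithmetic on values already present in the STAC response, which is consistent with the millisecond-scale timing reported above.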