Changes between Initial Version and Version 1 of Conda_GDAL_lite


Timestamp:
08/03/24 21:55:55 (3 months ago)
Author:
darkblueb
Introducing Lightweight Versions of GDAL and PDAL
Quansight
· Jul 25, 2024

See how Hobu teamed with Quansight to fund the transition to a deferred plugin system in both GDAL and PDAL. The new architecture was implemented in GDAL 3.9.1 and PDAL 2.7.2.

This article was originally published on the Quansight Blog by Isuru Fernando.

Geospatial data processing has taken a significant step forward with the introduction of lightweight versions of the Geospatial Data Abstraction Library (GDAL) and the Point Data Abstraction Library (PDAL). The new architecture addresses the long-standing issue of dependency bloat, improving solve times, download speeds, and overall package manageability for users. This post covers the history, technical implementation, and benefits of this transition.

GDAL (Geospatial Data Abstraction Library) is a translator library for raster and vector geospatial data formats. Because it supports many different geospatial data formats, it has a lot of libraries as dependencies; for example, the `hdf5` library provides HDF5 format support. PDAL (Point Data Abstraction Library) is a library built on top of GDAL and has similarly broad format support.

A Little Bit of History
conda-forge was started by a few people, including a couple of oceanographers, who wanted an easy way to distribute gdal. Hence `gdal-feedstock` was one of the first feedstocks created on conda-forge; it was the 49th PR on staged-recipes. The initial commit to `gdal-feedstock`, which builds the conda package, used only a few packages, including `hdf4`, `hdf5`, `postgresql`, `libnetcdf`, and `kealib`.

Since then, more dependencies have been added to the gdal conda package, and it has now grown to 113 direct and indirect dependencies (numbers based on macOS, July 2024). With so many dependencies, solve times and download times have increased, and images created from these conda packages are unwieldy.

This is where the partnership between Hobu and Quansight comes in: it funded the transition to a deferred plugin system in both GDAL and PDAL. The new architecture was implemented in GDAL 3.9.1 and PDAL 2.7.2.

Deferred C++ plugin loading
GDAL RFC 96 adds support for deferred plugins. Plugins in GDAL provide support for the various raster and vector geospatial data formats. They are usually built into the core library, `libgdal.(dylib/so/dll)`, but RFC 96 introduced deferred plugins, which are built separately so that only the dependencies of the plugins actually in use are needed.

For example, instead of HDF5 being a dependency of `libgdal.(dylib/so/dll)`, there’s a new `gdal_HDF5.(dylib/so/dll)` which has the HDF5 dependency and is loaded by the libgdal core library.
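The general idea of deferred loading can be illustrated outside of C++ as well. The following is a minimal Python sketch of the technique, not GDAL's actual implementation: a registry records which module provides each driver and imports it only when the driver is first requested. The `DeferredRegistry` class and the driver names are hypothetical, and standard-library modules stand in for plugin libraries.

```python
import importlib


class DeferredRegistry:
    """Toy registry that imports a plugin module only on first use."""

    def __init__(self):
        self._declared = {}  # driver name -> module name (not yet imported)
        self._loaded = {}    # driver name -> imported module

    def declare(self, driver, module_name):
        # Registration is cheap: nothing is imported at this point.
        self._declared[driver] = module_name

    def is_loaded(self, driver):
        return driver in self._loaded

    def get(self, driver):
        # Import lazily and cache the result for later calls.
        if driver not in self._loaded:
            module_name = self._declared[driver]
            try:
                self._loaded[driver] = importlib.import_module(module_name)
            except ImportError as exc:
                raise RuntimeError(
                    f"driver {driver!r} is declared but its plugin "
                    f"{module_name!r} is not installed"
                ) from exc
        return self._loaded[driver]


registry = DeferredRegistry()
# Hypothetical driver names; stdlib modules play the role of plugins.
registry.declare("JSON", "json")
registry.declare("MISSING", "no_such_plugin_module_xyz")

assert not registry.is_loaded("JSON")  # declared, but not imported yet
json_plugin = registry.get("JSON")     # first use triggers the import
assert registry.is_loaded("JSON")
```

Requesting the `MISSING` driver raises an error only at the moment it is needed, which is the property that lets a core library ship without every plugin's dependencies.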

This allows us to package the plugins as separate conda packages, so the core library can remain small while the full functionality of GDAL stays available through these plugins. A nice feature of RFC 96 is that the core libgdal library outputs a customizable error message when a plugin fails to load. For example, when the hdf5 plugin is in a separate package called `libgdal-hdf5`, we can introduce an error message that says:

You may install it with ‘conda install -c conda-forge libgdal-hdf5’.
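As a rough illustration of how a packager might template such a hint per plugin package, here is a small Python sketch. The `install_hint` helper is hypothetical; GDAL produces its message in the C++ library, and this only mirrors the wording quoted above with the channel left as a parameter.

```python
def install_hint(plugin_package, channel="conda-forge"):
    """Build an install hint for a plugin that failed to load.

    Hypothetical helper: it only reproduces the wording of the message
    quoted in the article, it is not part of GDAL.
    """
    return f"You may install it with 'conda install -c {channel} {plugin_package}'."


print(install_hint("libgdal-hdf5"))
```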
The separate-package approach was first used for the `libarrow`/`libparquet` dependency, since it is a large dependency and especially because gdal supports four different major versions of it on conda-forge. By separating this dependency, only the plugin needs to be built for the four different arrow/parquet versions, rather than the core libgdal library. The conda package for the plugin was called `libgdal-arrow-parquet`; it depended on the core library conda package `libgdal`, which included the rest of the plugins.

libgdal-core and libgdal
To generalize the above strategy to more plugins, we are now introducing a `libgdal-core` conda package and more plugins as conda packages, with all plugins (except arrow/parquet) being installable with `libgdal`. We also made the Python bindings depend on `libgdal-core` instead of `libgdal` so that users can select the plugins that they need.

gdal conda packages

– `libgdal-core`: core C++ library
– `libgdal`: core C++ library and all plugins
– `gdal`: Python library without the plugins

gdal plugin conda packages

– `libgdal-arrow-parquet`: `vector.arrow` and `vector.parquet` drivers as a plugin
– `libgdal-fits`: `raster.fits` driver as a plugin
– `libgdal-grib`: `raster.grib` driver as a plugin
– `libgdal-hdf4`: `raster.hdf4` driver as a plugin
– `libgdal-hdf5`: `raster.hdf5` driver as a plugin
– `libgdal-jp2openjpeg`: `raster.jp2openjpeg` driver as a plugin
– `libgdal-kea`: `raster.kea` driver as a plugin
– `libgdal-netcdf`: `raster.netcdf` driver as a plugin
– `libgdal-pdf`: `raster.pdf` driver as a plugin
– `libgdal-postgisraster`: `raster.postgisraster` driver as a plugin
– `libgdal-pg`: `vector.pg` driver as a plugin
– `libgdal-tiledb`: `raster.tiledb` driver as a plugin
– `libgdal-xls`: `vector.xls` driver as a plugin

`libgdal` has 113 direct/indirect dependencies, while `libgdal-core` has only 48.
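As a quick check of what those two figures imply, the arithmetic below uses only the dependency counts quoted above.

```python
libgdal_deps = 113      # direct/indirect dependencies of libgdal
libgdal_core_deps = 48  # direct/indirect dependencies of libgdal-core

fewer = libgdal_deps - libgdal_core_deps
reduction = fewer / libgdal_deps

print(f"{fewer} fewer dependencies ({reduction:.1%} reduction)")
# prints: 65 fewer dependencies (57.5% reduction)
```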

If you are missing plugins with the new split, you can install all the plugins by running:

conda install libgdal

Or install individual plugins:

conda install libgdal-hdf5

Currently only the Python bindings `gdal` depend on `libgdal-core`; in the future, more and more downstream packages of `libgdal` will depend on `libgdal-core` and the individual plugins they need. Therefore we recommend either installing `libgdal` or explicitly installing the individual plugins.
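One way to see which plugin packages an environment already has is to inspect the output of `conda list --json`, which prints a JSON array with one object per installed package. The following is a minimal sketch, assuming each object carries a `name` field; the `missing_gdal_plugins` helper and the sample data are hypothetical, and the package names are the ones listed in this article.

```python
import json

# Plugin packages named in this article.
GDAL_PLUGIN_PACKAGES = [
    "libgdal-arrow-parquet", "libgdal-fits", "libgdal-grib", "libgdal-hdf4",
    "libgdal-hdf5", "libgdal-jp2openjpeg", "libgdal-kea", "libgdal-netcdf",
    "libgdal-pdf", "libgdal-postgisraster", "libgdal-pg", "libgdal-tiledb",
    "libgdal-xls",
]


def missing_gdal_plugins(conda_list_json):
    """Return the plugin packages absent from `conda list --json` output."""
    installed = {pkg["name"] for pkg in json.loads(conda_list_json)}
    return [name for name in GDAL_PLUGIN_PACKAGES if name not in installed]


# Hypothetical sample standing in for real `conda list --json` output.
sample = json.dumps([
    {"name": "gdal", "version": "3.9.1"},
    {"name": "libgdal-core", "version": "3.9.1"},
    {"name": "libgdal-hdf5", "version": "3.9.1"},
])

missing = missing_gdal_plugins(sample)
print("conda install " + " ".join(missing))
```

In a real environment, the input string could come from `subprocess.run(["conda", "list", "--json"], capture_output=True, text=True).stdout`.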

We looked at the install times for `libgdal` vs `libgdal-core` on GitHub Actions, and `libgdal-core` was faster. We also noticed that `libboost-headers`, which is only needed for development, was being pulled in by `libkml`. We split the `libkml` conda package into `libkml` and `libkml-devel` so that end users do not end up with `libboost-headers`, which contains thousands of header files.

Note that the timings are from quick testing on GitHub Actions, not formal benchmarking.

libpdal and libpdal-core
Similar to `libgdal` and `libgdal-core`, we have introduced `libpdal` and `libpdal-core` conda packages. Previously the `pdal` conda package provided only the C++ library, but now it also provides the Python package, to match the `gdal` conda package.

pdal conda packages

– `libpdal-core`: core C++ library
– `libpdal`: core C++ library and all plugins
– `pdal-python`: Python library without the plugins
– `pdal`: Python library and all plugins

pdal plugin conda packages

– `libpdal-trajectory`: `filters.trajectory` driver as a plugin
– `libpdal-hdf`: `readers.hdf` driver as a plugin
– `libpdal-tiledb`: `readers.tiledb` and `writers.tiledb` drivers as a plugin
– `libpdal-pgpointcloud`: `readers.pgpointcloud` driver as a plugin
– `libpdal-draco`: `readers.draco` and `writers.draco` drivers as a plugin
– `libpdal-arrow`: `readers.arrow` and `writers.arrow` drivers as a plugin
– `libpdal-nitf`: `readers.nitf` driver as a plugin
– `libpdal-e57`: `readers.e57` and `writers.e57` drivers as a plugin
– `libpdal-icebridge`: `readers.icebridge` driver as a plugin
– `libpdal-cpd`: `filters.cpd` driver as a plugin
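Because PDAL pipelines name their stages explicitly, the list above can be used to work out which plugin packages a given pipeline needs. The sketch below assumes a pipeline expressed as a list of stage dictionaries with a `type` key; the `plugin_packages_for` helper, the file names, and the stage options are hypothetical, and stages given only as bare filenames are skipped because their type is not spelled out.

```python
# Stage-to-package mapping taken from the list above.
PDAL_STAGE_PACKAGES = {
    "filters.trajectory": "libpdal-trajectory",
    "readers.hdf": "libpdal-hdf",
    "readers.tiledb": "libpdal-tiledb",
    "writers.tiledb": "libpdal-tiledb",
    "readers.pgpointcloud": "libpdal-pgpointcloud",
    "readers.draco": "libpdal-draco",
    "writers.draco": "libpdal-draco",
    "readers.arrow": "libpdal-arrow",
    "writers.arrow": "libpdal-arrow",
    "readers.nitf": "libpdal-nitf",
    "readers.e57": "libpdal-e57",
    "writers.e57": "libpdal-e57",
    "readers.icebridge": "libpdal-icebridge",
    "filters.cpd": "libpdal-cpd",
}


def plugin_packages_for(pipeline):
    """Return the sorted plugin packages needed by explicitly typed stages."""
    needed = set()
    for stage in pipeline:
        # Stages given as bare filenames have no explicit type; skip them.
        if isinstance(stage, dict) and stage.get("type") in PDAL_STAGE_PACKAGES:
            needed.add(PDAL_STAGE_PACKAGES[stage["type"]])
    return sorted(needed)


# Hypothetical pipeline; option names are illustrative only.
pipeline = [
    {"type": "readers.e57", "filename": "scan.e57"},
    {"type": "filters.sample", "radius": 0.5},
    {"type": "writers.tiledb", "array_name": "scan_array"},
]

print(plugin_packages_for(pipeline))
# prints: ['libpdal-e57', 'libpdal-tiledb']
```

Stages that are not in the mapping, such as `filters.sample` here, are treated as not needing a separate plugin package.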

The shift to a deferred plugin system in GDAL and PDAL is a pivotal moment in geospatial data processing, offering a more efficient and streamlined approach to handling dependencies. With core libraries and plugins separated, users can now enjoy faster installation times and a more manageable set of dependencies tailored to their specific needs. The collaboration between Hobu and Quansight has not only modernized these essential libraries but has also set a new standard for the development and distribution of geospatial tools.

Acknowledgements
This work was funded by Hobu, Inc. in collaboration with Quansight, Inc.