Version 26 (modified by normanb, 16 years ago)

RFC 24: GDAL Progressive Data Support

Author: Norman Barker
Contact: nbarker@…
Status: Development

Summary

This RFC proposes an interface that adds data streaming support to GDAL by overloading the RasterIO function to accept a callback that is invoked when the buffer is updated. The RFC focuses on JPIP, but the interface should be generic enough to apply to other streaming and progressive formats.

Definitions

JPIP: JPEG 2000 Interactive Protocol

Objective

The goal is to provide a callback function that lets users of a progressive format driver receive notifications when the underlying dataset is updated for a particular requested region of the data. The notification mechanism should be accessible from all the SWIG wrappers.

Implementation

The implementation is the definition of an interface, in particular an overloaded function definition for RasterIO. Concrete implementations of this interface will follow. Currently the most convenient developer library for JPIP streaming is Kakadu; however, since GDAL is also a developer library, only stubs can be distributed in order to conform to Kakadu licensing (as with JP2KAK). Commercial vendors are also interested in using GDAL for streaming support, and a standard interface will allow their commercial plugins (e.g. ECW, MrSID) to be incorporated.

Discussion

Tamas Szekeres - thread on gmane

Question:

It seems we are tending to spawn new threads in every RasterIO operation at the driver level, which is quite inconvenient at the moment, so this should be considered with care. Will you allow RasterIO to be re-entered from the GDALProgFunc event handler or by another thread? Will you provide a copy of pBuff in GDALProgFunc, or will the same pointer be passed back to the caller? How will this buffer be protected from simultaneous access by multiple threads?

Response:

There will be one thread at the RasterIO driver level communicating with the server, and it will be separate from the application thread. The driver will accept new window requests from the application and change the requests being made to the server within that thread. A degree of synchronization is required here, but it is hidden within the format driver.

The callback function and the raster display will be in the main application thread. It will not be possible to update the format driver window from the callback function, but it will be possible using a normal RasterIO read.

pBuff will be small: it contains only the data for the currently requested window at a particular resolution level. Window size and resolution are normally inversely related, so a small window implies a higher resolution and a large window implies a lower resolution. As such, an initial version will provide a copy of the buffer rather than a pointer, which should protect the buffer from corruption by multiple threads.
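The threading arrangement described in this response can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyProgressiveDriver, its methods, and the simulated refinement passes are hypothetical and are not part of GDAL or of this RFC. One worker thread stands in for the server-communication thread, and the callback is invoked in the application thread with a copy of the buffer.

```python
import queue
import threading


class ToyProgressiveDriver:
    """Hypothetical stand-in for a progressive format driver (not GDAL API)."""

    def __init__(self, passes):
        self._passes = passes            # simulated refinement passes from a server
        self._updates = queue.Queue()    # hand-off between driver and app threads
        self._worker = None

    def start(self):
        # One worker thread plays the role of the server-communication thread.
        self._worker = threading.Thread(target=self._fetch, daemon=True)
        self._worker.start()

    def _fetch(self):
        for data in self._passes:
            # Post a copy so the application never shares the driver's buffer.
            self._updates.put(bytes(data))
        self._updates.put(None)          # sentinel: no more updates

    def dispatch(self, callback):
        # Runs in the caller's (application) thread and invokes the callback there.
        while True:
            buf = self._updates.get()
            if buf is None:
                break
            callback(buf)
        self._worker.join()


app_thread = threading.get_ident()
received = []

driver = ToyProgressiveDriver([bytearray(b"\x01\x01"), bytearray(b"\x02\x02")])
driver.start()
driver.dispatch(lambda buf: received.append((threading.get_ident(), buf)))
```

Because the queue is the only shared object, the locking stays inside the driver, which matches the intent that synchronization is hidden from the application.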

Question:

How will the intermediate data be represented in the buffer required by the various rendering methods? Will the user re-read the whole buffer on every round trip, or will a subset of the data that has changed in the meantime be defined? I guess in some cases reading only the modified scanlines or reduced-resolution images would be sufficient.

Response:

In step 5 of the proposed sequence of events, GDALProgFunc(xOff, yOff, xSiz, ySiz, pBuf, bufXSiz, bufYSiz, bufType, nBandCount, bandMap, nPixelSpace, nLineSpace, nBandSpace) specifies the region of the image (at base resolution) that has changed and the dimensions and type of the buffer, which is everything needed to render the buffer data.
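As a rough illustration of how a renderer might use those arguments, the sketch below pastes one update into a display canvas. It assumes a single 8-bit band, a row-major canvas held at the same reduced resolution as the buffer, and integer scale factors. The Update class and paint function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Update:
    # Hypothetical container; the fields mirror a subset of the callback arguments.
    x_off: int        # changed region, in base-resolution pixels
    y_off: int
    x_size: int
    y_size: int
    buf: bytes        # single-band, 8-bit, row-major pixel data
    buf_x_size: int   # buffer dimensions, in buffer pixels
    buf_y_size: int


def paint(canvas, canvas_width, upd):
    """Copy an update into a canvas held at the buffer's resolution."""
    # Base-resolution pixels covered by one buffer pixel along each axis.
    sx = upd.x_size // upd.buf_x_size
    sy = upd.y_size // upd.buf_y_size
    # Top-left corner of the update in canvas (reduced-resolution) pixels.
    dx = upd.x_off // sx
    dy = upd.y_off // sy
    for row in range(upd.buf_y_size):
        src = upd.buf[row * upd.buf_x_size:(row + 1) * upd.buf_x_size]
        start = (dy + row) * canvas_width + dx
        canvas[start:start + upd.buf_x_size] = src


# An 8x8 base image shown on a 4x4 canvas (one canvas pixel per 2x2 base pixels).
canvas = bytearray(16)
paint(canvas, 4, Update(x_off=4, y_off=2, x_size=4, y_size=4,
                        buf=bytes([1, 2, 3, 4]), buf_x_size=2, buf_y_size=2))
```

Only the rows and columns covered by the update are touched, so the application does not need to re-read the whole window on each notification.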

Question:

Interchange between the dataset and band level functions:

Response:

This is the big question, since within JPIP (and other streaming protocols) this is handled at the dataset level as opposed to the band level. Within JPIP the returned bands are specified by sending &comps=x1,x2,x3 ... in the request to the server; it would not be desirable to make three separate requests to the server to build an RGB image (though you could). As such, I recommend that band-level access subscribes to the parent dataset for accessing the server, and that the band data is pulled from the dataset class. This is potentially the hardest part to implement and will be deferred until the dataset access level is complete.
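The recommended relationship between bands and the parent dataset can be sketched as follows. Everything here is a toy: ToyDataset, ToyBand, and fake_server are hypothetical, the component indices are illustrative, and no real JPIP request is made. The point is only that a single dataset-level request serves all the bands.

```python
class ToyDataset:
    """Hypothetical dataset that owns all communication with the server."""

    def __init__(self, fetch):
        self._fetch = fetch       # hypothetical callable: query string -> {component: bytes}
        self._components = {}
        self.requests = []        # record of queries sent, for inspection

    def request(self, comps):
        # One request names every component needed, as with comps=x1,x2,x3.
        query = "comps=" + ",".join(str(c) for c in comps)
        self.requests.append(query)
        self._components.update(self._fetch(query))

    def component(self, index):
        return self._components[index]

    def band(self, index):
        return ToyBand(self, index)


class ToyBand:
    """Hypothetical band that pulls its data from the parent dataset."""

    def __init__(self, dataset, index):
        self._dataset = dataset
        self._index = index

    def read(self):
        return self._dataset.component(self._index)


def fake_server(query):
    # Stand-in for a server: returns four bytes per requested component.
    comps = [int(c) for c in query.split("=", 1)[1].split(",")]
    return {c: bytes([c]) * 4 for c in comps}


ds = ToyDataset(fake_server)
ds.request([0, 1, 2])
rgb = [ds.band(i).read() for i in (0, 1, 2)]
```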

Question:

Supporting the current synchronous mode in addition to the asynchronous rendering mode.

Response:

Can you point me to some more information here? Are you discussing the asynchronous and synchronous JPIP streams as per the JPIP spec? If so, we would only support the synchronous mode of JPIP communication, as it is the most commonly deployed.

Question:

With regard to the SWIG / C API, would this be supported by adding a new function name to the API, like GDALBeginRasterIO or GDALAsyncRasterIO for instance?

Response:

I am thinking of SWIG directors, so that the overloaded RasterIO function accepts a function from the calling language.

Question:

Would this driver be an extension of or a replacement for the existing JP2KAK implementation? I can see some kind of JPIP support inside it as well; however, I'm not sure whether it is fully functional.

Response:

Kakadu does include support for JPIP on the client (and indeed a demonstration server), so Kakadu would be a good start for implementing this driver. I would suggest a new driver called JPIPKAK just to avoid any conflict with JP2KAK. There are issues with the communication layer within Kakadu, but this is relatively simple to code around using an HTTP library such as cURL.

Even Rouault - thread on gmane

Question:

I would like the dataset object to be added as the first argument of the callback, and a void* user_data to be added as the last argument.

Response:

Agreed.
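A minimal sketch of the agreed callback shape, with the dataset first and opaque user data last, might look like this. The function names, the reduced argument list, and the string used in place of a dataset object are all assumptions made for illustration.

```python
def on_update(dataset, x_off, y_off, x_size, y_size, buf, user_data):
    # The dataset identifies the source; user_data carries application state
    # so the callback does not need globals.
    user_data["updates"].append((dataset, x_off, y_off, x_size, y_size, len(buf)))


def notify(dataset, callback, user_data, region, buf):
    # Hypothetical driver-side helper: dataset goes first, user_data goes last.
    callback(dataset, *region, buf, user_data)


state = {"updates": []}
notify("toy-dataset", on_update, state, (0, 0, 256, 256), b"\x00" * 16)
```

Passing the dataset lets one callback serve several open datasets, and user_data plays the same role as the void* argument in C callbacks.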

Question:

Is the extended version of the RasterIO() call still blocking, like the current version?

Response:

This version of the RasterIO call is not blocking. The progress function is a callback that will be called when the buffer has new data. Effectively, we are adding an observer (listener) to the format driver when requesting a particular window of data.

Question:

What happens if the user specifies a non-NULL argument as the output buffer (in (1))? What happens if the user specifies GF_Write? This is probably an argument for a name change, to something like ProgressiveRasterIO.

Response:

I agree with the name change for clarification. There is an argument for dropping the buffer and GF_Read from the function call if we are giving it a new name and not just overloading RasterIO.

Question:

Maybe it would make sense to add some way of cancelling the whole RasterIO call by providing a callback, as the standard progress callback mechanism (GDALProgressFunc in gdal.h) does? RasterIO() will spend most of its time waiting for data, so it could resume from time to time to call that callback and see whether the user still wants the request to continue. It would be nice if the mechanism could provide some percentage of the total progress, as it might be tedious for the user to compute that. But that is probably not easy to define if you first update the whole request area at a low resolution and then at higher resolutions.

Response:

Fully agree; adding a way to stop the request and metadata on the amount of data transferred would be very useful. The driver will use GDALProgressFunc (or will now; I hadn't thought of it until you mentioned it).
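The cancellation idea can be sketched with a progress callback that follows the GDALProgressFunc convention of taking a completion fraction, a message, and a user argument, and returning false to cancel. The run_request function and the pass-counting progress measure are hypothetical; as noted in the question, a real driver would need a better definition of overall progress.

```python
def run_request(passes, progress, progress_arg=None):
    """Deliver refinement passes until done or until progress() returns False."""
    delivered = []
    total = len(passes)
    for i, data in enumerate(passes, start=1):
        delivered.append(data)
        # Fraction of passes delivered so far; an assumed, simplistic measure.
        if not progress(i / total, "", progress_arg):
            break                      # the user asked to cancel
    return delivered


def stop_after_half(complete, message, arg):
    arg.append(complete)               # record what the callback was told
    return complete < 0.5              # False cancels the request


seen = []
result = run_request(["low", "mid", "high", "full"], stop_after_half, seen)
```

Here the request stops once half of the passes have arrived, so only the first two refinements are delivered.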

Question:

What happens if the user issues another call to RasterIO(), the traditional version and/or your extended version, in the pfnProgressIO callback?

Response:

The callback and the RasterIO function are in the same thread, so there are no synchronization issues here. The callback function carries the state (region, resolution, buffer, etc.) so that the display thread always has knowledge of the buffer contents. Synchronization within the format driver is required when updating the communication thread to the server.

Question:

How will that work with the block cache mechanism?

Response:

At the moment I am not planning to integrate with the block cache. The format driver will have to implement its own (wavelet) cache; this is important so that the format driver can communicate the contents of its cache to the server.

Proposed Progressive Streaming Sequence of Events

The sequence shows GDALDataset; the same sequence applies to GDALRasterBand.
