Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#2217 closed defect (fixed)

[raster] ST_AsBinary semantic discrepancy

Reported by: mloskot Owned by: Bborie Park
Priority: medium Milestone: PostGIS 2.1.0
Component: raster Version: master
Keywords: history Cc:

Description (last modified by Bborie Park)

Originally posted to postgis-devel in ST_Binary semantic thread, copied below.

Currently (SVN trunk), ST_AsBinary manifests dual nature, depending on in-db or out-db storage of raster data.

  • For in-db, ST_AsBinary returns complete blob with raster data in WKB format.
  • For out-db, ST_AsBinary returns the WKB header followed by local file path.

Assuming the WKB format is canonical representation, this ST_AsBinary manifests valid behaviour.

However, in the out-db case and from usability point of view, transporting local filesystem path does not make much sense.

I would argue, that canonical representation of WKB doesn't necessarily have to match ST_AsBinary output. It seems, currently we have made mistake in the fundamental behaviour.

Quick discussion with Regina, Sandro and Even on IRC confirms this argument is valid and common sense would suggest to most users that ST_Binary should output the same raster data, regardless storage mode on the server side.

Regina also commented that correction may or will potentially affect current users.

<robe2>proposal adversely affects prior backup and restore behavior
<robe2>Well the good thing is that I don't think too many people are
using out db in 2.0  (they might be using raster though).
2.1 has much more robust outdb
<robe2> So I think if we can catch it in 2.1 (even if it breaks 
out-db backward compatibility), it might be okay
<robe2> 3rd party readers are the MOST important folks to satisfy in my mind.
Because if you guys have difficulty supporting out of the box, no one will use it
<robe2> well GDAL a lot of third parties use indirectly so that goes without saying
<robe2> mloskot: anyrate like I said I think given we didn't really test 2.0 
out db and I have a feeling its pretty broken so as long as the change doesn't
affect in db, we'll be fine even if breaking for out db
<robe2> I mean in 2.1 (doing it in 2.1)

The choice between affecting current users of out-db raster storage (are there many?) and correcting API for correctness adn consistency in future uses is a difficult decision. Let's figure out how to handle it, please.

Change History (27)

comment:1 by Bborie Park, 11 years ago

Description: modified (diff)
Summary: [raster] ST_Binary semantic discrepancy[raster] ST_AsBinary semantic discrepancy

changed ST_Binary to ST_AsBinary…

comment:2 by mloskot, 11 years ago

@dustymugs Thanks for correction and sorry for perhaps confusing mistake.

comment:3 by Bborie Park, 11 years ago

From my initial read of RFC-2, it looks like the WKB output is following spec. Bands that has the parameter isOffline set are returning the out-db band number and path…

comment:4 by pracine, 11 years ago

Out-db pixel values were planned to be read directly, from the filesystem, by the invoking application. Now that we support seamless reading of pixel values for out-db some might want to read them seamlessly through the db. We could add a boolean option to ST_AsBinary() so it returns the raster as if it was in-db. Another option is to have a general way to make out-db rasters in-db: ST_MakeInDB(raster)…

comment:5 by pracine, 11 years ago

That was part of Objective FV.18…

comment:6 by Bborie Park, 11 years ago

I like the idea of adding a flag to ST_AsBinary(), like include_outdb_data = TRUE. ST_MakeInDB() is also a nifty idea…

in reply to:  3 ; comment:7 by mloskot, 11 years ago

Replying to dustymugs:

From my initial read of RFC-2, it looks like the WKB output is following spec

Please, let me repeat what has been also confirmed with Sandro:

That canonical representation of WKB doesn't necessarily have to match ST_AsBinary output.

It's also important to keep in mind, that this RFC marks very early stage of low-level design and does not discuss or even consider public APIs that followed.

in reply to:  7 ; comment:8 by Bborie Park, 11 years ago

Replying to mloskot:

Replying to dustymugs:

From my initial read of RFC-2, it looks like the WKB output is following spec

Please, let me repeat what has been also confirmed with Sandro:

That canonical representation of WKB doesn't necessarily have to match ST_AsBinary output.

It's also important to keep in mind, that this RFC marks very early stage of low-level design and does not discuss or even consider public APIs that followed.

So, the WKB spec is open to change?

in reply to:  4 comment:9 by mloskot, 11 years ago

Replying to pracine:

Out-db pixel values were planned to be read directly, from the filesystem, by the invoking application.

Pierre, how it was planned? Assuming remote access, such design is unfeasible to realise, unless we consider remote access to filesystem directly.

Now that we support seamless reading of pixel values for out-db some might want to read them seamlessly through the db.

It's not now, it has been clear from the very beginning that SQL API will allow seemless integration, hasn't it?

We could add a boolean option to ST_AsBinary() so it returns the raster as if it was in-db. Another option is to have a general way to make out-db rasters in-db: ST_MakeInDB(raster)…

This is not about making out-db rasters in-db, please don't confused such case.

Simply put, if we keep ST_AsBinary return filesystem path, it will make ST_AsBinary effectively useless for out-db mode.

in reply to:  8 comment:10 by mloskot, 11 years ago

Replying to dustymugs:

Replying to mloskot:

Replying to dustymugs:

From my initial read of RFC-2, it looks like the WKB output is following spec

Please, let me repeat what has been also confirmed with Sandro:

That canonical representation of WKB doesn't necessarily have to match ST_AsBinary output.

It's also important to keep in mind, that this RFC marks very early stage of low-level design and does not discuss or even consider public APIs that followed.

So, the WKB spec is open to change?

I'm not saying any such thing, I'm saying that WKB storage is one thing, public API is the other.

in reply to:  6 comment:11 by mloskot, 11 years ago

Replying to dustymugs:

I like the idea of adding a flag to ST_AsBinary(), like include_outdb_data = TRUE. ST_MakeInDB() is also a nifty idea…

It's not only about technical fix, as it is easy peasy to fix from such point. It is about considet discoverable decent quality API.

I can bet Pierre's 5K UDS majority of users will expact this simple query to behave exactly the same way, regardless storage mode on the database server:

SELECT ST_AsBinary(rast) FROM t1;

comment:12 by pracine, 11 years ago

So if I understand Mat well, having ST_AsBinary return filesystem path is useless and undesirable? What about an application that would reside on the same server as the DB and would like to read the raster directly (because they are already in JPEG for exmaple). And let's suppose there are also some raster in-db. How can this application know how to read the right file? It has to query for ST_BandPath() for each tables? Or it call for ST_AsBinary() with the switch set to return the path so it does not have to do an extra query (ST_BandPath()) to know if it is in-db or out-db. If someone never wants to read out-db directly, he just call ST_AsBinary() with the switch set to always return in-db like data.

comment:13 by pracine, 11 years ago

There is no reason we can not make the default behavior to always return "like in-db data". The "like out-db data" could be optional.

comment:14 by Bborie Park, 11 years ago

So, the WKB spec is open to change?

I'm not saying any such thing, I'm saying that WKB storage is one thing, public API is the other.

So what is the binary output of the public API if not WKB as defined in that doc?

I'm not trying to be argumentative here. I'm well aware that RFC-2 in trunk is incomplete as it doesn't provide a facility to output the out-db band data. The serialized storage RFC-1 is a separate format that just happens to match up at this time.

If anything, the WKB should be changed to permit output of pixel values.

comment:15 by mloskot, 11 years ago

@pracine Database communication is based on client-server model. The fact that you can put application and database on the same machine is irrelevant here, because we discuss public SQL API which is supposed to provide seamless means of access. I'd also argue, that in distributed systems, in a cloud, returning local path is useless indeed. It is not responsibility of application to read the file registered in PostGIS database as out-db raster, but it is responsibility of PostGIS to serve that file. The application does not want to distinguish storage modes in the server.

IMHO, you seem to be trying to mix in data server and data client into single solution. Nothing with that as long as the data server does not make assumptions about local data access.

From your perspective, instead of having geometry functions in PostGIS passing blobs of data, they simply could juggle IDs/pointers/indexes to serialised storage only without any real data exchange.

@dustymags It is defined in that doc, but it does not consider SQL API. There is no need for change, as the WKB already defines all we need in the first part.

I don't want to make yet another discussion ride, about API design, use cases, and other things you don't necessarily would like to hear.

Let's turn this ticket this way:

  • I'm a user.
  • I have table with raster registered as out-db
  • I call SQL API.

How can I query and fetch complete blob of WKB data for the whole raster?

in reply to:  15 ; comment:16 by Bborie Park, 11 years ago

Let's turn this ticket this way:

  • I'm a user.
  • I have table with raster registered as out-db
  • I call SQL API.

How can I query and fetch complete blob of WKB data for the whole raster?

Convert the raster with out-db bands to in-db (this functionality does not exist) and then call ST_AsBinary()? That would abide by the WKB format…

in reply to:  15 ; comment:17 by pracine, 11 years ago

Replying to mloskot:

@pracine Database communication is based on client-server model. The fact that you can put application and database on the same machine is irrelevant here, because we discuss public SQL API which is supposed to provide seamless means of access.

It is true that out-db and in-db are becoming more and more seamless but you still can't edit out-db. In the beginning there was no plan to be able to access out-db pixel values. out-db has been put there just to satisfy web based application developers scared that putting the data in the db would result in poorer reading performance.

I'd also argue, that in distributed systems, in a cloud, returning local path is useless indeed. It is not responsibility of application to read the file registered in PostGIS database as out-db raster, but it is responsibility of PostGIS to serve that file. The application does not want to distinguish storage modes in the server.

So even ST_BandPath() would be irrelevant in that case?

So why storing raster outside the database then?

in reply to:  16 ; comment:18 by mloskot, 11 years ago

Replying to dustymugs:

Let's turn this ticket this way:

  • I'm a user.
  • I have table with raster registered as out-db
  • I call SQL API.

How can I query and fetch complete blob of WKB data for the whole raster?

Convert the raster with out-db bands to in-db (this functionality does not exist) and then call ST_AsBinary()? That would abide by the WKB format…

I'm sorry, but IMHO you simply don't understand the purpose of this format. The flag if data is stored in-database or out-database is for sole use by the raster implementation in PostGIS. It is for PostGIS to be able to locate physical storage, not for client (user of public SQL API). It is not even of interested of any client. Fullstop.

comment:19 by pracine, 11 years ago

There we go… Mat understand everything he did not design! Then what was the purpose of ST_BandPath()? Tell me please so I understand the purpose of PostGIS raster!

in reply to:  17 comment:20 by mloskot, 11 years ago

Replying to pracine:

Replying to mloskot:

@pracine Database communication is based on client-server model. The fact that you can I'd also argue, that in distributed systems, in a cloud, returning local path is useless indeed. It is not responsibility of application to read the file registered in PostGIS database as out-db raster, but it is responsibility of PostGIS to serve that file. The application does not want to distinguish storage modes in the server.

So even ST_BandPath() would be irrelevant in that case?

IMO, yes.

So why storing raster outside the database then?

For flexibility of storage, data organisation, performance tuning. PostgreSQL I/O to reach blobs of data is never as fast as direct filesystem I/O.

in reply to:  19 comment:21 by mloskot, 11 years ago

Replying to pracine:

There we go… Mat understand everything he did not design! Then what was the purpose of ST_BandPath()? Tell me please so I understand the purpose of PostGIS raster!

Sandro explained to me that the RFC for WKB does not discuss or investigate out-db mode. Second, use of local paths is such obvious no-no, that it can't be seriously considered. Whoever considers it for public SQL API use cases like in PostGIS lacks of understanding here. PostgreSQL is not yet another embedded SQLite, is it.

in reply to:  18 comment:22 by Bborie Park, 11 years ago

Replying to mloskot:

Replying to dustymugs:

Let's turn this ticket this way:

  • I'm a user.
  • I have table with raster registered as out-db
  • I call SQL API.

How can I query and fetch complete blob of WKB data for the whole raster?

Convert the raster with out-db bands to in-db (this functionality does not exist) and then call ST_AsBinary()? That would abide by the WKB format…

I'm sorry, but IMHO you simply don't understand the purpose of this format. The flag if data is stored in-database or out-database is for sole use by the raster implementation in PostGIS. It is for PostGIS to be able to locate physical storage, not for client (user of public SQL API). It is not even of interested of any client. Fullstop.

I would consider the first sentence of the RFC describing WKB as explicit as can be.

The WKB format for RASTER is meant for transport.

If the WKB is kept the same, then what should the output from ST_Binary be? Another format? If so, please define that format.

comment:23 by Bborie Park, 11 years ago

The WKB format is currently defined (version 0) is unable to properly handle out-db raster bands for clients. Ticket for correcting this is #2222

comment:24 by Bborie Park, 11 years ago

Milestone: PostGIS 2.1.0

comment:25 by Bborie Park, 11 years ago

Status: newassigned

comment:26 by Bborie Park, 11 years ago

Keywords: history added
Resolution: fixed
Status: assignedclosed

Resolved as of r11320.

comment:27 by mloskot, 10 years ago

@dustymags Delayed, but sincere thanks!

Note: See TracTickets for help on using tickets.