Opened 18 years ago

Last modified 18 years ago

#1230 closed defect (fixed)

crash: PostGIS SQL queries in OGR

Reported by: amagnum2@… Owned by: Mateusz Łoskot
Priority: highest Milestone:
Component: default Version: unspecified
Severity: major Keywords:
Cc: amagnum2@…

Description

If I enter a simple query such as select * from layer, ogr may crash.

The bug occurs if the layer has an SRID attched , and does not occur otherwise.

When we specify "select * from layer" then the database selects the output format of the geometry -it chooses the PostGIS specific EWKB format as documented in PostGIS documentation http://postgis.refractions.net/docs/ch04.html#id2522847.

But when OGR reads the geometry it assumes it is WKB.
This can be seen in OGRPGLayer::HEXToGeometry which converts the output from hex to binary and then calls OGRGeometryFactory::createFromWkb. The problem is, that this is not WKB it is EWKB and is formatted differently.
OGRGeometryFactory::createFromWkb looks for the size of the linestring in the wrong place , reads the SRID and jumps out of bounds and crashes.
If the table does not have SRID, the EWKT is sent without SRID and all is well.

As for severity, this bug can be worked around by putting in the field names in the query, and outputting the geometry as WKB or WKT or even EWKT. However, it prevents people from using a simle query such as "select * from layer" if the table has SRID.

I don't see a simple solution to this one, since EWKB is not a standard format and it may not be worthwhile to support it in OGR for all geometries. Sadly I could find a way to make PostGIS output the geometry column in WKB which would also solve the problem.

My testing environment is the latsest PGSQL and PostGIS e.g. 8.1.4 and 1.1.3 with GDAL 1.3.2.0 from FWtools as well as a CVS snaphsot from 6/6/2006

Attachments (3)

sig.zip (203.9 KB ) - added by amagnum2@… 18 years ago.
PGSQL database backup
ewkb_test-dump.sql.tar.gz (122.6 KB ) - added by Mateusz Łoskot 18 years ago.
ewkb_test database dump file
test_data.sql (75.6 KB ) - added by amagnum2@… 18 years ago.
test data

Download all attachments as: .zip

Change History (19)

comment:1 by warmerdam, 18 years ago

Ariel, 

Would you mind a quick verification with the latest FWTools (1.0.5)?
There were some fixes in recent months with EWKT support that *may* 
have dealh this already.

by amagnum2@…, 18 years ago

Attachment: sig.zip added

PGSQL database backup

comment:2 by amagnum2@…, 18 years ago

(In reply to comment #1)
> Ariel, 
> Would you mind a quick verification with the latest FWTools (1.0.5)?
> There were some fixes in recent months with EWKT support that *may* 
> have dealh this already.

The crash is also there. I have attached a database backup containing some geo data.
To crash , restore the database and do
ogrinfo "PG:..." -sql "select * from ci_section"

comment:3 by warmerdam, 18 years ago

Mateusz,

Could you dig into this issue please?

Thanks

comment:4 by Mateusz Łoskot, 18 years ago

This bug is closely related to how OGR handles EWKB bytes stream.
The Bug 1195 fix is coming today.

by Mateusz Łoskot, 18 years ago

Attachment: ewkb_test-dump.sql.tar.gz added

ewkb_test database dump file

comment:5 by Mateusz Łoskot, 18 years ago

(In reply to comment #6)
> Created an attachment (id=383) [edit]
> ewkb_test database dump file
> 
> This is my database to test how OGR handles EWKB streams.
> Data exported from this database is also used by my tests in Bug 1195

This is the result I get querying my ewkb_test database with ogrinfo + my fix applied:

mloskot:~/dev/gdal/_cvs/gdal$ ogr/ogrinfo -ro PG:dbname=ewkb_test -sql "select * from ewkt_srid"
KML: Attempt to open: PG:dbname=ewkb_test
OGR_PG: DBName="ewkb_test"
OGR_PG: PostSIS version string: '1.1 USE_GEOS=1 USE_PROJ=1 USE_STATS=1' -> '1.1'
OGR_PG: POSTGIS_VERSION=1.1 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
OGR_PG: Layer 'ewkt_2dim' geometry type: GEOMETRY:Unknown (any), Dim=2
OGR_PG: Layer 'ewkt_3dim' geometry type: GEOMETRY:Unknown (any), Dim=3
OGR_PG: Layer 'ewkt_4dim' geometry type: GEOMETRY:Unknown (any), Dim=3
OGR_PG: Layer 'ewkt_srid' geometry type: GEOMETRY:Unknown (any), Dim=3
OGR: OGROpen(PG:dbname=ewkb_test/0x804e278) succeeded as PostgreSQL.
INFO: Open of `PG:dbname=ewkb_test'
      using driver `PostgreSQL' successful.
OGR_PG: PQexec(select * from ewkt_srid)
OGR_PG: Command Results Tuples = 7

Layer name: sql_statement
Geometry: Unknown (any)
Feature Count: 7
OGR_PG: PQexec(DECLARE OGRPGResultLayerReader CURSOR for select * from ewkt_srid)
OGR: OGRGeometryFactory::createFromWkb() - got corrupt data.
0200000001010000

OGR: OGRLinearRing points buffer is too small!
        WKB stream may be corrupted or it is a EWKB stream which is not supported
OGR: OGRGeometryFactory::createFromWkb() - got corrupt data.
0200000001020000

ERROR 2: CPLCalloc(): Out of memory allocating 8388608 bytes.

Aborted
mloskot:~/dev/gdal/_cvs/gdal$


Unfortunately, I'm not sure how can I catch CPLCalloc errors you see above.
The same error is expected to occur when running my test application attached to the Bug 1195. This "Out of memory" is expected :-)


comment:6 by Mateusz Łoskot, 18 years ago

I've just commited changes fixing this bug.

by amagnum2@…, 18 years ago

Attachment: test_data.sql added

test data

comment:7 by amagnum2@…, 18 years ago

Unfortunately, it does not work for me. The error message is:
ERROR 1: Length of input WKB is too small
Shortly afterwards, the process crashes.

Here is an example of the data (you can find more attached).

This is HEX representation of the geometry column.
There are 35 points (23 in hex)
The SRID is 4326 (10E6 in hex)
The geometry type is 2d linestring (typeid 2).

0102000020E610000023000000728A07CF9C76414082BE168914EE3F40C9B1A7B49A764140703AF03C03EE3F40ADE7CC9898764140621D4EF1F1ED3F40FF0A186396764140B9AF32AFE0ED3F404972FA149476414033DA1576CFED3F404DBF7FE8917641403013B730BEED3F409FA586D38F76414005FEABE2ACED3F40373577FA8D7641407408FA819BED3F40FC8B578C8C7641404DA055048AED3F403B56CBAD8B764140120AE86C78ED3F402D8A552F8B764140ABCCAFC866ED3F4098600A5B8B7641401470162555ED3F40D9E083DA8B764140FB53F08043ED3F40F5C9496A8C76414058C450DE31ED3F4050D7B11F8E764140F21FB97E20ED3F409402E4D98F764140354334170FED3F40542A110C92764140E53FFFD9FDEC3F40CE578FEF93764140F1089686F2EC3F40657E0BB5977641401AB765D902ED3F401499DE3E9B7641407116097513ED3F40E902798E9E764140E35F593324ED3F404CD842B2A1764140E1ECF70935ED3F40FE53FF1FA47641402859B43446ED3F400DBDCB9BA67641407EFB5E5B57ED3F405A56E382A876414032D731B768ED3F40C882CD33AA7641407F9C7E247AED3F40CD297B0FAC764140F4522A858BED3F4050A5F9E2AD764140C6CE75E89CED3F40103FDEBFAF7641407FF8DF48AEED3F4071561487B17641409C51CAAFBFED3F401B00E91BB3764140AF9E9B24D1ED3F40D2CBBFABB4764140C6BAD59AE2ED3F405E941543B67641407F8C1F0FF4ED3F4074CB9BBEB776414011C46C8A05EE3F40EB5284DEB87641408E38AD2811EE3F40

The outcome should be:
LINESTRING (34.9266604220219 31.9300008469131, 34.9265962427922 31.9297369085784, 34.9265318871033 31.929473001071, 34.9264644496343 31.9292096613642, 34.9263941023715 31.9289468577719, 34.9263277648703 31.9286833235703, 34.9262642295318 31.9284192724135, 34.9262078363485 31.9281541095192, 34.9261641910052 31.9278872212456, 34.9261376612526 31.927618796015, 34.9261225860285 31.9273496083448, 34.926127796237 31.9270804576664, 34.9261429923951 31.9268112742175, 34.9261601314828 31.9265421817002, 34.9262122743954 31.926277084538, 34.9262649882632 31.9260115149111, 34.926332004911 31.9257484672707, 34.9263896417539 31.9255756489083, 34.9265047365305 31.9258247254685, 34.9266127192406 31.9260781428515, 34.9267137614132 31.9263336270523, 34.9268095804588 31.9265905599867, 34.9268836971969 31.9268525066665, 34.9269594902818 31.9271142107859, 34.9270175561948 31.9273790833169, 34.9270691636535 31.9276449975118, 34.9271258689292 31.9279101589518, 34.9271815985654 31.9281754768006, 34.9272384486377 31.9284406229785, 34.9272927140527 31.9287061566298, 34.9273409736345 31.9289725189143, 34.9273886381631 31.9292389652062, 34.9274371962899 31.9295052959465, 34.9274824391095 31.9297720447103, 34.9275167604554 31.929949323931)

comment:8 by Mateusz Łoskot, 18 years ago

(In reply to comment #10)
> Unfortunately, it does not work for me. The error message is:
> ERROR 1: Length of input WKB is too small

This error message says:
- OGR has detected non-standard WKB bytes stream, there is something
- OGR checked if number of points fetched/read from the WKB bytes stream
is far too much in comparison of the length of the buffer passed to.

>
> Shortly afterwards, the process crashes.
>

What does it mean it 'crashes'?
Are you getting segmentation faults, etc.?
Or just OGR exists without providing any meaningful results for your operation?


> Here is an example of the data (you can find more attached).

Please, could you provide CREATE TABLE instruction so I'll be able to reproduce 
the same database structure in my env?

> This is HEX representation of the geometry column.
> There are 35 points (23 in hex)
> The SRID is 4326 (10E6 in hex)
> The geometry type is 2d linestring (typeid 2).
> 
>0102000020E610000023000000728A07CF9C76414082BE168914EE3F40C9B1A7B49A764140703AF03C03EE3F40ADE7CC9898764140621D4EF1F1ED3F40FF0A186396764140B9AF32AFE0ED3F404972FA149476414033DA1576CFED3F404DBF7FE8917641403013B730BEED3F409FA586D38F76414005FEABE2ACED3F40373577FA8D7641407408FA819BED3F40FC8B578C8C7641404DA055048AED3F403B56CBAD8B764140120AE86C78ED3F402D8A552F8B764140ABCCAFC866ED3F4098600A5B8B7641401470162555ED3F40D9E083DA8B764140FB53F08043ED3F40F5C9496A8C76414058C450DE31ED3F4050D7B11F8E764140F21FB97E20ED3F409402E4D98F764140354334170FED3F40542A110C92764140E53FFFD9FDEC3F40CE578FEF93764140F1089686F2EC3F40657E0BB5977641401AB765D902ED3F401499DE3E9B7641407116097513ED3F40E902798E9E764140E35F593324ED3F404CD842B2A1764140E1ECF70935ED3F40FE53FF1FA47641402859B43446ED3F400DBDCB9BA67641407EFB5E5B57ED3F405A56E382A876414032D731B768ED3F40C882CD33AA7641407F9C7E247AED3F40CD297B0FAC764140F4522A858BED3F4050A5F9E2AD764140C6CE75E89CED3F40103FDEBFAF7641407FF8DF48AEED3F4071561487B17641409C51CAAFBFED3F401B00E91BB3764140AF9E9B24D1ED3F40D2CBBFABB4764140C6BAD59AE2ED3F405E941543B67641407F8C1F0FF4ED3F4074CB9BBEB776414011C46C8A05EE3F40EB5284DEB87641408E38AD2811EE3F40

Ariel, as I said that's the problem.
If you're passing OGR with junk (EWKB is junk for OGR) then you'll likely
get junk or nothing meaningful back.

EWKT and EWKB place SRID at the beginning of data, but OGC WKB doesn't.
Here is exmaple: SRID=4;POINT(0 0).

The procedure is as follows:
OGR thinks your EWKB is OGC WKB stream and reads first four bytes as number of 
points in linestring or polygon ring.
But, instead of number of points, OGR gets SRID number, usually big
number - 4326 in your case.
So, OGR thinks that there are 4326 points in your WKB buffer (EWKB in fact).
Next, OGR checks if your buffer is length enough to store 4326 points (3D or 2D). If the buffer is too small, and often it be too small because SRIDs are bigh numbers, then it reports error message and exists with OGRERR_NOT_ENOUGH_DATA error code.
Here is small snipped that illustrates the procedure:

////////////////////////////////////////////////////////////////////////
    /* Check if the wkb stream buffer is big enough to store
     * fetched number of points.
     * 16 or 24 - size of point structure
     */
    size_t nPointSize = (b3D ? 24 : 16);
    size_t nBufferMinSize = nPointSize * nNewNumPoints;
   
    CPLDebug( "OGR", "WKB Buffer check:\n\tnNewNumPoints=%d\n\tnBufferMinSize=%u\n\tnBytesAvailable=%d\n",
              nNewNumPoints, nBufferMinSize, nBytesAvailable);

    if( nBufferMinSize > nBytesAvailable )
    {
        CPLError( CE_Failure, CPLE_AppDefined,
                  "Length of input WKB is too small" );
        return OGRERR_NOT_ENOUGH_DATA;
    }
////////////////////////////////////////////////////////////////////////

Unfortunately, we're not able to handle EWKB bytes stream, but just detect it as something is wrong - corrupted WKB stream - and gracefully exits.
That's how we've discussed and decided to handle this issue with Frank.

Frank, could you confirm or correct what I've written above? Thank you.

comment:9 by Mateusz Łoskot, 18 years ago

Ariel, AFAIR from talks with strk and pramsey on IRC #postgis,
EKWB was invented to be used internally in PostGIS applications
but not to become a new standard widely supported by other applications.
So, it may be unsafe to rely on it and I'm not sure if it is/will be widely supported by other spatial products.
I also believe that's the reason why OGR is not much interested in
incorporating EWKB support, but here Frank will know better.

Cheers

comment:10 by amagnum2@…, 18 years ago

Mateusz, 

I agree with your Analisis, except that the error detection indeed causes a crash rather than a graceful exit. Note that I had overriden the CPL error handler to throw a C++ exception so that is probably why I ended up in a crash when the process should have ended gracefully as you had designed it to.

The optimal solution should be, that PostGIS return the standard WKT from a query such as select * from X, in order for that to happen, the people at PostGIS need to understand the flaw and fix it. I think its a poor decision by them that the default be this internal representation, especially for something like PostGIS which endevaours to be standard...

So I don't mind if the bug is closed, since I have a workaround, perhaps it would be better to document this in the postgis driver, since this is where people look first.

comment:11 by Mateusz Łoskot, 18 years ago

(In reply to comment #13)
> Mateusz, 
> 
> I agree with your Analisis, except that the error detection indeed causes a
> crash rather than a graceful exit.

Ariel, the idea is to return OGRERR_NOT_ENOUGH_DATA error code from OGR functions if are passed with EWKB streams.
Next, user is able to react properly in his program.

> Note that I had overriden the CPL error
> handler to throw a C++ exception so that is probably why I ended up in a crash
> when the process should have ended gracefully as you had designed it to.

If you're not catching thrown exception, then the program will terminate.

> The optimal solution should be, that PostGIS return the standard WKT from a
> query such as select * from X, in order for that to happen, the people at
> PostGIS need to understand the flaw and fix it.

But PostGIS returns OGC WKT and WKB, so I'm not sure I understand your problem.
Simply, if you want to get PostGIS geometries encoded according to OGC 
standards, unless you asks for Extended WKB or WKT explicitly.
In order to get OGC geometries, use asBinary(geometry) and asText(geometry) 
functions, but don't use asEWKB(geometry) or asEWKT(geometry),
and everything should work well.

> I think its a poor decision by them that the default be this internal
> representation, especially for something like PostGIS which endevaours
> to be standard...

AFAIK user does not deal with EWKB/EWKT unless he asks for it.
PostGIS manuals in "4.1.2. PostGIS EWKB, EWKT and Canonical Forms" says:

"Thus you SHOULD NOT rely on this feature!"

So, are you using asEWKB explicitly?
If you're not, but you still gets EWKB, that may mean problems/bug in PostGIS.

> So I don't mind if the bug is closed, since I have a workaround, perhaps it
> would be better to document this in the postgis driver, since this is where
> people look first.

It's documented in PostGIS manual.

comment:12 by Mateusz Łoskot, 18 years ago

Ariel has agreed to close this bug, so I'm doing so.

comment:13 by amagnum2@…, 18 years ago

(In reply to comment #14)
> (In reply to comment #13)
> > Mateusz, 


> > The optimal solution should be, that PostGIS return the standard WKT from a
> > query such as select * from X, in order for that to happen, the people at
> > PostGIS need to understand the flaw and fix it.
> But PostGIS returns OGC WKT and WKB, so I'm not sure I understand your problem.
> Simply, if you want to get PostGIS geometries encoded according to OGC 
> standards, unless you asks for Extended WKB or WKT explicitly.
> In order to get OGC geometries, use asBinary(geometry) and asText(geometry) 
> functions, but don't use asEWKB(geometry) or asEWKT(geometry),
> and everything should work well.

Note that the EWKB is returned also when you do something like select * from layer and not only in case you use asEWKB. It would also return EWKB in case you do select geom from layer, geom being the geometry column.
Currently I don't use this syntax, I use Astext.
However it means that my queries are more complex and less portable. For example, select * from layer would work fine in shapefiles, but it does not work in PostGIS.

comment:14 by Mateusz Łoskot, 18 years ago

(In reply to comment #16)
> Note that the EWKB is returned also when you do something like select * from
> layer and not only in case you use asEWKB. It would also return EWKB in case
> you do select geom from layer, geom being the geometry column.

According to my understanding, it's not a bug.
User is not supposed to use non-explicitly casted (with asBinary() or asText()) geometry values. Geometry value returned using 'select the_geom from mytable'
is not assumed to be OGC compliant, but a kind of *internal geometry format*,
similarly to MySQL's Spatial Extensions:
http://dev.mysql.com/doc/refman/5.1/en/fetching-spatial-data.html

where simple 'select the_geom from mytable' is considered as "Fetching spatial data in internal format". So, PostGIS is free to assume EWKB as its internal format.

> However it means that my queries are more complex and less portable. For
> example, select * from layer would work fine in shapefiles, but it does not
> work in PostGIS.

Ariel, I am sorry but I can not agree with your statement above.
Moreover, your queries are *less* portable if you don't use AsBinary() and AsText()
functions, because it means you're queries are not compliant to OGC SFS for SQL.
The specification does not tell implementors that 'select the_geom from mylayer' 
returns OGC geometry, neither WKB nor WKT, but it clearly says that AsText and AsBinary
functions return OGC geoemtries.
Here are two paragraphs

----
2.1.1.1 Basic Methods on Geometry
AsBinary( ):Binary—Exports this Geometry to a specific well-known binary representation of Geometry.

3.2.9.2 Language Constructs
The AsBinary function takes a single argument of type Geometry and returns its well-known binary representation.
----

Please note, I sad the above according to my understanding of the subject,
but certainly I may be wrong.

comment:15 by warmerdam, 18 years ago

Folks,

I haven't followed this too closely, but I wanted to make the point that
in the past OGR could read the internal geometry format returned by 
"select * from table". 

I gather some recent version of PostGIS changed the default representation
return to EWKB, is that right?  And the problem is that this can't be
safely distinguished from regular WKB (via AsBinary())?  

I don't have a solution, if my assumptions are correct.  But I would love
it if we could come up with a safe way of recognising EWKB in the PostGIS
driver.  (note, it isn't my intention that the OGRGeometry WKB parser 
support EWKB at this time). 

Since this is very desirable, I'm going to reopen this bug again.

comment:16 by warmerdam, 18 years ago

I have discussed this with Mateusz, and we have concluded that
supporting EWKB is a lot of work due to the Z and M support (in addition 
to the SRID).  And we can't easily tell whether a given blob of binary
is EWKB or WKB.  So we will leave things as they are for now, requiring
the user to use AsBinary() or AsText(). 

Hopefully at some point in the future the PostGIS default return type will
be more easily identified and/or change to the in-progress OGC extended WKB
which we will need to support.

Note: See TracTickets for help on using tickets.