Ticket #2372 (reopened defect)

Opened 13 months ago

Last modified 5 months ago

KML with space around ordinate values considered invalid

Reported by: strk Owned by: strk
Priority: high Milestone: PostGIS 1.5.9
Component: postgis Version: 1.5.X
Keywords: history, kml Cc: rouault

Description

Works:

ST_GeomFromKML('<Point><coordinates>0,0</coordinates></Point>')

Fails:

ST_GeomFromKML('<Point><coordinates>0, 0</coordinates></Point>')

Fails:

ST_GeomFromKML('<Point><coordinates>0 ,0</coordinates></Point>')

If the spaces are before X or after Y ordinates, it still works.

Broken from 1.5 to 2.2..

Change History

Changed 13 months ago by strk

  • owner changed from pramsey to strk
  • status changed from new to assigned

Example KML containing the unsupported representation:  http://firms.modaps.eosdis.nasa.gov/active_fire/kml/Europe_24h.kml

Changed 13 months ago by strk

  • owner changed from strk to colivier
  • status changed from assigned to new

Changed 13 months ago by strk

Was there any special reason for NOT using strtod ?

Changed 13 months ago by strk

I've a working strtod implementation but to my surprise this case is under regression testsuite as one that is supposed to fail. Really !? Why ?

From regress/in_kml.sql:

-- ERROR: Spaces insides 
SELECT 'coordinates_14', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2 3, 4</kml:coordinates></kml:LineString>'));

Changed 13 months ago by strk

My implementation supports the above interpreting it as SRID=4326;LINESTRING(1 2,3 4) , which I find correct

Changed 13 months ago by strk

Another 2 that my implementation supports:

<Point><coordinates>1,.1</coordinates></Point>
becomes POINT(1.0 0.1)
<Point><coordinates>1,-.1</coordinates></Point>
becomes POINT(1.0 -0.1)

Problems with that ?

Changed 13 months ago by strk

  • owner changed from colivier to strk
  • status changed from new to assigned

Changed 13 months ago by strk

  • keywords history, kml added
  • status changed from assigned to closed
  • resolution set to fixed

r11589 in trunk r11590 in 2.1 branch r11591 in 2.0 branch r11592 in 1.5 branch

Phewww, I just hope it's all fine. Looking forward to #444 !

Changed 13 months ago by colivier

Is it really a bug ? As whitespace mean tuple separator in KML spec (cf section 16.9.1)

On the other hand, i'm fine with .1 and -.1 support. Maybe worth also to look on ST_GeomFromGML one as they were both written closely...

Changed 13 months ago by strk

Better accepting it that bailing out, if it can be interpreted univoquely, don't you agree ? See the NASA kml, it's a real world case. Where's the KML spec ?

Changed 13 months ago by colivier

The KML spec is provided by OGC  http://www.opengeospatial.org/standards/kml

The paragraph related to coordinatesType seems to me quite clear:

"String representing one or more coordinate tuples, with each tuple consisting of decimal values for geodetic longitude, geodetic latitude, and altitude. The altitude component is optional. The coordinate separator is a comma and the tuple separator is a whitespace. Longitude and latitude coordinates are expressed in decimal degrees only."

Changed 13 months ago by colivier

  • status changed from closed to reopened
  • resolution fixed deleted

Changed 13 months ago by strk

The spec is still respected, comma separates ordinate values, space separates tuples. Only we now additionally support blank characters surrounding the comma, what's the problem with that ? I don't think the paragraph you report forbids that.

Have you tried the NASA KML against google earth or some other viewer ? I tried it here:  http://www.doogal.co.uk/KmlViewer.php (can read it)

According to the website from which the KML was downloaded, these KML can be directly opened in Google Earth:  http://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data#tab-content-2

Do we want to be picky and NOT support them because OGC wording is open to stricter interpretation than KML writers may think ?

Changed 13 months ago by colivier

No the spec is NOT respected anymore. As if you use the same char (space here) for tuple separator AND for coordinate separator you have no control on what you are parsing (meaning tuples and dimensions)

Few samples:

 SELECT 'Valid KML', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614 -71.1667,42.2616</coordinates>
		</LineString>'));

SELECT 'Junk KML I', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614, 12 -71.1667, 42.2616,14</coordinates>
		</LineString>'));

SELECT 'Junk KML II', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614, 12 42.2616, 14</coordinates>
		</LineString>'));

Changed 13 months ago by strk

The "Junk KML I" I'd consider a valid 3D linestring, the "Junk KML II" I'd indeed call invalid,and the current code doesn't complain (new bug).

Changed 13 months ago by colivier

But how could 'Junk KML I' be a valid KML 2.2 3D Linestring ? It just broke the spec.

I could hear that from your point of view you it's a progress to allow some invalid data to also be parsed. But from my point of view it's a regress, because we loose the validation information.

Adding a ST_MakeValidKML seems clearly overkill, but an extra parameter to ST_GeomFromKML allowing the end user to switch from one behaviour to an another could make sense at the very minimum printing a NOTICE each time we met an invalid data.

For Junk KML II, it's was just a sample, there's possibly some other use case quite similar

Changed 13 months ago by strk

I think 'Junk KML I' is valid because it contains 2 coordinates of 3 dimensions each, with each ordinate value separated by comma and coordinates separated by _only_ spaces.

Instead 'Junk KML II' is invalid because it contains a mix of 3d and 2d coordinates. I tried a version of 'Junk KML II' without spaces and guess what ? The old implementation considers it valid:

SELECT 'Junk KML II no spacepad', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614,12 42.2616,14</coordinates>
		</LineString>'));

So accepting it is not even a regression.

You're right I generally think it's better to accept invalids and clean within the database, but anyway I've not seens a KML in the wild with mixed coordinate values so it shouldn't be a problem to bail out in that case. It'd be a matter of _checking_ num_dims variable to match the previous value whenever a new coordinate is pushed on the stack. Do you want me to do that ?

Changed 13 months ago by strk

  • cc rouault added

Thanks to rouault here's another reference:  https://developers.google.com/kml/documentation/kmlreference?hl=fr#point

This one is even more explicit about spaces NOT being allowed (agreeing with Colivier): "Do not include spaces between the three values that describe a coordinate."

Still, I think we should follow the robustness principle "Be conservative in what you do, be liberal in what you accept from others".

The GDAL ticket:  http://trac.osgeo.org/gdal/ticket/5140

Changed 13 months ago by colivier

I'm OK to be liberal, on this one, as soon as:

  • we keep the ability to also known if the parsed data was (or not) valid according the spec
  • very obviously if we still able to rightly parse every use case needed by the initial spec.

I suggested printing NOTICE, or extra parameter to do so, do you have a preference or an another idea ?

Changed 13 months ago by strk

See how you like r11603 for handling the mixed-dimensions KML. Note it will still fail to detect multi-component geometries with different components having different dimensions (but this isn't a regression).

About notifying not-strictly-valid I think raising a NOTICE is going to be the simplest implementation as it won't introduce any new parameter.

Changed 13 months ago by strk

For the NOTICE, when do we want to raise it ? Should space ONLY be allowed to separate coordinates ? In other words, are these all to raise a NOTICE ?

1: <> 1,2<>
2: <>1,2 <>
3: <>1 ,2<>
4: <>1, 2<>

And what about multiple spaces ? (as the spec say "a whitespace"):

5: <>1,2   1,2<>

Changed 13 months ago by colivier

I will take time to check r11603 since tomorrow noon.

Related to NOTICE, every time we accept to parse the input as it's indeed not conform to the spec.

Space is tuple separator so all theses use cases should falling in invalid ones. (1,2,5 previously was tolerated, but no reason to do so with NOTICE)

Changed 13 months ago by colivier

Humm i've replyied a bit too fast on my latest post:

After checking back in OGC KML 2.2 spec, it itself contains some examples (e.g in 9.10.4) allowing space before/after tuples, or also several spaces as a tuple separator

So explain why 1,2,5 use cases were previously allowed, in first coordinates parser implementation.

So mean NOTICE only on 3 and 4 uses cases.

I've give a look on latest trunk Here an enhanced version of coordinates regress/in_kml.sql who should reflect current discussion:

--
-- Coordinates
--

-- X,Y
SELECT 'coordinates_1', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2</kml:coordinates></kml:Point>'));

-- ERROR: single X
SELECT 'coordinates_2', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1</kml:coordinates></kml:Point>'));

-- X,Y,Z
SELECT 'coordinates_3', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3</kml:coordinates></kml:Point>'));

-- ERROR: 4 dimension
SELECT 'coordinates_4', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3,4</kml:coordinates></kml:Point>'));

-- ERROR: Only commas
SELECT 'coordinates_5', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>,</kml:coordinates></kml:Point>'));
SELECT 'coordinates_6', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates> , </kml:coordinates></kml:Point>'));

-- ERROR: empty or spaces
SELECT 'coordinates_7', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates></kml:coordinates></kml:Point>'));
SELECT 'coordinates_8', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>  </kml:coordinates></kml:Point>'));

-- ERROR: End on comma
SELECT 'coordinates_9', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3,</kml:coordinates></kml:Point>'));
SELECT 'coordinates_10', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,</kml:coordinates></kml:Point>'));

-- ERROR: Begin on comma
SELECT 'coordinates_11', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>,1 2,3</kml:coordinates></kml:LineString>'));

-- Whitespaces before and after 
SELECT 'coordinates_12', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates> 1,2 3,4 </kml:coordinates></kml:LineString>'));
SELECT 'coordinates_13', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates> 
                                                                1,2 3,4  
                                                   </kml:coordinates></kml:LineString>'));

-- NOTICE: Spaces insides -- Cf #2372
SELECT 'coordinates_14', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2 3,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_15', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3 ,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_16', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2, 3 4, 5,6</kml:coordinates></kml:LineString>'));

-- Several spaces as tuples separator
SELECT 'coordinates_17', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2   3,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_18', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>
                                                                        1,2
                                                                        3,4
                                                   </kml:coordinates></kml:LineString>'));

-- ERROR: Mixed dimension
SELECT 'coordinates_19', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3,4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_20', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2,3 4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_21', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 ,3</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_22', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2,3</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_23', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2,3 ,4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_24', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2, 3,4,5</kml:coordinates></kml:LineString>'));

-- ERROR: Mixed dimension and extra tuple separator (Cf #2372)
SELECT 'coordinates_25', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3,4, 5</kml:coordinates></kml:LineString>'));


-- ERROR: Junk
SELECT 'coordinates_26', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>!@#$%^*()"</kml:coordinates></kml:LineString>'));

Changed 13 months ago by strk

Colivier: could you provide testsuite changes in form of a patch ? Please patch both .sql and _expected (with the expected message).

Changed 5 months ago by pramsey

How many spaces can dance on the head of a pin?

Note: See TracTickets for help on using tickets.