Opened 11 years ago

Closed 10 years ago

#2372 closed defect (fixed)

KML with space around ordinate values considered invalid

Reported by: strk Owned by: strk
Priority: high Milestone: PostGIS 1.5.9
Component: postgis Version: 1.5.X
Keywords: history, kml Cc: rouault

Description

Works:

ST_GeomFromKML('<Point><coordinates>0,0</coordinates></Point>')

Fails:

ST_GeomFromKML('<Point><coordinates>0, 0</coordinates></Point>')

Fails:

ST_GeomFromKML('<Point><coordinates>0 ,0</coordinates></Point>')

If the spaces are before X or after Y ordinates, it still works.

Broken from 1.5 to 2.2..

Change History (28)

comment:1 by strk, 11 years ago

Owner: changed from pramsey to strk
Status: newassigned

Example KML containing the unsupported representation: http://firms.modaps.eosdis.nasa.gov/active_fire/kml/Europe_24h.kml

comment:2 by strk, 11 years ago

Owner: changed from strk to colivier
Status: assignednew

comment:3 by strk, 11 years ago

Was there any special reason for NOT using strtod ?

comment:4 by strk, 11 years ago

I've a working strtod implementation but to my surprise this case is under regression testsuite as one that is supposed to fail. Really !? Why ?

From regress/in_kml.sql:

-- ERROR: Spaces insides 
SELECT 'coordinates_14', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2 3, 4</kml:coordinates></kml:LineString>'));

comment:5 by strk, 11 years ago

My implementation supports the above interpreting it as SRID=4326;LINESTRING(1 2,3 4) , which I find correct

comment:6 by strk, 11 years ago

Another 2 that my implementation supports:

<Point><coordinates>1,.1</coordinates></Point>
becomes POINT(1.0 0.1)
<Point><coordinates>1,-.1</coordinates></Point>
becomes POINT(1.0 -0.1)

Problems with that ?

comment:7 by strk, 11 years ago

Owner: changed from colivier to strk
Status: newassigned

comment:8 by strk, 11 years ago

Keywords: history kml added
Resolution: fixed
Status: assignedclosed

r11589 in trunk r11590 in 2.1 branch r11591 in 2.0 branch r11592 in 1.5 branch

Phewww, I just hope it's all fine. Looking forward to #444 !

comment:9 by colivier, 11 years ago

Is it really a bug ? As whitespace mean tuple separator in KML spec (cf section 16.9.1)

On the other hand, i'm fine with .1 and -.1 support. Maybe worth also to look on ST_GeomFromGML one as they were both written closely…

comment:10 by strk, 11 years ago

Better accepting it that bailing out, if it can be interpreted univoquely, don't you agree ? See the NASA kml, it's a real world case. Where's the KML spec ?

comment:11 by colivier, 11 years ago

The KML spec is provided by OGC http://www.opengeospatial.org/standards/kml

The paragraph related to coordinatesType seems to me quite clear:

"String representing one or more coordinate tuples, with each tuple consisting of decimal values for geodetic longitude, geodetic latitude, and altitude. The altitude component is optional. The coordinate separator is a comma and the tuple separator is a whitespace. Longitude and latitude coordinates are expressed in decimal degrees only."

comment:12 by colivier, 11 years ago

Resolution: fixed
Status: closedreopened

comment:13 by strk, 11 years ago

The spec is still respected, comma separates ordinate values, space separates tuples. Only we now additionally support blank characters surrounding the comma, what's the problem with that ? I don't think the paragraph you report forbids that.

Have you tried the NASA KML against google earth or some other viewer ? I tried it here: http://www.doogal.co.uk/KmlViewer.php (can read it)

According to the website from which the KML was downloaded, these KML can be directly opened in Google Earth: http://earthdata.nasa.gov/data/near-real-time-data/firms/active-fire-data#tab-content-2

Do we want to be picky and NOT support them because OGC wording is open to stricter interpretation than KML writers may think ?

comment:14 by colivier, 11 years ago

No the spec is NOT respected anymore. As if you use the same char (space here) for tuple separator AND for coordinate separator you have no control on what you are parsing (meaning tuples and dimensions)

Few samples:

 SELECT 'Valid KML', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614 -71.1667,42.2616</coordinates>
		</LineString>'));

SELECT 'Junk KML I', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614, 12 -71.1667, 42.2616,14</coordinates>
		</LineString>'));

SELECT 'Junk KML II', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614, 12 42.2616, 14</coordinates>
		</LineString>'));

comment:15 by strk, 11 years ago

The "Junk KML I" I'd consider a valid 3D linestring, the "Junk KML II" I'd indeed call invalid,and the current code doesn't complain (new bug).

comment:16 by colivier, 11 years ago

But how could 'Junk KML I' be a valid KML 2.2 3D Linestring ? It just broke the spec.

I could hear that from your point of view you it's a progress to allow some invalid data to also be parsed. But from my point of view it's a regress, because we loose the validation information.

Adding a ST_MakeValidKML seems clearly overkill, but an extra parameter to ST_GeomFromKML allowing the end user to switch from one behaviour to an another could make sense at the very minimum printing a NOTICE each time we met an invalid data.

For Junk KML II, it's was just a sample, there's possibly some other use case quite similar

comment:17 by strk, 11 years ago

I think 'Junk KML I' is valid because it contains 2 coordinates of 3 dimensions each, with each ordinate value separated by comma and coordinates separated by _only_ spaces.

Instead 'Junk KML II' is invalid because it contains a mix of 3d and 2d coordinates. I tried a version of 'Junk KML II' without spaces and guess what ? The old implementation considers it valid:

SELECT 'Junk KML II no spacepad', ST_AsEWKT(ST_GeomFromKML('
		<LineString>
			<coordinates>-71.1663,42.2614,12 42.2616,14</coordinates>
		</LineString>'));

So accepting it is not even a regression.

You're right I generally think it's better to accept invalids and clean within the database, but anyway I've not seens a KML in the wild with mixed coordinate values so it shouldn't be a problem to bail out in that case. It'd be a matter of _checking_ num_dims variable to match the previous value whenever a new coordinate is pushed on the stack. Do you want me to do that ?

comment:18 by strk, 11 years ago

Cc: rouault added

Thanks to rouault here's another reference: https://developers.google.com/kml/documentation/kmlreference?hl=fr#point

This one is even more explicit about spaces NOT being allowed (agreeing with Colivier): "Do not include spaces between the three values that describe a coordinate."

Still, I think we should follow the robustness principle "Be conservative in what you do, be liberal in what you accept from others".

The GDAL ticket: http://trac.osgeo.org/gdal/ticket/5140

comment:19 by colivier, 11 years ago

I'm OK to be liberal, on this one, as soon as:

  • we keep the ability to also known if the parsed data was (or not) valid according the spec
  • very obviously if we still able to rightly parse every use case needed by the initial spec.

I suggested printing NOTICE, or extra parameter to do so, do you have a preference or an another idea ?

comment:20 by strk, 11 years ago

See how you like r11603 for handling the mixed-dimensions KML. Note it will still fail to detect multi-component geometries with different components having different dimensions (but this isn't a regression).

About notifying not-strictly-valid I think raising a NOTICE is going to be the simplest implementation as it won't introduce any new parameter.

comment:21 by strk, 11 years ago

For the NOTICE, when do we want to raise it ? Should space ONLY be allowed to separate coordinates ? In other words, are these all to raise a NOTICE ?

1: <> 1,2<>
2: <>1,2 <>
3: <>1 ,2<>
4: <>1, 2<>

And what about multiple spaces ? (as the spec say "a whitespace"):

5: <>1,2   1,2<>

comment:22 by colivier, 11 years ago

I will take time to check r11603 since tomorrow noon.

Related to NOTICE, every time we accept to parse the input as it's indeed not conform to the spec.

Space is tuple separator so all theses use cases should falling in invalid ones. (1,2,5 previously was tolerated, but no reason to do so with NOTICE)

comment:23 by colivier, 11 years ago

Humm i've replyied a bit too fast on my latest post:

After checking back in OGC KML 2.2 spec, it itself contains some examples (e.g in 9.10.4) allowing space before/after tuples, or also several spaces as a tuple separator

So explain why 1,2,5 use cases were previously allowed, in first coordinates parser implementation.

So mean NOTICE only on 3 and 4 uses cases.

I've give a look on latest trunk Here an enhanced version of coordinates regress/in_kml.sql who should reflect current discussion:

--
-- Coordinates
--

-- X,Y
SELECT 'coordinates_1', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2</kml:coordinates></kml:Point>'));

-- ERROR: single X
SELECT 'coordinates_2', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1</kml:coordinates></kml:Point>'));

-- X,Y,Z
SELECT 'coordinates_3', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3</kml:coordinates></kml:Point>'));

-- ERROR: 4 dimension
SELECT 'coordinates_4', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3,4</kml:coordinates></kml:Point>'));

-- ERROR: Only commas
SELECT 'coordinates_5', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>,</kml:coordinates></kml:Point>'));
SELECT 'coordinates_6', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates> , </kml:coordinates></kml:Point>'));

-- ERROR: empty or spaces
SELECT 'coordinates_7', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates></kml:coordinates></kml:Point>'));
SELECT 'coordinates_8', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>  </kml:coordinates></kml:Point>'));

-- ERROR: End on comma
SELECT 'coordinates_9', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,3,</kml:coordinates></kml:Point>'));
SELECT 'coordinates_10', ST_AsEWKT(ST_GeomFromKML('<kml:Point><kml:coordinates>1,2,</kml:coordinates></kml:Point>'));

-- ERROR: Begin on comma
SELECT 'coordinates_11', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>,1 2,3</kml:coordinates></kml:LineString>'));

-- Whitespaces before and after 
SELECT 'coordinates_12', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates> 1,2 3,4 </kml:coordinates></kml:LineString>'));
SELECT 'coordinates_13', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates> 
                                                                1,2 3,4  
                                                   </kml:coordinates></kml:LineString>'));

-- NOTICE: Spaces insides -- Cf #2372
SELECT 'coordinates_14', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2 3,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_15', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3 ,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_16', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2, 3 4, 5,6</kml:coordinates></kml:LineString>'));

-- Several spaces as tuples separator
SELECT 'coordinates_17', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2   3,4</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_18', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>
                                                                        1,2
                                                                        3,4
                                                   </kml:coordinates></kml:LineString>'));

-- ERROR: Mixed dimension
SELECT 'coordinates_19', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3,4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_20', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2,3 4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_21', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 ,3</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_22', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1, 2,3</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_23', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2,3 ,4,5</kml:coordinates></kml:LineString>'));
SELECT 'coordinates_24', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2, 3,4,5</kml:coordinates></kml:LineString>'));

-- ERROR: Mixed dimension and extra tuple separator (Cf #2372)
SELECT 'coordinates_25', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>1,2 3,4, 5</kml:coordinates></kml:LineString>'));


-- ERROR: Junk
SELECT 'coordinates_26', ST_AsEWKT(ST_GeomFromKML('<kml:LineString><kml:coordinates>!@#$%^*()"</kml:coordinates></kml:LineString>'));

comment:24 by strk, 11 years ago

Colivier: could you provide testsuite changes in form of a patch ? Please patch both .sql and _expected (with the expected message).

comment:25 by pramsey, 10 years ago

How many spaces can dance on the head of a pin?

comment:26 by pramsey, 10 years ago

Resolution: wontfix
Status: reopenedclosed

comment:27 by strk, 10 years ago

Resolution: wontfix
Status: closedreopened

the issue in the ticket summary was actually fixed (ie: now space around ordinate values are accepted)

comment:28 by strk, 10 years ago

Resolution: fixed
Status: reopenedclosed

raising NOTICEs for not strictly-valid cases could be another ticket (@coliver, if you want)

Note: See TracTickets for help on using tickets.