Opened 18 years ago

Closed 15 years ago

Last modified 15 years ago

#1017 closed defect (fixed)

[OGR-GML] fids written by the GML driver are not read back correctly

Reported by: sebastien.grignard@… Owned by: chaitanya
Priority: normal Milestone: 1.7.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords: GML
Cc: gdal@…, warmerdam

Description (last modified by warmerdam)

All fids written by the GML driver (F...) are ignored by the reader.

Attachments (9)

patch.diff (937 bytes ) - added by sebastien.grignard@… 18 years ago.
This patch tries to detect if the GML fid was written by the OGR driver to assign it correctly
ogr_gml.h.patch (365 bytes ) - added by chaitanya 15 years ago.
ogrgmllayer.cpp.patch (3.3 KB ) - added by chaitanya 15 years ago.
This patch takes care of the situations with FIDs with prefixes. They will be unique unless the GML file has non-unique FIDs.
ogrgmllayer.cpp.2.patch (1.2 KB ) - added by chaitanya 15 years ago.
Takes care of the situations when numeric part of the fids are too large for int to hold.
ogrgmllayer.cpp.3.patch (1.7 KB ) - added by chaitanya 15 years ago.
Patch alternative to the ogrgmllayer.cpp.2.patch
ogr_gml_read.py.diff (1.1 KB ) - added by chaitanya 15 years ago.
Added test for ticket#1017
test_point1.gml (2.2 KB ) - added by chaitanya 15 years ago.
Data file for the test to be added to 'autotest/ogr/data' folder
test_point2.gml (2.2 KB ) - added by chaitanya 15 years ago.
Data file for the test to be added to 'autotest/ogr/data' folder
test_point3.gml (1.9 KB ) - added by chaitanya 15 years ago.
Data file for the test to be added to 'autotest/ogr/data' folder

Download all attachments as: .zip

Change History (25)

by sebastien.grignard@…, 18 years ago

Attachment: patch.diff added

This patch tries to detect if the GML fid was written by the OGR driver to assign it correctly

comment:1 by warmerdam, 18 years ago

Sebastian,

I'm still not comfortable with the FID handling.  It seems like we could
get very screwy FID values in some situations, and non-unique fids could be
pretty bad karma. 

I'll leave this bug report open for the time being and think about it a bit. 

comment:2 by warmerdam, 15 years ago

Cc: gdal@… warmerdam added
Description: modified (diff)
Keywords: GML added
Owner: changed from warmerdam to chaitanya
Priority: highnormal
Status: assignednew

Chaitanya,

Could you please revise the patch to work properly against trunk, and to do some more careful evaluation of the FID values. In particular I would like:

  • The FID values to be more completely scanned to ensure they are strictly a string of numeric digits, potentially prefixed by "F", and with no other characters, spaces, etc.
  • Each time an FID value is extracted, iNextGMLId should be set to the maximum of the existing iNextGMLId value, and the FID value plus 1 - the objective here is to keep track of the next safe FID we could assign if we need to switch to serial fid assignment, without risk of conflicting with any existing FIDs.
  • Keep a flag indicating whether we have encountered any illegal FIDs, in which case we will assign the FIDs serially even if the current FID seems legitimate.

If/when you are satisfied with these changes, please then extent autotest/ogr/ogr_gml_read.py to test these new capabilities and special cases, including introducing new (small) test files as needed.

comment:3 by warmerdam, 15 years ago

Chris Schmidt has pointed other that there are a variety of other situations where a standard prefix is used for numeric FIDs. So I'd also like to see the patch extended to scan the first fid encountered (for a given layer), and if it is any prefix followed by a numeric id, record it and check for it in subsequent layers.

So, from the first feature we examine the FID and try to separate it into a prefix and a number id. If successful we will continue to parse and use FIDs until we reach a feature that does not have the exact same prefix at which point we start assigning FIDs serially. The goal is to use the provided GML FIDs if possible, but if we can't to just assign serially.

comment:4 by chaitanya, 15 years ago

Status: newassigned

by chaitanya, 15 years ago

Attachment: ogr_gml.h.patch added

by chaitanya, 15 years ago

Attachment: ogrgmllayer.cpp.patch added

This patch takes care of the situations with FIDs with prefixes. They will be unique unless the GML file has non-unique FIDs.

comment:5 by warmerdam, 15 years ago

I have reviewed the patch and made a few updates for preferred variable naming conventions, initializing the FIDprefix variable, and avoiding a memory leak of the GMLfid. I have only done minimal testing of the results. I observe that the ogr_gml_4 test in the ogr_gml_read.py autotest script now fails - presumably due to this patch. I haven't tracked why. The patch is applied in trunk (r16241).

Please review the ogr_gml_read.py test failure, and also extend that test script to test the new feature. I think we will keep this feature only in trunk as opposed to back porting it into 1.6.

comment:6 by chaitanya, 15 years ago

The FIDs in the test GML file has sixteen digit numeric parts. I believe int cannot hold this. I will resolve this issue.

by chaitanya, 15 years ago

Attachment: ogrgmllayer.cpp.2.patch added

Takes care of the situations when numeric part of the fids are too large for int to hold.

comment:7 by chaitanya, 15 years ago

Frank, The patch "ogrgmllayer.cpp.2.patch" should handle FIDs that have too big numeric parts. int cannot hold more than 9 digits.

In the modified code, if the first FID has more than 8 digits at the end, it only takes the last eight digits and treats the more significant digits as part of the prefix. Later on in the file if a FID has more than nine digits after the prefix, it is treated like an invalid FID and thereafter FIDs are assigned serially.

I am also adding a different patch to just ignore the given FIDs at the first FID itself.

by chaitanya, 15 years ago

Attachment: ogrgmllayer.cpp.3.patch added

Patch alternative to the ogrgmllayer.cpp.2.patch

comment:8 by warmerdam, 15 years ago

I have skimmed the new patch and it seems to test the length of the FID string which to me seems like a poor way of determining if it can be represented in a 32bit integer. It seems to me it would be better to use atof() on it, and determine if the values is larger than 231-1. Please ensure that the python test to go with this tests with an FID slightly less than 231-1 and slightly larger than 231.

I have not applied the patch in trunk as it is.

in reply to:  8 comment:9 by chaitanya, 15 years ago

Frank,

I did think of using atof() for this. However, we are already counting the digits to determine the prefix length.

Also, we need to have some buffer for iNextGMLId to grow if any subsequent FIDs become invalid. The patch only counts upto 8 digits.

I will make a new patch with atof() if you insist.

comment:10 by Even Rouault, 15 years ago

Tamas discovered a failure on ogr_gml_4 on Windows that might be related to the changes made for this ticket. On Linux, an ogrinfo on autotest/ogr/data/bom.gml now show the 3 features with an identical fid = 2147483647. On Windows, ogrinfo.exe loops endlessly and returns always the content of the first feature with a fid = -416888225.

comment:11 by Even Rouault, 15 years ago

Ok, by re-reading this thread, I see that ogr_gml_4 failure was already known.

Memory leak introduced by r16241 fixed in r16450

by chaitanya, 15 years ago

Attachment: ogr_gml_read.py.diff added

Added test for ticket#1017

by chaitanya, 15 years ago

Attachment: test_point1.gml added

Data file for the test to be added to 'autotest/ogr/data' folder

by chaitanya, 15 years ago

Attachment: test_point2.gml added

Data file for the test to be added to 'autotest/ogr/data' folder

by chaitanya, 15 years ago

Attachment: test_point3.gml added

Data file for the test to be added to 'autotest/ogr/data' folder

comment:12 by chaitanya, 15 years ago

Resolution: fixed
Status: assignedclosed

comment:13 by Even Rouault, 15 years ago

Milestone: 1.7.0
Resolution: fixed
Status: closedreopened

Chaitanya,

I reopen the ticket as there's still a pending issue, the one I mentionned in the comments above about the failures on ogr_gml_4 :

Tamas discovered a failure on ogr_gml_4 on Windows that might be related to the 
changes made for this ticket. On Linux, an ogrinfo on autotest/ogr/data/bom.gml now 
show the 3 features with an identical fid = 2147483647. On Windows, ogrinfo.exe
 loops endlessly and returns always the content of the first feature with a
fid = -416888225.

I've tried the ogrgmllayer.cpp.3.patch you've proposed, and it seems to fix this issue, so you should probably commit it.

On a methodological point of view, I encourage you to note in the ticket comment the revisions in which the commits related to the tickets have been done. This way it's easier to have better tracability between the commits (that refer to the tickets) and the tickets (that refer to the commits). Like : "Fixed in trunk (rXXXX) and in branches/1.6 (rYYYY). Tests added in trunk in rZZZZ"

comment:14 by chaitanya, 15 years ago

Resolution: fixed
Status: reopenedclosed

I apologise for the lack of explanation.

Fixed in trunk (r16241 and r16450). Fixed the error in handling long FIDs in trunk ( r16640 ); the ogr_gml_4() test now passes.

Also the test ogr_gml_6() has been added ( r16635, r16636, r16637, r16638 ) and improved (r16639) in trunk.

comment:15 by Even Rouault, 15 years ago

In r16779, re-commit of a previous fix commited in r16450 and accidently reverted by r16640.

comment:16 by Even Rouault, 15 years ago

In r17033, I've fixed a nasty crash when reading a GML file whose FIDs were just regular numeric values. Test added in r17033

Note: See TracTickets for help on using tickets.