Opened 19 years ago
Last modified 19 years ago
#839 closed defect (fixed)
[OGR] Attempt to use attribute index fails for large shp file
Reported by: | | Owned by: | Daniel Morissette |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | |
Cc: |
Description
The file is a 1.1 GB shapefile. An attribute index has been created on a specific column (parcelid), producing .idm and .ind files. To test this attribute index, the following ogrinfo command is used:

    $ ogrinfo SCAP_Cert2004_ParcelsUTM16M.shp SCAP_Cert2004_ParcelsUTM16M -sql "SELECT * FROM SCAP_Cert2004_ParcelsUTM16M WHERE parcelid = '093519 D00015'"

which returns:

    INFO: Open of `SCAP_Cert2004_ParcelsUTM16M.shp'
    using driver `ESRI Shapefile' successful.
    layer names ignored in combination with -sql.
    Layer name: SCAP_Cert2004_ParcelsUTM16M
    Geometry: Polygon
    ERROR 1: Attempt to read shape with feature id (703141887) out of available range.

Here is the ogrinfo -summary output:

    $ ogrinfo SCAP_Cert2004_ParcelsUTM16M.shp SCAP_Cert2004_ParcelsUTM16M -summary
    INFO: Open of `SCAP_Cert2004_ParcelsUTM16M.shp'
    using driver `ESRI Shapefile' successful.
    Layer name: SCAP_Cert2004_ParcelsUTM16M
    Geometry: Polygon
    Feature Count: 332190
    ...

Here is the size of the files:

    jeff users 1.1G Apr 25 21:06 SCAP_Cert2004_ParcelsUTM16M.dbf
    jeff users  246 Apr 26 14:18 SCAP_Cert2004_ParcelsUTM16M.idm
    jeff users 742M Apr 26 15:38 SCAP_Cert2004_ParcelsUTM16M.ind
    jeff users 8.9M Apr 26 14:13 SCAP_Cert2004_ParcelsUTM16M.qix
    jeff users 294M Apr 25 20:32 SCAP_Cert2004_ParcelsUTM16M.shp
    jeff users 2.6M Apr 25 20:33 SCAP_Cert2004_ParcelsUTM16M.shx
Change History (6)
comment:2 by , 19 years ago
Quick update: After discussing this with Frank, it seems that shapefile attribute indexes were implemented only for integer fields, not for string fields. Unfortunately we do not have any integer field in this file to verify that this is really the issue. Another possibility is that this is a limitation of the .IND file format or of its implementation. I'll run a few more tests to verify that.
comment:3 by , 19 years ago
Okay, I have verified that I am able to create an index on a string field of a shapefile in OGR and to query it, and I have also verified that the index is indeed being used. So the issue is not that string field indexes are not implemented. It must be a limitation of the index format or of its implementation. Will dig further later.
comment:4 by , 19 years ago
More good news: I have converted the dataset to MapInfo .TAB format (which uses the same index stuff as the OGR shapefile index) and indexed on the "parcelid" field and run a few tests using the MITAB utilities and the index works. We're not there yet, but at least that confirms that the index file format and its implementation can work on such a large dataset.
comment:5 by , 19 years ago
We've found that a good workaround is to reduce the size of the parcelid field from 254 chars to 25 chars. This way the index works, and the file has a more reasonable size.

I finally identified the source of the problem: it's an overflow of a 1-byte field in the header of the .IND file. The original file had string fields 254 chars long, which resulted in a 128-byte key being used in the index. The index uses 512-byte nodes, so only 3 entries can fit in a node. That explains why the index ended up being so big.

The field that overflows is SubTreeDepth, i.e. the depth of the tree. In this case the resulting index is 551 levels deep, but since the value is written in a single byte, when we reopen the file we read 39 (551 modulo 256)... and hence all the problems that we've seen. I am a bit surprised to see the index going 551 levels deep; I would have expected that for this number of records and this key size it should go no deeper than 20-30 levels... I must have screwed up something in my calculations.

I won't spend any more time trying to fix this. What I'll do is add a test when writing the header of the .IND file, and if there is an overflow you'll get the following message:

    ERROR 7: Index no 1 is too large and will not be useable. (SubTreeDepth = 551, cannot exceed 255).
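
For illustration only, the arithmetic in this comment can be sketched in Python. This is not the MITAB or OGR code: the function names `stored_depth`, `check_depth` and `balanced_depth` are hypothetical, and the fan-out of 3 entries per node is simply taken from the figure quoted above. The sketch shows how a depth of 551 reads back as 39 once squeezed into one byte, what a guard on the header write might look like, and roughly how deep a perfectly balanced tree with that fan-out would need to be for 332190 keys.

```python
# Illustrative sketch only; this is NOT the MITAB/OGR source code.
# All function names below are hypothetical.

MAX_DEPTH_IN_BYTE = 255  # largest value an unsigned 1-byte field can hold


def stored_depth(actual_depth: int) -> int:
    """Value read back after the depth is written into one unsigned byte."""
    return actual_depth & 0xFF  # only the low 8 bits survive


def check_depth(index_no: int, depth: int) -> None:
    """Hypothetical guard, modelled on the check described in the ticket."""
    if depth > MAX_DEPTH_IN_BYTE:
        raise ValueError(
            f"Index no {index_no} is too large and will not be useable. "
            f"(SubTreeDepth = {depth}, cannot exceed {MAX_DEPTH_IN_BYTE})."
        )


def balanced_depth(num_keys: int, entries_per_node: int) -> int:
    """Smallest depth d such that entries_per_node ** d >= num_keys.

    A rough estimate for a perfectly balanced tree; a real index may be
    deeper, since nodes are not necessarily full.
    """
    depth, capacity = 0, 1
    while capacity < num_keys:
        capacity *= entries_per_node
        depth += 1
    return depth


if __name__ == "__main__":
    print(stored_depth(551))          # 39
    print(balanced_depth(332190, 3))  # 12
    try:
        check_depth(1, 551)
    except ValueError as exc:
        print(exc)
```

Under these assumptions a balanced tree would need only about a dozen levels, which is consistent with the surprise expressed above at seeing 551; the sketch does not explain why the real index grew that deep.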
comment:6 by , 19 years ago
Marking Fixed. I have committed the test with the error message to the master MITAB CVS and backported to the OGR CVS.