#5811 closed defect (fixed)
Py3 Unicode problem
Reported by: | nelsonminar | Owned by: | hobu |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | PythonBindings | Version: | 1.11.0 |
Severity: | normal | Keywords: | |
Cc: |
Description
The OGR bindings in Python 3 throw a Unicode exception when reading fields from a non-UTF-8 shapefile. I believe this only occurs in cases where OLCStringsAsUTF8 is false. My specific case is a shapefile that appears to be in ISO-8859-1; when I go to read a non-ASCII field I get an exception. The same code works in Python 2, where I end up with a byte string containing ISO-8859-1 characters.
$ python3 ogr-test.py
Traceback (most recent call last):
File "ogr-test.py", line 17, in <module>
field = in_feature.GetField(i)
File "/usr/lib/python3/dist-packages/osgeo/ogr.py", line 3033, in GetField
return self.GetFieldAsString(fld_index)
File "/usr/lib/python3/dist-packages/osgeo/ogr.py", line 2362, in GetFieldAsString
return _ogr.Feature_GetFieldAsString(self, *args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
There is sample code and a datafile in http://www.somebits.com/~nelson/ogr-py3-bugreport-nelson/
The whole package is 135 MB (it contains a large shapefile), so someone looking at this may want to start by reading my code. It is attached, and also available at http://www.somebits.com/~nelson/ogr-py3-bugreport-nelson/unzipped/ogr-test.py
My guess is that the Python 3 bindings assume the input is always UTF-8 when converting to Unicode strings. I'm not aware of any way to override that choice; setting SHAPE_ENCODING=iso-8859-1 does not help.
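The failure mode guessed at above can be reproduced without OGR at all. The following is a minimal sketch, assuming a made-up field value (`raw`) and a hypothetical helper (`decode_field`); neither comes from the attached script or from the bindings themselves.

```python
# Illustrative only: "raw" is a made-up value standing in for the bytes of
# an ISO-8859-1 encoded shapefile field. It is not taken from CrabAdr.shp.
raw = b"caf\xe9 cr\xe8me"


def decode_field(raw_bytes, encoding):
    """Hypothetical helper: decode raw field bytes with the given encoding."""
    return raw_bytes.decode(encoding)


# Decoding as UTF-8 fails, because 0xe9 starts a multi-byte UTF-8 sequence
# and the byte that follows it is not a valid continuation byte.
try:
    decode_field(raw, "utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    failing_position = exc.start  # index of the offending byte

# Decoding with the encoding the data was actually written in succeeds.
latin1_text = decode_field(raw, "iso-8859-1")
print(utf8_failed, latin1_text)
```

In Python 2 the equivalent read simply hands back the byte string, which is why the same script does not fail there.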
Attachments (1)
Change History (5)
by , 9 years ago
Attachment: | ogr-test.py added |
---|
comment:1 by , 9 years ago
If the error is caused by a non-UTF-8 attribute, you should be able to reproduce it with a shapefile containing just one feature, which you could attach to the ticket. That would be much better in case it takes some time before the issue is resolved, because external links tend to break over time.
comment:2 by , 9 years ago
I've tried to make a smaller test input file but don't know how. If I create a smaller one using ogr2ogr, the new shapefile no longer triggers OLCStringsAsUTF8=False, the necessary condition for the error. (Perhaps ogr2ogr re-encodes to UTF-8?) I have no idea what software produced the original shapefile.
The URLs I've referenced above are on my own server and should work for months or years. I'm also glad to upload the files anywhere someone asks.
This is the ogr2ogr command I tried in order to prepare a smaller file with the error case:
ogr2ogr -where "STRAATNMID = 29320" short.shp CrabAdr.shp
comment:3 by , 9 years ago
Milestone: | → 2.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
trunk r28366 "Make GetFieldAsBinary() work with OFTString fields; For Python3 compat, make Feature.GetField() use GetFieldAsBinary() if GetFieldAsString() fails (#5811)"
test code
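The fallback described in the r28366 commit message can be sketched in plain Python. This is an illustration of the pattern only, not the actual binding code: `FakeFeature`, its methods, and `field_text` are hypothetical stand-ins, and the assumption that a caller would decode the returned bytes as ISO-8859-1 is mine.

```python
class FakeFeature:
    """Hypothetical stand-in for an OGR feature; not the osgeo API."""

    def __init__(self, raw_values):
        self._raw = raw_values  # list of raw field values as bytes

    def get_field_as_binary(self, index):
        # Return the field bytes untouched.
        return self._raw[index]

    def get_field_as_string(self, index):
        # Mimic bindings that always decode field bytes as UTF-8.
        return self._raw[index].decode("utf-8")

    def get_field(self, index):
        # Try the string accessor first; if decoding fails, fall back to
        # the binary accessor, as the commit message describes.
        try:
            return self.get_field_as_string(index)
        except UnicodeDecodeError:
            return self.get_field_as_binary(index)


def field_text(feature, index, fallback_encoding="iso-8859-1"):
    """Caller-side helper (assumed): decode bytes with a known encoding."""
    value = feature.get_field(index)
    if isinstance(value, bytes):
        return value.decode(fallback_encoding)
    return value


feature = FakeFeature([b"plain", b"caf\xe9 noir"])
print(field_text(feature, 0), field_text(feature, 1))
```

With a fallback like this, the return type depends on the data: fields that decode cleanly come back as `str`, while fields that do not come back as `bytes`, so the caller has to handle both.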