Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#5811 closed defect (fixed)

Py3 Unicode problem

Reported by: nelsonminar
Owned by: hobu
Priority: normal
Milestone: 2.0.0
Component: PythonBindings
Version: 1.11.0
Severity: normal
Keywords:
Cc:

Description

The OGR bindings in Python 3 throw a Unicode exception when reading fields from a non-UTF-8 shapefile. I believe this only occurs when the layer capability OLCStringsAsUTF8 is false. My specific case is a shapefile that appears to be in ISO-8859-1; when I read a non-ASCII field I get an exception. The same code works in Python 2, where I end up with a byte string with ISO-8859-1 characters in it.

$ python3 ogr-test.py
Traceback (most recent call last):
  File "ogr-test.py", line 17, in <module>
    field = in_feature.GetField(i)
  File "/usr/lib/python3/dist-packages/osgeo/ogr.py", line 3033, in GetField
    return self.GetFieldAsString(fld_index)
  File "/usr/lib/python3/dist-packages/osgeo/ogr.py", line 2362, in GetFieldAsString
    return _ogr.Feature_GetFieldAsString(self, *args)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

There is sample code and a datafile in http://www.somebits.com/~nelson/ogr-py3-bugreport-nelson/

The whole package is 135 MB (it contains a large shapefile), so anyone looking at this may want to start by reading my code. It is attached, and also available at http://www.somebits.com/~nelson/ogr-py3-bugreport-nelson/unzipped/ogr-test.py

My guess is that the Python 3 bindings assume the input is always UTF-8 when converting to Unicode strings. I'm not aware of any way to override that choice; setting SHAPE_ENCODING=iso-8859-1 does not help.
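The decoding failure itself can be reproduced without GDAL. The following is a minimal sketch, assuming a made-up field value (`raw` is hypothetical, not taken from the shapefile) in which the ISO-8859-1 byte 0xe9 sits at position 3, as in the traceback above. It only illustrates why a strict UTF-8 decode rejects such bytes; it does not call the OGR bindings.

```python
# Hypothetical ISO-8859-1 encoded field value; 0xe9 is "é" in that encoding.
raw = b"Caf\xe9 Noir"

# A strict UTF-8 decode fails: 0xe9 announces a multi-byte sequence,
# but the following byte (a space) is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

print(utf8_error)

# Decoding with the encoding the data actually uses succeeds.
text = raw.decode("iso-8859-1")
print(text)
```

In Python 2 the bindings returned the bytes untouched, which is why the same script did not fail there.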

Attachments (1)

ogr-test.py (660 bytes) - added by nelsonminar 7 years ago.
test code

Change History (5)

Changed 7 years ago by nelsonminar

Attachment: ogr-test.py added

test code

comment:1 Changed 7 years ago by Jukka Rahkonen

If the error is caused by a non-UTF-8 attribute, you should be able to reproduce it with a shapefile containing just one feature, which you could attach to the ticket. That would be much better in case the issue takes some time to resolve, because external links tend to break over time.

comment:2 Changed 7 years ago by nelsonminar

I've tried to make a smaller test input file but don't know how. If I create a smaller one using ogr2ogr, the new shapefile no longer triggers OLCStringsAsUTF8=False, the necessary condition for the error. (Perhaps ogr2ogr re-encodes to UTF-8?) I have no idea what software produced the original shapefile.

The URLs I've referenced above are on my own server and should work for months or years. I'm also glad to upload the files anywhere someone asks.

This is the ogr2ogr command I tried in order to prepare a smaller file with the error case:

  ogr2ogr -where "STRAATNMID = 29320" short.shp CrabAdr.shp

comment:3 Changed 7 years ago by Even Rouault

Milestone: 2.0
Resolution: fixed
Status: new → closed

trunk r28366 "Make GetFieldAsBinary() work with OFTString fields; For Python3 compat, make Feature.GetField() use GetFieldAsBinary() if GetFieldAsString() fails (#5811)"
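The fallback described in that commit message can be sketched roughly as follows. This is not the code from r28366: `FakeFeature` is a hypothetical stand-in for `ogr.Feature` so that the example runs without GDAL, the `bytes` return type of its `GetFieldAsBinary()` is an assumption, and `field_as_text` is an illustrative helper for callers who know the real encoding.

```python
class FakeFeature:
    """Hypothetical stand-in for ogr.Feature that stores raw field bytes."""

    def __init__(self, raw_fields):
        self._raw = list(raw_fields)

    def GetFieldAsString(self, i):
        # Mimics the strict UTF-8 conversion that raised in the traceback.
        return self._raw[i].decode("utf-8")

    def GetFieldAsBinary(self, i):
        # Assumption: the raw content comes back as bytes.
        return bytes(self._raw[i])

    def GetField(self, i):
        # Fallback pattern from the commit message: try the string
        # accessor first, use the binary accessor if decoding fails.
        try:
            return self.GetFieldAsString(i)
        except UnicodeDecodeError:
            return self.GetFieldAsBinary(i)


def field_as_text(feature, i, encoding="iso-8859-1"):
    """Illustrative helper: decode a binary fallback with a known encoding."""
    value = feature.GetField(i)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


feature = FakeFeature([b"plain", b"Caf\xe9 Noir"])
print(feature.GetField(0))       # str, decoded as UTF-8
print(feature.GetField(1))       # bytes, because the UTF-8 decode failed
print(field_as_text(feature, 1)) # str, decoded as ISO-8859-1
```

With this pattern a caller may receive either text or raw bytes from `GetField()`, so code that handles non-UTF-8 shapefiles has to check the type and decode explicitly.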

comment:4 Changed 6 years ago by Even Rouault

Milestone: 2.0 → 2.0.0

Milestone renamed
