Opened 7 years ago

Last modified 4 years ago

#3220 new defect

WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values

Reported by: hellik Owned by: grass-dev@…
Priority: normal Milestone: 7.2.4
Component: Default Version: svn-releasebranch72
Keywords: Cc:
CPU: Unspecified Platform: MSWindows 8

Description

taken from the user ML:

https://lists.osgeo.org/pipermail/grass-user/2016-December/075682.html

I've got shape files with Swedish accented letters (ÄÖÅ) in some of the attribute
values. The attributes are shown as they should be in the GUI. SQL statements, however,
do not recognize them. They're also messed up in the command output if other
(non-accented) values are queried.

I first set GRASS_DB_ENCODING to cp1252 and it didn't work. Then I converted the
dbf file to utf-8 and set that as the value of the variable, to no avail. I also
tried using the 'encoding' parameter in v.in.ogr in both cases; that didn't work either.

I tried it on windows 8.1 and windows 10. The same is happening in both, stable GRASS
7.0.5 and GRASS 7.2.0RC1.

The problem only happens on Windows. Fedora and Mac OS X don't have this issue
with the same shape files.

https://lists.osgeo.org/pipermail/grass-user/2016-December/075688.html

confirmed with

GRASS version: 7.3.svn                                                          
GRASS SVN revision: r70001                                                      
Build date: 2016-12-06                                                          
Build platform: x86_64-w64-mingw32                                              
GDAL: 2.1.2                                                                     
PROJ.4: 4.9.3                                                                   
GEOS: 3.5.0                                                                     
SQLite: 3.14.1                                                                  
Python: 2.7.5                                                                   
wxPython: 2.8.12.1                                                              
Platform: Windows-8-6.2.9200 (OSGeo4W) 
and a test vector with following attributes

v.db.select map=test_points@data file=D:\temp\test_point.txt

cat|id|names
1|1|ÄÖÅ
2||Æ
3||Ø
4||Å,å,Æ,æ,Ø,ø
5||ø, Ø
6||Þ
7||Ð
8||Å
9||æ
d.vect map=test_points2@data where="names = 'Å,å,Æ,æ,Ø,ø'" width=1 icon=basic/point size=10

doesn't show the selected point in the map display.
v.report map=test_points@data option=coor
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Ã…|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0

Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
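
For readers interpreting the output above: the garbled value in row 8 and the traceback are both consistent with UTF-8 bytes being treated as cp1252 or ASCII somewhere along the way. The following is a minimal Python 3 sketch, not the GRASS code. `mojibake` and `py2_style_encode` are hypothetical helpers; the second only imitates what Python 2.7 does when `.encode()` is called on a byte string, namely an implicit ASCII decode before the requested encode.

```python
def mojibake(text, actual="utf-8", assumed="cp1252"):
    """Encode text with its real encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def py2_style_encode(raw, target="cp1252"):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode."""
    return raw.decode("ascii").encode(target)


# "Å" is the two bytes C3 85 in UTF-8; read as cp1252 they become two characters.
garbled = mojibake("Å")
print(garbled)

# Three ASCII characters followed by a UTF-8 encoded "Å": the implicit
# ASCII decode fails on the first non-ASCII byte.
error = None
try:
    py2_style_encode("abcÅ".encode("utf-8"))
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The first print gives the same two-character sequence that appears in row 8, and the second reproduces the wording of the `UnicodeDecodeError` in the traceback. This only illustrates the mechanism; it does not show where in GRASS the wrong assumption is made.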

Attachments (2)

test_points_encoding_errors.zip (1.5 KB ) - added by hellik 7 years ago.
zipped shapefile in wgs84 for testing
qgis_shapefile_cp1252.zip (1.3 KB ) - added by hellik 7 years ago.
qgis generated cp1252 example shapefile

Change History (16)

comment:1 by martinl, 7 years ago

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?

in reply to:  1 comment:2 by hellik, 7 years ago

Replying to martinl:

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?

v.import encoding=cp1252 input=D:\temp\test_points.shp layer=test_points output=testimportcp1252
WARNING: All available OGR layers will be imported into vector map <test_points>
Check if OGR layer <test_points> contains polygons...
Importing 9 features (OGR layer <test_points>)...
-----------------------------------------------------
Building topology for vector map <testimportcp1252@data2>...
Registering primitives...
9 primitives registered
9 vertices registered
Building areas...
0 areas built
0 isles built
Attaching islands...
Attaching centroids...
Number of nodes: 0
Number of primitives: 9
Number of points: 9
Number of lines: 0
Number of boundaries: 0
Number of centroids: 0
Number of areas: 0
Number of isles: 0
Input <D:\temp\test_points.shp> successfully imported without reprojection
v.report map=testimportcp1252@data2 option=coor                                 
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Ã…|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0

Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

it doesn't help.

by hellik, 7 years ago

Attachment: test_points_encoding_errors.zip added

zipped shapefile in wgs84 for testing

in reply to:  1 comment:3 by razz, 7 years ago

Replying to martinl:

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?


encoding=cp1252 in v.in.ogr did not help. The values also come out garbled with v.db.select, though without any error output:

v.db.select map=test_points                                                          
cat|id|names
1|1|ÄÖÅ
2||Æ
3||Ø
4||Å,å,Æ,æ,Ø,ø
5||ø, Ø
6||Þ
7||Ð
8||Ã…
9||æ
(Wed Dec 07 15:32:51 2016) Command finished (0 sec)

I used the stand-alone installer for both GRASS 7.0.5 and GRASS 7.2.0svn, if it matters.

in reply to:  1 ; comment:4 by mlennert, 7 years ago

Replying to martinl:

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?

Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.
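
One rough way to check such a feeling is to test whether the raw attribute bytes decode as strict UTF-8, since accented text written as cp1252 rarely forms valid UTF-8 sequences. A minimal Python sketch, assuming the bytes have already been read from the .dbf in binary mode; `guess_encoding` is a hypothetical helper and only a heuristic, not something GRASS or OGR provides.

```python
def guess_encoding(raw):
    """Classify a byte string as 'ascii', 'utf-8', or 'not utf-8'."""
    try:
        raw.decode("ascii")
        return "ascii"  # no accented characters at all, nothing to decide
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")  # strict decoding rejects malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"  # possibly cp1252 or another single-byte code page


# The same three letters, written with two different encodings.
as_utf8 = guess_encoding("ÄÖÅ".encode("utf-8"))
as_cp1252 = guess_encoding("ÄÖÅ".encode("cp1252"))
print(as_utf8, as_cp1252)
```

A "not utf-8" result does not prove cp1252; it only rules out UTF-8, so the single-byte code page still has to be identified by other means.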

in reply to:  4 comment:5 by hellik, 7 years ago

Replying to mlennert:

Replying to martinl:

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?

Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.

Tried it with UTF-8 as well; it fails here too.

by hellik, 7 years ago

Attachment: qgis_shapefile_cp1252.zip added

qgis generated cp1252 example shapefile

in reply to:  4 comment:6 by hellik, 7 years ago

Replying to mlennert:

Replying to martinl:

Import (v.import/v.in.ogr) with encoding=cp1252 will not help?

Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.

I have now added a QGIS-generated (hopefully) cp1252 example shapefile. This one also fails here on a self-compiled Linux GRASS trunk.

comment:7 by marisn, 7 years ago

A comment without looking into actual code. It is necessary to provide clear info on reproducing the issue. Crucial info is:

  • Windows locale (will influence assumed encoding);
  • The mechanism of executing example command (CMD.exe will have different encoding than other places. Think ANSI vs OEM).

Some related reading:

  • https://trac.osgeo.org/grass/ticket/2525#comment:1
  • https://trac.osgeo.org/grass/ticket/2120#comment:10
  • http://stackoverflow.com/a/17177904
  • https://bugs.python.org/issue6135
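
The ANSI vs OEM point can be made concrete by looking at the bytes a single character maps to under different code pages. A minimal Python sketch; the selection of code pages (OEM 850 and 865, ANSI 1252, plus UTF-8) is an assumption about what a Swedish Windows setup might use, and `byte_table` is a hypothetical helper.

```python
CODE_PAGES = ["cp850", "cp865", "cp1252", "utf-8"]


def byte_table(char, code_pages=CODE_PAGES):
    """Map each code page name to the hex bytes it uses for char."""
    return {cp: char.encode(cp).hex() for cp in code_pages}


table = byte_table("Å")
for code_page, hex_bytes in table.items():
    print(code_page, hex_bytes)
```

The two OEM code pages agree with each other for this character but differ from the ANSI code page, and UTF-8 needs two bytes, so a value written under one assumption and read under another will not compare equal in a where clause.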

in reply to:  7 comment:8 by razz, 7 years ago

Replying to marisn:

... Crucial info is:

  • Windows locale (will influence assumed encoding);
  • The mechanism of executing example command (CMD.exe will have different encoding than other places. Think ANSI vs OEM).


Here is what I've got: Windows locale

systeminfo
System Locale:             sv;Svenska
Input Locale:              sv;Svenska

Originally, I've got

chcp
850

But since it's not working, I tried using

chcp 1252

and the Nordic OEM:

chcp 865

before importing, in the cmd and from within GRASS in the command console. Nothing really changed. The rest of the reading I did was way over my head, sorry. But here's a link to the original shape file I'm having issues with. I can't attach it here because it's a bit over 2MB and I'm afraid that taking a sample from it and exporting it might change the encoding on export: https://www.dropbox.com/s/2ptgaf5owco63f0/stockholm.zip?dl=0 (the link should be valid for a month).

comment:9 by neteler, 7 years ago

Milestone: 7.2.0 → 7.2.1

Ticket retargeted after milestone closed

comment:10 by martinl, 7 years ago

Milestone: 7.2.1 → 7.2.2

comment:11 by neteler, 7 years ago

Milestone: 7.2.2 → 7.2.3

Ticket retargeted after milestone closed

comment:12 by martinl, 6 years ago

Milestone: 7.2.3

Ticket retargeted after milestone closed

comment:13 by martinl, 6 years ago

Milestone: 7.2.4

comment:14 by hellik, 4 years ago

see also #3925
