Opened 11 months ago

Last modified 6 months ago

#3925 new defect

winGRASS 7.8.1dev: 'charmap' codec can't decode byte 0x9d - issue in vector attribute data handling (e.g. opening attribute table, v.report, etc)

Reported by: hellik
Owned by: grass-dev@…
Priority: major
Milestone: 7.8.3
Component: Vector
Version: git-releasebranch78
Keywords: python3, py3, wingrass
Cc:
CPU: x86-64
Platform: MSWindows

Description

tested with

GRASS Version: 7.8.1dev                                                         
Code revision: d1c4ad132                                                        
Build date: 2019-10-22                                                          
Build platform: x86_64-w64-mingw32                                              
GDAL: 2.4.1                                                                     
PROJ: 5.2.0                                                                     
GEOS: 3.8.0                                                                     
SQLite: 3.29.0                                                                  
Python: 3.7.0                                                                   
wxPython: 4.0.3                                                                 
Platform: Windows-10-10.0.18362-SP0 (OSGeo4W) 

Downloaded data from geonames.org and imported it with v.in.geonames:

v.report map=at_out@data option=coor                                            
Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass78/scripts/v.report.py", line 226, in <module>
    main()
  File "C:\OSGEO4~1\apps\grass\grass78/scripts/v.report.py", line 108, in main
    cols = decode(line).rstrip('\r\n').split('|')
  File "C:\OSGEO4~1\apps\grass\grass78\etc\python\grass\script\utils.py", line 193, in decode
    return bytes_.decode(enc)
  File "C:\OSGEO4~1\apps\Python37\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 195: character maps to <undefined>

Change History (14)

comment:1 Changed 11 months ago by hellik

Priority: major → blocker
Summary: v.report - UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d → winGRASS 7.8.1dev: 'charmap' codec can't decode byte 0x9d - issue in vector attribute data handling (e.g. opening attribute table, v.report, etc)

comment:2 Changed 11 months ago by hellik

Now tested on a Windows 10 box with a German locale:

GRASS Version: 7.8.1dev                                                         
Code revision: f5bfe545c                                                        
Build date: 2019-10-27                                                          
Build platform: x86_64-w64-mingw32                                              
GDAL: 2.4.1                                                                     
PROJ: 5.2.0                                                                     
GEOS: 3.8.0                                                                     
SQLite: 3.29.0                                                                  
Python: 3.7.0                                                                   
wxPython: 4.0.3                                                                 
Platform: Windows-10-10.0.18362-SP0 (OSGeo4W) 

Downloaded the geonames AT dump and imported it:

v.in.geonames input=D:\temp\geonames\AT\AT.txt output=at_geonames               
Converting 51999 place names...
Scanning input for column types...
Number of columns: 19
Number of rows: 51999
WARNING: Column number 13 <admin3code> defined as string has only integer values
WARNING: Column number 14 <admin4code> defined as string has only integer values
Importing points...
Populating table...
Building topology for vector map <at_geonames@data>...
Registering primitives...
GRASS_INFO_PROGRESS: 10000
GRASS_INFO_PROGRESS: 20000
GRASS_INFO_PROGRESS: 30000
GRASS_INFO_PROGRESS: 40000
GRASS_INFO_PROGRESS: 50000

Clicking on a point whose name contains a German umlaut:

east, north: 11.31700474110463, 47.23295282435024
at_geonames@data: 
  Type: Point
  Id: 10136
  Layer: 1
  Category: 10136
  Driver: sqlite
  Database: D:\grassdata\loc_test_vingeonames\data\sqlite\sqlite.db
  Table: at_geonames
  Key_column: cat
  Attributes: 
    cat: 10136
    geonameid: 2762446
    name: Vellenberg
    asciiname: Vellenberg
    latitude: 47.23333
    longitude: 11.31667
    featureclass: S
    featurecode: FRM
    countrycode: AT
    admin1code: 07
    admin2code: 703
    admin3code: 70312
    population: 0
    gtopo30: 865
    timezone: Europe/Vienna
    modification: 2014-05-03
at_geonames@data: 
  Type: Point
  Id: 25781
  Layer: 1
  Category: 25781
  Driver: sqlite
  Database: D:\grassdata\loc_test_vingeonames\data\sqlite\sqlite.db
  Table: at_geonames
  Key_column: cat
  Attributes: 
    cat: 25781
    geonameid: 2778215
    name: Götznerberg
    asciiname: Goetznerberg
    latitude: 47.23333
    longitude: 11.31667
    featureclass: S
    featurecode: FRM
    countrycode: AT
    admin1code: 07
    admin2code: 703
    admin3code: 70312
    population: 0
    gtopo30: 865
    timezone: Europe/Vienna
    modification: 2014-05-03

Trying v.report:

v.report map=at_geonames@data option=coor                                       
Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass78/scripts/v.report.py", line 226, in <module>
    main()
  File "C:\OSGEO4~1\apps\grass\grass78/scripts/v.report.py", line 108, in main
    cols = decode(line).rstrip('\r\n').split('|')
  File "C:\OSGEO4~1\apps\grass\grass78\etc\python\grass\script\utils.py", line 193, in decode
    return bytes_.decode(enc)
  File "C:\OSGEO4~1\apps\Python37\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 195: character maps to <undefined>

Or trying to open the vector attribute table:

Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\lmgr\frame.py", line 2060, in OnShowAttributeTable
    selection=selection)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\manager.py", line 112, in __init__
    self.CreateDbMgrPage(parent=self, pageName='browse')
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\base.py", line 811, in CreateDbMgrPage
    parent=parent, parentDbMgrBase=self, onlyLayer=onlyLayer)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\base.py", line 1095, in __init__
    self.AddLayer(layer)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\base.py", line 1138, in AddLayer
    self.dbMgrData, layer, self.pages)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\base.py", line 113, in __init__
    keyColumn = self.LoadData(layer)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\dbmgr\base.py", line 278, in LoadData
    record = decode(outFile.readline().strip()).replace('\n', '')
  File "C:\OSGEO4~1\apps\grass\grass78\etc\python\grass\script\utils.py", line 193, in decode
    return bytes_.decode(enc)
  File "C:\OSGEO4~1\apps\Python37\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 219: character maps to <undefined>

Opening the table freezes the attribute table window.

It seems to be an encoding issue in the attribute data handling:

'charmap' codec can't decode byte 0x9d
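
For reference, the error can be reproduced outside GRASS. The sketch below assumes the attribute bytes are UTF-8 and are being decoded as cp1252, as the tracebacks suggest. The sample strings are illustrative only and are not taken from the AT dump, and `try_decode` is a hypothetical helper, not part of GRASS.

```python
def try_decode(data, encoding):
    """Return the decoded text, or a short error note if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return "error: byte 0x{:02x} at position {}: {}".format(
            data[err.start], err.start, err.reason)


# U+201D (right double quotation mark) is b'\xe2\x80\x9d' in UTF-8.
# Byte 0x9d has no mapping in Python's cp1252 table, so decoding fails.
quote = "\u201d".encode("utf-8")
cp1252_quote = try_decode(quote, "cp1252")
utf8_quote = try_decode(quote, "utf-8")

# "ß" is b'\xc3\x9f' in UTF-8. Both bytes are defined in cp1252, so the
# decode succeeds but silently yields the wrong characters.
sharp_s = "Weiße".encode("utf-8")
cp1252_sharp_s = try_decode(sharp_s, "cp1252")

print(cp1252_quote)
print(repr(utf8_quote))
print(repr(cp1252_sharp_s))
```

Only a few byte values are undefined in cp1252, so most UTF-8 text decodes without an exception and comes out garbled instead. That could explain why the failure only shows up at specific positions in the output.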

comment:3 Changed 11 months ago by hellik

Possibly related: #3220 WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values

comment:4 Changed 11 months ago by hellik

v.db.select output of the geonames data:

v.db.select map=at_geonames@data                                                
cat|geonameid|name|asciiname|alternatename|latitude|longitude|featureclass|featurecode|countrycode|cc2|admin1code|admin2code|admin3code|admin4code|population|elevation|gtopo30|timezone|modification
1|2598245|Sandgatterl|Sandgatterl||47.75|14.56667|T|PASS|AT||04|415|41522||0||1490|Europe/Vienna|2014-05-02
2|2598246|Viehtalalm|Viehtalalm||47.75|14.56667|L|GRAZ|AT||04||||0||1490|Europe/Vienna|1999-04-30
3|2598247|Adlmoarstein|Adlmoarstein||47.75|14.55|T|CLF|AT||04||||0||1023|Europe/Vienna|1999-04-30
4|2598248|Waldbaueralm|Waldbaueralm||47.75|14.56667|L|GRAZ|AT||04||||0||1490|Europe/Vienna|1999-04-30
5|2598249|Federeck|Federeck||47.75|14.56667|T|PK|AT||04|415|41522||0||1490|Europe/Vienna|2014-05-02
6|2598250|Mooshöhe|Mooshoehe||47.75|14.55|P|PPL|AT||04|415|41522||0||1023|Europe/Vienna|2014-05-02
7|2598251|Antonihütte|Antonihuette||47.75|14.53333|S|HUT|AT||04|415|41522||0||866|Europe/Vienna|2014-05-02
8|2598252|Bergeralm|Bergeralm||47.75|14.51667|L|GRAZ|AT||04||||0||780|Europe/Vienna|1999-04-30
9|2598253|Blabergalm|Blabergalm||47.75|14.5|L|GRAZ|AT||04||||0||885|Europe/Vienna|1999-04-30
10|2598254|Nattereck|Nattereck||47.75|14.48333|T|PK|AT||04|409|40914||0||721|Europe/Vienna|2014-05-02
11|2598255|Langeck|Langeck||47.75|14.48333|T|PK|AT||04|409|40914||0||721|Europe/Vienna|2014-05-02
12|2598256|Zorngraben|Zorngraben||47.75|14.46667|H|STMI|AT||04||||0||951|Europe/Vienna|1999-04-30
13|2598257|Gugler|Gugler||47.75|14.45|T|PK|AT||04|409|40914||0||1019|Europe/Vienna|2014-05-02
14|2598258|Zorngrabenklause|Zorngrabenklause||47.75|14.45|T|SLP|AT||04||||0||1019|Europe/Vienna|1999-04-30
15|2598259|Sitzenbacher Klause|Sitzenbacher Klause||47.75|14.45|T|PK|AT||04|409|40914||0||1019|Europe/Vienna|2014-05-02
16|2598260|Sitzenbachhütte|Sitzenbachhuette||47.75|14.45|S|HUT|AT||04|409|40914||0||1019|Europe/Vienna|2014-05-02
17|2598261|Deckleitnerbach|Deckleitnerbach||47.75|14.45|H|STM|AT||04||||0||1019|Europe/Vienna|1999-04-30
18|2598262|Hundseck|Hundseck||47.75|14.41667|T|PK|AT||04|409|40914||0||1120|Europe/Vienna|2014-05-02
19|2598263|Schafgraben|Schafgraben||47.75|14.4|H|STMI|AT||04||||0||1081|Europe/Vienna|1999-04-30
20|2598264|Maierreut|Maierreut||47.75|14.4|L|GRAZ|AT||04||||0||1081|Europe/Vienna|1999-04-30
21|2598265|Rumpelmayrreut|Rumpelmayrreut||47.75|14.38333|L|GRAZ|AT||04||||0||1094|Europe/Vienna|1999-04-30
22|2598266|Bloßboden|Blossboden||47.75|14.36667|T|SLP|AT||04||||0||1439|Europe/Vienna|1999-04-30
23|2598267|Weiße Ries|Weisse Ries||47.75|14.36667|T|CLF|AT||04||||0||1439|Europe/Vienna|1999-04-30

v.db.select seems to work, but there are some encoding issues there as well, e.g. Weiße Ries|Weisse Ries

comment:5 in reply to:  4 ; Changed 11 months ago by hellik

Replying to hellik:

v.db.select seems to work, but there are some encoding issues there as well, e.g. Weiße Ries|Weisse Ries

Starting v.db.select pops up the same encoding error:

Exception in thread Thread-20:
Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\Python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\core\gconsole.py", line 162, in run
    self.resultQ.put((requestId, self.requestCmd.run()))
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\core\gcmd.py", line 606, in run
    self._redirect_stream()
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\core\gcmd.py", line 631, in _redirect_stream
    line = recv_some(self.module, e=0, stderr=0)
  File "C:\OSGEO4~1\apps\grass\grass78\gui\wxpython\core\gcmd.py", line 335, in recv_some
    y.append(decode(r))
  File "C:\OSGEO4~1\apps\grass\grass78\etc\python\grass\script\utils.py", line 193, in decode
    return bytes_.decode(enc)
  File "C:\OSGEO4~1\apps\Python37\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 840: character maps to <undefined>

comment:6 Changed 11 months ago by annakrat

This is a larger problem. The geonames data are encoded in UTF-8, but the decode function uses the local encoding. There is GRASS_DB_ENCODING, which should in theory help, but it is not used in v.report and, more generally, it is not tied to individual tables and is not user-friendly. One practical approach, which would not solve this in all cases but perhaps in the majority, is a new decode function for attribute data that first tries GRASS_DB_ENCODING if specified, then tries the local encoding, and if that does not work, uses UTF-8.
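
A minimal sketch of such a fallback chain, for discussion only. `decode_attribute` and its `env` and `local_encoding` parameters are hypothetical names, not existing GRASS API. The final lossy fallback is an extra assumption, added so the function never raises.

```python
import locale
import os


def decode_attribute(bytes_, env=None, local_encoding=None):
    """Hypothetical helper: decode attribute bytes with a fallback chain.

    Order: GRASS_DB_ENCODING (if set), then the local encoding, then UTF-8.
    """
    env = os.environ if env is None else env
    if local_encoding is None:
        local_encoding = locale.getpreferredencoding(False)

    candidates = []
    db_encoding = env.get("GRASS_DB_ENCODING")
    if db_encoding:
        candidates.append(db_encoding)
    candidates.extend([local_encoding, "utf-8"])

    for enc in candidates:
        try:
            return bytes_.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Undecodable bytes or an unknown encoding name: try the next one.
            continue

    # Assumption: prefer lossy text over an exception as a last resort.
    return bytes_.decode("utf-8", errors="replace")


# Bytes that are invalid in cp1252 fall through to UTF-8.
print(repr(decode_attribute(b"\xe2\x80\x9d", env={}, local_encoding="cp1252")))
```

One caveat with this order: cp1252 accepts almost every byte, so UTF-8 data that happens to avoid the few undefined bytes would decode "successfully" into garbled text before UTF-8 is ever tried. This is one of the cases the approach would not cover.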

comment:7 in reply to:  6 ; Changed 11 months ago by neteler

Replying to annakrat:

One practical way, which wouldn't solve this for all cases, but perhaps majority is to have a new decode function for attribute data, which would try first GRASS_DB_ENCODING if specified, then try decoding with local encoding and if that doesn't work, use utf8.

The conditional use of GRASS_DB_ENCODING here might serve as inspiration in this regard:

https://github.com/OSGeo/grass/blob/b5b00f972917e38a6a912f7c385ddbf791889e70/gui/wxpython/dbmgr/base.py#L971

comment:8 in reply to:  7 Changed 11 months ago by annakrat

Replying to neteler:

Conditional GRASS_DB_ENCODING which might be used for inspiration in this regards:

https://github.com/OSGeo/grass/blob/b5b00f972917e38a6a912f7c385ddbf791889e70/gui/wxpython/dbmgr/base.py#L971

Based on this, if you don't have a DB encoding specified (in the GUI preferences or through the environment variable), it uses utf-8. That's fine on systems with utf-8, but on Windows? Should it use the local encoding instead? Since we need to work with Python 3 and unicode strings, the "garbage in, garbage out" approach doesn't work anymore, and at the same time we don't know the encoding of the attributes.
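
To make the question concrete, here is a rough sketch of the two possible defaults. `default_db_encoding` and its parameters are hypothetical and are not the code at the linked line. The GUI preference lookup is left out, and only the environment variable is consulted.

```python
import locale
import os
import sys


def default_db_encoding(env=None, prefer_locale_on_windows=False, platform=None):
    """Hypothetical helper: pick the encoding used to decode attribute data."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    # An explicit setting always wins.
    explicit = env.get("GRASS_DB_ENCODING")
    if explicit:
        return explicit

    # Open question from this comment: local encoding on Windows, or UTF-8?
    if prefer_locale_on_windows and platform == "win32":
        return locale.getpreferredencoding(False)
    return "utf-8"


print(default_db_encoding(env={}))
print(default_db_encoding(env={"GRASS_DB_ENCODING": "cp1252"}))
```

Neither default is right for every table, because the encoding is a property of the data and not of the system.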

comment:9 Changed 11 months ago by neteler

Milestone: 7.8.1 → 7.8.2

Ticket retargeted after milestone closed

comment:10 Changed 9 months ago by neteler

Milestone: 7.8.2

Ticket retargeted after milestone closed

comment:11 Changed 9 months ago by neteler

Milestone: 7.8.3

comment:12 in reply to:  5 Changed 7 months ago by mmetz

Last edited 7 months ago by mmetz

comment:13 Changed 6 months ago by martinl

Is it really a blocker?

comment:14 Changed 6 months ago by annakrat

Priority: blocker → major

There have been some fixes in the GUI, which could help a little bit. The general problem persists, but I don't think it's a blocker.
