Opened 13 years ago

Last modified 8 years ago

#1293 new defect

Creating a mapset with non-Latin letters gives a Python ASCII error

Reported by: marisn Owned by: grass-dev@…
Priority: normal Milestone: 6.4.6
Component: wxGUI Version: svn-trunk
Keywords: Cc: martinl, torsti
CPU: Unspecified Platform: Unspecified

Description

Start the GRASS wxGUI. In the startup screen, create a new mapset called "āšņļ". Result: "Unable to create mapset: 'ascii' codec can't decode ..." (I cannot copy/paste the full text). Despite the error, the new mapset is created, and I can select it and start a GRASS session.

WinGRASS-6.4.1RC1-1 on Windows Vista

C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:623: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if mapset not in self.listOfMapsetsSelectable or \
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:725: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  self.lbmapsets.SetSelection(self.listOfMapsets.index(mapset))
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:680: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if event.GetText() not in self.listOfMapsetsSelectable:
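These warnings come from Python 2 comparing a byte string with a unicode string: the implicit ASCII decode of the byte string fails, so the two are treated as unequal and a test like `mapset not in self.listOfMapsetsSelectable` is true even though the mapset exists. Python 3 no longer attempts that conversion at all. A minimal Python 3 sketch of the same mismatch (the list contents are made up for illustration):

```python
# Python 3 analogue of the UnicodeWarning above: bytes and str never compare equal.
name_text = "āšņļ"                       # mapset name as a GUI widget returns it (text)
name_bytes = name_text.encode("utf-8")  # same name as read from disk or command output (bytes)

mapsets = [name_text, "PERMANENT"]       # hypothetical list of selectable mapsets

print(name_bytes in mapsets)                  # bytes vs str: never a match
print(name_bytes.decode("utf-8") in mapsets)  # matches once both sides are text
```

The fix in either Python version is the same idea: convert everything to one representation (text) at the boundary before comparing.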

Attachments (1)

frame.py.patch (664 bytes) - added by torsti 11 years ago.
patch to verify filename in OnCreateMapset


Change History (18)

comment:1 by marisn, 13 years ago

It seems that not only the startup screen fails with non-Latin mapset names. I have no idea where this error comes from; I got it in the wxGUI "Command console" while using a mapset named "šaursliežu dzelzceļš".

C:\Program Files\GRASS-64\etc\wxpython\gui_modules\goutput.py:748: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if mapName in mapLayers:

comment:2 by marisn, 13 years ago

Another place that fails: g.help

Traceback (most recent call last):
  File "c:/osgeo4w/usr/src/grass-6.4.1RC1/dist.i686-pc-mingw32/etc/wxpython/gui_modules/prompt.py", line 777, in OnItemSelected
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\menuform.py", line 2022, in ParseInterface
    tree = etree.fromstring(getInterfaceDescription(cmd[0]))
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\menuform.py", line 1975, in getInterfaceDescription
    "Details: %s") % (cmd, cmderr)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 30: ordinal not in range(128)

After that, things only go downhill. Starting a new GRASS wxGUI session greets me with a Python crash, and the startup screen is skipped:

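KŽ█DA:KARąU KOPA ┼Īaurslie┼Šu dzelzce─╝┼Ī nav atrasta
(Garbled Latvian console output; it should read "KĻŪDA: KARŠU KOPA šaursliežu dzelzceļš nav atrasta", i.e. "ERROR: MAPSET šaursliežu dzelzceļš not found".)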
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:623: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if mapset not in self.listOfMapsetsSelectable or \
Traceback (most recent call last):
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 858, in <module>
    GRASSStartUp = StartUp(0)
  File "C:\OSGeo4W\apps\Python25\lib\site-packages\wx-2.8-msw-unicode\wx\_core.py", line 7935, in __init__
  File "C:\OSGeo4W\apps\Python25\lib\site-packages\wx-2.8-msw-unicode\wx\_core.py", line 7509, in _BootstrapApp
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 829, in OnInit
    StartUp = GRASSStartup()
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 166, in __init__
    self._set_properties()
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 241, in _set_properties
    (utils.UnicodeString(mapset))
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\utils.py", line 669, in UnicodeString
    return unicode(string, enc)
  File "C:\Program Files\GRASS-64\Python25\lib\encodings\cp1257.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0xa1 in position 1: character maps to <undefined>
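The last error can be reproduced outside GRASS. Assuming (as the .grassrc6 contents below suggest) that the mapset name was stored as UTF-8 while `UnicodeString()` decoded it with the Windows Baltic code page cp1257: "š" is the two bytes 0xC5 0xA1 in UTF-8, 0xC5 is a valid cp1257 character, but 0xA1 has no mapping in cp1257. That matches "can't decode byte 0xa1 in position 1". A minimal sketch:

```python
# Assumption: the name on disk is UTF-8, but the GUI decodes it as cp1257.
raw = "šaursliežu dzelzceļš".encode("utf-8")   # starts with b'\xc5\xa1'

try:
    raw.decode("cp1257")
except UnicodeDecodeError as err:
    # err.start is the offending position; err.object holds the original bytes.
    print(err.encoding, hex(err.object[err.start]), "in position", err.start)
```

Decoding with the encoding the bytes were actually written in (here UTF-8) succeeds, which is why knowing the encoding at each boundary matters more than any particular choice of encoding.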

Contents of .grassrc6 ("šaursliežu dzelzceļš" is a mapset in location "spearfish60" and not "demo"):

GISDBASE: C:\Users\tests\Documents\GIS_DataBase
LOCATION_NAME: demolocation
MAPSET: šaursliežu dzelzceļš
GRASS_GUI: wxpython
MAPSET : PERMANENT

in reply to:  description ; comment:3 by glynn, 13 years ago

Replying to marisn:

Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.

in reply to:  3 ; comment:4 by martinl, 13 years ago

Replying to glynn:

Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.

How complicated would it be to allow 8-bit characters in the location/mapset/map names? BTW, on Linux, I am able to create a location and a mapset with 8-bit characters, start GRASS with the created mapset and display maps from this mapset.

comment:5 by martinl, 13 years ago

Cc: martinl added

in reply to:  3 comment:6 by marisn, 13 years ago

Replying to glynn:

Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.

URL to documentation? RFC?

in reply to:  4 comment:7 by glynn, 13 years ago

Replying to martinl:

How complicated would it be to allow 8-bit characters in the location/mapset/map names?

It's not particularly complicated, just a lot of work. We would have to audit the whole of GRASS for any issues, then fix them.

For code which simply passes strings around within GRASS, it isn't an issue. It becomes an issue when code tries to interpret strings or pass them outside of GRASS.

The main issue is that once you use characters outside of ASCII, encoding becomes an issue. Many functions (e.g. case conversions) are affected by the current locale. Also, if the map name is written to a file, some file formats require the use of specific encodings, so we would have to convert the text to that encoding to avoid creating invalid files. E.g. if a map name is in ISO-8859-* and it gets written directly to a file format which uses UTF-8, the result is an invalid file.

If you stick to ASCII, none of this matters. All commonly-used encodings are supersets of ASCII, so conversion between ASCII and such encodings is a no-op. Once you move outside of ASCII you have to perform conversions, which means knowing which encoding you're converting to/from.
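The superset argument and the ISO-8859-vs-UTF-8 example can be checked directly with Python's codecs. A small sketch (the choice of encodings is illustrative, not a list of what GRASS users actually run):

```python
ENCODINGS = ["ascii", "utf-8", "latin-1", "cp1257"]

def encode_or_none(text, enc):
    """Return text encoded in enc, or None if enc cannot represent it."""
    try:
        return text.encode(enc)
    except UnicodeEncodeError:
        return None

# ASCII-only name: identical bytes in every encoding, so conversion is a no-op.
print({enc: encode_or_none("PERMANENT", enc) for enc in ENCODINGS})

# Non-ASCII letter (Latvian a-macron): different bytes per encoding, or none at all.
print({enc: encode_or_none("ā", enc) for enc in ENCODINGS})

# Bytes written in ISO-8859-1 are not valid UTF-8 ("é" is the lone byte 0xE9).
try:
    "é".encode("latin-1").decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```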

Another significant issue is that it's unspecified whether "char" is signed or unsigned. Signed is more common, which can break code that assumes a char ranges from 0 to 255.

BTW, on Linux, I am able to create a location and a mapset with 8-bit characters, start GRASS with the created mapset and display maps from this mapset.

That indicates that G_legal_name() isn't being called when it should. This function tests for invalid characters:

	if (*s == '/' || *s == '"' || *s == '\'' || *s <= ' ' ||
	    *s == '@' || *s == ',' || *s == '=' || *s == '*' || *s > 0176) {
	    G_warning(_("Illegal filename <%s>. Character <%c> not allowed.\n"), name, *s);
	    return -1;
	}

This prohibits all 8-bit characters (due to either the "*s > 0176" or the "*s <= ' '", depending upon whether char is signed or unsigned).
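The claim that both branches together reject every 8-bit byte can be illustrated with a small Python port of the condition above. This is a sketch only: the real check is the C code quoted, the helper names are hypothetical, and it assumes the name reaches C as UTF-8 bytes.

```python
# Characters explicitly forbidden by the quoted C condition.
FORBIDDEN = {ord(c) for c in '/"\'@,=*'}

def rejected(code):
    """True if the C condition would reject a char whose value is `code`."""
    return code in FORBIDDEN or code <= ord(" ") or code > 0o176

def first_rejected(name, char_is_signed):
    """Return the value of the first rejected char, or None if all pass."""
    for byte in name.encode("utf-8"):
        # A signed char reads bytes 0x80..0xFF as -128..-1.
        code = byte - 256 if (char_is_signed and byte > 0x7F) else byte
        if rejected(code):
            return code
    return None

for signed in (True, False):
    print("signed" if signed else "unsigned", first_rejected("āšņļ", signed))
```

With a signed char the first byte of "ā" (0xC4) becomes negative and trips `*s <= ' '`; with an unsigned char it stays above 126 and trips `*s > 0176`.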

comment:8 by hellik, 13 years ago

Keywords: wingrass added

in reply to:  8 comment:9 by glynn, 13 years ago

Replying to hellik:

keywords wingrass added

Is this actually Windows-specific?

comment:10 by torsti, 11 years ago

Cc: torsti added
Keywords: wingrass removed
Platform: MSWindows Vista → Unspecified
Version: 6.4.1 RCs → svn-trunk

This bug persists in a fairly recent GRASS 7 on Linux. But it's really two separate issues: Unicode handling in the Python GUI (lots of other tickets on that...) and the fact that the validity of the mapset name isn't being checked.

in reply to:  10 comment:11 by annakrat, 11 years ago

Replying to torsti:

This bug persists in a fairly recent GRASS 7 on Linux. But it's really two separate issues: Unicode handling in the Python GUI (lots of other tickets on that...) and the fact that the validity of the mapset name isn't being checked.

When you create a new mapset within the startup screen, the name is checked (this was added one or two months ago). I don't understand why g.mapset itself doesn't check it: it only prints a warning, but the mapset with the illegal name is created anyway.

by torsti, 11 years ago

Attachment: frame.py.patch added

patch to verify filename in OnCreateMapset

comment:12 by mlennert, 8 years ago

Neither in trunk nor in release70 can I create a mapset with accents, whether in the GUI startup screen or using g.mapset -c. This part thus seems to be fixed.

The question is whether this bug should be left open for the question of actually allowing such mapset names.

I would plead for closing this bug and possibly opening another if this is deemed worth the effort.

comment:13 by annakrat, 8 years ago

When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:

Traceback (most recent call last):
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/lmgr/frame.py", line 950, in OnCreateMapset
    mapset = mapset)
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/core/gcmd.py", line 701, in RunCommand
    stdout, stderr = map(DecodeString, ps.communicate())
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/core/gcmd.py", line 76, in DecodeString
    return string.decode(_enc)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 47: invalid continuation byte
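One plausible reading of this traceback, consistent with the "Character <�> not allowed." message mentioned in comment:15: the `%c` in the C warning prints a single byte, so for a UTF-8 name only the lead byte of the offending letter reaches stderr, and strict UTF-8 decoding of that output fails. A sketch of the effect; the mapset name is hypothetical and the message layout follows the C format string quoted in comment:7:

```python
# Hypothetical mapset name; "ļ" is the two bytes b'\xc4\xbc' in UTF-8.
name = "ļaudis"
raw_name = name.encode("utf-8")

# "%c" with the offending char emits a single byte: only the lead byte 0xC4.
lead_byte = raw_name[:1]

# Hypothetical g.mapset stderr, assembled byte for byte as the C code would print it.
stderr = (b"Illegal filename <" + raw_name + b">. Character <"
          + lead_byte + b"> not allowed.\n")

try:
    stderr.decode("utf-8")          # what a strict DecodeString() effectively does
except UnicodeDecodeError as err:
    print(err.reason, "at byte", err.start)
```

The full name inside `<...>` decodes fine; it is the orphaned lead byte followed by `>` that the decoder rejects.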

The patch is probably trying to prevent this.

in reply to:  13 ; comment:14 by mlennert, 8 years ago

Replying to annakrat:

When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:

[...]

The patch is probably trying to prevent this.

Right, I didn't try that path. And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error? Especially if we might decide one day that mapsets should be able to have special characters in their names?

in reply to:  14 comment:15 by annakrat, 8 years ago

Replying to mlennert:

Replying to annakrat:

When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:

[...]

The patch is probably trying to prevent this.

Right, I didn't try that path. And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error? Especially if we might decide one day that mapsets should be able to have special characters in their names?

I agree, but we have to wrap it in a try/except: the GUI tries to convert to unicode the incomplete character in the g.mapset error message ("Character <�> not allowed."). I can look at it later today.
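A minimal sketch of the try/except idea, written in Python 3 rather than the GUI's Python 2. The function name `decode_output` is hypothetical and is not the actual `DecodeString()` in gcmd.py; it falls back to the standard `errors="replace"` handler, which turns the orphaned byte into U+FFFD, the "�" seen in the message above:

```python
def decode_output(data, encoding="utf-8"):
    """Hypothetical tolerant variant of a strict DecodeString()."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Replace undecodable bytes with U+FFFD so the module's own error
        # message still reaches the user instead of a Python traceback.
        return data.decode(encoding, errors="replace")

print(decode_output(b"Character <\xc4> not allowed."))
```

Replacing rather than raising keeps the module in charge of validation, which is what the previous comment argues for; the cost is that the offending character is shown as "�" instead of the letter the user typed.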

in reply to:  14 comment:16 by glynn, 8 years ago

Replying to mlennert:

And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error?

The GUI gets the name as a unicode string, which it needs to convert to a byte string in order to pass it to a command as an argument. That requires knowing the encoding, and it requires that the mapset name is actually representable in that encoding.
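The conversion step described here can be sketched as follows. The helper `name_to_arg` is hypothetical; it uses the standard `locale.getpreferredencoding()` as a stand-in for "the encoding the command expects", which is an assumption about how such a GUI would choose it:

```python
import locale

def name_to_arg(name, encoding=None):
    """Hypothetical helper: encode a GUI (unicode) mapset name for a command
    line, refusing names that the target encoding cannot represent."""
    encoding = encoding or locale.getpreferredencoding(False)
    try:
        return name.encode(encoding)
    except UnicodeEncodeError as err:
        bad = name[err.start:err.end]
        raise ValueError(f"{name!r}: {bad!r} not representable in {encoding}") from err

print(name_to_arg("dzelzceļš", "cp1257"))       # fine in the Baltic code page
try:
    name_to_arg("dzelzceļš", "latin-1")        # ļ and š are missing from Latin-1
except ValueError as err:
    print(err)
```

The same name is valid in one locale and unrepresentable in another, which is the situation the next paragraph describes for users sharing a location.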

The problem with allowing non-ASCII mapset names is that there may be other people using the system who use a different encoding. If they use the GUI to list the mapsets in a location, the mapset name may not be valid in the encoding they use.

Having the GUI forbid non-ASCII names up front largely eliminates the first issue. Most of the encodings in current use are supersets of ASCII, and the few that aren't are close enough that the differences are unlikely to cause problems in practice (e.g. such encodings are typically unibyte for the 7-bit range and only deviate from ASCII for less-common punctuation characters). And the second issue means that the prohibition on 8-bit characters is unlikely to be removed in the foreseeable future.
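A GUI-side check that forbids non-ASCII names up front, with a clearer message than the codec error in the original report, might look like the sketch below. It is not the contents of the attached frame.py.patch; the function name and messages are hypothetical, and the character rules mirror only the C condition quoted in comment:7:

```python
FORBIDDEN = set('/"\'@,=*')

def check_new_mapset_name(name):
    """Hypothetical GUI-side check run before calling g.mapset.

    Returns a user-facing error message for the first bad character,
    or None if every character passes."""
    for ch in name:
        if not ch.isascii():
            return f"{ch!r} is not an ASCII character; mapset names must be ASCII."
        if ch in FORBIDDEN or ch <= " " or ord(ch) > 0o176:
            return f"{ch!r} is not allowed in a mapset name."
    return None

print(check_new_mapset_name("āšņļ"))
```

Checking in the GUI avoids ever encoding the name for the command line, so neither the encoding question nor the truncated-byte warning from the module comes into play.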

comment:17 by neteler, 8 years ago

Milestone: 6.4.1 → 6.4.6