Opened 14 years ago
Last modified 9 years ago
#1293 new defect
Creating mapset with non-latin letter gives python ascii error
Reported by: | marisn | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 6.4.6 |
Component: | wxGUI | Version: | svn-trunk |
Keywords: | Cc: | martinl, torsti | |
CPU: | Unspecified | Platform: | Unspecified |
Description
Start the GRASS wxGUI. In the startup screen, create a new mapset called "āšņļ" -> Unable to create mapset: 'ascii' codec can't decode blahblah (cannot copy/paste the text :( ). Still, it creates the new mapset, and I can select it and start a GRASS session.
WinGRASS-6.4.1RC1-1 on Windows Vista
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:623: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if mapset not in self.listOfMapsetsSelectable or \
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:725: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  self.lbmapsets.SetSelection(self.listOfMapsets.index(mapset))
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:680: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if event.GetText() not in self.listOfMapsetsSelectable:
Attachments (1)
Change History (18)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Another place that fails: g.help
Traceback (most recent call last):
  File "c:/osgeo4w/usr/src/grass-6.4.1RC1/dist.i686-pc-mingw32/etc/wxpython/gui_modules/prompt.py", line 777, in OnItemSelected
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\menuform.py", line 2022, in ParseInterface
    tree = etree.fromstring(getInterfaceDescription(cmd[0]))
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\menuform.py", line 1975, in getInterfaceDescription
    "Details: %s") % (cmd, cmderr)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 30: ordinal not in range(128)
Things only go downhill from there. Starting a new GRASS wxGUI session is greeted by a Python crash, and the startup screen is skipped:
KŽ█DA:KARąU KOPA ┼Īaurslie┼Šu dzelzce─╝┼Ī nav atrasta
C:/Program Files/GRASS-64/etc/wxpython/gis_set.py:623: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if mapset not in self.listOfMapsetsSelectable or \
Traceback (most recent call last):
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 858, in <module>
    GRASSStartUp = StartUp(0)
  File "C:\OSGeo4W\apps\Python25\lib\site-packages\wx-2.8-msw-unicode\wx\_core.py", line 7935, in __init__
  File "C:\OSGeo4W\apps\Python25\lib\site-packages\wx-2.8-msw-unicode\wx\_core.py", line 7509, in _BootstrapApp
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 829, in OnInit
    StartUp = GRASSStartup()
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 166, in __init__
    self._set_properties()
  File "C:/Program Files/GRASS-64/etc/wxpython/gis_set.py", line 241, in _set_properties
    (utils.UnicodeString(mapset))
  File "C:\Program Files\GRASS-64\etc\wxpython\gui_modules\utils.py", line 669, in UnicodeString
    return unicode(string, enc)
  File "C:\Program Files\GRASS-64\Python25\lib\encodings\cp1257.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0xa1 in position 1: character maps to <undefined>
Contents of .grassrc6 ("šaursliežu dzelzceļš" is a mapset in location "spearfish60" and not "demo"):
GISDBASE: C:\Users\tests\Documents\GIS_DataBase
LOCATION_NAME: demolocation
MAPSET: šaursliežu dzelzceļš
GRASS_GUI: wxpython
MAPSET : PERMANENT
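The final UnicodeDecodeError in the traceback above is consistent with the mapset name having been stored as UTF-8 and then decoded as cp1257. That is an assumption, since the ticket does not say how the name was written to .grassrc6. A minimal sketch in Python 3 (the ticket itself involves Python 2.5) that reproduces the same byte and position:

```python
# Assumption: the mapset name was stored as UTF-8 bytes and is later
# decoded with the Windows Baltic code page (cp1257), as in the traceback.
name = "šaursliežu dzelzceļš"
raw = name.encode("utf-8")  # 'š' (U+0161) becomes the two bytes 0xC5 0xA1

try:
    raw.decode("cp1257")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # cp1257 assigns no character to byte 0xA1

if error is not None:
    print("byte 0x%02x at position %d: %s"
          % (raw[error.start], error.start, error.reason))
```

The first byte (0xC5) is a valid cp1257 character, so decoding only fails at the second byte, which matches "byte 0xa1 in position 1".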
follow-ups: 4 6 comment:3 by , 14 years ago
Replying to marisn:
Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.
follow-up: 7 comment:4 by , 14 years ago
Replying to glynn:
Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.
How complicated would it be to allow 8-bit characters in the location/mapset/map names? BTW, on Linux, I am able to create a location and a mapset with 8-bit characters, start GRASS with the created mapset, and display maps from this mapset.
comment:5 by , 14 years ago
Cc: | added |
---|
comment:6 by , 14 years ago
Replying to glynn:
Note that 8-bit characters are not valid within names of locations, mapsets or maps. It should refuse to create the mapset, albeit with a more meaningful error message.
URL to documentation? RFC?
comment:7 by , 14 years ago
Replying to martinl:
How complicated would it be to allow 8-bit characters in the location/mapset/map names?
It's not particularly complicated, just a lot of work. We would have to audit the whole of GRASS for any issues, then fix them.
For code which simply passes strings around within GRASS, it isn't an issue. It becomes an issue when code tries to interpret strings or pass them outside of GRASS.
The main issue is that once you use characters outside of ASCII, encoding becomes an issue. Many functions (e.g. case conversions) are affected by the current locale. Also, if the map name is written to a file, some file formats require the use of specific encodings, so we would have to convert the text to that encoding to avoid creating invalid files. E.g. if a map name is in ISO-8859-* and it gets written directly to a file format which uses UTF-8, the result is an invalid file.
If you stick to ASCII, none of this matters. All commonly-used encodings are supersets of ASCII, so conversion between ASCII and such encodings is a no-op. Once you move outside of ASCII you have to perform conversions, which means knowing which encoding you're converting to/from.
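A small sketch of the point about ASCII and encodings, written in Python 3. The list of encodings and the names used are only illustrative choices, not anything GRASS itself uses:

```python
ascii_name = "spearfish60"
latvian_name = "šaursliežu dzelzceļš"
encodings = ["ascii", "utf-8", "iso-8859-13", "cp1257"]  # illustrative choice

# An ASCII-only name has the same byte representation in every encoding
# listed here, so "conversion" between them is a no-op.
ascii_forms = {ascii_name.encode(enc) for enc in encodings}

# A non-ASCII name does not: its bytes depend on the encoding chosen.
latvian_forms = {latvian_name.encode(enc) for enc in encodings[1:]}

# Writing ISO-8859-13 bytes directly into a format that expects UTF-8
# produces data that is not valid UTF-8.
legacy_bytes = latvian_name.encode("iso-8859-13")
try:
    legacy_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(len(ascii_forms), len(latvian_forms), valid_utf8)
```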
Another significant issue is that it's unspecified whether "char" is signed or unsigned. Signed is more common, which can break code that assumes that a char ranges from 0 to 255.
BTW, on Linux, I am able to create location, mapset with 8-bit characters, start GRASS with created mapset and display maps from this mapset.
That indicates that G_legal_name() isn't being called when it should be. This function tests for invalid characters:
    if (*s == '/' || *s == '"' || *s == '\'' || *s <= ' ' ||
        *s == '@' || *s == ',' || *s == '=' || *s == '*' || *s > 0176) {
        G_warning(_("Illegal filename <%s>. Character <%c> not allowed.\n"), name, *s);
        return -1;
    }
This prohibits all 8-bit characters (due to either the "*s > 0176" or the "*s <= ' '", depending upon whether char is signed or unsigned).
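To illustrate why both branches matter, here is a Python 3 transcription of just the condition quoted above. The helper names `to_signed_char` and `is_illegal` are hypothetical and not part of GRASS. The same 8-bit byte is rejected whether it is read as an unsigned or as a signed char, only by a different comparison:

```python
def to_signed_char(byte):
    """Reinterpret an unsigned byte value (0..255) as a signed 8-bit char."""
    return byte - 256 if byte > 127 else byte


def is_illegal(value):
    """Transcription of the quoted C condition for a single char value."""
    forbidden = {ord(c) for c in "/\"'@,=*"}
    return value in forbidden or value <= ord(" ") or value > 0o176


byte = "š".encode("utf-8")[0]        # 0xC5, assuming a UTF-8 name
unsigned_view = byte                 # 197: caught by "*s > 0176"
signed_view = to_signed_char(byte)   # -59: caught by "*s <= ' '"

print(unsigned_view, is_illegal(unsigned_view))
print(signed_view, is_illegal(signed_view))
```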
follow-up: 9 comment:8 by , 14 years ago
Keywords: | wingrass added |
---|
comment:9 by , 14 years ago
follow-up: 11 comment:10 by , 12 years ago
Cc: | added |
---|---|
Keywords: | wingrass removed |
Platform: | MSWindows Vista → Unspecified |
Version: | 6.4.1 RCs → svn-trunk |
This bug persists in a fairly recent grass7 on Linux. But it's really two separate issues: Unicode handling in the Python GUI (lots of other tickets on that...) and the fact that the validity of the mapset name isn't being checked.
comment:11 by , 12 years ago
Replying to torsti:
This bug persists in a fairly recent grass7 on Linux. But it's really two separate issues: Unicode handling in the Python GUI (lots of other tickets on that...) and the fact that the validity of the mapset name isn't being checked.
When you create a new mapset within the startup screen, the name is checked (added 1 or 2 months ago). I don't understand why g.mapset itself doesn't check it: it only prints a warning, but the mapset with the illegal name is created anyway.
comment:12 by , 9 years ago
Neither in trunk nor in release70 can I create a mapset with accents, be it in the GUI startup screen or using g.mapset -c. This part thus seems to be fixed.
The question is whether this bug should be left open for the question of actually allowing such mapset names.
I would plead for closing this bug and possibly opening another if this is deemed worth the effort.
follow-up: 14 comment:13 by , 9 years ago
When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:
Traceback (most recent call last):
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/lmgr/frame.py", line 950, in OnCreateMapset
    mapset = mapset)
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/core/gcmd.py", line 701, in RunCommand
    stdout, stderr = map(DecodeString, ps.communicate())
  File "/home/anna/dev/grass/trunk1/dist.x86_64-unknown-linux-gnu/gui/wxpython/core/gcmd.py", line 76, in DecodeString
    return string.decode(_enc)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 47: invalid continuation byte
The patch is probably trying to prevent this from happening.
follow-ups: 15 16 comment:14 by , 9 years ago
Replying to annakrat:
When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:
[...]
The patch is probably trying to prevent this from happening.
Right, I didn't try that path. And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error? Especially if we might decide one day that mapsets should be able to have special characters in their names?
comment:15 by , 9 years ago
Replying to mlennert:
Replying to annakrat:
When creating a mapset with non-ASCII characters from the GUI menu (Create new mapset), I get this error:
[...]
The patch is probably trying to prevent this from happening.
Right, I didn't try that path. And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error? Especially if we might decide one day that mapsets should be able to have special characters in their names?
I agree, but we have to wrap it in try/except: the GUI tries to convert to unicode an incomplete character from the g.mapset error message (Character <�> not allowed.). I can look at it later today.
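A sketch of what such a try/except could look like, in Python 3. The function name `decode_output` is hypothetical and this is not the actual gcmd.py code. It assumes the message is built as in the G_warning() call quoted earlier, where `%c` prints only the first byte of a multi-byte UTF-8 character:

```python
# Simulated stderr of g.mapset for a name containing 'ļ' (UTF-8: 0xC4 0xBC).
name_bytes = "ļ".encode("utf-8")
bad_byte = name_bytes[:1]  # "%c" emits a single byte: half of the character
stderr = (b"Illegal filename <" + name_bytes + b">. Character <"
          + bad_byte + b"> not allowed.\n")


def decode_output(data, encoding="utf-8"):
    """Decode command output, substituting U+FFFD for undecodable bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(encoding, errors="replace")


message = decode_output(stderr)
print(message)
```

With the fallback, the lone 0xC4 byte is shown as the replacement character instead of raising, so the user still sees the module's own error message.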
comment:16 by , 9 years ago
Replying to mlennert:
And yes, the patch tries to prevent the GUI from calling g.mapset with an illegal filename. But should the GUI do such checks if the module already provides them? Shouldn't the GUI just pass on the string correctly and let the module handle the error?
The GUI gets the name as a unicode string, which it needs to convert to a byte string in order to pass it to a command as an argument. That requires knowing the encoding, and it requires that the mapset name is actually representable in that encoding.
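As a rough illustration in Python 3 (the helper `encode_argument` is hypothetical, not a GRASS function), whether a name can be passed as a command argument depends on the encoding in use:

```python
def encode_argument(name, encoding):
    """Return name as bytes in the given encoding, or None if the name
    cannot be represented in that encoding."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


name = "āšņļ"
for enc in ("utf-8", "cp1257", "latin-1", "ascii"):
    encoded = encode_argument(name, enc)
    print(enc, "not representable" if encoded is None else encoded)
```

The same name is representable in UTF-8 and cp1257 (with different bytes) but not in Latin-1 or ASCII, which is the situation described above for users with different encodings.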
The problem with allowing non-ASCII mapset names is that there may be other people using the system and who use a different encoding. If they use the GUI to list the mapsets in a location, the mapset name may not be valid in the encoding they use.
Having the GUI forbid non-ASCII names up front largely eliminates the first issue. Most of the encodings in current use are supersets of ASCII, and the few that aren't are close enough that the differences are unlikely to cause problems in practice (e.g. such encodings are typically unibyte for the 7-bit range and only deviate from ASCII for less-common punctuation characters). And the second issue means that the prohibition on 8-bit characters is unlikely to be removed in the foreseeable future.
comment:17 by , 9 years ago
Milestone: | 6.4.1 → 6.4.6 |
---|
It seems that not only the startup screen fails with non-Latin mapset names. No idea where this error comes from. Got it in the wxGUI "Command console" when using a mapset named "šaursliežu dzelzceļš".