Opened 7 years ago
Closed 6 years ago
#3441 closed defect (fixed)
Use some UTF-8 locale for en language
Reported by: | marisn | Owned by: | marisn |
---|---|---|---|
Priority: | major | Milestone: | 7.4.2 |
Component: | Translations | Version: | svn-trunk |
Keywords: | locale | Cc: | |
CPU: | Unspecified | Platform: | Unspecified |
Description
At the moment, choosing en in the GUI language override sets the locale to C. Although this seems fine, as GRASS does not ship any en translations, it causes problems because C is limited to the ASCII charset. See #3423 for an example. The lack of UTF-8 support in the C locale (or the coupling of UTF-8 with a particular language_country) is a known issue, and many GNU/Linux distributions have addressed it by providing a C.UTF-8 locale (note that it is not the same as en_US.UTF-8).
As the en_US.UTF-8 locale might not be present, when the user overrides the language to en, GRASS should try en_US.UTF-8, then C.UTF-8, and only then fall back to C (with appropriate warnings).
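The proposed fallback order can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in the GRASS startup script: the names CANDIDATES and set_first_usable_locale are hypothetical, and the sketch simply trusts whatever locale.setlocale() reports.

```python
import locale
import sys

# Order proposed in this ticket: en_US.UTF-8, then C.UTF-8, then plain C.
CANDIDATES = ("en_US.UTF-8", "C.UTF-8", "C")


def set_first_usable_locale(candidates=CANDIDATES):
    """Set LC_ALL to the first locale setlocale() accepts.

    Returns the accepted name, or None if every candidate was rejected.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # not available on this system, try the next one
        if name == "C":
            # Plain C is ASCII-only, so tell the user what they are getting.
            sys.stderr.write("Warning: no UTF-8 locale available, "
                             "falling back to the ASCII-only C locale\n")
        return name
    return None


if __name__ == "__main__":
    print("Using locale:", set_first_usable_locale())
```

Which name is returned depends entirely on the locales installed on the machine, so no expected output is given here.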
More details:
https://sourceware.org/bugzilla/show_bug.cgi?id=17318
https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
https://unix.stackexchange.com/questions/149111/what-should-i-set-my-locale-to-and-what-are-the-implications-of-doing-so/
https://www.python.org/dev/peps/pep-0538/
https://bugs.python.org/issue28180
https://bugs.python.org/issue19846
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=636086
Change History (17)
follow-up: 3 comment:1 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:2 by , 7 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Autoclosed by accident. Must stay open, as it needs a backport to 7.4.0.
Needs testing on exotic locale setups and MS Windows. How to test:
- start GRASS GIS, take a look at your data (r.category, vector attributes);
- if all is fine, go to GUI preferences and change system language to en. Save settings and restart GRASS;
- repeat the first step and report results here.
Works for me on a system with lv_LV.UTF-8, both with and without the en_US locale available.
follow-ups: 4 5 comment:3 by , 7 years ago
Replying to marisn:
In 71729:
I use the following in NC basic:
    g.copy rast=landuse,lu
    > r.category lu rules=- sep=comma << EOF
    > 1,developpé
    > 2,agriculture
    > 3,herbacé
    > 4,shrubland
    > 5,forêt
    > 6,water
    > 7,sédiments
    > EOF
When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:
    17:50:19: Cannot set locale to language "English (U.S.)".
    17:50:19: locale 'en_US' cannot be set.
and:
    > locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=en
    LANGUAGE=en
    LC_CTYPE=en_US.UTF-8
    LC_NUMERIC=C
    LC_TIME=en_US.UTF-8
    LC_COLLATE=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8
    LC_MESSAGES=en_US.UTF-8
    LC_PAPER=en_US.UTF-8
    LC_NAME=en_US.UTF-8
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LC_MEASUREMENT=en_US.UTF-8
    LC_IDENTIFICATION=en_US.UTF-8
    LC_ALL=
When I launch r.category in the GUI I only get the following output
    r.category map=lu@user1
    2 agriculture
    4 shrubland
    6 water
and this traceback in the console:
    Traceback (most recent call last):
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
        self.cmdOutput.AddStyledMessage(message, type)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
        self.AddTextWrapped(message, wrap=None)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
        txt = EncodeString(txt)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
        return string.encode(_enc)
    UnicodeDecodeError : 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
    Traceback (most recent call last):
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
        self.cmdOutput.AddStyledMessage(message, type)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
        self.AddTextWrapped(message, wrap=None)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
        txt = EncodeString(txt)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
        return string.encode(_enc)
    UnicodeDecodeError : 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
    Traceback (most recent call last):
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
        self.cmdOutput.AddStyledMessage(message, type)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
        self.AddTextWrapped(message, wrap=None)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
        txt = EncodeString(txt)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
        return string.encode(_enc)
    UnicodeDecodeError : 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
    Traceback (most recent call last):
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
        self.cmdOutput.AddStyledMessage(message, type)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
        self.AddTextWrapped(message, wrap=None)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
        txt = EncodeString(txt)
      File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
        return string.encode(_enc)
    UnicodeDecodeError : 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
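For readers wondering why an encode() call raises a *decode* error: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec, and byte 0xc3 is the lead byte of the UTF-8 sequence for characters such as é. The sketch below reproduces the same failure with Python 3 semantics; decode_label is a hypothetical helper for illustration, not a GRASS function.

```python
def decode_label(raw, encoding):
    """Decode bytes with the given codec; return None if they do not fit."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# One of the category labels from the example above, as UTF-8 bytes.
raw = "herbacé".encode("utf-8")

# 'é' occupies two bytes in UTF-8; the first one is 0xc3.
assert raw[6:8] == b"\xc3\xa9"

print(decode_label(raw, "ascii"))   # ASCII cannot represent byte 0xc3
print(decode_label(raw, "utf-8"))   # UTF-8 decodes the label correctly
```

This is consistent with the output above, where only the pure-ASCII labels (agriculture, shrubland, water) were displayed.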
comment:4 by , 7 years ago
Replying to mlennert:
When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:
    17:50:19: Cannot set locale to language "English (U.S.)".
    17:50:19: locale 'en_US' cannot be set.
and:
    > locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=en
    LANGUAGE=en
    LC_CTYPE=en_US.UTF-8
    LC_NUMERIC=C
    LC_TIME=en_US.UTF-8
    LC_COLLATE=en_US.UTF-8
    LC_MONETARY=en_US.UTF-8
    LC_MESSAGES=en_US.UTF-8
    LC_PAPER=en_US.UTF-8
    LC_NAME=en_US.UTF-8
    LC_ADDRESS=en_US.UTF-8
    LC_TELEPHONE=en_US.UTF-8
    LC_MEASUREMENT=en_US.UTF-8
    LC_IDENTIFICATION=en_US.UTF-8
    LC_ALL=
If you have set up a language (locale) override in the GUI, all the magic (and thus also all the problems) should be confined to the startup screen. If there were no warnings when starting GRASS, Python at least considers setting the locale to have succeeded. Still, the further output indicates that it did not. The startup screen locale logic is simple: if Python can set the locale, assume it is good; if it fails, it is bad (and for en, fall back to the C locale). This *could* also be an issue in wx, so testing with a more recent version could be an option: https://groups.google.com/forum/#!topic/wx-users/duTOQm_jPk4
I'll need output of
- python --version
- locale -a (outside of GRASS session)
- cat /etc/locale.conf
- cat /etc/locale.gen (only uncommented lines matter)
- following python code:
    import locale

    print('Default locale: ', locale.getdefaultlocale())
    try:
        # This should fail
        locale.setlocale(locale.LC_ALL, 'en')
        print('Set to en success')
    except locale.Error as e:
        print('Set to en fail: %s' % e)
    norm = locale.normalize('en.UTF-8')
    print('Normalized version: %s' % norm)
    try:
        # This should work if en_US.UTF-8 locale is present
        locale.setlocale(locale.LC_ALL, norm)
        print('Set to normalized success')
    except locale.Error as e:
        print('Set normalized fail: %s' % e)
    import wx
    print('wx version: %s' % wx.version())
follow-up: 6 comment:5 by , 7 years ago
Replying to mlennert:
Replying to marisn:
In 71729:
I use the following in NC basic:
    g.copy rast=landuse,lu
    > r.category lu rules=- sep=comma << EOF
    > 1,developpé
    > 2,agriculture
    > 3,herbacé
    > 4,shrubland
    > 5,forêt
    > 6,water
    > 7,sédiments
    > EOF
When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:
    17:50:19: Cannot set locale to language "English (U.S.)".
    17:50:19: locale 'en_US' cannot be set.
This comes from the fact that en_US.UTF-8 was not installed on my machine. Installing it solves the issue. However, I think C.UTF-8 was installed by default (on Debian testing), and using that also solves the issue, so it might not be necessary to use en_US.UTF-8 if C.UTF-8 provides the same solution. I do not know, however, whether it is common for machines to have C.UTF-8 present by default while en_US.UTF-8 is not.
For some reason, it does not detect the absence of en_US.UTF-8 early enough, and all locale variables are set to that locale.
Here is the situation when I do not have en_US.UTF-8 installed:
    locale -a
    C
    C.UTF-8
    fr_BE.utf8
    POSIX
In /etc/locale.gen, only fr_BE.utf8 is uncommented.
    python --version
    Python 2.7.14
And apparently Python translates C.UTF-8 automatically:
    >>> import locale
    >>> locale.getlocale()
    (None, None)
    >>> locale.setlocale(locale.LC_ALL, 'C.UTF-8')
    'C.UTF-8'
    >>> locale.getlocale()
    ('en_US', 'UTF-8')
follow-up: 7 comment:6 by , 7 years ago
Replying to mlennert:
And apparently Python translates C.UTF-8 automatically:
This is a bug in Python. It means there is nothing (reasonable*) we can do in GRASS until it gets fixed. https://bugs.python.org/issue30755
* An unreasonable solution could be to check whether Python's sort uses C or en_US order, and thus detect affected Python versions.
Technical explanation: the GRASS startup locale code calls locale.setlocale() to set the locale for the current program. The call should fail if the locale is not available, in which case that locale is not used. If it succeeds, the same locale is written to the environment variables to enable it for the rest of the GRASS work session. In this case Python silently translates 'C.UTF-8' to 'en_US.UTF-8', although 'en_US.UTF-8' is missing and any attempt to use it should cause a failure. As Python is lying about the usability of 'en_US.UTF-8', there is no way* (see remark) to detect this, and all of my careful attempts to set only a usable locale are lost.
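To make the two detection ideas concrete, here is a minimal sketch. The first function checks the alias rewrite that Python Issue 30755 is about; the second is one possible reading of the "unreasonable" sort-order probe from the remark. Both function names are hypothetical, and the collation probe is only a heuristic: it assumes that en_US collation places "a" before "B", while C order compares code points.

```python
import locale


def normalize_rewrites_c_utf8():
    """True if this Python maps 'C.UTF-8' to 'en_US.UTF-8' (Issue 30755)."""
    return locale.normalize("C.UTF-8") == "en_US.UTF-8"


def collation_is_c_like():
    """True if the active LC_COLLATE compares by code point, as C does.

    In C order 'B' (66) sorts before 'a' (97); a language-aware
    collation such as en_US is assumed to put 'a' first.
    """
    return locale.strcoll("B", "a") < 0


if __name__ == "__main__":
    print("normalize() rewrites C.UTF-8:", normalize_rewrites_c_utf8())
    print("collation behaves like C:", collation_is_c_like())
```

If Python reports an en_US locale while the collation still behaves like C, that would hint that the reported locale is not the one actually in effect.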
follow-up: 8 comment:7 by , 7 years ago
Replying to marisn:
Replying to mlennert:
And apparently Python translates C.UTF-8 automatically:
This is a bug in Python. It means – there is nothing (reasonable*) we can do in GRASS till it gets fixed. https://bugs.python.org/issue30755
Thanks for finding this.
- Unreasonable solution could be checking if Python sort is using C or en_US order and thus detecting affected Python versions.
No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as
Cannot set locale to language "English (U.S.)". locale 'en_US' cannot be set.
does not really make it clear why it cannot be set...
follow-up: 9 comment:8 by , 7 years ago
Replying to mlennert:
No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as the current message does not really make it clear why it cannot be set...
Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.
    @@ -1160,6 +1168,18 @@
                 locale.setlocale(locale.LC_ALL, normalized)
             except locale.Error as e:
                 if language == 'en':
    +                # A workaround for Python Issue30755
    +                # https://bugs.python.org/issue30755
    +                if locale.normalize('C.UTF-8') == 'en_US.UTF-8':
    +                    locale.setlocale(locale.LC_ALL, 'C')
    +                    os.environ['LANGUAGE'] = 'C'
    +                    os.environ['LANG'] = 'C'
    +                    os.environ['LC_MESSAGES'] = 'C'
    +                    os.environ['LC_NUMERIC'] = 'C'
    +                    os.environ['LC_TIME'] = 'C'
    +                    sys.stderr.write("To avoid Unicode errors in GUI, install en_US.UTF-8 locale and restart GRASS.\n"
    +                                     "Also consider upgrading your Python version to one containg fix for Python Issue 30755.\n")
    +                    return
                     # en_US locale might be missing, still all messages in
                     # GRASS are already in en_US language.
                     # Using plain C as locale forces encodings to ascii
follow-up: 10 comment:9 by , 7 years ago
Replying to marisn:
Replying to mlennert:
No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as the current message does not really make it clear why it cannot be set...
Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.
Committed as r71807. If it works for you now, it needs a backport (together with r71729).
comment:10 by , 7 years ago
Replying to marisn:
Replying to marisn:
Replying to mlennert:
No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as the current message does not really make it clear why it cannot be set...
Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.
Committed as r71807. If works for you now, then needs backport (together with r71729).
This works in that I no longer get the error message about a missing en_US.UTF-8, but I'm back to square one, as I don't see any output containing special characters (e.g. the r.category example).
My question was how widespread C.UTF-8 is as a locale present on machines. If it is common enough, we could do the following, which solves my problem:
    Index: grass.py
    ===================================================================
    --- grass.py  (revision 71807)
    +++ grass.py  (working copy)
    @@ -1163,12 +1163,12 @@
                     # A workaround for Python Issue30755
                     # https://bugs.python.org/issue30755
                     if locale.normalize('C.UTF-8') == 'en_US.UTF-8':
    -                    locale.setlocale(locale.LC_ALL, 'C')
    +                    locale.setlocale(locale.LC_ALL, 'C.UTF-8')
                         os.environ['LANGUAGE'] = 'C'
    -                    os.environ['LANG'] = 'C'
    -                    os.environ['LC_MESSAGES'] = 'C'
    +                    os.environ['LANG'] = 'C.UTF-8'
    +                    os.environ['LC_MESSAGES'] = 'C.UTF-8'
                         os.environ['LC_NUMERIC'] = 'C'
    -                    os.environ['LC_TIME'] = 'C'
    +                    os.environ['LC_TIME'] = 'C.UTF-8'
                         sys.stderr.write("To avoid Unicode errors in GUI, install en_US.UTF-8 locale and restart GRASS.\n"
                                          "Also consider upgrading your Python version to one containg fix for Python Issue 30755.\n")
                         return
follow-up: 12 comment:11 by , 7 years ago
Actually, just setting LC_CTYPE seems enough, AFAIU:
    Index: grass.py
    ===================================================================
    --- grass.py  (revision 71807)
    +++ grass.py  (working copy)
    @@ -1166,6 +1166,7 @@
                         locale.setlocale(locale.LC_ALL, 'C')
                         os.environ['LANGUAGE'] = 'C'
                         os.environ['LANG'] = 'C'
    +                    os.environ['LC_CTYPE'] = 'C.UTF-8'
                         os.environ['LC_MESSAGES'] = 'C'
                         os.environ['LC_NUMERIC'] = 'C'
                         os.environ['LC_TIME'] = 'C'
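The LC_CTYPE-only idea can also be written as a small helper that works on any mapping, which makes it easy to try outside the startup script. This is a sketch under assumptions: apply_c_with_utf8_ctype is a hypothetical name, and removing LC_ALL is an addition that is not part of the patch (LC_ALL, when set to a non-empty value, overrides every other LC_* variable, so leaving it in place would defeat the LC_CTYPE setting).

```python
import os

# Variables the workaround in this ticket sets to plain C.
C_VARIABLES = ("LANGUAGE", "LANG", "LC_MESSAGES", "LC_NUMERIC", "LC_TIME")


def apply_c_with_utf8_ctype(environ, ctype="C.UTF-8"):
    """Set a C locale in `environ` but keep a UTF-8 character type."""
    for name in C_VARIABLES:
        environ[name] = "C"
    # Only the character classification and encoding use UTF-8.
    environ["LC_CTYPE"] = ctype
    # Assumption, not in the patch: drop LC_ALL so it cannot override LC_CTYPE.
    environ.pop("LC_ALL", None)
    return environ


if __name__ == "__main__":
    env = apply_c_with_utf8_ctype(dict(os.environ))
    print({k: v for k, v in env.items() if k.startswith(("LANG", "LC_"))})
```

Passing a copy of os.environ, as in the example, leaves the running process untouched; passing os.environ itself would change the session for child processes.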
follow-up: 13 comment:12 by , 7 years ago
Replying to mlennert:
Actually, just setting LC_CTYPE seems enough, AFAIU:
LC_CTYPE might be the only safe option here, as I was getting errors when trying to set other LC_ parameters to C.UTF-8 due to that Python bug (any Python call to locale.setlocale(XX, 'C.UTF-8') will fail if en_US.UTF-8 is missing). But please check whether your system really lacks en_US.UTF-8: on my Ubuntu box I observed that this locale was still present even after I commented it out in the locale.gen file (locale-gen was not removing existing locale files).
comment:13 by , 7 years ago
Replying to marisn:
Replying to mlennert:
Actually, just setting LC_CTYPE seems enough, AFAIU:
LC_CTYPE might be the only safe option here, as I was getting errors when trying to set other LC_ parameters to C.UTF-8 due to that Python bug (any Python call locale.setlocale(XX, 'C.UTF-8') will cause failure if en_US.UTF-8 is missing). But please check if your system really does not have en_US.UTF-8, as on my Ubuntu box I observed that this locale was present even when I uncommented it locale.gen file (locale-gen was not removing existing locale files).
locale -a does not show en_US.UTF-8. Don't know if it might "hide" somewhere ;-)
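Checking the `locale -a` listing by eye is easy to get wrong because the codeset is spelled differently in different places (fr_BE.utf8 in the listing, en_US.UTF-8 in the variables). A minimal sketch of a spelling-tolerant check follows; canonical and is_listed are hypothetical helper names, and the sample text is the listing posted earlier in this ticket.

```python
def canonical(name):
    """Normalise a locale name for comparison: en_US.UTF-8 -> en_us.utf8."""
    base, dot, codeset = name.strip().partition(".")
    return (base + dot + codeset.replace("-", "")).lower()


def is_listed(locale_a_output, wanted):
    """True if `wanted` appears in the text printed by `locale -a`."""
    listed = {canonical(entry) for entry in locale_a_output.split()}
    return canonical(wanted) in listed


# Listing reported above for the machine without en_US.UTF-8.
sample = "C\nC.UTF-8\nfr_BE.utf8\nPOSIX\n"

print(is_listed(sample, "en_US.UTF-8"))
print(is_listed(sample, "fr_BE.UTF-8"))
print(is_listed(sample, "C.UTF-8"))
```

On a live system the listing could be captured with subprocess.check_output(["locale", "-a"]) and decoded before being passed in. The check only reflects what `locale -a` reports.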
comment:15 by , 7 years ago
Milestone: | 7.4.1 → 7.4.2 |
---|
comment:17 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I think we have done everything that is possible on the GRASS side to make it work. Closing as fixed.
In 71729: