Opened 2 years ago

Closed 14 months ago

#3441 closed defect (fixed)

Use some UTF-8 locale for en language

Reported by: marisn
Owned by: marisn
Priority: major
Milestone: 7.4.2
Component: Translations
Version: svn-trunk
Keywords: locale
Cc:
CPU: Unspecified
Platform: Unspecified

Description

At the moment, choosing en in the GUI language override sets the locale to C. This seems fine at first, as GRASS does not ship any en translations, but it causes problems because the C locale is limited to the ASCII charset. See #3423 for an example. The lack of UTF-8 support in the C locale (or rather, the coupling of UTF-8 to a particular language_country pair) is a known issue, and many GNU/Linux distributions have addressed it by providing a C.UTF-8 locale (note that it is not the same as en_US.UTF-8).

As the en_US.UTF-8 locale might not be present, when the user overrides the language to en, GRASS should try en_US.UTF-8 first, then C.UTF-8, and only then fall back to C (with appropriate warnings).

More details:
https://sourceware.org/bugzilla/show_bug.cgi?id=17318
https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
https://unix.stackexchange.com/questions/149111/what-should-i-set-my-locale-to-and-what-are-the-implications-of-doing-so/
https://www.python.org/dev/peps/pep-0538/
https://bugs.python.org/issue28180
https://bugs.python.org/issue19846
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=636086
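
The fallback order proposed in the description could be sketched roughly as follows. This is only an illustration, not the code that was committed to GRASS: the function name first_usable_locale and the default candidate tuple are assumptions made for this example, and a successful locale.setlocale() call is taken as the only evidence that a locale is usable.

```python
import locale


def first_usable_locale(candidates=("en_US.UTF-8", "C.UTF-8", "C")):
    """Return the first candidate accepted by setlocale(), or None.

    Hypothetical helper for illustration. Note the side effect: the
    process locale stays set to the candidate that was accepted.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # Locale not available on this system, try the next one.
            continue
        return name
    return None


chosen = first_usable_locale()
if chosen == "C":
    print("Warning: no UTF-8 locale available, falling back to plain C")
else:
    print("Using locale: %s" % chosen)
```

Plain C is expected to be accepted everywhere, so with the default candidates the function should always return a name; the warning branch corresponds to the "appropriate warnings" mentioned above.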

Change History (17)

comment:1 Changed 2 years ago by marisn

Resolution: fixed
Status: assigned → closed

In 71729:

Try to use some of en_XX locales for en language.
As a failback, consider C.UTF-8 and only then plain C locale.
Attempts to fix #3441.

comment:2 Changed 2 years ago by marisn

Resolution: fixed
Status: closed → reopened

Autoclosed by accident. This must stay open, as it needs a backport to 7.4.0.

Needs testing on exotic locale setups and MS Windows. How to test:

  • start GRASS GIS, take a look at your data (r.category, vector attributes);
  • if all is fine, go to GUI preferences and change system language to en. Save settings and restart GRASS;
  • repeat the first step and report results here.

Works for me on a system with lv_LV.UTF-8, both with and without the en_US locale available.

comment:3 in reply to:  1 ; Changed 2 years ago by mlennert

Replying to marisn:

In 71729:

Try to use some of en_XX locales for en language.
As a failback, consider C.UTF-8 and only then plain C locale.
Attempts to fix #3441.

I use the following in NC basic:

g.copy rast=landuse,lu
> r.category lu rules=- sep=comma << EOF
> 1,developpé
> 2,agriculture
> 3,herbacé
> 4,shrubland
> 5,forêt
> 6,water
> 7,sédiments
> EOF

When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:

17:50:19: Cannot set locale to language "English (U.S.)".
17:50:19: locale 'en_US' cannot be set.

and:

> locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en
LANGUAGE=en
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=C
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

When I launch r.category in the GUI I only get the following output

r.category map=lu@user1                                                         
2	agriculture
4	shrubland
6	water

and this traceback in the console:

Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
    txt = EncodeString(txt)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
    txt = EncodeString(txt)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
    txt = EncodeString(txt)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 472, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 771, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/goutput.py", line 720, in AddTextWrapped
    txt = EncodeString(txt)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
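
For readers unfamiliar with this error: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec, which fails on the first byte of any UTF-8 encoded accented character (0xc3 for é and ê). The sketch below reproduces the decoding step explicitly in Python 3. The helper name first_undecodable is hypothetical, and the sample line (category number, tab, label) is an assumption about what the GUI received, chosen because it matches the "position 5" reported in one of the tracebacks.

```python
def first_undecodable(raw, encoding="ascii"):
    """Return (byte value, position) of the first byte that cannot be
    decoded with `encoding`, or None if the whole input decodes.

    Hypothetical helper, for illustration only.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return raw[err.start], err.start
    return None


# Assumed shape of one r.category output line: "<number>\t<label>"
line = "5\tfor\u00eat".encode("utf-8")  # "5<TAB>forêt" as UTF-8 bytes

print(first_undecodable(line))           # (195, 5): byte 0xc3 at position 5
print(first_undecodable(line, "utf-8"))  # None: decodes fine as UTF-8
```

This also explains why only the unaccented categories (agriculture, shrubland, water) made it into the GUI output: those lines are pure ASCII.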

comment:4 in reply to:  3 Changed 2 years ago by marisn

Replying to mlennert:

When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:

17:50:19: Cannot set locale to language "English (U.S.)".
17:50:19: locale 'en_US' cannot be set.

and:

> locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en
LANGUAGE=en
LC_CTYPE=en_US.UTF-8
LC_NUMERIC=C
LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

If you have set up a language (locale) override in the GUI, all the magic (and thus also all the problems) should be confined to the startup screen. If there were no warnings when starting GRASS, Python at least considered setting the locale a success, yet the later output shows that it was not. The startup screen locale logic is simple: if Python can set the locale, it is assumed to be good; if that fails, it is bad (and for en we fall back to the C locale). This *could* also be an issue in wx, so testing with a more recent version could be an option: https://groups.google.com/forum/#!topic/wx-users/duTOQm_jPk4

I'll need the output of:

  • python --version
  • locale -a (outside of GRASS session)
  • cat /etc/locale.conf
  • cat /etc/locale.gen (only uncommented lines matter)
  • following python code:
    import locale
    
    print('Default locale: ', locale.getdefaultlocale())
    try:
        # This should fail
        locale.setlocale(locale.LC_ALL, 'en')
        print('Set to en success')
    except locale.Error as e:
        print('Set to en fail: %s' % e)
    
    norm = locale.normalize('en.UTF-8')
    print('Normalized version: %s' % norm)
    try:
        # This should work if en_US.UTF-8 locale is present
        locale.setlocale(locale.LC_ALL, norm)
        print('Set to normalized success')
    except locale.Error as e:
        print('Set normalized fail: %s' % e)
    
    import wx
    print('wx version: %s' % wx.version())
    
    

comment:5 in reply to:  3 ; Changed 2 years ago by mlennert

Replying to mlennert:

Replying to marisn:

In 71729:

Try to use some of en_XX locales for en language.
As a failback, consider C.UTF-8 and only then plain C locale.
Attempts to fix #3441.

I use the following in NC basic:

g.copy rast=landuse,lu
> r.category lu rules=- sep=comma << EOF
> 1,developpé
> 2,agriculture
> 3,herbacé
> 4,shrubland
> 5,forêt
> 6,water
> 7,sédiments
> EOF

When I set the preferences in the GUI to 'en', and relaunch GRASS I get the following when I launch g.gui:

17:50:19: Cannot set locale to language "English (U.S.)".
17:50:19: locale 'en_US' cannot be set.

This comes from the fact that en_US.UTF-8 was not installed on my machine. Installing it solves the issue. However, I think C.UTF-8 was installed by default (on Debian testing), and using that also solves the issue, so it might not be necessary to use en_US.UTF-8 if C.UTF-8 provides the same solution. I do not know, however, whether C.UTF-8 is present by default on most machines where en_US.UTF-8 is not.

For some reason, it does not detect the absence of en_US.UTF-8 early enough, and all locale variables are set to that locale.

Here is the situation when I do not have en_US.UTF-8 installed:

locale -a
C
C.UTF-8
fr_BE.utf8
POSIX

In /etc/locale.gen, only fr_BE.utf8 is uncommented.

python --version
Python 2.7.14

And apparently Python translates C.UTF-8 automatically:

>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, 'C.UTF-8')
'C.UTF-8'
>>> locale.getlocale()
('en_US', 'UTF-8')

comment:6 in reply to:  5 ; Changed 2 years ago by marisn

Replying to mlennert:

And apparently Python translates C.UTF-8 automatically:

This is a bug in Python, which means there is nothing (reasonable*) we can do in GRASS until it gets fixed: https://bugs.python.org/issue30755

* An unreasonable solution could be to check whether Python sorting uses C or en_US order and thus detect affected Python versions.

Technical explanation: the GRASS startup locale code calls locale.setlocale() to set the locale for the current program. The call should fail if the locale is not available, in which case that locale is not used. If it succeeds, the same locale is exported to environment variables to enable it for the rest of the GRASS work session. In this case Python silently translates 'C.UTF-8' to 'en_US.UTF-8', although 'en_US.UTF-8' is missing and any attempt to use it should fail. As Python misreports the usability of 'en_US.UTF-8', there is no way* to detect this, and all of my careful attempts to set only a usable locale are lost.
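
One cheap way to see whether a given interpreter performs this translation is to ask locale.normalize() directly. The snippet below is a sketch under that assumption; the function name normalize_rewrites_c_utf8 is made up for this example, and the check only reports what the alias table does, not whether en_US.UTF-8 is actually installed.

```python
import locale


def normalize_rewrites_c_utf8():
    """Return True if this Python maps 'C.UTF-8' to 'en_US.UTF-8'.

    Hypothetical helper: it only inspects locale.normalize() and does
    not touch the process locale.
    """
    return locale.normalize("C.UTF-8") == "en_US.UTF-8"


if normalize_rewrites_c_utf8():
    print("This Python rewrites C.UTF-8 to en_US.UTF-8 (issue 30755)")
else:
    print("This Python leaves C.UTF-8 alone")
```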

comment:7 in reply to:  6 ; Changed 2 years ago by mlennert

Replying to marisn:

Replying to mlennert:

And apparently Python translates C.UTF-8 automatically:

This is a bug in Python, which means there is nothing (reasonable*) we can do in GRASS until it gets fixed: https://bugs.python.org/issue30755

Thanks for finding this.

* An unreasonable solution could be to check whether Python sorting uses C or en_US order and thus detect affected Python versions.

No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as

Cannot set locale to language "English (U.S.)".
locale 'en_US' cannot be set.

does not really make it clear why it cannot be set...

comment:8 in reply to:  7 ; Changed 2 years ago by marisn

Replying to mlennert:

No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as [the current message] does not really make it clear why it cannot be set...

Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.

@@ -1160,6 +1168,18 @@
                 locale.setlocale(locale.LC_ALL, normalized)
             except locale.Error as e:
                 if language == 'en':
+                    # A workaround for Python Issue30755
+                    # https://bugs.python.org/issue30755
+                    if locale.normalize('C.UTF-8') == 'en_US.UTF-8':
+                        locale.setlocale(locale.LC_ALL, 'C')
+                        os.environ['LANGUAGE'] = 'C'
+                        os.environ['LANG'] = 'C'
+                        os.environ['LC_MESSAGES'] = 'C'
+                        os.environ['LC_NUMERIC'] = 'C'
+                        os.environ['LC_TIME'] = 'C'
+                        sys.stderr.write("To avoid Unicode errors in GUI, install en_US.UTF-8 locale and restart GRASS.\n"
+                        "Also consider upgrading your Python version to one containg fix for Python Issue 30755.\n")
+                        return
                     # en_US locale might be missing, still all messages in
                     # GRASS are already in en_US language.
                     # Using plain C as locale forces encodings to ascii

comment:9 in reply to:  8 ; Changed 2 years ago by marisn

Replying to marisn:

Replying to mlennert:

No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as [the current message] does not really make it clear why it cannot be set...

Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.

Committed as r71807. If it works for you now, it needs a backport (together with r71729).

comment:10 in reply to:  9 Changed 2 years ago by mlennert

Replying to marisn:

Replying to marisn:

Replying to mlennert:

No, the only thing we could do would be to output a message inviting the user to install the en_US.UTF-8 locale, as [the current message] does not really make it clear why it cannot be set...

Can you test this patch? I can not commit, as I have some other uncommitted changes in startup script. I'll commit in the evening from my home computer.

Committed as r71807. If it works for you now, it needs a backport (together with r71729).

This works in that I no longer get the error message about a missing en_US.UTF-8, but I'm back to square one, as I don't see any output containing special characters (e.g. the r.category example).

My question was how widespread C.UTF-8 is as a locale present on machines. If it is common enough, we could do the following, which solves my problem:

Index: grass.py
===================================================================
--- grass.py	(revision 71807)
+++ grass.py	(working copy)
@@ -1163,12 +1163,12 @@
                     # A workaround for Python Issue30755
                     # https://bugs.python.org/issue30755
                     if locale.normalize('C.UTF-8') == 'en_US.UTF-8':
-                        locale.setlocale(locale.LC_ALL, 'C')
+                        locale.setlocale(locale.LC_ALL, 'C.UTF-8')
                         os.environ['LANGUAGE'] = 'C'
-                        os.environ['LANG'] = 'C'
-                        os.environ['LC_MESSAGES'] = 'C'
+                        os.environ['LANG'] = 'C.UTF-8'
+                        os.environ['LC_MESSAGES'] = 'C.UTF-8'
                         os.environ['LC_NUMERIC'] = 'C'
-                        os.environ['LC_TIME'] = 'C'
+                        os.environ['LC_TIME'] = 'C.UTF-8'
                         sys.stderr.write("To avoid Unicode errors in GUI, install en_US.UTF-8 locale and restart GRASS.\n"
                         "Also consider upgrading your Python version to one containg fix for Python Issue 30755.\n")
                         return

comment:11 Changed 2 years ago by mlennert

Actually, just setting LC_CTYPE seems enough, AFAIU:

Index: grass.py
===================================================================
--- grass.py	(revision 71807)
+++ grass.py	(working copy)
@@ -1166,6 +1166,7 @@
                         locale.setlocale(locale.LC_ALL, 'C')
                         os.environ['LANGUAGE'] = 'C'
                         os.environ['LANG'] = 'C'
+                        os.environ['LC_CTYPE'] = 'C.UTF-8'
                         os.environ['LC_MESSAGES'] = 'C'
                         os.environ['LC_NUMERIC'] = 'C'
                         os.environ['LC_TIME'] = 'C'
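
The idea behind this one-line change, keeping everything at C except the character type, can be illustrated outside of grass.py as follows. This is only a sketch: the function name session_env is hypothetical, and dropping LC_ALL is an extra step added here because a non-empty LC_ALL takes precedence over LC_CTYPE.

```python
import os


def session_env(base=None, ctype="C.UTF-8"):
    """Return a copy of the environment with C everywhere except LC_CTYPE.

    Hypothetical helper for illustration; `base` defaults to os.environ.
    """
    env = dict(os.environ if base is None else base)
    for var in ("LANGUAGE", "LANG", "LC_MESSAGES", "LC_NUMERIC", "LC_TIME"):
        env[var] = "C"
    # Only the character classification/encoding category gets UTF-8.
    env["LC_CTYPE"] = ctype
    # A non-empty LC_ALL would override LC_CTYPE, so remove it.
    env.pop("LC_ALL", None)
    return env


env = session_env({"LANG": "fr_BE.utf8", "LC_ALL": "fr_BE.utf8"})
print(env["LANG"], env["LC_CTYPE"], "LC_ALL" in env)  # C C.UTF-8 False
```

A dictionary built this way could be passed as the env argument of subprocess calls, so that child processes keep untranslated messages and C number formatting while still handling UTF-8 text.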

comment:12 in reply to:  11 ; Changed 2 years ago by marisn

Replying to mlennert:

Actually, just setting LC_CTYPE seems enough, AFAIU:

LC_CTYPE might be the only safe option here, as I was getting errors when trying to set other LC_ parameters to C.UTF-8 because of that Python bug (any Python call to locale.setlocale(XX, 'C.UTF-8') will fail if en_US.UTF-8 is missing). But please check whether your system really lacks en_US.UTF-8: on my Ubuntu box I observed that this locale was still present even after I commented it out in the locale.gen file (locale-gen was not removing existing locale files).

comment:13 in reply to:  12 Changed 2 years ago by mlennert

Replying to marisn:

Replying to mlennert:

Actually, just setting LC_CTYPE seems enough, AFAIU:

LC_CTYPE might be the only safe option here, as I was getting errors when trying to set other LC_ parameters to C.UTF-8 because of that Python bug (any Python call to locale.setlocale(XX, 'C.UTF-8') will fail if en_US.UTF-8 is missing). But please check whether your system really lacks en_US.UTF-8: on my Ubuntu box I observed that this locale was still present even after I commented it out in the locale.gen file (locale-gen was not removing existing locale files).

locale -a does not show en_US.UTF-8. Don't know if it might "hide" somewhere ;-)
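
Checking a locale -a listing by eye is error-prone because the same locale can be spelled en_US.UTF-8 or en_US.utf8. A small comparison helper, sketched below, makes the check mechanical. The names _canon and locale_listed are made up for this example, the sample text is the listing reported earlier in this ticket, and the helper can only tell what locale -a reports, not whether stale locale files exist elsewhere.

```python
def _canon(name):
    """Fold case and drop dashes in the codeset, so 'UTF-8' matches 'utf8'."""
    lang, sep, codeset = name.partition(".")
    return lang.lower() + sep + codeset.lower().replace("-", "")


def locale_listed(locale_a_output, wanted):
    """Return True if `wanted` appears in the text printed by `locale -a`.

    Hypothetical helper: the caller supplies the command output as text.
    """
    listed = {
        _canon(line.strip())
        for line in locale_a_output.splitlines()
        if line.strip()
    }
    return _canon(wanted) in listed


sample = "C\nC.UTF-8\nfr_BE.utf8\nPOSIX\n"
print(locale_listed(sample, "en_US.UTF-8"))  # False
print(locale_listed(sample, "C.UTF-8"))      # True
print(locale_listed(sample, "fr_BE.UTF-8"))  # True
```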

comment:14 Changed 22 months ago by neteler

Milestone: 7.4.0 → 7.4.1

Ticket retargeted after milestone closed

comment:15 Changed 18 months ago by neteler

Milestone: 7.4.1 → 7.4.2

comment:16 Changed 14 months ago by neteler

What is the state of this ticket?

comment:17 Changed 14 months ago by marisn

Resolution: fixed
Status: reopened → closed

I think we have done everything that is possible on the GRASS side to make it work. Closing as fixed.
