Opened 8 months ago

Last modified 8 days ago

#3790 reopened enhancement

Cleanup gettext usage for python code

Reported by: pmav99 Owned by: grass-dev@…
Priority: normal Milestone: 7.8.2
Component: Python Version: svn-trunk
Keywords: Cc:
CPU: Unspecified Platform: Unspecified

Description

GRASS uses 3 translation domains:

  • grasslibs
  • grassmods
  • grasswxpy

Calling gettext.install() injects _() in the builtins namespace thus making it globally available. To make i18n work, we should only need to call gettext.install() once per translation domain.

The provided patch tries to simplify and centralize the usage of the gettext module in the python code. The main idea is that lib/python/__init__.py should be the one and only place where getetx.install() calls are being made. The direct consequence of this is that any python code that needs to use _() only needs to import the grass library first. This includes both the GUI code and the "scripts".

The patch is attached and can be applied with:

patch -p0 < gettext_cleanup.diff

The changes can also be browsed here.

In order to test this you need to:

  1. Enable a non-english language locale on your OS. The exact way you do that depends on your distribution. French and German are probably the best options since AFAIK they have the most complete translations, but any supported language should work.
  2. Configure GRASS with NLS --with-nls and re-compile. If NLS was not previously enabled you need to run make distclean before configuring/compiling.
  3. Start a GRASS session with LC_ALL=fr_FR.utf-8 ./bin.x86_64-pc-linux-gnu/grass77 (note you don't need to export any ENV variables; LC_ALL's purpose is specifically to override the values of all the other LC_* variables.source)
  4. your new session should now be in French. You should see the "text-based splash screen" in French. The command descriptions/options in french + the GUI in french. If not, that's a bug (or a missing translation!).

What has not been tested is the doctests. I would appreciate if someone could verify that they work and/or provide info on how to run them.

Attachments (2)

gettext_cleanup.diff (116.1 KB) - added by pmav99 8 months ago.
gettext_cleanup2.diff (116.5 KB) - added by pmav99 8 months ago.

Download all attachments as: .zip

Change History (14)

Changed 8 months ago by pmav99

Attachment: gettext_cleanup.diff added

comment:1 Changed 8 months ago by martinl

Milestone: 7.8.0

Changed 8 months ago by pmav99

Attachment: gettext_cleanup2.diff added

comment:2 Changed 8 months ago by pmav99

There were some conflicts after [74264]. I uploaded a new patch.

comment:3 Changed 8 months ago by neteler

Resolution: fixed
Status: newclosed

In 74307:

i18N: cleanup gettext usage for Python code (fixes #3790) (contributed by pmav99)

comment:4 Changed 8 months ago by wenzeslaus

The things in lib/python are potentially (and actually) used as Python modules by other applications or scripts. Thus from https://pymotw.com/2/gettext/index.html#module-localization, it is actually the second option which applies:

For a library, or individual module, modifying __builtins__ is not a good idea because you don’t know what conflicts you might introduce with an application global value. You can import or re-bind the names of translation functions by hand at the top of your module.

The doctests and interative command line are probably where the issues will happen first. See Django practices from https://docs.djangoproject.com/en/dev/topics/i18n/translation/ as linked in #1739:

Python’s standard library gettext module installs _() into the global namespace, as an alias for gettext(). In Django, we have chosen not to follow this practice, for a couple of reasons:

Sometimes, you should use gettext_lazy() as the default translation method for a particular file. Without _() in the global namespace, the developer has to think about which is the most appropriate translation function.

The underscore character (_) is used to represent “the previous result” in Python’s interactive shell and doctest tests. Installing a global _() function causes interference. Explicitly importing gettext() as _() avoids this problem.

To run doctests, use

python -m doctest -v example.py

or (often already included in code):

if __name__ == "__main__":
    import doctest
    doctest.testmod()

both taken from https://docs.python.org/2/library/doctest.html, so the question is if you have an particular issue. Another things to test are Python in command line, Python console in GUI, and Jupyter Notebook. (But the doctest are where the issues definitively emerged in the past.)

Please note that I never understood all the details of #1739 as noted there, so maybe we have things to investigate more. Also, try to examine do_doctest_gettext_workaround() which doctests need when the builin is modified or perhaps even in other cases.

Finally, is there any overhead in centralizing the tr string installation? So far we were relatively successful in keeping the GUI separate (whether for performance or code organization reasons).

comment:5 in reply to:  4 ; Changed 8 months ago by pmav99

Disclaimer: The real goal of this refactoring was to get rid of os.getenv("GISBASE") calls at the module level from all over the place. This is a prerequisite for #3772. The scripts and the GUI code didn't really need to be touched, but I did them too for uniformity.

Replying to wenzeslaus:

The things in lib/python are potentially (and actually) used as Python modules by other applications or scripts. Thus from https://pymotw.com/2/gettext/index.html#module-localization, it is actually the second option which applies:

For a library, or individual module, modifying __builtins__ is not a good idea because you don’t know what conflicts you might introduce with an application global value. You can import or re-bind the names of translation functions by hand at the top of your module.

I really like it when things are explicit. I dislike the idea of injecting things into builtins and I do found the convention of using _() as an alias for gettext.gettext() rather unfortunate. Consequently, I don't have any objections with getting rid of gettext.install() and/or even _().

Regardless, I am not sure that GRASS was actually following Doug Helman's advice. More specifically, almost all GRASS modules used to import the grass library before actually calling gettext.install() on their own (source). The grass library was already calling gettext.install (source). grass.temporal was doing the same for no apparent reason though, since it was just re-registering the "grasslibs" domain (source). Anyway, the end result is that most of the time, if not always, as soon as you imported the grass lib, the builtins were modified anyway.

In other words, I don't think that the proposed patch really changes what the grass library was doing. The builtin injection continues to happen just like it used before.

That being said, I am completely open to any improvements. I am also perfectly fine with following Mr. Helman's advice (i.e. by adding explicit imports in each and every grass lib module) if someone wants to contribute that. But IMHV the proposed implementation is cleaner than the previous one.

So far we were relatively successful in keeping the GUI separate (whether for performance or code organization reasons).

I don't really see any coupling of the GUI code with the grass library. All the grass library does is registering the "grasswxpy" domain. It doesn't import anything.

TBH, I did consider moving the gettext.install("grasswxpy", ...) code inside ./gui/wxpython, but the way it is structured doesn't allow to do that cleanly and I didn't want to make changes there too. It is doable though.

Finally, is there any overhead in centralizing the tr string installation?

I haven't measured it. Obviously there is some overhead. Previously when you were importing grass for the first time in a process, you were registering 2 domains. "grasslibs" + "grassmods" when using the CLI and "grasslibs" + "grasswxpy" when using the GUI. Now 3 domains are being registered. So there is overhead.

TBH I haven't read the whole gettext module line by line, but I think that the gist of gettext.install() is these lines. Line 552 calls the method that does the builtin injection (check lines 314-316) while line 551 calls the function that registers the domain in a "private" dictionary named _translations. AFAI can tell from just reading the code, the only overhead to really speak of is the parsing of the *.mo files which in GRASS are a few hundred KBs. On SSDs this overhead should be negligible while on mechanical drives the bottleneck is not going to be the code anyway. Re-registering a domain should incur any overhead since the results are cached.

That being said, if you feel that it is necessary, it should be possible to avoid registering "grasswxpy" and "grassmods" inside grass lib. E.g. by adding a couple of functions in ./lib/python/__init__.py:

# ...
_LOCALE_DIR = os.path.join(os.getenv("GISBASE"), 'locale')
gettext.install('grasslibs', _LOCALE_DIR)


def register_grassmods():
    gettext.install('grassmods', _LOCALE_DIR)


def register_grasswxpy():
    gettext.install('grasswxpy', _LOCALE_DIR)



and making sure that they are imported by each and every module that needs them. But I am curious if the impact is that significant to make it worth it.

Last edited 8 months ago by pmav99 (previous) (diff)

comment:6 in reply to:  3 Changed 7 months ago by neteler

Resolution: fixed
Status: closedreopened

Replying to neteler:

In 74307:

i18N: cleanup gettext usage for Python code (fixes #3790) (contributed by pmav99)

According to #3838 now wxGUI is broken... reopening.

comment:7 Changed 6 months ago by neteler

Here, how to see the (or one of the) errors easily:

GRASS 7.7.svn (nc_spm_08_grass7):~ > d.mon wx0
GRASS 7.7.svn (nc_spm_08_grass7):~ > Traceback (most recent call last):
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/mapdisp/main.py", line 47, in <module>
    from core.render import Map, MapLayer, Overlay, RenderMapMgr
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/core/render.py", line 42, in <module>
    from core.ws import RenderWMSMgr
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/core/ws.py", line 33, in <module>
    from core.gthread import gThread
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gthread.py", line 28, in <module>
    from core.gconsole import EVT_CMD_DONE, wxCmdDone
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/core/gconsole.py", line 49, in <module>
    from gui_core.forms import GUI
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/forms.py", line 98, in <module>
    from gui_core.widgets import StaticWrapText, ScrolledPanel, ColorTablesComboBox, \
  File "/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/gui_core/widgets.py", line 90, in <module>
    from core.utils import _
ImportError: cannot import name '_' from 'core.utils' (/home/mneteler/software/grass77/dist.x86_64-pc-linux-gnu/gui/wxpython/core/utils.py)

comment:8 Changed 6 months ago by pmav99

Are you sure this is at trunk Markus? Since 74307 gui_core/widgets.py shouldn't be trying to import _.

comment:9 in reply to:  8 Changed 6 months ago by neteler

Replying to pmav99:

Are you sure this is at trunk Markus? Since 74307 gui_core/widgets.py shouldn't be trying to import _.

Oh, apparently a old compilation leftover. After "make distclean" and recompilation it works (right now using Python 2). Sorry for the "d.mon" noise.

comment:10 in reply to:  5 Changed 6 months ago by wenzeslaus

Replying to pmav99:

Disclaimer: The real goal of this refactoring was to get rid of os.getenv("GISBASE") calls at the module level from all over the place. This is a prerequisite for #3772. The scripts and the GUI code didn't really need to be touched, but I did them too for uniformity.

You will have to help me here. Why is the relative path solution from #3772 not applicable to the explicit import approach?

In relation to that, since #3772 is about using grass as a standard Python package, I think it should follow the "Helman's advice" for a Python package.

Replying to wenzeslaus:

The things in lib/python are potentially (and actually) used as Python modules by other applications or scripts. Thus from https://pymotw.com/2/gettext/index.html#module-localization, it is actually the second option which applies:

For a library, or individual module, modifying __builtins__ is not a good idea because you don’t know what conflicts you might introduce with an application global value. You can import or re-bind the names of translation functions by hand at the top of your module.

...

I am not sure that GRASS was actually following Doug Helman's advice. More specifically, almost all GRASS modules used to import the grass library before actually calling gettext.install() on their own (source). ... Anyway, the end result is that most of the time, if not always, as soon as you imported the grass lib, the builtins were modified anyway.

Please read #1739 and #2425 linked from there. I explain that the wxGUI is using the practice while the library (lib/python) is not and it should.

That being said, I am completely open to any improvements. I am also perfectly fine with following Mr. Helman's advice (i.e. by adding explicit imports in each and every grass lib module) if someone wants to contribute that. But IMHV the proposed implementation is cleaner than the previous one.

Cleaner in which way? I think the registration of the domains separately and all at once is a separate issue from "Helman's advice" (which is notably also the way which Django is following).

So far we were relatively successful in keeping the GUI separate (whether for performance or code organization reasons).

I don't really see any coupling of the GUI code with the grass library. All the grass library does is registering the "grasswxpy" domain. It doesn't import anything.

It is not just the imports. The idea is that you should be able to install GRASS GIS as command line only application without any GUI parts. But there is more than one way to deal with this. If the registration of the domain can just fail silently, that's probably good enough in this particular case. However, as I said, this is a separate issue.

What is crucial here are the explicit imports versus using builtin. There were issues with builtin solution in the past (hence #1739), builtin _ currently causes issues (doctests, static code linting), builtin solution is not considered good practice for libraries (Helman, Django) and the change breaks wxGUI according to #3838. Thus, I suggest reverting it (at least the builtin part) and finding a different solution for #3772.

comment:11 Changed 2 months ago by neteler

Milestone: 7.8.07.8.1

Ticket retargeted after milestone closed

comment:12 Changed 8 days ago by neteler

Milestone: 7.8.17.8.2

Ticket retargeted after milestone closed

Note: See TracTickets for help on using tickets.