Opened 7 years ago

Closed 5 years ago

#3392 closed defect (fixed)

t.register: encoding error

Reported by: mlennert Owned by: grass-dev@…
Priority: normal Milestone: 7.2.4
Component: Temporal Version: svn-trunk
Keywords: t.register encoding Cc:
CPU: Unspecified Platform: Unspecified

Description

Using the data and examples from 1 in a french (fr_BE@UTF-8) encoding environment, I get an encoding error using t.register:

g.list type=raster pattern="*tempmean" output=fichier_cartes_tempmean
t.create output=pluies_nc semantictype=sum title=precipitation_mois description="Précipitation totale mensuelle NC"
t.register -i input=pluies_nc@climate_2000_2012 file=fichier_cartes_tempmean start=2000-01-01 increment="1 months"
Collecte des informations sur la carte ...
 100%
Registering maps in the temporal database...
Registering maps in the space time dataset...
Updating space time dataset...
ERREUR :ascii,Mettre à jour les métadonnées, l'emprise spatiale et
        temporelle de toutes les cartes enregistrées de <,7,8,ordinal not
        in range(128)

And when I run t.register again:

t.register -i input=pluies_nc@climate_2000_2012 file=fichier_cartes_tempmean start=2000-01-01 increment="1 months"
Collecte des informations sur la carte ...
ATTENTION: Map is already registered in temporal database. Unable to update
           raster map <2000_01_tempmean@climate_2000_2012>. Overwrite flag
           is not set.
[...]
ATTENTION: Map is already registered in temporal database. Unable to update
           raster map <2012_12_tempmean@climate_2000_2012>. Overwrite flag
           is not set.
 100%
Registering maps in the space time dataset...
ERREUR :ascii,%s> est déjà enregistrée.,9,10,ordinal not in range(128)

ould be supercool if this could still be solved for 7.2.2 :-)

Attachments (5)

temporal.patch (85.4 KB ) - added by zarch 7 years ago.
from str() => unicode()
temporal_encoding.diff (90.8 KB ) - added by zarch 7 years ago.
Convert str => encoding
temporal_script_utils_encoding.diff (1003 bytes ) - added by zarch 7 years ago.
Convert value to bytes/unicode if is not already
pygrass_temporal_gis_utf_handling.diff (17.2 KB ) - added by huhabla 7 years ago.
Approach to fix the handling of utf-8 strings and translations in the temporal framework
temporal_encode.patch (93.9 KB ) - added by zarch 7 years ago.
Transform every variable to bytes

Download all attachments as: .zip

Change History (47)

by zarch, 7 years ago

Attachment: temporal.patch added

from str() => unicode()

in reply to:  description ; comment:1 by zarch, 7 years ago

Replying to mlennert:

Using the data and examples from 1 in a french (fr_BE@UTF-8) encoding environment, I get an encoding error using t.register:

I was able to reproduce the error, actually the #3394, and with the attached patch is working on my PC. Can you test it?

by zarch, 7 years ago

Attachment: temporal_encoding.diff added

Convert str => encoding

by zarch, 7 years ago

Convert value to bytes/unicode if is not already

in reply to:  1 ; comment:2 by zarch, 7 years ago

Replying to zarch:

Replying to mlennert:

Using the data and examples from 1 in a french (fr_BE@UTF-8) encoding environment, I get an encoding error using t.register:

I was able to reproduce the error, actually the #3394, and with the attached patch is working on my PC. Can you test it?

I remembered that once Glynn in grass-dev (2016-05-05) said:

"Anything using the script module should be using byte strings throughout (in spite of how awkward Python 3 tries to make this)."

So I think that instead of returning unicode as I did in the first patch we should return bytes... Please apply the two diff files, and let me know if it is working for you.

in reply to:  2 comment:3 by huhabla, 7 years ago

Dear Pietro, thanks a lot for taking care of this issue. Unfortunately i have no clue howto fix this. If your patch works well, i would like to put it in trunk for more tests.

Replying to zarch:

Replying to zarch:

Replying to mlennert:

Using the data and examples from 1 in a french (fr_BE@UTF-8) encoding environment, I get an encoding error using t.register:

I was able to reproduce the error, actually the #3394, and with the attached patch is working on my PC. Can you test it?

I remembered that once Glynn in grass-dev (2016-05-05) said:

"Anything using the script module should be using byte strings throughout (in spite of how awkward Python 3 tries to make this)."

So I think that instead of returning unicode as I did in the first patch we should return bytes... Please apply the two diff files, and let me know if it is working for you.

in reply to:  2 ; comment:4 by mlennert, 7 years ago

Replying to zarch:

Replying to zarch:

Replying to mlennert:

Using the data and examples from 1 in a french (fr_BE@UTF-8) encoding environment, I get an encoding error using t.register:

I was able to reproduce the error, actually the #3394, and with the attached patch is working on my PC. Can you test it?

I remembered that once Glynn in grass-dev (2016-05-05) said:

"Anything using the script module should be using byte strings throughout (in spite of how awkward Python 3 tries to make this)."

So I think that instead of returning unicode as I did in the first patch we should return bytes... Please apply the two diff files, and let me know if it is working for you.

I've applied your patches (BTW: it is always best to make diffs relative to the source tree root. That way one doesn't have to search for where the patched files are...).

t.create output=pluies_nc semantictype=sum title=precipitation_mois description="Précipitation totale mensuelle NC"Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 86, in <module>
    main()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 82, in main
    semantic, None, grass.overwrite())
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/open_stds.py", line 174, in open_new_stds
    sp.insert(dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 381, in insert
    statement += AbstractDataset.insert(self, dbif=dbif, execute=False)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_dataset.py", line 398, in insert
    statement += self.metadata.get_insert_statement_mogrified(dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/base.py", line 404, in get_insert_statement_mogrified
    mapset=self.mapset)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 946, in mogrify_sql_statement
    return self.connections[mapset].mogrify_sql_statement(content)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 1216, in mogrify_sql_statement
    unicode(args[count]),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

However:

t.create output=pluies_nc semantictype=sum title=precipitation_mois description="Precipitation totale mensuelle NC"

Note the accent in the description of the first try, but not the second. This wasn't a problem before applying your patch.

Then

t.register -i input=pluies_nc@climate_2000_2012 file=fichier_cartes_tempmean start=2000-01-01 increment="1 months"
Collecte des informations sur la carte ...
 100%
Enregistrement des cartes dans la base temporelle ...
Enregistrement des cartes dans le jeu de données temporel ...
Mise à jour du jeu de données temporel ...
ERREUR :ascii,Mettre à jour les métadonnées, l'emprise spatiale et
        temporelle de toutes les cartes enregistrées de <,7,8,ordinal not
        in range(128)

And :

t.unregister file=fichier_cartes_tempmean 
Unregister maps
 100%
Unregister maps from the temporal database
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 183, in <module>
    tgis.profile_function(main)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 85, in profile_function
    func()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 169, in main
    sp.update_from_registered_maps(dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 2295, in update_from_registered_maps
    " all registered maps of <%s>") % (self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

And:

t.remove pluies_nc -fNote: registered maps themselves have not been removed, only the strds
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 171, in <module>
    tgis.profile_function(main)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 85, in profile_function
    func()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 157, in main
    statement += sp.delete(dbif=dbif, execute=False)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 1959, in delete
    self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)

If this is a fundamental issue in the temporal modules then we should probably keep the discussion centralized here. I'll close the other bug reports as duplicates of this one.

t.info (#3394) and t.rast.extract (#3395) now work, but I suspect that the errors there came from the accent in the description of the strds.

in reply to:  4 ; comment:5 by zarch, 7 years ago

Replying to mlennert:

Replying to zarch:

So I think that instead of returning unicode as I did in the first patch we should return bytes... Please apply the two diff files, and let me know if it is working for you.

I've applied your patches (BTW: it is always best to make diffs relative to the source tree root. That way one doesn't have to search for where the patched files are...).

Ops, sorry I did not notice that the diff was sensitive to my current working directory.


t.info (#3394) and t.rast.extract (#3395) now work, but I suspect that the errors there came from the accent in the description of the strds.

Indeed the error was due to é character in the description. I was able to reproduce only the error with t.info. I added a test r71391.


I'm not able to reproduce these errors in: t.create, t.register, t.unregister, t.remove.

I executed the testsuite in grass/lib/python/temporal and grass/temporal with no regressions. But, since I was able to reproduce only the t.info error, it would be nice, before do the commit, to add tests in: t.create, t.register, t.unregister, t.remove that problem.

At the moment I'm not able to write the tests for them, someone can help me?

comment:6 by huhabla, 7 years ago

Moritz, can you please provide more information about the errors? Some of them occurred by calling self.get_id(). The ascii conversion of ids that are build using "name@mapset" produces an error. Does the map or the mapset contain special characters?

in reply to:  5 comment:7 by mlennert, 7 years ago

Replying to zarch:

I'm not able to reproduce these errors in: t.create, t.register, t.unregister, t.remove.

Ok, here I go again (leaving the stdrs desciption issue aside for the moment, so t.create is not an issue), with a completely fresh svn checkout of trunk, applied your two patches, 'temporal_encoding.diff​' and 'temporal_script_utils_encoding.diff​', compiled.

Get the demo location from the FOSS4G 2014 workshop site. Use the climate_2000_2012 mapset.

GRASS 7.3.svn (NC_spm_temporal_workshop):~ > g.version -g
version=7.3.svn
date=2017
revision=r71392M
build_date=2017-08-11
build_platform=x86_64-pc-linux-gnu
build_off_t_size=8
GRASS 7.3.svn (NC_spm_temporal_workshop):~ > locale
LANG=fr_BE
LANGUAGE=fr_BE
LC_CTYPE=fr_BE.UTF-8
LC_NUMERIC=C
LC_TIME=fr_BE.UTF-8
LC_COLLATE=fr_BE.UTF-8
LC_MONETARY=fr_BE.UTF-8
LC_MESSAGES=fr_BE.UTF-8
LC_PAPER=fr_BE.UTF-8
LC_NAME=fr_BE.UTF-8
LC_ADDRESS=fr_BE.UTF-8
LC_TELEPHONE=fr_BE.UTF-8
LC_MEASUREMENT=fr_BE.UTF-8
LC_IDENTIFICATION=fr_BE.UTF-8
LC_ALL=

Then follow the steps described there (with light modifications for ease):

t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
g.list type=raster pattern="*tempmean" > rasters_tempmean.list
t.register -i input=tempmean type=raster start=2000-01-01 increment="1 months" file=rasters_tempmean.list 
Collecte des informations sur la carte ...
 100%
Enregistrement des cartes dans la base temporelle ...
Enregistrement des cartes dans le jeu de données temporel ...
Mise à jour du jeu de données temporel ...
ERREUR :ascii,Mettre à jour les métadonnées, l'emprise spatiale et
        temporelle de toutes les cartes enregistrées de <,7,8,ordinal not
        in range(128)

t.list and t.list raster seem to show that everything is alright, but IIUC the output of t.info and t.topology seems to indicate that something went wrong:

t.info tempmean@climate_2000_2012
 +-------------------- Space Time Raster Dataset -----------------------------+
 |                                                                            |
 +-------------------- Basic information -------------------------------------+
 | Id: ........................ tempmean@climate_2000_2012
 | Name: ...................... tempmean
 | Mapset: .................... climate_2000_2012
 | Creator: ................... mlennert
 | Temporal type: ............. absolute
 | Creation time: ............. 2017-08-11 18:12:14.819514
 | Modification time:.......... 2017-08-11 18:12:14.819546
 | Semantic type:.............. mean
 +-------------------- Absolute time -----------------------------------------+
 | Start time:................. None
 | End time:................... None
 | Granularity:................ None
 | Temporal type of maps:...... None
 +-------------------- Spatial extent ----------------------------------------+
 | North:...................... None
 | South:...................... None
 | East:.. .................... None
 | West:....................... None
 | Top:........................ None
 | Bottom:..................... None
 +-------------------- Metadata information ----------------------------------+
 | Raster register table:...... raster_map_register_031cdb28ede04a64b03d115101b86c0b
 | North-South resolution min:. None
 | North-South resolution max:. None
 | East-west resolution min:... None
 | East-west resolution max:... None
 | Minimum value min:.......... None
 | Minimum value max:.......... None
 | Maximum value min:.......... None
 | Maximum value max:.......... None
 | Aggregation type:........... None
 | Number of registered maps:.. None
 |
 | Title:
 | Average temperature
 | Description:
 | Monthly temperature average in NC [deg C]
 | Command history:
 | # 2017-08-11 18:12:14 
 | t.create output="tempmean" type="strds"
 |     temporaltype="absolute" title="Average temperature"
 |     description="Monthly temperature average in NC [deg C]"
 | 
 +----------------------------------------------------------------------------+
t.topology tempmean@climate_2000_2012
 +-------------------- Basic information -------------------------------------+
 | Id: ........................ tempmean@climate_2000_2012
 | Name: ...................... tempmean
 | Mapset: .................... climate_2000_2012
 | Creator: ................... mlennert
 | Temporal type: ............. absolute
 | Creation time: ............. 2017-08-11 18:12:14.819514
 | Modification time:.......... 2017-08-11 18:12:14.819546
 | Semantic type:.............. mean
 +-------------------- Temporal topology -------------------------------------+
 | Is subset of dataset: ...... False
 | Temporal topology is: ...... invalid
 | Number of intervals: ....... 156
 | Invalid time stamps: ....... 0
 | Number of points: .......... 0
 | Number of gaps: ............ 0
 | Granularity: ............... 1 month
 +-------------------- Topological relations ---------------------------------+
 | Overlaps: .................. 0
 | Overlapped: ................ 0
 | Finishes: .................. 0
 | Started: ................... 0
 | Follows: ................... 155
 | Contains: .................. 0
 | Equal:...................... 0
 | Finished: .................. 0
 | Precedes: .................. 155
 | Starts: .................... 0
 | During: .................... 0
 +----------------------------------------------------------------------------+
t.remove -f tempmean@climate_2000_2012
Note: registered maps themselves have not been removed, only the strds
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 171, in <module>
    tgis.profile_function(main)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 85, in profile_function
    func()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 157, in main
    statement += sp.delete(dbif=dbif, execute=False)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 1959, in delete
    self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128)
t.unregister file=rasters_tempmean.list 
Unregister maps
 100%
Unregister maps from the temporal database
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 183, in <module>
    tgis.profile_function(main)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 85, in profile_function
    func()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 169, in main
    sp.update_from_registered_maps(dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 2295, in update_from_registered_maps
    " all registered maps of <%s>") % (self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

So, I still have exactly the same issues. My guess is that most of them come from the fact that the message strings are flagged as translatable and that in the French translation there are accents... The t.register one I find more difficult to understand.

in reply to:  6 comment:8 by mlennert, 7 years ago

Replying to huhabla:

Moritz, can you please provide more information about the errors? Some of them occurred by calling self.get_id(). The ascii conversion of ids that are build using "name@mapset" produces an error. Does the map or the mapset contain special characters?

No, as I said, I use a demo data set. I'm pretty sure it's an issue with the handling of message translations.

comment:9 by mlennert, 7 years ago

Maybe the solution needs to be found in lib/python/pygrass/messages ?

by huhabla, 7 years ago

Approach to fix the handling of utf-8 strings and translations in the temporal framework

in reply to:  9 ; comment:10 by huhabla, 7 years ago

Replying to mlennert:

Maybe the solution needs to be found in lib/python/pygrass/messages ?

Good point. I have tried to fix this issue and attached the "pygrass_temporal_gis_utf_handling.diff​" that must be applied to fresh trunk. Can you please test the changes? I am not able to convince my GRASS installation to use the German locale that i have set, so its a kind of dry test.

https://trac.osgeo.org/grass/raw-attachment/ticket/3392/pygrass_temporal_gis_utf_handling.diff

in reply to:  10 comment:11 by mlennert, 7 years ago

Replying to huhabla:

Replying to mlennert:

Maybe the solution needs to be found in lib/python/pygrass/messages ?

Good point. I have tried to fix this issue and attached the "pygrass_temporal_gis_utf_handling.diff​" that must be applied to fresh trunk. Can you please test the changes?

They don't seem to work:

t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/messages/__init__.py", line 95, in message_server
    libgis.G_verbose_message(message.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

and, even worse:

t.register -i input=tempmean type=raster start=2000-01-01 increment="1 months" file=rasters_tempmean.list 
Collecte des informations sur la carte ...
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/messages/__init__.py", line 93, in message_server
    libgis.G_debug(level, message.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
ATTENTION: Needed to restart the messenger server
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/messages/__init__.py", line 93, in message_server
    libgis.G_debug(level, message.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
ATTENTION: Needed to restart the messenger server
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/messages/__init__.py", line 93, in message_server
    libgis.G_debug(level, message.encode("utf-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
ATTENTION: Needed to restart the messenger server
[...]

I'm also not sure whether using u' strings is the solution, nor how portable this is between Python 2 and 3. Anna should probably have a look at this as she has quite some experience with the encoding issues...

I am not able to convince my GRASS installation to use the German locale that i have set, so its a kind of dry test.

You can use the GUI: Settings->Preferences->Appearance->Language settings.

However: looking at grasslibs_de.po I have the feeling that most temporal lib strings have not been translated to German, yet. Maybe these strings have only recently been translated to French, making the errors appear now ?

comment:12 by annakrat, 7 years ago

The problem mostly comes from calling gisenv function from script.utils which returns unicode. This was done some time ago to make the library Python3 compatible. So we eventually need to move towards unicode, but to have a quick fix for the release, we can keep the temporal library using bytestrings. This is a possible solution:

Index: lib/python/temporal/core.py
===================================================================
--- lib/python/temporal/core.py	(revision 71395)
+++ lib/python/temporal/core.py	(working copy)
@@ -542,9 +542,9 @@
     grassenv = gscript.gisenv()
 
     # Set the global variable for faster access
-    current_mapset = grassenv["MAPSET"]
-    current_location = grassenv["LOCATION_NAME"]
-    current_gisdbase = grassenv["GISDBASE"]
+    current_mapset = gscript.encode(grassenv["MAPSET"])
+    current_location = gscript.encode(grassenv["LOCATION_NAME"])
+    current_gisdbase = gscript.encode(grassenv["GISDBASE"])
 
     # Check environment variable GRASS_TGIS_RAISE_ON_ERROR
     if os.getenv("GRASS_TGIS_RAISE_ON_ERROR") == "True" or \
@@ -1107,6 +1107,7 @@
                         detect_types=self.dbmi.PARSE_DECLTYPES | self.dbmi.PARSE_COLNAMES)
                 self.connection.row_factory = self.dbmi.Row
                 self.connection.isolation_level = None
+                self.connection.text_factory = str
                 self.cursor = self.connection.cursor()
                 self.cursor.execute("PRAGMA synchronous = OFF")
                 self.cursor.execute("PRAGMA journal_mode = MEMORY")

plus all gisenv calls in temporal modules need to be encoded to not let any unicode into the library. I would appreciate if somebody could test it more thoroughly. This would be a temporary solution, we need to move to unicode eventually, but I think that would require more time to fix and more significant changes.

Also, this works only for sqlite3 backend, I haven't looked at postgres...

in reply to:  12 ; comment:13 by zarch, 7 years ago

Hi Anna,

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode. This was done some time ago to make the library Python3 compatible. So we eventually need to move towards unicode, but to have a quick fix for the release, we can keep the temporal library using bytestrings.

I didn't have time to test your solution yet.

Actually for me it is still not clear where should we use unicode and where bytes. Do you have some general guidelines?

In the changes that I did, I transformed everything in bytes, and now the tests that I added to reproduce the ticket's error are passing, so the ticket is "theoretically" fixed. However this solution can only work with python2, because in the temporal library now I'm summing bytes with unicode, that can work only with python2 that does not distinguish between them. In python3 this operation raise an error. In general due to bytes limitation in python3 (mainly there is not format) perhaps we should use just unicode within the python code...

Therefore I was thinking to replace all the grass.script.utils encode function that I added in grass/lib/temporal with decode to ensure that everything is unicode. And then replace the print function with pygrass.messages.Messager to eventually translate all the python unicode back to bytes just before printing them to system terminal.

So I'm not sure when is the right moment to transform unicode to bytes.

by zarch, 7 years ago

Attachment: temporal_encode.patch added

Transform every variable to bytes

in reply to:  12 ; comment:14 by mlennert, 7 years ago

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode.

So you are saying that this has nothing to do with translated message strings ?

This was done some time ago to make the library Python3 compatible. So we eventually need to move >towards unicode, but to have a quick fix for the release, we can keep the temporal library using >bytestrings. This is a possible solution:

Index: lib/python/temporal/core.py
===================================================================
--- lib/python/temporal/core.py	(revision 71395)
+++ lib/python/temporal/core.py	(working copy)
@@ -542,9 +542,9 @@
     grassenv = gscript.gisenv()
 
     # Set the global variable for faster access
-    current_mapset = grassenv["MAPSET"]
-    current_location = grassenv["LOCATION_NAME"]
-    current_gisdbase = grassenv["GISDBASE"]
+    current_mapset = gscript.encode(grassenv["MAPSET"])
+    current_location = gscript.encode(grassenv["LOCATION_NAME"])
+    current_gisdbase = gscript.encode(grassenv["GISDBASE"])
 
     # Check environment variable GRASS_TGIS_RAISE_ON_ERROR
     if os.getenv("GRASS_TGIS_RAISE_ON_ERROR") == "True" or \
@@ -1107,6 +1107,7 @@
                         detect_types=self.dbmi.PARSE_DECLTYPES | self.dbmi.PARSE_COLNAMES)
                 self.connection.row_factory = self.dbmi.Row
                 self.connection.isolation_level = None
+                self.connection.text_factory = str
                 self.cursor = self.connection.cursor()
                 self.cursor.execute("PRAGMA synchronous = OFF")
                 self.cursor.execute("PRAGMA journal_mode = MEMORY")

Just the line

self.connection.text_factory = str

already suffices to solve the above-mentioned issues (and I've gone a bit further in the tutorial and tested a series of other t.* modules, but haven't encountered any other error) but it's better to also protect against non-ascii gisdbase variables which wasn't an issue here for me.

Thanks Anna !!

I have to admit that I have no idea how this works. Some form of documentation of the entire system would be very helpful, in line with what Pietro said.

plus all gisenv calls in temporal modules need to be encoded to not let any unicode into the library. I would appreciate if somebody could test it more thoroughly. This would be a temporary solution, we need to move to unicode eventually, but I think that would require more time to fix and more significant changes.

Can we at least already commit this small change which already makes a big difference ?

Also, this works only for sqlite3 backend, I haven't looked at postgres...

How does the database backend come in when dealing with these strings ?

Moritz

in reply to:  13 comment:15 by mlennert, 7 years ago

Replying to zarch:

Hi Anna,

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode. This was done some time ago to make the library Python3 compatible. So we eventually need to move towards unicode, but to have a quick fix for the release, we can keep the temporal library using bytestrings.

I didn't have time to test your solution yet.

Actually for me it is still not clear where should we use unicode and where bytes. Do you have some general guidelines?

In the changes that I did, I transformed everything in bytes, and now the tests that I added to reproduce the ticket's error are passing, so the ticket is "theoretically" fixed.

Using only the temporal_encode.patch​ you uploaded this morning and applying it to a fresh svn tree, I still get the same error with t.register and the weird output of t.info and t.topology. t.remove and t.unregister do not throw an error when run in that order, but when I run t.unregister before running t.remove I get the same error.

Also, there is a lot of code such as this:

 print("nsres_min=" + encode(self.get_nsres_min()))
 print("nsres_max=" + encode(self.get_nsres_max()))
 print("ewres_min=" + encode(self.get_ewres_min()))
 print("ewres_max=" + encode(self.get_ewres_max()))

and

string += encode(id) + fs + encode(start) + fs + encode(end)
string += fs + encode(stats["mean"]) + fs + encode(stats["min"])
string += fs + encode(stats["max"]) + fs + encode(stats["mean_of_abs"])
string += fs + encode(stats["stddev"]) + fs + encode(stats["variance"])
string += fs + encode(stats["coeff_var"]) + fs + encode(stats["sum"])
string += fs + encode(stats["null_cells"]) + fs + encode(stats["cells"])

All the info transformed to strings here are numeric values that come from within the module. I don't think we should ever encode such values, or ?

in reply to:  14 comment:16 by mlennert, 7 years ago

Replying to mlennert:

Just the line

self.connection.text_factory = str

already suffices to solve the above-mentioned issues

But it only works with Python2. With Python3, I get:

python --version
Python 3.5.4rc1

t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 86, in <module>
    main()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 68, in main
    import grass.temporal as tgis
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/__init__.py", line 3, in <module>
    from .core import *
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 44, in <module>
    from .c_libraries_interface import *
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/c_libraries_interface.py", line 27, in <module>
    from grass.pygrass.rpc.base import RPCServerBase
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/rpc/__init__.py", line 21, in <module>
    from grass.pygrass.vector import *
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/pygrass/vector/__init__.py", line 6, in <module>
    libgis.G_gisinit('')
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/lib/gis.py", line 3553, in G_gisinit
    return (G__gisinit (GIS_H_VERSION, pgm))
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type

But then again, the same happens without Anna's patch, so this is a different issue.

That's probably a point to keep in mind for a future code sprint: systematically test with python 3 and debug the issues...

in reply to:  12 comment:17 by mlennert, 7 years ago

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode. This was done some time ago to make the library Python3 compatible. So we eventually need to move towards unicode, but to have a quick fix for the release, we can keep the temporal library using bytestrings. This is a possible solution:

Index: lib/python/temporal/core.py
===================================================================
--- lib/python/temporal/core.py	(revision 71395)
+++ lib/python/temporal/core.py	(working copy)
@@ -542,9 +542,9 @@
     grassenv = gscript.gisenv()
 
     # Set the global variable for faster access
-    current_mapset = grassenv["MAPSET"]
-    current_location = grassenv["LOCATION_NAME"]
-    current_gisdbase = grassenv["GISDBASE"]
+    current_mapset = gscript.encode(grassenv["MAPSET"])
+    current_location = gscript.encode(grassenv["LOCATION_NAME"])
+    current_gisdbase = gscript.encode(grassenv["GISDBASE"])
 
     # Check environment variable GRASS_TGIS_RAISE_ON_ERROR
     if os.getenv("GRASS_TGIS_RAISE_ON_ERROR") == "True" or \
@@ -1107,6 +1107,7 @@
                         detect_types=self.dbmi.PARSE_DECLTYPES | self.dbmi.PARSE_COLNAMES)
                 self.connection.row_factory = self.dbmi.Row
                 self.connection.isolation_level = None
+                self.connection.text_factory = str
                 self.cursor = self.connection.cursor()
                 self.cursor.execute("PRAGMA synchronous = OFF")
                 self.cursor.execute("PRAGMA journal_mode = MEMORY")

Can we apply this patch to trunk ? And could we even apply this to release72 to get it into 7.2.2, or is it too risky ?

comment:18 by annakrat, 7 years ago

In 71411:

temporal lib: use str not unicode to fix problems with translated messages, see #3392

comment:19 by annakrat, 7 years ago

I applied that one line to trunk. It needs testing before we backport it. This works very likely only with Python 2, not Python 3. For that we need to do probably substantial changes, which I haven't looked at yet.

I was previously running into more problems, but I can't reproduce it now. So please test.

in reply to:  19 comment:20 by mlennert, 7 years ago

Replying to annakrat:

I applied that one line to trunk.

Great, thanks a lot !

It needs testing before we backport it. This works very likely only with Python 2, not Python 3. For that we need to do probably substantial changes, which I haven't looked at yet.

As I mentioned before: even without your patch I get errors trying to run with Python 3, so this seems to be an independent issue, or ?

So please test.

Will do.

in reply to:  19 ; comment:21 by mlennert, 7 years ago

Replying to annakrat:

I applied that one line to trunk. It needs testing before we backport it. This works very likely only with Python 2, not Python 3. For that we need to do probably substantial changes, which I haven't looked at yet.

I was previously running into more problems, but I can't reproduce it now. So please test.

Just did and I don't understand why, but I again have issues (see below), but not the same as before. I'm a bit lost as to why it worked before, but not now....

export LANGUAGE=fr_BE;t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 86, in <module>
    main()
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.create", line 82, in main
    semantic, None, grass.overwrite())
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/open_stds.py", line 174, in open_new_stds
    sp.insert(dbif)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_space_time_dataset.py", line 376, in insert
    self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)

And if I go on with LANGUAGE=C to create the strds:

export LANGUAGE=C;t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
g.list type=raster pattern="*tempmean" > rasters_tempmean.list
export LANGUAGE=fr_BE;t.register -i input=tempmean type=raster start=2000-01-01 increment="1 months" file=rasters_tempmean.list 
Collecte des informations sur la carte ...
ERREUR :ascii,Définir un temps absolu valide pour la carte <,1,2,ordinal
        not in range(128)

and then

export LANGUAGE=C;t.register -i input=tempmean type=raster start=2000-01-01 increment="1 months" file=rasters_tempmean.list 
Gathering map information...
 100%
Registering maps in the temporal database...
Registering maps in the space time dataset...
Updating space time dataset...
 100%

and then

export LANGUAGE=fr_BE;t.remove tempmean
Note: registered maps themselves have not been removed, only the strds

but

export LANGUAGE=fr_BE;t.unregister file=rasters_tempmean.list
Unregister maps
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 183, in <module>
    tgis.profile_function(main)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 84, in profile_function
    func()
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 142, in main
    statement += map.delete(dbif=dbif, update=False, execute=False)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_map_dataset.py", line 860, in delete
    "database") % (self.get_type(), self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

in reply to:  21 ; comment:22 by annakrat, 7 years ago

Replying to mlennert:

Replying to annakrat:

I applied that one line to trunk. It needs testing before we backport it. This works very likely only with Python 2, not Python 3. For that we need to do probably substantial changes, which I haven't looked at yet.

I was previously running into more problems, but I can't reproduce it now. So please test.

Just did and I don't understand why, but I again have issues (see below), but not the same as before. I'm a bit lost as to why it worked before, but not now....

I don't understand why it behaved differently, try to apply the rest of the patch.

in reply to:  22 comment:23 by mlennert, 7 years ago

Replying to annakrat:

Replying to mlennert:

Replying to annakrat:

I applied that one line to trunk. It needs testing before we backport it. This works very likely only with Python 2, not Python 3. For that we need to do probably substantial changes, which I haven't looked at yet.

I was previously running into more problems, but I can't reproduce it now. So please test.

Just did and I don't understand why, but I again have issues (see below), but not the same as before. I'm a bit lost as to why it worked before, but not now....

I don't understand why it behaved differently, try to apply the rest of the patch.

I guess you mean

Index: lib/python/temporal/core.py
===================================================================
--- lib/python/temporal/core.py	(révision 71411)
+++ lib/python/temporal/core.py	(copie de travail)
@@ -542,9 +542,9 @@
     grassenv = gscript.gisenv()
 
     # Set the global variable for faster access
-    current_mapset = grassenv["MAPSET"]
-    current_location = grassenv["LOCATION_NAME"]
-    current_gisdbase = grassenv["GISDBASE"]
+    current_mapset = gscript.encode(grassenv["MAPSET"])
+    current_location = gscript.encode(grassenv["LOCATION_NAME"])
+    current_gisdbase = gscript.encode(grassenv["GISDBASE"])
 
     # Check environment variable GRASS_TGIS_RAISE_ON_ERROR
     if os.getenv("GRASS_TGIS_RAISE_ON_ERROR") == "True" or \

This solves the issue with t.create and t.register, but not with t.unregister:

t.unregister file=rasters_tempmean.list
Unregister maps
Traceback (most recent call last):
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 183, in <module>
    tgis.profile_function(main)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 84, in profile_function
    func()
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 142, in main
    statement += map.delete(dbif=dbif, update=False, execute=False)
  File "/data/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_map_dataset.py", line 860, in delete
    "database") % (self.get_type(), self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

comment:24 by annakrat, 7 years ago

Yes, I would expect that. As I mentioned before, everywhere in t.* modules where we use gisenv() we need to run encode, otherwise unicode gets in the temporal library and results in these errors. I didn't have problems today when I tested it, not sure why but before it was needed. I don't have a diff for it right now.

in reply to:  24 comment:25 by mlennert, 7 years ago

Replying to annakrat:

Yes, I would expect that. As I mentioned before, everywhere in t.* modules where we use gisenv() we need to run encode, otherwise unicode gets in the temporal library and results in these errors.

That seems to have been it. Encoding all other gisenv() variables in core.py solves all error for me. I've committed this in trunk: r71422. Thanks once again, Anna !

Please others test to see if this causes any regressions for you.

I don't have the feeling that this fix is particularly invasive, but don't know enough about the python libs nor encoding.

Through too rapid typing I stumbled upon another encoding error by accident, but this does not seem as urgent to me:

t.create output=tempmean type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"

Then (note the typo in the strds name):

t.remove tempmeanµ
Traceback (most recent call last):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 171, in <module>
    tgis.profile_function(main)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 84, in profile_function
    func()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/scripts/t.remove", line 115, in main
    sp = tgis.open_old_stds(name, type, dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/open_stds.py", line 67, in open_old_stds
    if not sp.is_in_db(dbif):
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_dataset.py", line 368, in is_in_db
    return self.base.is_in_db(dbif)
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/base.py", line 307, in is_in_db
    sql = self.get_is_in_db_statement()
  File "/home/mlennert/SRC/GRASS/grass_trunk/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/base.py", line 296, in get_is_in_db_statement
    " WHERE id = \'" + str(self.ident) + "\';\n"
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 8: ordinal not in range(128)

Not sure how far we have to go in encoding SQL strings as well ?

Also: the checking of strds names does not seem to strict enough:

t.create output=tempmeanµ type=strds temporaltype=absolute title="Average temperature" description="Monthly temperature average in NC [deg C]"
ATTENTION: Nom de fichier <tempmeanµ> non autorisé. Caractère <�>
           interdit.

but then:

t.list
----------------------------------------------
Space time raster datasets with absolute time available in mapset <climate_2000_2012>:
[...]
tempmeanµ@climate_2000_2012

So I'm being told through a warning that the character 'µ' is forbidden, but the strds is still created. I then cannot remove it because of the above error... Maybe the above warning about the forbidden character should accompany a fatal error, not a warning ? But this is a different bug, so I should probably write up a separate report. And IMHO this is not as urgent as the committed fixes.

comment:26 by annakrat, 7 years ago

In 71423:

temporal: encode mapset string to avoid injecting unicode into temporal library, see #3392

comment:27 by annakrat, 7 years ago

Does t.unregister work for you if you don't specify mapset in the strds name? This is what I got:

Unregister maps
Traceback (most recent call last):
  File "/home/anna/dev/grass/trunk2/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 183, in <module>
    tgis.profile_function(main)
  File "/home/anna/dev/grass/trunk2/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/core.py", line 84, in profile_function
    func()
  File "/home/anna/dev/grass/trunk2/dist.x86_64-pc-linux-gnu/scripts/t.unregister", line 142, in main
    statement += map.delete(dbif=dbif, update=False, execute=False)
  File "/home/anna/dev/grass/trunk2/dist.x86_64-pc-linux-gnu/etc/python/grass/temporal/abstract_map_dataset.py", line 860, in delete
    "database") % (self.get_type(), self.get_id()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

I committed the fix above.

I think this should go to the release. But we will need to come back to this and rethink it to enable Python3. I unfortunately don't know yet what needs to be done for that, I will look at Pietro's changes.

in reply to:  27 comment:28 by mlennert, 7 years ago

Replying to annakrat:

Does t.unregister work for you if you don't specify mapset in the strds name?

t.unregister input=tempmean file=rasters_tempmean.list
Unregister maps
 100%
Unregister maps from space time dataset <tempmean>

So, yes, it worked, with or without your latest commit.

I committed the fix above.

Duh. I was focused on lib/python/temporal and completely forgot to check the modules for gisenv() calls. Sorry, just too many other things on the platter these days. Thanks for catching this.

I think this should go to the release. But we will need to come back to this and rethink it to enable Python3. I unfortunately don't know yet what needs to be done for that, I will look at Pietro's changes.

Ok. I'll try to find time tomorrow to merge it all into release72.

comment:29 by mlennert, 7 years ago

In 71426:

temporal lib: use str not unicode and encode variables resulting from gisenv() (backport from trunk, r71411, r71422, r71423, see #3392)

in reply to:  12 ; comment:30 by glynn, 7 years ago

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode.

That shouldn't be the case. Nothing in the grass.script module should ever return unicode. Because if it does, anything which isn't wxgui will just have to turn it back into bytes again. Which is more error-prone than just leaving it as bytes in the first place.

in reply to:  30 ; comment:31 by annakrat, 7 years ago

Replying to glynn:

Replying to annakrat:

The problem mostly comes from calling gisenv function from script.utils which returns unicode.

That shouldn't be the case. Nothing in the grass.script module should ever return unicode. Because if it does, anything which isn't wxgui will just have to turn it back into bytes again. Which is more error-prone than just leaving it as bytes in the first place.

Since r65787 script.gisenv() returns unicode, because of Python3 support as the commit message says. We need to have a discussion about Python3 and what should be the strategy to make the code compatible.

in reply to:  31 ; comment:32 by glynn, 7 years ago

Replying to annakrat:

Since r65787 script.gisenv() returns unicode, because of Python3 support as the commit message says. We need to have a discussion about Python3 and what should be the strategy to make the code compatible.

"because of Python3 support" isn't a reason. "Python 3 support" basically means preventing Python 3 from automatically converting everything to unicode. Python 3 can still handle byte strings; which is fortunate, because I don't expect the C portion of GRASS to be converted to unicode in the foreseeable future (if ever).

The functions in grass.script should never return unicode. That module reads the contents of files (which are sequences of bytes) and the output from processes (again, sequences of bytes). Allowing Python 3 to convert byte strings to unicode (and requiring consumers to convert them back to byte strings) is error prone. In the few cases where Python 3 refuses to return the raw byte strings (e.g. sys.argv, os.environ on Windows), grass.script should encode them for compatibility with Python 2. In all other cases, the byte strings should be returned as-is.

in reply to:  32 ; comment:33 by mlennert, 7 years ago

Replying to glynn:

The functions in grass.script should never return unicode. That module reads the contents of files (which are sequences of bytes) and the output from processes (again, sequences of bytes).

Is this true for translated module messages as well ?

comment:34 by neteler, 7 years ago

Milestone: 7.2.27.2.3

Ticket retargeted after milestone closed

in reply to:  33 ; comment:35 by glynn, 7 years ago

Replying to mlennert:

Replying to glynn:

The functions in grass.script should never return unicode. That module reads the contents of files (which are sequences of bytes) and the output from processes (again, sequences of bytes).

Is this true for translated module messages as well ?

Yes. Anything which originates as a byte string (e.g. output from commands, contents of files) should be kept as a byte string. Anything which (for whatever reason) originates as a unicode string should be encoded at the earliest opportunity.

in reply to:  35 ; comment:36 by mlennert, 7 years ago

Replying to glynn:

Replying to mlennert:

Replying to glynn:

The functions in grass.script should never return unicode. That module reads the contents of files (which are sequences of bytes) and the output from processes (again, sequences of bytes).

Is this true for translated module messages as well ?

Yes. Anything which originates as a byte string (e.g. output from commands, contents of files) should be kept as a byte string. Anything which (for whatever reason) originates as a unicode string should be encoded at the earliest opportunity.

Thanks for the explanations, Glynn. However, I'm really a bit lost in this whole discussion. One example: I always thought that unicode was an abstract representation, but that in practice a unicode string was always encoded in some encoding, with Python 3 using UTF-8 by default. Or does Python 3 really store the actual unicode code points ?

And what would it take for our C code to be converted to unicode ?

I think we need two things to go further on this:

  • Write up (or link to) a clear explanation of the unicode issues in Python, in the specific context of GRASS GIS with links between Python and C code, so that we make sure we have a common understanding.
  • Make it one of our priorities, probably better after the 7.4 release, to sort things out in a coherent and clearly documented manner (including best practice guide to developers), in order to avoid this issue bothering us for years to come.

Moritz

in reply to:  36 comment:37 by glynn, 7 years ago

Replying to mlennert:

One example: I always thought that unicode was an abstract representation, but that in practice a unicode string was always encoded in some encoding, with Python 3 using UTF-8 by default. Or does Python 3 really store the actual unicode code points ?

In Python, a unicode string is an array of wide characters. Typically 16 bits on Windows, 32 bits on Unix, to match the platform's wchar_t type. This allows strings to be passed directly to the platform's APIs without conversion.

And what would it take for our C code to be converted to unicode ?

Re-writing almost everything that uses strings. Also, anything arriving as a byte string would need an associated encoding or it may as well be discarded.

Internally, strings could be stored either as wchar_t[] or as char[] using UTF-8. But whichever representation was used, you would need to perform conversions for some platforms: Unix uses char[] and doesn't much care about encoding, Windows uses wchar_t[] (char[]-based APIs such as fopen() are considered "legacy" and can't be used with e.g. filenames which aren't valid in the current codepage).

comment:38 by martinl, 6 years ago

Milestone: 7.2.3

Ticket retargeted after milestone closed

comment:39 by martinl, 6 years ago

Milestone: 7.2.4

comment:40 by mlennert, 5 years ago

Has the work on Python 3 changed anything for this discussion, here ?

in reply to:  40 ; comment:41 by veroandreo, 5 years ago

Replying to mlennert:

Has the work on Python 3 changed anything for this discussion, here ?

Testing the commands provided in comment:21 gives no error here (trunk, r74471M) within a python3 virtual environment. Also no errors in 76 relbranch.

Tell me if I can test further, but I'd suggest closing since the reported issue seems to be gone.

in reply to:  41 comment:42 by mlennert, 5 years ago

Resolution: fixed
Status: newclosed

Replying to veroandreo:

Replying to mlennert:

Has the work on Python 3 changed anything for this discussion, here ?

Testing the commands provided in comment:21 gives no error here (trunk, r74471M) within a python3 virtual environment. Also no errors in 76 relbranch.

Tell me if I can test further, but I'd suggest closing since the reported issue seems to be gone.

Thanks a lot for testing. As it works for you, I'm closing. We can reopen if needed.

Note: See TracTickets for help on using tickets.