Opened 5 years ago

Closed 13 months ago

#2532 closed defect (wontfix)

TypeError: environment can only contain string when launching script on Windows

Reported by: annakrat
Owned by: grass-dev@…
Priority: normal
Milestone: 7.0.7
Component: Default
Version: svn-trunk
Keywords: encoding
Cc:
CPU: Unspecified
Platform: MSWindows 8

Description

When launching a Python script via GUI - File - Launch script, I am asked to add the path to GRASS_ADDON_PATH. I did so and the script ran successfully. However, I am not able to run any command afterwards because of a Python error (TypeError: environment can only contain string). The problem is that the script path is of unicode type (although I am using only ASCII letters). The solution is to encode the script path, but with which encoding? And how is it going to be decoded?

A temporary solution is to reject any script whose path contains non-ASCII letters and just use str().

Change History (22)

comment:1 in reply to:  description ; Changed 5 years ago by glynn

Replying to annakrat:

The problem is the script path is unicode type (although I am using only ascii letters).

wxPython uses Unicode for almost everything. So retrieving the contents of a text field will return a Python unicode value.

The solution is to encode the script path, but with which encoding? And how it is going to be decoded?

It won't be decoded. The byte string will be available to the called program as a char* via getenv() (for C) or os.environ (Python).

A temporary solution is to reject any scripts with path with non-ascii letters and just use str().

wxGUI's core.gcmd module has EncodeString() and DecodeString() methods which use whatever wxGUI considers to be the "system" encoding. Those are used by gcmd.Popen for converting the arguments to strings and by gcmd.RunCommand() for converting the process' output to unicode.
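A minimal sketch of what such a pair of helpers might look like, written for Python 3 (where text is `str` and byte strings are `bytes`). The names `system_encoding`, `encode_string` and `decode_string` are hypothetical and this is not the actual core.gcmd code; it only assumes that the "system" encoding is the one reported by `locale.getpreferredencoding()`.

```python
import locale


def system_encoding():
    """Return the locale's preferred encoding, falling back to UTF-8."""
    return locale.getpreferredencoding(False) or "utf-8"


def encode_string(text, encoding=None):
    """Encode text to bytes; values that are already bytes pass through."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding or system_encoding())


def decode_string(data, encoding=None):
    """Decode bytes to text; values that are already text pass through."""
    if isinstance(data, str):
        return data
    return data.decode(encoding or system_encoding())
```

Passing already-converted values through unchanged keeps the helpers safe to call twice on the same value.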

comment:2 in reply to:  1 Changed 5 years ago by annakrat

Replying to glynn:

Replying to annakrat:

wxGUI's core.gcmd module has EncodeString() and DecodeString() methods which use whatever wxGUI considers to be the "system" encoding. Those are used by gcmd.Popen for converting the arguments to strings and by gcmd.RunCommand() for converting the process' output to unicode.

OK, I used EncodeString, but then with non-ASCII characters I get the following (an ASCII-only path works fine now):

Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\gui\wxpython\lmgr\frame.py", line 842, in OnRunScript
    filename = EncodeString(filename)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\gui\wxpython\core\gcmd.py", line 101, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0165' in position 40: character maps to <undefined>

I have seen this error in several other tickets, is there something we can do about it?

comment:3 Changed 5 years ago by annakrat

I see what you were writing in #2525. So should we just catch the exception and tell the user: sorry, don't use non-ASCII characters in the script path (and change your operating system)?

comment:4 in reply to:  3 Changed 5 years ago by glynn

Replying to annakrat:

I see what you were writing in #2525. So should we just catch the exception and tell the user: sorry, don't use non-ASCII characters in the script path (and change your operating system)?

It's not "non-ASCII" characters per se, it's characters which aren't representable in your system codepage (configurable on Windows 7 via Control Panel -> Region and Language -> Administrative -> Change system locale ...).

For Western European languages, the system locale's encoding will be cp1252, which is basically ISO-8859-1 but with most of the C1 control codes (\x80-\x9f) remapped to additional graphic characters.

U+0165 is present in cp1250 (Eastern European, similar to ISO-8859-2).
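The difference between the code pages can be checked directly with Python's bundled codecs. This is a small Python 3 sketch; `can_encode` is a hypothetical helper name introduced only for illustration.

```python
char = "\u0165"  # LATIN SMALL LETTER T WITH CARON, the character from the traceback


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for codepage in ("cp1252", "cp1250", "iso8859-2"):
    print(codepage, can_encode(char, codepage))

# cp1252 and ISO-8859-1 disagree in the \x80-\x9f range:
# cp1252 maps 0x80 to the euro sign, ISO-8859-1 maps it to a C1 control code.
print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin-1")))
```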

It appears that Windows has a mechanism for approximating accented characters; if I create a directory whose name contains that character, the "dir" command (in a console using cp1252) shows the directory with the character replaced by "t", and I can "cd" into the directory. Unfortunately, this feature doesn't appear to be accessible via Python.
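A rough approximation of that behaviour can be sketched in Python 3 with the standard `unicodedata` module, by decomposing each character and dropping the combining marks. This is not the Windows mechanism itself, and there is no guarantee that the resulting name is one Windows will accept for the same file; `approximate_ascii` is a hypothetical helper name.

```python
import unicodedata


def approximate_ascii(text):
    """Strip combining marks after compatibility decomposition (NFKD).

    Characters without a decomposition are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(approximate_ascii("\u0165est"))
```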

comment:5 Changed 5 years ago by annakrat

I used EncodeString in r63997, r63998. I tested it successfully on Windows (cp1252) with ASCII characters; non-ASCII characters which are not present in cp1252 result in an error dialog with a message on how to avoid that. However, I failed to run the script when the name contained a non-ASCII character that is present in cp1252 (á). I don't get any error, but in the GUI console I get:

Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...                
(Thu Jan 08 12:04:24 2015)                                                                                          
Description:
 Adds the values of two rasters (A + B)
Keywords:
 raster, algebra, sum
Usage:
 test_workshopá.py araster=name braster=name output=name
[--overwrite]
   [--help] [--verbose] [--quiet] [--ui]
Flags:
 --o   Allow output files to overwrite existing files
 --h   Print usage summary
 --v   Verbose module output
 --q   Quiet module output
 --ui  Force launching GUI dialog
Parameters:
  araster   Name of input raster A in an expression A + B
  braster   Name of input raster B in an expression A + B
   output   Name for output raster map
ERROR: Required parameter <araster> not set:
        (Name of input raster A in an expression A + B)
ERROR: Required parameter <braster> not set:
        (Name of input raster B in an expression A + B)
ERROR: Required parameter <output> not set:
        (Name for output raster map)
(Thu Jan 08 12:04:25 2015) Command finished (0 sec) 

comment:6 Changed 5 years ago by annakrat

Priority: major → normal

I backported r63997, r63998 in r64102. Now it's working at least with ascii characters on the path.

comment:7 in reply to:  5 ; Changed 5 years ago by glynn

Replying to annakrat:

However, I failed to run the script when the name contained non-ascii characters present in cp1252 (á). I don't get any error, but in gui console I get:

Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...                

Is the "..." literal? I.e. does the GUI omit the arguments, or does it include details which have been omitted from the ticket?

ERROR: Required parameter <araster> not set:

Can you get any more debug output?

comment:8 in reply to:  7 ; Changed 5 years ago by annakrat

Replying to glynn:

Replying to annakrat:

However, I failed to run the script when the name contained non-ascii characters present in cp1252 (á). I don't get any error, but in gui console I get:

Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...                

Is the "..." literal? I.e. does the GUI omit the arguments, or does it include details which have been omitted from the ticket?

That comes from here, there are no details, it's run without any arguments.

ERROR: Required parameter <araster> not set:

Can you get any more debug output?

Will try.

comment:9 in reply to:  8 Changed 5 years ago by annakrat

Replying to annakrat:

Replying to glynn:

Can you get any more debug output?

Will try.

With debug messages on I get in the GUI console:

Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...                
(Thu Jan 08 12:04:24 2015)                                                      
C:\Users\akratoc\Desktop\test_workshopá.py    
D2/5: filename = C:\Users\akratoc\Desktop\test_workshopá.py
D1/5: G_set_program_name(): test_workshopá
D2/5: G_file_name(): path = C:\Users\akratoc\grassdata/nc_basic_spm_grass7/user1
Description:
... and the same as above

and in the terminal window:

GUI D5/5: EncodeString(): enc=cp1252
D1/5: grass.script.core.start_command(): g.gisenv -n
D1/5: G_set_program_name(): g.gisenv
D2/5: G_option_to_separator(): key = separator -> sep = '
'
GUI D1/5: gcmd.CommandThread(): C:\Users\akratoc\Desktop\test_workshopá.py
GUI D5/5: EncodeString(): enc=cp1252
GUI D5/5: EncodeString(): enc=cp1252

It doesn't seem particularly helpful but I don't know what else I can do.

comment:10 in reply to:  8 ; Changed 5 years ago by glynn

Replying to annakrat:

That comes from here, there are no details, it's run without any arguments.

I see.

It's executing the script, which is executing g.parser, which is reading the option definitions from the script then calling G_parser(). As it's called without arguments, G_parser() should be generating a GUI dialog, but it's not even attempting to do that; it's falling through to the option-checking code.

AFAICT, in order for that error message to occur, either argc would have to be at least 2 or isatty(0) would have to be false. But if argc >= 2, that would result in the value of argv[1] being used as the value for araster= (even if it's an empty string), which would prevent the "Required parameter <araster> not set" error.

Which leaves isatty(0) being false. But that shouldn't have anything to do with whether the script filename contains non-ASCII characters. It might be something to do with wxGUI, or it might be Windows weirdness.
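The reasoning above can be modelled as a small decision function. This is only a sketch of the logic as described in this comment, not the actual G_parser() source; `parser_action` and its return values are hypothetical names.

```python
def parser_action(argv, stdin_is_tty):
    """Model the choice between opening a dialog and checking options.

    The dialog is only considered when there are no arguments
    (argc < 2) and standard input is a terminal.
    """
    if len(argv) < 2 and stdin_is_tty:
        return "gui_dialog"
    return "check_options"


# No arguments, but stdin is not a terminal: the option checks run
# and report the missing required parameters.
print(parser_action(["test_workshop.py"], stdin_is_tty=False))
```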

Can you add the following to the script, before the call to grass.parser():

import os
print os.isatty(0)

comment:11 in reply to:  10 ; Changed 5 years ago by annakrat

Replying to glynn:

Replying to annakrat:

Can you add the following to the script, before the call to grass.parser():

import os
print os.isatty(0)

It gives me False. I will try to see if there is something wrong in the gui part.

comment:12 in reply to:  11 Changed 5 years ago by glynn

Replying to annakrat:

It gives me False.

Presumably that's only the case when the script filename has non-ASCII characters?

comment:13 in reply to:  11 ; Changed 5 years ago by annakrat

Replying to annakrat:

Replying to glynn:

Replying to annakrat:

It gives me False. I will try to see if there is something wrong in the gui part.

I found that an exception is raised and ignored here, and if I remove the try/except block, I get:

Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\lmgr\frame.py", line 907, in OnRunScript
    self._gconsole.RunCmd([filename])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 554, in RunCmd
    task = gtask.parse_interface(command[0])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 509, in parse_interface
    tree = etree.fromstring(get_interface_description(name))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 465, in get_interface_description
    stderr=PIPE)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\core.py", line 62, in __init__
    subprocess.Popen.__init__(self, args, **kwargs)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\subprocess.py", line 922, in _execute_child
    args = '{} /c "{}"'.format (comspec, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 38: ordinal not in range(128)

The command[0] is Unicode. It seems Popen in Python 2.7 can't handle non-ASCII characters. So I tried to encode the command string and I get a different error:

Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\lmgr\frame.py", line 907, in OnRunScript
    self._gconsole.RunCmd([filename])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 555, in RunCmd
    task = gtask.parse_interface(EncodeString(command[0]))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 509, in parse_interface
    tree = etree.fromstring(get_interface_description(name))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

It seems that get_interface_description returns empty xml. I didn't have time to look into it further.

comment:14 in reply to:  13 ; Changed 5 years ago by glynn

Replying to annakrat:

The command[0] is Unicode. It seems Popen in Python 2.7 can't handle non-ascii characters.

It's more accurate to say that it can't handle unicode. Or, more precisely, unicode which cannot be implicitly converted to a string. Implicit conversions use the default encoding (which is typically ASCII) rather than the locale's encoding. The default encoding is a system or user preference and cannot be changed by scripts.
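Python 3 no longer performs this implicit conversion, but its effect can be emulated by encoding explicitly with the ASCII codec. The sketch below uses the script path from this ticket; `implicit_py2_conversion` is a hypothetical name for the emulation, not a real Python API.

```python
path = "C:\\Users\\akratoc\\Desktop\\test_workshop\u00e1.py"


def implicit_py2_conversion(text, default_encoding="ascii"):
    """Emulate Python 2 turning a unicode value into a byte string implicitly."""
    return text.encode(default_encoding)


failure = None
try:
    implicit_py2_conversion(path)
except UnicodeEncodeError as exc:
    # Record which codec failed, where, and on which character.
    failure = (exc.encoding, exc.start, exc.object[exc.start])

print(failure)

# Encoding explicitly with the locale's code page succeeds for this path.
explicit = path.encode("cp1252")
```

The failing position reported here corresponds to the "position 38" in the traceback quoted above.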

So I tried to encode the command string and I get different error:

raise err
xml.etree.ElementTree.ParseError

It seems that get_interface_description returns empty xml

Did you confirm that?

Otherwise, my guess is that the XML is invalid due to encoding issues.

The program name is copied verbatim into the XML, in the <task name="..."> tag.

If GRASS was built with iconv support, the declared encoding of the XML will be UTF-8; text nodes will be converted from the locale's encoding to UTF-8 (and <,>,& will be converted to entities), but attribute values aren't converted:

    fprintf(stdout, "<task name=\"%s\">\n", st->pgm_name);

So, they need to be restricted to the intersection of the locale's encoding and UTF-8 (which probably means ASCII).
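One way to test whether a name falls inside that intersection is to check that it produces identical bytes in both encodings. A Python 3 sketch, where `is_encoding_neutral` is a hypothetical helper and cp1252 stands in for the locale's encoding:

```python
def is_encoding_neutral(name, encodings=("cp1252", "utf-8")):
    """Return True if name encodes to the same bytes in every listed encoding."""
    try:
        encoded = {name.encode(enc) for enc in encodings}
    except UnicodeEncodeError:
        # Not representable at all in at least one of the encodings.
        return False
    return len(encoded) == 1


print(is_encoding_neutral("test_workshop.py"))
print(is_encoding_neutral("test_workshop\u00e1.py"))
```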

I'm not sure that it's worth trying to support script names which contain non-ASCII characters. However, scripts in directories whose names contain non-ASCII characters need to be supported. The same applies to other files; e.g. we can reasonably restrict map, mapset and location names to ASCII, but we should support the situation where the database path contains non-ASCII characters.

In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.

comment:15 in reply to:  14 ; Changed 5 years ago by annakrat

Replying to glynn:

Replying to annakrat:

So I tried to encode the command string and I get different error:

raise err
xml.etree.ElementTree.ParseError

It seems that get_interface_description returns empty xml

Did you confirm that?

No, when I print the string I get xml, seems to be valid:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">
	<description>
		Adds the values of two rasters (A + B)
	</description>
...

I don't understand what's wrong with it.

Otherwise, my guess is that the XML is invalid due to encoding issues.

The program name is copied verbatim into the XML, in the <task name="..."> tag.

If GRASS was built with iconv support, the declared encoding of the XML will be UTF-8; text nodes will be converted from the locale's encoding to UTF-8 (and <,>,& will be converted to entities), but attribute values aren't converted:

    fprintf(stdout, "<task name=\"%s\">\n", st->pgm_name);

So, they need to be restricted to the intersection of the locale's encoding and UTF-8 (which probably means ASCII).

I'm not sure that it's worth trying to support script names which contain non-ASCII characters. However, scripts in directories whose names contain non-ASCII characters need to be supported. The same applies to other files; e.g. we can reasonably restrict map, mapset and location names to ASCII, but we should support the situation where the database path contains non-ASCII characters.

In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.

Should the encoding be moved to get_interface_description in task.py? The EncodeString function is in the GUI, not in the Python scripting library.

If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:

Exception in thread Thread-28:
Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 155, in run
    self.resultQ.put((requestId, self.requestCmd.run()))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 575, in run
    env = self.env)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 161, in __init__
    args = map(EncodeString, args)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)

because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. So if I encode only the unicode ones, it starts to work.

            for i in range(len(args)):
                if type(args[i]) != str:
                    args[i] = EncodeString(args[i])
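The same idea written for Python 3, as a sketch only. Note the naming flip: in Python 3 text is `str` and byte strings are `bytes`, whereas the Python 2.7 loop above treats `str` as the byte type. `encode_text_args` is a hypothetical helper, not part of gcmd.py.

```python
def encode_text_args(args, encoding):
    """Encode only the text arguments; byte strings are left untouched."""
    return [arg.encode(encoding) if isinstance(arg, str) else arg for arg in args]


mixed = ["r.info", b"map=\xe1", "caf\u00e9"]
print(encode_text_args(mixed, "cp1252"))
```

Returning a new list instead of modifying `args` in place avoids surprising the caller.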

So I am not sure what I should do with these results.

comment:16 in reply to:  15 ; Changed 5 years ago by glynn

Replying to annakrat:

No, when I print the string I get xml, seems to be valid:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">

I don't understand what's wrong with it.

The name= attribute will fail to decode due to not being valid UTF-8. The "á" will be encoded in cp1252 (i.e. '\xe1'); attempting to decode that as UTF-8 will fail (non-ASCII characters are encoded as multi-byte sequences; an isolated byte >= 128 can never occur in UTF-8).
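This can be reproduced outside GRASS with the standard `xml.etree.ElementTree` module. The Python 3 sketch below uses a reduced document with only the declaration and the `<task>` element; `TEMPLATE` and `try_parse` are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

TEMPLATE = '<?xml version="1.0" encoding="UTF-8"?>\n<task name="{}"></task>'
name = "test_workshop\u00e1.py"


def try_parse(document_bytes):
    """Return the task name, or None if the document is not well-formed."""
    try:
        return ET.fromstring(document_bytes).get("name")
    except ET.ParseError:
        return None


# Declared as UTF-8 but the attribute bytes are cp1252: the parser rejects it.
bad = TEMPLATE.format(name).encode("cp1252")
# Declared as UTF-8 and actually encoded as UTF-8: parses normally.
good = TEMPLATE.format(name).encode("utf-8")

print(try_parse(bad), try_parse(good))
```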

In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.

Should the encoding be moved to get_interface_description in task.py?

No. The GUI shouldn't be passing unicode values to the grass.script library; it should be converting them to strings itself.

The EncodeString function is in gui, not in python scripting library.

grass.script.core has encode() and decode().

If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:

  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)

Ugh. I couldn't figure out what was happening here until I read the next sentence. It appears that str.encode() actually exists; it tries to convert the string to unicode (using the default encoding) so that it can encode it.
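Python 3 byte strings have no encode() method, so the Python 2 behaviour has to be emulated to see it. In the sketch below, `py2_str_encode` is a hypothetical stand-in for Python 2's str.encode(), and the path is an invented example rather than the one from the traceback.

```python
def py2_str_encode(byte_string, target, default_encoding="ascii"):
    """Emulate Python 2 str.encode(): implicit decode first, then encode."""
    return byte_string.decode(default_encoding).encode(target)


# A byte string that is already encoded in cp1252 (hypothetical path).
already_encoded = b"C:\\data\\caf\xe9\\script.py"

error = None
try:
    py2_str_encode(already_encoded, "cp1252")
except UnicodeDecodeError as exc:
    # The failure comes from the hidden decode step, not from the encode.
    error = exc

print(type(error).__name__, error.encoding, error.start)
```

This is why encoding a value twice raises a decode error rather than an encode error.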

because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. So if I encode only the unicode ones, it starts to work.

That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.

This is the main reason why I dislike dynamically-typed languages for large-scale projects (I'd never have suggested Python if I'd have known that wxGUI was going to turn into such a behemoth). In C/C++, you'd just get a compile error if you pass a wchar_t*/std::wstring() where a char*/std::string was expected. In Python, you get something which appears to work until it starts getting decent test coverage.

I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...

comment:17 in reply to:  16 ; Changed 5 years ago by annakrat

Replying to glynn:

Replying to annakrat:

No, when I print the string I get xml, seems to be valid:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">

I don't understand what's wrong with it.

The name= attribute will fail to decode due to not being valid UTF-8. The "á" will be encoded in cp1252 (i.e. '\xe1'); attempting to decode that as UTF-8 will fail (non-ASCII characters are encoded as multi-byte sequences; an isolated byte >= 128 can never occur in UTF-8).

I take it that we are supporting only ascii characters in the script name.

In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.

Should the encoding be moved to get_interface_description in task.py?

No. The GUI shouldn't be passing unicode values to the grass.script library; it should be converting them to strings itself.

Ok.

The EncodeString function is in gui, not in python scripting library.

grass.script.core has encode() and decode().

If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:

Ugh. I couldn't figure out what was happening here until I read the next sentence. It appears that str.encode() actually exists; it tries to convert the string to unicode (using the default encoding) so that it can encode it.

because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. So if I encode only the unicode ones, it starts to work.

That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.

I am not sure where the higher level is and why str and unicode are mixed in this case.

I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...

Why would it? Is it easy to test?

Anyway, I think whatever we do, shouldn't get into the current release. I already fixed the important part (works with ascii path only) and I don't want to make things worse.

comment:18 in reply to:  17 Changed 5 years ago by glynn

Replying to annakrat:

That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.

I am not sure where the higher level is and why str and unicode are mixed in this case.

Unicode values typically come from wxWidgets, e.g. any text retrieved from a text field will be a unicode object.

I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...

Why would it? Is it easy to test?

Sorry, that was really just thinking out loud. It wouldn't fix anything, it would just highlight any remaining implicit conversions.

EBCDIC (used on IBM mainframes) is one of the few encodings which isn't compatible (or even mostly-compatible) with ASCII. Setting the default encoding to EBCDIC would make it obvious when implicit str<->unicode conversions were being performed, because the results would be completely wrong (e.g. even A-Z/a-z don't have the same codepoints as ASCII).

The default encoding can only be set in site.py; site.py deletes the setdefaultencoding() function from the sys module to prevent the default encoding from being changed after start-up.
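The incompatibility is easy to see with one of Python's bundled EBCDIC codecs. This Python 3 sketch uses the `cp500` codec on the assumption that it is a reasonable stand-in for EBCDIC-CP-BE; it only illustrates why a stray implicit conversion would be conspicuous, and does not change any default encoding.

```python
EBCDIC = "cp500"  # assumed stand-in for the EBCDIC-CP-BE encoding named above

# The same letter has different byte values in ASCII and EBCDIC.
ascii_bytes = "A".encode("ascii")
ebcdic_bytes = "A".encode(EBCDIC)
print(ascii_bytes, ebcdic_bytes)

# ASCII bytes misread as EBCDIC come out as unrelated characters.
garbled = b"raster".decode(EBCDIC)
print(repr(garbled))
```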

comment:19 Changed 4 years ago by martinl

Milestone: 7.0.0 → 7.0.5

comment:20 Changed 3 years ago by neteler

Milestone: 7.0.5 → 7.0.6

comment:21 Changed 2 years ago by neteler

Milestone: 7.0.6 → 7.0.7

comment:22 Changed 13 months ago by martinl

Resolution: wontfix
Status: new → closed

No activity.
