Opened 8 years ago

Last modified 4 years ago

#1447 new defect

wxGUI wingrass scripts need whitespace in path

Reported by: mmetz Owned by: grass-dev@…
Priority: major Milestone: 6.4.6
Component: wxGUI Version: svn-releasebranch64
Keywords: wingrass, spaces Cc:
CPU: All Platform: MSWindows XP

Description

In wingrass wxGUI, whenever a file path is entered as input for a script, the script will fail if the path does not contain whitespace and succeeds only if the path does contain whitespace.

Example:

# define options in wxGUI:
v.in.geonames input=D:\GRASS_test_data\IT\IT.txt out=italy_geonames
# output
ERROR: File 'D:GRASS_test_dataITIT.txt' not found

# that works in wxGUI:
v.in.geonames input=D:\GRASS test data\IT\IT.txt out=italy_geonames

Interestingly, both commands

v.in.geonames input=D:\GRASS_test_data\IT\IT.txt out=italy_geonames

v.in.geonames input="D:\GRASS test data\IT\IT.txt" out=italy_geonames

work from the command line.

Also interestingly, this affects only scripts, not C modules, e.g. v.in.ogr always works whereas e.g. v.in.geonames fails from wxGUI if the command line does not contain at least one whitespace in the absolute path.

g.parser receives from the wxGUI

D:GRASS_test_dataITIT.txt

and

D:\\GRASS test data\\IT\\IT.txt

Within the wxGUI, as far as I was able to trace the strings, they were

D:\\GRASS_test_data\\IT\\IT.txt

and

D:\\GRASS test data\\IT\\IT.txt

that is, ok.

At some stage, the backslashes "\" are removed if no whitespace is in the string and correctly converted to "
" if a whitespace is in the string. I was not able to find this stage in wxGUI and lib/python. I guess this is one tiny single line of python code messing up the string. Any suggestions?

Change History (16)

comment:1 Changed 8 years ago by mmetz

More info:

I was able to trace it to this line in goutput.py:

https://trac.osgeo.org/grass/browser/grass/branches/releasebranch_6_4/gui/wxpython/gui_modules/goutput.py#L541

                    # process GRASS command with argument
                    self.cmdThread.RunCmd(command, stdout = self.cmd_stdout, stderr = self.cmd_stderr,
                                          onDone = onDone)

Up to here the command looked ok to me (adding lots and lots of debug messages).

BTW, IMHO the wxGUI should check more return codes and could do with more debug messages to help with debugging.

comment:2 Changed 8 years ago by hamish

Priority: normalmajor

see also comments in #1280 and #1198.

comment:3 Changed 8 years ago by benducke

What I find strange is that I see the same behavior on Windows with MSYS' sh.exe, without the wxGUI. I am running GRASS shell scripts through the SEXTANTE/GRASS interface of gvSIG. What this interface does is set up the GRASS env variables, create a GRASS mapset, then import the data and call sh.exe to parse the script. The GRASS I use is CLI only and the system does not even have Python installed. Thus my initial guess that the problem goes deeper than wxGUI.

comment:4 in reply to:  3 Changed 8 years ago by hamish

Keywords: wingrass spaces added

Replying to benducke:

What I find strange is that I see the same behavior on Windows with MSYS' sh.exe, without the wxGUI. I am running GRASS shell scripts through the SEXTANTE/GRASS interface of gvSIG. What this interface does is set up the GRASS env variables, create a GRASS mapset, then import the data and call sh.exe to parse the script. The GRASS I use is CLI only and the system does not even have Python installed. Thus my initial guess that the problem goes deeper than wxGUI.

you are right, it does, I just merged all these tickets as if we solve for one sh.exe there's a good chance we know how to solve for all cases.

  • Is gvSIG quoting the path names before invoking GRASS?

Hamish

comment:5 in reply to:  3 ; Changed 8 years ago by mmetz

Replying to benducke:

What I find strange is that I see the same behavior on Windows with MSYS' sh.exe, without the wxGUI. I am running GRASS shell scripts through the SEXTANTE/GRASS interface of gvSIG. What this interface does is set up the GRASS env variables, create a GRASS mapset, then import the data and call sh.exe to parse the script. The GRASS I use is CLI only and the system does not even have Python installed. Thus my initial guess that the problem goes deeper than wxGUI.

Hmm, with wingrass CLI it always works for me, as long as file paths are quoted if they contain spaces. Quoting is not necessary for file paths without whitespace.

In wxGUI, the command plus args disappears in a Python Queue, up to there it seems ok.

Markus M

comment:6 in reply to:  5 ; Changed 8 years ago by glynn

Replying to mmetz:

In wxGUI, the command plus args disappears in a Python Queue, up to there it seems ok.

AFAICT, after entering goutput.CmdThread.requestQ, it gets pulled out in goutput.CmdThread.run(), to goutput.GrassCmd, to gcmd.CommandThread.run().

At the moment, I don't have a working WinGRASS or even the environment to build it (I've gotten as far as installing MinGW/MSys on my current PC, but not the various libraries), so I can't do any debugging myself.

comment:7 in reply to:  6 ; Changed 8 years ago by mmetz

Replying to glynn:

Replying to mmetz:

In wxGUI, the command plus args disappears in a Python Queue, up to there it seems ok.

AFAICT, after entering goutput.CmdThread.requestQ, it gets pulled out in goutput.CmdThread.run(), to goutput.GrassCmd, to gcmd.CommandThread.run().

OK. Then it goes apparently into gcmd.!Popen and still looks ok, i.e. D:
GRASS test data
IT
IT.txt

I guess it should be \"D:
GRASS test data
IT
IT.txt\", but I did not manage to get it like this. I was putting file arguments into quotes and got within wxGUI "D:
GRASS test data
IT
IT.txt", not \"D:
GRASS test data
IT
IT.txt\", even when explicitely using '\"'. "D:
GRASS test data
IT
IT.txt" is reduced to "D:GRASS test dataITIT.txt", all backslashes gone, quotes kept.

Trying
"D:
GRASS test data
IT
IT.txt
", gets converted to \D:\GRASS test data\IT\IT.txt\ when arriving at v.in.geonames. That is, the backslashes in the file path are preserved, the quotes have been stripped, but the leading and ending backslashes have been preserved too...

Trying the equivalents not in the wxGUI code but in the wxGUI dialog did not work either.

BTW, I noticed that wxGUI is calling $GISBASE/scripts/v.in.geonames, not $GISBASE/bin/v.in.geonames.bat. I gave that a try as well, no difference. Then I tried using Popen without shell, changing this line [0], that works with v.in.geonames.bat insofar as I got the same errors.

[0] https://trac.osgeo.org/grass/browser/grass/branches/releasebranch_6_4/gui/wxpython/gui_modules/gcmd.py#L490

It's not g.parser, when it arrives there, the damage is already done.

Mysterious. Maybe someone can make some sense out of it.

Markus M

comment:8 in reply to:  7 ; Changed 8 years ago by mmetz

Replying to mmetz:

Mysterious. Maybe someone can make some sense out of it.

I found a workaround but I am not sure if this is correct:

The *args argument to subprocess.Popen [0] is a list of arguments. subprocess.Popen converts the arguments differently, depending on whether there is whitespace in an argument or not. The result is meant as input for programs and apparently sometimes incompatible with "sh -c <grass.module> <arguments>". The problem for file input arguments arises if there is no whitespace in there because sh removes all backslashes left over by subprocess.Popen. The workaround is to convert the *args list into one long string with whitespaces, file paths quoted earlier when creating the command:

v.in.geonames input="D:\GRASS_test_data\IT\IT.txt" out=italy_geonames 

and use that string instead of *args for

subprocess.Popen.__init__(self, *args, **kwargs)

This workaround is obviously only needed for wingrass. And looks like a terrible hack to me. Unfortunately this breaks C modules, so scripts and C modules would need to be treated differently...

[0] https://trac.osgeo.org/grass/browser/grass/branches/releasebranch_6_4/gui/wxpython/gui_modules/gcmd.py#L124

Out of ideas,

Markus M

comment:9 in reply to:  8 Changed 8 years ago by mmetz

Replying to mmetz:

This workaround is obviously only needed for wingrass. And looks like a terrible hack to me. Unfortunately this breaks C modules, so scripts and C modules would need to be treated differently...

No, it doesn't break C modules, it was just a user error.

Markus M

comment:10 in reply to:  8 ; Changed 8 years ago by glynn

Replying to mmetz:

The *args argument to subprocess.Popen [0] is a list of arguments.

Note: args is a list of Popen's non-keyword arguments. args[0] should be a list containing the program's arguments. len(args) should always be 1; all arguments except for the first one ("args") should be passed as keyword arguments, not positional arguments.

While subprocess.Popen()'s first argument can be a string, this feature probably shouldn't be used. It's too easy to get it wrong.

subprocess.Popen converts the arguments differently, depending on whether there is whitespace in an argument or not.

subprocess.Popen() has to convert the argument list to a command string according to the rules by which MSVCRT parses the command string into arguments, so that the program's main() ends up seeing the correct argv[]. One of those rules is that strings containing whitespace must be quoted.

Because the underlying Windows interface (CreateProcess) treats a command line as a string, rather than as a list of strings (as is the case for Unix' execve()), there isn't any 100% reliable way of ensuring that the program gets the correct argv[] passed to its main() function (historically, DOS programs were often written in assembler and didn't have a main(), while Windows programs use WinMain rather than main()). The best that you can do is to assume that the program parses the command line according to the MSVCRT rules (there's a URL in the subprocess.py file). If it doesn't, you lose.

The result is meant as input for programs and apparently sometimes incompatible with "sh -c <grass.module> <arguments>". The problem for file input arguments arises if there is no whitespace in there because sh removes all backslashes left over by subprocess.Popen. The workaround is to convert the *args list into one long string with whitespaces, file paths quoted earlier when creating the command:

Before attempting this, first please confirm that the data being passed to subprocess.Popen.__init__ by gcmd.Popen() is actually correct. len(args) should be 1, args[0] should be a list of strings, none of those strings should have any quoting or escaping (beware of Python's own quoting/escaping if repr() is used). If this isn't the case, then the problem lies elsewhere, and trying to correct for it in gcmd.Popen would actually mean introducing an "equal but opposite" bug.

Second, ensure that any "fix" for shell scripts DOESN'T go into 7.0. On Windows, 7.0 needs to work with binaries, Python scripts and (cmd.exe) batch files, and it needs to do so reliably. If that means that it doesn't work with bash scripts, tough.

comment:11 in reply to:  10 ; Changed 8 years ago by mmetz

Replying to glynn:

Replying to mmetz:

The *args argument to subprocess.Popen [0] is a list of arguments.

Note: args is a list of Popen's non-keyword arguments. args[0] should be a list containing the program's arguments. len(args) should always be 1; all arguments except for the first one ("args") should be passed as keyword arguments, not positional arguments.

While subprocess.Popen()'s first argument can be a string, this feature probably shouldn't be used. It's too easy to get it wrong.

The horrible hack worksforme. But I am aware that any string protection which is normally done by Subprocess.Popen's string conversion, would then need to be done by the wxGUI, and lib/python/task.py.

subprocess.Popen converts the arguments differently, depending on whether there is whitespace in an argument or not.

subprocess.Popen() has to convert the argument list to a command string according to the rules by which MSVCRT parses the command string into arguments, so that the program's main() ends up seeing the correct argv[]. One of those rules is that strings containing whitespace must be quoted.

That's all fine for programs that have a main(), but shell scripts don't have main(). I think that is the reason why the current implementation works for real programs but not for scripts. MSVCRT rules do not (need to) cater for strings passed to cmd.exe, then to %GRASS_SH%. AFAICT, a file path without whitespaces is passed ok to the Windows shell (cmd.exe), but then further passed to %GRASS_SH%, yet another shell, and there the (by now unprotected) backslashes get lost if not quoted properly.

Because the underlying Windows interface (CreateProcess) treats a command line as a string, rather than as a list of strings (as is the case for Unix' execve()), there isn't any 100% reliable way of ensuring that the program gets the correct argv[] passed to its main() function (historically, DOS programs were often written in assembler and didn't have a main(), while Windows programs use WinMain rather than main()). The best that you can do is to assume that the program parses the command line according to the MSVCRT rules (there's a URL in the subprocess.py file). If it doesn't, you lose.

Same as above, 1) worksforme, 2) we could do that horrible hack for scripts only.

The result is meant as input for programs and apparently sometimes incompatible with "sh -c <grass.module> <arguments>". The problem for file input arguments arises if there is no whitespace in there because sh removes all backslashes left over by subprocess.Popen. The workaround is to convert the *args list into one long string with whitespaces, file paths quoted earlier when creating the command:

Before attempting this, first please confirm that the data being passed to subprocess.Popen.__init__ by gcmd.Popen() is actually correct. len(args) should be 1, args[0] should be a list of strings, none of those strings should have any quoting or escaping (beware of Python's own quoting/escaping if repr() is used). If this isn't the case, then the problem lies elsewhere, and trying to correct for it in gcmd.Popen would actually mean introducing an "equal but opposite" bug.

The strings indicating file paths do have escaping under Windows within the wxGUI, but it's \ that is escaped to
, looks fine to me. These strings are converted somewhere inside subprocess.Popen to protect whitespaces, but not to protect backslashes (escapes). I'm pretty sure that this is the problem for scripts because what the sh shell sees is e.g. \I and not
I, i.e. the backslash gets lost (see above). I have tested in wxGUI with v.in.geonames (script) and v.in.ogr (real program) and there is no quoting before subprocess.Popen is called. Quoting appears as output of subprocess.Popen if an argument contains whitespace.

The fix causes subprocess.Popen to protect everything because the program and its args are passed as one single string which means that protection of single arguments, e.g. input="D:\Grass_test_data\IT\It.txt" must be done manually.

I am aware that this is not looking ideal. Any better ideas?

Second, ensure that any "fix" for shell scripts DOESN'T go into 7.0. On Windows, 7.0 needs to work with binaries, Python scripts and (cmd.exe) batch files, and it needs to do so reliably. If that means that it doesn't work with bash scripts, tough.

I am aware of that ;-)

Markus M

comment:12 in reply to:  11 ; Changed 8 years ago by glynn

Replying to mmetz:

subprocess.Popen() has to convert the argument list to a command string according to the rules by which MSVCRT parses the command string into arguments, so that the program's main() ends up seeing the correct argv[]. One of those rules is that strings containing whitespace must be quoted.

That's all fine for programs that have a main(), but shell scripts don't have main().

sh.exe does, and it uses argv[i] as the basis for the $<i>. It may perform conversions (e.g. filename translation) on the individual arguments, but it doesn't merge argv[] back into a string then parse it back into individual arguments. However, with "-c", it will end up parsing the resulting string into a command and arguments.

I think that is the reason why the current implementation works for real programs but not for scripts. MSVCRT rules do not (need to) cater for strings passed to cmd.exe, then to %GRASS_SH%.

cmd.exe has the same argument parsing rules as MSVCRT. Once an argument list has been converted to a string, prepending "cmd.exe /c " to the beginning of the string is supposed to be a no-op, except that the "program" is no longer restricted to binary executables.

AFAICT, a file path without whitespaces is passed ok to the Windows shell (cmd.exe), but then further passed to %GRASS_SH%, yet another shell, and there the (by now unprotected) backslashes get lost if not quoted properly.

Aha. The problem is with scripts/windows_launch.bat:

@"%GRASS_SH%" -c '"%GISBASE%/scripts/SCRIPT_NAME" %*'

It should be:

@"%GRASS_SH%" "%GISBASE%/scripts/SCRIPT_NAME" %*

The former approach can't be made to work. There is no practical way to write a batch file which will reliably turn a program and a list of arguments into a Bourne-shell command which will execute that program with those arguments. Any attempt will work for simple cases but fail for arguments which contain shell metacharacters (space, backslash, quote, etc). cmd.exe's string handling is too primitive to do it right.

The latter approach simply runs the shell with the argument list which was passed to the script, but with the path to the script "pushed" onto the front (i.e. the original arguments are shifted along one place). This is exactly how Unix handles a shebang. If this doesn't work, it's because of "workarounds" elsewhere which try to compensate for this.

The strings indicating file paths do have escaping under Windows within the wxGUI, but it's \ that is escaped to
, looks fine to me.

Are you sure that it isn't Python doing this? Just to be sure, can you add e.g.:

f = open('debug.txt', 'w')
f.write(repr((args, kwargs)))
f.close()

immediately before the subprocess.Popen.init call?

If those doubled backslashes are actually in the strings, that's a bug which needs to be fixed. subprocess.Popen() does the required quoting, and it gets it right. Note that backslashes will be doubled by repr(); if there are quad backslashes, that means they've really been doubled.

These strings are converted somewhere inside subprocess.Popen to protect whitespaces, but not to protect backslashes (escapes).

The only conversion which subprocess.Popen() performs is to convert the argument list to a string according to the MSVCRT/cmd.exe rules. If you write a C program which just prints out main()'s argv[], it will show the exact argument list which was passed to subprocess.Popen().

I'm pretty sure that this is the problem for scripts

I'm pretty sure that the core problem is with windows_launch.bat, as described above. There may well be other bugs which were introduced to attempt to "cancel out" the bug in windows_launch.bat.

The fix causes subprocess.Popen to protect everything because the program and its args are passed as one single string which means that protection of single arguments, e.g. input="D:\Grass_test_data\IT\It.txt" must be done manually.

This is an attempt (probably not the first) to "cancel out" the actual bug.

comment:13 in reply to:  12 ; Changed 8 years ago by mmetz

Replying to glynn:

Replying to mmetz:

AFAICT, a file path without whitespaces is passed ok to the Windows shell (cmd.exe), but then further passed to %GRASS_SH%, yet another shell, and there the (by now unprotected) backslashes get lost if not quoted properly.

Aha. The problem is with scripts/windows_launch.bat:

> @"%GRASS_SH%" -c '"%GISBASE%/scripts/SCRIPT_NAME" %*'

It should be:

> @"%GRASS_SH%" "%GISBASE%/scripts/SCRIPT_NAME" %*

The former approach can't be made to work. There is no practical way to write a batch file which will reliably turn a program and a list of arguments into a Bourne-shell command which will execute that program with those arguments. Any attempt will work for simple cases but fail for arguments which contain shell metacharacters (space, backslash, quote, etc). cmd.exe's string handling is too primitive to do it right.

That was it! I tested and it works fine. A wonderfully small change to fix this obscure bug.

The strings indicating file paths do have escaping under Windows within the wxGUI, but it's \ that is escaped to
, looks fine to me.

Are you sure that it isn't Python doing this? Just to be sure, can you add e.g.:

> f = open('debug.txt', 'w')
> f.write(repr((args, kwargs)))
> f.close()

immediately before the subprocess.Popen.init call?

If those doubled backslashes are actually in the strings, that's a bug which needs to be fixed. subprocess.Popen() does the required quoting, and it gets it right. Note that backslashes will be doubled by repr(); if there are quad backslashes, that means they've really been doubled.

The result has double, not quad backslashes, i.e. seems to be ok:

((['v.in.geonames', 'input=D:\\GRASS_test_data\\IT\\IT.txt out=italy_geonames', 'output=IT'])), {'shell': True, 'stdin': -1, 'stderr': -1, 'stdout': -1}] 

I'm pretty sure that the core problem is with windows_launch.bat, as described above.

You're right, thanks a lot for your help! I have committed the change to windows_launch.bat in 6.4 and 6.5, r48448 and r48449, respectively.

Markus M

comment:14 in reply to:  13 Changed 8 years ago by glynn

Replying to mmetz:

The result has double, not quad backslashes, i.e. seems to be ok:

((['v.in.geonames', 'input=D:\\GRASS_test_data\\IT\\IT.txt out=italy_geonames', 'output=IT'])), {'shell': True, 'stdin': -1, 'stderr': -1, 'stdout': -1}] 

That looks fine.

I'm pretty sure that the core problem is with windows_launch.bat, as described above.

You're right, thanks a lot for your help! I have committed the change to windows_launch.bat in 6.4 and 6.5, r48448 and r48449, respectively.

One minor caveat: v.krige will need to create its own batch file on Windows, as it's a Python script, not a shell script.

"sh -c 'prog arg arg'" will run arbitrary commands: executables, shell built-ins, shell scripts, or other scripts (for which it will use the shebang). The cost is that the argument to -c will be parsed, so it must be properly constructed (i.e. quoted) according to bash syntax. "sh prog arg arg" will only run shell scripts, but will pass the arguments verbatim, without having to worry about parsing.

As it's a "mixed mode" script (GUI or non-GUI, depending upon whether there are any arguments), we can't rely upon the .py extension; it needs to be run via GRASS_PYTHON, e.g.:

@"%GRASS_PYTHON%" "%GISBASE%\scripts\v.krige.py" %*

(This means that v.krige is broken for a different reason in 7.0; it needs to be split into separate Python and Python+wxPython scripts, as is done for wxpyimgview).

comment:15 Changed 8 years ago by hamish

see also #1310

comment:16 Changed 4 years ago by neteler

Milestone: 6.4.26.4.6
Note: See TracTickets for help on using tickets.