#2579 closed enhancement (fixed)
Specify command to be executed as parameter of grass command
Reported by: | wenzeslaus | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.2.0 |
Component: | Startup | Version: | svn-trunk |
Keywords: | batch job, GRASS_BATCH_JOB, init | Cc: | |
CPU: | Unspecified | Platform: | All |
Description
To run some modules from outside of GRASS, you currently have to either set up the environment yourself (which is hard, error prone, and you won't get it right anyway) or use the grass command in batch mode. For this you have to set the GRASS_BATCH_JOB environment variable and then call GRASS GIS:
export GRASS_BATCH_JOB=.../test_script.sh
grass7 ~/grassdata/location/mapset
Although this works, it can be quite cumbersome, especially from some languages. For example, Python has a much smoother interface where you just specify the script and its arguments:
python .../test_script.py arg1 arg2 ...
The attached patch introduces an additional interface for the grass command which allows calling scripts like this:
grass7 --mapset ~/grassdata/location/mapset --batch .../test_script.sh
But it actually also allows parameters, GRASS modules, and generally any commands:
grass7 --mapset ~/grassdata/location/mapset --batch .../test_script.sh some parameters
grass7 --mapset ~/grassdata/location/mapset --batch r.info map=elevation
If you are fine with what is in the rc file, you can use just:
grass7 --batch r.info map=elevation
But I'm not sure if it is a best practice.
I wrote the patch so that you don't get any additional output, just the output from the module, unless something unusual happens (e.g., creation of a new location):
$ grass71 --mapset ~/grassdata/location/mapset --batch r.info map="elevation" -g
north=228500
south=215000
east=645000
west=630000
nsres=10
ewres=10
rows=1350
cols=1500
cells=2025000
datatype=FCELL
ncats=255
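Since nothing but the module output is printed, a calling program can consume the result directly. A minimal Python sketch, assuming the key=value lines shown above; the helper name parse_shell_style is hypothetical (inside a GRASS session, grass.script offers parse_key_val for the same job):

```python
def parse_shell_style(text):
    """Parse key=value lines (as printed by modules with -g) into a dict of strings."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or unrelated lines
        key, value = line.split("=", 1)  # split on the first '=' only
        result[key] = value
    return result


output = """north=228500
south=215000
east=645000
west=630000
nsres=10
ewres=10
rows=1350
cols=1500
cells=2025000
datatype=FCELL
ncats=255
"""
info = parse_shell_style(output)
print(info["rows"], info["datatype"])
```

Values stay strings; converting rows or nsres to numbers is left to the caller.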
I tried to preserve the functionality of GRASS_BATCH_JOB, including the GRASS textual output and sanity checks.
When both GRASS_BATCH_JOB and --batch are provided, --batch is used and GRASS_BATCH_JOB is ignored because, as the Python documentation says, it is customary that command-line switches override environment variables where there is a conflict (e.g., gcc follows the same practice).
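The precedence rule fits in a few lines. This is a hypothetical Python sketch, not the code of the attached patch: choose_batch_job is an invented name, and it assumes the words following --batch have already been separated from the rest of the command line.

```python
import os


def choose_batch_job(cli_command, environ=None):
    """Pick the command to run non-interactively.

    cli_command: words given after --batch (empty if --batch was not used).
    A command-line value overrides the GRASS_BATCH_JOB environment variable.
    """
    if environ is None:
        environ = os.environ
    if cli_command:
        return list(cli_command)
    job = environ.get("GRASS_BATCH_JOB")
    if job:
        return [job]
    return None  # nothing requested: start an interactive session


print(choose_batch_job(["r.info", "map=elevation"], {"GRASS_BATCH_JOB": "/tmp/job.sh"}))
```

Passing environ explicitly keeps the function testable without touching the real environment.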
The names --mapset and --batch seemed to me the best choice, although there are other good options too, such as --run.
To test, try something like:
cat > test_script.sh <<'EOF'
#!/bin/bash
echo "Hello from GRASS GIS (`date`)"
echo "This is what was called: $0 $@"
EOF
chmod +x test_script.sh
grass7 --mapset ~/grassdata/location/mapset --batch test_script.sh some parameters
grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"
grass7 --mapset ~/grassdata/location/mapset --batch r.info aaa
GUI works too, although I'm not sure if it is useful (it could even be inconvenient for scripting).
grass7 --mapset ~/grassdata/location/mapset --batch r.info
From what I see now, the only issue with calling individual modules is that you cannot (or should not) run parallel calls of the grass command in the same mapset.
Additional ideas
This is out of the scope of this ticket, but there is potential to create an even more powerful interface, similar to, let's say, git.
mkdir some_project
cd some_project
# init connects to existing database, location and mapset or creates a new one
# creates .grassrc (.rc or .gisrc) file in current directory
grass7 init ~/grassdata/location/mapset [-c | -c geofile | -c EPSG:code[:datum_trans]]
grass7 import .../some_image.tiff
grass7 run r.info some_image
grass7 run r.mapcalc "improved_image = 5 * some_image"
grass7 export improved_image .../improved_image.tiff
# next time you can cd into some_project directory and commands will work right away
# because .grassrc file will be already there
Some commands such as grass7 link or grass7 external might be quite useful, although they would be, similarly to grass7 import and grass7 export, just calls to the appropriate r.in.gdal, r.in.proj, etc.
It would be even more interesting to have:
grass7 run r.slope.aspect elevation=file://.../elevation.tiff aspect=file://...aspect.tiff
The grass command would have to parse the command line, find the files which should be maps, and link them. And perhaps if it wasn't grass7 run but something different, such as grass7 runonly, we could even skip the .grassrc, create a location on the fly in /tmp and delete it after execution. If data were just linked, not imported and exported, it could be pretty fast. (But obviously we could hit issues with projection and topology here, so it is a bit tricky.)
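A rough Python sketch of the first step of such a runonly mode: recognizing key=file://PATH arguments. Everything here is an assumption for illustration: the function name, naming each map after its option, and the fact that inputs and outputs are not distinguished (a real implementation would need the module's interface description to know which files to link and which to export).

```python
def split_file_arguments(args, prefix="file://"):
    """Rewrite key=file://PATH arguments to key=MAP_NAME.

    Returns the rewritten argument list and a dict {map_name: path} of the
    files a hypothetical `grass runonly` would have to link (or export).
    """
    rewritten = []
    file_args = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep and value.startswith(prefix):
            path = value[len(prefix):]  # file:///a/b.tif -> /a/b.tif
            map_name = key  # assumption: name the map after the option
            file_args[map_name] = path
            rewritten.append(f"{key}={map_name}")
        else:
            rewritten.append(arg)  # ordinary options and flags pass through
    return rewritten, file_args


print(split_file_arguments(
    ["elevation=file:///data/elevation.tiff", "aspect=file:///tmp/aspect.tiff"]))
```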
Attachments (1)
Change History (17)
by , 10 years ago
Attachment: | batch_job_from_cmd_line.diff added |
---|
follow-up: 2 comment:1 by , 10 years ago
Why do you need --mapset in a call such as the following?
grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"
The current GRASS startup script already allows defining a mapset at startup, so this seems redundant.
comment:2 by , 10 years ago
Replying to mlennert:
Why do you need --mapset in a call such as the following?
grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"
The current GRASS startup script already allows defining a mapset at startup, so this seems redundant.
This is a thing I'm not sure about. The current syntax is
grass ~/grassdata/nc_spm_08_grass7/user1
grass -c ~/grassdata/nc_spm_08_grass7/user1
which follows the general pattern
name options/flags files
name [option]... [file]...
name [option]... [file] [arg]...
where options/flags are distinguished by - or --, and the first thing which is not an option starts a file list. The last row actually describes python and Rscript:
python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Rscript [--options] [-e expr [-e expr2 ...] | file] [args]
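The python-style convention (leading options, then the first non-option word as the file, then arguments that belong to the file) can be sketched in Python as follows. The function is hypothetical and simplified: combined short flags and the -- separator are not handled.

```python
def split_python_style(argv, options_with_value=()):
    """Split argv like `python [option] ... [file] [arg] ...`.

    Leading words starting with '-' are options (those listed in
    options_with_value also take the next word); the first other word
    is the file, and everything after it belongs to the file.
    """
    options = []
    i = 0
    # A lone '-' means "read from stdin", so it counts as the file.
    while i < len(argv) and argv[i].startswith("-") and argv[i] != "-":
        options.append(argv[i])
        if argv[i] in options_with_value and i + 1 < len(argv):
            i += 1
            options.append(argv[i])
        i += 1
    if i == len(argv):
        return options, None, []
    return options, argv[i], argv[i + 1:]


print(split_python_style(["-u", "script.py", "arg1", "-v"]))
```

Applied to grass, this rule is exactly where the ambiguity comes from: split_python_style(["~/grassdata/nc_spm_08_grass7/user1"]) and split_python_style(["r.info"]) both yield a "file", with nothing telling a mapset path from a command.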
So it seems that they actually leave out (the equivalent of) --batch. If we allowed omitting db+l+mapset, one could use:
grass .../test_script.sh
grass r.info
grass r.mapcalc "aaa = 5"
which could be hard to distinguish from (current):
grass ~/grassdata/nc_spm_08_grass7/user1
We can also say that db+l+mapset is always required when passing a module or script, because the usual use case is GRASS GIS used as a processing backend, and in that case you rarely want to use db+l+mapset from the rc file. This gives us:
grass ~/grassdata/nc_spm_08_grass7/user1 .../test_script.sh
grass ~/grassdata/nc_spm_08_grass7/user1 r.info
grass ~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"
In this case, I'm not sure how well we can distinguish different cases (standard/batch) when parsing and how we can provide good error messages.
A similar, but not identical, case is grep. With grep you always have to provide the PATTERN parameter:
grep [OPTION]... PATTERN [FILE]...
In our case all parameters would be optional. The order is really important here, and identifying options can become tricky, although hopefully not as much as with grep (try to search for a string which starts with -). If we decide on something like this, the command-line parsing in grass.py will have to be reimplemented, at least judging from the code. For example, you have to recognize where the actual command starts, because nothing after that point should be interpreted by grass (trivial with --batch, hard without).
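With an explicit marker, finding where the command starts is a single lookup. A hypothetical Python sketch (split_at_marker is an invented name) that works the same for --batch or for another marker such as --exec:

```python
def split_at_marker(argv, marker="--batch"):
    """Split argv into (grass_args, command) at the first marker.

    Nothing after the marker is interpreted as a grass option, so a
    command such as `r.info -g` keeps its own flags untouched.
    """
    if marker not in argv:
        return list(argv), []
    i = argv.index(marker)
    return argv[:i], argv[i + 1:]


print(split_at_marker(["--mapset", "/data/loc/mapset", "--batch", "r.info", "-g", "map=elevation"]))
```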
follow-up: 4 comment:3 by , 10 years ago
This syntax would open many possibilities and make life much easier for accessing GRASS GIS from other languages (e.g. R).
I guess there is no chance that it can be included into GRASS 7?
follow-up: 7 comment:4 by , 10 years ago
Replying to rkrug:
I guess there is no chance that it can be included into GRASS 7?
I would say no for GRASS 7.0; the issue is focused on GRASS 7.1.
comment:6 by , 10 years ago
See also the older ticket #1660 with the same idea (leaving this one open as the discussion is more developed here).
follow-up: 8 comment:7 by , 10 years ago
Replying to martinl:
Replying to rkrug:
I guess there is no chance that it can be included into GRASS 7?
I would say no for GRASS 7.0; the issue is focused on GRASS 7.1.
Right, I've already set the milestone to 7.1.0.
Feedback on the command-line syntax would be appreciated to move this forward, and it is necessary before including this in trunk (which should be done soon, before the patch becomes incompatible).
follow-up: 10 comment:8 by , 10 years ago
Some remarks concerning the syntax:
1) I don't like the idea of having to repeat the mapset each time, as it requires quite a bit of typing and invites errors in longer scripts. So defaulting to the mapset in the rc file (i.e., if I am correct, the one used before) would be quite useful.
2) Instead of using the normal grass command, I would suggest introducing a new command (e.g. grassbatch), which takes only one parameter: the command to be executed, including its parameters. So it could be used as
grassbatch r.info
3) The function grassbatch should accept, in addition to the normal grass commands, one more command named e.g. set.mapset, which does only one thing: setting the mapset in the rc file, so this would be the mapset used for all following grassbatch commands, unless the mapset is changed.
So a script could look as follows:
grassbatch set.mapset ~/grassdata/nc_spm_08_grass7/user1
grassbatch .../test_script.sh
grassbatch r.info
grassbatch r.mapcalc "aaa = 5"
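A hypothetical Python sketch of the proposed grassbatch behaviour. It assumes a private one-line file (rc_path) that stores the mapset path; this is not the format of GRASS's own rc file, and the sketch only returns what a real wrapper would pass on to grass rather than starting GRASS.

```python
import os
import tempfile


def grassbatch(args, rc_path):
    """Hypothetical `grassbatch` from the proposal above.

    `set.mapset PATH` only records PATH in rc_path. Any other command is
    returned together with the recorded mapset, i.e. what a real wrapper
    would hand over to the grass executable.
    """
    if args and args[0] == "set.mapset":
        if len(args) != 2:
            raise ValueError("usage: grassbatch set.mapset PATH")
        with open(rc_path, "w") as rc:
            rc.write(os.path.expanduser(args[1]) + "\n")
        return None
    if not os.path.exists(rc_path):
        raise RuntimeError("no mapset recorded; run: grassbatch set.mapset PATH")
    with open(rc_path) as rc:
        mapset = rc.read().strip()
    return mapset, list(args)


# Example: record the mapset once, then reuse it for the following commands.
with tempfile.TemporaryDirectory() as tmp:
    rc_file = os.path.join(tmp, "grassbatch_rc")
    grassbatch(["set.mapset", "~/grassdata/nc_spm_08_grass7/user1"], rc_file)
    print(grassbatch(["r.mapcalc", "aaa = 5"], rc_file))
    print(grassbatch(["r.info", "map=aaa"], rc_file))
```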
comment:9 by , 10 years ago
Setting up the session manually is really challenging; setting up the addons path is yet another step which has to be done if you want a fully working session:
+ # add path to GRASS addons
+ home = os.path.expanduser("~")
+ os.environ['PATH'] += os.pathsep + os.path.join(home, '.grass7', 'addons', 'scripts')
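The same step as a reusable Python sketch. with_addons_path is a hypothetical helper; unlike the diff, it works on a copy of the environment (suitable for subprocess's env= argument), tolerates an unset PATH, and does not add the directory twice. The ~/.grass7/addons/scripts location is taken from the diff above.

```python
import os


def with_addons_path(env, home):
    """Return a copy of env whose PATH also contains the GRASS addon scripts dir."""
    addons = os.path.join(home, ".grass7", "addons", "scripts")
    new_env = dict(env)
    parts = [p for p in new_env.get("PATH", "").split(os.pathsep) if p]
    if addons not in parts:
        parts.append(addons)  # append, so system tools keep priority
    new_env["PATH"] = os.pathsep.join(parts)
    return new_env


env = with_addons_path(os.environ, os.path.expanduser("~"))
```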
Any other opinions about how the command-line interface should look? Use cases would also be appreciated.
comment:10 by , 10 years ago
Replying to rkrug:
Some remarks concerning the syntax:
1) I don't like the idea of having to repeat the mapset each time, as it requires quite a bit of typing and invites errors in longer scripts. So defaulting to the mapset in the rc file (i.e., if I am correct, the one used before) would be quite useful.
I have two use cases which were not mentioned. The first is the testing framework, which does not have particular requirements on the command-line syntax, but you run just one command/script, so you probably want to set Database/Location/Mapset in the same command. The second is Docker, where setting the Database/Location/Mapset in/with the command itself seems to be really important, because you create a new instance with each command:
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 .../script.sh with parameters
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 g.gisenv
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 r.info map=aaa
or following my initial suggestion:
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch .../script.sh with parameters
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch g.gisenv
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch r.mapcalc "aaa = 5"
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch r.info map=aaa
or similarly to Docker:
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 .../script.sh with parameters
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 g.gisenv
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 r.info map=aaa
where run can be replaced by something different (exec, batch, do, cmd, script) in order to provide a nice syntax also for the alternative usage described below. Note that --mapset is not (does not have to be) mandatory; it can just use the last Mapset as stored in $HOME/.grass7/rc.
The general Docker run syntax is, by the way:
docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
Docker has the advantage that IMAGE is mandatory, while in GRASS GIS neither the Mapset nor the command would be required (none, one, or both can be provided).
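The Docker-like form maps naturally onto Python's argparse subcommands. This sketch only parses the command line; the run subcommand and its --mapset option come from the proposal above and are not an existing GRASS interface. argparse.REMAINDER keeps the module's own flags (e.g. -g) from being interpreted by grass.

```python
import argparse


def build_parser():
    """Sketch of the Docker-like `grass run [--mapset=PATH] COMMAND [ARG...]` form."""
    parser = argparse.ArgumentParser(prog="grass")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    run = subparsers.add_parser("run", help="run one command in a mapset")
    run.add_argument("--mapset", help="database/location/mapset path; defaults to the rc file")
    run.add_argument("command", help="module or script to execute")
    # Everything after the command is passed to it unchanged.
    run.add_argument("args", nargs=argparse.REMAINDER, help="arguments for the command")
    return parser


ns = build_parser().parse_args(["run", "--mapset=/data/loc/user1", "r.info", "-g", "map=aaa"])
print(ns.mapset, ns.command, ns.args)
```

Making --mapset optional matches the note above that the last Mapset from $HOME/.grass7/rc could be used instead.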
2) Instead of using the normal grass command, I would suggest introducing a new command (e.g. grassbatch), which takes only one parameter: the command to be executed, including its parameters. So it could be used as
grassbatch r.info
This would be similar to the Rscript command (as opposed to the R CMD syntax). However, nested commands seem to be quite common (although right now I can only recall revision control systems and Docker). But basically, grassbatch and grass batch are very similar.
3) The function grassbatch should accept, in addition to the normal grass commands, one more command named e.g. set.mapset, which does only one thing: setting the mapset in the rc file, so this would be the mapset used for all following grassbatch commands, unless the mapset is changed. So a script could look as follows:
grassbatch set.mapset ~/grassdata/nc_spm_08_grass7/user1
grassbatch .../test_script.sh
grassbatch r.info
grassbatch r.mapcalc "aaa = 5"
This corresponds to my other suggestion (in the description or in the GSoC ideas), particularly to a basic version of it. I think that there are good reasons to have both of them. Some use cases (Docker, testing framework, Cron jobs) push more for the single-command syntax, while user scripts and tools which typically need to import and export data (QGIS, WPS) would benefit more from this multi-command syntax. Where would you expect the current GISRC file to be? The runtime one is now in /tmp (/tmp/grass7-user-number/gisrc), while the initial one (from which Database/Location/Mapset is taken if not provided on the command line) is in $HOME/.grass7/rc. My idea was to have it in the current directory (or possibly also specified on the command line), so that it does not interfere with the one in $HOME, which is the use case I expect.
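One possible lookup order for the initial rc file, sketched in Python. The order and the .grassrc name (from the project-directory idea in the description) are assumptions for illustration, not decided behaviour; find_rc_file is a hypothetical name.

```python
import os
import tempfile


def find_rc_file(cli_path=None, cwd=None, home=None):
    """Return the rc file holding the initial Database/Location/Mapset.

    Hypothetical precedence:
    1. a path given on the command line,
    2. a .grassrc file in the current directory (project-style use),
    3. the per-user $HOME/.grass7/rc.
    """
    if cli_path:
        return cli_path
    local = os.path.join(cwd or os.getcwd(), ".grassrc")
    if os.path.isfile(local):
        return local
    return os.path.join(home or os.path.expanduser("~"), ".grass7", "rc")


with tempfile.TemporaryDirectory() as project:
    # No .grassrc yet: fall back to the per-user rc file.
    print(find_rc_file(cwd=project, home="/home/user"))
    open(os.path.join(project, ".grassrc"), "w").close()
    # Now the project-local file wins.
    print(find_rc_file(cwd=project, home="/home/user"))
```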
If we spend some time figuring this out, I think we can save a lot of support time in the long run; see, for example, a recent post on the grass-user mailing list (nabble link).
follow-up: 12 comment:11 by , 10 years ago
First version of executing command specified as parameter implemented in r65252. Please try:
grass71 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p
If you are interested in more, something like:
grass71 -c EPSG:4545 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p
should work too. See the commit message for details (there is no documentation yet). An alternative syntax would be:
grass71 exec ~/grassdata/nc_spm_08_grass7/user1/ g.region -p
grass71 exec -c EPSG:4545 ~/grassdata/nc_spm_08_grass7/user1/ g.region -p
which supposes that both the mapset path and the command are mandatory and appear together (exec is always first, and the mapset path is the last of the standard parameters). This fits quite nicely with the other potential extension discussed in this ticket.
However, the option I used seems to me a little bit better when considered alone. Perhaps I should change exec to --exec (I believe we should not use -exec) and leave the "subcommand syntax" to its actual implementation. I used exec rather than batch (or even batch-job) because the word batch is used with far too many different meanings.
It required some refactoring in r65238, r65241, r65246, r65247, r65248, r65250, and r65251, and although I split it into small portions and tested what I could, the code is quite tricky and critical, so it would be good to get it tested thoroughly (I would like to see some automated tests too, but I had no idea how to write them).
GRASS_BATCH_JOB is still supported and should stay supported at least in the 7.x series (it can be removed in 8 if not useful).
comment:12 by , 10 years ago
Replying to wenzeslaus:
First version of executing command specified as parameter implemented in r65252. Please try:
> grass71 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p
Absolutely cool! Martin
comment:13 by , 10 years ago
comment:14 by , 10 years ago
In r65294 I changed exec to --exec. This follows the general long-flag syntax which is already partially used in grass.py. I think it will be better to leave the "subcommand syntax" for its actual implementation. So now we have:
grass71 ~/grassdata/nc_spm_08_grass7/user1/ --exec g.region -p
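From Python (or any language that can start a process), this syntax reduces to one call. A minimal sketch, assuming the grass71 executable name used in this ticket (adjust to grass or grass7 for other installations); grass_exec_command and run_in_grass are hypothetical helper names. Passing a list instead of a shell string keeps arguments such as "aaa = 5" intact, and ~ is expanded explicitly because no shell is involved.

```python
import os
import subprocess


def grass_exec_command(mapset_path, command, grass="grass71"):
    """Build the argv for `grass71 MAPSET --exec COMMAND...`."""
    return [grass, os.path.expanduser(mapset_path), "--exec"] + list(command)


def run_in_grass(mapset_path, command, grass="grass71"):
    """Run one module or script in the given mapset and return its stdout."""
    argv = grass_exec_command(mapset_path, command, grass)
    # check=True raises CalledProcessError if grass (or the module) fails.
    completed = subprocess.run(argv, check=True, capture_output=True, text=True)
    return completed.stdout


print(grass_exec_command("~/grassdata/nc_spm_08_grass7/user1", ["r.mapcalc", "aaa = 5"]))
```

For example, run_in_grass("~/grassdata/nc_spm_08_grass7/user1", ["g.region", "-p"]) would return the region printout, provided GRASS is installed and the mapset exists.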
The only alternative would be to go with the majority there and use short flags, for example -x (-e is already taken for exiting after creating a mapset). But I'm not adding it, because --exec should be used primarily in scripts and programs, where you should use long flags anyway.
It would be possible to use only short flags, as we do now with -c, -e and -f, but this would go against the general trend toward readability, and I expect that we might change it in the future anyway, as we already support --version and --config, and all GUI selection flags can also be used with two dashes.
comment:15 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Parsing of parameters should be refactored and improved, and the behavior when both GRASS_BATCH_JOB and --exec are used should be revised, but the goal of this ticket was fulfilled. Closing as fixed. This shouldn't be backported to the 7.0 branch (or at least not any time soon) because it makes a lot of changes to crucial code.
Two other major issues were discussed together with this topic: direct execution with imports (grass run r.slope.aspect elevation=file:///path/to/file.tiff...) and a subcommand interface (grass start .../mapset; grass run r.slope.aspect...;). The latter should be easy to implement now. The hardest part of the implementation is reading the "gisrc" file from the current directory and deciding the name of the run subcommand, considering the newly added --exec and the potential direct execution with imports, which would probably need another subcommand or parameter. The naming of the individual interfaces (including this one) should be revised as well.
There is an unresolved issue with code duplication between lib/init/grass.py and lib/python/script/setup.py. There is some code duplication now, but script.setup(.init) would benefit from some code in grass.py, which would increase the duplication even further. The question is whether we can safely import grass.script.setup during startup in grass.py.
Finally, to make this --exec interface really convenient (and scripts universal), it would be nice to ensure that grass is available on the system path. This is true for Linux distributions but not for MS Windows. Perhaps it also needs to be documented more, including the distinction between grass, grass7 and grass71.
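Until grass is reliably on the path everywhere, a calling script has to look for it. A Python sketch using shutil.which; the candidate names come from the paragraph above, but their order is an assumption, and the which parameter exists only to make the function testable.

```python
import shutil


def find_grass_executable(candidates=("grass", "grass7", "grass71"), which=shutil.which):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


print(find_grass_executable())
```

On MS Windows, shutil.which also tries the extensions listed in PATHEXT, so a grass71.bat launcher on the path would be found as well.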
None of the issues above currently has its own ticket.
Direct execution with imports GSoC idea:
Subcommand interface mailing list discussions:
- Announcing new command line interface of `grass` program in trunk (Vaclav)
- Announcing new command line interface of `grass` program in trunk (Rainer)
- Announcing new command line interface of `grass` program in trunk (Vaclav)
- Announcing new command line interface of `grass` program in trunk (Rainer)
- Announcing new command line interface of `grass` program in trunk (Pietro)
- R interface to grass 7.1
- On GSoC Proposal "New easy-to-use command line interface for GRASS GIS" (Moritz)
- QGIS Processing & GRASS
Related tickets:
First implementation of batch job which could be specified from command line