Opened 8 years ago

Closed 6 years ago

#2873 closed enhancement (fixed)

Simplify usage of GRASS in Python from outside

Reported by: wenzeslaus
Owned by: grass-dev@…
Priority: major
Milestone: 7.4.2
Component: Python
Version: svn-trunk
Keywords: startup, installation, scripts, interpreter, windows installer, pygrass, temporal, bootstrap, boilerplate
Cc:
CPU: Unspecified
Platform: All

Description

To use GRASS GIS functionality in Python from outside of a GRASS session, i.e. without starting GRASS GIS explicitly and running a script (or an actual module) in the session, one needs to include approximately 50 lines as described in the grass.script.setup manual (or in the lengthy related wiki page).

Ideally, one would just do an import and then one or two lines of initialization, for example:

import os
import grass.script as gscript
rcfile = gscript.init_data("~/grassdata", "nc_spm", "user1")
do_what_ever_with_grass()
os.remove(rcfile)

Suggestion

The attached code is a prototype implementation of the Python part which would allow something like this. The code can only work if the following variables are set:

Error: Failed to load processor bash
No macro or processor named 'bash' found

The GRASS_EXECUTABLE does not have to be set if grass is on the path or, in theory, if it is on some standard path (e.g. C:\Program Files (x86)\GRASS GIS 7.0.0\grass70.bat on MS Windows). However, the dynamic library path must always be set beforehand (as described in #2424) for ctypes to work (both PyGRASS and temporal depend on ctypes). The path to Python packages must be set beforehand as well if you want to use the initialization functions from the package.
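
A minimal sketch of how a standalone script could check these preconditions before importing any GRASS package. The function names are hypothetical, the executable name "grass" is an assumption (installations may use a versioned name), and the check only tests that the variables are present, not that they point to a working installation:

```python
import os
import shutil
import sys


def library_path_variable(platform=sys.platform):
    """Name of the variable holding the dynamic library search path.

    Assumption: PATH on MS Windows, LD_LIBRARY_PATH elsewhere.
    Mac OS X is not handled specially in this sketch.
    """
    return "PATH" if platform.startswith("win") else "LD_LIBRARY_PATH"


def missing_settings(environ=os.environ, platform=sys.platform):
    """Return the names of required variables that are not set."""
    missing = []
    # GRASS_EXECUTABLE is optional when "grass" can be found on PATH.
    on_path = shutil.which("grass", path=environ.get("PATH", "")) is not None
    if "GRASS_EXECUTABLE" not in environ and not on_path:
        missing.append("GRASS_EXECUTABLE")
    # Python packages and dynamic libraries must be reachable beforehand.
    for name in ("PYTHONPATH", library_path_variable(platform)):
        if not environ.get(name):
            missing.append(name)
    return missing
```

A script could call missing_settings() first and print the result instead of failing later with an import or ctypes error.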

Making it simple

The three lines above might be good enough on Linux, where you just paste them into the command line or .bashrc, but for MS Windows users this is too complicated. The QGIS project also considers this too much work (PyQGIS bootstrap is complicated).

GRASS Python packages could go to the system packages directory, so that we avoid the need for setting PYTHONPATH. This might work well on MS Windows when usage of system Python is implemented as described in #2333.

On Linux, LD_LIBRARY_PATH can be avoided if the libraries are installed into the system path. On MS Windows, putting more things on PATH is standard procedure from what I have seen. On Mac OS X, DYLD_LIBRARY_PATH can't be used anyway since El Capitan.

The GRASS GIS executable should be on the path on all platforms in the same way as it is on Linux. This is maybe not standard on MS Windows, but in the end this is what users want (they want GRASS to be available right away).

Challenges

When putting dynamic libraries and Python packages directly into system paths, installing more than one GRASS version becomes more complicated. However, that's OK because only advanced users would have more than one version, so the hard work of making it work (perhaps just not using the default settings in the installer) will be on them. Beginners will likely have just one version. The exception might be MS Windows, where it is possible that a beginner has a standalone GRASS GIS, the one from QGIS and one from OSGeo4W.

GRASS Python packages are not prepared to be imported when GISBASE is not set and may require even more. We would need to change the code to not require anything from GRASS session at import time. So far, I needed to add lazy initialization for the translate function (underscore) to be able to import grass.script.core (patch attached).

There is already some duplication between grass.script.setup and the grass.py executable. To create a full session (e.g. with Mapset locking) we would need even more duplication. We could move some parts from grass.py to grass.script.setup if we are sure that we can import the right grass.script.setup during the initialization phase (we already rely on it when creating a Location).

Attachments (5)

lazy_gettext.patch (1.1 KB ) - added by wenzeslaus 8 years ago.
Patch for the lazy initialization of the underscore function (does not require GISBASE at import time)
run_grass.py (4.4 KB ) - added by wenzeslaus 8 years ago.
First prototype of API for simplified GRASS startup of standalone scripts
lib_python_script_init_data.patch (6.7 KB ) - added by wenzeslaus 8 years ago.
Second prototype of API for simplified GRASS startup of standalone scripts as a patch for lib/python/script
test_no_session.py (611 bytes ) - added by wenzeslaus 8 years ago.
Test code for the second prototype of API
grass_session_package.patch (27.9 KB ) - added by wenzeslaus 8 years ago.
Third prototype with a separate package, location creation functionality and some documentation (patch for lib/python directory)


Change History (21)

by wenzeslaus, 8 years ago

Attachment: lazy_gettext.patch added

Patch for the lazy initialization of the underscore function (does not require GISBASE at import time)

by wenzeslaus, 8 years ago

Attachment: run_grass.py added

First prototype of API for simplified GRASS startup of standalone scripts

in reply to:  description ; comment:1 by hellik, 8 years ago

Replying to wenzeslaus:

GRASS Python packages could go to the system packages directory, so that we avoid the need for setting PYTHONPATH. This might work well on MS Windows when usage of system Python is implemented as described in #2333.

There is no system Python on Windows. Users always have to install Python manually system-wide, possibly interfering with other Python installations installed by other software.

On Linux, LD_LIBRARY_PATH can be avoided if the libraries are installed into the system path. On MS Windows, putting more things on PATH is standard procedure from what I have seen.

[...] Mac OS X, DYLD_LIBRARY_PATH can't be used anyway since El Capitan.

GRASS GIS executable should be on path on all platforms in the same way as it is on path in Linux. This is maybe not standard on MS Windows but at the end this is what users want (they want GRASS to be available right away).

IMHO it is not good practice to put everything in %PATH% on Windows. Poisoning %PATH% should be avoided.

in reply to:  1 comment:2 by wenzeslaus, 8 years ago

Replying to hellik:

Replying to wenzeslaus:

GRASS Python packages could go to the system packages directory, so that we avoid the need for setting PYTHONPATH. This might work well on MS Windows when usage of system Python is implemented as described in #2333.

There is no system Python on Windows. Users always have to install Python manually system-wide, possibly interfering with other Python installations installed by other software.

By system Python I mostly mean what #2333 is talking about. Possible interference is an inherent issue of the Windows operating system. I'm OK with including multiple options in the installer, giving the choice to the user (with the last option being "Don't know what to choose? Install Ubuntu and let the package managers solve it for you." ;-).

On MS Windows, putting more things on PATH is standard procedure from what I have seen... GRASS GIS executable should be on path on all platforms in the same way as it is on path in Linux. This is maybe not standard on MS Windows but at the end this is what users want (they want GRASS to be available right away).

IMHO it is not good practice to put everything in %PATH% on Windows. Poisoning %PATH% should be avoided.

I'm not sure if we can do it. Is there another way to set the path to dynamic libraries for a process (so that Python scripts using GRASS ctypes work)?

comment:3 by wenzeslaus, 8 years ago

See also Glynn's comment in ticket 580 speaking about "fixing the installation process to make GRASS 'sessions' an optional feature".

by wenzeslaus, 8 years ago

Attachment: lib_python_script_init_data.patch added

Second prototype of API for simplified GRASS startup of standalone scripts as a patch for lib/python/script

by wenzeslaus, 8 years ago

Attachment: test_no_session.py added

Test code for the second prototype of API

comment:4 by wenzeslaus, 8 years ago

I uploaded a second prototype of the API as a patch. Before executing the code outside of GRASS GIS, the following is needed:

Error: Failed to load processor bash
No macro or processor named 'bash' found

The minimal script is:

import grass.script as gscript
import grass.script.setup as gsetup
session = gsetup.init_data("~/grassdata", "nc_spm", "user1")
# code goes here, e.g. gscript.run_command(...)
session.close()

The names and the overall API are not final, nor is the implementation, and it might be nicer to have one import instead of two. However, I think the basic structure is right. Please comment.
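
To make the shape of such a session object concrete, here is a rough sketch, not the code from the attached patch. The class name Session is hypothetical. It assumes the session is represented by a temporary GISRC file with GISDBASE, LOCATION_NAME and MAPSET entries, and it leaves out everything else a real session needs (runtime paths, Mapset locking):

```python
import os
import tempfile


class Session:
    """Hypothetical session handle owning a temporary GISRC file."""

    def __init__(self, gisdbase, location, mapset):
        fd, self.rcfile = tempfile.mkstemp(prefix="grass_", suffix=".gisrc")
        with os.fdopen(fd, "w") as rc:
            rc.write("GISDBASE: %s\n" % os.path.expanduser(gisdbase))
            rc.write("LOCATION_NAME: %s\n" % location)
            rc.write("MAPSET: %s\n" % mapset)
        # Remember any previous value so close() can restore it.
        self._previous = os.environ.get("GISRC")
        os.environ["GISRC"] = self.rcfile

    def close(self):
        """Remove the rc file and restore the previous GISRC."""
        if self.rcfile is None:
            return  # already closed
        os.remove(self.rcfile)
        if self._previous is None:
            os.environ.pop("GISRC", None)
        else:
            os.environ["GISRC"] = self._previous
        self.rcfile = None
```

Returning an object rather than the bare rc file name keeps the cleanup in one place and leaves room for more state later.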

in reply to:  4 comment:5 by wenzeslaus, 8 years ago

Replying to wenzeslaus:

...it might be nicer to have one import instead of two.

On the other hand, separating the stuff into its own module might have some advantages. The API could look like:

import grass.script as gscript
import grass.session as gsession
session = gsession.create_session("~/grassdata", "nc_spm", "user1")
gscript.run_command(...)
session.close()

And in future perhaps allow:

session_a = gsession.create_session("~/grassdata", "nc_spm", "user1")
session_b = gsession.create_session("~/grassdata", "nc_spf", "PERMANENT")
session_a.run_command(...)
session_b.run_command(...)
session_b.close()
session_a.close()

Now uploading a third prototype, which is in a separate package. You can do something like:

session = gsession.init_data(location="test_xy", mapset="test1", geostring='XY')
# ...
session.close()

Code is still messy and naming is not final but should work. Some documentation provided. Please test.

by wenzeslaus, 8 years ago

Attachment: grass_session_package.patch added

Third prototype with a separate package, location creation functionality and some documentation (patch for lib/python directory)

in reply to:  description ; comment:6 by glynn, 8 years ago

Replying to wenzeslaus:

The GRASS_EXECUTABLE does not have to be set if grass is on the path or, in theory, if it is on some standard path

IMHO, this "solution" is yet more duct tape on top of the existing heap. Any real solution would start by simply deleting the GRASS startup script then figuring out what needs to be done to make everything still work without it.

It should not be necessary to "start" GRASS.

On Unix, GRASS modules and libraries should be installed in system directories. Python packages should go in Python's site-packages directory. Environment variables should be set in /etc/profile (or whatever mechanism the distribution uses, e.g. /etc/profile.d).

On Windows, configuration settings should probably be stored in the registry.

GISRC should have a system-wide default setting such as $(HOME)/.grass/rc. Users who never need more than one session at a time can just use that file always, changing the database, location or mapset with g.mapset.
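
The default lookup suggested here could be sketched like this. The function name is hypothetical, and the location under the home directory follows the example path given in the comment rather than any existing GRASS behavior:

```python
import os


def resolve_gisrc(environ=os.environ):
    """Return the rc file to use: GISRC if set, else a per-user default."""
    explicit = environ.get("GISRC")
    if explicit:
        # An explicitly requested session wins over the default.
        return explicit
    # Assumed default, mirroring the $(HOME)/.grass/rc example.
    return os.path.join(os.path.expanduser("~"), ".grass", "rc")
```

Users with a single session would never set GISRC, while tools that need a separate session would set it for their own process only.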

comment:7 by pmav99, 8 years ago

I think that the scope of this proposal is very wide. IMHO importing dynamic libraries in a cross platform way and providing an official API are different issues.

WRT providing an official API for working with GRASS Locations / Mapsets, I believe that the proper Python idiom is to use a context manager. In other words, the user should not have to do any setup or cleanup explicitly. E.g.:

from grass.some_namespace import GrassSession

with GrassSession("/path/to/gisdb/location/mapset"):
    # work with the specified Location/Mapset
    pass

This could be expanded to creating temporary Locations / Mapsets:

from grass.some_namespace import GrassSession

# not cleaning up might make sense when you debug a script.
with GrassSession.temporary(cleanup=False):
    # create a temporary location/mapset and optionally clean up when exiting the context
    pass

or even creating new Locations / Mapsets:

from grass.some_namespace import GrassSession

with GrassSession.create_from_epsg(mapset_path, epsg):
    # create a new location/mapset
    pass

with GrassSession.create_from_geofile(mapset_path, geofile_path):
    # create a new location/mapset
    pass

Version 1, edited 8 years ago by pmav99
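
The temporary variant with optional cleanup can be illustrated with the standard library alone. This is only a sketch of the context manager mechanics: the function name is hypothetical, and it creates a plain temporary directory to serve as a database directory rather than a real GRASS Location, which would need the GRASS runtime:

```python
import contextlib
import pathlib
import shutil
import tempfile


@contextlib.contextmanager
def temporary_gisdbase(cleanup=True):
    """Yield a fresh temporary directory and optionally remove it on exit."""
    path = pathlib.Path(tempfile.mkdtemp(prefix="grassdata_"))
    try:
        yield path
    finally:
        # Runs on normal exit and when the body raises an exception.
        if cleanup:
            shutil.rmtree(path, ignore_errors=True)
```

Used as `with temporary_gisdbase(cleanup=False) as gisdbase:`, the directory survives the block, which is the debugging case mentioned in the comment.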

in reply to:  6 comment:8 by wenzeslaus, 8 years ago

Replying to glynn:

Replying to wenzeslaus:

The GRASS_EXECUTABLE does not have to be set if grass is on the path or, in theory, if it is on some standard path

IMHO, this "solution" is yet more duct tape on top of the existing heap.

I'll try to prepare a patch without this workaround, which will expect the grass executable, Python packages and libraries to be already on the path. The API is the easy part here, I guess.

Any real solution would start by simply deleting the GRASS startup script then figuring out what needs to be done to make everything still work without it.

It should not be necessary to "start" GRASS.

Back when you suggested that in #580 it seemed strange to me, but now I think it would be much better than the current situation.

On Unix, GRASS modules and libraries should be installed in system directories. Python packages should go in Python's site-packages directory. Environment variables should be set in /etc/profile (or whatever mechanism the distribution uses, e.g. /etc/profile.d).

I can see this working for dynamic libraries and Python packages, and I think this would be good enough for now, but how would this work for modules? For example, I have 141 PCL tools installed (pcl_*), but we have >500 modules plus addons which actually have to be on a separate site. It would be good to get opinions from some packagers.

Anyway, dynamic libraries and Python packages in system paths are a place to start. Can the compile/install process be configured for this now?

GISRC should have a system-wide default setting such as $(HOME)/.grass/rc. Users who never need more than one session at a time can just use that file always, changing the database, location or mapset with g.mapset.

For me this is different because it is related to the data being used, and I'm not so convinced about it in comparison to the need for a ready-to-use runtime environment. I often work in more than one Mapset and I actually use Mapset locks to see where I already have open sessions. However, I can see that for many users a permanent connection to a given Mapset plus additional sessions (GISRCs) upon request (starting the GRASS application or API in some special way) might work, although particular details matter a lot here.

comment:9 by neteler, 8 years ago

Milestone: 7.1.0 → 7.2.0

Milestone renamed

comment:10 by wenzeslaus, 8 years ago

Support for scripting GRASS GIS in Ruby (github.com/jgoizueta/grassgis) is something to take some inspiration from when creating something like a Session class:

GrassGis.session configuration do
  g.list 'vect'
  puts output # will print list of vector maps
end

comment:11 by martinl, 8 years ago

Milestone: 7.2.0 → 7.3.0

comment:12 by martinl, 8 years ago

Milestone: 7.3.0 → 7.4.0

Milestone renamed

comment:13 by neteler, 6 years ago

Milestone: 7.4.0 → 7.4.1

Ticket retargeted after milestone closed

comment:14 by neteler, 6 years ago

Milestone: 7.4.1 → 7.4.2

comment:15 by neteler, 6 years ago

Meanwhile pip install grass-session is available (source: https://github.com/zarch/grass-session), can the ticket be closed?

comment:16 by wenzeslaus, 6 years ago

Resolution: fixed
Status: new → closed

Now we have pip install grass-session for Python (and grass ... --exec on the command line). New suggestions and requests would need to specify their relation to these and include an evaluation of them. A new ticket would be more appropriate. Closing as fixed since there is pip install grass-session.

Trac management: bash processor for Trac which worked at one point seems to be missing now (Error: Failed to load processor bash)
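
For the --exec route mentioned above, a script can also call the GRASS executable as a subprocess. The sketch below only assembles and runs such a command; the helper names are hypothetical, and the executable name "grass" is an assumption, since installations may provide a versioned name instead:

```python
import subprocess


def build_exec_command(mapset_path, module, *args, executable="grass"):
    """Build the argument list for running one module in a Mapset."""
    return [executable, mapset_path, "--exec", module, *args]


def run_in_mapset(mapset_path, module, *args, executable="grass"):
    """Run the module and return its standard output as text."""
    command = build_exec_command(mapset_path, module, *args,
                                 executable=executable)
    result = subprocess.run(command, check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For example, run_in_mapset("/path/to/grassdata/nc_spm/user1", "g.list", "type=vector") would list vector maps without an interactive session, provided the executable is on the path.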
