Opened 5 years ago

Closed 5 years ago

#2134 closed enhancement (wontfix)

Create a general exit-safe interface to C libraries

Reported by: wenzeslaus Owned by: grass-dev@…
Priority: normal Milestone: 7.0.0
Component: Python ctypes Version: svn-trunk
Keywords: G_fatal_error, exit, multiprocessing Cc:
CPU: Unspecified Platform: All

Description

After discussion with Soeren, I'm opening a ticket with a request for a general exit-safe interface to C libraries.

The purpose of this is (citing Soeren)

to reduce the direct ctypes usage in the temporal framework, so that GUI elements can easily be implemented on top of it, without the risk to be exited by a G_fatal_error() call. But i don't want to use the script interface to avoid its module execution and parsing overhead, which slows things massively down.

Moreover, it protects the main application in case of segmentation faults.

The r58201 (#2127) and r58249 revisions provide Messenger (source:grass/trunk/lib/python/pygrass/messages/__init__.py) and CLibrariesInterface (source:grass/trunk/lib/python/temporal/c_libraries_interface.py) classes which implements the idea of server (a separate process) executing ctypes (C library) functions (remote procedure call (RPC)). One is PyGRASS interface for GRASS messages the other is the interface for GRASS C functions needed by temporal framework.

One issue I see is that we need to implement such a class for each use case. One for PyGRASS messages, other maybe for PyGRASS itself, for temporal library, for GUI vector digitizer, for scatter plot... Is there a way to create a general interface, or generated interface or make the implementation simple?

The other issue is with implementation itself. Each class use a bit different implementation to call functions. However, generally, some identifiers are mapped to the ctypes functions. One issue is that it slows down the function call. The other is that it is tedious (and error prone) to implement (you need to create the mapping). And finally what is always judged in Python code: it is not Pythonic (whatever that means, e.g. nice, intuitive, straightforward). This of course highly relates to the first issue.

Related documentation currently accesible from:

Attachments (3)

c_library_interface.py (19.0 KB) - added by huhabla 5 years ago.
Fast and exit-safe interface to call Python functions that wrap GRASS C-functions
pygrass_rpc_interface.py (6.4 KB) - added by huhabla 5 years ago.
PyGRASS RPC interface
pygrass_raster.diff (2.8 KB) - added by huhabla 5 years ago.
PyGRASS patch to remove pointer from info object

Download all attachments as: .zip

Change History (21)

comment:1 Changed 5 years ago by huhabla

I have implemented a new RPC server that can call any kind of Python function directly in the server subprocess. The server subprocess will restart in case the provided Python function was terminated (segfault, exception, G_fatal_error, ...).

The Python file with the new RPC design is attached. Please have a look at the doctest examples that demonstrate how to use the RPCServer.

The RPC server supports two methods to call remote functions call(function, args) and call_no_wait(function, args). The call() function will always wait for the server to commit the return value of the called Python function although the function may have no return value. The call_no_wait() function will not wait for the Python function to finish and therefor not return any return value of the called Python function.

Example:

def info(args):
    message = args[0]
    libgis.G_message(message)

def raster_map_exists(args)
    name = args[0]
    mapset = args[1]
    mapset = libgis.G_find_raster(name, mapset)

    if mapset:
        return True

    return False

rpc = RPCServer()
rpc.call_no_wait(function=info, args=("This is a message",))

ret = rpc.call(function=raster_map_exists, args=("test", "PERMANENT"))

I am still not sure about the best Python caller function design. Should the error return values of GRASS C-library functions be returned, or should an exception be raised?

Any suggestions are welcome.

comment:2 Changed 5 years ago by wenzeslaus

Thanks Soeren, this is what I hoped for and I'm looking forward how this will continue. I just quickly looked at this topic for now. So, just some small notes now.

I needed to do this changes to get it working:

@@ -19,12 +19,13 @@
 from multiprocessing import Process, Lock, Pipe
 import logging
 from ctypes import *
-from core import *
+from grass.script.core import *
 import grass.lib.gis as libgis
 import grass.lib.raster as libraster
 import grass.lib.vector as libvector
 import grass.lib.date as libdate
 import grass.lib.raster3d as libraster3d
+from datetime import datetime

I though I understand how it works with exceptions and doctest but apparently not, even I tried verbose mode:

python -m doctest -v c_library_interface.py

Anyway test passes. Would it be possible to write a test also for the case of waiting, killing and restarting the server? I mean the case when I run something with call_no_wait(function=aaa), it runs for a long time, meanwhile I call call(function=bbb) but then bbb fails and kills the server. Does this make sense?

About the interface. I'm not sure about how to report errors. The functions might need to accept keyword arguments in some way. How to deal with objects, e.g. how to use pyGRASS through this?

I was not thinking about how to deal with wrapping all C functions. But maybe manual wrapping it is inevitable if we want really Python(ic) interface. We already have pyGRASS for this reason.

The last not is about naming. Although it is a well established practice in GRASS that things are named with two or more different names, I would suggest to use only one name for newly created things. We are at the beginning, we can always rename but it would be safer to use one name at time. *C*Interface is a nice name (although it it might by also *Python*Interface) but it is more for the particular implementation (interface for temporal framework or to messages) than for the general thing (server). RPCServer is very general, not sure about that. What about CFunctionCaller? I also heard suggestion NoGFatal_Huray!!! from someone here but that's not a valid Python identifier, although, with unicode identifiers, there might be a way. No strong opinion here.

comment:3 Changed 5 years ago by wenzeslaus

I forgot to link related a note on Trac wiki wiki:wxGUIDevelopment/VDigit where I'm proposing the factory pattern as a way of creation of objects which are an object-oriented interface to C vector modules but in fact are connected to the RPCServer by that factory, so user does does not have to pass the RPCServer to every object (but he or she has to use the factory).

comment:4 in reply to:  description ; Changed 5 years ago by zarch

Replying to wenzeslaus:

The r58201 (#2127) and r58249 revisions provide Messenger (source:grass/trunk/lib/python/pygrass/messages/__init__.py) and CLibrariesInterface (source:grass/trunk/lib/python/temporal/c_libraries_interface.py) classes which implements the idea of server (a separate process) executing ctypes (C library) functions (remote procedure call (RPC)). One is PyGRASS interface for GRASS messages the other is the interface for GRASS C functions needed by temporal framework.

One issue I see is that we need to implement such a class for each use case. One for PyGRASS messages, other maybe for PyGRASS itself, for temporal library, for GUI vector digitizer, for scatter plot... Is there a way to create a general interface, or generated interface or make the implementation simple?

I think that a is great to have a RPC integrated in GRASS, and could be useful for many reasons, but personally, I think that to solve the problem of the G_fatal, we should handling directly the C signals. Of course it will not work in case of occurring of a Segmentation Fault, and in this sense I think that we need a RPC approach, but speaking about pygrass, that it is a lower API, I think that would be great to be able to manage the C signals.

We already discuss about this topic (#1646), and I don't know If I should discuss this here or in the previous ticket, any way, maybe we should consider to integrate in the GRASS source a fault handler, I found this project [0] that maybe could be integrated in GRASS, I believe there is some issue with the License [1] that is under "BSD (2-clause)"[3], so I'm not sure it is possible to copy directly inside GRASS, but maybe we can write something from scratch with the same logic.

Or as already suggested by Glynn (#1646) wrap the G_add_error_handler, maybe given a more abstract and easy to use interface, and possibly that avoid to expose the users to ctypes. Do you think that would be feasible?

Pietro

[0] https://github.com/haypo/faulthandler [1] https://github.com/haypo/faulthandler/blob/master/COPYING [3] http://opensource.org/licenses/BSD-2-Clause

comment:5 in reply to:  4 Changed 5 years ago by zarch

Replying to zarch:

I found this project [0] that maybe could be integrated in GRASS, I believe there is some issue with the License [1] that is under "BSD (2-clause)"[3], so I'm not sure it is possible to copy directly inside GRASS, but maybe we can write something from scratch with the same logic.

Sorry for the noise, but I found out that is already integrated in the python standard library, so maybe we should just understand how to use it.

http://docs.python.org/dev/library/faulthandler.html

Pietro

comment:6 in reply to:  4 Changed 5 years ago by glynn

Replying to zarch:

Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,

Using an error handler allows you to avoid process termination. But once a fatal error has occurred, you cannot safely call any GRASS function; doing so may well result in a segfault.

comment:7 Changed 5 years ago by huhabla

In my opinion the RPC approach is only meaningful for persistent applications that need fast access to C-library functions, or that need low level API access for data modification (like digitizing).

My intention to write the RPC server was to make the temporal framework usable in persistent applications and to be as fast as possible.

Extending the PyGRASS framework with signal handling will not solve the problem: that memory structures may be corrupted in case of a G_fatal_error() call or a segfault, leading to incorrect computation results or to segmentation faults in the persistent application.

Implementation of persistent applications that use the GRASS C-library functions in their process memory is IMHO not possible. We have had this discussion for years.

The PyGRASS interface is well designed for module programming not for persistent applications. Otherwise each C-function call should be handled via RPC. From my point of view and some tests that i made slows the RPC approach the processing significantly down.

Example:

from multiprocessing import Process, Pipe, Lock
import time

num = 50000

def func(a):
    b = a + 1
    c = b + 1
    d = c + b
    return d

# DICT
def dict_func():
    a = {}
    for i in xrange(num):
        a["%i%i%i%i"%(i,i,i,i)] = func

    start = time.time()

    for i in xrange(num):
        a["%i%i%i%i"%(i,i,i,i)](5)

    end = time.time()

    print end - start


def target(lock, conn):
    a = [func]*num
    while True:
        # Avoid busy waiting
        conn.poll(4)
        data = conn.recv()
        lock.acquire()
        ret = a[data[0]](data[1])
        conn.send(ret)
        lock.release()

def rpc_func():
    client_conn, server_conn = Pipe()
    lock = Lock()
    server = Process(target=target, args=(lock, server_conn))
    server.daemon = True

    server.start()
    

    start = time.time()
    for i in xrange(num):
        if server.is_alive() is True:
            data = [i, 5]   
            client_conn.send(data)
            client_conn.recv()

    end = time.time()

    print end - start



if __name__ == "__main__":
    # Measuring the time to call of 50000 functions that are
    # registered in a dictionary
    dict_func()    
    # Measuring the time to call 50000 functions 
    # using an RPC approach
    rpc_func()

Running the script will show that calling functions from a dict is 50 time faster than using a pipe with a subprocess.

python rpc_test.py 
0.0984511375427
4.59985399246

comment:8 in reply to:  7 Changed 5 years ago by zarch

Replying to huhabla:

[big cut]

Thank you Soeren for the clarification and to share your knowledge, I didn't understood the problem, sorry for the noise.

Pietro

comment:9 in reply to:  7 ; Changed 5 years ago by wenzeslaus

Replying to glynn:

Replying to zarch:

Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,

Using an error handler allows you to avoid process termination. But once a fatal error has occurred, you cannot safely call any GRASS function; doing so may well result in a segfault.

One of the issues which G_add_error_handler is trying to solve is to provide meaningful error message to the user. For example, failing to open some temporary file causes exit with "No such file /tmp/kjewbf8d38dj". This does not help user nor programmer to understand that the error occurred (when does this happened, what is the stack trace, what are consequences and what are suggestions to solve it). In other words, sometimes the message provided by G_fatal_error caller is too low level.

Python RPCServer with wrapper functions throwing exceptions would help to solve this issue. But it seems to me that #1646 remains valid for pyGRASS (and possibly others) and C code itself.

Replying to huhabla:

In my opinion the RPC approach is only meaningful for persistent applications that need fast access to C-library functions, or that need low level API access for data modification (like digitizing).

And this is not only vector and raster digitizing, this is also new scatter plot tool and in fact the whole g.gui.iclass, nviz (which is unfortunately more complicated) and of course everything temporal-related (everything started to be temporal-related).

My intention to write the RPC server was to make the temporal framework usable in persistent applications and to be as fast as possible.

I'm not sure how the speed of RPCServer compares to module call but the speed is not the only advantage. Fine control of what is called and a smoother interface (possibly, depending on wrappers) is the other advantage. Calling subprocess from GUI for every single task and parsing its output is cumbersome.

The pyGRASS interface is well designed for module programming not for persistent applications. Otherwise each C-function call should be handled via RPC. From my point of view and some tests that i made slows the RPC approach the processing significantly down.

So, in the next step, we need pyGRASS-like interface which is fail safe and temporal library which is faster?

Running the script will show that calling functions from a dict is 50 time faster than using a pipe with a subprocess.

It seems that this is something we need to take care of. And this is something what my factory pattern suggestions is trying to address.

We would need to create two sets of class with identical interface. One using RPCServer (safe) the other calling ctypes directly (fast). Objects should be created by factory, so that the factory will put the RPCServer into the objects, so the user does not take care of it. Maybe in Python we can go beyond the classic factory pattern and create also the RPCServer-dependent classes from the classes calling ctypes directly.

I realize that some code would be appreciated but I cannot dive into this more now.

comment:10 in reply to:  9 ; Changed 5 years ago by huhabla

Replying to wenzeslaus:

Replying to glynn:

Replying to zarch:

Or as already suggested by Glynn (#1646) wrap the G_add_error_handler,

Using an error handler allows you to avoid process termination. But once a fatal error has occurred, you cannot safely call any GRASS function; doing so may well result in a segfault.

One of the issues which G_add_error_handler is trying to solve is to provide meaningful error message to the user. For example, failing to open some temporary file causes exit with "No such file /tmp/kjewbf8d38dj". This does not help user nor programmer to understand that the error occurred (when does this happened, what is the stack trace, what are consequences and what are suggestions to solve it). In other words, sometimes the message provided by G_fatal_error caller is too low level.

Python RPCServer with wrapper functions throwing exceptions would help to solve this issue. But it seems to me that #1646 remains valid for pyGRASS (and possibly others) and C code itself.

I have re-designed the RPC interface, now the Python function wrapper will return an exception and the result of the function calls, so that the RPC server interface that provides the call() functions can raise these exceptions (exceptions raised in the subprocess will kill the subprocess and will not be catch'd in the parent process). Hence, the Python wrapper functions transform the C-function return values into meaningful exceptions that will be raised in the parent process.

While re-designing i concluded that a no wait function call call_no_wait() is not meaningful when mixed with calls that wait to receive data. There is only a limited number of C-functions that do not return values or return states. It is better to wait for a function call to finish, than risking a race condition in case a fatal error occur'd meanwhile. An exception is the messaging interface, which should stay as is.

However, maybe two RPC interfaces are meaningful: one that waits for functions to return (expecting return values including exceptions) and one that does not wait?

Replying to huhabla:

In my opinion the RPC approach is only meaningful for persistent applications that need fast access to C-library functions, or that need low level API access for data modification (like digitizing).

And this is not only vector and raster digitizing, this is also new scatter plot tool and in fact the whole g.gui.iclass, nviz (which is unfortunately more complicated) and of course everything temporal-related (everything started to be temporal-related).

My intention to write the RPC server was to make the temporal framework usable in persistent applications and to be as fast as possible.

I'm not sure how the speed of RPCServer compares to module call but the speed is not the only advantage. Fine control of what is called and a smoother interface (possibly, depending on wrappers) is the other advantage. Calling subprocess from GUI for every single task and parsing its output is cumbersome.

I have added benchmark runs to the rpc server script, to get an idea what the performance loss and gain of the RPC interface is:

GRASS 7.0.svn (Test XY):~ > python c_library_interface.py 
##############################################################
TESTS
ERROR: A fatal error
WARNING:root:Needed to restart the rpc server
ERROR: A fatal error
WARNING:root:Needed to restart the rpc server
##############################################################
##############################################################
Raster map exists benchmark
Time to call 1000 functions directly: 0.017043s
Time to call 1000 functions via RPC: 0.178600s
Time to perform 1000 g.findfile module runs: 30.343877s
##############################################################
##############################################################
Raster map info benchmark
Time to call 10000 functions directly: 0.856104s
Time to call 10000 functions via RPC: 7.189188s
Time to perform 10000 r.info module runs: 120.261683s
##############################################################

As you can see the RPC interface is for the two tested functions about 10 times slower then the direct Python function calls that wrap the GRASS C-functions. But the RPC interface is about 17 to 600 times faster then using the grass.script interface that calls GRASS modules (g.findfile and r.info).

The pyGRASS interface is well designed for module programming not for persistent applications. Otherwise each C-function call should be handled via RPC. From my point of view and some tests that i made slows the RPC approach the processing significantly down.

So, in the next step, we need pyGRASS-like interface which is fail safe and temporal library which is faster?

So you want to wrap all C-function calls in PyGRASS to be wrapped using the RPC interface?

Running the script will show that calling functions from a dict is 50 time faster than using a pipe with a subprocess.

It seems that this is something we need to take care of. And this is something what my factory pattern suggestions is trying to address.

We would need to create two sets of class with identical interface. One using RPCServer (safe) the other calling ctypes directly (fast). Objects should be created by factory, so that the factory will put the RPCServer into the objects, so the user does not take care of it. Maybe in Python we can go beyond the classic factory pattern and create also the RPCServer-dependent classes from the classes calling ctypes directly.

I realize that some code would be appreciated but I cannot dive into this more now.

I am not sure if i understand your approach, so code examples would be very helpful here. :)

Changed 5 years ago by huhabla

Attachment: c_library_interface.py added

Fast and exit-safe interface to call Python functions that wrap GRASS C-functions

comment:11 in reply to:  2 Changed 5 years ago by huhabla

Replying to wenzeslaus:

Thanks Soeren, this is what I hoped for and I'm looking forward how this will continue. I just quickly looked at this topic for now. So, just some small notes now.

I needed to do this changes to get it working:

@@ -19,12 +19,13 @@
 from multiprocessing import Process, Lock, Pipe
 import logging
 from ctypes import *
-from core import *
+from grass.script.core import *
 import grass.lib.gis as libgis
 import grass.lib.raster as libraster
 import grass.lib.vector as libvector
 import grass.lib.date as libdate
 import grass.lib.raster3d as libraster3d
+from datetime import datetime

Thanks for the fix, the new version should work now in any directory.

I though I understand how it works with exceptions and doctest but apparently not, even I tried verbose mode:

python -m doctest -v c_library_interface.py

Please try:

python c_library_interface.py -v

Anyway test passes. Would it be possible to write a test also for the case of waiting, killing and restarting the server? I mean the case when I run something with call_no_wait(function=aaa), it runs for a long time, meanwhile I call call(function=bbb) but then bbb fails and kills the server. Does this make sense?

Yes it does, but i was not able to fix the deadlock that appeared when mixing "waiting calls" with "no waiting calls" and a fatal error killed the subprocess. So i removed the no wait calls from the server interface.

About the interface. I'm not sure about how to report errors. The functions might need to accept keyword arguments in some way. How to deal with objects, e.g. how to use pyGRASS through this?

In the latest implementation the subprocess creates an exception that is send through the pipe to the server and is raised there. If an object is picklable then it can be send between processes.

I was not thinking about how to deal with wrapping all C functions. But maybe manual wrapping it is inevitable if we want really Python(ic) interface. We already have pyGRASS for this reason.

The last not is about naming. Although it is a well established practice in GRASS that things are named with two or more different names, I would suggest to use only one name for newly created things. We are at the beginning, we can always rename but it would be safer to use one name at time. *C*Interface is a nice name (although it it might by also *Python*Interface) but it is more for the particular implementation (interface for temporal framework or to messages) than for the general thing (server). RPCServer is very general, not sure about that. What about CFunctionCaller? I also heard suggestion NoGFatal_Huray!!! from someone here but that's not a valid Python identifier, although, with unicode identifiers, there might be a way. No strong opinion here.

The latest RPC server incarnation does not care what kind of functions it should call, hence it is very generic and takes care about killed subprocesses. You can define any arbitrary function with arguments that are picklable and pass it to the RPC server to call it in the subprocess.

But maybe it would be better to use sharedctypes memory in case of digitizing rather than a pipe?

comment:12 in reply to:  10 ; Changed 5 years ago by wenzeslaus

Replying to huhabla:

Replying to wenzeslaus:

Replying to huhabla:

Running the script will show that calling functions from a dict is 50 time faster than using a pipe with a subprocess.

It seems that this is something we need to take care of. And this is something what my factory pattern suggestions is trying to address.

We would need to create two sets of class with identical interface. One using RPCServer (safe) the other calling ctypes directly (fast). Objects should be created by factory, so that the factory will put the RPCServer into the objects, so the user does not take care of it. Maybe in Python we can go beyond the classic factory pattern and create also the RPCServer-dependent classes from the classes calling ctypes directly.

I realize that some code would be appreciated but I cannot dive into this more now.

I am not sure if i understand your approach, so code examples would be very helpful here. :)

My original idea does not involved RCP server, it was about ctypes calls versus module calls. It would require rewrite the whole pyGRASS using modules. Here is sample pseudo code:

# in module/package pygrass
class Point(object):
    def __init__(self, x=0, y=0, z=None, **kargs):
        # ...
        libvect.Vect_append_point(self.c_points, x, y, z)

# in module/package failsafegrass
class Point(object):
    def __init__(self, x=0, y=0, z=None)
        data = "P 1 1\n{x} {y}\n1 1".format(x=x, y=y)
        # ... (fill file /tmp/ef5s with data)
        run_command('v.edit' -n tool='add' map=vectmap input="/tmp/ef5s")

# in some other modules/packages
class FailSafeFactory(object):
    def create_Point(self, args, kwargs):
        return failsafegrass.Point(*args, **kwargs)

class FastFactory(object):
    def create_Point(self, args, kwargs):
        return pygrass.Point(*args, **kwargs)

# in the user code (probably GUI)

# at the beginning
if be_safe:
    factory = FailSafeFactory()
else:
    factory = FastFactory()

# usage in code
point1 = factory.create_Point(x=500, y=600)

# now I don't know which point it is but the interface is the same

I'm not sure if factory pattern is the right thing to use with RCP server (to hide it). But I think that it is the option which should be considered. There are two things I don't understand yet. First, how to work objects. I can pass them but how I call the methods and access attributes in general?

class Point:
    def get_x(self):
       return rcpserver.call(self._unsafe_get_x())

But this cannot work for object containing pointers, so there actually cannot be any _unsafe_get_x method. So, the second thing is that I probably need some ProxyPoint which can be passed between processes. And using some identifier it would be associated with the object on the other site.

# in the GUI process
class ProxyPoint:
    def get_x(self):
       return rcpserver.method_call(self, 'get_x')

# somewhere in the other unsafe process
# proxy_object == self from above
# method == 'get_x' from above
real_object = objects[proxy_object.uid]
return getattr(real_object, method)()

# maybe pass only the uid is better idea,
# since server should not be passed through itself

I don't have more now. It seems to me that this might be possible. But my understanding of problem is not complete and even now I see some problems such as function/method parameters which are proxy objects.

comment:13 in reply to:  12 Changed 5 years ago by zarch

Replying to wenzeslaus:

Replying to huhabla:

I am not sure if i understand your approach, so code examples would be very helpful here. :)

My original idea does not involved RCP server, it was about ctypes calls versus module calls. It would require rewrite the whole pyGRASS using modules. Here is sample pseudo code:

(IMHO) I don't see the point to make a python API using the modules interface. It will be only slow and useless for any real task, already pygrass for some operations it consume too much memory (probably it is due to some circular references, that need to be solve...) or is too slow.

off-topic { Personally if I have to do the effort of rewriting pygrass I would use Cython (or CPython), moreover I think that we should work to reduce and clean the high redundancy of the code, a lot of Python objects and functions are duplicate between script/pygrass/temporal/wxgui simply using slightly different logic and this is unmaintainable, unfortunately I have not the global view of the ~111000 rows (removing blank lines and comments) of the code. But I do think that we should work on that direction. }

Why not simply define your function (containing ctypes or ctypes-based function/objects) and call through the RPC interface? In this way, from my little understanding of the problem, you should be able to preserve the GUI from a crash, avoiding at the same time to make another duplicate, don't you?

Something like (not tested):

from grass.pygrass.vector import VectorTopo
from grass.pygrass.vector.geometry import Point
from grass.pygrass.function import get_mapset_vector


def add_points(vname, vmapset='', *points):
    """
    >>> add_points('new', (1, 2), (2, 3), (3, 4))
    """
    mapset = get_mapset_vector(vname, vmapset)  # chek if already exist
    with VectorTopo(vname, mapset, mode='rw' if mapset else 'w') as vct:
        for x, y in points:
            vct.write(Point(x, y))


ciface = RPCServer()
check = ciface.call(function=add_points, args=('new', (1, 2), (2, 3), (3, 4)))

comment:14 Changed 5 years ago by zarch

Sorry the track lost the quote characters.

Replying to wenzeslaus:

Replying to huhabla:

I am not sure if i understand your approach, so code examples would be very helpful here. :)

My original idea does not involved RCP server, it was about ctypes calls versus module calls. It would require rewrite the whole pyGRASS using modules. Here is sample pseudo code: [cut]

(IMHO) I don't see the point to make a python API using the modules interface. It will be only slow and useless for any real task, already pygrass for some operations it consume too much memory (probably it is due to some circular references, that need to be solve...) or is too slow.

off-topic { Personally if I have to do the effort of rewriting pygrass I would use Cython (or CPython), moreover I think that we should work to reduce and clean the high redundancy of the code, a lot of Python objects and functions are duplicate between script/pygrass/temporal/wxgui simply using slightly different logic and this is unmaintainable, unfortunately I have not the global view of the ~111000 rows (removing blank lines and comments) of the code. But I do think that we should work on that direction. }

Why not simply define your function (containing ctypes or ctypes-based function/objects) and call through the RPC interface? In this way, from my little understanding of the problem, you should be able to preserve the GUI from a crash, avoiding at the same time to make another duplicate, don't you?

Something like (not tested):

from grass.pygrass.vector import VectorTopo
from grass.pygrass.vector.geometry import Point
from grass.pygrass.function import get_mapset_vector


def add_points(vname, vmapset='', *points):
    """
    >>> add_points('new', (1, 2), (2, 3), (3, 4))
    """
    mapset = get_mapset_vector(vname, vmapset)  # check if already exist
    with VectorTopo(vname, mapset, mode='rw' if mapset else 'w') as vct:
        for x, y in points:
            vct.write(Point(x, y))


ciface = RPCServer()
check = ciface.call(function=add_points, args=('new', (1, 2), (2, 3), (3, 4)))

Changed 5 years ago by huhabla

Attachment: pygrass_rpc_interface.py added

PyGRASS RPC interface

Changed 5 years ago by huhabla

Attachment: pygrass_raster.diff added

PyGRASS patch to remove pointer from info object

comment:15 Changed 5 years ago by huhabla

I have attached a new RPC server that is able to instantiate a persistent PyGRASS object in a remote process that can be accessed using an RPC interface, have a look at the attached file pygrass_rpc_interface.py.

To get it work i needed to patch the PyGRASS raster info object to avoid C-pointer that can not be pickled, see pygrass_raster.diff. IMHO PyGRASS should avoid pointer and should use ctypes.byref() to pass pointers to GRASS C-functions. I think (i am not sure about this) that avoiding ctypes.pointer() may reduce memory leaks as well if objects like Cell_head() of Range() are managed by the Python garbage collector?

Here an example how to use the PyGRASS RPC interface, reading some raster map info's and all rows:

if __name__ == '__main__':
    import grass.script as grassscript
    import grass.pygrass.raster as pygrass

    grassscript.use_temp_region()
    grassscript.run_command("g.region", n=80.0, s=0.0, e=120.0, w=0.0,
                      t=1.0, b=0.0, res=10.0, res3=10.0)
    grassscript.run_command("r.mapcalc", expression="test = row()",
                            overwrite=True, quiet=True)

    # Start the server and create a RasterRow remote object
    # that is initialized with the raster map name
    kargs = {"name":"test"}
    server = RPCServer(pygrass.RasterRow, kargs)

    # Get some raster map info
    info = server.get_property("info")
    print "Rows:", info.rows
    print "Cols:", info.cols
    print "N:", info.north
    print "S:", info.south
    print "E:", info.east
    print "W:", info.west

    # Open the raster map for row access
    kargs = {}
    server.call("open", kargs)

    # Read all rows
    for row in xrange(info.rows):
        kargs = {"row":row}
        ret = server.call("get_row", kargs)
        print "Row:", row, ret

    # Close the raster map
    kargs = {}
    server.call("close", kargs)

    grassscript.del_temp_region()

Result:

GRASS 7.0.svn (nc_spm_08):~/src > python pygrass_rpc_interface.py 
Rows: 8
Cols: 12
N: 80.0
S: 0.0
E: 120.0
W: 0.0
Row: 0 [1 1 1 1 1 1 1 1 1 1 1 1]
Row: 1 [2 2 2 2 2 2 2 2 2 2 2 2]
Row: 2 [3 3 3 3 3 3 3 3 3 3 3 3]
Row: 3 [4 4 4 4 4 4 4 4 4 4 4 4]
Row: 4 [5 5 5 5 5 5 5 5 5 5 5 5]
Row: 5 [6 6 6 6 6 6 6 6 6 6 6 6]
Row: 6 [7 7 7 7 7 7 7 7 7 7 7 7]
Row: 7 [8 8 8 8 8 8 8 8 8 8 8 8]

IMHO the RPC interface design should provide a blueprint how to implement a remote digitizer using PyGRASS to access vector or raster maps. I hope that most of the PyGRASS functionality will be accessible using this approach.

Note: with this approach you will create a subprocess for each raster or vector map that should be modified. Each subprocess will manage only one object. The subprocess can manage several objects by using a session id. But IMHO this is not a good idea. In case one object calls G_fatal_error() or segfaults, all other objects will be killed as well.

PyGRASS may need more patching to make the objects that will be exchanged between client and server picklable (see PyGRASS raster info object patch pygrass_raster.diff).

This RPC interface will not replace the new PyGRASS messenger interface nor the tgis specific raster info access RPC implementation. Calling simple C-functions for each map in a space time raster datasets (STRDS) using an RPC interface is much faster than creating a subprocess and a raster object for each map in a STRDS.

Hence, there are different solutions for different needs. :)

comment:16 in reply to:  15 Changed 5 years ago by zarch

Hi Soeren,

sorry for the late reply.

Replying to huhabla:

I have attached a new RPC server that is able to instantiate a persistent PyGRASS object in a remote process that can be accessed using an RPC interface

Thank you for that, I think it is a good solution to the problem.

To get it work i needed to patch the PyGRASS raster info object to avoid C-pointer that can not be pickled, IMHO PyGRASS should avoid pointer and should use ctypes.byref() to pass pointers to GRASS C-functions. I think (i am not sure about this) that avoiding ctypes.pointer() may reduce memory leaks as well if objects like Cell_head() of Range() are managed by the Python garbage collector?

Yes, exactly, remove the pointer to the C structs will make independent all the objects to each other, so should solve the problem of the circular reference and memory consumption that I cited in a previous email.

So I can remove all the pointers, or leave the pointer but instantiate an independent C struct internally on the object, like looking at your diff pygrass_raster.diff?, I could do in the init:

def __init__(self, ...):
    ...
    self.cs_region = libgis.Cell_head()  # C struct region
    self.c_region = ctypes.pointer(self.cs_region)  # C pointer
    ...

And all the time that I use a C struct pointer to instantiate a class I need to copy the struct internally with memcpy. In this way, when the python object is dereferenced, the garbage collector should be able to manage and free the memory. Of course from a performance point of view we loose something because every time that we instantiate a geometry feature we are copying the "MapInfo?" struct.

Do you have a better idea?

comment:17 Changed 5 years ago by huhabla

Pietro and i investigated the possibility of a PyGRASS vector RPC interface in more detail off-list. We came to the conclusion that the design of PyGRASS does not allow the implementation of a RPC interface as suggested in this ticket. The main problem is that the most important PyGRASS vector geometry classes make use of C-library structures internally to massively speed up processing. The vector C-library structures (Map_info, line_pnts, line_cats, ...) are managed by the vector C-library using dynamic memory allocation. Such structures can not be automatically serialized (pickled) by Python. But instances of the geometry classes (Point, Line, Isle, Area, ...) must be exchanged by the RPC interface for reading and writing of vector maps. To allow the exchange of geometries all these classes must be rewritten and reduced in their functionality. All functions that may call G_fatal_error() must be removed from the geometry classes. An additional abstraction layer must be implemented to convert the line_pnts and line_cats data into picklable Python objects and vice verse.

Making PyGRASS RPC conform will result in much slower vector processing and reduced functionality. The only benefit would be that persistent applications, like the GUI, can use PyGRASS via RPC.

However, Pietro and i came to the conclusion that the effort to modify PyGRASS to be RPC conform is to large and the benefits to low. We suggest a dedicated vector editing RPC interface that calls the vector C-library functions in the RPC subprocess directly, or that make use of PyGRASS in the RPC subprocess but converts the geometry objects into picklebale Python objects.

comment:18 Changed 5 years ago by wenzeslaus

Resolution: wontfix
Status: newclosed

The simpler solution with specialized interfaces for each use case was implemented (e.g., for reporting messages to user in PyGRASS) and used in r58201, r58555, r58386, r58575, r58579 and others.

According to comment:17 closing as wontfix. Use specialized solutions as suggested in comment:17. Thanks Soeren and Pietro for the hard work and thinking.

Note: See TracTickets for help on using tickets.