Ticket #3672 (new defect)

Opened 4 years ago

Last modified 2 years ago

When called from ArcGIS 9.3 Python geoprocessing script, band.ReadAsArray() fails with TypeError: in method 'BandRasterIONumPy', argument 1 of type 'GDALRasterBandShadow *'

Reported by: jjr8 Owned by: hobu
Priority: normal Milestone: 1.8.1
Component: PythonBindings Version: 1.7.2
Severity: normal Keywords:
Cc: kyle

Description

Background

ArcGIS allows users to write Python scripts that can be invoked from geoprocessing workflows within the ArcGIS GUI environment. Python is now the primary scripting language for ArcGIS, replacing Avenue, AML, Visual Basic, and others that ESRI tried in the past.

But ArcGIS currently suffers from a major deficiency: it does not provide a Python API for reading and writing raster data. To do this using ArcGIS and Python alone, you must call an ArcGIS API to export to a format such as ArcInfo ASCII Grid and read it using Python's built-in functions. Alternatively, you can use GDAL's Python bindings and read and write rasters directly. Thus GDAL provides a huge advantage in convenience and performance to ArcGIS users who write Python scripts. I consider GDAL essential technology for any large-scale Python-based ArcGIS extension, such as the  Marine Geospatial Ecology Tools package produced by my lab.

The problem

With ArcGIS 9.3, the following script will run successfully once from ArcGIS but then fail on subsequent attempts:

# Instantiate the ArcGIS geoprocessor so we can report messages.

import arcgisscripting, os, sys
gp = arcgisscripting.create()

# Import gdal and read values from a dataset. It does not matter what
# dataset you use.

from osgeo import gdal

gp.AddMessage('Opening the dataset')
testFile = os.path.join(os.path.dirname(sys.argv[0]), 'RandomData.img')
dataset = gdal.Open(testFile)

gp.AddMessage('Getting band 1')
band = dataset.GetRasterBand(1)

gp.AddMessage('Reading the data')
try:
    data = band.ReadAsArray()        # Fails here
except:
    import traceback
    gp.AddMessage(traceback.format_exc())
    raise

gp.AddMessage('Succeeded!')

The first time you run it, you get this output from ArcGIS:

Executing: ArrayTest
Start Time: Thu Jul 01 11:45:44 2010
Running script ArrayTest...
Opening the dataset
Getting band 1
Reading the data
Succeeded!
Completed script ArrayTest...
Executed (ArrayTest) successfully.
End Time: Thu Jul 01 11:45:46 2010 (Elapsed Time: 2.00 seconds)

The second and following times you get this:

Executing: ArrayTest
Start Time: Wed Jun 30 18:19:12 2010
Running script ArrayTest...
Opening the dataset
Getting band 1
Reading the data
Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "C:\Python25\lib\site-packages\osgeo\gdal.py", line 850, in ReadAsArray
  File "C:\Python25\lib\site-packages\osgeo\gdal_array.py", line 239, in BandReadAsArray
  File "C:\Python25\lib\site-packages\osgeo\gdal_array.py", line 88, in BandRasterIONumPy
TypeError: in method 'BandRasterIONumPy', argument 1 of type 'GDALRasterBandShadow *'

<type 'exceptions.TypeError'>: in method 'BandRasterIONumPy', argument 1 of type 'GDALRasterBandShadow *'
Failed to execute (ArrayTest).
End Time: Wed Jun 30 18:19:13 2010 (Elapsed Time: 1.00 seconds)

To repro

  1. MS Windows. I'm using Windows 7 x64 but XP SP3 or later should (32 or 64) work fine.
  1. ArcGIS 9.3. I'm using the latest, 9.3.1 SP1, but I doubt it matters which version of 9.3 you use. Note that ArcGIS is a 32-bit program and since everything will be running within ArcGIS processes, everything below needs to be 32-bit.
  1. Python 2.5. I'm using the latest binary available for Windows, Python 2.5.4.
  1. numpy for Python 2.5. I'm using 1.3.0 although any build in the 1.x series should work except 1.4.0, which had a breaking ABI change, retracted by 1.4.1 which came out only very recently.
  1. GDAL with Python bindings. I am using  http://vbkto.dyndns.org/sdk/release-1310-gdal-1-7-mapserver-5-6.zip, downloaded today.
  1. Download the attached ArrayTest.zip and unzip it to someplace.
  1. Start ArcCatalog. Navigate to the ArrayTest directory you decompressed. Open the Toolbox you see there. Run the ArrayTest tool.
  1. The first time you run the tool, it will work. Run it again. Now it will fail.

I needed I can provide a remotely-accessible Windows environment that is configured this way. I will be gone July 3-12, 2010 so it would probably have to wait until after I return.

Analysis

I suspect this is a compatibility issue between SWIG and ArcGIS 9.3. ArcGIS 9.2 and earlier versions ran Python scripts by creating a child python.exe process. ArcGIS 9.3 runs them by creating a Python interpreter directly within the ArcGIS process (ArcCatalog.exe or ArcMap.exe) using Python's C API. This new technique provides major performance advantages but raises the issue of how to isolate Python scripts from each other when they run in succession in a workflow. ESRI apparently chose one of the following, I am not sure which:

  1. Instantiate a brand-new Python interpreter each time a script is run.
  1. Instantiate a Python new "sub-interpreter" each time a script is run (see Python documentation for info on sub-interpreters).

In any case, this raises issues for Python extension modules such as GDAL. As you probably know, Python 2.x does not provide a mechanism for cleanly unloading modules. On Windows at least, the DLL is left in memory. Therefore, if it allocates static variables, these will remain in memory. When ArcGIS allocates a new interpreter or a sub-interpreter, those variables are still there. Depending on which way ArcGIS does it, the module's Python "init" function may or may not be called.

The Python documentation cautions extension writers about this problem (see section titled "Initialization, Finalization, and Threads"). It apparently took some time for SWIG to be aware of it and handle it. You may recall that I mentioned that I thought that the adoption of SWIG 1.3.39 solved problems relating to this (see  this thread). SWIG 1.3.39 indeed helped: it allowed some of GDAL and OGR to work properly. You can, for example, open and close datasets and retrieve some properties. I thought it was all working. But apparently six months ago I did not actually test the reading and writing of data from bands. That appears to be still broken.

I looked into the C code generated by SWIG. This is the code that is failing, in gdal_array_wrap.cpp:

SWIGINTERN PyObject *_wrap_BandRasterIONumPy(PyObject *SWIGUNUSEDPARM(self), PyObject *args, PyObject *kwargs) {
  PyObject *resultobj = 0;
  GDALRasterBandShadow *arg1 = (GDALRasterBandShadow *) 0 ;
  int arg2 ;
  int arg3 ;
  int arg4 ;
  int arg5 ;
  int arg6 ;
  PyArrayObject *arg7 = (PyArrayObject *) 0 ;
  int arg8 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  int val2 ;
  int ecode2 = 0 ;
  int val3 ;
  int ecode3 = 0 ;
  int val4 ;
  int ecode4 = 0 ;
  int val5 ;
  int ecode5 = 0 ;
  int val6 ;
  int ecode6 = 0 ;
  int val8 ;
  int ecode8 = 0 ;
  PyObject * obj0 = 0 ;
  PyObject * obj1 = 0 ;
  PyObject * obj2 = 0 ;
  PyObject * obj3 = 0 ;
  PyObject * obj4 = 0 ;
  PyObject * obj5 = 0 ;
  PyObject * obj6 = 0 ;
  PyObject * obj7 = 0 ;
  char *  kwnames[] = {
    (char *) "band",(char *) "bWrite",(char *) "xoff",(char *) "yoff",(char *) "xsize",(char *) "ysize",(char *) "psArray",(char *) "buf_type", NULL 
  };
  CPLErr result;
  
  if (!PyArg_ParseTupleAndKeywords(args,kwargs,(char *)"OOOOOOOO:BandRasterIONumPy",kwnames,&obj0,&obj1,&obj2,&obj3,&obj4,&obj5,&obj6,&obj7)) SWIG_fail;
  res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_GDALRasterBandShadow, 0 |  0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "BandRasterIONumPy" "', argument " "1"" of type '" "GDALRasterBandShadow *""'"); 
  }

The problem is that SWIG_ConvertPtr is failing to convert the first argument into a GDALRasterBandShadow pointer. The argument is a band and this should work. I suspect that SWIGTYPE_p_GDALRasterBandShadow was initialized the first time that the Python script ran and resolves to a Python type that was instantiated during that first run. When the second script runs, using a new interpreter or sub-interpreter, the type is allocated again by Python but SWIG is still storing a pointer, as a module static variable, to the type from the first run. When it does type comparison using pointers, it decides the types are not the same.

Important note

Sometimes the second run will not fail with the nice traceback, but instead with this:

Executing: ArrayTest
Start Time: Thu Jul 01 11:46:01 2010
Running script ArrayTest...
Opening the dataset
Getting band 1
Reading the data
ERROR 999998: There are no more files.
Failed to execute (ArrayTest).
End Time: Thu Jul 01 11:46:02 2010 (Elapsed Time: 1.00 seconds)

A third or subsequent attempt will fail even sooner:

Executing: ArrayTest
Start Time: Thu Jul 01 11:46:10 2010
Running script ArrayTest...
Opening the dataset
ERROR 999998: Unexpected Error.
Failed to execute (ArrayTest).
End Time: Thu Jul 01 11:46:11 2010 (Elapsed Time: 1.00 seconds)

I'm not sure why that happens rather than the traceback. It could be that SWIG is corrupting memory under this scenario. When I put a Python try/except around it, I was not able to get any exception. It is like the Python interpreter just stopped running. Eventually, after this happens, the ArcGIS process will crash--additional evidence that memory corruption has happened.

Thanks for looking at this issue

I know it is nasty. If there is any way I can help, please let me know. It is critical for my project that this work. I emphasize that anyone else trying to use GDAL from ArcGIS Python scripts will encounter this problem, and GDAL is the best way to read rasters from ArcGIS Python scripts.

As mentioned, I will be gone July 3-12, 2010. No access to email. Outside of that window I will respond promptly to email.

Thanks again, Jason

Attachments

ArrayTest.zip Download (0.7 MB) - added by jjr8 4 years ago.

Change History

Changed 4 years ago by jjr8

Changed 4 years ago by jjr8

Howard followed up with me about this and some folks at ESRI. We currently believe it is a SWIG issue that affects compatibility of the GDAL bindings with ArcGIS 9.3 Python geoprocessing scripts. I am about to leave for vacation but will return July 12 and continue investigating the issue. Once we have a better idea I'll update the ticket with more info...

Changed 4 years ago by rouault

I had a quick look at this. On Linux with python 2.5, I can reproduce crashes with the following code (at the second iteration) that does not involve ArcGIS, but that starts new python interpreter. If I comment the import numpy line, it doesn't crash. But variants of that code that don't crash show weird results, if I do a print gdal.GetDriverByName?('GTiff') instead of ds = gdal.Open('byte.tif') for example

#include "Python.h"

int main(int argc, char* argv[])
{
    Py_Initialize();

    while(1)
    {
        PyThreadState* global_state = PyThreadState_Get();
        PyThreadState* ts = Py_NewInterpreter();
        PyThreadState_Swap(global_state);
        PyThreadState_Swap(ts);

        PyRun_SimpleString("from osgeo import gdal");
        PyRun_SimpleString("print 'Open(byte.tif)'");
        PyRun_SimpleString("ds = gdal.Open('byte.tif')");
        PyRun_SimpleString("print 'import numpy'");
        PyRun_SimpleString("import numpy");

        PyThreadState_Swap(global_state);
        PyThreadState_Swap(ts);
        Py_EndInterpreter( ts );
        PyThreadState_Swap(global_state);
    }

    Py_Finalize();

    return 0;
}

When instrumenting gdal_warp.cpp, it shows that the init_gdal() method is loaded only once, so it really looks like an issue with types and SWIG

Changed 4 years ago by jjr8

I have been working on this issue for a while with Jason Scheirer at ESRI. In summary, we both believe this is a bug in ArcGIS that affects SWIG projects such as GDAL. For now, I would like to keep this ticket open until I have final confirmation from him that it is, in fact, an ArcGIS issue. Until then, here are the details so far.

Jason S. indicated that, contrary to my analysis above, ArcGIS creates a single instance of the Python interpreter and reuses it each time it runs a Python script, without creating sub-interpreters. This is similar to Even's code above. Jason S. provided the following example code to illustrate how to imitate what ArcGIS does:

Py_Initialize();
PyObject* builtins = PyImport_ImportModule("__builtin__");
PyObject* globals1 = PyDict_New();
PyObject* globals2 = PyDict_New();
PyDict_SetItemString(globals1, "__builtins__", builtins);
PyDict_SetItemString(globals2, "__builtins__", builtins);
Py_DECREF(builtins);
PyObject *value1 = PyRun_String("import gdal", Py_file_input, globals1, globals1);
if (!value1)
  PyErr_Print();
PyObject *value2 = PyRun_String("import gdal", Py_file_input, globals2, globals2);
if (!value2)
  PyErr_Print();
Py_DECREF(globals1);
Py_DECREF(globals2);
Py_XDECREF(value1);
Py_XDECREF(value2);
Py_Finalize();

He said he expected the failure to occur with the second PyRun_String call. Using his example, I replaced replacing the simple import gdal with code that called band.ReadAsArray(), which is where the original failure occurred. It did *not* fail.

After digging into this more using debug builds of Python and GDAL, I discovered that Python's sys.modules was empty each time ArcGIS called my script. This was not expected given that ArcGIS was supposedly reusing the same Python interpreter each time. sys.modules should contain all of the modules ever imported by scripts run previously in the ArcGIS process.

I demonstrated this to Jason S. and he looked into ArcGIS a bit more and eventually replied: "After digging into the [ArcGIS] code a bit more, I was able to figure out that it’s a bad interaction due to multiple calls (in our script tool framework) to PySys_SetArgv and Py_Initialize, which then clobber sys.modules. I’m going to set up some unit tests and hopefully get around this without breaking anything for SP1 [of ArcGIS 10]."

Using this knowledge, I developed a workaround for my Python scripts. Before each script exits, it caches the contents of sys.modules in a static variable of a Python extension module (a .pyd). This module, like all Python extension modules, remains in memory until the process exits. When the scripts run again, the first thing they do is to compare the contents of that static variable to the contents of sys.modules and restore any modules to sys.modules that are missing from it. This is very kludgy but it seems to work. I can now use GDAL without any problems from ArcGIS geoprocessing scripts that are configured to run "in process". I reviewed the Python 2.5.5 source and could not see a problem with it, after inspecting the module importing code for an hour.

I'll provide another update when I have more information...

Changed 2 years ago by kyle

  • cc kyle added
Note: See TracTickets for help on using tickets.