Changes between Version 18 and Version 19 of PythonGotchas


Ignore:
Timestamp:
Dec 1, 2019, 2:13:34 PM (4 years ago)
Author:
Even Rouault
Comment:

Migration of Python gotchas

Legend:

Unmodified
Added
Removed
Modified
  • PythonGotchas

    v18 v19  
    11= Gotchas in the GDAL and OGR Python Bindings =
    22
    3 This page lists aspects of GDAL's and OGR's Python bindings that may catch Python programmers by surprise. If you find something new, feel free to add it to the list, but consider discussing it on the [http://www.osgeo.org/mailman/listinfo/gdal-dev/ gdal-dev mailing list] first, to make sure you fully understand the issue and that others agree that it is unexpected, "non-Pythonic", or something that would catch many Python programmers by surprise. Be sure to reference email threads, Trac tickets, and other sources of additional information.
    4 
    5 This list is not the place to report bugs. If you believe something is a bug, please [http://trac.osgeo.org/gdal/newticket open a ticket] and report the problem to gdal-dev. Then consider listing it here if it is something related to Python specifically. Do not list it here if it relates to GDAL or OGR generally, and not the Python bindings specifically.
    6 
    7 Not all items listed here are bugs. Some of these are just how GDAL and OGR work and cannot be fixed easily without breaking existing code. If you don't like how something works and think it should be changed, feel free to discuss it on gdal-dev and see what can be done.
    8 
    9 = Gotchas that are by design... or per history =
    10 
    11 These are unexpected behaviors that are not considered by the GDAL and OGR teams to be bugs and are unlikely to be changed due to effort required, or whose fixing might affect backward compatibility, etc.
    12 
    13 === Python bindings do not raise exceptions unless you explicitly call {{{UseExceptions()}}} ===
    14 
    15 By default, the GDAL and OGR Python bindings do not raise exceptions when errors occur. Instead they return an error value such as {{{None}}} and write an error message to {{{sys.stdout}}}. For example, when you try to open a non-existing dataset with GDAL:
    16 
    17 {{{
    18 #!python
    19 >>> from osgeo import gdal
    20 >>> gdal.Open('C:\\foo.img')
    21 ERROR 4: `C:\foo.img' does not exist in the file system,
    22 and is not recognised as a supported dataset name.
    23 
    24 >>>
    25 }}}
    26 
    27 In Python, it is traditional to report errors by raising exceptions. You can enable this behavior in GDAL and OGR by calling the {{{UseExceptions()}}} function:
    28 
    29 {{{
    30 #!python
    31 >>> from osgeo import gdal
    32 >>> gdal.UseExceptions()        # Enable exceptions
    33 >>> gdal.Open('C:\\foo.img')
    34 Traceback (most recent call last):
    35   File "<stdin>", line 1, in <module>
    36 RuntimeError: `C:\foo.img' does not exist in the file system,
    37 and is not recognised as a supported dataset name.
    38 
    39 >>>
    40 }}}
    41 
    42 The GDAL team acknowledges that Python programmers expect exceptions to be enabled by default, but says that exceptions are disabled by default to [http://lists.osgeo.org/pipermail/gdal-dev/2010-September/026031.html preserve backward compatibility].
    43 
    44 === Python crashes if you use an object after deleting an object it has a relationship with ===
    45 
    46 Consider this example:
    47 
    48 {{{
    49 #!python
    50 >>> from osgeo import gdal
    51 >>> dataset = gdal.Open('C:\\RandomData.img')
    52 >>> band = dataset.GetRasterBand(1)
    53 >>> print(band.Checksum())
    54 31212
    55 }}}
    56 
    57 In this example, {{{band}}} has a relationship with {{{dataset}}} that requires {{{dataset}}} to remain allocated in order for {{{band}}} to work. If we delete {{{dataset}}} and then try to use {{{band}}}, Python will crash:
    58 
    59 {{{
    60 #!python
    61 >>> from osgeo import gdal
    62 >>> dataset = gdal.Open('C:\\RandomData.img')
    63 >>> band = dataset.GetRasterBand(1)
    64 >>> del dataset           # This will cause the Python garbage collector to deallocate dataset
    65 >>> band.GetChecksum()    # This will now crash Python because the band's dataset is gone
    66 < Python crashes >
    67 }}}
    68 
    69 This problem can manifest itself in subtle ways. For example, can occur if you try to instantiate a temporary dataset instance within a single line of code:
    70 
    71 {{{
    72 #!python
    73 >>> from osgeo import gdal
    74 >>> print(gdal.Open('C:\\RandomData.img').GetRasterBand(1).Checksum())
    75 < Python crashes >
    76 }}}
    77 
    78 In this example, the dataset instance was no longer needed after the call to {{{GetRasterBand()}}} so Python deallocated it ''before'' calling {{{Checksum()}}}.
    79 
    80 This problem occurs because the GDAL and OGR objects are implemented in C++ and the relationships between them are maintained in C++ using pointers. When you delete the dataset instance in Python it causes the C++ object behind it to be deallocated. But the C++ object behind the band instance does not know that this happened, so it contains a pointer to the C++ dataset object that no longer exists. When the band tries to access the non-existing object, the process crashes.
    81 
    82 The GDAL team knows that this design is not what Python programmers expect. Unfortunately the design is difficult to correct so it is likely to remain for some time. Please consult the GDAL team for more information.
    83 
    84 The problem is not restricted to the GDAL band and dataset objects. It happens in other areas where objects have relationships with each other. Unfortunately there is no complete list, so you have to watch for it yourself. One other known place involves the OGR {{{GetGeometryRef()}}} function:
    85 
    86 {{{
    87 #!python
    88 >>> feat = lyr.GetNextFeature()
    89 >>> geom = feat.GetGeometryRef()     # geom contains a reference into the C++ geometry object maintained by the C++ feature object
    90 >>> del feat                         # This deallocates the C++ feature object, and its C++ geometry
    91 >>> print(geom.ExportToWkt())        # Crash here. The C++ geometry no longer exists
    92 < Python crashes >
    93 }}}
    94 
    95 If you read the GDAL and OGR API documentation carefully, you will see that the functions that end in "Ref" obtain references to internal objects, rather than making new copies. This is a clue that the problem could occur. Be careful when using the "Ref" functions. Also watch out for functions that end in "Directly", such as {{{SetGeometryDirectly()}}}, which transfer ownership of internal objects:
    96 
    97 {{{
    98 #!python
    99 >>> point = ogr.Geometry(ogr.wkbPoint)
    100 >>> feature = ogr.Feature(layer_defn)
    101 >>> feature.SetGeometryDirectly(point)    # Transfers ownership of the C++ geometry from point to feature
    102 >>> del feature                           # point becomes implicitly invalid, because feature owns the C++ geometry
    103 >>> print(point.ExportToWkt())            # Crash here
    104 < Python crashes >
    105 }}}
    106 
    107 The advantage of the "Ref" and "Directly" functions is they provide faster performance because a duplicate object does not need to be created. The disadvantage is that you have to watch out for this problem.
    108 
    109 The information above is based on [http://lists.osgeo.org/pipermail/gdal-dev/2010-September/026027.html email from Even Rouault].
    110 
    111 === Python crashes if you add a new field to an OGR layer when features deriving from this layer definition are still active ===
    112 
    113 For example:
    114 
    115 {{{
    116 #!python
    117 >>> feature = lyr.GetNextFeature()
    118 >>> field_defn = ogr.FieldDefn("foo", ogr.OFTString)
    119 >>> lyr.CreateField(field_defn)                       # now, existing features deriving from this layer are invalid
    120 >>> feature.DumpReadable()                            # segfault
    121 < Python crashes >
    122 }}}
    123 
    124 For more information, please see #3552.
    125 
    126 === Layers with attribute filters ({{{SetAttributeFilter()}}}) will only return filtered features when using {{{GetNextFeature()}}} ===
    127 
    128 If you read the documentation for {{{SetAttributeFilter()}}} carefully you will see the caveat about {{{OGR_L_GetNextFeature()}}}. This means that if you use {{{GetFeature()}}}, instead of {{{GetNextFeature()}}}, then you can still access and work with features from the layer that are not covered by the filter. {{{GetFeatureCount()}}} will respect the filter and show the correct number of features filtered. However, working with {{{GetFeatureCount()}}} in a loop can lead to some subtle confusion. Iterating over the {{{Layer}}} object or using {{{GetNextFeature()}}} should be the default method for accessing features:
    129 
    130 {{{
    131 #!python
    132 >>> lyr = inDataSource.GetLayer()
    133 >>> lyr.SetAttributeFilter("PIN = '0000200001'")      # this is a unique filter for only one record
    134 >>> for i in range( 0, lyr.GetFeatureCount() ):
    135 ...    feat = lyr.GetFeature( i )
    136 ...    print(feat)                                    # this will print one feat, but it's the first feat in the Layer and not the filtered feat 
    137 ...
    138 }}}
    139 
    140 
    141 === Certain objects contain a {{{Destroy()}}} method, but you should never use it ===
    142 
    143 You may come across examples that call the {{{Destroy()}}} method. [http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_slides2.pdf This tutorial] even gives specific advice on page 12 about when to call {{{Destroy()}}}. But according to [http://lists.osgeo.org/pipermail/gdal-dev/2010-September/026027.html email from Even Rouault], {{{Destroy()}}} never needs to be called:
    144 
    145 {{{
    146 > I have some Python code that uses OGR geometry objects internally, creating
    147 > them like this:
    148 >
    149 > point = ogr.Geometry(ogr.wkbPoint)
    150 >
    151 > Does this code need to explicitly destroy these geometries, like the
    152 > following, to avoid leaks, or can it simply allow them to go out of scope
    153 > and have Python's reference counting and garbage collector clean them up?
    154 >
    155 > point.Destroy()
    156 
    157 There's no reason to call Destroy(), at all. Native object gets destroyed when
    158 Python object goes out of scope, or when they are assigned to None. So replace
    159 foo.Destroy() by foo = None if you really want to control when the underlying
    160 C++ object is destroyed.
    161 
    162 > I'm sorry for my ignorance here. I found a nice GDAL tutorial that seems to
    163 > say they *should* be explicitly destroyed in certain circumstances (see
    164 > http://www.gis.usu.edu/~chrisg/python/2009/lectures/ospy_slides2.pdf, page
    165 > 12). But I have not really seen any other examples of this.
    166 >
    167 
    168 Destroy() was perhaps necessary with old-gen bindings, but I'm not even sure
    169 of that... Perhaps this shouldn't have been exposed at all... But, as
    170 mentionned in the slides, it is true that there are situations where you
    171 shouldn't call Destroy() at all.
    172 }}}
    173 
    174 
    175 === Saving and closing datasets/datasources ===
    176 
    177 To save and close GDAL raster datasets or OGR vector datasources, the object needs to be dereferenced, such as setting it to {{{None}}}, a different value, or deleting the object. If there are more than one copies of the dataset or datasource object, then each copy needs to be dereferenced.
    178 
    179 For example, creating and saving a raster dataset:
    180 {{{
    181 #!python
    182 >>> from osgeo import gdal
    183 >>> driver = gdal.GetDriverByName('GTiff')
    184 >>> dst_ds = driver.Create('new.tif', 10, 15)
    185 >>> band = dst_ds.GetRasterBand(1)
    186 >>> arr = band.ReadAsArray()  # raster values are all zero
    187 >>> arr[2, 4:] = 50  # modify some data
    188 >>> band.WriteArray(arr)  # raster file still unmodified
    189 >>> band = None  # dereference band to avoid gotcha described previously
    190 >>> dst_ds = None  # save, close
    191 }}}
    192 The last dereference to the raster dataset writes the data modifications and closes the raster file. {{{WriteArray(arr)}}} does not write the array to disk, unless the GDAL block cache is full (typically 40 MB).
    193 
    194 With some drivers, raster datasets can be intermittently saved without closing using {{{FlushCache()}}}. Similarly, vector datasets can be saved using {{{SyncToDisk()}}}. However, neither of these methods guarantee that the data are written to disk, so the preferred method is to deallocate as shown above.
    195 
    196 
    197 === Exceptions raised in custom error handlers do not get caught ===
    198 If using GDAL 1.10+ the python bindings allow you to specify a python callable as an error handler (#4993). However, these error handlers appear to be called in a separate thread and any exceptions raised do not propagate back to the main thread (#5186).
    199 
    200 So if you want to [http://gis.stackexchange.com/questions/43404/how-to-detect-a-gdal-ogr-warning/68042 catch warnings as well as errors], something like this won't work:
    201 {{{
    202 def error_handler(err_level, err_no, err_msg):
    203 
    204     if err_class >= gdal.CE_Warning:
    205         raise RuntimeError(err_level, err_no, err_msg) #this exception does not propagate back to main thread!
    206 
    207 if __name__=='__main__':
    208 
    209     #Test custom error handler
    210     gdal.PushErrorHandler(error_handler)
    211     gdal.Error(gdal.CE_Warning,2,'test warning message')
    212     gdal.PopErrorHandler()
    213 }}}
    214 
    215 But you can do something like this instead:
    216 {{{
    217 
    218 class GdalErrorHandler(object):
    219     def __init__(self):
    220         self.err_level=gdal.CE_None
    221         self.err_no=0
    222         self.err_msg=''
    223 
    224     def handler(self, err_level, err_no, err_msg):
    225         self.err_level=err_level
    226         self.err_no=err_no
    227         self.err_msg=err_msg
    228 
    229 if __name__=='__main__':
    230 
    231     err=GdalErrorHandler()
    232     handler=err.handler # Note don't pass class method directly or python segfaults
    233                         # due to a reference counting bug
    234                         # http://trac.osgeo.org/gdal/ticket/5186#comment:4
    235 
    236     gdal.PushErrorHandler(handler)
    237     gdal.UseExceptions() #Exceptions will get raised on anything >= gdal.CE_Failure
    238 
    239     try:
    240         gdal.Error(gdal.CE_Warning,1,'Test warning message')
    241     except Exception as e:
    242         print('Operation raised an exception')
    243         raise e
    244     else:
    245         if err.err_level >= gdal.CE_Warning:
    246             print('Operation raised an warning')
    247             raise RuntimeError(err.err_level, err.err_no, err.err_msg)
    248     finally:
    249         gdal.PopErrorHandler()
    250 }}}
    251 
    252 = Gotchas fixed in GDAL 1.8.0 =
    253 
    254 These are bugs that were fixed or designs that were changed in GDAL 1.8.0. If you use an older version, watch out for these.
    255 
    256 === {{{gdal.ErrorReset()}}} must be called after an error occurs, or it will keep happening ===
    257 
    258 In this example, we use OGR to create a geometry object using valid WKT. Then we try some invalid WKT, which is expected to fail, and then try the valid WKT again but it fails. Other OGR functions will also fail.
    259 
    260 {{{
    261 #!python
    262 >>> from osgeo import ogr
    263 >>> ogr.UseExceptions()
    264 >>> ogr.CreateGeometryFromWkt('POINT(1 2)')          # Create a point using valid WKT
    265 <osgeo.ogr.Geometry; proxy of <Swig Object of type 'OGRGeometryShadow *' at 0x244b658> >
    266 >>> ogr.CreateGeometryFromWkt('blah blah blah')      # Now try to create one using invalid WKT
    267 Traceback (most recent call last):
    268   File "<stdin>", line 1, in <module>
    269   File "C:\Python25\lib\site-packages\osgeo\ogr.py", line 2885, in CreateGeometryFromWkt
    270     return _ogr.CreateGeometryFromWkt(*args, **kwargs)
    271 RuntimeError: OGR Error: Unsupported geometry type
    272 >>> ogr.CreateGeometryFromWkt('POINT(1 2)')          # Now try to to create one using valid WKT again
    273 Traceback (most recent call last):
    274   File "<stdin>", line 1, in <module>
    275   File "C:\Python25\lib\site-packages\osgeo\ogr.py", line 2885, in CreateGeometryFromWkt
    276     return _ogr.CreateGeometryFromWkt(*args, **kwargs)
    277 RuntimeError: OGR Error: Unsupported geometry type
    278 }}}
    279 
    280 The problem is that OGR and GDAL maintain an internal state variable that tracks whether an error occurred during the last operation, but that this variable is not automatically cleared by OGR or GDAL. You must manually clear it by calling the {{{gdal.ErrorReset()}}} function:
    281 
    282 {{{
    283 #!python
    284 >>> ogr.CreateGeometryFromWkt('blah blah blah')
    285 Traceback (most recent call last):
    286   File "<stdin>", line 1, in <module>
    287   File "C:\Python25\lib\site-packages\osgeo\ogr.py", line 2885, in CreateGeometryFromWkt
    288     return _ogr.CreateGeometryFromWkt(*args, **kwargs)
    289 RuntimeError: OGR Error: Unsupported geometry type
    290 >>> from osgeo import gdal
    291 >>> gdal.ErrorReset()
    292 >>> ogr.CreateGeometryFromWkt('POINT(1 2)')
    293 <osgeo.ogr.Geometry; proxy of <Swig Object of type 'OGRGeometryShadow *' at 0x244b7a8> >
    294 }}}
    295 
    296 This function only appears in the GDAL Python bindings, not the OGR Python bindings. Even if you are only using OGR, you must use GDAL to clear the error.
    297 
    298 This problem is ackowledged by the GDAL team as a bug. Please see #3077.
    299 
    300 = Gotchas that result from bugs or behaviors of other software =
    301 
    302 === Python crashes in GDAL functions when you upgrade or downgrade numpy ===
    303 
    304 Much of GDAL's Python bindings are implemented in C++. Much of the core of numpy is implemented in C. The C++ part of GDAL's Python bindings interacts with the C part of numpy through numpy's ABI (application binary interface). This requires GDAL's Python bindings to be compiled using numpy header files that define numpy C data structures. Those data structures sometimes change between numpy versions. When this happens, the new version of numpy is not be compatible at the binary level with the old version, and the GDAL Python bindings must be recompiled before they will work with the new verison of numpy. And when they are recompiled, they probably won't work with the old version.
    305 
    306 If you obtain a precompiled version of GDAL's Python bindings, such as the Windows packages from http://vbkto.dyndns.org/sdk/, be sure you look up what version of numpy was used to compile them, and install that version of numpy on your machine.
    307 
    308 === Python bindings cannot be used successfully from ArcGIS in-process geoprocessing tools (ArcGIS 9.3 and later) ===
    309 
    310 ArcGIS allows the creation of custom, Python-based geoprocessing tools. Until ArcGIS 10, there was no easy way to read raster data into memory. GDAL provides such a mechanism.
    311 
    312 Starting with ArcGIS 9.3, geoprocessing tools can either run in the ArcGIS process itself (!ArcCatalog.exe or !ArcMap.exe) or run in a separate python.exe worker process. Unfortunately ArcGIS contains a bug in how it runs in-process tools. Thus, if you use GDAL from an in-process tool, it will run fine the first time but after that it may fail with {{{TypeError}}} exceptions until you restart the ArcGIS process. For example, band.!ReadAsArray() fails with:
    313 
    314     {{{TypeError: in method 'BandRasterIONumPy', argument 1 of type 'GDALRasterBandShadow *'}}}
    315 
    316 This is a bug in ArcGIS. Please see #3672 for complete details and advice on workarounds.
     3This page has now been migrated to the official documentation at https://gdal.org/api/python_gotchas.html