Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#2686 closed defect (wontfix)

OGRShapeDriver::Open should not return NULL if poDS->GetLayerCount() == 0

Reported by: jjr8 Owned by: warmerdam
Priority: normal Milestone: 1.6.0
Component: OGR_SF Version: 1.5.2
Severity: normal Keywords: shapefile
Cc:

Description

The current revision of OGRShapeDriver::Open will return NULL in two circumstances:

  • It failed to open the data source.
  • It opened the data source but the layer count is zero.

From trunk/gdal/ogr/ogrsf_frmts/shape/ogrshapedriver.cpp@10645#L59:

OGRDataSource *OGRShapeDriver::Open( const char * pszFilename,
                                     int bUpdate )

{
    OGRShapeDataSource  *poDS;

    poDS = new OGRShapeDataSource();

    if( !poDS->Open( pszFilename, bUpdate, TRUE )
        || poDS->GetLayerCount() == 0 )
    {
        delete poDS;
        return NULL;
    }
    else
        return poDS;
}

IMHO this second case not good behavior. If a directory has zero shapefiles, the function should return a data source that has zero layers, rather than returning NULL. This would be consistent with the capability of creating a shapefile data source with zero layers, e.g.:

>>> from osgeo import ogr
>>> driver = ogr.GetDriverByName("ESRI Shapefile")
>>> ds = driver.CreateDataSource("C:\\Test")
>>> ds.GetLayerCount()
0
>>> ds = driver.Open("C:\\Test")    # This will return None
>>> ds.GetLayerCount()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'GetLayerCount'
>>> ds is None
True

This would be consistent with other OGR drivers, such as the GMT driver trunk/gdal/ogr/ogrsf_frmts/gmt/ogrgmtdriver.cpp@11020#L59:

OGRDataSource *OGRGmtDriver::CreateDataSource( const char * pszName,
                                               char **papszOptions )

{
    OGRGmtDataSource *poDS = new OGRGmtDataSource();

    if( poDS->Create( pszName, papszOptions ) )
        return poDS;
    else
    {
        delete poDS;
        return NULL;
    }
}

and the PGeo driver trunk/gdal/ogr/ogrsf_frmts/pgeo/ogrpgeodriver.cpp@10645#L58:

OGRDataSource *OGRPGeoDriver::Open( const char * pszFilename,
                                    int bUpdate )

{
    OGRPGeoDataSource     *poDS;

    if( !EQUALN(pszFilename,"PGEO:",5) 
        && !EQUAL(CPLGetExtension(pszFilename),"mdb") )
        return NULL;

#ifndef WIN32
    // Try to register MDB Tools driver
    //
    // ODBCINST.INI NOTE:
    // This operation requires write access to odbcinst.ini file
    // located in directory pointed by ODBCINISYS variable.
    // Usually, it points to /etc, so non-root users can overwrite this
    // setting ODBCINISYS with location they have write access to, e.g.:
    // $ export ODBCINISYS=$HOME/etc
    // $ touch $ODBCINISYS/odbcinst.ini
    //
    // See: http://www.unixodbc.org/internals.html
    //
    if ( !InstallMdbDriver() )
    {
        CPLError( CE_Failure, CPLE_AppDefined, 
                  "Unable to install MDB driver for ODBC!\n" );
        return NULL;
    }

    CPLDebug( "PGeo", "MDB Tools driver installed successfully!");

#endif /* ndef WIN32 */

    // Open data source
    poDS = new OGRPGeoDataSource();

    if( !poDS->Open( pszFilename, bUpdate, TRUE ) )
    {
        delete poDS;
        return NULL;
    }
    else
        return poDS;
}

Please change the function to return data sources with zero layers.

I opened this ticket against milestone 1.6.0 because that appears to be the next release and the fix for this is easy. I'd love to see this fixed for that release but I do not presume to know your priorities. Please prioritize as appropriate. I can probably work around this in my own code.

Change History (9)

comment:1 by Even Rouault, 15 years ago

For the record, the return of OGRPGDataSource::Open() is "return nLayers > 0
bUpdate", so that's basically the same behaviour than shapefiles, except that in update mode, the PG driver accepts to return an empty datasource.

I can't think for the moment on a good reason for the current behaviour of shapefile driver, but there's maybe one

comment:2 by jjr8, 15 years ago

Thanks for taking a look at this.

I took a look at several other database drivers for the "nLayers > 0
bUpdate" behavior implemented by PG. In the following table, "yes" means the driver appears to implement the PG behavior, and "no" means it appears to implement the PGeo behavior (i.e. Driver::Open will return a data source that has zero layers):
  • IDB: no
  • Ingres: no
  • MySql: yes
  • Oracle Spatial: no
  • PGeo: no
  • SDE: no
  • SQLite: no

So, of these, most implement what I expected: Driver::Open will return a data source that has zero layers. But MySql implements the same thing as PG.

I am not sure which is the most "correct" behavior. You guys should probably figure that out, look at all the drivers and make them consistent, and update the API docs. That might be a big job :-) For now, I'd be happy if you just fixed the shapefile and PGeo drivers, since those are what I work with the most.

Thanks again!

comment:3 by warmerdam, 15 years ago

Resolution: wontfix
Status: newclosed

Shapefile datasets are usually directories. If we allow OGRShapeDriver::Open() to succeed on a directory with no shapefiles in it how will we allow any other driver that is directory oriented to work? I just don't see how we can know the directory is for shapefiles if there aren't any in it.

I'm going to close this ticket now, but feel free to reopen it if a rationale can be provided.

We should be quite careful about changes in this regard for 1.6.

comment:4 by warmerdam, 15 years ago

Keywords: shapefile added

comment:5 by jjr8, 15 years ago

Resolution: wontfix
Status: closedreopened

Thanks for responding. I'm reopening it but I expect that you will resolve it wontfix again.

Why would OGRShapeDriver::Open() succeeding on a directory with no shapefiles affect other drivers? Is it because of the ability of OGRSFDriverRegistrar::Open() to determine the correct driver for a given directory? As it stands, OGRSFDriverRegistrar::Open() works reliably only when the directory contains a single class of files. If it contains multiple classes of files (e.g. shapefiles and CSVs), it depends on which driver is attempted first. And it might be worse than that: the CSV driver, for example, works like this:

/* -------------------------------------------------------------------- */ 
/*      We presume that this is indeed intended to be a CSV             */ 
/*      datasource if over half the files were .csv files.              */ 
/* -------------------------------------------------------------------- */ 
    return bForceOpen || nNotCSVCount < nLayers; 
} 

Even if OGRSFDriverRegistrar::Open() exhibits unpredictable behavior when the directory contains multiple types of files, I can see how you'd want it to succeed with the directory contains only a single type of file. For this reason, I am betting you Want to stick with the current behavior.

What about this design:

  • For all drivers, OGRDataSource::Open() succeeds when the data source exists, even when it contains no layers. This is how most database drivers appear to work.
  • For all drivers, OGRDriver::Open() succeeds when OGRDataSource::Open() succeeds.
  • When OGRSFDriverRegistrar::Open() loops through the driver list and finds a driver for which OGRDriver::Open() succeeds, it checks the layer count. If the layer count > 0, it returns. If not, it proceeds to the next driver. If it reaches the end of the driver list without finding a driver with layers, it returns the first driver that had zero layers (if one was found).

Many drivers already enumerate layers in OGRDataSource::Open() already, so I don't think there is a huge new perf hit, except in the case of opening empty data sources. In this case, OGRSFDriverRegistrar::Open() will loop through all the drivers; today it does not. But that should be a corner case.

I expect this design constitutes too much of a change for 1.6.0, assuming you like it at all, so I expect you'll resolve as won't fix. But I urge you to consider a design like this in the future.

comment:6 by warmerdam, 15 years ago

Resolution: wontfix
Status: reopenedclosed

I'm not sure I see the need for the extra proposed complication. If we really need a way of creating a shapefile datasource, and closing it without creating any layers then I think it would be cleanest to modify the create method to create some sort of marker file in the directory to indicate it is intended to be a shapefile datasource. I prefer this to the two pass solution. One reason I don't like the two pass solution is that even if the directory contains no other files I don't think it ought to be opened successfully as a shapefile datasource. And there is reason to believe there could still be several formats which an empty directory could be treated as (shapefile, csv, etc).

Thanks for your ideas!

comment:7 by jjr8, 15 years ago

Ok, fair enough.

For the record:

It is interesting that you don't think an empty directory ought to be opened successfully as a shapefile datasource. It is certainly possible to create a shapefile datasource and then close it without creating any shapefiles. It is interesting that you think this directory should then not be openable. I think it should. I guess I just have a different view of how OGR should work (of course I concede to you guys as the original designers).

I imagined that the ability to enumerate the layers of a given type in a given data source would be a core scenario in OGR, even when no layers exist. As it stands, this scenario is not fully supported. The level of support depends on the driver. From my review of a few drivers, it appears that:

  • The scenario is supported for most database-oriented drivers
  • But for some of these, the OGR caller cannot differentiate between databases that do not exist and databases that contain no layers, (at least via the Python bindings)
  • The scenario is supported for most directory-oriented drivers, so long as the directory contains files recognized by just one driver
  • Again, for some of these, the OGR caller cannot differentiate between directories that do not exist and directories that do not contain any layers
  • If the directory contains files recognized by two drivers, or additional files not recognized by any driver, the results are unpredictable

in reply to:  7 ; comment:8 by warmerdam, 15 years ago

Replying to jjr8:

It is interesting that you don't think an empty directory ought to be opened successfully as a shapefile datasource. It is certainly possible to create a shapefile datasource and then close it without creating any shapefiles.

My suggestion - if we care enough - is for the shapefile driver to create a special marker file in the directory indicating it is a "shapefile directory". And use this to decide whether an otherwise empty directory should be treated as a shapefile datasource.

It is interesting that you think this directory should then not be openable. I think it should. I guess I just have a different view of how OGR should work (of course I concede to you guys as the original designers).

My concern is that any empty directory would be treated as a shapefile datasource. I think this is inappropriate.

I imagined that the ability to enumerate the layers of a given type in a given data source would be a core scenario in OGR, even when no layers exist. As it stands, this scenario is not fully supported. The level of support depends on the driver. From my review of a few drivers, it appears that:

  • The scenario is supported for most database-oriented drivers
  • But for some of these, the OGR caller cannot differentiate between databases that do not exist and databases that contain no layers, (at least via the Python bindings)

A failure to connect to the database should result in an error. If that is not the case then it likely ought to be corrected.

  • The scenario is supported for most directory-oriented drivers, so long as the directory contains files recognized by just one driver
  • Again, for some of these, the OGR caller cannot differentiate between directories that do not exist and directories that do not contain any layers

True, though I would claim there is no reason to think of an empty directory as being a datasource and if you do it could be of several different types (shapefile, csv, etc)

  • If the directory contains files recognized by two drivers, or additional files not recognized by any driver, the results are unpredictable

True. Here we try to muddle along with a best guess sort of logic and are somewhat subject to the order in which the drivers are called.

in reply to:  8 comment:9 by jjr8, 15 years ago

Replying to warmerdam:

Just a minor follow-up re:

A failure to connect to the database should result in an error. If that is not the case then it likely ought to be corrected.

Yes, it does result in an error, at least with the driver I'm currently working with (PGeo):

>>> from GeoEco.AssimilatedModules.osgeo import ogr
>>> driver = ogr.GetDriverByName('PGeo')
>>> driver.Open('C:\\foo.mdb')
ERROR 1: Unable to initialize ODBC connection to DSN for DRIVER=Microsoft Access Driver (*.mdb);DBQ=C:\foo.mdb,
[Microsoft][ODBC Microsoft Access Driver] Could not find file '(unknown)'.
>>>

The issue is that you cannot detect it with the Python bindings, due to the problem discussed in #2688. All the Python caller has is a None returned by driver.Open. But you indicated that #2688 should be fixed, so this specific problem will eventually go away.

Note: See TracTickets for help on using tickets.