Opened 8 years ago

Closed 8 years ago

#6541 closed task (fixed)

Explanation of what is going on in ogrgeojsonlayer.cpp

Reported by: Kurt Schwehr Owned by: Even Rouault
Priority: normal Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords: FID
Cc:

Description (last modified by Kurt Schwehr)

Hi Even,

Another Google engineer and I were taking a look at AddFeature and could use some explanation of what the code is doing. Can you help us better understand this code and how FIDs work? Maybe we can add some comments to several places in code based on this ticket.

This came up because of r34333 and r34330.

void OGRGeoJSONLayer::AddFeature( OGRFeature* poFeature )
{
    GIntBig nFID = poFeature->GetFID();

    // Detect potential FID duplicates and make sure they are eventually
    // unique.
    if( -1 == nFID )
    {
        nFID = GetFeatureCount(FALSE);
        OGRFeature* poTryFeature = NULL;
        while( (poTryFeature = GetFeature(nFID) ) != NULL )
        {
            nFID++;
            delete poTryFeature;
        }

GetFeature() returns a copy of the feature that we must delete, yes? Looking at OGRMemLayer::GetNextFeature() I see a return poFeature->Clone().

I have no idea why this block does what the comment says. It is looking for the first available FID that will be SetFID below?

Here's how he interpret it:

  • start with "nFID <-- feature count", i.e., if there is an imaginary list of features, nFID should now be the first empty position after the end of this list.
  • start a while loop with the first thing: ask for the feature at nFID
    • wait what? That must be an invalid position right after the list-of-features end
    • OK, so maybe there is a left over feature past the end of the list, and GetFeature does not care that it is asked elements past GetFeatureCount
      • hm, maybe this makes sense, seems like we proceed with nFID++ and delete every non-null element we get
  • so maybe this while loop is to clean up left-over features past GetFeatureCount?
  • But these pointers .. do they ever get nullified after they get deleted? If we ask GetFeature(GetFeatureCount()) again, it will return non-NULL value again, right?
    }
    else
    {
        OGRFeature* poTryFeature = NULL;
        if( (poTryFeature = GetFeature(nFID) ) != NULL )
        {
            if( !bOriginalIdModified_ )
            {
                CPLError(
                    CE_Warning, CPLE_AppDefined,
                    "Several features with id = " CPL_FRMT_GIB " have been "
                    "found. Altering it to be unique. This warning will not "
                    "be emitted for this layer",
                    nFID );
                bOriginalIdModified_ = true;
            }
            delete poTryFeature;
            nFID = GetFeatureCount(FALSE);
            while( (poTryFeature = GetFeature(nFID) ) != NULL )

He has some questions here:

oops, .. do we expect nFID here to be different from the nFID we have from the initial poTryFeature = GetFeature(nFID)?

If it happens to be the same, then poTryFeature = GetFeature(nFID) will return the same pointer we just deleted in line 144, no? and guess what? it won't be == NULL, so we will try to delete it again?

            {
                nFID++;
                delete poTryFeature;
            }
        }
    }
    poFeature->SetFID( nFID );

    if( !CPL_INT64_FITS_ON_INT32(nFID) )
        SetMetadataItem(OLMD_FID64, "YES");

    SetUpdatable( true );  // Temporary toggle on updatable flag.
    CPL_IGNORE_RET_VAL(OGRMemLayer::SetFeature(poFeature));
    SetUpdatable( poDS_->IsUpdatable() );
    SetUpdated( false );
}

How do FID's work and where is it documented? I don't see any explanation in ogr_feature.h/ogrfeature.cpp or ogrsf_frmts.h, which seem to be the roots of FID definition. Or in ogr_api.h (which seems a less logical location being the indirect C layer). Or these

The best I found was from SetFID() in ogrfeature.cpp.

/**
 * For specific types of features this operation may fail on illegal
 * features ids.  Generally it always succeeds.  Feature ids should be
 * greater than or equal to zero, with the exception of OGRNullFID (-1)
 * indicating that the feature id is unknown.
 */

Change History (3)

comment:1 by Kurt Schwehr, 8 years ago

Description: modified (diff)

comment:2 by Even Rouault, 8 years ago

Hum, I can see the confusion this code is causing in your brain, but I'm afraid I'm not going to be able to respond to each point ;-). I'll try to make it synthetic. In a layer, each feature is supposed to have a unique FID (Feature ID). For some datasources, this is just a serial number attributed by OGR (0,1,... or 1,2,... depending on drivers). For other datasources, the FID is stored in the dataset itself (integer primary key of a RDBMS). For GeoJSON, both situation can happen. A GeoJSON feature can have a "id" member, or not. Hence the main test if ( -1 == nFID ). Since GDAL 2.1, the GeoJSON layer relies on the Memory layer, which uses the FID has the key to store the features. If you call OGRMemLayer::SetFeature() with a feature that has the same FID than a feature already inserted with this FID, the last one will replace the previous one. The logic here tries to detect if that's going to happen and avoids that by finding a free FID. This is to accomodate for incorrect GeoJSON files where the "id" wouldn't be correctly set and there would be duplicates (this shouldn't happen for correct GeoJSON files). In the ( =1 == nFID) case using the feature count as the FID should generally succeed (if all features have no FID set). The detection of existing FID is just to take into account the case where some features would have a "id" field and others not. Hope that makes things clearer.

comment:3 by Kurt Schwehr, 8 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.