Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#4953 closed defect (fixed)

WFS Paging Ignored

Reported by: warmerdam Owned by: Even Rouault
Priority: normal Milestone: 1.10.0
Component: OGR_SF Version: svn-trunk
Severity: normal Keywords: wfs
Cc:

Description

I have tried something like:

  ogrinfo --config OGR_WFS_PAGING_ALLOWED ON \
          --debug off http://xxx/ -al -so

However, when I inspect the WFS GetFeature requests issued to complete GetFeatureCount, they do not include the paging (MAXFEATURES) item.

Reviewing the code, MakeGetFeatureURL() does not appear to use paging when nMaxFeatures is zero. That value seems to be what higher-level code passes for any unconstrained request, including regular GetNextFeature() requests.

After some brief hacking, this change seems to give the behavior I expected. However, I'm having trouble understanding the intent of some other parts of the code, so I'm not confident it is the right thing to do.

Index: ogr/ogrsf_frmts/wfs/ogrwfslayer.cpp
===================================================================
--- ogr/ogrsf_frmts/wfs/ogrwfslayer.cpp	(revision 25480)
+++ ogr/ogrsf_frmts/wfs/ogrwfslayer.cpp	(working copy)
@@ -390,7 +390,8 @@
                 nFeatures = ExecuteGetFeatureResultTypeHits();
             }
         }
-        if (nFeatures >= poDS->GetPageSize())
+        if (nFeatures >= poDS->GetPageSize() 
+            || (poDS->GetPageSize() != 0 && nMaxFeatures == 0))
         {
             osURL = CPLURLAddKVP(osURL, "STARTINDEX",
                 CPLSPrintf("%d", nPagingStartIndex +

Change History (5)

comment:1 by Even Rouault, 11 years ago

Milestone: 1.10.0
Resolution: fixed
Status: new → closed

I've completely removed the test. It was a useless attempt at optimizing by not issuing STARTINDEX=x&MAXFEATURES=y when it wasn't necessary.

trunk r25516 "WFS: honour paging when running GetFeatureCount() and that RESULTTYPE=HITS isn't available (e.g. WFS 1.0.0) (#4953)"

comment:2 by Jukka Rahkonen, 11 years ago

I have said this before, and I do understand that GDAL wants to treat WFS just like any other data source, but even though this request may look light and simple, it is actually a very heavy request for the WFS server.

ogrinfo wfs:http://xxx/ -al -so

I checked my WFS server logs, and this command leads to two GetFeature requests for each feature type in the service:

[QUERY] SERVICE=WFS&VERSION=1.1.0&REQUEST=GetFeature&TYPENAME=tows:osm_point&RESULTTYPE=hits
[QUERY] SERVICE=WFS&VERSION=1.1.0&REQUEST=GetFeature&TYPENAME=tows:osm_point

I have 106 feature types in my service, many of them loaded with attribute data and containing more than 100000 features, which is the server-side maxFeatures limit. That makes 212 GetFeature requests, 106 of them with resulttype=results, and the total number of features sent to ogrinfo must be several million. That is rather a lot for a simple "all layers, summary only" request.

comment:3 by warmerdam, 11 years ago

I'm not sure this ticket is the best place to discuss this topic, but a few things occur to me.

1) There may be cheaper ways to get some of the required information (schema, feature count and bounds).

2) Perhaps we should use a model like WCS where there is an XML service file in which much of this information is cached between sessions instead of always starting blind.

3) It would be nice to be able to indicate the layer(s) of interest in the datasource name, to restrict access and avoid having to build detailed information about all of them.

Possibly some of the above already exists and I'm not aware of it. I have only a very thin knowledge of the OGR WFS driver.

comment:4 by Even Rouault, 11 years ago

1) Schema and feature count are normally relatively cheap to obtain (DescribeFeatureType, and GetFeature&resulttype=hits for WFS 1.1). Bounds are expensive. They can be obtained from the GetCapabilities, but they are in WGS84. When the default SRS is not WGS84, we could reproject the WGS84 bounds into that SRS, but the result would not be accurate; hence the current full analysis of the GetFeature result. Furthermore, I've noticed quite a few times that the bounds returned by servers weren't really reliable. Perhaps a config option could be used to say "trust the advertised bounds and reproject them if needed so that GetExtent() is fast"?

2) This already exists and is indeed very similar to the WCS model. See the following paragraph of the documentation:

 It is also possible to specify the name of an XML file whose content matches the following syntax (the <OGRWFSDataSource> element must be the first bytes of the file):

<OGRWFSDataSource>
    <URL>http://path/to/WFS/service[?OPTIONAL_PARAMETER1=VALUE[&amp;OPTIONAL_PARAMETER2=VALUE]]</URL>
</OGRWFSDataSource>

Note: the URL must be XML-escaped, for example the & character must be written as &amp;

At the first opening, the content of the result of the GetCapabilities request will be appended to the file, so that it can be cached for later openings of the dataset. The same applies for the DescribeFeatureType request issued to discover the field definition of each layer.

3) This is possible by appending "&TYPENAME=the_layer_name" to the URL, and/or by editing the XML service file to remove the unwanted layers from the cached GetCapabilities.

comment:5 by Jukka Rahkonen, 11 years ago

My point is that while "ogrinfo -al -so" is the command I automatically type when I want to check a shapefile, with the current behavior it is too heavy a tool for checking big WFS services. It is also inevitably slow with big layers. From the service admin's point of view, it would be good to make it a bit harder for users to fire off a full scan of the whole service. "Summary only" should try to be kind to the server unless it is used with a "detailed_summary=true" option.

1) My experience is that WFS feature type bounds in GetCapabilities are more often wrong than exactly right. However, it is not always necessary to know the exact bounds, and "trust advertised" might be the default. In some cases the advertised bounds can even be more correct than what the "analyze GetFeature" method obtains. If the server does not support paging and has MaxFeatures set on the server side, GetFeature reads only the first n features, and the features in the datastore behind the WFS are very often sorted spatially, so the computed extent covers only part of the layer.

3) I guess you think there is some advantage for programmers in having the selected layers in the datasource name instead of using the "datasource layer [layer]" syntax. For me as a user, once I know which layer names to use, it is about as easy to type

ogrinfo wfs:http://xxx tows:france lv:municipalities -so
