Opened 6 years ago
Closed 6 years ago
#7234 closed defect (fixed)
optimize Elasticsearch
Reported by: tomkralidis
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: unspecified
Severity: normal
Keywords: Elasticsearch
Cc:
Description
Update the ES driver so that it does not scan all indices when the caller only needs a single index.
From IRC freenode/gdal 2018-02-12:
14:04:51 EvenR: tomkralidis: this might be the time to establish the layer schema by analyzing the first documents. By default 100 documents are fetched. Try adding -oo FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN=1 to limit to one single document if the schema is the same for all documents
14:07:39 tomkralidis: EvenR: thanks for the info. Same result. fwiw schema is identical for all documents
14:08:30 EvenR: tomkralidis: try CPL_TIMESTAMP=ON CPL_DEBUG=ON CPL_CURL_VERBOSE=ON ogrinfo ...
14:09:46 tomkralidis: what should I be looking for, here?
14:10:14 tomkralidis: looks like a bunch of requests back/forth
14:11:31 EvenR: can you paste the output so I have a look ?
14:29:42 tomkralidis: EvenR: https://bpaste.net/show/dd4a8a01e06b
14:29:43 sigq: Title: show at bpaste (at bpaste.net)
14:32:48 EvenR: tomkralidis: which GDAL version is it ?
14:35:18 tomkralidis: 2.2.2
14:38:46 EvenR: ok, with trunk, we now have millisecond accurate timestamps. Anyway what I can see is that the driver doesn't do "lazy loading" of layer definition, so even if ogrinfo is passed a single layer, at dataset opening time, it will establish the layer definitions of all indices. And if FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN=1 is well taken into account on the client side, it could be more optimally be taken into account by decreasing the batch
14:38:47 EvenR: size asked to the server
14:39:31 tomkralidis: I can test against trunk also if that helps
14:39:56 EvenR: that won't be that useful. the driver hasn't changed
14:41:01 tomkralidis: our use case is serving via MapServer so I think the single layer use case is a common one?
14:41:29 tomkralidis: if a single layer is passed shouldn't GDAL/OGR _not_ scan other indices?
14:42:41 EvenR: that's an optimization done indeed in some drivers. Either by adding an extra argument in the connection string to mention the layer(s) of interest. Either by implement lazy resolving of layer definitions. Not done in ES driver yet
14:43:29 tomkralidis: bug? what is the value in scanning other indices?
14:43:30 EvenR: with a CLOSE_CONNECTION=DEFER strategy in the mapfile, the opening time should be less of an issue
14:44:45 EvenR: it is just that the current "naive" implementation is the most straightforward. When you do "ogrinfo datasource layername", basically this translates to ds = ogr.Open(datasource); lyr = ds.GetLayerByName(layername), so the driver has no idea initially of which layers will be requested
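For illustration, the difference between the "naive" open-time resolution and the lazy resolution discussed in the log can be sketched in plain Python. This is only a toy model, not the driver's code: `EagerDataset`, `LazyDataset`, `make_counting_fetcher` and the index names are all hypothetical, and the fetcher stands in for whatever per-index requests the driver sends to establish a layer definition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the per-index schema request; NOT a GDAL API.
SchemaFetcher = Callable[[str], List[str]]


class EagerDataset:
    """Resolves every layer definition at open time (the behaviour described in the log)."""

    def __init__(self, index_names: List[str], fetch_schema: SchemaFetcher) -> None:
        # One schema request per index, whether or not the layer is ever used.
        self._defns: Dict[str, List[str]] = {
            name: fetch_schema(name) for name in index_names
        }

    def get_layer_by_name(self, name: str) -> List[str]:
        return self._defns[name]


class LazyDataset:
    """Only records index names at open time; resolves a definition on first request."""

    def __init__(self, index_names: List[str], fetch_schema: SchemaFetcher) -> None:
        self._names = list(index_names)
        self._fetch_schema = fetch_schema
        self._defns: Dict[str, List[str]] = {}

    def get_layer_by_name(self, name: str) -> List[str]:
        if name not in self._names:
            raise KeyError(name)
        if name not in self._defns:
            # First access: fetch and cache this one layer's definition.
            self._defns[name] = self._fetch_schema(name)
        return self._defns[name]


def make_counting_fetcher() -> Tuple[SchemaFetcher, List[str]]:
    """Returns a fake fetcher plus the list of index names it was called with."""
    calls: List[str] = []

    def fetch(name: str) -> List[str]:
        calls.append(name)
        return ["id", "geometry"]  # made-up field names

    return fetch, calls


if __name__ == "__main__":
    indices = ["roads", "rivers", "cities"]  # made-up index names

    fetch, calls = make_counting_fetcher()
    EagerDataset(indices, fetch).get_layer_by_name("roads")
    print("eager schema requests:", len(calls))

    fetch, calls = make_counting_fetcher()
    LazyDataset(indices, fetch).get_layer_by_name("roads")
    print("lazy schema requests:", len(calls))
```

In this model the eager variant issues one schema request per index before the caller has named a layer, while the lazy variant issues a request only for the layer actually asked for, which is the single-layer MapServer case raised above.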
Implemented per https://github.com/OSGeo/gdal/commit/fe55a01bed92b4fe751c5d5f9ea7a5bd1addc95d (GDAL 2.4 dev)