Opened 6 years ago

Closed 6 years ago

#7234 closed defect (fixed)

optimize Elasticsearch

Reported by: tomkralidis
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: unspecified
Severity: normal
Keywords: Elasticsearch
Cc:

Description

Update the Elasticsearch (ES) driver so that it does not scan all indices when the caller only needs a single index.

From IRC freenode/gdal 2018-02-12:

14:04:51	EvenR:	tomkralidis: this might be the time to establish the layer schema by analyzing the first documents. By default 100 documents are fetched. Try adding -oo FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN=1 to limit to one single document if the schema is the same for all documents
14:07:39	tomkralidis:	EvenR: thanks for the info. Same result. fwiw schema is identical for all documents
14:08:30	EvenR:	tomkralidis: try CPL_TIMESTAMP=ON CPL_DEBUG=ON CPL_CURL_VERBOSE=ON ogrinfo ...
14:09:46	tomkralidis:	what should I be looking for, here?
14:10:14	tomkralidis:	looks like a bunch of requests back/forth
14:11:31	EvenR:	can you paste the output so I have a look ?
14:29:42	tomkralidis:	EvenR: https://bpaste.net/show/dd4a8a01e06b
14:32:48	EvenR:	tomkralidis: which GDAL version is it ?
14:35:18	tomkralidis:	2.2.2
14:38:46	EvenR:	ok, with trunk, we now have millisecond accurate timestamps. Anyway what I can see is that the driver doesn't do "lazy loading" of layer definition, so even if ogrinfo is passed a single layer, at dataset opening time, it will establish the layer definitions of all indices. And if FEATURE_COUNT_TO_ESTABLISH_FEATURE_DEFN=1 is well taken into account on the client side, it could more optimally be taken into account by decreasing the batch size asked to the server
14:39:31	tomkralidis:	I can test against trunk also if that helps
14:39:56	EvenR:	that won't be that useful. the driver hasn't changed
14:41:01	tomkralidis:	our use case is serving via MapServer so I think the single layer use case is a common one?
14:41:29	tomkralidis:	if a single layer is passed shouldn't GDAL/OGR _not_ scan other indices?
14:42:41	EvenR:	that's an optimization done indeed in some drivers. Either by adding an extra argument in the connection string to mention the layer(s) of interest, or by implementing lazy resolving of layer definitions. Not done in ES driver yet
14:43:29	tomkralidis:	bug? what is the value in scanning other indices?
14:43:30	EvenR:	with a CLOSE_CONNECTION=DEFER strategy in the mapfile, the opening time should be less of an issue
14:44:45	EvenR:	it is just that the current "naive" implementation is the most straightforward. When you do "ogrinfo datasource layername", basically this translates to ds = ogr.Open(datasource); lyr = ds.GetLayerByName(layername), so the driver has no idea initially of which layers will be requested
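
The difference between the "naive" behaviour and the "lazy resolving of layer definitions" discussed in the log can be illustrated with a small sketch. This is not the GDAL driver code: `FakeServer`, `EagerDataset`, `LazyDataset`, `schema_requests` and the index names are all hypothetical, and the fake server only counts how many schema lookups each strategy would trigger.

```python
class FakeServer:
    """Hypothetical stand-in for an Elasticsearch server; counts schema lookups."""

    def __init__(self, indices):
        self._indices = dict(indices)
        self.schema_requests = 0

    def list_indices(self):
        # Cheap call: returns index names only, no schema analysis.
        return list(self._indices)

    def fetch_schema(self, index):
        # Expensive call: stands in for fetching documents to infer the schema.
        self.schema_requests += 1
        return self._indices[index]


class EagerDataset:
    """Naive strategy: resolve every layer definition when the dataset is opened."""

    def __init__(self, server):
        self._layers = {
            name: server.fetch_schema(name) for name in server.list_indices()
        }

    def get_layer_by_name(self, name):
        return self._layers.get(name)


class LazyDataset:
    """Lazy strategy: resolve a layer definition only when it is first requested."""

    def __init__(self, server):
        self._server = server
        self._names = set(server.list_indices())
        self._layers = {}

    def get_layer_by_name(self, name):
        if name not in self._names:
            return None
        if name not in self._layers:
            self._layers[name] = self._server.fetch_schema(name)
        return self._layers[name]


# Hypothetical indices and fields, for illustration only.
INDICES = {
    "roads": ["name", "geometry"],
    "rivers": ["name", "length", "geometry"],
    "cities": ["name", "population", "geometry"],
}


def schema_requests(dataset_cls, layer_name):
    """Open a dataset, request one layer, and report the schema lookups made."""
    server = FakeServer(INDICES)
    ds = dataset_cls(server)
    ds.get_layer_by_name(layer_name)
    return server.schema_requests
```

With three indices, calling `schema_requests(EagerDataset, "roads")` performs one lookup per index, whereas `schema_requests(LazyDataset, "roads")` performs a single lookup for the requested layer. The other approach mentioned in the log, naming the layer(s) of interest in the connection string, would reach the same result by restricting the list of indices up front.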

Change History (1)

comment:1 by Even Rouault, 6 years ago

Resolution: fixed
Status: new → closed