Ticket #3453 (closed enhancement: worksforme)

Opened 2 years ago

Last modified 2 years ago

Enable "Use estimated table metadata" by default

Reported by: arencambre Owned by: nobody
Priority: major: does not work as expected Milestone: Version 1.7.0
Component: Data Provider Version: 1.6.0
Keywords: Cc: aren@…
Platform Version: 7 Platform: Windows
Must Fix for Release: No Awaiting user input: no

Description

Postgis layer loads start out with this query:

select distinct case when geometrytype("the_geom") IN ('POINT','MULTIPOINT') THEN 'POINT' when geometrytype("the_geom") IN ('LINESTRING','MULTILINESTRING') THEN 'LINESTRING' when geometrytype("the_geom") IN ('POLYGON','MULTIPOLYGON') THEN 'POLYGON' end from [databaseGoesHere]

I think this is needlessly expensive for my 12,000,000 row Postgis dataset for two reasons:

  • DISTINCT isn't necessary because my dataset has no duplicates even though some rows may have identical locations. Do GIS datasets routinely have duplicate data? At least let me override DISTINCT before loading large datasets. I am now worried that other datasets may have had objects missed that had identical locations.
  • Not sure why each geometry object is being checked for geometry type. QGIS knows the geometry type before the Postgis layer is even loaded thanks to the geometry_columns table.

Change History

in reply to: ↑ description   Changed 2 years ago by jef

Isn't that your view? views usually don't have an entry in geometry_columns. Additionally the query you mention should only run, when the geometry type of the layer is not known beforehand (ie. GEOMETRY instead of a specific type). And with large datasets you probably want to enable "use estimated data", which should limit the query to deduce the geometry type to the first 100 rows.

follow-up: ↓ 3   Changed 2 years ago by arencambre

  • platform changed from Debian to Windows
  • summary changed from Postgis layer load starts out with unnecessarily expensive query to "Add Postgis Tables(s)" dialog causes expensive query

Thanks. Further analysis yields a correction: The query happens when the Add Postgis Tables(s) dialog appears after clicking Add PostGIS Layer. While I'm waiting for this interminable query to end, the dialog's Type column says Waiting for most the candidate layers.

And yes, it is issuing that query against the view.

If the query could be limited to a reasonable timeframe (LIMIT 100?), that would be a satisfactory resolution to this issue.

The query doesn't stop if I close the dialog. QGIS hangs until I manually cancel the query through PGAdmin III's Server Status utility.

in reply to: ↑ 2   Changed 2 years ago by jef

Replying to arencambre:

If the query could be limited to a reasonable timeframe (LIMIT 100?), that would be a satisfactory resolution to this issue.

Like when checking "Use estimated table metadata" in the postgis connection settings?

follow-up: ↓ 5   Changed 2 years ago by arencambre

OK, that fixed it, but is there any reason for this not to be enabled by default, or use some kind of autosensing to make it happen for very large tables? Again, is it likely for views to contain different geometry types?

in reply to: ↑ 4   Changed 2 years ago by jef

  • status changed from new to closed
  • resolution set to fixed

Replying to arencambre:

OK, that fixed it, but is there any reason for this not to be enabled by default, or use some kind of autosensing to make it happen for very large tables? Again, is it likely for views to contain different geometry types?

It's possible. And autosensing is what the query is about.

follow-up: ↓ 7   Changed 2 years ago by arencambre

  • status changed from closed to reopened
  • type changed from bug to enhancement
  • resolution fixed deleted
  • summary changed from "Add Postgis Tables(s)" dialog causes expensive query to Enable "Use estimated table metadata" by default

Sorry, meant autosensing for very long queries. But that might get complex.

Better yet, is there any reason why Use estimated table metadata shouldn't be enabled by default? If it's likely that views will have a single geometry type in a geometry column, then this expensive query has a real cost and possibly no benefit for most users.

in reply to: ↑ 6   Changed 2 years ago by jef

  • status changed from reopened to closed
  • resolution set to worksforme

Replying to arencambre:

Better yet, is there any reason why Use estimated table metadata shouldn't be enabled by default? If it's likely that views will have a single geometry type in a geometry column, then this expensive query has a real cost and possibly no benefit for most users.

Well, it produces unreliable results - we need unique geometry types and the estimations don't assure this.

Judging from the user comments so far large datasets and views are uncommon. So we don't know about save assumptions for views.

I consider this an edge case and like to keep the defaults as is.

  Changed 2 years ago by arencambre

That's fine. Thanks for being willing to discuss.

Note: See TracTickets for help on using tickets.