Ticket #1669 (closed defect: wontfix)

Opened 16 months ago

Last modified 3 months ago

Missing street type w/ Internal component makes geocode() so SLOOOW

Reported by: mikepease
Owned by: robe
Priority: medium
Milestone: PostGIS 2.1.0
Component: tiger geocoder
Version: trunk
Keywords:
Cc: woodbri

Description

Certain kinds of address strings appear to make the geocode() function run 100 to 1,000 times slower than it does on "normal" addresses.

Consider these variations on
651 Nicollet Ave FL 4, Minneapolis, MN 55402:

select (addy).*, rating from geocode('651 Nicollet Ave FL 4, Minneapolis, MN 55402'); -- Fast

select (addy).*, rating from geocode('651 Nicollet FL 4, Minneapolis, MN 55402'); -- SLOW! 60+ sec

select (addy).*, rating from geocode('651 Nicollet, FL 4, Minneapolis, MN 55402'); -- SLOW! 60+ sec

select (addy).*, rating from geocode('651 Nicollet, Minneapolis, MN 55402'); -- Fast

From what I can surmise, it seems that if you have an address that DOESN'T specify the street type but DOES specify an internal component, then the big slow-down happens.
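
One way to probe this hypothesis is to look at how the tiger geocoder's normalize_address() function splits each variant into street name, street type, and internal component. The query below is only a diagnostic sketch: the labels are made up for illustration, and the ticket does not say what the parser actually returns for these strings.

```sql
-- Diagnostic sketch: compare how each address variant is parsed.
-- The labels in the first column are illustrative, not from the ticket.
SELECT v.label, (normalize_address(v.address)).*
FROM (VALUES
    ('street type + internal', '651 Nicollet Ave FL 4, Minneapolis, MN 55402'),
    ('internal, no type',      '651 Nicollet FL 4, Minneapolis, MN 55402'),
    ('internal after comma',   '651 Nicollet, FL 4, Minneapolis, MN 55402'),
    ('no type, no internal',   '651 Nicollet, Minneapolis, MN 55402')
) AS v(label, address);
```

If the slow variants show "FL 4" absorbed into the street name or location rather than the internal field, that would be consistent with the pattern described above.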

When I run a list of several thousand addresses of medium quality, I run into dozens or more of these slow addresses and it ends up taking a majority of the time to run through these addresses.

Say I have a list of 10,000 medium-quality addresses. Maybe 9,950 of them will run fine, but just 50 of them go slow. This tiny minority ends up taking the vast majority of the batch time.

9,950 x 0.1 sec = ~17 min. 50 x 90 sec = 75 min. Total ~92 min.
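
The same estimate can be reproduced in SQL, using only the figures quoted above (9,950 addresses at about 0.1 sec each and 50 at about 90 sec each). The column names are illustrative.

```sql
-- Batch-time estimate from the ticket's own figures.
SELECT round(9950 * 0.1 / 60, 1)                           AS fast_minutes,    -- 16.6
       round(50 * 90 / 60.0, 1)                            AS slow_minutes,    -- 75.0
       round((9950 * 0.1 + 50 * 90) / 60, 1)               AS total_minutes,   -- 91.6
       round(100.0 * 50 * 90 / (9950 * 0.1 + 50 * 90), 1)  AS slow_share_pct;  -- 81.9
```

On these numbers, 0.5% of the addresses account for roughly 82% of the run time.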

Given the effect this has on running through a list, I think it's important to find a fix for this.

Change History

Changed 16 months ago by robe

  • owner changed from pramsey to robe
  • component changed from postgis to tiger geocoder

Changed 13 months ago by robe

  • milestone changed from PostGIS 2.0.1 to PostGIS 2.1.0

Changed 9 months ago by robe

  • version changed from 1.5.X to trunk

I can probably put a timeout on these; I'll experiment with that. I recall trying it before, but unfortunately I think a timeout might roll back a whole batch process, which isn't ideal.
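
As a rough illustration of the trade-off (not the approach taken in the geocoder itself), a client could cap each call with PostgreSQL's statement_timeout setting and wrap each address in a savepoint, so that a canceled call discards only that address rather than the whole batch. The 5-second budget and the savepoint name are assumptions chosen for the example.

```sql
-- Sketch only: per-address timeout driven from the client side.
SET statement_timeout = '5s';  -- assumed budget; applies to each statement the client sends

BEGIN;

SAVEPOINT before_address;
SELECT (addy).*, rating
FROM geocode('651 Nicollet FL 4, Minneapolis, MN 55402');
-- If the SELECT is canceled by the timeout, the client would instead issue
--   ROLLBACK TO SAVEPOINT before_address;
-- which undoes only this address and keeps earlier work in the transaction.
RELEASE SAVEPOINT before_address;

-- ... repeat SAVEPOINT / SELECT / RELEASE for each remaining address ...

COMMIT;

RESET statement_timeout;
```

The decision to roll back to the savepoint has to be made by the client program, since plain SQL has no way to branch on the error.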

Changed 8 months ago by woodbri

  • cc woodbri added

Changed 6 months ago by woodbri

Regina,

Using the PAGC tools that I wrapped into PostgreSQL, I can handle all of these: http://tinyurl.com/bxpnnvc

Changed 3 months ago by robe

  • status changed from new to closed
  • resolution set to wontfix