Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1991 closed defect (fixed)

geocode really slow on PostgreSQL 9.2

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 2.0.2
Component: tiger geocoder Version: master
Keywords: history Cc:

Description

I'm not sure if this is just something I'm missing in settings with the new PostgreSQL or a fundamental issue with the new PostgreSQL planner.

I have 3 parallel windows PostgreSQL instances — 8.4 32-bit running PostGIS 1.5 (on older clunky 32-bit windows 2003 box)

on new Windows 2008 R2 same box: 9.1 64-bit running PostGIS 2.1.0 SVN, 9.2 64-bit running PostGIS 2.1.0 SVN.

On the 9.1 and 8.4, my MA data standard geocodes even with supplementary augmented parcel check completes in about 30-200ms.

On 9.2 instance with same dataset. It takes 8-30 seconds. Yap.

I compared the plans and it seems 9.2 is not using a bitmap index scan for this particular query I have. I have narrowed it to this construct.

-- 10 seconds
 WITH a AS
  	( SELECT * 
  		FROM (SELECT f.*, ad.side, ad.zip, ad.fromhn, ad.tohn,
  					RANK() OVER(ORDER BY  diff_zip(ad.zip,'02124') +  CASE WHEN lower(f.name) = lower('Evans') THEN 0 ELSE levenshtein_ignore_case(f.name, lower('Evans') )  END + 
						levenshtein_ignore_case(f.fullname, lower('Evans' || ' ' || COALESCE('St','')) ) 
						+ CASE WHEN (greatest_hn(ad.fromhn,ad.tohn) % 2)::integer = ('22' % 2)::integer THEN 0 ELSE 1 END
						+ CASE WHEN '22'::integer BETWEEN least_hn(ad.fromhn,ad.tohn) AND greatest_hn(ad.fromhn, ad.tohn) 
							THEN 0 ELSE 4 END 
							+ CASE WHEN lower('St') = lower(f.suftypabrv) OR lower('St') = lower(f.pretypabrv) THEN 0 ELSE 1 END
							+ rate_attributes(NULL, f.predirabrv,    'Evans',  f.name , 'St',    suftypabrv , NULL,    sufdirabrv, prequalabr)  
							)
						As rank
                		FROM featnames As f INNER JOIN addr As ad ON (f.tlid = ad.tlid) 
                    WHERE '25' = f.statefp AND '25' = ad.statefp 
                    	 AND lower(f.name) = lower('Evans')     AND ( ad.zip = ANY('{02123,02124,02125}'::varchar[]) )   ) AS foo ORDER BY rank LIMIT 3 ) 
        SELECT   a.tlid, a.fullname, a.name, a.predirabrv, a.suftypabrv, a.sufdirabrv, a.prequalabr, a.pretypabrv, 
                b.the_geom, tfidr, tfidl,
                a.side ,
                a.fromhn,
                a.tohn,
                a.zip,
                p.name as place

                FROM  a INNER JOIN edges As b ON (a.statefp = b.statefp AND a.tlid = b.tlid  )
                    INNER JOIN faces AS f ON ('25' = f.statefp AND ( (b.tfidl = f.tfid AND a.side = 'L') OR (b.tfidr = f.tfid AND a.side = 'R' ) )) 
                    INNER JOIN place p ON ('25' = p.statefp AND f.placefp = p.placefp )
                WHERE a.statefp = '25'  AND  b.statefp = '25' ;

Which can be worked around by switching to this:

-- 92 ms --
 WITH a AS
  	( SELECT * 
  		FROM (SELECT f.*, ad.side, ad.zip, ad.fromhn, ad.tohn,
  					RANK() OVER(ORDER BY  diff_zip(ad.zip,'02124') +  CASE WHEN lower(f.name) = lower('Evans') THEN 0 ELSE levenshtein_ignore_case(f.name, lower('Evans') )  END + 
						levenshtein_ignore_case(f.fullname, lower('Evans' || ' ' || COALESCE('St','')) ) 
						+ CASE WHEN (greatest_hn(ad.fromhn,ad.tohn) % 2)::integer = ('22' % 2)::integer THEN 0 ELSE 1 END
						+ CASE WHEN '22'::integer BETWEEN least_hn(ad.fromhn,ad.tohn) AND greatest_hn(ad.fromhn, ad.tohn) 
							THEN 0 ELSE 4 END 
							+ CASE WHEN lower('St') = lower(f.suftypabrv) OR lower('St') = lower(f.pretypabrv) THEN 0 ELSE 1 END
							+ rate_attributes(NULL, f.predirabrv,    'Evans',  f.name , 'St',    suftypabrv , NULL,    sufdirabrv, prequalabr)  
							)
						As rank
                		FROM featnames As f INNER JOIN addr As ad ON (f.tlid = ad.tlid) 
                    WHERE '25' = f.statefp AND '25' = ad.statefp 
                    	 AND lower(f.name) = lower('Evans')     AND ( ad.zip = ANY('{02123,02124,02125}'::varchar[]) )   ) AS foo ORDER BY rank LIMIT 3 ) 
        SELECT   a.tlid, a.fullname, a.name, a.predirabrv, a.suftypabrv, a.sufdirabrv, a.prequalabr, a.pretypabrv, 
                b.the_geom, tfidr, tfidl,
                a.side ,
                a.fromhn,
                a.tohn,
                a.zip,
                p.name as place

                FROM  a INNER JOIN edges As b ON (a.statefp = b.statefp AND a.tlid = b.tlid  )
                    INNER JOIN faces AS f ON ('25' = f.statefp AND ( (b.tfidl = f.tfid ) OR (b.tfidr = f.tfid ) )) 
                    INNER JOIN place p ON ('25' = p.statefp AND f.placefp = p.placefp )
                WHERE a.statefp = '25'  AND  b.statefp = '25' AND
                ( (b.tfidl = f.tfid AND a.side = 'L') OR (b.tfidr = f.tfid AND a.side = 'R' ) );

I'll encode this revision if I conclude it's not just something screwy with my setup and also report it up the chain if I conclude it is a PostgreSQL 9.2 issue, which I suspect it is.

Note: there is nothing spatial in the above query, so I think PostGIS is off the hook.

Change History (3)

comment:1 by robe, 12 years ago

Okay it seems in 9.2, reducing the join collapse limit makes it behave as fast as the old versions

e.g.:

 SET join_collapse_limit = '2';

Make both queries work at the faster speed and use the bitmapor strategy again.So not sure this is a real issue or just something to be cognizant of.

The default is 8, but if I increase above 4 it gets bad.

comment:2 by robe, 12 years ago

Resolution: fixed
Status: newclosed

Okay rather than change my code, I'm just going to set join_collapse_limit='2' for both the geocode_address and geocode_intersection functions. As this yields a dramatic improvement on my PostgreSQL 9.2 instance, and doesn't seem to reduce speed of my 8.4 instance.

fixed for 2.0 at r10311 fixed for 2.1 at r10310

comment:3 by robe, 12 years ago

Keywords: history added
Note: See TracTickets for help on using tickets.