Opened 13 years ago

Closed 12 years ago

#1077 closed task (fixed)

Regression tests for tiger geocoder

Reported by: robe Owned by: robe
Priority: high Milestone: PostGIS 2.0.0
Component: tiger geocoder Version: master
Keywords: Cc:

Description (last modified by robe)

Now that we are making so many changes to the tiger geocoder, I fear breaking things. Thus we need to start writing up regression tests.

Normalize_address is easy since most of what is needed for normalize can run with just the pre-packaged lookup tables loaded.

Geocode is harder since it needs real data, so we might have to set up samples.

Speed is important since any minor change we make can significantly impact it, and that is even trickier to regression test since it's so dependent on database setup.

At any rate, we'll just start setting up tests and hand-test them as we make changes until we've got a better idea.

Change History (8)

comment:1 by robe, 13 years ago

Description: modified (diff)

comment:2 by robe, 13 years ago

preliminary work at r7516

comment:3 by darkblueb, 13 years ago

Version: 1.5.X → trunk

Is there some known set of addresses that are supposed to be geocode-able?

comment:4 by darkblueb, 13 years ago

I am looking at a large dataset that is applicable to testing the geocoder. Among the fields are:

"Address" "City" "State" "Zip" "UnitNumber" "HouseNumber" "StreetPrefix" "StreetName" "StreetType" "StreetSuffix"

(The division there is my own.) I am extracting a sample now with ORDER BY random(). What combinations are fair game for the geocoder?

comment:5 by robe, 13 years ago

There are a couple of issues. I think anything is fair game. If you look at Mike Pease's tickets, that gives you a general sense of the pitfall areas:

http://trac.osgeo.org/postgis/query?status=assigned&status=new&status=reopened&component=tiger+geocoder&order=priority&col=id&col=summary&col=component&col=status&col=type&col=priority&col=milestone

We are working on fixing these, particularly highways and misspellings. But while doing so we need to make sure that:

1) We don't slow down the geocoding of things that used to be fast by adding in more checks.

2) We don't break things that used to work, such as addresses that returned right answers now returning wrong answers.

So as long as you have a baseline of some sort, which you would from the above, that would be good enough.

I'm not even so concerned about random access because generally I think people will sort the data in some sort of meaningful order like zip, street, etc. to get faster speeds.

So the first stab is just to make sure the geocoding is still right. The second stab is to make sure it hasn't lost speed (which is trickier to test because of differences in caching behavior depending on sort, speed of server, and other server stuff happening).
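The two checks above could be sketched roughly as follows. This is only an illustration, not the project's test code: `Baseline`, `compare_to_baseline`, `max_slowdown` and the `geocode_fn` callable are hypothetical names, and the baseline is assumed to be a mapping from address to the result and timing recorded on a run that was known to be good.

```python
import time
from typing import Callable, Dict, List, NamedTuple, Tuple


class Baseline(NamedTuple):
    """One recorded result from a run that was known to be correct (hypothetical)."""
    expected: str   # result the geocoder returned for this address
    seconds: float  # how long that lookup took on the baseline run


def compare_to_baseline(
    geocode_fn: Callable[[str], str],
    baseline: Dict[str, Baseline],
    max_slowdown: float = 2.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[str], List[str]]:
    """Return (addresses with a wrong answer, addresses that got slower)."""
    wrong: List[str] = []
    slower: List[str] = []
    for address, base in baseline.items():
        start = clock()
        result = geocode_fn(address)
        elapsed = clock() - start
        if result != base.expected:
            # First stab: correctness. A wrong answer is reported on its own,
            # regardless of how fast it came back.
            wrong.append(address)
        elif elapsed > base.seconds * max_slowdown:
            # Second stab: speed, with a generous tolerance because timings
            # depend on caching, sort order and server load.
            slower.append(address)
    return wrong, slower
```

The `clock` parameter is there so the timing part can be exercised with a fake clock; in a real run the slowdown threshold would need tuning per server.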

comment:6 by robe, 13 years ago

Okay, we have a good chunk of failures, fixes, and regress examples, thanks in great part to Mike Pease and others. These have already been added to the regress folder as the tests normalize_regress.sql and geocode_regress.sql, along with the expected outputs after the fixes in normalize_regress and geocode_regress.
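For anyone wanting to check a test run by hand, comparing the output of a test script against its expected file is just a text diff. A minimal sketch, assuming the actual and expected outputs have already been read into strings (how the scripts get run is not covered here, and `diff_expected` and its `name` parameter are made-up names for illustration):

```python
import difflib
from typing import List


def diff_expected(expected: str, actual: str, name: str = "normalize_regress") -> List[str]:
    """Return unified-diff lines between expected and actual output.

    An empty list means the run matches the expected output.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=f"{name} (expected)",
            tofile=f"{name} (actual)",
            lineterm="",
        )
    )
```

Any non-empty result is either a regression or a fix whose expected output has not been updated yet.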

Brian Hamlin has provided more normalize failures based on the USPS CASS test suite. We should add these and tackle first the ones that prevent accurate geocoding. Below is a list excerpted from postgis-devel. Some of these failures are not surprising and some are even relatively harmless as far as geocoding is concerned, but their behavior should be fixed and/or noted in the regress tests.

---------------------------------------------------------------

400 AVENUE I, WEST POINT, GA 31833
400 AVENUE I W, POINT, GA 31833

---------------
19596 COUNTY ROAD 480, COLCORD, OK 74338
19596 480 Co Rd, COLCORD, OK 74338

29779 STATE HIGHWAY C BOX 974, POTOSI, MO 63664
29779 C State Hwy, POTOSI, MO 63664

10559 NE STATE HIGHWAY 90, PINEVILLE, MO 64856
10559 90 State Hwy NE, PINEVILLE, MO 64856

18208 N COUNTY ROAD 241, ALACHUA, FL 32615
18208 241 Co Rd N, ALACHUA, FL 32615

4345 ROUTE 353, SALAMANCA, NY 14779
4345 353 Rte, SALAMANCA, NY 14779

19799 STATE ROUTE O, COSBY, MO 64436
19799 O State Rte, COSBY, MO 64436

------------------------------------------------------

1292 NE AVENUE B, SWEETWATER, TX 79556
1292 NE Ave, SWEETWATER, TX 79556

399 WEST AVE F, JEROME, ID 83338
399 WEST Ave, JEROME, ID 83338

------------------------------------------------------

19126-20 9TH AVE, PARKER, AZ 85344
1912620 9TH Ave, PARKER, AZ 85344

1818-307 N 40TH ST, PHOENIX, AZ 85008
1818307 N 40TH St, PHOENIX, AZ 85008

------------------------------------------------------

4D 664TH ST, NEW CASTLE, AL 35119
4 664TH St, NEW CASTLE, AL 35119

------------------------------------------------------

110 CENTER COVE I, SPICEWOOD, TX 78669
110 CENTER Cv, SPICEWOOD, TX 78669

------------------------------------------------------

492 STUYVESANT AVE # 4223 # 1330, IRVINGTON, NJ 07111
492 STUYVESANT Ave, IRVINGTON, NJ 07111

114 HAYES ML RD APT B122, ATCO, NJ 08004
114 HAYES ML Rd, APT, ATCO, NJ 08004

4906 LA BR APT A, HOUSTON, TX 77004
4906 LA Br, APT, HOUSTON, TX 77004

------------------------------------------------------

900 CITY FEDERAL BUILDING # 407, BIRMINGHAM, AL 35203
900, BUILDING # 407, BIRMINGHAM, AL 35203
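To turn a listing like the one above into test cases, something along these lines might do. It is a sketch that assumes the layout seen here: each case is two consecutive non-blank lines (the first being the address as given, the second the string it was compared against), with blank lines and rows of dashes acting only as separators. `parse_pairs` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def parse_pairs(text: str) -> List[Tuple[str, str]]:
    """Split a listing into (first line, second line) pairs."""
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            # Blank line or dashed separator: any unpaired line is dropped.
            pending = None
            continue
        if pending is None:
            pending = line
        else:
            pairs.append((pending, line))
            pending = None
    return pairs
```

Each resulting pair can then be recorded in the regress tests, either as a known failure or, once fixed, with the corrected expected output.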


comment:7 by darkblueb, 13 years ago

See the CASS compare output for the most recent changeset: download.osgeo.org:/osgeo/download/postgis/geo_cmp_rev7646.txt

comment:8 by robe, 12 years ago

Milestone: PostGIS Future → PostGIS 2.0.0
Resolution: fixed
Status: new → closed

Going to mark this done since we have got a decent regress test suite. It just needs to be more integrated.
