Opened 9 years ago

Closed 9 years ago

#3227 closed enhancement (fixed)

Support for Tiger 2015

Reported by: robe Owned by: robe
Priority: high Milestone: PostGIS 2.2.0
Component: tiger geocoder Version: master
Keywords: history Cc:

Description

Tiger 2015 data came out a couple of days ago

ftp://ftp2.census.gov/geo/tiger/TIGER2015/

I asked on postgis devel a while agao that if it came out before PostGIS 2.2 release, can I package it in. Bborie was the only one to respond and was okay with it. So I take it that though this is an enhancement I can push it in :)

Change History (21)

comment:1 by robe, 9 years ago

On a bit of a down note, it seems they don't have all the files up yet, but I'm confident they will before we release :).

comment:2 by robe, 9 years ago

Resolution: fixed
Status: newclosed

Looks like a good chunk of the states are up. I only tested for MA and made some corrections to tabblock. Everything else seemed same.

Committed at r13906

comment:3 by darkblueb, 9 years ago

Resolution: fixed
Status: closedreopened

I tried walking through the install, using the "no extension" version of the dev docs.. I got stuck.. and, I have done this before.. close #3227 and open a docs ticket perhaps, BUT so far no one has verified this fix, so closing the ticket is not good practice..

# Here are some notes on the process

NOTES for robe2 on TIGER 2015
14aug15  -dbb
---------------------------------------

online docs:
  http://postgis.net/docs/manual-dev/postgis_installation.html#loading_extras_tiger_geocoder
  following the no-extension part of the doc...

BUG:
  actual path is 
    extras/tiger_geocoder
  doc shows 
    postgis_trunk$ cd postgis-2.2.0dev/extras/tiger_geocoder/tiger_2011

  -- dir structure is new, doc needs updating

##-------------------------------------------------

ENHANCEMENT:

  in the user-configured script tiger_loader_2015.sql

default location on Debian/ubuntu for $BIN is
    /usr/lib/postgresql/9.4/bin/
    /usr/lib/postgresql/9.X/bin/

PSQL=${PGBIN}/psql
  shoud be 
PSQL=`which psql`

  similarly with create_geocode.sh

PSQL_CMD=`which psql`
PGCONTRIB=/usr/share/postgresql/9.4/contrib


BUG:
  create_geocode.sh should end with a newline 

   
BUG
tl_test0=# SELECT pprint_addy(normalize_address('202 East Fremont Street, Las Vegas, Nevada 89101')) As pretty_address;
ERROR:  function normalize_address(unknown) does not exist
LINE 1: SELECT pprint_addy(normalize_address('202 East Fremont Stree...
                           ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

  at this stage in the sequence, the address_standardizer has not been loaded yet..
  next, referring to :
http://postgis.net/docs/manual-dev/postgis_installation.html#installing_pagc_address_standardizer

  line shows
tl_test0=# create extension address_standardizer_data_us;
CREATE EXTENSION

  BUT, at this stage.. there is no normalize_address()  ?!?  where does it happen ?
  
tl_test0=# \df norm*
                       List of functions
 Schema | Name | Result data type | Argument data types | Type 
--------+------+------------------+---------------------+------
(0 rows)


  at this point in the instructions .. I have :

* created the database, used CREATE EXTENSION
    postgis;
    address_standardizer;
    address_standardizer_data_us

* I edited the file tiger_loader_2015.sql with local settings

* executed  ./create_geocode.sh    which executes 
     create_geocode.sql
     tiger_loader_2015.sql

  where is normalize_address() ??

##-------------------------------------------------

comment:4 by darkblueb, 9 years ago

hmm dump and restore to a 9.3 pg, normalize_address() is found.. connect back to the original test db, normalize_address() is found..

tl_test0=# \df norm*
                                   List of functions
 Schema |       Name        | Result data type |      Argument data types      |  Type  
--------+-------------------+------------------+-------------------------------+--------
 tiger  | normalize_address | norm_addy        | in_rawinput character varying | normal
(1 row)

comment:5 by robe, 9 years ago

Brian,

Thanks for the input. I'll address these issues later this weekend.

comment:6 by robe, 9 years ago

Brian the create_geogcode.sql runs the normalize_address install in addition to other things. Are you all set with that. Wasn't clear from above.

I committed changes at r13907 to address some of the issues you raised. Also changed the default paths for sh to be those of ubuntu/debian. I figure a large majority of Linux users are Ubuntu these days so might as well pattern after that. Eventually I might just add more profiles for different OS since the whole OS thing is a misnomer, and may add windows powershell for those who prefer to use that over standard windows dos shell.

I'm going to spin up my test Ubuntu server and try to test on that before release.

comment:7 by darkblueb, 9 years ago

in the newest setup, when/where does tables/lookup_tables_xxx.sql get executed ?

comment:8 by darkblueb, 9 years ago

should ${PSQL} ${PGDATABASE} -c "CREATE TABLE tiger_data.state_all(CONSTRAINT pk_state_all PRIMARY KEY (statefp),CONSTRAINT uidx_state_all_stusps UNIQUE (stusps), CONSTRAINT uidx_state_all_gid UNIQUE (gid) ) INHERITS(state); "

be ... INHERITS( tiger.state)

comment:9 by darkblueb, 9 years ago

user@debian8:/gisdata$ cd temp
user@debian8:/gisdata/temp$ ${SHP2PGSQL} -D -c -s 4269 -g the_geom   -W "latin1" tl_2015_us_state.dbf tiger_staging.state | ${PSQL} ${PGDATABASE} -q
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
                        addgeometrycolumn                         
------------------------------------------------------------------
 tiger_staging.state.the_geom SRID:4269 TYPE:MULTIPOLYGON DIMS:2 
(1 row)

ERROR:  value too long for type character varying(2)
CONTEXT:  COPY state, line 1, column region: "    [shp2pgsql-core.c:FindPolygons:567] FindPolygons[0]: allocated space for 1 rings"
ERROR:  relation "tiger_staging.state" does not exist

comment:10 by robe, 9 years ago

The tables/lookup.. part hasn't changed. It get's run as part of the create_geocode.sql and that sets the search path to tiger, so having to do tiger.state should not be necessary.

Now why you are getting the above error is a complete mystery to me. Do you get the same issues if you install tiger geocoder via extensions with :

CREATE EXTENSION postgis_tiger_geocoder;

I was thinking of getting rid of the manual scripts since for PostGIS 2.2, we only support PostgreSQL 9.1+, and the manual script is just one more set of things I got to worry about and test.

comment:11 by robe, 9 years ago

Brian,

On second thought, I guess just in case users have another state table in their search path, probably a good idea to schema qualify all these tables. I don't plan on changing the schema from tiger for a while so that seems safe.

Now regarding your issue above, I just tried reloading my state and works fine (though perhaps you have another state table and its inheriting from the wrong one). Here is my output for compare:

c:\gisdata\ftp2.census.gov\geo\tiger\TIGER2015\STATE>"C:\Program Files\7-Zip\7z.exe" e c:\gisdata\ftp2.census.gov\geo\tiger\TIGER2015\STATE\tl_2015_us_state.zip  -o\gisdata\temp\

7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03

Processing archive: c:\gisdata\ftp2.census.gov\geo\tiger\TIGER2015\STATE\tl_2015_us_state.zip

Extracting  tl_2015_us_state.cpg
Extracting  tl_2015_us_state.dbf
Extracting  tl_2015_us_state.prj
Extracting  tl_2015_us_state.shp
Extracting  tl_2015_us_state.shp.ea.iso.xml
Extracting  tl_2015_us_state.shp.iso.xml
Extracting  tl_2015_us_state.shp.xml
Extracting  tl_2015_us_state.shx

Everything is Ok

Files: 8
Size:       14045142
Compressed: 8780814

Kernel  Time =     0.000 =    0%
User    Time =     0.156 =   91%
Process Time =     0.156 =   91%    Virtual  Memory =      2 MB
Global  Time =     0.170 =  100%    Physical Memory =      5 MB

c:\gisdata\ftp2.census.gov\geo\tiger\TIGER2015\STATE>cd \gisdata\temp\

c:\gisdata\temp>"C:\ming64gcc48\projects\postgresql\rel\pg9.4w64gcc48edb\bin\psql" -c "CREATE TABLE tiger_data.state_all(CONSTRAINT pk_state_all PRIMARY KEY (statefp),CONSTRAINT uidx_state_all_stusps  UNIQUE (stusps), CONSTRAINT uidx_state_all_gid UNIQUE (gid) ) INHERITS(state); "
CREATE TABLE

c:\gisdata\temp>"C:\ming64gcc48\projects\postgresql\rel\pg9.4w64gcc48edb\bin\shp2pgsql" -D -c -s 4269 -g the_geom   -W "latin1" tl_2015_us_state.dbf tiger_staging.state   | "C:\ming64gcc48\projects\postgresql\rel\pg9.4w64gcc48edb\bin\psql"
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
SET
SET
BEGIN
CREATE TABLE
ALTER TABLE
                        addgeometrycolumn
------------------------------------------------------------------
 tiger_staging.state.the_geom SRID:4269 TYPE:MULTIPOLYGON DIMS:2
(1 row)

COPY 56
COMMIT
ANALYZE

c:\gisdata\temp>"C:\ming64gcc48\projects\postgresql\rel\pg9.4w64gcc48edb\bin\psql" -c "SELECT loader_load_staged_data(lower('state'), lower('state_all')); "
NOTICE:  INSERT INTO tiger_data.state_all(region,division,statefp,statens,stusps,name,lsad,mtfcc,funcstat,aland,awater,intptlat,intptlon,the_geom) SELECT region,division,statefp,statens,stusps,name,lsad,mtfcc,funcstat,aland,awater,intptlat,intptlon,the_geom FROM tiger_staging.state;
CONTEXT:  SQL function "loader_load_staged_data" statement 1
 loader_load_staged_data
-------------------------
                      56
(1 row)



c:\gisdata\temp>"C:\ming64gcc48\projects\postgresql\rel\pg9.4w64gcc48edb\bin\psql" -c "CREATE INDEX tiger_data_state_all_the_geom_gist ON tiger_data.state_all USING gist(the_geom);"
CREATE INDEX

comment:12 by robe, 9 years ago

Okay I updated the code to schema qualify the tables on creation at r13913

See if that fixes your issue.

comment:13 by darkblueb, 9 years ago

hmm problems .. here is SETUP

user@debian8:/home/shared/srcs_db8/postgis_trunk$ svn up
Updating '.':
G    extras/tiger_geocoder/tiger_loader_2015.sql
 ...
Updated to revision 13913.
 ...
user@debian8:/home/shared/srcs_db8/postgis_trunk$ make distclean
 ...
CFLAGS='-g -O0' CXXFLAGS='-g -O0' ./configure --enable-profile --enable-debug
 ...
make && sudo make install
 ...
shell$ createdb tl_test_ext ; psql tl_test_ext 
tl_test_ext=# create extension postgis;
tl_test_ext=# create extension address_standardizer;
tl_test_ext=# create extension address_standardizer_data_us ;
tl_test_ext=# create extension fuzzystrmatch ;
tl_test_ext=# create extension postgis_tiger_geocoder;

tl_test_ext=# SELECT na.address, na.streetname,na.streettypeabbrev, na.zip
tl_test_ext-# FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;
 address | streetname | streettypeabbrev |  zip  
---------+------------+------------------+-------
       1 | Devonshire | Pl               | 02109
(1 row)
tl_test_ext=# SELECT Loader_Generate_Nation_Script('sh');
...

here is RESULT execute loader script line by line.. gets a few lines down then stuck…

user@debian8:/gisdata/temp$ ls
tl_2015_us_state.cpg  tl_2015_us_state.prj  tl_2015_us_state.shp.ea.iso.xml  tl_2015_us_state.shp.xml
tl_2015_us_state.dbf  tl_2015_us_state.shp  tl_2015_us_state.shp.iso.xml     tl_2015_us_state.shx
user@debian8:/gisdata/temp$ ${PSQL} -c "CREATE TABLE tiger_data.state_all(CONSTRAINT pk_state_all PRIMARY KEY (statefp),CONSTRAINT uidx_state_all_stusps  UNIQUE (stusps), CONSTRAINT uidx_state_all_gid UNIQUE (gid) ) INHERITS(tiger.state); "
CREATE TABLE
user@debian8:/gisdata/temp$ ${SHP2PGSQL} -D -c -s 4269 -g the_geom   -W "latin1" tl_2015_us_state.dbf tiger_staging.state | ${PSQL}
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
SET
SET
BEGIN
CREATE TABLE
ALTER TABLE
                        addgeometrycolumn                         
------------------------------------------------------------------
 tiger_staging.state.the_geom SRID:4269 TYPE:MULTIPOLYGON DIMS:2 
(1 row)

ERROR:  value too long for type character varying(2)
CONTEXT:  COPY state, line 1, column region: "    [shp2pgsql-core.c:FindPolygons:567] FindPolygons[0]: allocated space for 1 rings"
ROLLBACK
ERROR:  relation "tiger_staging.state" does not exist

comment:14 by darkblueb, 9 years ago

Last edited 9 years ago by darkblueb (previous) (diff)

comment:15 by robe, 9 years ago

Hmm do you have debugging enabled in your postgis build — what's this that shows on your code:

[shp2pgsql-core.c:FindPolygons:567] FindPolygons[0]: allocated space for 1 rings"

Looks like it's echoing debug statements which perhaps are getting in the way of the copy command. I forget when I change from insert to copy mode. It might have been in 2.2 I suppose. Before they were inserts (which was much slower)

Version 0, edited 9 years ago by robe (next)

comment:16 by mattybe, 9 years ago

Could the columns in tiger.faces be updated with this? I still see cd108fp, cd111fp, statefp00, etc. Some columns missing from current faces files are cd114fp, statefp10, blkgrpce10, etc. Details of the current columns are on page 3-66 here: http://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2015/TGRSHP2015_TechDoc.pdf

comment:17 by robe, 9 years ago

mattybe,

I never bothered with those since I don't use them for geocoding and they change every year. Are you using faces for something else in addition to geocoding?

comment:18 by mattybe, 9 years ago

Yeah, we're interested in getting the census blocks etc. They were accurate in the geocoder at some point! I wouldn't be opposed to submitting a PR, but it's a bit confusing to tell where columns are defined or exempted. If it's helpful, here's the full 2015 format:

{ 
          tfid: 'INTEGER', statefp10: 'VARCHAR (2)', countyfp10: 'VARCHAR (3)', tractce10: 'VARCHAR (6)', blkgrpce10: 'VARCHAR (1)',
          blockce10: 'VARCHAR (4)', suffix1ce: 'VARCHAR(1)', zcta5ce10: 'VARCHAR (5)', uace10: 'VARCHAR(5)', pumace10: 'VARCHAR(5)',
          statefp: 'VARCHAR (2)', countyfp: 'VARCHAR (3)', tractce: 'VARCHAR (6)', blkgpce: 'VARCHAR (1)',
          cousubfp: 'VARCHAR (5)', submcdfp: 'VARCHAR (5)', estatefp: 'VARCHAR (5)', conctyfp: 'VARCHAR (5)',
          placefp: 'VARCHAR (5)', aiannhfp: 'VARCHAR (5)', aiannhfp: 'DECIMAL (5,0)', aiannhce: 'VARCHAR (4)',
          comptyp: 'VARCHAR (1)', trsubfp: 'DECIMAL (5,0)', trsubce: 'VARCHAR (3)', anrcfp: 'VARCHAR (5)',
          ttractce: 'VARCHAR (6)', tblkgpce: 'VARCHAR (1)', elsdlea: 'VARCHAR (5)', scsdlea: 'VARCHAR (5)',
          unsdlea: 'VARCHAR (5)', cd114fp: 'VARCHAR (10)', sldust: 'VARCHAR (3)', sldlst: 'VARCHAR (3)',
          csafp: 'VARCHAR (3)', cbsafp: 'VARCHAR (5)', metdivfp: 'VARCHAR (5)', cnectafp: 'VARCHAR (3)',
          nectafp: 'VARCHAR (5)', nctadvfp: 'VARCHAR (10)', lwflag: 'VARCHAR (1)', offset: 'VARCHAR (1)', 
          atotal: 'INTEGER', intptlat: 'VARCHAR (11)', intptlon: 'VARCHAR(12)' 
}

I believe it's just a matter of putting these keys in the right place, but I'm a little confused about the COALESCE(exclude_columns, … lines

Last edited 9 years ago by mattybe (previous) (diff)

comment:19 by robe, 9 years ago

Unfortunately it's a bit more complicated than that. I have to update the table structures in tiger schema as well and put in logic for upgrading.

Since we are close to release on PostGIS 2.2 (and this is a structural change), I'm afraid it's a no go for 2.2. I'll put this in as a separate ticket.

comment:20 by robe, 9 years ago

Ticketed for future consideration at #3247

comment:21 by robe, 9 years ago

Keywords: history added
Resolution: fixed
Status: reopenedclosed

I'm closing this out. I've tried several states and seems to work fine for me. Brian - after you get rid of debugging, if it still doesn't work for you, feel free to reopen.

Note: See TracTickets for help on using tickets.