Opened 3 years ago

Closed 2 years ago

#3698 closed defect (fixed)

CENSUS Download path for TABBLOCK

Reported by: EvanCarroll Owned by: robe
Priority: high Milestone: PostGIS 2.3.4
Component: tiger geocoder Version: 2.3.x
Keywords: Cc:

Description

Name | postgis_tiger_geocoder Version | 2.3.1 Schema | tiger Description | PostGIS tiger geocoder and reverse geocoder


When I generate a download scripts with

SELECT Loader_Generate_Script(ARRAYTX?, 'sh');

I'm getting

unzip: cannot find or open tl_*_48*_edges*.zip, tl_*_48*_edges*.zip.zip or tl_*_48*_edges*.zip.ZIP. No zipfiles found.

This seems to be because earlier in the download, I get

--2017-01-25 15:16:05-- http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock.zip Resolving www2.census.gov (www2.census.gov)... 23.11.58.87 Connecting to www2.census.gov (www2.census.gov)|23.11.58.87|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2017-01-25 15:16:10 ERROR 404: Not Found.

tx.sh: 56: cd: can't cd to /gisdata/www2.census.gov/geo/tiger/TIGER2016/TABBLOCK

Checking census it's not

http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock.zip

instead, it's

http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock10.zip


I can probably fix this kind of stuff. I really wish the build system for this wasn't tied together with PostGIS. You're making it very hard to send a patch this way. I'm pretty capable and handy, but the learning curve is very high in getting a PostGIS development environment working.

Change History (18)

comment:1 Changed 3 years ago by robe

Milestone: PostGIS 2.4.0PostGIS 2.3.2

comment:2 Changed 3 years ago by EvanCarroll

Summary: Problem with Geocoder.CENSUS Download directory for TABBLOCK

comment:3 Changed 3 years ago by EvanCarroll

Summary: CENSUS Download directory for TABBLOCKCENSUS Download path for TABBLOCK

comment:4 Changed 3 years ago by EvanCarroll

It seems that this was changed a long time ago. In 2013 this worked

http://www2.census.gov/geo/tiger/TIGER2013/TABBLOCK/tl_2013_48_tabblock.zip

However, in 2014, and on the convention changed..

http://www2.census.gov/geo/tiger/TIGER2014/TABBLOCK/tl_2014_48_tabblock10.zip

comment:5 Changed 3 years ago by robe

Milestone: PostGIS 2.3.2PostGIS 2.4.0

This is a bit more involved than I thought since it doesn't follow the convention folder_name/..folder_name.zip. So I'll deal with this when they release 2017 data and just disable the download in the loader tables. They probably put a 10 in it because it hasn't been changed since 2010.

The tabblock tables are not used for anything so probably a useless bit of data for people only interested in geocoding.

The only reason I included it way back is the machinery to download tabblock was already there for folks who needed to do census block research. It should always have been disabled by default and require users to explicitly go into the tiger.lookup_tables to enable it.

Regarding whether tiger geocoder should be part of PostGIS or not. Historically it always has been, and before extension building, it was pretty detached in the extras/tiger_geocoder folder You can still build and install it that way by just making changes to the files in extras/tiger_geocoder and using the shell scripts.

The only part the tiger geocoder that relies on the PostGIS build is wrapping it as an extension. That itself could probably be detached so that if you really wanted to build just the geocoder extension file, you could cd into the extensions folder and just build that. I could at the very least provide instructions on how to hack the scripts in extensions folder to do that.

One great benefit of having it as part of PostGIS is that cloud offerings like Amazon RDS and Heroku build it and so do all the various package maintainers, Ubuntu, Yum etc. If it wasn't part of PostGIS I think packagers would be less inclined to go out of their way to build it and much fewer people would use it.

comment:6 Changed 3 years ago by robe

oops put the wrong number on commit again. r15293 disables the load for PostGIS 2.3.2

Last edited 3 years ago by robe (previous) (diff)

comment:7 Changed 3 years ago by robe

In 15294:

Turn off loading of tabblock by default. It's been broken since 2014 and not used by the geocoder
References #3698 for PostGIS 2.4

comment:8 Changed 3 years ago by EvanCarroll

Very cool robe, I think of a guide on how to contribute to the plugin would be very much appreciated. I had a difficulty building it the last time. I still think you'd get more users and more active development if this wasn't part of core. And, we could distribute it separately through pgxn or something rather than having to have users pry it out of a zip file to upgrade the Census data without upgrading PostGIS.

http://dba.stackexchange.com/q/154209/2639 http://dba.stackexchange.com/q/154910/2639 http://dba.stackexchange.com/a/154909/2639

In other news, Census.gov is now blacklisting if you download more than 30 GB. I'm not sure if we should do anything about it. I'm currently in touch with the Census on getting it resolved for myself. If I can get the contacts I'll see if they're down for adding a perma-whitelist for PostGIS loader based on useragent or query-string. Akamai has now blocked two of my IPS, so I'm guessing this is something new with 2017.

comment:9 Changed 3 years ago by EvanCarroll

Also, for those finding this bug on Google you can turn off loading tabblock without upgrading the plugin using this..

UPDATE loader_lookuptables SET load=false WHERE lookup_name = 'tabblock';

Last edited 3 years ago by EvanCarroll (previous) (diff)

comment:10 Changed 3 years ago by robe

I think they started to blacklist in 2015 though they seem to have gotten more aggressive.

comment:11 Changed 2 years ago by robe

Milestone: PostGIS 2.4.0PostGIS 2.3.4

comment:12 Changed 2 years ago by jeffreywescott

So, any recommendations for someone like me who has a use case that does require census block data?

In the generated load-states.sh script I tried appending the filename references like so:

sed -i "s/_tabblock\./_tabblock10./g" load-states.sh

That seemed to get it to work. However, when I try to load a state (e.g., 'CA'), there are still no rows in the tabblocks table:

postgres=# select * from tabblock;
 gid | statefp | countyfp | tractce | blockce | tabblock_id | name | mtfcc | ur | uace | funcstat | aland | awater | intptlat | intptlon | the_geom
-----+---------+----------+---------+---------+-------------+------+-------+----+------+----------+-------+--------+----------+----------+----------
(0 rows)

I noticed that for most other tables, loader_load_staged_data is used to transfer the data from the tiger_staging to the tiger_data database, but that doesn't seem to be happening for the tabblock table in the script. Notice the difference between the CA_tract and CA_tabblock handling from the script excerpt below:

${PSQL} -c "CREATE TABLE tiger_data.CA_tract(CONSTRAINT pk_CA_tract PRIMARY KEY (tract_id) ) INHERITS(tiger.tract); "
${SHP2PGSQL} -D -c -s 4269 -g the_geom   -W "latin1" tl_2013_06_tract.dbf tiger_staging.ca_tract | ${PSQL}
${PSQL} -c "ALTER TABLE tiger_staging.CA_tract RENAME geoid TO tract_id;  SELECT loader_load_staged_data(lower('CA_tract'), lower('CA_tract')); "
	${PSQL} -c "CREATE INDEX tiger_data_CA_tract_the_geom_gist ON tiger_data.CA_tract USING gist(the_geom);"
	${PSQL} -c "VACUUM ANALYZE tiger_data.CA_tract;"
	${PSQL} -c "ALTER TABLE tiger_data.CA_tract ADD CONSTRAINT chk_statefp CHECK (statefp = '06');"
cd /gisdata
#wget http://www2.census.gov/geo/tiger/TIGER2013/TABBLOCK/tl_2013_06_tabblock.zip --mirror --reject=html
cd /gisdata/www2.census.gov/geo/tiger/TIGER2013/TABBLOCK
rm -f ${TMPDIR}/*.*
${PSQL} -c "DROP SCHEMA IF EXISTS tiger_staging CASCADE;"
${PSQL} -c "CREATE SCHEMA tiger_staging;"
for z in tl_2013_06*_tabblock.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done
cd $TMPDIR;

${PSQL} -c "CREATE TABLE tiger_data.CA_tabblock(CONSTRAINT pk_CA_tabblock PRIMARY KEY (tabblock_id)) INHERITS(tiger.tabblock);"
${SHP2PGSQL} -D -c -s 4269 -g the_geom   -W "latin1" tl_2013_06_tabblock.dbf tiger_staging.ca_tabblock | ${PSQL}
${PSQL} -c "ALTER TABLE tiger_staging.CA_tabblock RENAME geoid TO tabblock_id;"
${PSQL} -c "ALTER TABLE tiger_data.CA_tabblock ADD CONSTRAINT chk_statefp CHECK (statefp = '06');"
${PSQL} -c "CREATE INDEX tiger_data_CA_tabblock_the_geom_gist ON tiger_data.CA_tabblock USING gist(the_geom);"
${PSQL} -c "vacuum analyze tiger_data.CA_tabblock;"
cd /gisdata
#wget http://www2.census.gov/geo/tiger/TIGER2013/BG/tl_2013_06_bg.zip --mirror --reject=html
cd /gisdata/www2.census.gov/geo/tiger/TIGER2013/BG
rm -f ${TMPDIR}/*.*
${PSQL} -c "DROP SCHEMA IF EXISTS tiger_staging CASCADE;"
${PSQL} -c "CREATE SCHEMA tiger_staging;"
for z in tl_2013_06*_bg.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done
cd $TMPDIR;

Any ideas on how to make sure my tabblock table gets populated using the TIGER2016 data?

comment:13 Changed 2 years ago by robe

Jeffrey,

I haven't looked at this issue in a while. I've put this on my todo to fix as part of the FOSS4G OSGeo Code Sprint this weekend. https://wiki.osgeo.org/wiki/FOSS4G_2017_Code_Sprint#PostGIS

comment:14 Changed 2 years ago by jeffreywescott

I spent a bit more time looking into this and got it to work with the following changes in the loader_lookuptables:

UPDATE loader_lookuptables
SET    load = true
,      post_load_process = '${psql} -c "ALTER TABLE ${staging_schema}.${state_abbrev}_${lookup_name} RENAME geoid10 TO tabblock_id;  SELECT loader_load_staged_data(lower(''${state_abbrev}_${table_name}''), lower(''${state_abbrev}_${lookup_name}'')); "
${psql} -c "ALTER TABLE ${data_schema}.${state_abbrev}_${lookup_name} ADD CONSTRAINT chk_statefp CHECK (statefp = ''${state_fips}'');"
${psql} -c "CREATE INDEX ${data_schema}_${state_abbrev}_${lookup_name}_the_geom_gist ON ${data_schema}.${state_abbrev}_${lookup_name} USING gist(the_geom);"
${psql} -c "vacuum analyze ${data_schema}.${state_abbrev}_${lookup_name};"'
,      columns_exclude = '{gid, uatyp10, uatype, suffix1ce}'
WHERE  lookup_name = 'tabblock'
;

I had to change the columns_exclude as well as add the loader_load_staged_data statement in the post_load_process.

Also, it was still necessary to work with the 10 suffix on the TABBLOCK filenames:

sed -i "s/_tabblock\./_tabblock10./g" load-states.sh

comment:15 Changed 2 years ago by robe

Jeffrey,

Thanks for the experiment. I've incorporated your changes into code, but to fix the tabblock10/tabblock10 issue I had to modify the loader scripts a bit.

I also realized the census script was a bit broken so fixing that as well. And then also discovered census is using SSLV3 as noted in #3816.

I'm doing some testing of states before I commit my changes.

What I will do is keep tabblock, bg off since there is nothing in the code that uses those tables. tract I'm tempted to turn off as well by default except I have get_tract function that uses it.

I'll update the docs with an update statement for people that want those tables loaded and also show how to use the loader_generate_census_script which I have set to download bg, tract, tabblock regardless of if they are turned off in the lookup_tables.

comment:16 Changed 2 years ago by jeffreywescott

Thanks -- that all sounds good.

For what it's worth, I think a refactoring of the generate scripts is in order. It would be a lot easier for end users like me if all of the downloads happened up front, and the loads happened separately. Also, on the Unix-ish versions, I think a set -e at the top of the script is essential, particularly given the fact that Akamai is blacklisting people. You can end up waiting hours only to find out that things haven't really loaded, and you pretty much need to start over in a lot of cases.

Related: has anyone from this team tried to get in touch with the Census folks? I think blacklisting people might be a violation of FOIA (https://www.foia.gov/).

comment:17 Changed 2 years ago by robe

I tried once and gave up cause I couldn't reach up to the chain of anyone technical and just got the "Works fine for me talk to your service provider" answer. I'll try again maybe after the 2017 data comes out which should be soon. Feel free to bug them yourself.

As far as refactoring the script, it's a little too late to do for 2.4.1, but will definitely try doing it for 2.4.1 or 2.5.0.

comment:18 Changed 2 years ago by robe

Resolution: fixed
Status: newclosed

In 15585:

fix issues with loading tabblock, incorporate changes from Jeffrey Wescott
change url to new census ssl one
update docs to describe how to load bg, tract, tabblock tables
Closes #3698 for PostGIS 2.3.4
Closes #3816 for PostGIS 2.3.4

Note: See TracTickets for help on using tickets.