Opened 16 years ago

Last modified 15 years ago

#34 closed enhancement (fixed)

.prj creation by pgsql2shp

Reported by: dane.springmeyer Owned by: robe
Priority: medium Milestone: PostGIS 1.3.6
Component: postgis Version: 1.4
Keywords: Cc:

Description (last modified by robe)

This is a very minor feature request.

It would be great to be able to automatically generate the .prj file needed by many shapefile readers during export from postgis.

I also made a note of this issue on the postgis-users list a while back (http://postgis.refractions.net/pipermail/postgis-users/2007-November/017698.html)

An optional flag for pgsql2shp, say -s 4326, would match the specified srid and pull from cooresponding srtext field.

For any custom projections added that did not have a cooresponding srtext field, a warning could be output by pgsql2shp. The user would then be responsible for filling it in if blank, from spatialrefererence.org or other sources.

I understand that the srtext field was likely not intended to be used this way, so there may be better solutions.

This thread notes (http://postgis.refractions.net/pipermail/postgis-users/2004-October/005958.html) that postgis would benefit from a full fledged WKT parser, but in the case of this feature request, I don't imaging that much parsing would need to be involved to output the srtext during export.

Change History (24)

comment:1 by pramsey, 16 years ago

<i>(No comment was entered for this change.)</i>

comment:2 by robe, 15 years ago

Working on this now and have a testing version working, but have not committed it yet. Will commit first draft to trunk probably within the next week.

Slight change — to above — I have the logic currently just outputting the prj based on the SRID of the geometry so no need to prompt for what SRID that I can think of.

I'm debating whether we still need a switch to denote whether to disable .prj output. Right now my logic will output a .prj file by default if a query/table has a well-defined SRID and only one SRID, but I'm thinking there are cases where users may want to manually generate their own .prj file and therefore may not want the system to overwrite that. Suggestions are welcome.

comment:3 by kneufeld, 15 years ago

I think that's a good call. Having a flag to specify the SRID would get confused with optionally wanting to transform the table/query to a new projection on the way out (which actually wouldn't be a bad option to have).

Looking forward to it. — Kevin

comment:4 by dane.springmeyer, 15 years ago

Your plan sounds great Regina. The only reason I thought of the flag option was in keeping with shp2pgsql, but you are right - the idea here is different since the srid should be known.

Great stuff!

comment:5 by mcayland, 15 years ago

My vote is output the .prj automatically (unless the column is NULL, in which case don't output it); then once it is there, the user can alter to taste. Bear in mind that the current behaviour for pgsql2shp is that if there is already a shapefile with the same name in the current directory, it will simply be overwritten - so it makes sense to keep the behaviour the same.

ATB,

Mark.

comment:6 by robe, 15 years ago

Attached is my revised patch. For now I am just outputting proj file if it can be output. I also took some of your recommendations - e.g. I think fixing buffer overflow possibility and Kevin's checking geometry_columns first to save time.

Though I have to admit my SQL you may find unconventional and you may disapprove, but does seem to short-circuit correctly with a 1,000,000 record table set I tried. Check it out.

It was the only way I could think of checking without doing 2 separate database calls which I hated the thought of more. Also I'm not escaping the table and column names — how do you do that in C? What do we do in cases where there are null geometries — I'm just taking the first srid if it has null and one srid. Actually I'm not even sure pgsql2shp handles that correctly now since I think I have had weird row shifts when outputting nulls in the past.

Also now I am checking the geometry_columns table — hmm getting the schema right is now somewhat important. So I would like use the table visible thing and just pull the schema from pg when its not given.

comment:7 by kneufeld, 15 years ago

Interesting use of COALESCE … seems logical, should work.

But, yes, I agree that when a schema is not provided, you should add a 'AND

pg_table_is_visible((gc.f_table_schema
'.' gc.f_table_name)::regclass)' filter

to your first query against the geometry_columns table. There could easily be duplicates in the geometry_columns table when strictly looking at f_table_name and f_geometry_column and you may be pulling the SRID that belongs to a different table.

You shouldn't have to modify the second query since specifying a non-schema qualified table already defaults to looking just in the user's search_path.

comment:8 by robe, 15 years ago

Just noticed I left some conditions that I no longer need since the coalesce will always return a record except on fatal error where table does not exist or connection is lost. I'll take those out later.

Still thinking about Mark's comment about just outputting the .prj regardless. I'll leave like that for now, but still have some reservations. Unlike the tables and shp that rightfully do get overwritten, a .prj file is valid for a table query even from previous dumps. I'll have to reinstall my ArcGIS and play with that.

My concern is if you store your data in a certain projection which is equivalent to another registered in say ArcGIS or mapinfo, but yours isn't. You may manually create a .prj to make <stuff your favorite yuck product here> happy and not want the .prj overwritten though you always want the latest data overritten.

comment:9 by robe, 15 years ago

I haven't tested this in ArcGIS yet, but will after I reinstall on my computer. I did check out does output the .prj in a number of permutations of options.

Accidentally confirmed that if you have a bogus record in your geometry_columns table that references a table with same name as table you are outputting and you don't specify a schema, a projection will not be output and prj creation will return with an error. This I don't consider to be a real issue.

comment:10 by mcayland, 15 years ago

I've quickly reviewed your applied patch and have a couple of minor comments:

  • Even though you've used malloc() instead if snprintf(), I still think the

calculation is wrong. In the code, you have 'if ( schema ) size += strlen(schema)', but then in the first query you reference schema twice. So I'm thinking this should be something like 'size += 2 * strlen(schema)' instead. And this may also be the same for the table name and the geo-column name.

  • Slight typo in your block comment (havshort)
  • protect_quotes_string_noiconv(). I don't think you need this, as code to handle

this is already built into libpq. See http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html#LIBPQ-EXEC-ESCAPE-STRING for some examples.

HTH,

Mark.

comment:11 by robe, 15 years ago

Thanks Mark. I suspected there was such a thing in libq, but couldn't find an example use in the code. I stole much of that code from shp2pgsql but that had a lot of iconv stuff that would have required me bringing too much stuff over. You think the same holds true for that part or we still in the protect_quotes_string for that part because of the need for iconv checks.

comment:12 by mcayland, 15 years ago

Yeah. shp2pgsql is different because it doesn't actually connect to a PostgreSQL server - it just produces a text file, and so it has to handle the encoding itself by allowing the user to specify it on the command line.

For pgsql2shp, the client_encoding is a property of the connection used to extract the geometries, and so PQescapeStringConn() automatically 'just knows' the correct encoding to use when passing strings into the SQL engine.

HTH,

Mark.

comment:13 by robe, 15 years ago

Okay I looked at PQescapeStringConn() and seems kind of cumbersome to use since it takes the string in as a pointer changes it and then just outputs the size of the resulting so I'd have to have silly variables to contend with it seems. Unless I misunderstood that. Couldn't find any examples of its use either.

I'm actually thinking of abandoning the whole printf … query thing and using PQexecParams instead which seems to have existed at least since 8.0. http://www.postgresql.org/docs/8.3/interactive/libpq-exec.html. I think that will also solve the having to make the buffer big enough issue if I use a parameterized query. Does that sound reasonable?

Though no examples of that I could find either so I'm hoping they have some examples in the PostgreSQL source code so I don't actually have to learn C to make sense of the documentation.

comment:14 by mcayland, 15 years ago

According to the arguments, it writes it's output into an existing buffer rather than creating a new string and returning it. So you could do something like:

char *esc_schema; int error;

esc_schema = malloc(2 * strlen(schema) + 1); PQescapeStringConn(conn, esc_schema, schema, strlen(schema), &error); … … Perform SQL query … PQfree(esc_schema);

Then just refer to 'esc_schema' in your snprintf() rather than schema. Note: the correct length for esc_schema is calculated as per the notes in the PostgreSQL documentation and so will not overflow. I would avoid using parameterised queries for this, as it's designed for caching query plans across multiple queries rather than just escaping strings.

HTH,

Mark.

comment:15 by robe, 15 years ago

Mark,

Hmm didn't quite get that from the reading the docs that PQexecParams caches the query plan. It seemed to suggest the only benefit is that it escapes out the text correctly based on data type. If it cached the plan, then what's the point of having PQexecPrepared.

PGresult *PQexecPrepared(PGconn *conn,

const char *stmtName, int nParams, const char * const *paramValues, const int *paramLengths, const int *paramFormats, int resultFormat);

Though it would be nice if they actually had some examples in the docs of these. I have too low of an attention span to actually read stuff :).

comment:16 by robe, 15 years ago

Mark,

I changed the logic to use PQescapeStringConn insteaad and increased malloc. I'm using (char *) malloc(2 * strlen(schema) + 1); syntax instead of esc_schema = malloc(2 * strlen(schema) + 1);

I assume both are just different ways of saying the same thing. I stuck with that since the rest of the code here uses the (char *) and I figure the less people need to think about the better or we could change all the code to the shorter syntax.

Also finally got to test this in ArcGIS in 9.3 and it works beautifully.

I tested with

pgsql2shp testdb 'SELECT name, ST_Transform(the_geom,4269) as newgeom FROM neighborhoods' -f neilonlat

and also a pgsql2shp testdb dnd.rems_survey -f survey

One in long lat and the other in MA state plane feet and when I overlaid them in ArcGIS they reprojected to the first layer I laid on so lined up nicely.

It would be nice to include this in 1.3.5 now that I'm thinking about it and that we will have a 1.3.5anyway. Would anyone have issues if I back-ported to 1.3.5, or do we not consider this a minor enhancement?

comment:17 by mcayland, 15 years ago

Regina,

Looks good to me. I really wouldn't worry about the increased malloc() size (3 times instead of 2) since the PostgreSQL documentation clearly states the output size should be calculated as 2 * strlen(str) + 1. But what is there at the moment isn't dangerous.

Also on reflection you are right to use free() rather than PQfree() since of course we are allocating allocating the memory, and not libpq.

One other minor niggle I see is that you use the value 'm' to indicate a mixed SRID table; but in the code you check this using 'if (srtext[0] == 'm')'. My concern is that if there are any proj4text definitions begin with an 'm' then this condition would be incorrectly triggered; I think using strcmp() to check the complete string would probably be safer here, and possibly in the else section below it (i.e. what would happen if a proj4text field accidentally had a leading space?)

I have no problem with you backporting to 1.3 branch, as an upgrade simple upgrades the executable (i.e. nothing in your patch changes any command line options which may cause existing scripts to break).

ATB,

Mark.

comment:18 by robe, 15 years ago

I have this back-ported to 1.3 branch, but guess I should wait till 1.3.5 is released before I commit. I tested and it seems to work okay with my 1.3.5 install, but take a look and see if it looks fine.

comment:19 by mcayland, 15 years ago

Yeah. We're kinda on the hook to get 1.3.5 out within the next few days, and so let's not make the testing any more difficult than it should be. I don't see any reason not to commit this to 1.3 after 1.3.5 and the code gets some more testing…

ATB,

Mark.

comment:20 by mcayland, 15 years ago

Can we mark this as done? Or have people still not done enough testing in 1.3 branch?

ATB,

Mark.

comment:21 by robe, 15 years ago

I'm all for closing this out. Someone can reopen it if they feel the need. I've tested the 1.3.6 under OpenSUSE, Vista, and Windows 2000 and seems to generate the .prj fine. I've only tested the 1.4 under OpenSUSE, but plan to test 1.4 under Windows 2000, XP and Vista before release.

I have opened both OpenSUSE and Windows generated with ArcGIS 9.3 on a Windows 2000 and that seems to overlay them correctly against other data in another projection. Also had someone else do the same test under Windows XP (ArcGIS 9.3). He said it worked.

comment:22 by robe, 15 years ago

Description: modified (diff)
Milestone: 1.4.01.3.6
Note: See TracTickets for help on using tickets.