Opened 11 years ago

Closed 3 years ago

#343 closed enhancement (invalid)

History Table Enhancements

Reported by: pimpaa Owned by: pimpaa
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: master
Keywords: History Tables Cc:

Description

I'm here to propose some enhancements I had in my mind for my History Tables implementation.

I have some interesting ideas for complementary functions inside this "module".

a) a function that allows the user to deactivates the logging of a certain table. All this function does is to delete the specified table rules and clean-up geometry_columns and history_information tables with relevant with information.

a.1) a function that allows the user to activate the logging in history tables again. the function will be the same as the used to initially start the log. the only difference is that now uses a simple exists() test to determine if table already exists, and if so, recreate the rules and all the necessary rows in history_information and geometry_columns.

these two functions are already ready and in the following patch.

b) a function to populate a new history table with all records in the table currently being logged. this should not be hard, but can be cumbersome for large datasets. I appreciate any thoughts on this matter.

Any other requests for patches or enhancements, please, post it here and i'll try my best.

As always, I appreciate any comments or critics.

Attachments (2)

history_table_v4.sql (8.8 KB) - added by pimpaa 11 years ago.
implementation of additional functions to history_tables module
history_table.sql (16.9 KB) - added by pimpaa 9 years ago.
History Table 0.5

Download all attachments as: .zip

Change History (25)

Changed 11 years ago by pimpaa

Attachment: history_table_v4.sql added

implementation of additional functions to history_tables module

comment:1 Changed 11 years ago by mcayland

I think this is an interesting application, however I think that what you're tackling here is a more generic database versioning problem rather than a spatial-specific problem. If you search the PostgreSQL mailing list archives for "time travel", you'll find lots of information on this, plus several related projects on pgfoundry.org which you may find have already solved this particular problem.

HTH,

Mark.

comment:2 in reply to:  1 Changed 11 years ago by pimpaa

Thanks for your opinion mcayland.

But there is some code which already is in trunk. It´s lightweight and can deal with spatial information.

This is not a versioning scenario, just a historic of changes.

Have you tried the module?

Do you have any suggestions to enhance the module to work better with specifics of spatial information?

Thanks for your opinion :)

Replying to mcayland:

I think this is an interesting application, however I think that what you're tackling here is a more generic database versioning problem rather than a spatial-specific problem. If you search the PostgreSQL mailing list archives for "time travel", you'll find lots of information on this, plus several related projects on pgfoundry.org which you may find have already solved this particular problem.

HTH,

Mark.

comment:3 Changed 11 years ago by pramsey

Milestone: PostGIS 1.5.0PostGIS 2.0.0

Rolling this forward.

comment:4 Changed 11 years ago by pramsey

Version: 1.4trunk

comment:5 Changed 11 years ago by pimpaa

Type: patchdefect

If the original table any columns with names equal any of the specified on history template, the function will break.

NOTICE: CREATE TABLE will create implicit sequence "foo2_history_history_id_seq" for serial column "foo2_history.history_id" CONTEXT: SQL statement "CREATE TABLE public.foo2_history(history_id serial not null,date_added timestamp not null default now(),date_deleted timestamp default null,last_operation varchar(30) not null,active_user varchar(90) not null default CURRENT_USER,current_version text not null,like public.foo2,CONSTRAINT foo2_history_pk primary key(history_id));" PL/pgSQL function "postgis_enable_history" line 69 at EXECUTE statement ERROR: column "date_added" specified more than once CONTEXT: SQL statement "CREATE TABLE public.foo2_history(history_id serial not null,date_added timestamp not null default now(),date_deleted timestamp default null,last_operation varchar(30) not null,active_user varchar(90) not null default CURRENT_USER,current_version text not null,like public.foo2,CONSTRAINT foo2_history_pk primary key(history_id));" PL/pgSQL function "postgis_enable_history" line 69 at EXECUTE statement

Error

ERROR: column "date_added" specified more than once SQL state: 42701 Context: SQL statement "CREATE TABLE public.foo2_history(history_id serial not null,date_added timestamp not null default now(),date_deleted timestamp default null,last_operation varchar(30) not null,active_user varchar(90) not null default CURRENT_USER,current_version text not null,like public.foo2,CONSTRAINT foo2_history_pk primary key(history_id));" PL/pgSQL function "postgis_enable_history" line 69 at EXECUTE statement

comment:6 Changed 10 years ago by robe

George,

Sorry its taken me until now to look at this. I hope you haven't lost interest.

I've put what is currently in trunk in the documentation, but I found some issues. Some of which I think you may have already fixed in your latest.

Can you attach your latest version to this ticket and I'll cut in and retest.

http://www.postgis.org/documentation/manual-svn/Extras.html#History_Table

One issue I came across is that the history table didn't seem to log my update correctly. It had put in an INSERT event when I did a ST_SetPoint of a geometry.

Some other things I would like changed.

1) the postgis_install_history function shouldn't return void. It should behave like populate_geometry_columns etc. returning a description of what was created and added. http://www.postgis.org/documentation/manual-svn/Postgis_Install_History.html

2) The postgis_enable_history function. http://www.postgis.org/documentation/manual-svn/Postgis_Enable_History.html I kind of like that it returns a boolean so I'm a bit torn. I think though to be consistent with the rest of the code base, it should return a text description too of the rules that were added, the history table that was created and so forth.

As far as Mark's comments go about reinventing the wheel. I haven't really had a look at pgfoundry and I like the idea of having the generic versioning issue restated with spatial in mind. I don't think the other solutions do that and its a bit of a leap to bring the two together and a common request in PostGIS newsgroup.

I do know that nowadays rules are kind of frowned upon, but I think for bulk loading or massive editing, they are still better than triggers , but I'll have to do some more serious tests.

Also I'm not too concerned about this not being terribly polished. I'm thinking of the extras section of PostGIS (and the manual) as both drop-in small utilities you can use to solve common GIS problems with PostGIS as well as a seed to build more robust solutions to common problems. Once people understand the problem being solved and how you have solved it then only then can people make it better. What better way to voice that purpose than in the official docs.

comment:7 Changed 10 years ago by robe

Just to add to this. Sorry for my slip of terms. I think when people think of spatial versioning they think more along the lines of ESRI ArcSDE versioning and about this QGIS plugin: http://www.kappasys.org/cms/index.php?id=23&L=5#toc12

Which is great but overkill for many use cases. What George has is like he says more of keeping a history that doesn't require change to the application front-end or table structure (though it does introduce side line rules).

Then there is GeoServer?'s implementation of versioning which again looks like its a full blown versioning system though I haven't inspected it closely to see. http://geoserver.org/display/GEOS/Versioning+WFS

I think these are good to document (since people mistake the 2 concepts) to let people know they are there (I'll probably add links to them from the history_table documentation) just so people are clear what the history_table module is and what it is not and so they can make the best decision for their specific use case.

Admittedly I've created systems that manage transaction history and they've looked closer to George's implementation because I only cared about pointing fingers when something went wrong and possibly manually rolling back a change. The other kind of things were overkill for my specific needs and I didn't want to muddy the front-end interface (or didn't have control of it). Both the GeoServer? and QGIS versioning systems while very useful looking for what they do - at a cursory glance seem to require control of the front-end. George's works more or less transparently with any system (well in theory should - more testing and robustness handling will be needed to ensure that).

comment:8 Changed 9 years ago by robe

Milestone: PostGIS 2.0.0PostGIS Future

comment:9 Changed 9 years ago by robe

Milestone: PostGIS FuturePostGIS 2.0.0

comment:10 Changed 9 years ago by pimpaa

Hello everyone,

As I said to Regina, I've must missed the email from trac and stoped watching this ticket. Sorry for the delay :P.

I'll refactor the code and post a patch, addressing the issues Regina mentioned. Also, I will explore other functions such as SetPoint? and try to fix that bug.

The signatures for the functions postgis_install_history and enable_history will change, so they return descriptions of what changes took place.

I'm going to rename the historic table columns, to something more esoteric, so they won't collide with actual column names.

Can you suggest some sort of automated testing I can do? My idea here is to create a few tables, make some fixed changes and try to select them in the historic table. Other than that, any tips on testing?

As usual, I'm open to suggestions.

George

comment:11 Changed 9 years ago by pimpaa

Regina, I think that the update statements were logging the next update as a insert. This was fixed, but I'm refactoring the other functions as well. I'll commit everything when it's done.

comment:12 Changed 9 years ago by robe

sounds great. Looking forward to it.

comment:13 Changed 9 years ago by pimpaa

Heres is the new version for the history table implementation.

Changes:

  • Altered the return of the functions to be a varchar, to comply with other similar functions. My idea here was to do this at the lowest possible level and join all information on the top level functions, but I'm quite busy these days, so I hardcoded some of the return information. There's room for improvement here, as I said, collecting more detailed information and outputting to the user.
  • Fixed the bug reported by Regina. UPDATEs are logged as UPDATED, deletes as DELETED and INSERTS as INSERT (I'm thinking INSERTED here would fit better). I'm open to suggestions.
  • The current_version number in history table should point at the current history version which those old records point to. This is some sort of bug. Now they are pointing to the current primary key on the actual table, so we get a lot of duplicated information. The PK is already listed, so it's not necessary.
  • I've created one single test. The plan is to split it up into more tests. Advice would be great.
  • The functions now are:

postgis_create_history(schema,table,pk field) - this will create the history table for the first time and start logging;

postgis_enable_history(schema,table) - enables history logging for a certain table if this table is already listed on historic_information (ie. postgis_create_history already ran at some point). this will recreate the logging rules.

postgis_disable_history(schema,table) - disables history logging for a table that is listed in historic_information. this function only updates the historic_information table and drops the logging rules.

postgis_drop_history(schema,table) - this will purge all history information for a certain table. this will drop historic table and delete it's records from geometry_columns and historic_information.

I would like to hear your comments and critics.

Thanks

Changed 9 years ago by pimpaa

Attachment: history_table.sql added

History Table 0.5

comment:14 Changed 9 years ago by robe

George,

Sorry for delay on this. I took another look and it still needs work:

1)On your line 61 -- "THE HISTORIC INFORMATION TABLE WAS DROPPED. FUNCTIONS MARKED FOR DELETION.". Needs to be single quoted instead of double-quoted. didn't install without changing to '.

2) Your postgis_enable_history assumes that historic_information etc. will be installed in public. I dont' think we should make this assumption. I would just get rid of the public or if you want this to be always installed in a specific schema, then create one and use that instead of public (similar to wha tis done with topology).

3) Most importantly, this code does not work with PostGIS 2.0 because it tries to write to geometry_columns which is now a view. Best thing to do is change that logic to use

populate_geometry_columns(..)

http://www.postgis.org/documentation/manual-svn/Populate_Geometry_Columns.html

This will do the right thing for both PostGIS 1.4-1.5 and PostGIS 2.0

I see some other issues such as missing use of quote_ident, quote_literal etc. but wasn't able to test those issues because of the aforementioned.

comment:15 Changed 9 years ago by pimpaa

Hello Regina,

It's not a problem. I'm not sure if I mixed versions, but here worked just fine. I'll run some tests and I'll look for the issues you've mentioned, specially 2 and 3;

There is a test script included. Did you see it? It setups a dry run and tries to test all functions.

Thanks for the support.

comment:16 Changed 9 years ago by robe

Ah now I see the test script. Probably better to put that in a separate file since its more like a regression test.

comment:17 Changed 9 years ago by pimpaa

Hello guys,

I've discovered a huge problem with the rule system that it may invalidate this current implementation in favor of triggers.

Here is the outline:

imagine this:

CREATE TABLE test( id serial not null primary key);

SELECT * FROM ADDGEOMETRYCOLUMN('public','test','the_geom',-1,'POINT'2);

SELECT * FROM POSTGIS_CREATE_HISTORY('public','test','the_geom');

insert into test values (1,ST_GEOMFROMTEXT('POINT(1 1)'));

Our rule system works out beautiful in this scenario, because we are passing a known value to our id. The rules will write everything correctly to the database.

but imagine this insert

insert into test values(DEFAULT,ST_GEOMFROMTEXT('POINT(1 1)'));

In this case the value passed to the rules is nextval, therefore, leaping an id, breaking the system (the logged id for this record is 2, instead of 1);

I think we should move on to triggers, since as I know it, there is no way to workaround this rule behavior.

This is not a critical module, but I'm asking for your opinion guys and gals.

I prefer to not have this functionality for one month or two while I rewrite this module with triggers than showcase something that is half-broken.

Thoughts?

comment:18 Changed 9 years ago by robe

George,

Yah I think triggers are better since rules are hard to predict when they become complex. Also 9.1 now supports triggers in views so that's going to be the preferred way to go moving forward.

You want me to take the documentation of history tables off for now until you reimplement?

comment:19 Changed 9 years ago by pimpaa

Hi Regina,

I think that might be better. This is huge disapointment for since it's the first contribuition that i've made, but no problem. Better to take it off now and have less complaints.

I'll try to rewrite it in less than a month. Thanks for all the support folks.

comment:20 Changed 9 years ago by strk

Milestone: PostGIS 2.0.0PostGIS 2.1.0
Type: defectenhancement

Enhancements will have to wait till next round

comment:21 Changed 8 years ago by robe

Milestone: PostGIS 2.1.0PostGIS Future

comment:22 Changed 3 years ago by robe

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:23 Changed 3 years ago by pramsey

Resolution: invalid
Status: newclosed
Note: See TracTickets for help on using tickets.