Opened 15 years ago

Closed 15 years ago

#180 closed enhancement (fixed)

History table example implementation

Reported by: pramsey Owned by: pimpaa
Priority: medium Milestone: PostGIS 1.5.0
Component: postgis Version:
Keywords: Cc: pramsey

Description

Work up example triggers that can be used to manage a _history table that includes date_added and date_deleted fields.

Attachments (3)

v02.rar (2.6 KB ) - added by pimpaa 15 years ago.
initial draft of the history functions
history_tables_v02.sql (5.7 KB ) - added by pimpaa 15 years ago.
history_tables_v03.sql (7.4 KB ) - added by pimpaa 15 years ago.
SQL file with all history table implementation functions

Download all attachments as: .zip

Change History (26)

comment:1 by robe, 15 years ago

Paul you sure you don't want a PostGIS 1.5 (hint hint) — it seems almost a shame to put all this stuff off till 2.0

comment:2 by pimpaa, 15 years ago

Owner: changed from pramsey to pimpaa
Status: newassigned

comment:3 by pimpaa, 15 years ago

Hello everyone,

I'm a medium level user and accepted this ticket to contribute with postgis, which is a great tool.

I was studying what is the best way to get this done, so i've tried RULES and TRIGGERS.

I could think of two options: a set of three rules that can be built, one for insert, one for update and one for delete OR one trigger which threats the operation and resolves what do to.

Heres the structure of both tables:

table test( id serial not null, att1 varchar(20), att2 varchar(20), geom geometry, constraint test_pk primary key(id);

table h_test( id serial not null, att1 varchar(20), att2 varchar(20), geom geometry, history_id serial not null, date_added date default current_date, timeofupdate time default now(), date_removed date default null, operation varchar(20), active_user default current_user, current_version integer, constraint h_teste_pk primary key(history_id));

RULE INSERT would insert a new record to this table and populate it with its values. Its quite simple to automatically generate its code, using NEW.*.

RULE DELETE a little bit more complicated, but it can be done. I used an update to apply the current values to date_removed, operation, active_user, and set a arbitrary number to current_version and id_field (-9999).

RULE UPDATE: this is where i ran into trouble. Since we need to update all fields, i need to know them in advanced or find a way to get them in a plpgsql function. This rule would create a new record, with operation = 'UPDATE', containing all new information, but i need to update the old record, change its current_version and set the arbitrary number to id_field of foo table.

I could in this rule, create the new record and set only the current_version and id_field of the old record (in history table). Would this work?

This would give us a solution where you only have one record for deleted features, multiple features to update features, and one for each insert. And since we would have a current_version attribute, would be easy for the user to identify the original atribute that each feature came from.

The trigger approach is much easier to generate code for, but so far i can only insert one record for each operation. I know they can be updated, but i ran into the same problem as before, i need to know the fields for the UPDATE operation to do this.

I don't see any other options. Since this is my first ticket, and im quite a intermediate user, i would like your opinion.

Whats the best way to approach in this?

I already have a function to create automatically the history tables and a history schema.

Thanks for the attention.

in reply to:  3 ; comment:4 by pramsey, 15 years ago

First of all, use a 'timestamp' not separate date and time columns.

Name your derived table as 'foo_history' rather than 'h_foo'. Actually some comment on that would be useful. The 'h_' notation puts all the history tables in one separate section of a sorted listing, while the '_history' notation puts the history next to the table it relates to. Which is more valuable?

Here's how history works:

On insert to table foo, insert to h_foo, with date_add = now() and date_removed = null.

On delete from table foo, update h_foo, set date_removed = now().

On update to table foo, update current record h_foo setting date_removed = now() and insert new record into h_foo, with new attributes and date_add = now() and date_removed = null.

Because your rules don't know the attribute names at run-time, you need to compose the rules during the history-enablement stage.

in reply to:  4 comment:5 by pimpaa, 15 years ago

Hello pramsey. Thank you for all the info.

I tought of storing all history information on a separate schema, named history (or whatever seems apropriate). All my functions are working in history. Using a different schema is better? worse? I think this would be better, since you can completely separate things.

About the naming of tables, rules and triggers: i sure can use a hand to help me define a proper notation for this.

I will follow your advice about the overflow of history. What about the current version attribute? Do you think it's a valuable asset?

On a side note: i'm creating a table with geometry, and updating geometry_columns, so the historic tables work on GIS softwares. Inserting into geometry_columns is a good approach or i should drop column and recreate it inside the historic table?

Thanks!

comment:6 by pramsey, 15 years ago

Current version attribute is not so useful for people, but can be quite useful for software. Keep it around.

Updating geometry_columns yourself is acceptable.

I don't think a separate schema is wise, because for databases that already have schema partitioning, you'll end up putting all the history tables from different schemas into one place, potentially raising name collisions, etc. I think a simple table naming scheme, either your 'h_' one or a '_history' one (btw, I think 'history_' is better than 'h_', we can spare the extra characters) is preferable.

by pimpaa, 15 years ago

Attachment: v02.rar added

initial draft of the history functions

comment:7 by pimpaa, 15 years ago

Hello everyone,

I´ve came up with a solution for this ticket, altought there is room for improvent.

How this works?

Execute the SQL files in any PostGIS database.

It will create the table historic_information on the public schema (need direction with this).

execute the following sp to create a historic table of any type of geometry: select create_history_table(schema,table_to_log,primary_field,geometry_field,geometry_type,ndims,SRID);

this will create the table foo_history on the same schema as the original table. i will create all the rules necessary to log the updates, inserts and deletes to the original table.

in the history table we have a current version field, which will log which is the ID of the current version of that feature.

I did not tested it out extensively.

Thanks for all the help, and please give the usual comment and critics.

George

comment:8 by pramsey, 15 years ago

Cc: pramsey added

by pimpaa, 15 years ago

Attachment: history_tables_v02.sql added

comment:9 by pramsey, 15 years ago

The point is not to track just changes in geometry, but history globally on teh table. the _history table should include every attribute in the original table, plus the history tracking attributes. You'll probably need to build it either using table inheritance (not sure if that would work) or by querying the system tables to get the information about what columns exist in the table.

comment:10 by robe, 15 years ago

Just my two cents and should probably shut up on this. I woudl create a differnt schema for history tables, kind of like how PostgreSQL has a different schema for toast tables so as not to confuse people and otherwise clutter the schema.

So if the user schema a table is in is say public, call the history schema history_public or soem such thing.

I guess thinking out loud for my scenario — I have say 150 tables in one of my schemas — most I would like to use this history feature. But I won't, cause I don't want to wade thru 300 tables :)

history is a nice thing when you need to look at it interrorogate it, but otherwise you want it to be abstracted away from your common workflow and not to confuse unsuspecting users.

comment:11 by robe, 15 years ago

thinking further out loud, might be better to just have one history schema (like there is only one pgtoast schema) and prefix the table name with the original schema — since creating more schemas may be equally confusing.

somthing like schema —> postgis_history

and tables would be postgis_history.public_mytable_history

the other side benefit of having a single history schema is I can exclude it from backup since history can get very big very faast. I may only want to backup the history once a week or month excluding from my regular backup and that's trivial if its all isolated in a single schema.

Though I guess you'd have to deal with people moving their tables to different schemas and renaming them. Then again maybe I should just shut up.

comment:12 by pimpaa, 15 years ago

Well, to change everyting to a dif schema inst hard at all.

This is what i need, concerning user input. My first tought was to have a diferent schema, 'history'.

This is a discussion that i would like to initiate. I'm here to help :D.

history tables will get big pretty fast, since each update they create a new record. a new fucntion that i'm planning (and i need help, with sugestions) is to create indexes to the history_table. It´s little improvemente, but its a nice improvement. It´s not possible to recreate the indexes that are used on the original table because it will implicate in a primary key index in the "id" field.

The "like" command provides this functionlaity, but i dont think its all that useful. It will cause more trouble then good.

to pramsey: the spatial constraints are perhaps not needed, since they ARE veryfied in the parent table. i'll look into that.

anyhow! i would like suggestions to improve the schema, tables, and everything.

specifying a schema is useful, and not specifying is useful too, so we need to ponderate the benefits on over another.

as i said, im open to suggestions, and theres is tons of improvements over this code. im just a begginer.

thanks to all that had the patiete to go trought it and evaluate it.

george

comment:13 by pimpaa, 15 years ago

Oh, and since there is room for improvement here, a separate schema is a good choice. We going implement other features, such as Check_in, Check_out, based on that schema. I know how these works, but no idea of how to implement it.

So, there's room for improvement. We should think of this as the first step to provide "versioned" geodatabases to users.

2 cents :P

by pimpaa, 15 years ago

Attachment: history_tables_v03.sql added

SQL file with all history table implementation functions

comment:16 by pimpaa, 15 years ago

Hello everyone again.

I´ve tweaked the functions a little bit, and in this present release you don´t need as many parameters as you would in v02.

Now build_history_table() and then create_history_table(schema,table,geometry_field).

I´m requiring the geometry field as a parameter because it is possible to have multiple geometries in the same table. I realize now, that even with multiple geometries the historic will work just fine, but it will not update the geometry_columns for n geometries.

Another constraint is that the function get's the primary key field name from the "parent table", but in the present only works with single field primary keys. If your primary key has more than one field the tracking used in current function will not work also.

These functions are already creating some indexes on the historic table (check last function).

The names of functions are not definitive.

I´m open to suggestions. Now i will try to copy the indexes of the parent table in a easy way. Please let me know what you guys think of this.

George

comment:17 by pimpaa, 15 years ago

Hello everyone,

I'm still waiting for some advice to continue with this. Can anyone suggest or critisize?

George

comment:18 by robe, 15 years ago

George, Sorry about that. I guess with preparing for release of 1.4 and vacations, everyone has been tied up.

I did take a quick look at the code, and will have to test drive it a bit before I can make an informed opinion. My first thought — as much as I love PostgreSQL rules, I have some misgivings about using it for history management.

One of them which is somewhat minor — is that count stats don't return right which is kind of confusing when inserting data if you rely on that. Then again triggers have there own issues like being called for each row as opposed to an SQL rewrite rule.

I guess my misgivings is more not so much misgiving as I haven't weighed the pros and cons of using a rewrite rule vs. a trigger in this context. A trigger would give you more control in running more complex logic etc, but you may not need that level of control anyway.

comment:19 by robe, 15 years ago

George,

One more thing. Probably because I haven't read thru it enough - but I see you have rules to push the data and updates to the history table, but how does data actually get in the main table? I think that was really what was bothering me about the rule approach. Because the tables don't seem to be inherited in the table creation, yet all the inserts/updates are being pushed to history. Unless of course I read it wrong.

Thanks, Regina

comment:20 by pimpaa, 15 years ago

Hello Regina,

The main table is not affected. I tested it quickly when it was first written, and everything was fine. The rules add other actions to INSERTS/DELETES/UPDATES. Heres a examples of a generic insert rule (rewritten, not using the actual function, but the desired output - check line 115).

v_sql:= CREATE OR REPLACE RULE foo_history_insert as ON INSERT TO public.foo
         DO ( 
         INSERT INTO history.foo_history VALUES(
         DEFAULT, --history_id nextval() 
         DEFAULT, --date_added now() 
         NULL,    --date_deleted  
         quote_literal('INSERT'), --operation 
         DEFAULT, 
         NEW.foo_id, 
         NEW.*)
         ); 

All the tables created are inherited from the main table (check line 66 in code).

If the add-in is not showing off desired behavior, please let me know :D, and thanks for your concern.

The debate over triggers x rules are still a question, and i'm up for taking any of those. Just let me know.

Also, this was written in 8.3, and i did not checked 8.4 for 8.4 funcionalities that could help/enhance the implementation.

Thanks regina!

comment:21 by pimpaa, 15 years ago

Checking the documentation, DO [ALSO] is the default behavior of rules events.

Consider all rules to be doing "DO ALSO". I might make this explicit in the code soon enough. Sorry for the disturbance. :D

George

comment:22 by robe, 15 years ago

George, I apologize. I think that was just me scanning the code too quickly. I kept on seeing a DO INSTEAD in my head instead of a DO. Sorry for that. I'll test in 8.4 probably sometime next week.

comment:23 by pramsey, 15 years ago

Resolution: fixed
Status: assignedclosed

I've added a slightly modified version into extras/history_table at r4851.

Note: See TracTickets for help on using tickets.