
My database occasionally ends up with entries that are wrong, but instead of altering the data directly I'd like the ability to keep a revision history of the changes.

These changes occur very rarely.

Ideally something like this: -

 (original table fields) | revision_version | origin | user | timestamp

So say I had a table called posts with the following schema: -

title | description | timestamp | author

An additional table called posts_revisions would be created thusly: -

title | description | timestamp | author | revision_version | origin | user | timestamp
  • origin being the source of the change, be it a bot, user generated or what have you.
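A rough sketch of what I'm imagining, in SQL (column types are guesses; I've renamed the revision's user and timestamp columns to changed_by and revision_timestamp to avoid the reserved word and the duplicate column name):

```sql
-- Existing table (sketch; types are assumptions)
CREATE TABLE posts (
    id          INT PRIMARY KEY,
    title       VARCHAR(255),
    description TEXT,
    timestamp   DATETIME,
    author      VARCHAR(100)
);

-- Proposed revisions table: the original fields plus revision metadata
CREATE TABLE posts_revisions (
    post_id            INT,            -- which post this revision belongs to
    title              VARCHAR(255),
    description        TEXT,
    timestamp          DATETIME,       -- the post's own timestamp at that revision
    author             VARCHAR(100),
    revision_version   INT,            -- incremented per change
    origin             VARCHAR(50),    -- source of the change: bot, user, etc.
    changed_by         VARCHAR(100),   -- the "user" column from above
    revision_timestamp DATETIME        -- when the revision was recorded
);
```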

As you can imagine, this is a rather large change to the existing database; my current concern is the performance hit of checking the _revisions tables for every query. Is this best practice for this sort of thing?

  • Don't be afraid to duplicate origin, user and timestamp in both tables. You might want to delete revisions in a background job. Delete all revisions whose post doesn't exist. In theory you could even lazy-create the revisions with log mining. Bigger transactions and lower amortized cost. Commented Aug 2, 2012 at 13:00
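A minimal sketch of that cleanup job (assumed table and column names):

```sql
-- Background job: remove revisions whose post no longer exists.
DELETE FROM posts_revisions
WHERE NOT EXISTS (SELECT 1 FROM posts p WHERE p.id = posts_revisions.post_id);
```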

2 Answers


For this type of problem, I keep a current table and a history table.

The history table has the following additional columns:

  • HistoryID
  • EffectiveDate
  • EndDate
  • VersionNumber
  • CreatedBy
  • CreatedAt

The effective and end dates are the time span during which the values are valid. The version number is simply incremented every time a record changes. The id, CreatedAt, and CreatedBy are columns I put into almost every table in the database.
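For a hypothetical posts table, the history table might look roughly like this (SQL Server-flavoured sketch; names and types are assumptions):

```sql
CREATE TABLE posts_history (
    HistoryID     INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key for the history row
    PostID        INT NOT NULL,                  -- key of the row in the current table
    Title         VARCHAR(255),
    Description   VARCHAR(MAX),
    Author        VARCHAR(100),
    EffectiveDate DATETIME NOT NULL,             -- when these values became valid
    EndDate       DATETIME NULL,                 -- when they stopped being valid (NULL = still current)
    VersionNumber INT NOT NULL,                  -- incremented on every change to the record
    CreatedBy     VARCHAR(100) NOT NULL,
    CreatedAt     DATETIME NOT NULL
);
```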

Generally, I keep the history table up to date with nightly jobs that compare the tables and then use MERGE to combine the data. An alternative is to wrap all changes in stored procedures and update both tables there. Another alternative is to use triggers that detect when a change occurs. However, I shy away from triggers, preferring the first two alternatives.
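A rough sketch of what the nightly comparison does (T-SQL-flavoured; table and column names are assumptions, and it is written as an UPDATE plus an INSERT rather than a single MERGE to keep it short):

```sql
-- Step 1: close out open history rows whose values no longer match the current table
-- (NULL-to-value changes would need a NULL-safe comparison; omitted for brevity).
UPDATE h
SET    h.EndDate = GETDATE()
FROM   posts_history AS h
JOIN   posts AS p ON p.PostID = h.PostID
WHERE  h.EndDate IS NULL
  AND (h.Title <> p.Title OR h.Description <> p.Description OR h.Author <> p.Author);

-- Step 2: open a new history row for every current row that has no open version.
INSERT INTO posts_history (PostID, Title, Description, Author,
                           EffectiveDate, EndDate, VersionNumber, CreatedBy, CreatedAt)
SELECT p.PostID, p.Title, p.Description, p.Author,
       GETDATE(), NULL,
       COALESCE((SELECT MAX(h.VersionNumber)
                 FROM posts_history AS h
                 WHERE h.PostID = p.PostID), 0) + 1,
       SUSER_SNAME(), GETDATE()
FROM   posts AS p
WHERE  NOT EXISTS (SELECT 1 FROM posts_history AS h
                   WHERE h.PostID = p.PostID AND h.EndDate IS NULL);
```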

I must admit that disk space is not a big consideration for these tables, so there is no problem storing the data twice: once in the current table and once in the history table. It would be only a minor tweak to store nothing but prior versions in the history table and keep the current records solely in the "current" table.

One downside to this approach is changing the structure of the base table. If you want to add a column, you need to add it to the history table as well as the base table.



If the tables are used for summary purposes (especially by business users, if they have some SQL access), I think it is best to remove the old data and place it into another table. While flags and revision numbers are sometimes fine, once every summary has to dig out the latest revision per record, along the lines of select sum(some_value) where revision_version = (select max(revision_version) for that id), it really gets beyond simple.
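For example (sketch, with made-up column names): with revisions kept in the main table, even a simple sum has to pick the latest version of each row, whereas moving old rows out keeps it trivial.

```sql
-- Revisions kept in the main table: every summary has to find the latest version per id.
SELECT SUM(p.some_value)
FROM   posts AS p
WHERE  p.revision_version = (SELECT MAX(p2.revision_version)
                             FROM   posts AS p2
                             WHERE  p2.id = p.id);

-- Old rows moved out to posts_revisions: the main table only holds current data.
SELECT SUM(some_value) FROM posts;
```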

If you have a table that is being used for quick and nasty data collection, replace the data in place and, if needed, move the old data into a revisions table. If only some application will access it and it isn't a performance issue, then keep it in the main table.

