Database Design For Revisions

Database Design for Revisions?

  1. Do not put it all in one table with an IsCurrent discriminator attribute. This just causes problems down the line, requires surrogate keys and all sorts of other problems.
  2. Design 2 does have problems with schema changes. If you change the Employees table you have to change the EmployeeHistories table and all the related sprocs that go with it. Potentially doubles you schema change effort.
  3. Design 1 works well and if done properly does not cost much in terms of a performance hit. You could use an xml schema and even indexes to get over possible performance problems. Your comment about parsing the xml is valid but you could easily create a view using xquery - which you can include in queries and join to. Something like this...
CREATE VIEW EmployeeHistory
AS
, FirstName, , DepartmentId

SELECT EmployeeId, RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)') AS FirstName,

RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)') AS LastName,

RevisionXML.value('(/employee/DepartmentId)[1]', 'integer') AS DepartmentId,

FROM EmployeeHistories

Database Design with Change History

One way is to have a MyTableNameHistory for every table in your database, and make its schema identical to the schema of table MyTableName, except that the Primary Key of the History table has one additional column named effectiveUtc as a DateTime. For example, if you have a table named Employee,

Create Table Employee
{
employeeId integer Primary Key Not Null,
firstName varChar(20) null,
lastName varChar(30) Not null,
HireDate smallDateTime null,
DepartmentId integer null
}

Then the History table would be

Create Table EmployeeHistory
{
employeeId integer Not Null,
effectiveUtc DateTime Not Null,
firstName varChar(20) null,
lastName varChar(30) Not null,
HireDate smallDateTime null,
DepartmentId integer null,
Primary Key (employeeId , effectiveUtc)
}

Then, you can put a trigger on Employee table, so that every time you insert, update, or delete anything in the Employee table, a new record is inserted into the EmployeeHistory table with the exact same values for all the regular fields, and current UTC datetime in the effectiveUtc column.

Then to find the values at any point in the past, you just select the record from the history table whose effectiveUtc value is the highest value prior to the asOf datetime you want the value as of.

 Select * from EmployeeHistory h
Where EmployeeId = @EmployeeId
And effectiveUtc =
(Select Max(effectiveUtc)
From EmployeeHistory
Where EmployeeId = h.EmployeeId
And effcetiveUtc < @AsOfUtcDate)

Database design for create/revise/delete request form

It sounds to me like a revision can change anything about a course, so if you had separate CreateRequest and ReviseRequest tables, pretty much every column would have to be duplicated. I don't see any good reason to do that.

Deletion might arguably be more debatable as most of the data is irrelevant. Still, I don't see anything gained by making a separate table for it. So what if irrelevant fields are null? So it has a bunch of nulls.

Actually I wouldn't create separate columns for "revision reason" and "delete reason". I'd just make one column and call it "request reason" or some such.

You should have some field that tells you which type of request it is. (Or is that what FormTypeID is?)

How to version control a record in a database

Let's say you have a FOO table that admins and users can update. Most of the time you can write queries against the FOO table. Happy days.

Then, I would create a FOO_HISTORY table. This has all the columns of the FOO table. The primary key is the same as FOO plus a RevisionNumber column. There is a foreign key from FOO_HISTORY to FOO. You might also add columns related to the revision such as the UserId and RevisionDate. Populate the RevisionNumbers in an ever-increasing fashion across all the *_HISTORY tables (i.e. from an Oracle sequence or equivalent). Do not rely on there only being one change in a second (i.e. do not put RevisionDate into the primary key).

Now, every time you update FOO, just before you do the update you insert the old values into FOO_HISTORY. You do this at some fundamental level in your design so that programmers can't accidentally miss this step.

If you want to delete a row from FOO you have some choices. Either cascade and delete all the history, or perform a logical delete by flagging FOO as deleted.

This solution is good when you are largely interested in the current values and only occasionally in the history. If you always need the history then you can put effective start and end dates and keep all the records in FOO itself. Every query then needs to check those dates.

Database Design: how to track history?

I prefer to have additional historical table for each versioned table. Same structure as main table with time_from and time_to additional fields.
Transparently filled with triggers. time_to of latest revision set to far far future.

State for specified moment can be retrieved with query like this:

SELECT * FROM user_history 
WHERE time_from >= '2012-02-01' AND time_to <= '2012-02-01'

As for me, storing history within main table is not generally a good idea, as it requires complicated conditions when retrieving or joining current data.

database design

Option 1:

Create a Forms table, that is the central source of Form IDs. When inserting into Tasks/Briefs/etc (I assume there may be many more), you first insert into the Forms table, then use that ID as a PK/FK in Tasks or Briefs.

Then you have FormID in Revisions just reference the FormID in Forms.

You'd probably want to record the form type in the Forms table, and possibly enforce that if you've inserted say "1,Task" into Forms, that the only table you can insert 1 into is the Tasks table. There are ways to do that.

You might also want to hide some of these facts behind views/triggers (so you just insert into Tasks, and behind the scenes it does the insert into Forms, etc).

Option 2:

Create one column per table in the Revisions table. Add a constraint such that exactly one of these columns is not null. This does allow you to use more specific foreign key constraints (unlike option 1, where just because there's a Form ("1,Task") in Forms, you couldn't guarantee there would be a row in the Tasks table). But if you're adding new form types, you have to add more columns to Revisions.

There are some other options, but these are the two that spring to mind.

Database design: Keeping track of tag changes between revisions

Do not overcomplicate your schema and do not optimize prematurely. Pair-wise comparison of revision tags is a very cheap operation (provided you have all data in memory) and it can be done while prepararing view model.



Related Topics



Leave a reply



Submit