How to Best Handle the Storage of Historical Data

How to best handle the storage of historical data?

If the requirement is solely for reporting, consider building a separate data warehouse. This lets you use data structures such as slowly changing dimensions that are much better suited to historical reporting but don't work well in a transactional system. The resulting combination also moves the historical reporting off your production database, which is a performance and maintenance win.

If you need this history to be available within the application, then you should implement some sort of versioning or logical-deletion feature, or make everything fully contra and restate (i.e. transactions never get deleted, just reversed out and restated). Think very carefully about whether you really need this, as it will add a lot of complexity. Making a transactional application that can reconstruct historical state correctly is considerably harder than it looks. Financial software (e.g. insurance underwriting systems) gets this wrong far more often than you might think.
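
As a minimal sketch of the contra-and-restate approach (the ledger_entries table, its columns and the amounts are purely illustrative, not from any particular system):

-- hypothetical ledger_entries table: rows are never updated or deleted,
-- corrections are new rows that reverse and restate the original
create table ledger_entries (
    entry_id    int identity primary key,
    reverses_id int null references ledger_entries (entry_id), -- set on contra rows
    account_id  int not null,
    amount      decimal(18, 2) not null,
    entered_at  datetime not null default getdate()
)

-- original posting (say it gets entry_id = 1): 100.00 against account 42
insert into ledger_entries (account_id, amount) values (42, 100.00)

-- correction: reverse the original in full, then restate the correct amount
insert into ledger_entries (reverses_id, account_id, amount) values (1, 42, -100.00)
insert into ledger_entries (account_id, amount) values (42, 110.00)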

If you need the history solely for audit logging, make shadow tables and audit logging triggers. This is much simpler and more robust than trying to correctly and comprehensively implement audit logging within the application. The triggers will also pick up changes to the database from sources outside the application.

How to Store Historical Data

Supporting historical data directly within an operational system will make your application much more complex than it would otherwise be. Generally, I would not recommend doing it unless you have a hard requirement to manipulate historical versions of a record within the system.

If you look closely, most requirements for historical data fall into one of two categories:

  • Audit logging: This is better off done with audit tables. It's fairly easy to write a tool that generates scripts to create audit log tables and triggers by reading metadata from the system data dictionary. This type of tool can be used to retrofit audit logging onto most systems. You can also use this subsystem for change data capture if you want to implement a data warehouse (see below).

  • Historical reporting: Reporting on historical state, 'as-at' positions or analytical reporting over time. It may be possible to fulfil simple historical reporting requirements by querying audit logging tables of the sort described above. If you have more complex requirements then it may be more economical to implement a data mart for the reporting than to try to integrate history directly into the operational system.

    Slowly changing dimensions are by far the simplest mechanism for tracking and querying historical state, and much of the history tracking can be automated. Generic handlers aren't that hard to write (see the sketch below). Generally, historical reporting does not have to use up-to-the-minute data, so a batched refresh mechanism is normally fine. This keeps your core and reporting system architecture relatively simple.
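
A minimal Type 2 dimension might look like the following sketch (the dim_customer table and its columns are illustrative, not prescriptive):

-- hypothetical Type 2 customer dimension: each change closes the current row
-- and inserts a new one, so every historical version remains queryable
create table dim_customer (
    customer_key int identity primary key,  -- surrogate key
    customer_id  int not null,              -- business key from the source system
    address      varchar(200),
    city         varchar(100),
    valid_from   datetime not null,
    valid_to     datetime null,             -- null = current version
    is_current   bit not null default 1
)

-- "as-at" lookup: the customer's address as it stood on the report date
select address, city
from dim_customer
where customer_id = 42
  and valid_from <= '2019-06-30'
  and (valid_to is null or valid_to > '2019-06-30')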

If your requirements fall into one of these two categories, you are probably better off not storing historical data in your operational system. Separating the historical functionality into another subsystem will probably be less effort overall and produce transactional and audit/reporting databases that work much better for their intended purpose.

What is the best way to store historical data in SQL Server 2005/2008?

It depends on the application's usage patterns. If usage patterns indicate that the historical data will be queried more often than the current values, then put them all in one table. But if historical queries are the exception (or less than 10% of the queries), and the performance of the more common current-value query will suffer from putting all the data in one table, then it makes sense to separate that data into its own table, as sketched below.
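
As a rough sketch of that split (all table and column names are hypothetical), the frequently hit current-value table stays small, and a UNION ALL view covers the occasional historical query:

-- hypothetical split: keep the hot "current value" table small, push old rows
-- to a history table, and cover the rare historical query with a view
create table item_price_current (
    item_id    int primary key,
    price      decimal(18, 2) not null,
    updated_at datetime not null
)

create table item_price_history (
    item_id    int not null,
    price      decimal(18, 2) not null,
    updated_at datetime not null,
    primary key (item_id, updated_at)
)
go

-- the occasional historical query can still see everything in one place
create view item_price_all as
select item_id, price, updated_at from item_price_current
union all
select item_id, price, updated_at from item_price_history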

Database structure for storing historical data

When I've encountered such problems, one alternative is to make the order the history table. It functions the same, but it's a little easier to follow:

orders
------
orderID
customerID
address
city
state
zip

customers
---------
customerID
address
city
state
zip

EDIT: if the number of columns gets too high for your liking, you can separate it out however you like.

If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changes his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.
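
A rough bitemporal sketch, with hypothetical column names: one pair of columns records when the address was true in the real world, the other records when the system knew about it, so a correction never erases what was believed at the time:

-- hypothetical bitemporal history: "valid" columns say when the address applied
-- in the real world, "recorded" columns say when the system believed it
create table customer_address_history (
    customer_id   int not null,
    address       varchar(200) not null,
    valid_from    date not null,      -- when the address took effect
    valid_to      date null,          -- when it stopped applying
    recorded_from datetime not null,  -- when this row was entered
    recorded_to   datetime null       -- when this row was superseded or corrected
)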

Also, if you are using MS SQL Server, you might want to consider using indexed views. That will allow you to trade a small incremental insert/update performance decrease for a large select performance increase. If you're not using MS SQL Server, you can replicate this using triggers and tables.
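
For example, an indexed view over the orders and customers tables above might look like the sketch below (this assumes the tables were created with two-part names so SCHEMABINDING works; it is a sketch, not a drop-in script):

-- sketch of an indexed view over the tables above (two-part names and
-- schemabinding are required; inner joins only)
create view dbo.v_orders_with_customer
with schemabinding
as
select o.orderID, o.customerID, o.address as order_address, c.address as current_address
from dbo.orders as o
join dbo.customers as c on c.customerID = o.customerID
go

-- the unique clustered index is what materializes the view
create unique clustered index IX_v_orders_with_customer
    on dbo.v_orders_with_customer (orderID)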

Keeping historical data in the database

In general you should keep all the data in the same table. SQL Server is great at handling large volumes of data and it will make your life a lot easier (reporting, querying, coding, maintenance) if it's all in one place. If your data is indexed appropriately then you'll be just fine with thousands of new records per month.

How to choose the right database for historic data storage for small company

Modern relational databases (MySQL, Postgres) support "JSON" columns, so if your data does not have a known fixed schema, they are a viable option too. Similarly, modern NoSQL databases such as MongoDB have added traditional SQL features such as transactions. So the distinction blurs.
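
As a small illustration (PostgreSQL syntax; the readings table and its fields are made up for the example), a JSON column can sit alongside ordinary columns:

-- PostgreSQL sketch: a jsonb column holds the parts of each record that have no
-- fixed schema, next to ordinary columns that can be indexed normally
create table readings (
    id          bigserial primary key,
    recorded_at timestamptz not null,
    payload     jsonb not null
);

-- pull a field out of the json document
select recorded_at, payload->>'temperature' as temperature
from readings
where payload->>'sensor' = 'A1';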

To determine what database fits your needs, you need to think about how the data is accessed:

  • Do you need efficient updating of records (and if so, are transactions needed), or do you just want to add new ones?

  • Do you need to fetch specific records by some key, or process lots of records to summarize data (the latter is called "analytical processing")?

  • Do you expect to have multiple tables with queries joining data between them? (it sounds like you currently don't need this, but it pays to think about the future when it comes to databases)

If updating is not needed and you need to aggregate many records, you can use something like AWS Athena / Presto / Drill to query plain files stored on a local server or on something like AWS S3.

Cassandra and HBase are specialized databases that are highly scalable and sacrifice some functionality for that scalability. Seems inappropriate for such a small database.

MongoDB is easy to manage and horizontally scalable but has some limitations, given its NoSQL heritage.

MySQL/Postgres are both easy to manage and will easily handle 10s of GBs. Postgres is somewhat more sophisticated and capable when it comes to analytical processing. MySQL is easier to manage and very performant when it comes to "transaction processing" -- that is, updating and querying specific records (when you have an index quickly leading you to the wanted records).

Best practices with historical data in MySQL database

It's a common mistake to worry about "large" tables and performance. If you can use indexes to access your data, it doesn't really matter whether you have 1,000 or 1,000,000 records - at least not enough that you'd be able to measure it. The design you mention is commonly used; it's a great design where time is a key part of the business logic.

For instance, if you want to know what the price of an item was at the point when the client placed the order, being able to search product records where valid_from < order_date and valid_until is either null or > order_date is by far the easiest solution.
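
That search translates into a query along these lines (the product_prices and orders table and column names here are hypothetical, mirroring the condition above):

-- hypothetical product_prices / orders tables: price in effect when the order was placed
select p.price
from orders as o
join product_prices as p on p.product_id = o.product_id
where o.order_id = 1001
  and p.valid_from < o.order_date
  and (p.valid_until is null or p.valid_until > o.order_date)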

This isn't always the case - if you're keeping the data around just for archive purposes, it may make more sense to create archive tables. However, you have to be sure that time is really not part of the business logic, otherwise the pain of searching multiple tables will be significant - imagine having to search either the product table OR the product_archive table every time you want to find out about the price of a product at the point the order was placed.

What is the best way to store a historical price list in a MySQL table?

Separate the need for historical data from the need for current price. This means:

1) Keep the current price in the products table.

2) When the price changes, insert the new price into the history table with only the start date. You don't really need the end date because you can get it from the next row (a query that does this is sketched below). You can still store it; it makes querying easier.

Also remember that your order history provides another kind of history, the actual purchases at a given price over time.
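
If you store only the start date, the end date can be derived from the next row with a window function (MySQL 8+ or SQL Server); the price_history table and the dates here are hypothetical:

-- hypothetical price_history(product_id, price, start_date): derive each row's
-- end date from the next row, then pick the price in effect on a given date
select product_id, price
from (
    select product_id,
           price,
           start_date,
           lead(start_date) over (partition by product_id order by start_date) as end_date
    from price_history
) as priced
where product_id = 42
  and start_date <= '2020-03-15'
  and (end_date is null or end_date > '2020-03-15')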

How to store historical records in a history table in SQL Server

Logging changes is something I've generally done using triggers on a base table to record changes in a log table. The log table has additional columns to record the database user, action and date/time.

-- identifiers containing hyphens must be delimited with square brackets
create trigger [Table-A_LogDelete] on dbo.[Table-A]
for delete
as
-- capture one timestamp so all rows logged by this statement share it
declare @Now as DateTime = GetDate()
set nocount on
insert into [Table-A-History]
select SUser_SName(), 'delete-deleted', @Now, *
from deleted
go
exec sp_settriggerorder @triggername = '[Table-A_LogDelete]', @order = 'last', @stmttype = 'delete'
go
create trigger [Table-A_LogInsert] on dbo.[Table-A]
for insert
as
declare @Now as DateTime = GetDate()
set nocount on
insert into [Table-A-History]
select SUser_SName(), 'insert-inserted', @Now, *
from inserted
go
exec sp_settriggerorder @triggername = '[Table-A_LogInsert]', @order = 'last', @stmttype = 'insert'
go
create trigger [Table-A_LogUpdate] on dbo.[Table-A]
for update
as
declare @Now as DateTime = GetDate()
set nocount on
-- an update logs both the before image (deleted) and the after image (inserted)
insert into [Table-A-History]
select SUser_SName(), 'update-deleted', @Now, *
from deleted
insert into [Table-A-History]
select SUser_SName(), 'update-inserted', @Now, *
from inserted
go
exec sp_settriggerorder @triggername = '[Table-A_LogUpdate]', @order = 'last', @stmttype = 'update'

Logging triggers should always be set to fire last. Otherwise, a subsequent trigger may roll back the original transaction, but the log table will have already been updated. This is a confusing state of affairs.
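
For completeness, the matching history table might look something like this sketch; the trailing columns are placeholders for whatever dbo.[Table-A] actually contains, in the same order, since the triggers rely on SELECT *:

-- sketch of the history table: the three leading audit columns line up with the
-- values the triggers prepend to select *; the remaining columns are placeholders
-- for the base table's columns, in the same order
create table [Table-A-History] (
    HistoryUser   sysname not null,      -- SUSER_SNAME()
    HistoryAction varchar(20) not null,  -- e.g. 'update-deleted'
    HistoryDate   datetime not null,     -- @Now from the trigger
    ID            int not null,          -- example base-table column
    SomeColumn    varchar(50) null       -- example base-table column
)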


