Database Structure for Storing Historical Data

When I've encountered such problems, one alternative is to make the order itself the history table. It functions the same way, but it's a little easier to follow:

orders
------
orderID
customerID
address
City
state
zip

customers
---------
customerID
address
City
state
zip

EDIT: if the number of columns gets too high for your liking, you can separate it out however you like.
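
To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The place_order helper and the sample addresses are made up for illustration; the point is only that the address is copied onto the order row when the order is placed, so later changes to the customer don't rewrite history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customerID INTEGER PRIMARY KEY,
    address TEXT, city TEXT, state TEXT, zip TEXT
);
CREATE TABLE orders (
    orderID INTEGER PRIMARY KEY,
    customerID INTEGER REFERENCES customers(customerID),
    address TEXT, city TEXT, state TEXT, zip TEXT
);
""")
conn.execute(
    "INSERT INTO customers VALUES (1, '1 Elm St', 'Springfield', 'IL', '62701')"
)


def place_order(conn, customer_id):
    """Hypothetical helper: snapshot the customer's current address on the order."""
    cur = conn.execute(
        """INSERT INTO orders (customerID, address, city, state, zip)
           SELECT customerID, address, city, state, zip
           FROM customers WHERE customerID = ?""",
        (customer_id,),
    )
    return cur.lastrowid


order_id = place_order(conn, 1)

# The customer moves after the order was placed.
conn.execute("UPDATE customers SET address = '9 Oak Ave' WHERE customerID = 1")

order_address = conn.execute(
    "SELECT address FROM orders WHERE orderID = ?", (order_id,)
).fetchone()[0]
current_address = conn.execute(
    "SELECT address FROM customers WHERE customerID = 1"
).fetchone()[0]
```

After the update, order_address still holds the address the order was placed with, while current_address holds the new one.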

If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changes his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.

Also, if you are using MS SQL Server, you might want to consider using indexed views. That will allow you to trade a small incremental insert/update perf decrease for a large select perf increase. If you're not using MS SQL Server, you can replicate this using triggers and tables.

How to Store Historical Data

Supporting historical data directly within an operational system will make your application much more complex than it would otherwise be. Generally, I would not recommend doing it unless you have a hard requirement to manipulate historical versions of a record within the system.

If you look closely, most requirements for historical data fall into one of two categories:

  • Audit logging: This is better off done with audit tables. It's fairly easy to write a tool that generates scripts to create audit log tables and triggers by reading metadata from the system data dictionary. This type of tool can be used to retrofit audit logging onto most systems. You can also use this subsystem for changed data capture if you want to implement a data warehouse (see below).

  • Historical reporting: Reporting on historical state, 'as-at' positions or analytical reporting over time. It may be possible to fulfil simple historical reporting requirements by querying audit logging tables of the sort described above. If you have more complex requirements then it may be more economical to implement a data mart for the reporting than to try and integrate history directly into the operational system.

    Slowly changing dimensions are by far the simplest mechanism for tracking and querying historical state and much of the history tracking can be automated. Generic handlers aren't that hard to write. Generally, historical reporting does not have to use up-to-the-minute data, so a batched refresh mechanism is normally fine. This keeps your core and reporting system architecture relatively simple.

If your requirements fall into one of these two categories, you are probably better off not storing historical data in your operational system. Separating the historical functionality into another subsystem will probably be less effort overall and produce transactional and audit/reporting databases that work much better for their intended purpose.
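
As a rough illustration of the audit-table approach, here is a minimal sketch using Python's built-in sqlite3 module. The table, column and trigger names are assumptions made for the example, and the trigger is written by hand rather than generated from the data dictionary as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customerID INTEGER PRIMARY KEY,
    address TEXT
);

-- Hypothetical audit table: one row per change to customers.address.
CREATE TABLE customers_audit (
    auditID INTEGER PRIMARY KEY,
    customerID INTEGER,
    old_address TEXT,
    new_address TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- The trigger keeps history out of the operational table itself.
CREATE TRIGGER customers_audit_update
AFTER UPDATE OF address ON customers
BEGIN
    INSERT INTO customers_audit (customerID, old_address, new_address)
    VALUES (OLD.customerID, OLD.address, NEW.address);
END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'A')")
conn.execute("UPDATE customers SET address = 'B' WHERE customerID = 1")

audit_rows = conn.execute(
    "SELECT customerID, old_address, new_address "
    "FROM customers_audit ORDER BY auditID"
).fetchall()
```

The operational table only ever holds the current address; the change from A to B ends up in customers_audit, where a reporting job could pick it up later.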

Database structure for storing historical data and reporting on data annually

If you're looking for a single year at a time you could do:

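select max(string_start) keep (dense_rank last order by create_date) as string_start,
       max(string_end) keep (dense_rank last order by create_date) as string_end
from stringlist
-- to_date with only 'YYYY' defaults the month to the current month,
-- so build 1 January explicitly and add 12 months to get the next year's start
where create_date < add_months(to_date(to_char(2017) || '-01-01', 'YYYY-MM-DD'), 12);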

STRING_START STRING_END
------------ ----------
.12 5.2

or:

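select string_start, string_end
from (
  select string_start, string_end,
         row_number() over (order by create_date desc) as rn
  from stringlist
  -- to_date with only 'YYYY' defaults the month to the current month,
  -- so build 1 January explicitly and add 12 months to get the next year's start
  where create_date < add_months(to_date(to_char(2017) || '-01-01', 'YYYY-MM-DD'), 12)
)
where rn = 1;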

STRING_START STRING_END
------------ ----------
.12 5.2

In both cases you keep only the rows created before the start of the following year, and then take the values from the remaining row with the latest create date.

What is the correct database structure to store historical data?

There's only one sane answer:

team_rating:
team_id, rating, start_date, end_date

Make all ranges closed by using the creation date of the team as the first rating's start_date and some arbitrarily distant future date (e.g. 2199-01-01) as the end_date of the current row, with all dates inclusive.

Queries to find the rating at any date are then simple:

select rating
from team_rating
where team_id = $id
and $date between start_date and end_date

and the rating history is just:

select start_date, rating
from team_rating
where team_id = $id
order by start_date

It's key that both start and end dates are stored, otherwise the queries are trainwrecks.
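
Here is a minimal sketch of how such a table might be maintained and queried, using Python's built-in sqlite3 module with dates stored as ISO strings. The set_rating and rating_on helpers and the sample ratings are assumptions for illustration, not part of the original design.

```python
import sqlite3

FAR_FUTURE = "2199-01-01"  # sentinel end_date for the current row

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_rating ("
    "team_id INTEGER, rating INTEGER, start_date TEXT, end_date TEXT)"
)


def set_rating(conn, team_id, rating, start_date):
    """Close the current row the day before start_date, then open a new one."""
    conn.execute(
        "UPDATE team_rating SET end_date = date(?, '-1 day') "
        "WHERE team_id = ? AND end_date = ?",
        (start_date, team_id, FAR_FUTURE),
    )
    conn.execute(
        "INSERT INTO team_rating VALUES (?, ?, ?, ?)",
        (team_id, rating, start_date, FAR_FUTURE),
    )


def rating_on(conn, team_id, date):
    """Return the rating in force on the given date, or None."""
    row = conn.execute(
        "SELECT rating FROM team_rating "
        "WHERE team_id = ? AND ? BETWEEN start_date AND end_date",
        (team_id, date),
    ).fetchone()
    return row[0] if row else None


set_rating(conn, 1, 1500, "2020-01-01")  # team created
set_rating(conn, 1, 1550, "2020-07-01")  # rating changes

history = conn.execute(
    "SELECT start_date, rating FROM team_rating "
    "WHERE team_id = 1 ORDER BY start_date"
).fetchall()
```

Because both ends are inclusive, the old row ends on 2020-06-30 and the new one starts on 2020-07-01, so any date matches at most one row.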

Implement database schema for organizing historical stock data

(It's not a "stupid problem", just a "novice question".)

PRIMARY KEY(ts_code, trade_date)
INDEX(trade_date)

But declare trade_date as DATE (not INT).

DECIMAL(6,2) limits you to 9999.99; is that OK?

Use ENGINE=InnoDB

Be cautious of other Questions that are not tagged [mysql] or [mariadb]; they are likely to have syntax and other suggestions that are not good for MySQL.

If you include "time", it is probably better to use a single DATETIME column, not two columns (DATE and TIME). However, this leads to some tricky business when requesting info for a given date.
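
One common way around the "given date" problem is a half-open range: from midnight of that day up to, but not including, midnight of the next day. The sketch below shows the pattern with Python's built-in sqlite3 module rather than MySQL, storing the datetime as ISO text; the trades table, the trade_dt column and the trades_on helper are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades ("
    "ts_code TEXT, trade_dt TEXT, price REAL, "
    "PRIMARY KEY (ts_code, trade_dt))"
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("ABC", "2021-03-01 09:30:00", 10.0),
        ("ABC", "2021-03-01 23:59:59", 10.5),
        ("ABC", "2021-03-02 00:00:00", 11.0),
    ],
)


def trades_on(conn, ts_code, day):
    """Rows for one calendar day: start of day <= trade_dt < start of next day."""
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    return conn.execute(
        "SELECT trade_dt, price FROM trades "
        "WHERE ts_code = ? AND trade_dt >= ? AND trade_dt < ? "
        "ORDER BY trade_dt",
        (ts_code, start, end),
    ).fetchall()


rows = trades_on(conn, "ABC", date(2021, 3, 1))
```

Comparing the column directly against the two bounds, instead of wrapping it in a date function, also leaves the primary key usable for the range scan.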

How to properly organize historical data in the same table?

Assuming

  • A competition is composed of 1 or more rounds,
  • A round is optionally composed of 1 or more groups.

Then I recommend

  • One table containing one row per 'competition'.
  • One table containing one row per 'round'. It should contain a competition_id that is a FK to competition.id.
  • One table containing one row per 'group'. It should contain a round_id that is a FK to round.id.

(Etc.)

Those are examples of doing "1:many" mappings. (Note "0 or more" and "optionally" are merely edge cases of "1:many", and do not require extra effort.)
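
A minimal sketch of that layout, using Python's built-in sqlite3 module. The table names (round_group is used here to avoid the reserved word GROUP), the name columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE competition (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE round (
    id INTEGER PRIMARY KEY,
    competition_id INTEGER NOT NULL REFERENCES competition(id),
    name TEXT NOT NULL
);
CREATE TABLE round_group (
    id INTEGER PRIMARY KEY,
    round_id INTEGER NOT NULL REFERENCES round(id),
    name TEXT NOT NULL
);
""")

conn.execute("INSERT INTO competition VALUES (1, 'Cup')")
conn.executemany(
    "INSERT INTO round VALUES (?, ?, ?)",
    [(1, 1, "Group stage"), (2, 1, "Final")],
)
# Only the first round has groups; the final simply has no child rows.
conn.executemany(
    "INSERT INTO round_group VALUES (?, ?, ?)",
    [(1, 1, "Group A"), (2, 1, "Group B")],
)

groups_per_round = conn.execute(
    "SELECT r.name, COUNT(g.id) "
    "FROM round r LEFT JOIN round_group g ON g.round_id = r.id "
    "WHERE r.competition_id = 1 "
    "GROUP BY r.id ORDER BY r.id"
).fetchall()
```

The "optional" groups need no special handling: a round without groups just has zero rows in round_group, and the LEFT JOIN reports a count of 0 for it.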

I say "one table" because "vertical splitting" is rarely necessary. Simply put all the attributes for a "competition" in a single table. When some attribute (such as the 'rounds') is repeated, then it cannot be put in the same table.

(The table name competition_rounds, though descriptive, was confusing me.)

A related question... Are all the 'rounds' of a 'competition' played in a single country? I see country_id in competition; I wonder if it should be moved to rounds?

How to choose the right database for historic data storage for small company

Modern relational databases (MySQL, Postgres) support "JSON" columns, so if your data does not have a known fixed schema they are a viable option too. Similarly, modern NoSQL databases such as MongoDB have added traditional SQL features such as transactions. So the distinction blurs.

To determine what database fits your needs you need to think about how the data is accessed:

  • Do you need efficient updating of records (and if so, are transactions needed), or do you just want to add new ones?

  • Do you need to fetch specific records by some key, or process lots of records to summarize data (the latter is called "analytical processing")?

  • Do you expect to have multiple tables with queries joining data between them? (sounds like you currently don't need this but it pays to think about the future when it comes to databases)

If updating is not needed and you need to aggregate many records, you can use something like AWS Athena / Presto / Drill to query plain files stored on a local server or on something like AWS S3.

Cassandra and HBase are specialized databases that are highly scalable and sacrifice some functionality for that scalability. They seem inappropriate for such a small database.

MongoDB is easy to manage and horizontally scalable but has some limitations given its NoSQL heritage.

MySQL and Postgres are both easy to manage and will easily handle tens of GBs. Postgres is somewhat more sophisticated and capable when it comes to analytical processing. MySQL is easier to manage and very performant when it comes to "transaction processing", that is, updating and querying specific records (when you have an index quickly leading you to the wanted records).


