Efficiently Storing 7.300.000.000 Rows

Use partitioning. With your read pattern you'd want to partition by a hash of entity_id.
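
Something along these lines (the table and column names here are just an illustration of the kind of schema described in the question, not the actual one):

    -- Hypothetical table, hash-partitioned on entity_id so that all rows
    -- for a given entity land in the same partition.
    CREATE TABLE readings (
        entity_id INT UNSIGNED NOT NULL,
        date_id   INT UNSIGNED NOT NULL,
        value     DOUBLE NOT NULL,
        PRIMARY KEY (entity_id, date_id)
    )
    PARTITION BY HASH (entity_id)
    PARTITIONS 64;

With HASH partitioning like this, a query that filters on entity_id only has to touch one partition (partition pruning), which is what makes it attractive for that read pattern.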

Efficient MySQL schema with partitioning for huge dataset (7.300.000.000 rows and roughly 80 GB of data)

One thing I don't quite understand is how you plan to prune your data. You have 2M rows per day, but you haven't specified how much data you plan to keep. At some point you will (in all likelihood) want to expire data by age.

When that time comes, you'll want to do it by dropping partitions, NOT by running a DELETE, which locks every partition for a very long time (it has to scan for the rows to delete) and then leaves your table no smaller, because the partitions are left full of holes.

Partitioning by hash of entity_id might seem sensible for searching, but partitioning by time could ease contention when you come to prune old data, and will definitely be a good thing.
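
For example, with RANGE partitioning on the date column (assuming, purely for illustration, a date_id stored as a YYYYMMDD-style integer), expiring old data becomes a cheap metadata operation:

    -- Hypothetical time-partitioned version of the same table.
    CREATE TABLE readings_by_time (
        entity_id INT UNSIGNED NOT NULL,
        date_id   INT UNSIGNED NOT NULL,   -- assumed to be a YYYYMMDD integer
        value     DOUBLE NOT NULL,
        PRIMARY KEY (entity_id, date_id)
    )
    PARTITION BY RANGE (date_id) (
        PARTITION p202301 VALUES LESS THAN (20230201),
        PARTITION p202302 VALUES LESS THAN (20230301),
        PARTITION p202303 VALUES LESS THAN (20230401)
    );

    -- Dropping a month of data: no scan, no holes, and the disk space comes back.
    ALTER TABLE readings_by_time DROP PARTITION p202301;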

MyISAM has a feature called "concurrent insert" which you will almost certainly need to rely on all the time to get decent concurrency and performance; concurrent insert only works while the table has no holes left by deleted rows, which effectively mandates a "no deletes" rule, meaning you can only expire data by dropping partitions.

But dropping partitions is also good because you can get the disc space back.

Having said all of this, 80 GB isn't that big, and I might be tempted to store it all in a single table and use InnoDB to get concurrent access.

Oh yes, and if you did use InnoDB, you could use a primary key of (entity_id, date_id), which would cluster rows with the same entity_id together. You'd probably also want a secondary index on date_id to enable efficient pruning.
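
As a sketch (again with made-up names), that would look like:

    -- Unpartitioned InnoDB variant: the composite primary key clusters
    -- all rows for an entity together on disk.
    CREATE TABLE readings_innodb (
        entity_id INT UNSIGNED NOT NULL,
        date_id   INT UNSIGNED NOT NULL,
        value     DOUBLE NOT NULL,
        PRIMARY KEY (entity_id, date_id),
        KEY idx_date (date_id)            -- secondary index for pruning by age
    ) ENGINE=InnoDB;

With that key, "give me everything for entity X" is a single range scan over the clustered index, and the date_id index lets a pruning DELETE find old rows without scanning the whole table.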

Please test this with your production data sizes and let us know what you find!

Large primary key: 1+ billion rows MySQL + InnoDB?

The only definitive answer is to try both and test and see what happens.

Generally, MyISAM is faster for writes and reads, but not both at the same time. When you write to a MyISAM table the entire table gets locked for the insert to complete. InnoDB has more overhead but uses row-level locking so that reads and writes can happen concurrently without the problems that MyISAM's table locking incurs.

However, your problem, if I understand it correctly, is a little different. With only one column, and that column being the primary key, there is an important consideration: MyISAM and InnoDB handle primary key indexes in very different ways.

In MyISAM, the primary key index is just like any other secondary index. Internally each row has a row id and the index nodes just point to the row ids of the data pages. A primary key index is not handled differently than any other index.

In InnoDB, however, the primary key is clustered, meaning the index leaf pages are the data pages themselves, and row contents are kept physically sorted on disk according to the primary key (but only within individual data pages, which themselves may be scattered in any order).

This being the case, I would expect InnoDB to have an advantage here: MyISAM essentially has to do double work, writing the integer once in the data pages and again in the index pages. InnoDB wouldn't do this; the primary key index would be identical to the data pages, so the value is only written once. InnoDB only has to manage the data in one place, whereas MyISAM would needlessly manage two copies.

For either storage engine, doing something like min() or max() should be trivial on an indexed column, or just checking the existence of a number in the index. Since the table is only one column no bookmark lookups would even be necessary as the data would be represented entirely within the index itself. This should be a very efficient index.
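
Concretely, on a hypothetical single-column table like the one described, all of these should be resolved from the index alone:

    -- Hypothetical one-column table: the whole row is the primary key.
    CREATE TABLE seen_ids (
        id BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;

    SELECT MIN(id) FROM seen_ids;                          -- first index entry
    SELECT MAX(id) FROM seen_ids;                          -- last index entry
    SELECT EXISTS (SELECT 1 FROM seen_ids WHERE id = 42);  -- point lookup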

I also wouldn't be all that worried about the size of the table. Where the width of a row is only one integer, you can fit a huge number of rows per index/data page.

Large MySQL tables

Whatever solution you use, since you say your database will be write-heavy, you need to make sure the whole table doesn't get locked on writes. This rules out MyISAM, which some have suggested: MyISAM locks the entire table on an UPDATE, DELETE, or INSERT, which means any client that wants to read from the table has to wait for the write to finish. INSERT LOW_PRIORITY doesn't remove the table lock; it just makes the writer wait until no other clients are reading from the table, instead of making the readers wait.
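
For reference, LOW_PRIORITY is just a modifier on the statement; it changes who waits, not whether the table is locked (sketch with a made-up table name):

    -- With a table-locking engine such as MyISAM, this insert waits until no
    -- other clients are reading from the table, rather than blocking the readers.
    INSERT LOW_PRIORITY INTO readings (entity_id, date_id, value)
    VALUES (42, 20230115, 3.14);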

If you simply must use MySQL, you'll want InnoDB, which doesn't lock the whole table on writes. InnoDB is MVCC like PostgreSQL, so old row versions do need to be cleaned up; InnoDB handles this with a background purge process rather than an explicit VACUUM, but it's still something to take into consideration if you are doing a lot of updates or deletes.


