Sql Best Practices - Ok to Rely on Auto Increment Field to Sort Rows Chronologically

SQL Best Practices - Ok to rely on auto increment field to sort rows chronologically?

You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.

A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.

These days, I always put a CreateDate column data objects connected to real world events (such as account creation).

Edited to add:

If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.

Questionable SQL practice - Order By id rather than creation time

In typical practice you can almost always assume that an autoincrement id can be sorted to give you the records in creation order (either direction). However, you should note that this is not considered portable in terms of your data. You might move your data to another system where the keys are recreated, but the created_at data is the same.

There is actually a pretty good StackOverflow discussion of this issue.

The basic summary is the first solution, ordering by created_at, is considered best practice. Be sure, however, to properly index the created_at field to give the best performance.

Auto Increment after delete in MySQL

What you're trying to do sounds dangerous, as that's not the intended use of AUTO_INCREMENT.

If you really want to find the lowest unused key value, don't use AUTO_INCREMENT at all, and manage your keys manually. However, this is NOT a recommended practice.

Take a step back and ask "why you need to recycle key values?" Do unsigned INT (or BIGINT) not provide a large enough key space?

Are you really going to have more than 18,446,744,073,709,551,615 unique records over the course of your application's lifetime?

Best practice question for MySQL: order by id or date?

If there is a chance you'll get two added with the same date, you'll probably need:

SELECT balance FROM my_table ORDER BY date_added DESC,id DESC LIMIT 1;

(note the 'descending' clause on both fields).

However, you will need to take into account what you want to happen when someone adds an adjusting entry of the 2nd of February which is given the date 31st January to ensure the month of January is complete. It will have an ID greater than those made on the 1st of February.

Generally, accounting systems just work on the date. Perhaps if you could tell us why the order is important, we could make other suggestions.


In response to your comment:

I would love to hear any other ideas or advice you might have, even if they're off-topic since I have zero knowledge of accounting-type database models.

I would provide a few pieces of advice - this is all I could think of immediately, I usually spew forth much more "advice" with even less encouragement :-) The first two, more database-related than accounting-relared, are:

First, do everything in third normal form and only revert if and when you have performance problems. This will save you a lot of angst with duplicate data which may get out of step. Even if you do revert, use triggers and other DBMS capabilities to ensure that data doesn't get out of step.

An example, if you want to speed up your searches on a last_name column, you can create an upper_last_name column (indexed) then use that to locate records matching your already upper-cased search term. This will almost always be faster than the per-row function upper(last_name). You can use an insert/update trigger to ensure the upper_last_name is always set correctly and this incurs the cost only when the name changes, not every time you search.

Secondly, don't duplicate data even across tables (like your current schema) unless you can use those same trigger-type tricks to guarantee the data won't get out of step. What will your customer do when you send them an invoice where the final balance doesn't match the starting balance plus purchases? That's not going to make your company look very professional :-)

Thirdly (and this is more accounting-related), you generally don't need to worry about the number of transactions when calculating balances on the fly. That's because accounting systems usually have a roll-over function at year end which resets the opening balances.

So you're usually never having to process more than a year's worth of data at once which, unless you're the US government or Microsoft, is not that onerous.

MySQL AUTO_INCREMENT does not ROLLBACK

Let me point out something very important:

You should never depend on the numeric features of autogenerated keys.

That is, other than comparing them for equality (=) or unequality (<>), you should not do anything else. No relational operators (<, >), no sorting by indexes, etc. If you need to sort by "date added", have a "date added" column.

Treat them as apples and oranges: Does it make sense to ask if an apple is the same as an orange? Yes. Does it make sense to ask if an apple is larger than an orange? No. (Actually, it does, but you get my point.)

If you stick to this rule, gaps in the continuity of autogenerated indexes will not cause problems.

Custom SERIAL / autoincrement per group of values

Concept

There are at least several ways to approach this. First one that comes to my mind:

Assign a value for category_id column inside a trigger executed for each row, by overwriting the input value from INSERT statement.

Action

Here's the SQL Fiddle to see the code in action


For a simple test, I'm creating article table holding categories and their id's that should be unique for each category. I have omitted constraint creation - that's not relevant to present the point.

create table article ( id serial, category varchar, category_id int )

Inserting some values for two distinct categories using generate_series() function to have an auto-increment already in place.

insert into article(category, category_id)
select 'stackoverflow', i from generate_series(1,1) i
union all
select 'stackexchange', i from generate_series(1,3) i

Creating a trigger function, that would select MAX(category_id) and increment its value by 1 for a category we're inserting a row with and then overwrite the value right before moving on with the actual INSERT to table (BEFORE INSERT trigger takes care of that).

CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
v_category_inc int := 0;
BEGIN
SELECT MAX(category_id) + 1 INTO v_category_inc FROM article WHERE category = NEW.category;
IF v_category_inc is null THEN
NEW.category_id := 1;
ELSE
NEW.category_id := v_category_inc;
END IF;
RETURN NEW;
END;
$$

Using the function as a trigger.

CREATE TRIGGER trg_category_increment 
BEFORE INSERT ON article
FOR EACH ROW EXECUTE PROCEDURE category_increment()

Inserting some more values (post trigger appliance) for already existing categories and non-existing ones.

INSERT INTO article(category) VALUES 
('stackoverflow'),
('stackexchange'),
('nonexisting');

Query used to select data:

select category, category_id From article order by 1,2

Result for initial inserts:

category    category_id
stackexchange 1
stackexchange 2
stackexchange 3
stackoverflow 1

Result after final inserts:

category    category_id
nonexisting 1
stackexchange 1
stackexchange 2
stackexchange 3
stackexchange 4
stackoverflow 1
stackoverflow 2


Related Topics



Leave a reply



Submit