How to Replace Django's Primary Key with a Different Integer That Is Unique for That Table

How to replace Django's primary key with a different integer that is unique for that table

The Idea

I would recommend to you the same approach that is used by Instagram. Their requirements seems to closely follow yours.

Generated IDs should be sortable by time (so a list of photo IDs, for
example, could be sorted without fetching more information about the
photos) IDs should ideally be 64 bits (for smaller indexes, and better
storage in systems like Redis) The system should introduce as few new
‘moving parts’ as possible—a large part of how we’ve been able to
scale Instagram with very few engineers is by choosing simple,
easy-to-understand solutions that we trust.

They came up with a system that has 41 bits based on the timestamp, 13 o the database shard and 10 for an auto increment portion. Sincce you don't appear to be using shards. You can just have 41 bits for a time based copmonent and 23 bits chosen at random. That does produce an extremely unlikely 1 in 8.3 million chance of getting a conflict if you insert records at the same time. But in practice you are never likely to hit this. Right so how about some code:

Generating IDs

START_TIME = a constant that represents a unix timestamp

def make_id():
'''
inspired by http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
'''

t = int(time.time()*1000) - START_TIME
u = random.SystemRandom().getrandbits(23)
id = (t << 23 ) | u

return id

def reverse_id(id):
t = id >> 23
return t + START_TIME

Note, START_TIME in the above code is some arbitary starting time. You can use time.time()*1000 , get the value and set that as START_TIME

Notice that the reverse_id method I have posted allows you to find out at which time the record was created. If you need to keep track of that information you can do so without having to add another field for it! So your primary key is actually saving your storage rather than increasing it!

The Model

Now this is what your model would look like.

class MyClass(models.Model):
id = models.BigIntegerField(default = fields.make_id, primary_key=True)

If you make changes to your database outside django you would need to create the equivalent of make_id as an sql function

As a foot note. This is somewhat like the approach used by Mongodb to generate it's _ID for each object.

What is the best approach to change primary keys in an existing Django app?

Agreed, your model is probably wrong.

The formal primary key should always be a surrogate key. Never anything else. [Strong words. Been database designer since the 1980's. Important lessoned learned is this: everything is changeable, even when the users swear on their mothers' graves that the value cannot be changed is is truly a natural key that can be taken as primary. It isn't primary. Only surrogates can be primary.]

You're doing open-heart surgery. Don't mess with schema migration. You're replacing the schema.

  1. Unload your data into JSON files. Use Django's own internal django-admin.py tools for this. You should create one unload file for each that will be changing and each table that depends on a key which is being created. Separate files make this slightly easier to do.

  2. Drop the tables which you are going to change from the old schema.

    Tables which depend on these tables will have their FK's changed; you can either
    update the rows in place or -- it might be simpler -- to delete and reinsert
    these rows, also.

  3. Create the new schema. This will only create the tables which are changing.

  4. Write scripts to read and reload the data with the new keys. These are short and very similar. Each script will use json.load() to read objects from the source file; you will then create your schema objects from the JSON tuple-line objects that were built for you. You can then insert them into the database.

    You have two cases.

    • Tables with PK's change changed will be inserted and will get new PK's. These must be "cascaded" to other tables to assure that the other table's FK's get changed also.

    • Tables with FK's that change will have to locate the row in the foreign table and update their FK reference.

Alternative.

  1. Rename all your old tables.

  2. Create the entire new schema.

  3. Write SQL to migrate all the data from old schema to new schema. This will have to cleverly reassign keys as it goes.

  4. Drop the renamed old tables.

 

Composite primary key in django

Try similar below code:

class MyTable(models.Model):
class Meta:
unique_together = (('key1', 'key2'),)

key1 = models.IntegerField(primary_key=True)
key2 = models.IntegerField()

or if you want only unique mixed fields:

class MyTable(models.Model):
class Meta:
unique_together = (('key1', 'key2'),)

key1 = models.IntegerField()
key2 = models.IntegerField()

EDIT: I would like to note that there is a problem with this approach if there are 3 columns. Update queries don't work because it tries to update (puts pk fields right after "SET") the fields that are unique together and obviously fails.

Django 3 primary key issues

If I understand the last (IntegrityError) error you get, I presume it's because you are manually adding an id field as the primary key. Don't do this, as Django will automatically create an auto-incrementing id primary key for you, for any model that you don't 'override' the primary key yourself.

So remove any 'id' or primary key field you've added, and then add your unique_together in the meta as you had it. If you makemigrations and migrate, this should work as expected, and you should be able to BalBalance.objects.get(month=x, year=y, bnr=z) and reliably get just one result. This is effectively the same as having a primary key across all 3 fields as was your original intention.

EDIT: Based on your comment, I now see the underlying issue, I hadn't noticed that you had managed=False in the Meta of the model. This means that django won't create/modify the table for you, and so the changes suggested above will not have had any affect on the structure. Presumably you have this since there is existing data in an existing table that you now wish to see (and potentially manipulate) in the Django admin interface.

So the issue boils down to:

  1. Django will assume (unless you specify a separate field as primary key) that you already have an auto-incrementing id column on the table. You don't have this on your table.
  2. Django doesn't support primary keys across multiple fields, which is what you DO have on your table

So the solution, annoying for you as it probably is, is to make the database match the model structure defined in python, using the structure I suggested earlier.

Therefore: don't specify a PK on the model (Django will assume it has an id column), and then manually manipulate the table in the database to add an auto-incrementing id column (and set it as the primary key, removing the PK from the other 3 fields). Next, specify the unique_together on the other 3 fields in the model, add then go and add a unique key across those fields in the DB.

Once the database reflects the model definition, your pain should be removed, and the admin should work as intended with the code you pasted in your question.

Of course, depending on whether you have other software hooked up to that database table, and how those rely on PKs ... you might have issues with those, but I can't say for sure.



Related Topics



Leave a reply



Submit