Varchar as Foreign Key/Primary Key in Database Good or Bad

VARCHAR as foreign key/primary key in database good or bad?

The problem with using VARCHAR for any KEY is that it can hold WHITE SPACE. White space here means any non-printing character, like spaces, tabs, carriage returns, etc. Using a VARCHAR as a key can make your life difficult when you start to hunt down why tables aren't returning records that have extra spaces at the end of their keys.

Sure, you CAN use VARCHAR, but you have to be very careful with the input and output. VARCHAR keys also take up more space and are likely slower in queries.

Integer values, by contrast, can only contain digits (0 to 9, plus an optional sign), so there is no hidden white space to hunt for. That makes integers a much better choice for keys.

You can always use an integer-based key and put a UNIQUE constraint on the VARCHAR column, which keeps the faster integer lookups while still enforcing uniqueness of the VARCHAR value.
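A minimal sketch of both points above, using SQLite through Python's built-in sqlite3 module as a stand-in (an assumption; the answer is not tied to one RDBMS). The table and column names are hypothetical. SQLite compares TEXT with the BINARY collation by default, so 'ACME' and 'ACME ' are different values and a UNIQUE constraint lets both in; other systems may treat trailing spaces differently in comparisons. The second table shows one defensive option, a CHECK constraint that rejects untrimmed values.

```python
import sqlite3

# In-memory SQLite database; TEXT comparisons default to the BINARY
# collation, so trailing spaces are significant here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,         -- integer surrogate key
        customer_code VARCHAR(20) NOT NULL UNIQUE  -- natural value, kept unique
    )
    """
)

# Both rows are accepted: 'ACME' and 'ACME ' are different strings.
conn.execute("INSERT INTO customers (customer_code) VALUES ('ACME')")
conn.execute("INSERT INTO customers (customer_code) VALUES ('ACME ')")

total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
exact_matches = conn.execute(
    "SELECT customer_id FROM customers WHERE customer_code = ?", ("ACME",)
).fetchall()
print(total_rows, exact_matches)  # 2 [(1,)]

# Defensive variant: reject values with leading/trailing spaces at insert
# time. SQLite's trim() strips only spaces by default, not tabs or CR/LF.
guarded = sqlite3.connect(":memory:")
guarded.execute(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_code VARCHAR(20) NOT NULL UNIQUE
                      CHECK (customer_code = trim(customer_code))
    )
    """
)
try:
    guarded.execute("INSERT INTO customers (customer_code) VALUES ('ACME ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The lookup for 'ACME' silently misses the second row, which is exactly the kind of "why isn't this record coming back" hunt described above.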

Can I use VARCHAR as the PRIMARY KEY?

Of course you can, in the sense that your RDBMS will let you do it. Whether you should do it is a different question, though: in most situations, values that have a meaning outside your database system should not be chosen as a primary key.

If you know that the value is unique in the system that you are modeling, it is appropriate to add a unique index or a unique constraint to your table. However, your primary key should generally be some "meaningless" value, such as an auto-incremented number or a GUID.

The rationale for this is simple: data entry errors, and occasional changes to things that appear unchangeable, do happen. They become much harder to fix when the affected value is used as a primary key, because that value is also copied into every foreign key that references it.
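To illustrate the rationale, here is a minimal sketch in SQLite (via Python's sqlite3 module) with a hypothetical employees/orders schema: the meaningful e-mail address is only UNIQUE, while orders reference a meaningless surrogate key. Correcting a typo in the e-mail is then a single UPDATE, and no referencing row has to change.

```python
import sqlite3

# Hypothetical schema: a meaningless surrogate key plus a UNIQUE natural value.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        email       VARCHAR(255) NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employees(employee_id)
    );
    INSERT INTO employees (employee_id, email) VALUES (1, 'jdoe@exmaple.com');
    INSERT INTO orders (order_id, employee_id) VALUES (100, 1), (101, 1);
    """
)

# Fix the data entry error in one place; orders reference employee_id,
# so none of their rows need to be touched.
conn.execute(
    "UPDATE employees SET email = ? WHERE employee_id = ?",
    ("jdoe@example.com", 1),
)

rows = conn.execute(
    """
    SELECT o.order_id, e.email
    FROM orders AS o JOIN employees AS e ON e.employee_id = o.employee_id
    ORDER BY o.order_id
    """
).fetchall()
print(rows)  # [(100, 'jdoe@example.com'), (101, 'jdoe@example.com')]
```

Had the e-mail address been the primary key, the same fix would have required updating (or cascading to) every order row as well.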

Using varchar as the primary key? bad idea? or ok?

It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.

Is there a REAL performance difference between INT and VARCHAR primary keys?

You make a good point that you can avoid some joins by using what's called a natural key instead of a surrogate key. Only you can assess whether this benefit is significant in your application.

That is, you can measure the queries in your application where speed matters most, because they work with large volumes of data or are executed very frequently. If these queries benefit from eliminating a join, and do not suffer from using a varchar primary key, then do it.
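One way to do that measurement is a small harness that runs the same question against both designs and keeps the best of several timings. The sketch below is hypothetical: it uses an in-memory SQLite database, made-up users/events tables and arbitrary row counts, so its numbers say nothing about your system. Substitute your own hot queries and realistic data volumes on your actual RDBMS before drawing conclusions.

```python
import sqlite3
import time

def build(conn: sqlite3.Connection, n_users: int = 1000, n_events: int = 50_000) -> None:
    """Create hypothetical surrogate-key and natural-key variants of the same data."""
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
        -- surrogate-key design: events reference users by integer id
        CREATE TABLE events_s (event_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
        -- natural-key design: events reference users by username
        CREATE TABLE events_n (event_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO users (user_id, username) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(n_users)],
    )
    conn.executemany(
        "INSERT INTO events_s (user_id) VALUES (?)",
        [(i % n_users,) for i in range(n_events)],
    )
    conn.executemany(
        "INSERT INTO events_n (username) VALUES (?)",
        [(f"user{i % n_users}",) for i in range(n_events)],
    )

def timed(conn: sqlite3.Connection, sql: str, repeats: int = 5):
    """Return (best wall-clock seconds over `repeats` runs, result rows)."""
    best, rows = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

conn = sqlite3.connect(":memory:")
build(conn)

join_sql = """SELECT u.username, COUNT(*) FROM events_s e
              JOIN users u ON u.user_id = e.user_id
              GROUP BY u.username ORDER BY u.username"""
direct_sql = """SELECT username, COUNT(*) FROM events_n
                GROUP BY username ORDER BY username"""

t_join, r_join = timed(conn, join_sql)
t_direct, r_direct = timed(conn, direct_sql)
print(f"join: {t_join:.4f}s  no join: {t_direct:.4f}s")
```

Checking that both queries return identical results before comparing timings guards against benchmarking two different questions.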

Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.

Other folks make a good point that in practice it's rare for a natural key to be guaranteed never to change and never to contain duplicates, so surrogate keys are usually worthwhile.

Should I design a table with a primary key of varchar or int?

I would definitely recommend using an INT NOT NULL IDENTITY(1,1) field in each table as the primary key.

With an IDENTITY field, you can let the database handle all the details of making sure it's really unique and all, and the INT datatype is just 4 bytes, and fixed, so it's easier and more suited to be used for the primary (and clustering) key in your table.

And you're right - INT is an INT is an INT - it will not change its size or anything, so you won't ever have to go recreate and/or update your foreign key relations.

Using a VARCHAR(10) or VARCHAR(20) just uses up too much space: up to 10 or 20 bytes (plus 2 bytes of length overhead) instead of 4. And what a lot of folks don't know: the clustering key value is repeated in every single index entry of every single non-clustered index on the table, so you're potentially wasting a lot of space (not just on disk, which is cheap, but also in SQL Server's main memory). Also, since the value is variable-length (might be 4 chars, might be 20), it's harder for SQL Server to properly maintain a good index structure.

Marc
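To put rough numbers on the point about the clustering key being repeated in non-clustered indexes, here is a back-of-the-envelope calculation. The row count, number of non-clustered indexes and key length are assumptions chosen for illustration; the "+ 2" reflects SQL Server storing varchar as actual length plus 2 bytes, and page/row headers are ignored.

```python
def clustering_key_overhead(rows: int, nonclustered_indexes: int, key_bytes: int) -> int:
    """Bytes spent repeating the clustering key across all non-clustered index entries."""
    return rows * nonclustered_indexes * key_bytes

ROWS = 10_000_000  # assumed table size
NC_INDEXES = 4     # assumed number of non-clustered indexes

int_key = clustering_key_overhead(ROWS, NC_INDEXES, 4)           # INT key
varchar_key = clustering_key_overhead(ROWS, NC_INDEXES, 20 + 2)  # full VARCHAR(20) key

print(int_key // 2**20, "MiB vs", varchar_key // 2**20, "MiB")  # 152 MiB vs 839 MiB
```

Shorter actual varchar values shrink the gap, but the overhead scales with every extra non-clustered index and every extra row.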

Should I use VARCHAR(3) or BIGINT for my primary key with PostgreSQL?

Performance won't matter much in this case, and a few bytes of storage more or less won't either.

You should choose the primary key that is most practical. If there is a natural three-character unique identifier, and you are sure that the identifier will never change for an existing table row, then there is no problem with using varchar(3) as a primary key for a table.
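A minimal sketch of a three-character natural primary key, using SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL. The currencies/prices tables are hypothetical. PostgreSQL enforces the varchar(3) length limit itself, whereas SQLite ignores declared lengths, so a CHECK constraint is added here to get the same guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE currencies (
        code VARCHAR(3) NOT NULL PRIMARY KEY CHECK (length(code) = 3),
        name TEXT NOT NULL
    );
    CREATE TABLE prices (
        price_id INTEGER PRIMARY KEY,
        amount   NUMERIC NOT NULL,
        currency VARCHAR(3) NOT NULL REFERENCES currencies(code)
    );
    INSERT INTO currencies VALUES ('EUR', 'Euro'), ('USD', 'US dollar');
    INSERT INTO prices (amount, currency) VALUES (9.99, 'EUR');
    """
)

# The referencing row is meaningful on its own: no join needed to see 'EUR'.
currency_of_first_price = conn.execute(
    "SELECT currency FROM prices WHERE price_id = 1"
).fetchone()[0]

def try_insert(sql: str) -> bool:
    """Return True if the statement succeeded, False if a constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

unknown_code_ok = try_insert("INSERT INTO prices (amount, currency) VALUES (1, 'XYZ')")
bad_length_ok = try_insert("INSERT INTO currencies VALUES ('EURO', 'too long')")
print(currency_of_first_price, unknown_code_ok, bad_length_ok)  # EUR False False
```

This only works out if, as the answer says, the code never changes for an existing row.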

Note that there are people who think that you should always use a numeric artificial primary key. These people will probably disagree, and I am not willing to start a holy war here. The only convincing argument for this stance that I have heard is that there are ORMs and other abstraction layers that cannot work with anything but a numerical primary key. But if you are not hobbled by such a tool, you need not worry.

Can we use constraint name of a primary key as foreign key reference?

I don't think so. A FOREIGN KEY ... REFERENCES clause names the referenced table and its columns, not a constraint, so the RDBMS would have no way to tell whether Pk_name is a column or a constraint name. I suggest you stick with the usual form:

create table A (
department_id int,
college_id int,
constraint Pk_name primary key(department_id,college_id)
);

create table B (
student_name varchar(75),
department_id int,
college_id int,
foreign key(department_id,college_id) references A(department_id,college_id)
);
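As a quick check of the statements above, the following sketch runs them unchanged in SQLite (via Python's sqlite3 module; the choice of SQLite and the sample rows are assumptions). The foreign key is matched on the referenced column list (department_id, college_id), and a pair that does not exist in A is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    create table A (
        department_id int,
        college_id int,
        constraint Pk_name primary key(department_id, college_id)
    );
    create table B (
        student_name varchar(75),
        department_id int,
        college_id int,
        foreign key(department_id, college_id) references A(department_id, college_id)
    );
    insert into A values (1, 10);
    insert into B values ('Ada', 1, 10);  -- (1, 10) exists in A
    """
)

try:
    conn.execute("insert into B values ('Bob', 1, 99)")  # (1, 99) is not in A
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(orphan_accepted)  # False
```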

I will update the answer once I find another answer.

Can a foreign key be the only primary key

Yes, for the following reasons:

  1. Making them the primary key will force uniqueness (as opposed to imply it).
  2. The primary key will presumably be clustered (depending on the dbms) which will improve performance for some queries.
  3. It saves the space of adding a separate unique constraint, which in some DBMSs also creates a unique index.
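A minimal sketch of the idea, using SQLite via Python's sqlite3 module and a hypothetical one-to-one users/user_profiles pair: the foreign key user_id is the whole primary key of user_profiles, so the same column both enforces one profile per user (point 1) and refuses profiles for users that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY
    );
    CREATE TABLE user_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(user_id),  -- FK is the only PK
        bio     TEXT
    );
    INSERT INTO users (user_id) VALUES (1);
    INSERT INTO user_profiles (user_id, bio) VALUES (1, 'first profile');
    """
)

def rejected(sql: str) -> bool:
    """True if a PRIMARY KEY or FOREIGN KEY constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

second_profile_rejected = rejected("INSERT INTO user_profiles VALUES (1, 'second profile')")
orphan_profile_rejected = rejected("INSERT INTO user_profiles VALUES (2, 'no such user')")
print(second_profile_rejected, orphan_profile_rejected)  # True True
```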

Is string or int preferred for foreign keys?

It depends.

There are many existing discussions on the trade-offs between Natural and Surrogate Keys - you will need to decide on what works for you, and what the 'standard' is within your organisation.

In the OP's case, there is both a surrogate key (int userId) and a natural key (char or varchar username). Either column can be used as a Primary key for the table, and either way, you will still be able to enforce uniqueness of the other key.

Here are some considerations when choosing one way or the other:

The case for using Surrogate Keys (e.g. UserId INT AUTO_INCREMENT)

If you use a surrogate, (e.g. UserId INT AUTO_INCREMENT) as the Primary Key, then all tables referencing table MyUsers should then use UserId as the Foreign Key.

You can, however, still enforce uniqueness of the username column through an additional unique index, e.g.:

CREATE TABLE `MyUsers` (
`userId` int NOT NULL AUTO_INCREMENT,
`username` varchar(100) NOT NULL,
-- ... other columns
PRIMARY KEY(`userId`),
UNIQUE KEY UQ_UserName (`username`)
);

As per @Dagon, using a narrow primary key (like an int) has performance and storage benefits over using a wider (and variable length) value like varchar. This benefit also impacts further tables which reference MyUsers, as the foreign key to userid will be narrower (fewer bytes to fetch).

Another benefit of the surrogate integer key is that the username can be changed easily without affecting tables referencing MyUsers.
If username were used as a natural key and other tables were coupled to MyUsers via username, changing a username would be very inconvenient, since the foreign key relationships would otherwise be violated. If usernames must be updatable under that design, a technique such as ON UPDATE CASCADE is needed to retain data integrity.
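A minimal sketch of that natural-key variant, using SQLite via Python's sqlite3 module (MySQL's InnoDB supports the same ON UPDATE CASCADE clause). The posts table is hypothetical. Renaming the user rewrites every referencing row, which keeps integrity intact but is also the cost: the more rows reference a username, the more work a rename becomes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE MyUsers (
        username VARCHAR(100) NOT NULL PRIMARY KEY
    );
    CREATE TABLE posts (
        post_id  INTEGER PRIMARY KEY,
        username VARCHAR(100) NOT NULL
                 REFERENCES MyUsers(username) ON UPDATE CASCADE
    );
    INSERT INTO MyUsers VALUES ('alice');
    INSERT INTO posts (username) VALUES ('alice'), ('alice');
    """
)

# The rename cascades into every referencing row in posts.
conn.execute("UPDATE MyUsers SET username = 'alice_smith' WHERE username = 'alice'")
post_owners = [row[0] for row in conn.execute("SELECT username FROM posts ORDER BY post_id")]
print(post_owners)  # ['alice_smith', 'alice_smith']
```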

The case for using Natural Keys (i.e. username)

One downside of using Surrogate Keys is that other tables which reference MyUsers via a surrogate key must be JOINed back to the MyUsers table whenever the Username column is required. One potential benefit of Natural Keys is that if a query requires only the Username column from a table referencing MyUsers, it need not join back to MyUsers to retrieve the user name, which saves some I/O overhead.
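The difference can be seen side by side in a small sketch (SQLite via Python's sqlite3 module; the two comments tables are hypothetical). With the surrogate key, getting the author's name needs a JOIN to MyUsers; with the natural key, the name is already in the referencing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MyUsers (
        userId   INTEGER PRIMARY KEY,
        username VARCHAR(100) NOT NULL UNIQUE
    );
    INSERT INTO MyUsers VALUES (1, 'alice'), (2, 'bob');

    -- surrogate-key design: the comment stores only userId
    CREATE TABLE comments_by_id (comment_id INTEGER PRIMARY KEY, userId INTEGER NOT NULL);
    INSERT INTO comments_by_id VALUES (10, 2);

    -- natural-key design: the comment stores the username itself
    CREATE TABLE comments_by_name (comment_id INTEGER PRIMARY KEY, username VARCHAR(100) NOT NULL);
    INSERT INTO comments_by_name VALUES (10, 'bob');
    """
)

# Surrogate key: join back to MyUsers to get the name.
via_join = conn.execute(
    """SELECT u.username FROM comments_by_id c
       JOIN MyUsers u ON u.userId = c.userId WHERE c.comment_id = 10"""
).fetchone()[0]

# Natural key: read the name straight from the referencing row.
direct = conn.execute(
    "SELECT username FROM comments_by_name WHERE comment_id = 10"
).fetchone()[0]
print(via_join, direct)  # bob bob
```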

Does a surrogate (INT) key almost always yield better performance than an unique natural (VARCHAR) key (in MySQL)?

I would use the ISBN as the primary key.

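Primary key lookups in MySQL's default storage engine InnoDB are more efficient than lookups by secondary index: InnoDB stores each row in the clustered primary key index, and a secondary index entry holds only the primary key value, so a secondary-index lookup usually needs a second search through the primary key.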

It's true an integer takes less storage space than a 24-character varchar, but in your case, I assume you have to store the ISBN anyway. Only if you could use an integer instead of the ISBN would that save storage; adding an integer key alongside the ISBN would not.

The comment above that natural keys tend to violate uniqueness is a good warning in general. The violations usually come from the marketing department. ;-)

But for a given dataset, you can be sure that the natural key is free of duplicates. If you do experience an error reading the ISBN in your library collection, the librarian will have to resolve that manually. But I don't expect that to happen very often for 500,000 books.

Tip: Define the varchar with a binary collation, and it'll be a bit faster to do string comparisons. For example:

CREATE TABLE Books (
isbn varchar(24) COLLATE utf8mb4_bin,
-- ...other columns...
PRIMARY KEY (isbn)
) DEFAULT CHARSET=utf8mb4;
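One side effect of a binary collation worth knowing: comparisons become case-sensitive, so values differing only in letter case are distinct keys. The sketch below uses SQLite (via Python's sqlite3 module) as a stand-in, where the BINARY collation compares raw bytes roughly like utf8mb4_bin, and NOCASE behaves like a case-insensitive collation for ASCII letters. The table names and the sample ISBN-10-style value with an 'X' check digit are hypothetical; with a binary collation you would want to normalize the case before inserting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_bin    (isbn VARCHAR(24) COLLATE BINARY PRIMARY KEY);
    CREATE TABLE books_nocase (isbn VARCHAR(24) COLLATE NOCASE PRIMARY KEY);
    """
)

def accepts_both(table: str) -> bool:
    """Insert the same sample value with an upper- and lower-case 'X' check digit."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES ('080442957X')")
        conn.execute(f"INSERT INTO {table} VALUES ('080442957x')")
        return True
    except sqlite3.IntegrityError:
        return False

bin_result = accepts_both("books_bin")        # both spellings stored as distinct keys
nocase_result = accepts_both("books_nocase")  # second spelling rejected as a duplicate
print(bin_result, nocase_result)  # True False
```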
