Is Using Char as a Primary/Foreign Key a No No

Is using char as a primary/foreign key a no no?

Performance isn't really the main issue, at least not for me. The issue is more about surrogate vs natural keys.

Country codes aren't static. They can and do change. Countries change names (e.g. Zaire became the Democratic Republic of the Congo). They come into being (e.g. the breakup of Yugoslavia or the Soviet Union) and they cease to exist (e.g. West and East Germany). When this happens, the ISO standard code can change too.

More in "Name Changes Since 1990: Countries, Cities, and More".

Surrogate keys tend to be better because when these events happen the keys don't change, only columns in the reference table do.

For that reason I'd be more inclined to create country and currency tables with an int primary key instead.
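As a minimal sketch of that idea, the snippet below uses Python's built-in sqlite3 module so it can be run without a database server. The table and column names (country, customer, country_id, iso_code) are assumptions made for illustration, and Zaire's 1997 renaming (code ZR to CD) is used as the example event. The point it tries to show is that the referencing row is never touched when the code changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,          -- surrogate key, no business meaning
    iso_code   CHAR(2) NOT NULL UNIQUE,      -- natural code kept as ordinary data
    name       VARCHAR(100) NOT NULL
);
CREATE TABLE customer (                      -- hypothetical referencing table
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    country_id  INTEGER NOT NULL REFERENCES country (country_id)
);
INSERT INTO country (country_id, iso_code, name) VALUES (1, 'ZR', 'Zaire');
INSERT INTO customer (customer_id, name, country_id) VALUES (10, 'Acme', 1);
""")

# The country is renamed and recoded: only the reference table row changes.
conn.execute(
    "UPDATE country SET iso_code = ?, name = ? WHERE country_id = ?",
    ("CD", "Democratic Republic of the Congo", 1),
)

row = conn.execute("""
    SELECT cu.name, cu.country_id, co.iso_code, co.name
    FROM customer AS cu
    JOIN country AS co ON co.country_id = cu.country_id
""").fetchone()
print(row)
```

The customer row still holds country_id 1; the new code and name are picked up through the join.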

That being said, varchar key fields will use more space and have certain performance disadvantages that probably won't be an issue unless you're performing a huge number of queries.

For completeness, you may want to refer to "Database Development Mistakes Made by Application Developers".

VARCHAR as foreign key/primary key in database good or bad?

The problem with using VARCHAR for any KEY is that it can hold WHITE SPACE. White space is ANY non-printing character, such as spaces, tabs, and carriage returns. Using a VARCHAR as a key can make your life difficult when you have to hunt down why tables aren't returning records whose keys have extra spaces at the end.

Sure, you CAN use VARCHAR, but you do have to be very careful with the input and output. VARCHAR keys also take up more space and are likely slower when running queries.
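Here is a rough illustration of the trailing-space trap, again using Python's sqlite3 module. The product and product_checked tables are hypothetical. How trailing spaces compare depends on the engine and collation (some ignore them in comparisons), so treat this as SQLite's behaviour under its default collation, not as a universal rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (sku VARCHAR(10) PRIMARY KEY, title VARCHAR(50));
INSERT INTO product VALUES ('AB12 ', 'Widget');  -- note the trailing space
""")

# An exact match misses the row because 'AB12 ' is not equal to 'AB12' here.
exact = conn.execute("SELECT title FROM product WHERE sku = 'AB12'").fetchall()

# Trimming in the query finds it (TRIM strips spaces only by default).
trimmed = conn.execute(
    "SELECT title FROM product WHERE TRIM(sku) = 'AB12'"
).fetchall()

# One possible guard: reject padded keys at insert time with a CHECK constraint.
conn.execute("""
CREATE TABLE product_checked (
    sku VARCHAR(10) PRIMARY KEY CHECK (sku = TRIM(sku)),
    title VARCHAR(50)
)
""")
try:
    conn.execute("INSERT INTO product_checked VALUES ('AB12 ', 'Widget')")
    padded_key_rejected = False
except sqlite3.IntegrityError:
    padded_key_rejected = True

print(exact, trimmed, padded_key_rejected)
```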

Integer types have a small list of 10 characters that are valid, 0,1,2,3,4,5,6,7,8,9. They are a much better solution to use as keys.

You could always use an integer-based key and use VARCHAR as a UNIQUE value if you wanted to have the advantages of faster lookups.

char(32) data as primary key?

Test both structures. It's not hard.

Declare Ref.UniqueData as primary key nonclustered, and set its foreign key reference to on update cascade. Load it with several million rows of data, and measure performance. (Load it with more data than you predict you'll have in five years.)

From the relational point of view, there's nothing wrong with having a primary key that's 32 bytes long. And from the relational point of view, there's nothing wrong with updating a primary key value. In the relational model, all values are updatable, and "compensating referential actions" (cascading updates and deletes) are part of the model, too.

From the SQL point of view, there's nothing wrong with having a primary key that's 32 bytes long. SQL also allows updating key values, and SQL supports cascading updates and deletes.

From the SQL Server point of view, there's nothing wrong with having a primary key that's 32 bytes long. SQL Server supports updating key values, and SQL Server supports cascading updates and deletes. Just don't make it a clustered primary key.
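The cascading update described above can be tried on a small scale before loading millions of rows. This sketch uses SQLite through Python's sqlite3 module, not SQL Server, so the nonclustered/clustered distinction does not apply to it. Ref and UniqueData come from the answer; the Detail table and its columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE Ref (
    UniqueData CHAR(32) PRIMARY KEY
);
CREATE TABLE Detail (                       -- hypothetical referencing table
    DetailId   INTEGER PRIMARY KEY,
    UniqueData CHAR(32) NOT NULL
        REFERENCES Ref (UniqueData) ON UPDATE CASCADE
);
""")

old_key = "a" * 32
new_key = "b" * 32
conn.execute("INSERT INTO Ref VALUES (?)", (old_key,))
conn.execute("INSERT INTO Detail VALUES (1, ?)", (old_key,))

# Updating the primary key value propagates to the referencing row.
conn.execute(
    "UPDATE Ref SET UniqueData = ? WHERE UniqueData = ?", (new_key, old_key)
)
child_key = conn.execute(
    "SELECT UniqueData FROM Detail WHERE DetailId = 1"
).fetchone()[0]
print(child_key == new_key)
```

For a real comparison, the same two tables would be loaded with production-sized data and timed on the target database.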

When I designed the production database at my previous job, I built two databases--one designed around surrogate keys, and one designed around natural keys. I wrote two sets of queries that I expected to be frequently used. They included some select, insert, update, and delete statements. There were many dozens of these. The two sets were functionally identical. (I think I originally used PostgreSQL 8.4. PostgreSQL doesn't implement clustered keys.)

I ran the test queries against each database. If memory serves, about 80% of the queries were faster using natural keys. In some cases, individual SELECT statements were 35 to 40 times faster. When queries using natural keys were slower, they weren't very much slower, and they were still plenty fast enough for the users. (I've written about these tests several times on SO and on DBA.stackexchange.com.)

I found a tipping point, where the performance of surrogate keys started beating the performance of natural keys. But by my estimates, we wouldn't hit that tipping point for 30 years. And there were plenty of tuning options and hardware improvements that made it unlikely that we'd ever need to use surrogate keys, even if PostgreSQL development stopped altogether.
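The answer does not say why the natural-key queries were faster. One plausible reason, offered here only as an assumption, is that a natural key already carries the value a query wants, so some joins disappear. The sketch below (hypothetical tables, SQLite via Python's sqlite3) shows two functionally identical queries, one per design; it compares their results, not their speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Surrogate-key design: orders point at an integer id.
CREATE TABLE status_s (status_id INTEGER PRIMARY KEY,
                       code CHAR(6) NOT NULL UNIQUE);
CREATE TABLE orders_s (order_id INTEGER PRIMARY KEY,
                       status_id INTEGER NOT NULL
                           REFERENCES status_s (status_id));

-- Natural-key design: orders carry the code itself.
CREATE TABLE status_n (code CHAR(6) PRIMARY KEY);
CREATE TABLE orders_n (order_id INTEGER PRIMARY KEY,
                       code CHAR(6) NOT NULL REFERENCES status_n (code));

INSERT INTO status_s VALUES (1, 'ACTIVE'), (2, 'CLOSED');
INSERT INTO status_n VALUES ('ACTIVE'), ('CLOSED');
INSERT INTO orders_s VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO orders_n VALUES (1, 'ACTIVE'), (2, 'CLOSED'), (3, 'ACTIVE');
""")

# Surrogate design: a join is needed to get the readable code.
with_join = conn.execute("""
    SELECT o.order_id, s.code
    FROM orders_s AS o
    JOIN status_s AS s ON s.status_id = o.status_id
    ORDER BY o.order_id
""").fetchall()

# Natural design: the code is already in the referencing table.
without_join = conn.execute(
    "SELECT order_id, code FROM orders_n ORDER BY order_id"
).fetchall()

print(with_join == without_join)
```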

Char(4) as Primary key

I always tend to use a surrogate primary key in my tables.
That is, a key that has no meaning in the business domain. A primary key is just an administrative piece of data that is required by the database ...

What would be the advantage of using 'admn' as primary key in this case?

Error when creating foreign key of type CHAR with mysql workbench: Error 1005: Can't create table (errno: 150)

The issue you're facing isn't actually related to collation (though collation can be a cause of the error you're experiencing under different circumstances).

Your FOREIGN KEY constraint is failing because you don't have an index individually on record_status.status. You have that column as part of the composite PRIMARY KEY (record_status_id, status), but only as its second column. For successful foreign key constraint creation, both the referencing table and the referenced table must have an index in which the columns used in the key relationship come first, in the same order (in addition to having the same data types).

Adding the FOREIGN KEY constraint implicitly creates the necessary index on the referencing table, but you must still ensure you have the corresponding index on the referenced table.

So given what you have now, if you added a single index on record_status.status, the constraint would correctly be created.

CREATE TABLE `record_status` (
  `record_status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `status` char(6) NOT NULL,
  `status_description` varchar(15) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`record_status_id`,`status`),
  -- This would make your relationship work...
  KEY (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

However, I don't think that's the best course of action. I don't see a need for the composite primary key on (record_status_id, status), chiefly because record_status_id is itself AUTO_INCREMENT and guaranteed to be unique. That column alone could be the PRIMARY KEY, with an additional UNIQUE KEY on status to satisfy the foreign key constraint's indexing requirement. After all, it is not the combination of record_status_id and status that uniquely identifies each row (which is what would make it a primary key); each column does so on its own.

CREATE TABLE `record_status` (
  `record_status_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `status` char(6) NOT NULL,
  `status_description` varchar(15) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  -- Primary only on record_status_id
  PRIMARY KEY (`record_status_id`),
  -- Additional UNIQUE index on status
  UNIQUE KEY (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

About the design -- eliminating record_status_id...

Without knowing how the rest of your application currently uses record_status_id, I can't say for sure whether it is required by your application code. But if you wish to make the actual status value easily available to other tables, and it is merely CHAR(6), it is possible that you have no need for record_status_id as an integer value. After all, if the status string is intended to be unique, then it is perfectly capable of serving as the PRIMARY KEY on its own, without any auto-increment integer key.

In that case, your record_status table would look like below, and your FOREIGN KEY constraint would correctly be added to users.

CREATE TABLE `record_status` (
  -- Remove the auto_increment column!!
  `status` char(6) NOT NULL,
  `status_description` varchar(15) NOT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  -- Status is unique, and therefore can be the PK on its own
  PRIMARY KEY (`status`)
  -- The AUTO_INCREMENT table option is dropped along with the column
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Given this setup, here's a sample showing the successful creation of the tables and addition of the FK constraint.
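A rough stand-in for that kind of check can be run in SQLite through Python's sqlite3 module. This is not MySQL/InnoDB, and the users table here is a guess at a minimal shape (user_id and status only), so it only illustrates the relationship: status is the primary key of record_status and users.status references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE record_status (
    status             CHAR(6) NOT NULL PRIMARY KEY,
    status_description VARCHAR(15) NOT NULL
);
CREATE TABLE users (                         -- hypothetical minimal columns
    user_id INTEGER PRIMARY KEY,
    status  CHAR(6) NOT NULL,
    FOREIGN KEY (status) REFERENCES record_status (status)
);
INSERT INTO record_status VALUES ('ACTIVE', 'Active record');
INSERT INTO users VALUES (1, 'ACTIVE');      -- accepted: status exists
""")

# A status that is not in record_status should be refused by the constraint.
try:
    conn.execute("INSERT INTO users VALUES (2, 'BOGUS')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(rejected, user_count)
```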

You asked about performance implications of adding a status FK to other tables as well. It's tough to speculate on that without knowing the purpose, but if other tables share the same status values, then it makes sense to create their FK constraints to link to it in the same way you're doing with users. And if that's the case, I would recommend doing it the same way, wherein the status column is CHAR(6) (or consider changing all of them to VARCHAR(6)). The value of record_status.status still makes sense as the true primary key, and can be used as the FK in as many related tables as necessary.

In all but the most gigantic scale, there should be no appreciable performance difference between using an INT value and a CHAR(6)/VARCHAR(6) value as the foreign key. And the storage size difference between them is equally tiny. It isn't worth worrying about unless you must scale this to positively enormous proportions.

Select primary keys that do not have foreign keys or not have enough foreign keys

You can do this with a group by and having clause:

-- "char" is a reserved word in several SQL dialects, so use a different alias
select c.id
from Charge AS c
left outer join Pay AS p on p.charge_id = c.id
group by c.id, c.confirmation_to_supply
having count(p.charge_id) < c.confirmation_to_supply
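To see what the GROUP BY / HAVING approach returns, here is a small runnable check using SQLite through Python's sqlite3 module. The column layout of Charge and Pay beyond id, charge_id and confirmation_to_supply is assumed, as is the sample data. The LEFT OUTER JOIN matters: a charge with no payments still produces one group, with a count of zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Charge (
    id INTEGER PRIMARY KEY,
    confirmation_to_supply INTEGER NOT NULL   -- payments required
);
CREATE TABLE Pay (
    id INTEGER PRIMARY KEY,
    charge_id INTEGER NOT NULL REFERENCES Charge (id)
);
-- charge 1 needs 2 payments, charge 2 needs 1, charge 3 needs 0, charge 4 needs 1
INSERT INTO Charge VALUES (1, 2), (2, 1), (3, 0), (4, 1);
-- charge 1 has one payment, charge 2 has one, charges 3 and 4 have none
INSERT INTO Pay (charge_id) VALUES (1), (2);
""")

short_charges = conn.execute("""
    SELECT c.id
    FROM Charge AS c
    LEFT OUTER JOIN Pay AS p ON p.charge_id = c.id
    GROUP BY c.id, c.confirmation_to_supply
    HAVING COUNT(p.charge_id) < c.confirmation_to_supply
    ORDER BY c.id
""").fetchall()
print(short_charges)
```

With this data, charges 1 and 4 are returned: the first has too few payments and the second has none.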

Where is the wrong logic in my Create Table statements?

A foreign key should be referencing the primary key of the table it is referring to.

So I think you want:

CREATE TABLE SITE (
NOC CHAR(3),
CITY VARCHAR2(100),
SEASON VARCHAR2(20),
YEAR CHAR(4),
CONSTRAINT site_pk PRIMARY KEY(NOC),
CONSTRAINT site_country_fk FOREIGN KEY(NOC) REFERENCES OLMP_COUNTRY(NOC)
);

I have no idea why you are repeating CITY in both tables, but the foreign key constraint should be to the primary key. You can look up the city using JOIN. It should not be repeated.
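As a sketch of the JOIN lookup, the snippet below keeps CITY only in SITE and reads it through the foreign key. It runs on SQLite via Python's sqlite3 module, not Oracle, and the OLMP_COUNTRY column country_name and the sample row are assumptions, since that table's definition is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE olmp_country (
    noc          CHAR(3) PRIMARY KEY,
    country_name VARCHAR(100) NOT NULL        -- hypothetical column
);
CREATE TABLE site (
    noc    CHAR(3) PRIMARY KEY REFERENCES olmp_country (noc),
    city   VARCHAR(100),                      -- stored once, in this table only
    season VARCHAR(20),
    year   CHAR(4)
);
INSERT INTO olmp_country VALUES ('GBR', 'Great Britain');
INSERT INTO site VALUES ('GBR', 'London', 'Summer', '2012');
""")

# The city is looked up through the key relationship instead of being copied.
rows = conn.execute("""
    SELECT c.country_name, s.city, s.year
    FROM olmp_country AS c
    JOIN site AS s ON s.noc = c.noc
""").fetchall()
print(rows)
```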


