Why Is the Foreign Key Part of the Primary Key in an Identifying Relationship

Why is the foreign key part of the primary key in an identifying relationship?

Logically, this would be an identifying relationship, since a form field cannot exist without a form.

No, identifying relationship is about identification, not existence.

Any X:Y relationship where X >= 1 guarantees existence of the left side, whether identifying or not. In your case, a 1:N relationship guarantees existence of form for any given form_field. You could make it identifying or non-identifying and it would still guarantee the same.

Remarks:

  • You would model an identifying relationship by making form_field.form_id part of a key. For example form_field PK could look like: {form_id, label}, which BTW would be quite beneficial for proper clustering of your data (InnoDB tables are always clustered).
  • Just making a PK: {id, form_id} would be incorrect, since this superkey is not a candidate key (i.e. it is not minimal - we could remove form_id from it and still retain the uniqueness).
  • You would model a 0..1:N relationship by making the form_field.form_id NULL-able (but then you wouldn't be able to make it identifying as well - see below).

There are two definitions of the "identifying relationship":

  • Strict definition: A relationship that migrates parent key into child primary key1.
  • Loose definition: A relationship that migrates parent key into child key.

In other words, the loose definition allows migration into alternate key as well (and not just primary).

Most tools2 seem to use the strict definition though, so if you mark the relationship as identifying, that will automatically make the migrated attributes part of the child PK, and none of the PK attributes can be NULL.


1 Which is then either completely comprised from migrated attributes, or is a combination of migrated attributes and some additional attributes.

2 ERwin and Visio do. I haven't used MySQL Workbench for modeling yet, but your description seems to suggest it behaves the same.

Why is it necessary to indicate identifying or non-identifying relationships in an ERD?

A convention used in ER model diagrams is that referencing (foreign key) attributes are not shown at all unless they are part of a primary key. If referencing attributes are required they are supposed to be implied by the existence of a relationship line. Accordingly, there is no standard or generally agreed ER notation for foreign key attributes even when they are part of a primary key. The case where referencing attributes are needed in order to identify instances of an entity is often called out on ER diagrams by using a dotted relationship line. The motivation here is presumably that "primary" key attributes are deemed to be mandatory and significant so their dependence on other things is also significant.

If your diagram shows foreign key attributes in some other way then the distinction between identfiying/non-identifying relationships is unimportant in my view. Whatever notation you use, ultimately what matters is that your audience understands your diagram correctly.

Foreign key on identifying relationships

First, let me express my opinion that the already existent design of the information schema and the sys schema are very elaborate, in-depth and intricate compared to your question which, if we forget for a moment that its CONTENT concerns metadata, is a very simple case of table design. Thus, I doubt you need to make another schema.

To answer your question: I would create an id column on Tables_Metadata, put the PK on id alone, and add a check constraint unique (schema_name,name).

Then, The Columns_Metadata table need only have two fields: Name, and Tables_Metadata_id FK to Tables_Metadata.id

What's the difference between identifying and non-identifying relationships?

  • An identifying relationship is when the existence of a row in a child table depends on a row in a parent table. This may be confusing because it's common practice these days to create a pseudokey for a child table, but not make the foreign key to the parent part of the child's primary key. Formally, the "right" way to do this is to make the foreign key part of the child's primary key. But the logical relationship is that the child cannot exist without the parent.

    Example: A Person has one or more phone numbers. If they had just one phone number, we could simply store it in a column of Person. Since we want to support multiple phone numbers, we make a second table PhoneNumbers, whose primary key includes the person_id referencing the Person table.

    We may think of the phone number(s) as belonging to a person, even though they are modeled as attributes of a separate table. This is a strong clue that this is an identifying relationship (even if we don't literally include person_id in the primary key of PhoneNumbers).

  • A non-identifying relationship is when the primary key attributes of the parent must not become primary key attributes of the child. A good example of this is a lookup table, such as a foreign key on Person.state referencing the primary key of States.state. Person is a child table with respect to States. But a row in Person is not identified by its state attribute. I.e. state is not part of the primary key of Person.

    A non-identifying relationship can be optional or mandatory, which means the foreign key column allows NULL or disallows NULL, respectively.


See also my answer to Still Confused About Identifying vs. Non-Identifying Relationships

Expressing identifying and non-identifying relationships in MySQL

One thing that I realized the next day is that the order in which I declare my primary key matters. (moveID, genID) will sort by moveID first, then genID, whereas (genID, moveID) would sort the other way. Since I mentioned that I wanted the behavior of the latter case in my original post (picking out all move's in a given generation), as opposed to the former case, I felt that I should point out.

Identifying vs Non-Identifying Relationships (Again!!!)

You are over-thinking the linkage between optionality and identity. Until the whole thing comes more naturally to you, it's best to think of them as being completely unrelated.

About optionality, it is important to remember that the optionality is directional. To use your example of employee_equipment: Sure, employees don't need equipment. The one-to-many relationship from employee to employee_equipment is optional. At the same time, looking at it from the opposite perspective, the relationship is mandatory. You can't have a record in employee_equipment unless there is an employee to associate it with.

Identity has nothing to do with optionality, except coincidentally an identifying relationship is mandatory from the child to the parent. Whether it is also mandatory from the parent to the child is neither here nor there as far as identity is concerned.

What makes a relationship identifying is that you have to know what parent you are talking about (as well as some other things) in order to know what child you are talking about. That is, the primary key of the child must include a foreign key to the parent.

Pure intersection tables (e.g. employee_equipment) are good examples of this. The primary key of a pure intersection is the combination of the foreign keys to both parent tables. Note that some people may also add a surrogate key to these kinds of tables. It doesn't matter so much from an identity perspective if there are multiple candidate keys. What is important in determining identity is whether the foreign key is part of a candidate key, whether or not that candidate key happens to be the primary key.

Another good example would be something like a database's metadata catalog, where a column is identified by the table to which it belongs, just as the table is identified by the schema it is in, and so on. Knowing that a column is called NAME doesn't tell you which column it is. Knowing that it is the NAME column in the CUSTOMER table helps. (You'll also have to know which schema CUSTOMER is in, and so forth).

Still Confused About Identifying vs. Non-Identifying Relationships

The technical definition of an identifying relationship is that a child's foreign key is part of its primary key.

CREATE TABLE AuthoredBook (
author_id INT NOT NULL,
book_id INT NOT NULL,
PRIMARY KEY (author_id, book_id),
FOREIGN KEY (author_id) REFERENCES Authors(author_id),
FOREIGN KEY (book_id) REFERENCES Books(book_id)
);

See? book_id is a foreign key, but it's also one of the columns in the primary key. So this table has an identifying relationship with the referenced table Books. Likewise it has an identifying relationship with Authors.

A comment on a YouTube video has an identifying relationship with the respective video. The video_id should be part of the primary key of the Comments table.

CREATE TABLE Comments (
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
PRIMARY KEY (video_id, user_id, comment_dt),
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

It may be hard to understand this because it's such common practice these days to use only a serial surrogate key instead of a compound primary key:

CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

This can obscure cases where the tables have an identifying relationship.

I would not consider SSN to represent an identifying relationship. Some people exist but do not have an SSN. Other people may file to get a new SSN. So the SSN is really just an attribute, not part of the person's primary key.


Re comment from @Niels:

So if we use a surrogate key instead of a compound primary key, there is no notable difference between use identifying or non-identifying relationship ?

I suppose so. I hesitate to say yes, because we haven't changed the logical relationship between the tables by using a surrogate key. That is, you still can't make a Comment without referencing an existing Video. But that just means video_id must be NOT NULL. And the logical aspect is, to me, really the point about identifying relationships.

But there's a physical aspect of identifying relationships as well. And that's the fact that the foreign key column is part of the primary key (the primary key is not necessarily a composite key, it could be a single column which is both the primary key of Comments as well as the foreign key to the Videos table, but that would mean you can store only one comment per video).

Identifying relationships seem to be important only for the sake of entity-relationship diagramming, and this comes up in GUI data modeling tools.

Does using Identifying Many-To-One Relationship means having to have duplicate PRIMARY and FOREIGN keys on same table?

I don't see this happening if I understand your question. In the below, price_history does not have a db engine auto-created index that duplicates.

create table A
( `product_id` INT NOT NULL,
`category_id` INT NOT NULL,
`priceitem_id` INT NOT NULL,
PRIMARY KEY (`product_id`, `category_id`, `priceitem_id`)

)ENGINE=INNODB;

CREATE TABLE IF NOT EXISTS `price_history` (
`amount` DECIMAL NULL,
`date_start` DATE NULL,
`date_end` DATE NULL,
`product_id` INT NOT NULL,
`category_id` INT NOT NULL,
`priceitem_id` INT NOT NULL,
PRIMARY KEY (`product_id`, `category_id`, `priceitem_id`),
FOREIGN KEY `f` (`product_id`, `category_id`, `priceitem_id`) references A(`product_id`, `category_id`, `priceitem_id`)
)ENGINE = InnoDB;

show create table price_history;
CREATE TABLE `price_history` (
`amount` decimal(10,0) DEFAULT NULL,
`date_start` date DEFAULT NULL,
`date_end` date DEFAULT NULL,
`product_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`priceitem_id` int(11) NOT NULL,
PRIMARY KEY (`product_id`,`category_id`,`priceitem_id`),
CONSTRAINT `price_history_ibfk_1` FOREIGN KEY (`product_id`, `category_id`, `priceitem_id`)
REFERENCES `a` (`product_id`, `category_id`, `priceitem_id`)
) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `price_history` (
`amount` DECIMAL NULL,
`date_start` DATE NULL,
`date_end` DATE NULL,
`product_id` INT NOT NULL,
`category_id` INT NOT NULL,
`priceitem_id` INT NOT NULL,
PRIMARY KEY (`product_id`, `category_id`, `priceitem_id`),
FOREIGN KEY `f` (`product_id`, `category_id`, `priceitem_id`) references A(`product_id`, `category_id`, `priceitem_id`)
)ENGINE = InnoDB;

show create table A;
CREATE TABLE `a` (
`product_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`priceitem_id` int(11) NOT NULL,
PRIMARY KEY (`product_id`,`category_id`,`priceitem_id`)
) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS `price_historyBBB` (
id int auto_increment primary key,
`amount` DECIMAL NULL,
`date_start` DATE NULL,
`date_end` DATE NULL,
`product_id` INT NOT NULL,
`category_id` INT NOT NULL,
`priceitem_id` INT NOT NULL,
FOREIGN KEY `g` (`product_id`, `category_id`, `priceitem_id`) references A(`product_id`, `category_id`, `priceitem_id`)
)ENGINE = InnoDB;

show create table price_historyBBB;
CREATE TABLE `price_historybbb` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`amount` decimal(10,0) DEFAULT NULL,
`date_start` date DEFAULT NULL,
`date_end` date DEFAULT NULL,
`product_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`priceitem_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `g` (`product_id`,`category_id`,`priceitem_id`),
CONSTRAINT `price_historybbb_ibfk_1` FOREIGN KEY (`product_id`, `category_id`, `priceitem_id`)
REFERENCES `a` (`product_id`, `category_id`, `priceitem_id`)
) ENGINE=InnoDB;

show indexes from price_history;
show indexes from price_historyBBB;

So, if there exists an adequate key (say, composite) left-most chunk adequate enough for re-use, then the db engine will not auto-create a Helper index.

For instance, if you had a key (PK or otherwise) that was a composite of (col1,col2,col3,col5) and your FK called for the use of (col1,col2), then a new index is not auto-gen'd.

If the need for the FK was for (colX,col1,col2) then the above (col1,col2,col3,col5) is not useful (left-most priority) and the db engine will need to create an FK helper index.



Related Topics



Leave a reply



Submit