Polymorphism in SQL Database Tables

Polymorphism in databases

The OO structure of your data

You have identified several classes in your Character population, that are derived from the abstract role , namely Attack, Defense and Support. Each kind of role has different attributes depending on the class.

So you have clearly an OOP design in your mind and want to implement it in a database. Several design patterns could be used :

  • The easiest seems to be the single table inheritance puts all the fields in a single table. These are used/interpreted depending on the concrete role.
  • The class table inheritance puts the data related to each role (and the character itself) in a distinct table. This requires a 1:1 relation between the derived class' table (e.g. Defense) and the parent class table (here Character). This seems an overkill here
  • The concrete table inheritance merges the parent classes with the most derived classes, so you'd end up with a table per role, each having its own name field. Again, this seems an overkill here.

Classes or relations ?

There is another additional model that you could consider. It's a component like design, based on composition (in the SQL schema on relations):

  • You would have one character table with an id, name and role
  • You would have a property table with the character's id, a property-id (or name) and a value.

This could be advised if you want to be very flexible and creative and invent additional properties (e.g "has weapon A", "has weapon B", "armor strength", etc.). However if you intend to stick relatively closely to the current properties, this would be overkill again.

No-SQL

If you'd like to consider a non relational database, typically a No-SQL database, then you could consider document based databases which are perfectly suited to handle structures similar to the single inheritance table.

If you opt however on component design, then key-value stores could also be a choice, but you'd still have to assemble the pieces. That's the cost of the extra flexibility ;-)

You said polymorphism ?

Polymorphism is rather on the behavior that is related to the class rather than the data that describes the objects. As it is not question of behavior here, I guess that you'd meant the handling of the different kind of data (so it's more about classes). Let me know if I'm wrong on this point.

You should however let the polymorphism in the question, because it could help other people who are less aware of OOP terminology to find solutions to similar problems

Polymorphism in SQL database tables?

Right, the problem is you want only one object of one sub-type to reference any given row of the parent class. Starting from the example given by @Jay S, try this:

create table media_types (
media_type int primary key,
media_name varchar(20)
);
insert into media_types (media_type, media_name) values
(2, 'TV series'),
(3, 'movie');

create table media (
media_id int not null,
media_type not null,
name varchar(100),
description text,
url varchar(255),
primary key (media_id),
unique key (media_id, media_type),
foreign key (media_type)
references media_types (media_type)
);

create table tv_series (
media_id int primary key,
media_type int check (media_type = 2),
season int,
episode int,
airing date,
foreign key (media_id, media_type)
references media (media_id, media_type)
);

create table movies (
media_id int primary key,
media_type int check (media_type = 3),
release_date date,
budget numeric(9,2),
foreign key (media_id, media_type)
references media (media_id, media_type)
);

This is an example of the disjoint subtypes mentioned by @mike g.


Re comments by @Countably Infinite and @Peter:

INSERT to two tables would require two insert statements. But that's also true in SQL any time you have child tables. It's an ordinary thing to do.

UPDATE may require two statements, but some brands of RDBMS support multi-table UPDATE with JOIN syntax, so you can do it in one statement.

When querying data, you can do it simply by querying the media table if you only need information about the common columns:

SELECT name, url FROM media WHERE media_id = ?

If you know you are querying a movie, you can get movie-specific information with a single join:

SELECT m.name, v.release_date
FROM media AS m
INNER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?

If you want information for a given media entry, and you don't know what type it is, you'd have to join to all your subtype tables, knowing that only one such subtype table will match:

SELECT m.name, t.episode, v.release_date
FROM media AS m
LEFT OUTER JOIN tv_series AS t USING (media_id)
LEFT OUTER JOIN movies AS v USING (media_id)
WHERE m.media_id = ?

If the given media is a movie,then all columns in t.* will be NULL.

Polymorphic relationships vs separate tables per type

What is "table bloat"? Are you concerned about having too many tables? Many real-world databases I've worked on have between 100 and 200 tables, because that's what it takes.

If you're concerned with adding multiple tables, then why do you have separate tables for User, Appointment, and Task? If you had a multi-valued attribute for User, for example for multiple phone numbers per user, would you create a separate table for phones, or would you try to combine them all into the user table somehow? Or have a polymorphic "things that belong to other things" table for user phones, appointment invitees, and task milestones?

Answer: No, you'd create a Phone table, and use it to reference only the User table. If Appointments have invitees, that gets its own table (probably a many-to-many between appointments and users). If tasks have milestones, that gets its own table too.

The correct thing to do is to model your database tables like you would model object types in your application. You might like to read a book like SQL and Relational Theory: How to Write Accurate SQL Code 3rd Edition by C. J. Date to learn more about how tables are analogous to types.

You already know instinctively that the fact that you can't create a foreign key is a red flag. A foreign key must reference exactly one parent table. This should be a clue that it's not valid relational database design to make a polymorphic foreign key. Once you start thinking of tables and their attributes as concrete types (like described in SQL and Relational Theory), this will become obvious.

If you must create one notes table, you could make it reference one table called "Notable" which is like a superclass of User, Appointment, and Task. Then each of those three tables would also reference a primary key of Notable. This mimics the object-oriented structure of polymorphism, where you can have a class Note have a reference to an object by its superclass type.

But IMHO, that's more complex than it needs to be. I would just create separate tables for UserNotes, AppointmentNotes, and TaskNotes. I'm not troubled by having three more tables, and it makes your code more clear and maintainable.

Polymorphic Associations in SQL

Since Movies and Series have only one column that is different (boxoffice), it makes sense to put the two tables together and let boxoffice be NULL for series. I don't understand the need for Production, but it seems to add as much complexity as it saves. Try to get rid of it.

I suggest that the id for Country be the standard 2-letter codes. This will be more compact and eliminate some stuff.

The is no good reason to have an id for a connection table. See many-to-many tips . Those tips will speed up many of your queries, and save space.

"Polymorphic" and many other neat-things-in-a-textbook don't necessarily work well in Relational Databases.

How do you deal with polymorphism in a database?

There are generally three ways of mapping object inheritance to database tables.

You can make one big table with all the fields from all the objects with a special field for the type. This is fast but wastes space, although modern databases save space by not storing empty fields. And if you're only looking for all users in the table, with every type of person in it things can get slow. Not all or-mappers support this.

You can make different tables for all the different child classes with all of the tables containing the base-class fields. This is ok from a performance perspective. But not from a maintenance perspective. Every time your base-class changes all the tables change.

You can also make a table per class like you suggested. This way you need joins to get all the data. So it's less performant. I think it's the cleanest solution.

What you want to use depends of course on your situation. None of the solutions is perfect so you have to weigh the pros and cons.

Polymorphic Association - Is it always bad?

This all depends on your criteria for determining which solution is better.

The easier solution is to create a single polymorphic association table. This would be initially quicker for the programmer to produce. It fits in with well documented patterns; particularly in some of the MVC type frameworks prevalent today. However, this is actually the more complex solution as there is now a single table that encompasses many concepts. Also, correctness of the data cannot be guaranteed as referential integrity cannot be implemented.

The simpler solution is to create a single table for each relationship so that there is a single table that encompasses a single concept. It allows the use of referential integrity to guarantee correctness and quality of the data and is quicker for the maintainer of the software in assessing the impact of changes to the system. However, it is the harder solution as it requires many more tables to be created initially.

It is now down to you to make a subjective choice as to whether you want to pick the easier or simpler solution.

How to implement polymorphic associations in an existing database

You could use Option 1 but without an additional surrogate Alternate Key.

Instead, extend the existing Primary Key (of each entity), with an EntityType column (say CHAR(1), that would be E for Events, P for Persons, D for Products).

The compound (EntityId, EntityType) will become then the Primary Key of table Entity and the corresponding compounds in the other 3 subtype tables.

(The EntityType is just an auxilary, reference table, with 3 rows):

Polymorphic_Associations



Related Topics



Leave a reply



Submit