When to Use an Enum or a Small Table in a Relational Database

When to use an enum or a small table in a relational database?

Hate to answer a question with a question, but it depends. How often do you expect the values to change, and how often do you release code?

Enum types will require a code change. A pure database table will be much easier to change. Enum types are more convenient for coding.

For infrequent releases, or if you often have new/deleted/changed values, use a database table. For static sets of values, or if you release code all the time, use an enum.

SQL: Advantages of an ENUM vs. a one-to-many relationship?

A disadvantage of using something like an ENUM is that you can't get a list of all the available values if they don't happen to exist in your data table, unless you hard-code the list of available values somewhere. For example, if in your OFFICERS table you don't happen to have an MG on post there's no way to know the rank exists. Thus, when BG Blowhard is relieved by MG Marjorie-Banks you'll have no way to enter the new officer's rank - which is a shame, as he is the very model of a modern Major General. :-) And what happens when a General of the Army (five-star general) shows up?

For simple types which will not change I've used domains successfully. For example, in one of my databases I've got a yes_no_domain defined as follows:

CREATE DOMAIN yes_no_dom
AS character(1)
DEFAULT 'N'::bpchar
NOT NULL
CONSTRAINT yes_no_dom_check
CHECK ((VALUE = ANY (ARRAY['Y'::bpchar, 'N'::bpchar])));

Share and enjoy.

Postgresql enum what are the advantages and disadvantages?

The advantages of enums are:

  • Performance is better. You can just display what you get out of the core table instead of either having a separate lookup table that translates a code to a value or having app logic that translates a code to a value. This can be especially useful in datawarehouse applications.
  • Ad hoc SQL is easier to write

The disadvantages are:

  • Encoding display values into your database ddl is bad form. If you translate the enum value into a different display value in your app code, then you lost a lot of the advantages of using enums.
  • Adding values requires DDL changes
  • Makes language localization difficult
  • Database portability is decreased

Using an enum in Java as an alternative to a small database

I can only agree with the comments so far. A list of countries may change but it will always be small enough to keep the whole list in memory. Any difference in performance doesn't really matter if it is only queried once and then cached.

It probably depends on who is using the software (personal use or a large company) but I would say the biggest concern here is #2, not #3. In code vs. in database should be decided based on who might has to change the values later. Are you really ok that you need a developer to change a value, recompile everything and roll out a new version or patch of the software? Just updating a text file or the database will be much easier.

If it's a personal project that argument probably doesn't matter for you, but even then a text file (e.g. in CSV format) might be the easier to maintain than code. I would still recommend to use a simple database - just for the learning experience. You don't need to setup a huge database system like MySQL. You could use a small embeddable database like h2 or Apache Derby.


Just for reference how often country codes and names can change:

ISO 3611-1 (which defines the language and country codes for existing countries) has been changed 17 times in the years 2007-2014. That's about 2.4 times per year.

What are the implications of storing enum values as string in relational database(SQLServer)

  1. I suppose that there is a finite amount of Status values, so validation in DB can be done using the CHECK constraint. Foreign Key constraint does pretty much the same but using values from the referenced table and locks on them.

    ALTER TABLE Products
    ADD CONSTRAINT CH_PRODUCTS_STATUS CHECK
    (
    Status = 'Production' OR Status = 'Service'
    )
    GO

  2. Yes. More storage anyway. Int value will take 4 bytes, while text value 'Services' will take 8 bytes in varchar datatype and 16 bytes in nvarchar. There is a page compression option in Enterprise Edition (in Standard also since SQL Server 2016 SP1) https://msdn.microsoft.com/en-us/library/cc280464.aspx, As you see her and as far as I understand this compression algorithm, the difference between int and varchar/nvarchar compression will be the header size, so if you have a small finite options for the Status value, compression will give you pretty equal results. I have created two tables (clustered index) with the same structure but with deferent types for the Status column, and insert 100K rows in each of them.

    CREATE TABLE Products
    (
    Id int IDENTITY PRIMARY KEY,
    Product varchar(200) NOT NULL,
    Status varchar(50) NOT NULL,
    Date datetime NOT NULL
    )
    GO

    CREATE TABLE Products2
    (
    Id int IDENTITY PRIMARY KEY,
    Product varchar(200) NOT NULL,
    Status int NOT NULL,
    Date datetime NOT NULL
    )
    GO

    INSERT INTO Products VALUES ('5656', 'Service', GETDATE())
    GO 100000

    INSERT INTO Products2 VALUES ('5656', 1, GETDATE())
    GO 100000

Here is some statistics for them:

Products Table Size: 543 pages (4.2 MB)
Products2 Table Size: 482 pages (3.7 MB)
Products Compressed (page compression) Table Size: 173 pages
Products2 Compressed (page compression) Table Size: 173 pages

Note: varchar datatype used in this demo, nvarchar will require twice more capacity than varchar.


  1. The same size consideration as it was in the previous question. Here is the script for indexes creation:

    CREATE INDEX IX_PRODUCTS_STATUS
    ON Products (Status)
    GO

    CREATE INDEX IX_PRODUCTS2_STATUS
    ON Products2 (Status)
    GO

Products Index Size: 260 pages
Products2 Index Size: 174 pages (3.7 MB)
Products Index (page compression) Table Size: 93 pages
Products2 Index(page compression) Table Size: 93 pages


  1. Here is the main point of consideration, why I have described previous two points with a lot of details. Storing strings in the same table require more resources:

    • Disk space
    • Processor time and disk IO for reading data from the disk
    • RAM for reading and cache the data

Available RAM can be a good indicator for storing the status in the Products table or move it to the Statuses table because reading from RAM is still much faster than reading from Disk. The size difference can be calculated using the length of the values of the status column. If size difference is significant and SQL Server will not be able to hold active data in RAM, he will read data from disk and flush out other, probably important, data from RAM. In this case, it will make sense to move statuses values to the separate table or enable compression (if this option is available).

But at the same time, if you will have separate Statuses table, probably it will make sense to create a foreign key from Products to Statuses table. In this case, data modification operations will be slow a bit by the lookup to the Statuses table. Also, you will have +1 join, and small CPU overhead for the physical join operation (and usually cabe disregarded).

Should I store data in db or should I store it in enum?

I think it is better store it in a database's table.

If you store it in a enum, when you add, change or delete a fruit you have to compile your code again. In case of change, you will have to update all places (tables columns) where this fruit's name is store.

In the database you could opt to save it as a single column table or a two columns table (id, fruit_name).

I will opt the second option. Because if you choose the single column option, and you change a fruit name you will have to change it in all the tables where the fruit name is store (like in enum). A disadvantage with the two columns table is that you will have to make joins with the fruit table to get the fruit name (but make this joins is easy and simple).

Best way to store constants whether enums or database tables keeping internationalization in mind

While this Question is likely too broad for Stack Overflow, a few thoughts.

Enums

You may not fully understand the enum facility in Java. See the Oracle Tutorial, and see the Enum class doc.

An enum is a class, a regular Java class, a subclass of Enum. The only thing special is that syntactic sugar that automatically instantiates the static instances you define and name. Otherwise, they are normal classe:

  • Your enums can carry member variables.
  • Your enums can have constructors, and you can pass arguments to those constructors.
  • Your enums can offer other methods, and you can pass arguments to those methods.
  • You can even pass instances of one enum as arguments to methods of another enum’s instances – just as you might pass instances of an enum to instances of other non-enum classes. Each enum instance is just an object, plain and simple, saved as a static reference on the enum-defining class.

Example:

public enum Food { HERBIVORE, OMNIVORE, CARNIVORE ; }  // Syntactic sugar for automatically instantiating these named static instances of this class type.

…and…

public enum Animal {
RABBIT( Food.HERBIVORE ) ,
DOG( Food.OMNIVORE ) ,
CAT( Food.CARNIVORE ) ;

// Member variables.
public Food eats ;

// Constructor
Animal( Food foodType ) {
this.eats = foodType ; // Assign an instance of another enum to this instance of this enum.
}

}

Limitations of enums

While more powerful and useful than in other languages, there are limitations.

Compile-time

Firstly, enums are defined at compile-time. If your values change at runtime, perhaps you want to add or delete items, then enums are not appropriate.

Permanently in memory

Also, enums are static. This means when first used, all the objects of that enum class are instantiated immediately and held in memory throughout the execution of your app. So they are never retired from memory until program ends. So having an enormous number of them might be a burden on memory.

Understand that your can collect enum instances. See the EnumSet and EnumMap classes for fast-to-execute and low-memory usage collections of enum instances. Search Stack Overflow for much coverage on this topic. And be aware that every enum carries a values() method that returns an array of its values, yet this method is mysteriously not listed in the JavaDoc.

As for your mention inheritance, your enums by definition are subclasses of Enum class. So they cannot inherit from any other class you may have in mind, as Java does not support multiple-inheritance. Your enums can implement one or more interfaces. In later version of Java, an inheritance can carry implementation code by way of new default methods, so you can pass along some code that way.

Internationalization

Internationalization and localization seems to be an orthogonal issue. You can add a method on your enum to generate localized String representation of their value. As an example, see DayOfWeek::getDisplayName and Month::getDisplayName enum methods.

Database

If you want to dynamically define your values at runtime, or you have zillions of them, then a database is the way to go. A serious database such as Postgres is designed to manage memory, handle concurrency, and execute efficiently.

You can even combine enums with the database. For example, localization. You might have enum values defined at compile-time, but their getDisplayName method does a lookup into a database to find the French or Arabic translation. That translation value in the database can be updated during runtime by running SQL INSERT or UPDATE commands via JDBC.

Recursive hierarchical relationships

If you are trying to represent relationships of a hierarchy of arbitrary depth, that is a whole other topic I'll not address here other than to say that is often implemented with recursion. Search Stack Overflow and other sources to learn more.

MySQL enum vs. set

As the MySQL documentation states:

Definition of a ENUM or SET column does act as a constraint on values
entered into the column. An error occurs for values that do not
satisfy these conditions:

An ENUM value must be one of those listed in the column definition, or
the internal numeric equivalent thereof. The value cannot be the error
value (that is, 0 or the empty string). For a column defined as
ENUM('a','b','c'), values such as '', 'd', or 'ax' are illegal and are
rejected.

A SET value must be the empty string or a value consisting only of the
values listed in the column definition separated by commas. For a
column defined as SET('a','b','c'), values such as 'd' or 'a,b,c,d'
are illegal and are rejected.



Related Topics



Leave a reply



Submit