Any Disadvantages to Bit Flags in Database Columns

Any disadvantages to bit flags in database columns?

If you only have a handful of roles, you don't even save any storage space in PostgreSQL. An integer column uses 4 bytes, a bigint 8 bytes. Both may require alignment padding:

  • Making sense of Postgres row sizes
  • Calculating and saving space in PostgreSQL

A boolean column uses 1 byte. Effectively, you can fit four or more boolean columns for one integer column, eight or more for a bigint.

Also take into account that NULL values only use one bit (simplified) in the NULL bitmap.

Individual columns are easier to read and index. Others have commented on that already.

You could still utilize indexes on expressions or partial indexes to circumvent problems with indexes ("non-sargable"). Generalized statements like:

database cannot use indexes on a query like this

or

These conditions are non-SARGable!

are not entirely true, though they may hold for some other RDBMS lacking these features.
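A minimal sketch of both index types on a flags column follows. It uses Python's built-in sqlite3 module as a stand-in, because SQLite also supports indexes on expressions and partial indexes; PostgreSQL syntax and planner behavior will differ in details. The table, the index names, and the bit assignments are assumptions made up for illustration, and whether the planner actually picks either index depends on the engine and the data.

```python
import sqlite3

# Hypothetical bit assignments, for illustration only.
VIEW, WRITE, EXECUTE = 1, 2, 4

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, flags INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (name, flags) VALUES (?, ?)",
    [("ann", VIEW), ("bob", VIEW | WRITE), ("cy", VIEW | WRITE | EXECUTE)],
)

# Index on an expression: indexes the result of masking the WRITE bit.
conn.execute("CREATE INDEX users_write_expr ON users ((flags & 2))")

# Partial index: only rows with the EXECUTE bit set are indexed at all.
conn.execute(
    "CREATE INDEX users_execute_partial ON users (name) WHERE (flags & 4) <> 0"
)

# The predicates repeat the indexed expressions literally, which is what
# gives the planner a chance to match them against the indexes.
writers = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE (flags & 2) = 2 ORDER BY name"
    )
]
executors = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE (flags & 4) <> 0 ORDER BY name"
    )
]
```

Each flag you want to search on needs its own expression or partial index, which is part of why separate boolean columns are the simpler route.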

But why circumvent when you can avoid the problem altogether?

As you have clarified, we are talking about 6 distinct types (maybe more). Go with individual boolean columns. You'll probably even save space compared to one bigint. Space requirement seems immaterial in this case.


If these flags were mutually exclusive, you could use one column of type enum or a small look-up table and a foreign key referencing it. (Ruled out in question update.)

Advantages / disadvantages of using a bitwise enum?

The difference is that Flags (bitwise enum) can include more than one State at once.

If you have a Rating system you could use Flags:

[Flags]
public enum Rating
{
    Normal = 0,    // 00000000
    Great = 1,     // 00000001
    Super = 2,     // 00000010
    Mega = 4,      // 00000100
    Legendary = 8  // 00001000
}

Flags are checked bit by bit, so every value other than zero has to be a distinct power of two. Pay attention to this.

Normal enums don't care about bits. Each value is simply unique:

public enum State
{
    Normal = 0,
    Great = 1,
    Super = 2,
    Mega = 3,
    Legendary = 4
}

Now you can check for one State only, not for multiple States.
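To make the difference concrete, here is a rough Python equivalent of the two C# enums above, using enum.IntFlag for the bitwise version and a plain Enum for the other. It is only a sketch of the idea, not a translation of how .NET handles [Flags]; the variable names are made up for illustration.

```python
from enum import Enum, IntFlag


class Rating(IntFlag):
    """Bitwise enum: every non-zero value is a distinct power of two."""
    NORMAL = 0
    GREAT = 1
    SUPER = 2
    MEGA = 4
    LEGENDARY = 8


class State(Enum):
    """Plain enum: values are unique labels and cannot be combined."""
    NORMAL = 0
    GREAT = 1
    SUPER = 2
    MEGA = 3
    LEGENDARY = 4


# A flags value can hold several members at once.
combined = Rating.GREAT | Rating.MEGA
has_mega = Rating.MEGA in combined
has_super = Rating.SUPER in combined

# The combination round-trips through a plain integer, e.g. a DB column.
stored = int(combined)
restored = Rating(stored)

# A plain enum does not support the | operator at all.
try:
    State.GREAT | State.MEGA
    plain_enum_combines = True
except TypeError:
    plain_enum_combines = False
```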

In your case of user roles you could use Flags, because the Admin will have the permissions of the User and the Moderator, and so on.

[Flags]
public enum Role
{
    None = 0,     // 00000000
    View = 1,     // 00000001
    Write = 2,    // 00000010
    Execute = 4,  // 00000100
}

You can define the combinations in the enum definition:

[Flags]
public enum Role
{
    None = 0,                          // 00000000
    View = 1,                          // 00000001
    Write = 2,                         // 00000010
    Execute = 4,                       // 00000100
    ViewWrite = (View | Write),        // 00000011
    ViewExecute = (View | Execute),    // 00000101
    WriteExecute = (Write | Execute),  // 00000110
    All = (View | Write | Execute)     // 00000111
}

or in the code:

Role role = Role.View | Role.Write;
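Granting, revoking, and checking roles then comes down to a few bitwise operations. The following is a small Python sketch of that, again with enum.IntFlag standing in for the C# [Flags] enum; the has_role helper and the variable names are hypothetical.

```python
from enum import IntFlag


class Role(IntFlag):
    NONE = 0
    VIEW = 1
    WRITE = 2
    EXECUTE = 4
    ALL = VIEW | WRITE | EXECUTE  # 7, every bit set


def has_role(granted: Role, required: Role) -> bool:
    """Return True if every bit in `required` is also set in `granted`."""
    return (granted & required) == required


role = Role.VIEW | Role.WRITE

# Grant a role by OR-ing its bit in.
role_with_execute = role | Role.EXECUTE

# Revoke a role by AND-ing with the inverted bit.
role_without_write = role & ~Role.WRITE

can_view_and_write = has_role(role, Role.VIEW | Role.WRITE)
can_execute = has_role(role, Role.EXECUTE)
```

Comparing with `== required` rather than `!= 0` matters when `required` has more than one bit set: the former demands all of them, the latter only one.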

Here you can find a great article about what to use.

How to handle a few dozen flags in a database

There are several common solutions:

  • EAV

    Store one flag per row in a child table, with a reference to the user row, the name of the flag, and the value. Disadvantages: Can't guarantee a row exists for each flag. Need to define another lookup table for flag names. Reconstituting a User record with all its flags is a very costly query (requires a join per flag).

  • Bit field

    Store one flag per bit in a single long binary column. Use bitmasking in application code to interpret the flags. Disadvantages: Artificial limit on number of flags. Hard to drop a flag when it becomes obsolete. Harder to change flag values, search for specific flag values, or aggregate based on flag values without resorting to confusing bitwise operators.

  • Normalized design

    Store one BIT column per flag, all in the Users table. Most "correct" design from the perspective of relational theory and normalization. Disadvantages: Adding a flag requires ALTER TABLE ADD COLUMN. Also, you might exceed the number of columns or row size supported by your brand of RDBMS.
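The join-per-flag cost of EAV is easiest to see next to the one-column-per-flag layout. The sketch below uses Python's sqlite3 module with an in-memory database; all table and column names (users, user_flags, users_flat, is_admin, is_active) are invented for the example, and SQLite stores the flags as integers because it has no BIT type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- EAV: one row per (user, flag).
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_flags (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    flag_name  TEXT    NOT NULL,
    flag_value INTEGER NOT NULL,
    PRIMARY KEY (user_id, flag_name)
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
-- bob has no 'is_admin' row: nothing guarantees a row exists per flag.
INSERT INTO user_flags VALUES
    (1, 'is_admin', 1), (1, 'is_active', 1), (2, 'is_active', 1);

-- One column per flag, all in the user table.
CREATE TABLE users_flat (
    id        INTEGER PRIMARY KEY,
    name      TEXT    NOT NULL,
    is_admin  INTEGER NOT NULL DEFAULT 0,
    is_active INTEGER NOT NULL DEFAULT 0
);
INSERT INTO users_flat VALUES (1, 'ann', 1, 1), (2, 'bob', 0, 1);
""")

# Rebuilding a full user record from EAV takes one join per flag,
# plus COALESCE to paper over missing rows.
eav_rows = conn.execute("""
    SELECT u.name,
           COALESCE(a.flag_value, 0) AS is_admin,
           COALESCE(b.flag_value, 0) AS is_active
    FROM users u
    LEFT JOIN user_flags a ON a.user_id = u.id AND a.flag_name = 'is_admin'
    LEFT JOIN user_flags b ON b.user_id = u.id AND b.flag_name = 'is_active'
    ORDER BY u.id
""").fetchall()

# The column-per-flag table returns the same record with no joins.
flat_rows = conn.execute(
    "SELECT name, is_admin, is_active FROM users_flat ORDER BY id"
).fetchall()
```

With a few dozen flags the EAV query grows by one join per flag, while the flat query only grows its column list.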

What's the optimal way to store binary flags / boolean values in each database engine?

This answer is for ISO/IEC/ANSI Standard SQL, and includes the better freeware pretend-SQLs.

First problem is you have identified two Categories, not one, so they cannot be reasonably compared.

A. Category One

(1) (4) and (5) contain multiple possible values and are one category. All can be easily and effectively used in the WHERE clause. They have the same storage so neither storage nor read performance is an issue. Therefore the remaining choice is simply based on the actual Datatype for the purpose of the column.

ENUM is non-standard; the better or standard method is to use a lookup table; then the values are visible in a table, not hidden, and can be enumerated by any report tool. The read performance of ENUM will suffer a small hit due to the internal processing.

B. Category Two

(2) and (3) are Two-Valued elements: True/False; Male/Female; Dead/Alive. That category is different to Category One. Its treatment, both in your data model and in each platform, is different. BOOLEAN is just a synonym for BIT; they are the same thing. Legally (SQL-wise) they are handled the same by all SQL-compliant platforms, and there is no problem using them in the WHERE clause.

The difference in performance depends on the platform. Sybase and DB2 pack up to 8 BITs into one byte (not that storage matters here), and map the power-of-two on the fly, so performance is really good. Oracle does different things in each version, and I have seen modellers use CHAR(1) instead of BIT, to overcome performance problems. MS was fine up to 2005 but they have broken it with 2008, as in the results are unpredictable; so the short answer may be to implement it as CHAR(1).

Of course, the assumption is that you do not do silly things such as pack 8 separate columns in to one TINYINT. Not only is that a serious Normalisation error, it is a nightmare for coders. Keep each column discrete and of the correct Datatype.

C. Multiple Indicator & Nullable Columns

This has nothing to do with, and is independent of, (A) and (B). What the column's correct Datatype is, is separate to how many you have and whether it is Nullable. Nullable means (usually) the column is optional. Essentially you have not completed the modelling or Normalisation exercise. The Functional Dependencies are ambiguous. If you complete the Normalisation exercise, there will be no Nullable columns, no optional columns; either they clearly exist for a particular relation, or they do not exist. That means using the ordinary Relational structure of Supertype-Subtypes.

Sure, that means more tables, but no Nulls. Enterprise DBMS have no problem with more tables or more joins; that is what they are optimised for. Normalised databases perform much better than unnormalised or denormalised ones, and they can be extended without "re-factoring". You can ease the use by supplying a View for each Subtype.

If you want more information on this subject, look at this question/answer. If you need help with the modelling, please ask a new question. At your level of questioning, I would advise that you stick with 5NF.

D. Performance of Nulls

Separately, if performance is important to you, then exclude Nulls. Each Nullable column is stored as variable length; that requires additional processing for each row/column. The enterprise databases use a "deferred" handling for such rows, to allow the logging, etc. to move through the queues without impeding the fixed rows. In particular, never use variable length columns (that includes Nullable columns) in an Index: that requires unpacking on every access.

E. Poll

Finally, I do not see the point in this question being a poll. It is fair enough that you will get technical answers, and even opinions, but polls are for popularity contests, and the technical ability of responders at SO covers a very wide range, so the most popular answers and the most technically correct answers are at two different ends of the spectrum.

What are the disadvantages of using a flags enum for permissions?

Three immediate disadvantages:

  • Flags can only contain as many items as there are available bits.
  • Querying from the database now gets a little more annoying. Well, only if you are using SQL manually (a join onto the roles table to determine membership reads a lot nicer).
  • When viewing the data not as flags, is anyone going to remember what a value of 1 in the fourth bit means?

Make life easy and go with a separate list. An "is assigned to" check on a collection could very nicely boil down to myPermissions.Contains(new Permission("CanEdit")). You could then use various conversion routines to convert hardcoded values such as enums or strings into object representations of permissions, to achieve myPermissions.Contains("CanEdit") etc.
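As a rough illustration of the separate-list idea, here is a Python sketch in which a membership test accepts either a permission object or a bare string. The Permission and PermissionSet classes are hypothetical stand-ins for whatever types you would actually define; they are not taken from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """Hypothetical value object; frozen makes it hashable."""
    name: str


class PermissionSet:
    """A collection of permissions that also accepts plain strings."""

    def __init__(self, items):
        self._items = frozenset(self._convert(item) for item in items)

    @staticmethod
    def _convert(item):
        # Conversion routine: turn a hardcoded string into a Permission.
        return Permission(item) if isinstance(item, str) else item

    def __contains__(self, item):
        return self._convert(item) in self._items


my_permissions = PermissionSet(["CanView", "CanEdit"])

can_edit = Permission("CanEdit") in my_permissions
can_view = "CanView" in my_permissions
can_delete = "CanDelete" in my_permissions
```

Unlike a bit position, the permission name stays readable both in code and in the stored data, and the number of permissions is not capped by the width of an integer.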

This says nothing about the performance impact of choosing flags over separate tables or vice versa; I've no idea what sort of usage you are looking at.

Why use Y/N instead of a bit field in Microsoft SQL Server?

I've seen this practice in older database schemas quite often. One advantage I've seen is that using CHAR(1) fields provides support for more than Y/N options, like "Yes", "No", "Maybe".

Other posters have mentioned that Oracle might have been used. The schema I referred to was in-fact deployed on Oracle and SQL Server. It limited the usage of data types to a common subset available on both platforms.

They did diverge in a few places between Oracle and SQL Server but for the most part they used a common schema between the databases to minimize the development work needed to support both DBs.



