SQL performance of a lookup table
Two reasons:
- You have a lookup table to remove data modification anomalies. That is, when lookup data changes, you change it in one place only. With an enum, you have to compile and release instead.
- RDBMSs are designed to JOIN. An enum is still a JOIN, just in client code.
Note:
You should not put every lookup into a single table, which is the "One True Lookup Table" (OTLT) anti-pattern. Store only one entity per table.
- Common Lookup Tables
- OTLT and EAV: the two big design mistakes all beginners make
(Added Dec 2011):
- How to ensure you have the right lookup value in the right table?
- You will have more than one DB client at some point; don't obfuscate the data with enums.
On DBA.SE, there is no support for Enums or OTLTs:
- https://dba.stackexchange.com/q/6962/630
- https://dba.stackexchange.com/q/6987/630
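The point about one entity per table, and about getting the right lookup value in the right table, can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table and column names (order_status, payment_type, orders) are made up for the example and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one lookup table per entity, each referenced by its own FK.
conn.executescript("""
    CREATE TABLE order_status (
        status_code CHAR(2) PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE payment_type (
        type_code CHAR(2) PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        status_code CHAR(2) NOT NULL REFERENCES order_status (status_code),
        type_code   CHAR(2) NOT NULL REFERENCES payment_type (type_code)
    );
    INSERT INTO order_status VALUES ('O', 'Open'), ('C', 'Closed');
    INSERT INTO payment_type VALUES ('CC', 'Credit card'), ('BT', 'Bank transfer');
""")


def try_insert(status_code, type_code):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (status_code, type_code) VALUES (?, ?)",
                (status_code, type_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("O", "CC")        # valid status, valid payment type
rejected = try_insert("CC", "O")  # values swapped: each FK rejects the wrong kind
print(ok, rejected)
```

With a single OTLT, both columns would have to reference the same table, so a plain foreign key could not tell a status code from a payment type code.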
How important are lookup tables?
The answer depends a little on whether you are limited to freeware such as PostgreSQL (not fully SQL compliant), or whether you are thinking about SQL (i.e. SQL compliant) and large databases.
In SQL compliant, Open Architecture databases, where there are many apps using one database, and many users using different report tools (not just the apps) to access the data, standards, normalisation, and open architecture requirements are important.
Despite the people who attempt to change the definition of "normalisation", etc. to suit their ever-changing purpose, Normalisation (the science) has not changed.
- If you have data values such as {Open; Closed; etc} repeated in data tables, that is data duplication, a simple Normalisation error: if those values change, you may have to update millions of rows, which is a very limited design.
  - Such values should be Normalised into a Reference or Lookup table, with a short CHAR(2) PK:
        O   Open
        C   Closed
        U   [NotKnown]
  - The data values {Open; Closed; etc} are no longer duplicated in the millions of rows. It also saves space.
  - The second point is ease of change: if Closed were changed to Expired, again, one row needs to be changed, and that is reflected in the entire database; whereas in the un-normalised files, millions of rows need to be changed.
  - Adding new data values, e.g. (H, HalfOpen), is then simply a matter of inserting one row.
- In Open Architecture terms, the Lookup table is an ordinary table. It exists in the [SQL compliant] catalogue; as long as the FOREIGN KEY relation has been defined, the report tool can find that as well.
- ENUM is non-SQL; do not use it. In SQL, the "enum" is a Lookup table.
- The next point relates to the meaningfulness of the key.
  - If the Key is meaningless to the user, fine, use an {INT; BIGINT; GUID; etc} or whatever is suitable; do not number them incrementally; allow "gaps".
  - But if the Key is meaningful to the user, do not use a meaningless number; use a meaningful Relational Key.
- Now some people will get into tangents regarding the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK (not "immutable", because no such thing exists, and a system-generated key does not provide row uniqueness).
  - {M, F} are unlikely to change.
  - If you have used {0, 1, 2, 4, 6}, well, don't change it; why would you want to? Those values were supposed to be meaningless, remember; only a meaningful Key needs to be changed.
  - If you do use meaningful keys, use short alphabetic codes that developers can readily understand (and infer the long description from). You will appreciate this only when you code SELECT and realise you do not have to JOIN every Lookup table. Power users, too, appreciate it.
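The Lookup table with a short CHAR(2) PK, and the "one row to change" argument, might look something like this. It is a minimal sketch in Python's sqlite3 module rather than any particular server product; the posts table and its columns are assumed here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
    -- The Lookup table: short, meaningful CHAR(2) key plus the long description.
    CREATE TABLE status (
        status_code CHAR(2) PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Hypothetical data table: stores only the short key, not the description.
    CREATE TABLE posts (
        post_id     INTEGER PRIMARY KEY,
        status_code CHAR(2) NOT NULL REFERENCES status (status_code)
    );
    INSERT INTO status VALUES ('O', 'Open'), ('C', 'Closed'), ('U', '[NotKnown]');
""")
conn.executemany(
    "INSERT INTO posts (status_code) VALUES (?)",
    [("O",), ("C",), ("C",), ("U",)],
)

# Renaming Closed to Expired touches exactly one row in the Lookup table,
# however many posts carry the code 'C'.
cur = conn.execute("UPDATE status SET name = 'Expired' WHERE status_code = 'C'")
rows_changed = cur.rowcount

# Adding a new value is a single insert.
conn.execute("INSERT INTO status VALUES ('H', 'HalfOpen')")

names = [
    row[0]
    for row in conn.execute("""
        SELECT s.name
        FROM posts p
        JOIN status s ON s.status_code = p.status_code
        ORDER BY p.post_id
    """)
]
print(rows_changed, names)
```

Every post that carried 'C' now reports Expired, although no row in posts was updated.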
Since PKs are stable, particularly in Lookup tables, you can safely code:
    WHERE status_code = 'O' -- Open
You do not have to JOIN the Lookup table and obtain the data value Open; as a developer, you are supposed to know what the Lookup PKs mean.
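To make that concrete, here is a small sketch (Python sqlite3, with an assumed posts table) comparing the filter on the meaningful key with the equivalent query that joins the Lookup table and filters on the description. Both are expected to return the same rows; only the second needs the JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE status (
        status_code CHAR(2) PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Hypothetical data table for the example.
    CREATE TABLE posts (
        post_id     INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        status_code CHAR(2) NOT NULL REFERENCES status (status_code)
    );
    INSERT INTO status VALUES ('O', 'Open'), ('C', 'Closed');
    INSERT INTO posts (title, status_code) VALUES
        ('first', 'O'), ('second', 'C'), ('third', 'O');
""")

# Filter directly on the meaningful key: no JOIN needed.
without_join = conn.execute("""
    SELECT post_id
    FROM posts
    WHERE status_code = 'O' -- Open
    ORDER BY post_id
""").fetchall()

# The same question asked through the Lookup table's description.
with_join = conn.execute("""
    SELECT p.post_id
    FROM posts p
    JOIN status s ON s.status_code = p.status_code
    WHERE s.name = 'Open'
    ORDER BY p.post_id
""").fetchall()

print(without_join == with_join, without_join)
```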
Last, if the database were large, and supported BI or DSS or OLAP functions in addition to OLTP (as properly Normalised databases can), then the Lookup table is actually a Dimension or Vector, in Dimension-Fact analyses. If it was not there, then it would have to be added in, to satisfy the requirements of that software, before such analyses can be mounted.
- If you do that to your database from the outset, you will not have to upgrade it (and the code) later.
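As a rough illustration of the Lookup table acting as a Dimension, the sketch below counts rows per status, driving the query from the Lookup table so that values with no rows still appear. It uses Python's sqlite3 module and an assumed posts table; a real BI tool would generate its own SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE status (
        status_code CHAR(2) PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Hypothetical fact-like table.
    CREATE TABLE posts (
        post_id     INTEGER PRIMARY KEY,
        status_code CHAR(2) NOT NULL REFERENCES status (status_code)
    );
    INSERT INTO status VALUES ('O', 'Open'), ('C', 'Closed'), ('U', '[NotKnown]');
    INSERT INTO posts (status_code) VALUES ('O'), ('O'), ('C');
""")

# The Lookup table supplies the full list of dimension values;
# the LEFT JOIN keeps values that currently have no posts.
per_status = conn.execute("""
    SELECT s.name, COUNT(p.post_id) AS post_count
    FROM status s
    LEFT JOIN posts p ON p.status_code = s.status_code
    GROUP BY s.status_code, s.name
    ORDER BY s.status_code
""").fetchall()

print(per_status)
```

Without the Lookup table, the '[NotKnown]' row could not be reported at all, because nothing in posts mentions it.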
Your Example
SQL is a low-level language, thus it is cumbersome, especially when it comes to JOINs. That is what we have, so we need to just accept the encumbrance and deal with it. Your example code is fine. But simpler forms can do the same thing.
A report tool would generate:
SELECT p.*,
s.name
FROM posts p,
status s
WHERE p.status_id = s.status_id
AND p.status_id = 'O'
Another Example
For banking systems, where we use short codes which are meaningful (since they are meaningful, we do not change them with the seasons, we just add to them), given a Lookup table such as (carefully chosen, similar to ISO Country Codes):
    Eq     Equity
    EqCS   Equity/Common Share
    OTC    OverTheCounter
    OF     OTC/Future
Code such as this is common:
    WHERE InstrumentTypeCode LIKE 'Eq%'
And the users of the GUI would choose the value from a drop-down that displays {Equity/Common Share; Over The Counter}, not {Eq; OTC; OF}, not {M; F; U}.
Without a lookup table, you can't do that, either in the apps, or in the report tool.
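The two uses described above, the prefix filter on a meaningful code and the drop-down fed from the Lookup table, could be sketched like this. The example uses Python's sqlite3 module; the instrument and instrument_type table names are assumptions, and only the four codes come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE instrument_type (
        InstrumentTypeCode CHAR(4) PRIMARY KEY,
        Name               TEXT NOT NULL
    );
    -- Hypothetical data table referencing the Lookup table.
    CREATE TABLE instrument (
        InstrumentId       INTEGER PRIMARY KEY,
        InstrumentTypeCode CHAR(4) NOT NULL
            REFERENCES instrument_type (InstrumentTypeCode)
    );
    INSERT INTO instrument_type VALUES
        ('Eq',   'Equity'),
        ('EqCS', 'Equity/Common Share'),
        ('OTC',  'OverTheCounter'),
        ('OF',   'OTC/Future');
    INSERT INTO instrument (InstrumentTypeCode) VALUES
        ('EqCS'), ('OTC'), ('Eq'), ('OF');
""")

# All equities, whatever the sub-type: the hierarchy lives in the code itself.
# (SQLite's LIKE is case-insensitive for ASCII by default; other products differ.)
equity_ids = [
    row[0]
    for row in conn.execute("""
        SELECT InstrumentId
        FROM instrument
        WHERE InstrumentTypeCode LIKE 'Eq%'
        ORDER BY InstrumentId
    """)
]

# Rows for a GUI drop-down: show Name to the user, store the code.
dropdown = conn.execute("""
    SELECT InstrumentTypeCode, Name
    FROM instrument_type
    ORDER BY Name
""").fetchall()

print(equity_ids)
for code, name in dropdown:
    print(code, name)
```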
When is it necessary to add a table in the database for lookup values
Is it necessary...
I'd say that in this case it is best practice to have the lookup table.
If you have a simple active/inactive or Yes/No column, you can just use a char(1) with a check constraint, or a bit, but it is best to have a lookup table if you are representing anything more complex.
You can then use this table for the user form input (populating the select box, etc.)
This will flatten and shrink the column, which will allow more rows per page and help cache memory usage of the main table.
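For the simple Yes/No case, a char(1) column with a check constraint might look like the sketch below. It is written against SQLite through Python's sqlite3 module, with a made-up account table; constraint syntax on other products is similar but not identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: a two-valued flag guarded by a CHECK constraint,
    -- so no lookup table is needed for it.
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        is_active  CHAR(1) NOT NULL DEFAULT 'Y'
                   CHECK (is_active IN ('Y', 'N'))
    );
""")


def try_insert(flag):
    """Return True if the flag value was accepted, False if the CHECK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account (is_active) VALUES (?)", (flag,))
        return True
    except sqlite3.IntegrityError:
        return False


results = {flag: try_insert(flag) for flag in ("Y", "N", "X")}
print(results)
```

Once the column needs more than two or three values, or the values need descriptions for a select box, the check constraint stops being enough and the lookup table is the better fit.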
SQL Server - Query Performance on Table Lookup
Not 100% sure about the syntactical details, but something like this:
    select table1.Name, ISNULL(table2.Test_Score, table1.Test_Score)
    from table1
    left outer join table2
        on table1.id = table2.Student_ID
        and table2.Test_Date = (
            select max(x.Test_Date)
            from table2 x
            where table1.id = x.Student_ID
            group by x.Student_ID)
If the subquery is not allowed where it is, move it to the WHERE clause. (Sorry, I can't try it where I am now.)
The query only works if the Test_Date is unique per student. If not, you get repeated results. Then you should use a GROUP BY:
    select table1.Name, min(ISNULL(table2.Test_Score, table1.Test_Score))
    from table1
    left outer join table2
        on table1.id = table2.Student_ID
        and table2.Test_Date = (
            select max(x.Test_Date)
            from table2 x
            where table1.id = x.Student_ID
            group by x.Student_ID)
    group by table1.id, table1.Name
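Since the queries above were written untested, here is a small way to try both of them. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, so ISNULL is swapped for COALESCE (which SQL Server also accepts); the sample rows are invented, and Bob deliberately has two scores on the same latest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, Name TEXT, Test_Score INTEGER);
    CREATE TABLE table2 (Student_ID INTEGER, Test_Date TEXT, Test_Score INTEGER);
    -- Invented sample data: Cy has no rows in table2, Bob has a duplicate date.
    INSERT INTO table1 VALUES (1, 'Ann', 50), (2, 'Bob', 60), (3, 'Cy', 70);
    INSERT INTO table2 VALUES
        (1, '2024-01-01', 55), (1, '2024-02-01', 65),
        (2, '2024-03-01', 80), (2, '2024-03-01', 90);
""")

# Shared FROM/JOIN part: join each student to the row(s) on their latest test date.
LATEST_JOIN = """
    from table1
    left outer join table2
        on table1.id = table2.Student_ID
        and table2.Test_Date = (
            select max(x.Test_Date)
            from table2 x
            where table1.id = x.Student_ID
            group by x.Student_ID)
"""

# First form: one output row per matching table2 row, so Bob appears twice.
raw = conn.execute(
    "select table1.Name, COALESCE(table2.Test_Score, table1.Test_Score)"
    + LATEST_JOIN
    + " order by table1.id"
).fetchall()

# Second form: GROUP BY collapses the duplicates, keeping the minimum score.
dedup = conn.execute(
    "select table1.Name, min(COALESCE(table2.Test_Score, table1.Test_Score))"
    + LATEST_JOIN
    + " group by table1.id, table1.Name order by table1.id"
).fetchall()

print(len(raw), dedup)
```

Ann gets her most recent score, Cy falls back to the score stored in table1, and the grouped form returns one row for Bob instead of two.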