How Do Null Values Affect Performance in a Database Search

How do NULL values affect performance in a database search?

In Oracle, NULL values are not stored in an ordinary single-column B-tree index, i.e. this query:

SELECT  *
FROM table
WHERE column IS NULL

cannot use such an index on the column and will fall back to a full table scan, since the index doesn't contain the rows you need.

More than that, this query:

SELECT  column
FROM table
ORDER BY
column

will also use a full table scan and a sort for the same reason: as long as the column is nullable, the index may be missing rows.

If your values don't intrinsically allow NULLs, then mark the column as NOT NULL.

Query Performance with NULL

SQL Server indexes NULL values, so this will most probably just use an Index Seek over an index on QuickPickOrder, both for filtering and for ordering.

Why should I avoid NULL values in a SQL database?

The NULL question is not simple... Every professional has a personal opinion about it.

Relational theory based on Two-Valued Logic (2VL: TRUE and FALSE) rejects NULL, and Chris Date is one of the strongest opponents of NULLs. Ted Codd, on the other hand, also accepted Three-Valued Logic (TRUE, FALSE and UNKNOWN).

Just a few things to note for Oracle:

  1. Single column B*Tree Indexes don't contain NULL entries. So the Optimizer can't use an Index if you code "WHERE XXX IS NULL".

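  2. Oracle considers an empty string the same as a NULL, so:

    WHERE SOME_FIELD = NULL

    behaves the same as:

    WHERE SOME_FIELD = ''

    Both comparisons evaluate to UNKNOWN and never match a row; use WHERE SOME_FIELD IS NULL to find NULLs (and, in Oracle, empty strings).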
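To make the point about "= NULL" concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module purely as a convenient stand-in, not Oracle: SQLite does not treat '' as NULL, so only the standard NULL comparison behaviour is shown. The table and column names are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in here; some_table / some_field are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.executemany(
    "INSERT INTO some_table (id, some_field) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, None)],
)


def ids(where_clause):
    """Return the ids of the rows matching the given WHERE clause."""
    sql = f"SELECT id FROM some_table WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


equals_null = ids("some_field = NULL")        # UNKNOWN for every row
not_equals_null = ids("some_field <> NULL")   # also UNKNOWN for every row
is_null = ids("some_field IS NULL")           # the proper NULL test

print(equals_null, not_equals_null, is_null)
```

Neither "=" nor "<>" against NULL selects anything; only IS NULL finds the two rows whose some_field is NULL.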

Moreover, with NULLs you must pay attention in your queries, because every comparison with NULL evaluates to NULL (UNKNOWN).
And sometimes NULLs are insidious. Consider a WHERE condition like the following:

WHERE SOME_FIELD NOT IN (SELECT C FROM SOME_TABLE)

If the subquery returns one or more NULLs, you get an empty result set, because the NOT IN test can then never evaluate to TRUE!
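The NOT IN trap can be reproduced with a small sketch, again using Python's sqlite3 module as a stand-in engine. The table main_table and the sample data are assumptions made for illustration; SOME_TABLE and its column C follow the snippet above. NOT EXISTS and filtering out NULLs in the subquery are shown as two common workarounds.

```python
import sqlite3

# main_table and the sample rows are hypothetical; SQLite is only a stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (some_field INTEGER);
    CREATE TABLE some_table (c INTEGER);
    INSERT INTO main_table VALUES (1), (2), (3);
    INSERT INTO some_table VALUES (1), (NULL);
""")

# NOT IN against a subquery that returns a NULL: no row can ever qualify.
not_in = conn.execute(
    "SELECT some_field FROM main_table "
    "WHERE some_field NOT IN (SELECT c FROM some_table)"
).fetchall()

# Workaround 1: NOT EXISTS only asks whether a matching row exists.
not_exists = conn.execute(
    "SELECT some_field FROM main_table m "
    "WHERE NOT EXISTS (SELECT 1 FROM some_table s WHERE s.c = m.some_field) "
    "ORDER BY some_field"
).fetchall()

# Workaround 2: keep NOT IN but exclude NULLs from the subquery.
not_in_filtered = conn.execute(
    "SELECT some_field FROM main_table "
    "WHERE some_field NOT IN (SELECT c FROM some_table WHERE c IS NOT NULL) "
    "ORDER BY some_field"
).fetchall()

print(not_in, not_exists, not_in_filtered)
```

The first query returns nothing, while both workarounds return the rows with values 2 and 3.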

These are just the first few cases that come to mind, but one could talk about NULLs for a long time...

What effect does NULL in a database have?

In the databases I'm aware of, a NULL value doesn't consume any more space than a non-NULL one; the column still has to allow for its maximum size. However, the fact that a column is NULLable may consume extra storage.

That's because, in addition to the possible values, you also have to store the fact that the column is NULL or not for each row. However, that's pretty efficient in terms of storage.
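As a rough illustration of why this bookkeeping is cheap, the sketch below packs one NULL flag per column into a bitmap. It is a toy model, not the on-disk layout of any particular database, and the function name null_bitmap is made up for this example.

```python
def null_bitmap(row):
    """Pack one NULL flag per column into bytes (bit i set = column i is NULL).

    Toy model only: real engines use their own layouts.
    """
    bitmap = bytearray((len(row) + 7) // 8)  # one bit per column, rounded up
    for i, value in enumerate(row):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


row = (42, None, "abc", None, None, 7, 8, 9, None)  # nine columns, four NULLs
bitmap = null_bitmap(row)
print(len(bitmap), bitmap.hex())
```

Under this model nine nullable columns cost two bytes per row, regardless of how many of them actually hold NULL.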

For a NULLable column, the extra time taken for a query would be minuscule at most.

You should not really care whether it takes more time. The design of a database is driven by the requirements of the data, not how fast it runs. I don't mean you should ignore performance altogether but, if your column needs to store NULL values, then it needs to, regardless of the performance hit.

Does setting NOT NULL on a column in postgresql increase performance?

It's always a good idea to keep columns from being NULL if you can avoid it, because the semantics of using them are so messy; see What is the deal with NULLs? for a good discussion of how those can get you into trouble.

In versions of PostgreSQL up to 8.2, the software didn't know how to do comparisons on the most common type of index (the b-tree) in a way that would include finding NULL values in them. In the relevant bit of documentation on index types, you can see that described as "but note that IS NULL is not equivalent to = and is not indexable". The practical downside is that if you specify a query that requires including NULL values, the planner might not be able to satisfy it using the obvious index for that case. As a simple example, if you have an ORDER BY clause that could be accelerated with an index, but your query needs to return NULL values too, the optimizer can't use that index because the result would be missing any NULL data, and would therefore be incomplete and useless. The optimizer knows this, and will do an unindexed scan of the table instead, which can be very expensive.

PostgreSQL improved this in 8.3: "an IS NULL condition on an index column can be used with a B-tree index". So the situations where you can be burned by trying to index something with NULL values have been reduced. But since NULL semantics are still really painful and you might run into a situation where even the 8.3 planner doesn't do what you expect because of them, you should still use NOT NULL whenever possible to lower your chances of running into a badly optimized query.
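Whether an IS NULL predicate can use an index is something you can check rather than guess, by asking the engine for its query plan. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. This is SQLite, not PostgreSQL, so the plan text says nothing about the PostgreSQL planner; in PostgreSQL the analogous tool is EXPLAIN. The table t and index idx_t_c are hypothetical.

```python
import sqlite3

# Hypothetical table and index; SQLite is used only to show the workflow.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, c INTEGER);
    CREATE INDEX idx_t_c ON t (c);
    INSERT INTO t (c) VALUES (1), (NULL), (3);
""")

query = "SELECT id FROM t WHERE c IS NULL"

# The human-readable plan step is the last column of each EXPLAIN QUERY PLAN row.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
rows = conn.execute(query).fetchall()

for step in plan:
    print(step)
print(rows)
```

If the plan mentions the index, the engine can satisfy the IS NULL test from it; if it reports a plain table scan, the NULL handling discussed above is a likely suspect.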

NULL in MySQL (Performance & Storage)

It depends on which storage engine you use.

In MyISAM format, each row header contains a bitfield with one bit for each column to encode NULL state. A column that is NULL still takes up space, so NULLs don't reduce storage. See https://dev.mysql.com/doc/internals/en/myisam-introduction.html

In InnoDB, each column has a "field start offset" in the row header, which is one or two bytes per column. The high bit in that field start offset is on if the column is NULL. In that case, the column doesn't need to be stored at all. So if you have a lot of NULLs, your storage should be significantly reduced.
See https://dev.mysql.com/doc/internals/en/innodb-field-contents.html
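The idea of flagging NULL in the high bit of a per-column offset can be sketched as follows. This is a deliberately simplified toy that follows the description above, with one-byte offsets only; it is not InnoDB's actual on-disk format, and encode_row and NULL_FLAG are names invented for this example.

```python
NULL_FLAG = 0x80  # high bit of a one-byte offset marks the column as NULL


def encode_row(values):
    """Return (offsets, payload) for a row of bytes-or-None column values.

    Toy model: a NULL column sets the high bit of its offset and stores no data.
    """
    offsets = bytearray()
    payload = bytearray()
    for value in values:
        start = len(payload)
        if start >= NULL_FLAG:
            raise ValueError("row too long for one-byte offsets in this toy model")
        if value is None:
            offsets.append(start | NULL_FLAG)  # flag only, nothing stored
        else:
            offsets.append(start)
            payload += value
    return bytes(offsets), bytes(payload)


offsets, payload = encode_row((b"ab", None, b"cde"))
print(offsets.hex(), payload)
```

In this model the NULL column contributes one header byte and zero payload bytes, which is the effect described above: rows with many NULLs take less space.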

EDIT:

The NULL bits are part of the row headers; you don't choose to add them.

The only way I can imagine NULLs improving performance is that in InnoDB, a page of data may fit more rows if the rows contain NULLs. So your InnoDB buffers may be more effective.

But I would be very surprised if this provides a significant performance advantage in practice. Worrying about the effect NULLs have on performance is in the realm of micro-optimization. You should focus your attention elsewhere, in areas that give greater bang for the buck, for example adding well-chosen indexes or increasing the database cache allocation.


