Space Used by Nulls in Database

Space used by nulls in database

Storing a NULL in a column does not specifically cost or save space. For fixed-length data, the entire space is still reserved.

On the other hand, variable-length data requires only the length of the data plus overhead to store the actual length. For example, a VARCHAR(n) will use 2 bytes of data to indicate that actual length, so the storage space required is always n+2.

Additionally, it should be mentioned that if SET ANSI_PADDING ON, a char(n) with a NULL value will behave as a VARCHAR(n).

No matter what, you will not recognize space "savings" from storing NULLs when using SQL Server 2000 or SQL Server 2005. SQL Server 2008 introduces the concept of a sparse column, which can yield savings for columns that are primarily NULL.

How much size Null value takes in SQL Server

If the field is fixed width storing NULL takes the same space as any other value - the width of the field.

If the field is variable width the NULL value takes up no space.

In addition to the space required to store a null value there is also an overhead for having a nullable column. For each row one bit is used per nullable column to mark whether the value for that column is null or not. This is true whether the column is fixed or variable length.


The reason for the discrepancies that you have observed in information from other sources:

  • The start of the first article is a bit misleading. The article is not talking about the cost of storing a NULL value, but the cost of having the ability to store a NULL (i.e the cost of making a column nullable). It's true that it costs something in storage space to make a column nullable, but once you have done that it takes less space to store a NULL than it takes to store a value (for variable width columns).

  • The second link seems to be a question about Microsoft Access. I don't know the details of how Access stores NULLs but I wouldn't be surprised if it is different to SQL Server.

Do null values save storage space?

For fixed width fields like nullable int the storage space required is always the same regardless of whether the value is null or not.

For variable width nullable fields the value ’NULL‘ takes zero bytes of storage space (ignoring the bit to store whether the value is null or not).

How much disk-space is needed to store a NULL value using postgresql DB?

Null columns are not stored. The row has a bitmap at the start and one bit per column that indicates which ones are null or non-null. The bitmap could be omitted if all columns are non-null in a row. So for any given row with one or more nulls, the size added to it would be that of the bitmap(N bits for an N-column table, rounded up).

More in depth discussion from the docs here

How to calculate storage space used by NULL value?

  • For fixed length types such as int NULL, it always use the length of the type (ie 4 bytes for int whether it is set to NULL or NOT NULL).

  • For variable length types, it takes 0 bytes to store the NULL + 2 bytes in the variable length columns offset list. This is used to record where each variable length value is really stored in the row on the page.

In addition, the NULL or NOT NULL flag uses 1 bit for each columns. A table with 12 columns will use 12/8 bytes (=2 bytes NULL bitmap).

This link will give you a lot more information on the subject

Once you know the percentage of NULL, you can look at this link for an estimate of the potential gain. Sparse saves space on null value but will requieres more space for not null values.

NULL storage in Oracle

The NULLs in your table may consume as little as 1.75% of the storage space.

But that number is meaningless, even though it's based on the reproducible test case below. It's more important to understand that NULLs are tiny (just one byte). So tiny that the "real" size should be irrelevant except in extreme cases. So tiny that it is almost always a waste of time to worry about alternate representations.


The best case test case (space usage in practice)

Let's create 1GB of data using your table definition. First, let's create the table.

create table test1(
COL1 NUMBER,
COL2 NUMBER,
COL3 NUMBER,
COL4 NUMBER,
COL5 NUMBER,
COL6 NUMBER,
COL7 VARCHAR2(255),
COL8 DATE,
COL9 DATE,
COL10 DATE,
COL11 VARCHAR2(255)
) pctfree 0 /* Let's assume no updates or deletes, and pack the data tightly */;

Now create one gigabyte of data. Each value uses the largest-possible value for that data type.

begin
for i in 1 .. 15 loop --Magic number to generate exactly 1GB.
insert into test1
select
.0123456789012345678901234567890123456789,
.0123456789012345678901234567890123456789,
.0123456789012345678901234567890123456789,
.0123456789012345678901234567890123456789,
.0123456789012345678901234567890123456789,
.0123456789012345678901234567890123456789,
lpad('A', 255, 'A'),
sysdate,
sysdate,
sysdate,
lpad('A', 255, 'A')
from dual
connect by level <= 95000; --Magic number to generate exactly 1GB.
commit;
end loop;
end;
/

These queries show that it uses 1GB of space for 1,425,000 rows.

select count(*) from test1;
select bytes/1024/1024/1024 gb from user_segments where segment_name = 'TEST1';

Now create a second table, with the same number of rows, but a NULL in every column.

create table test1_null as
select col1+null c1, col2+null c2, col3+null c3, col4+null c4, col5+null c5, col6+null c6,
cast(null as varchar2(255)) c7, col8+null c8, col9+null c9, col10+null c10,
cast(null as varchar2(255)) c11
from test1;

The new segment size is only 0.0175GB, or 1.75%.

select bytes/1024/1024/1024 gb from user_segments where segment_name = 'TEST1_NULL';

Why that test case is misleading

While this may sound like a simple question, to completely answer it would require either an entire book or a crystal ball. Getting real storage sizes is ridiculously complicated. You'll need to think about at least these issues:

  1. Variable width data. Most Oracle data types will only use the amount of space required to store the data. So the percent of storage used for that NULL byte depends on precisely what is in the other columns. Only a few data types use a static amount of storage regardless of data, such as CHAR, NCHAR, DATE, TIMESTAMP, etc.
  2. Trailing nulls. All consecutive NULLs at the end of the row are stored in one byte. Unless basic compression is enabled, then every NULL uses a byte again.
  3. Row overhead. Every row has overhead that depend on the columns and configuration. The skinnier the table, the more the row overhead uses up the space, so the percent used by a NULL will fluctuate.
  4. Block overhead. This depends on the number of rows, settings like PCTFREE, if previous rows were deleted, when the table was last re-organized, block size, etc.
  5. Segment overhead. Space is allocated as chunks of extents. Extent management can use a default algorithm (which I think allocates in chunks of 1MB up to 64MB) or it can be any custom value. This overhead becomes less relevant depending on the amount of data. It's possible a tablespace is set to a huge uniform extent size, such as 10GB, which will probably waste a lot of space regardless of the column values.
  6. Other I/O overhead. Space is probably also wasted by ASM, the operating system, the SAN, etc.

Format of a Row Piece (space usage in theory)

The image below is from the Logical Storage Structures chapter of the Concepts Guide:

Sample Image

The Column Data consists of a series of Column Lengths and Column Values. If the value is NULL, the Column Length is set to 0 and the Column Value does not use any space. This is why a NULL always uses just 1 byte, for the number 0.

Most data types are variable so the length will use at least 1 byte and the value will use at least 1 byte if it's non-NULL. Static data types, like DATE, will still use 1 byte for the length and then 7 bytes for the value. Again, unless the date is NULL, then the length is set to 0 and the value is empty.

This image may also explain the "trailing NULLs" storage trick. When there are trailing nulls, Oracle probably sets the Number of Columns lower, leaves the last Column Length as 0, and infers that the remaining columns are also NULL.

Alternative Representations?

Now I'm getting suspicious. Asking about alternative representations of NULLs brings to mind four kinds of people:

  1. Hopelessly theoretical people who complain about violating the relational model and propose using obscure tools instead of the ones that have been working fine for decades.
  2. Data architects who think a ginormous Entity-Attribute-Value table is always the answer. "Hey, it looks good on my PDF, who cares if it's impossible to query?"
  3. Those who are a bit new to SQL and rightfully frustrated with the way NULLs work.
  4. Stackoverflow users who read too much into questions. (So feel free to add information on the background behind this question if I'm way off!)

Yeah, NULLs are a bit weird. But it will make sense soon. Don't worry too much about the space, or ways to completely avoid NULLs. The price you're paying for NULLs is nothing compared to the price you'd pay for anti-patterns that completely avoid them.

Disk space implications of setting MySQL column value to NULL instead of 0 or ''

No, using NULL will not take up less space than an empty VARCHAR or INT field. In fact it may take up more space. Here's why:

A VARCHAR is stored as a size + value. The number of bytes used for the size depends on the max storage of the VARCHAR. VARCHAR(255) requires one byte, VARCHAR(65536) requires two bytes and so on.

So that VARCHAR(255) column takes up one byte even if you store an empty string. The following table would take a minimum of one byte per row (plus some other possible overhead depending on storage engine).

CREATE TABLE sample (
a VARCHAR(255) NOT NULL
);

To save NULL values, MySQL uses a bitmask for each row. Up to 8 nullable columns can be stored per byte. So if you have a table like this:

CREATE TABLE sample (
a VARCHAR(255) NULL
);

Would take a minimum of two bytes per row. Storing NULL just sets the bit, it's already reserved whether you use it or not. The byte for the VARCHAR's size is still used for each row even if the column is set to NULL.

Is disk space consumed when storing null data?

In SQLite's record format, exactly one byte is need to specify that a value is NULL.

If you normalize your database so that you have one record per non-NULL value, you save those NULL bytes, but you have additional overhead for the non-NULL values.
SQLite's minimum record overhead is:

  • two bytes cell pointer,
  • at least one byte payload length,
  • several bytes for the rowid, and
  • for your case, several bytes for the foreign key that points back to the original record.

Additionally, storing non-NULL values in a separate table might require an index to get efficient lookups.

Effect of NULL values on storage in SQL Server?

This depends on database engine as well as column type.

At least SQLite stores each null column as a "null type" which takes up NO additional space (each record is serialized to a single blob for storage so there is no space reserved for a non-null value in this case). With optimizations like this a NULL value has very little overhead to store. (SQLite also has optimizations for the values 0 and 1 -- the designers of databases aren't playing about!) See 2.1 Record Format for the details.

Now, things can get much more complex, especially with updating and potential index fragmentation. For instance, in SQL Server space may be reserved for the column data, depending upon the type. For instance, a int null will still reserve space for the integer (as well as have an "is null" flag somewhere), however varchar(100) null doesn't seem to reserve the space (this last bit is from memory, so be warned!).

Happy coding.



Related Topics



Leave a reply



Submit