Why Historically Do People Use 255 Not 256 for Database Field Magnitudes

Why historically do people use 255 not 256 for database field magnitudes?

With a maximum length of 255 characters, the DBMS can choose to use a single byte to indicate the length of the data in the field. If the limit were 256 or greater, two bytes would be needed.
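
As a rough sketch (the table and column names here are invented, and the exact on-disk overhead varies by DBMS and row format), the difference looks like this:

-- Hypothetical table; the per-value overhead in the comments is the
-- typical pattern, not a guarantee for every DBMS or row format.
CREATE TABLE length_prefix_demo (
    short_note VARCHAR(255),   -- value bytes + 1 length byte
    long_note  VARCHAR(256)    -- value bytes + 2 length bytes
);
-- Storing 'hello' (5 bytes in a single-byte character set):
--   short_note: 5 + 1 = 6 bytes
--   long_note:  5 + 2 = 7 bytes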

A value of length zero is certainly valid for varchar data (unless constrained otherwise). Most systems treat such an empty string as distinct from NULL, but some systems (notably Oracle) treat an empty string identically to NULL. For systems where an empty string is not NULL, an additional bit somewhere in the row would be needed to indicate whether the value should be considered NULL or not.
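
For example (behaviour depends on the product and version, so treat this as a sketch), MySQL keeps the two distinct while Oracle folds the empty string into NULL:

-- MySQL: an empty string is not NULL
SELECT '' IS NULL;                              -- returns 0 (false)

-- Oracle: an empty string is treated as NULL
SELECT COUNT(*) FROM dual WHERE '' IS NULL;     -- returns 1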

As you note, this is a historical optimisation and is probably not relevant to most systems today.

Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?

Historically, 255 characters has often been the maximum length of a VARCHAR in some DBMSes, and it sometimes still winds up being the effective maximum if you want to use UTF-8 and have the column indexed (because of index length limitations).
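
For instance, with the 767-byte InnoDB index key limit found in older MySQL versions and the 3-byte-per-character utf8 charset, 255 characters is the longest fully indexable VARCHAR (767 / 3 rounds down to 255). A sketch, with a made-up table, of how that plays out:

-- Assumes an InnoDB index key limit of 767 bytes (typical of older
-- MySQL versions) and the 3-byte utf8 character set.
CREATE TABLE users (
    name VARCHAR(255) CHARACTER SET utf8,
    KEY idx_name (name)        -- 255 * 3 = 765 bytes: fits under the limit
);
-- A VARCHAR(256) column would need 768 bytes for a full index key and
-- would be rejected or silently reduced to a prefix index, depending on
-- version and SQL mode.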

Why do database schemas often contain 32, 64, 128, etc

With varchar columns, the length is stored with the data using unsigned integers in the leading bytes of the data. The smallest possible number of bytes is used: one byte can store lengths from 0 to 255, two bytes from 0 to 65,535, and so on. By making the length 255, you get the "most value" out of the minimum one length byte.
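
The arithmetic behind those thresholds, as a quick sanity check (MySQL syntax):

-- One length byte holds 2^8 = 256 distinct values: lengths 0..255.
-- Two length bytes hold 2^16 = 65,536 distinct values: lengths 0..65,535.
SELECT POW(2, 8) - 1  AS max_length_one_byte,    -- 255
       POW(2, 16) - 1 AS max_length_two_bytes;   -- 65535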

In days gone by, a single byte of disk saved per row was worth having. Although disk is cheap now, the thinking has remained, especially among grey-haired DBAs.

There is no advantage in choosing a length that is a power of 2, for example varchar(64) - it is merely a habit/convention (I even follow it - and I don't know why!).

What's new for no more varchar(255)

What "VARCHAR(255) rule" are you referring to?

Each database vendor is free to implement VARCHAR however they want to. The rules (and guidelines) for VARCHAR aren't necessarily going to be the same for every database.

As far as the SQL standard goes, I haven't really looked into it. It might be loose enough that all the VARCHAR implementations comply with it. If the SQL standard for VARCHAR is really strict, then DBMS vendors may either extend the standard or simply not be compliant. I don't think the actual standard matters all that much. What matters is the actual rules enforced by the DBMS.

As far as a general guideline, specify a VARCHAR length long enough to support the system requirements. If the requirement of the system is to allow no more than 200 characters, then I'd specify the length as VARCHAR(200).

As another general guideline, don't define VARCHAR lengths that are larger than they need to be. VARCHAR columns declared longer than necessary can have an impact on resources and performance.
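
A small illustration of sizing to the requirement (the table, column, and length are made up for the example):

-- Requirement: a comment field of at most 200 characters.
CREATE TABLE feedback (
    comment_text VARCHAR(200) NOT NULL   -- sized to the stated requirement,
                                         -- not padded out "just in case"
);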

Oracle's limit for VARCHAR2 is 4,000 bytes. (In previous versions of Oracle, the maximum was 2,000. If you need more than 4,000 bytes, you could use the CLOB datatype.)

SQL Server limits VARCHAR to 8,000 bytes, unless you specify VARCHAR(MAX), which allows a maximum size of 2^31-1 bytes (about 2 GB).

MySQL has a maximum row length of 65,535 bytes. That effectively limits the size of a VARCHAR to VARCHAR(21844) when using a multibyte character set like utf8. With a single-byte character set (like latin1), the maximum would be VARCHAR(65532). If you need more characters than that, or you run into the maximum row length, you could use the TEXT datatype instead of VARCHAR.
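
A MySQL-flavoured sketch of where that row limit bites (exact maximums depend on version, storage engine, and the other columns in the table):

-- Single-byte character set: the declared length can approach the row limit.
CREATE TABLE big_latin1 (big_col VARCHAR(65532)) CHARACTER SET latin1;

-- Multibyte utf8 (3 bytes per character): the limit shrinks accordingly.
CREATE TABLE big_utf8 (big_col VARCHAR(21844)) CHARACTER SET utf8;

-- Anything longer than that would have to become TEXT instead.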


Most DBMS VARCHAR implementations store a "length" field for a VARCHAR column along with the value. The length is stored as an integer.

In some DBMS, if the maximum length (in bytes) of a VARCHAR column doesn't exceed 255 bytes, the length field can be implemented as a single byte integer. If the column allows more than 255 bytes, then the length field has to be larger than a single byte.


With dynamic row formats, storing 10 characters in a column takes about the same space whether the column is defined as VARCHAR(30) or VARCHAR(1000). With fixed row formats, the space for the maximum length of the column is reserved. The row storage format depends on the DBMS, and in some cases (MySQL) on the storage engine and the specified row format.
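
In MySQL, for example, the row format is chosen per table; a sketch with an invented table:

CREATE TABLE addresses (
    street VARCHAR(1000)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
-- With a dynamic format, a 10-character street takes roughly the same
-- space as it would in a VARCHAR(30) column; a fixed row format would
-- reserve the full declared width instead.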

Yes, it's 2016. And we've come a long way since the introduction of the first commercial relational database system.

The database is only one part of the system. There may be limits in the application or other software components. (If the application is written in C and defines a structure with a byte array for the field, the size limit there is going to matter. Increasing the size allowed in the database won't automatically fix the application.)

There could also be length limits or restrictions in JavaScript code, in the HTML elements of a web page, or in other software components. For example, some of the really old SQL Server ODBC drivers have a limit of 255 characters (bytes?) for both CHAR and VARCHAR columns.

So the length of a VARCHAR in the database is only part of the story.


With all of that said, I'm still not clear what you mean when you ask

Can we break the VARCHAR(255) rule?

I'm wondering what "rule" you are referring to. In nearly every database I'm aware of, it's possible to define VARCHAR columns much longer than 255 bytes or 255 characters, and doing that doesn't break any rule.

Are there disadvantages to using a generic varchar(255) for all text-based fields?

In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.

But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.

When your query implicitly generates a temporary table, for instance while sorting or GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.

You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character, even for strings you store with single-byte content (e.g. ascii or latin1 characters). Likewise, the utf8mb4 character set causes the string to pad out to four bytes per character in memory.

So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten single-byte characters, plus one byte for length), but it takes 765 bytes in memory, and thus in temp tables or sorted results.
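
As a sketch (the table and column are invented), this is the kind of query that materializes such a temp table:

-- Hypothetical schema: survey(answer VARCHAR(255) CHARACTER SET utf8, ...)
SELECT answer, COUNT(*) AS votes
FROM survey
GROUP BY answer
ORDER BY votes DESC;
-- The implicit temporary table holds answer padded to its full width:
-- 255 characters * 3 bytes = 765 bytes per row, even when every stored
-- answer is as short as "No opinion".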

I have helped MySQL users who unknowingly created 1.5GB temp tables frequently and filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.

It's best to define the column based on the type of data that you intend to store. It helps enforce application-related constraints, as other folks have mentioned, and it has the physical benefit of avoiding the memory waste described above.

It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL older than 5.0.

Should I round my database field sizes to the nearest multiple of a base 2 number?

In MySQL, the length should really always be 255 or 65,535 (unless there are type-specific reasons for choosing a different length). There are two different ways of storing character strings. For lengths up to 255, the length is stored in one byte rather than two, saving a byte of storage.

In a varchar, the length is the maximum length. Values are stored on the page based on their actual length, so the maximum length doesn't affect the storage of anything else, apart from whether the length prefix is 1 or 2 bytes (depending on whether the maximum is <= 255 or >= 256). (The length being a power of two -- with the exception of 256 -- has no effect on the storage.)

As for setting lengths to powers of two: I am guilty of this on many occasions. It is an old habit born of wanting to keep fields aligned on byte boundaries. The idea was to keep fields aligned on 4- or 8-byte boundaries, because this is more optimal for the CPU (think of the "C" programming language). This either avoided wasted space when an integer or floating-point value required 4- or 8-byte alignment (otherwise some bytes would be lost to padding) or avoided the overhead of copying bytes from unaligned space to aligned space. Of course, as I just noted, this logic has no basis for databases, because the maximum length does not affect the actual storage on the page.

Another reason why this has no significance is that the varchar type actually stores one or two bytes more than the length. The database takes care of the conversion from the physical format on the page to the physical format in memory. Trying to "optimize" this process is way more effort than it is worth.

VARCHARS: 2, 4, 8, 16, etc.? Or 1, 3, 7, 15, etc.?

In other words, if I have a number that I know will never be above 12, should I just go ahead and use a VARCHAR(15) or VARCHAR(16) instead?

No! Use varchar(12) (or maybe even char(12) if the length is fairly constant).

Once upon a time the varchar type was limited to 255 characters on some systems (including MySQL prior to 5.0.3) because the first byte stored indicated the length of the field. Given this restriction, devs wanting to allow a reasonable amount of text would choose 255 rather than going to a different data type altogether.

But if you know the size of your data, definitely use exactly that size for the database.
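
For example (a made-up table, just to show the sizing):

-- The value is known never to exceed 12 characters, so size it that way.
CREATE TABLE parts (
    part_code VARCHAR(12) NOT NULL
    -- or CHAR(12), if the codes are always exactly 12 characters
);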

What is the meaning of Leading Length?

The CHAR() datatype pads the string with spaces out to the declared length. So, for 'ORATABLE' in a CHAR(20) column, it looks like:

'ORATABLE            '
12345678901234567890

The "leading length" are two bytes at the beginning that specify the length of the string. Two bytes are needed because one byte is not enough. Two bytes allow lengths up to 65,535 units; one byte would only allow lengths up to 255.

The important point is that both CHAR() and VARCHAR2() use the same internal format, so there is little reason to use CHAR(). Personally, I would only use it for fixed-length codes, such as ISO country codes or US social security numbers.
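
An Oracle-flavoured sketch of the padding difference (the table is hypothetical):

CREATE TABLE name_demo (
    fixed_name    CHAR(20),
    variable_name VARCHAR2(20)
);
INSERT INTO name_demo VALUES ('ORATABLE', 'ORATABLE');
SELECT LENGTH(fixed_name), LENGTH(variable_name) FROM name_demo;
-- Returns 20 and 8: CHAR() pads with trailing spaces, VARCHAR2() does not.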


