Why Does Varchar Need Length Specification

VARCHAR(x) - Does setting a length turn it into fixed-length in MySQL performance terms?

No, otherwise it wouldn't be called variable-length in the first place. All it does is cap the maximum length of the strings you can store in that column. On most other database engines, a string longer than the maximum produces an error, but MySQL, when strict SQL mode is not enabled, truncates the excess data and raises a warning:

mysql> create table t (a varchar(3));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t values ('abc'), ('defgh');
Query OK, 2 rows affected, 1 warning (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 1

mysql> select * from t;
+------+
| a    |
+------+
| abc  |
| def  |
+------+
2 rows in set (0.00 sec)

Why is 30 the default length for VARCHAR when using CAST?

Why not specify the varchar length explicitly? For example:

SELECT CAST('the quick brown fox jumped over the lazy dog' AS VARCHAR(45))

As for why 30: that is the default length SQL Server uses for that type when no length is given in a CAST.

From char and varchar (Transact-SQL):

When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified when using the CAST and CONVERT functions, the default length is 30.
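
The CAST rule quoted above can be modelled in a few lines. This is only a toy Python sketch of the documented defaults, not SQL Server code; the function name cast_to_varchar and both constants are made up for illustration.

```python
# Defaults taken from the documentation quoted above.
DEFAULT_DECLARED_LENGTH = 1   # n omitted in a column or variable declaration
DEFAULT_CAST_LENGTH = 30      # n omitted in CAST / CONVERT


def cast_to_varchar(value, n=None):
    """Toy model of CAST(value AS VARCHAR(n)): excess characters are dropped."""
    limit = DEFAULT_CAST_LENGTH if n is None else n
    return value[:limit]


sentence = "the quick brown fox jumped over the lazy dog"
print(len(cast_to_varchar(sentence)))      # length when n is omitted
print(len(cast_to_varchar(sentence, 45)))  # length with an explicit VARCHAR(45)
```

With an explicit length of 45 the 44-character sentence survives intact; without one, only the first 30 characters remain.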

Why specify a length for character varying types

My understanding is that constraints are useful for data integrity, so I use column sizes both to validate data items at the lowest layer and to describe the data model more precisely.

Some links on the matter:

  • VARCHAR(n) Considered Harmful
  • CHAR(x) vs. VARCHAR(x) vs. VARCHAR vs. TEXT
  • In Defense of varchar(x)

varchar Fields - Is a Power of Two More Efficient?

No.

In other contexts there are advantages to using structures with a power-of-two size, mostly because a whole (power-of-two) number of them fits neatly inside another power-of-two-sized structure. But this doesn't apply to a DB field size.

The only power-of-two sizing related to VARCHARs concerns the exact type of varchar (or TEXT/BLOB in some SQL dialects): if the maximum length is less than 256, a single byte is enough to record the length. If it's less than 65536 (64KB), two bytes are enough; three bytes work for lengths below 16777216 (16MB), and four bytes for lengths below 4294967296 (4GB).
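
The thresholds above follow directly from how many values fit in k bytes (256 to the power k). A minimal Python sketch of that arithmetic; the function name length_prefix_bytes is made up for illustration, and real engines choose among a few fixed prefix sizes rather than computing one like this.

```python
def length_prefix_bytes(max_length):
    """Smallest number of bytes whose unsigned range covers 0..max_length."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    n = 1
    # k bytes can represent the values 0 .. 256**k - 1.
    while max_length >= 256 ** n:
        n += 1
    return n


for limit in (255, 256, 65535, 65536, 16777215, 16777216):
    print(limit, length_prefix_bytes(limit))
```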

Also, it can be argued that VARCHAR(50) is just as expensive as VARCHAR(255), since in both cases a stored string of n bytes needs n+1 bytes of storage.

Of course that's before thinking of Unicode...

Is there a good reason I see VARCHAR(255) used so often (as opposed to another length)?

Historically, 255 characters has often been the maximum length of a VARCHAR in some DBMSes, and it sometimes still winds up being the effective maximum if you want to use UTF-8 and have the column indexed (because of index length limitations).

What happens when you store a value in a VARCHAR, which is over the limit in SQL?

It depends on server configuration:

If strict SQL mode is not enabled and you assign a value to a CHAR or
VARCHAR column that exceeds the column's maximum length, the value is
truncated to fit and a warning is generated. For truncation of
nonspace characters, you can cause an error to occur (rather than a
warning) and suppress insertion of the value by using strict SQL mode.

More info here: The CHAR and VARCHAR Types
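
The rule quoted above can be sketched as a small decision function. This is a toy Python model of the documented behaviour for a VARCHAR column, not MySQL code; store_varchar and DataTooLongError are hypothetical names, and lengths are counted in characters.

```python
class DataTooLongError(ValueError):
    """Hypothetical stand-in for the error MySQL raises in strict mode."""


def store_varchar(value, max_len, strict):
    """Return (stored_value, warnings) for a toy VARCHAR(max_len) column."""
    if len(value) <= max_len:
        return value, []
    excess = value[max_len:]
    # Strict mode rejects the value only if non-space characters would be lost.
    if strict and excess.strip(" ") != "":
        raise DataTooLongError(f"value exceeds {max_len} characters")
    # Otherwise the value is cut to fit and a warning is recorded.
    return value[:max_len], ["value truncated"]


print(store_varchar("defgh", 3, strict=False))
```

With strict=False the call reproduces the 'defgh' to 'def' truncation from the console session earlier on this page; with strict=True the same call raises instead of storing anything.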

Why does casting text as varchar without specifying a length truncate the text at 30 characters?

See the answer above, under "Why is 30 the default length for VARCHAR when using CAST?"

How does MySQL varchar know how many bytes indicate the length?

It is decided when the column is defined. Every length prefix in a particular VARCHAR column has the same size in bytes: the column uses either 1 byte or 2 bytes, depending on its defined size in characters and its character set.

Any VARCHAR column defined such that a value might require more than 255 bytes uses 2 bytes to store the length. MySQL isn't going to use 1 byte for some values in a column and 2 bytes for others.

MySQL documentation on CHAR and VARCHAR Types states this pretty clearly (emphasis mine):

A column uses one length byte if values require no more than 255
bytes, two length bytes if values may require more than 255 bytes.

If you declare a VARCHAR(255) column to use the utf8 character set, it's still going to use 2 bytes for the length prefix, not 1, since utf8 characters can take up to 3 bytes each and the length in bytes may therefore be greater than 255.
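
The arithmetic behind that rule, as a minimal Python sketch. The function name and lookup table are made up for illustration; the per-character maxima assumed here are 1 byte for latin1, 3 for MySQL's utf8 (utf8mb3) and 4 for utf8mb4.

```python
# Maximum bytes one character can occupy in each character set (assumed values).
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def varchar_length_prefix_bytes(declared_chars, charset):
    """Length-prefix size for a VARCHAR(declared_chars) column in this model."""
    # The prefix size depends on the worst-case byte length, not on stored data.
    max_bytes = declared_chars * MAX_BYTES_PER_CHAR[charset]
    return 1 if max_bytes <= 255 else 2


for chars, charset in [(255, "latin1"), (255, "utf8"), (85, "utf8"), (86, "utf8")]:
    print(chars, charset, varchar_length_prefix_bytes(chars, charset))
```

Under these assumptions a utf8 column stays on a 1-byte prefix only up to VARCHAR(85), because 85 x 3 = 255 bytes.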


