Importance of Varchar Length in MySQL Table

Importance of varchar length in MySQL table

No, in the sense that if the values you're storing in that column are always (say) less than 50 characters, declaring the column as varchar(50) or varchar(200) has the same performance.

Best practices for SQL varchar column length

No DBMS I know of has any "optimization" that will make a VARCHAR with a 2^n length perform better than one with a max length that is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently than one with a higher maximum length. I don't know if this is still the case.

For almost all DBMS, the actual storage that is required is only determined by the number of characters you put into it, not the max length you define. So from a storage point of view (and most probably a performance one as well), it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

You should see the max length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction and a CHECK CONSTRAINT that limits the number of characters to whatever your business requires.

If that requirement changes, altering the check constraint is much faster than altering the table (because the table does not need to be re-written)

The same can be applied for Oracle and others - in Oracle it would be VARCHAR(4000) instead of text though.

I don't know if there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) as compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment)

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer) a change to the column definition did rewrite the whole table, see e.g. here. Since 9.2 this is no longer the case and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed only took 0.5 seconds.

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column. But I could not find any reference for that.

For MySQL the manual says "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL however you can not use the "workaround" to use a check constraint to limit the number of characters in a column.

For SQL Server I could not find a clear statement on this but the execution time to increase the size of a varchar column (again the 1.2 million rows table from above) indicates that no rewrite takes place.

Edit 2017-01-24

Seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand that shows that the declared length of a nvarchar or varchar columns makes a huge difference for the performance.

MySQL VARCHAR size?

100 characters.

This is the var (variable) in varchar: you only store what you enter (and an extra 2 bytes to store length upto 65535)

If it was char(200) then you'd always store 200 characters, padded with 100 spaces

See the docs: "The CHAR and VARCHAR Types"

Is there any downside to giving extra size/length to your database columns?

It takes more time and more disk transfers to load larger data items into memory. Defining large maximum sizes for columns increases the size of table rows. For many DBMS servers, table rows are the items transferred. So defining columns that are too fat does slow things down.

This effect is minimal for VARCHAR items. But VARCHAR is quite a bit slower than data types like integers. Eight byte integers take four times as much time to transfer as two byte integers. So, if a database is being designed for ultimate performance, limiting data columns to the range actually required will speed things up.The extent of this effect depends on whether the disk channel is a bottleneck or not.

Another possible bottleneck is the channel that links the server with the client, often a network channel. Bottlenecking in this channel can be reduced by queries that don't ask for data that will never be used, but there's a trade off here between asking for data only when you need it and making too many round trips.

There's also a trade off between designing for optimal performance and over designing in the anticipation of changing requirements.

MySQL: Fields length. Does it really matter?

INT length does nothing. All INTs are 4 bytes. The number you can set, is only used for zerofill (and who uses that!?).

VARCHAR length does more. It's the maxlength of the field. VARCHAR is saved so that only the actual data is stored, so the length doesn't mattter. These days, you can have bigger VARCHARs than 255 bytes (being 256^2-1). The difference is the bytes that are used for the field length. VARCHAR(100) and VARCHAR(8) and VARCHAR(255) use 1 byte to save the field length. VARCHAR(1000) uses 2.

Hope that helps =)

edit

I almost always make my VARCHARs 250 long. Actual length should be checked in the app anyway. For bigger fields I use TEXT (and those are stored differently, so can be much much longer).

edit

I don't know how current this is, but it used to help me (understand): http://help.scibit.com/Mascon/masconMySQL_Field_Types.html

Does VARCHAR size limit matter?

In general, for a VARCHAR field, the amount of data stored in each field determines its footprint on the disk rather than the maximum size (unlike a CHAR field which always has the same footprint).

There is an upper limit on the total data stored within all fields of an index of 900 bytes (900 byte index size limit in character length).

The larger you make the field, the more likely people will try to use for purposes other than what you intended - and the greater the screen real-estate required to show the value - so its good practice to try to pick the right size, rather than assuming that if you make it as large as possible it will save you having to revisit the design.

Optimizing MySQL table for variety of VarChar length between 20 and 4,000 characters

MySQL would do fine as would most every RDBMS for that matter. When you specify a field as type CHAR() the number of characters is always used regardless of how many characters are in your string. For instance: If you have Char(64) field and you insert 'ABCD' then the field is still 64 bytes (assuming non-unicode).

When using VARCHAR(), however, the cell only uses as many bytes as are in the string, plus the number of bytes necessary to store the size of the string. So: If you have VARCHAR(64) and insert 'ABCD' you will only use 5 bytes. 4 for the characters 'ABCD' and one for the number of characters '4'.

Your extremely varying string lengths are exactly the reason we have VARCHAR(), so feel free to use VARCHAR(4500) and rest assured you will only be using as much space as necessary to store the characters in the string, and a little bit extra for the length.

Somewhat related: This is why it's not a great idea to use VARCHAR() for fields that don't have varying length strings being inserted. You are wasting space storing the size of the string when it's already known. For instance, telephone numbers of the form x-xxx-xxx-xxxx should just use Char(14) since it will always take up 14 characters and only 14 bytes are necessary. If you used VARCHAR(14) instead you would actually end up using 15 bytes.

Should VARCHAR columns be put at the end of table definitions in MySQL?

Asking that question about "MySQL" is not helpful, as MySQL relegates storage to storage engines, and they implement storage in very different ways. It makes sense to ask this question for any individual storage engine.

In the MEMORY engine, variable length data types do not exist. A VARCHAR is silently changed into a CHAR. In the context of your question: It does not matter where in a table definition you put your VARCHAR.

In the MyISAM engine, if a table has no variable length data whatsoever (VARCHAR, VARBINARY or any TEXT or BLOB type) it is of the FIXED variant of MyISAM, that is, records have a fixed byte length. This can have performance implications, especially if data is deleted and inserted repeatedly (i.e. the table is not append only). As soon as any variable length data type is part of a table definition it becomes the DYNAMIC variant of MyISAM, and MyISAM internally changes any but the shortest CHAR type internally to VARCHAR. Again, position and even definition of CHAR/VARCHAR do not matter.

In the InnoDB engine, data is stored in pages of 16 KB size. A page has a page footer with a checksum, and a page header, with among other things a page directory. The page directory contains for each row the offset of that row relative to the beginning of the page. A page also contains free space, and all I/O is done in pages.

Hence InnoDB can, as long as there is free space in a page, grow VARCHAR in place, and move rows around inside a page, without incurring any additional I/O. Also, since all rows are being addressed as (pagenumber, page directory entry), movement of a row inside a page is localized to the page and not visible from the outside.

It also means that for InnoDB too, the order of columns inside a row does not matter at all.

These are the three storage engines that are most commonly used with MySQL, and order of columns does not matter for any of these three. It may be that other, more exotic storage engines exist for which this is not true.



Related Topics



Leave a reply



Submit